perf(hstr): trim atom hash and benchmark hashset lookups #11874
hardfist wants to merge 5 commits
Merging this PR will improve performance by 2.47%

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ⚡ | es/large/minify/libraries/d3 | 493.1 ms | 479.8 ms | +2.79% |
| ⚡ | es/large/all/es2018 | 112.8 ms | 110.2 ms | +2.28% |
| ⚡ | es/large/all/es3 | 180.4 ms | 176.8 ms | +2.03% |
| ⚡ | es/large/minify/libraries/lodash | 142 ms | 138.4 ms | +2.65% |
| ⚡ | es/large/minify/libraries/vue | 177.9 ms | 174.3 ms | +2.06% |
| ⚡ | es/oxc/benches/assets/parser.ts/sourceMap=false/reactDev=false | 48.8 ms | 47.7 ms | +2.22% |
| ⚡ | es/minifier/libs/d3 | 377.4 ms | 365.5 ms | +3.26% |
| ⚡ | es/minifier/libs/jquery | 89.8 ms | 87.8 ms | +2.26% |
| ⚡ | es/minifier/libs/moment | 57.8 ms | 56.5 ms | +2.4% |
| ⚡ | es/minifier/libs/react | 18.5 ms | 18.2 ms | +2.03% |
| ⚡ | es/minifier/libs/lodash | 108.3 ms | 105.1 ms | +3.1% |
| ⚡ | es/minifier/libs/vue | 135.7 ms | 132.4 ms | +2.46% |
| ⚡ | es/resolver_with_hygiene/typescript | 757.7 ms | 738.4 ms | +2.61% |
| 🆕 | compact_str[1024] | N/A | 741.1 ns | N/A |
| 🆕 | compact_str[128] | N/A | 741.1 ns | N/A |
| 🆕 | compact_str[32] | N/A | 770.3 ns | N/A |
| 🆕 | compact_str[4] | N/A | 745.3 ns | N/A |
| 🆕 | smartstring[16] | N/A | 768.9 ns | N/A |
| 🆕 | kstring[1024] | N/A | 611.9 ns | N/A |
| 🆕 | compact_str[512] | N/A | 741.1 ns | N/A |
| ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing hardfist:perf/micro-optimize-hstr (9472db8) with main (aa5b539)
Footnote: 31 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Description:
Draft hstr micro-optimization investigation. CodSpeed/Callgrind simulation is the source of truth for this PR; Criterion wall-time is not used for the acceptance metric.
The original `calc_hash` change in `03e0a131a0` regressed the measured simulation region, so the current head reverts that product-code change. The current product-code optimization keeps the dynamic atom hash path intact and instead tightens the equality hot path: `Atom`/`Wtf8Atom` equality is inlined, `TaggedValue` equality compares the raw tagged value directly, and `partial_eq!` avoids the redundant fallback after the current dynamic/inline tag cases have been resolved.

This PR also includes a benchmark harness correction for the HashSet group: it uses `iter_batched_ref` so setup and drop/keep-alive costs are outside the measured region and the benchmark measures the lookup loop consistently for every library. That benchmark-scope correction is listed separately below and is not counted as product-code speedup.

Micro-Optimization Progress

Target benchmark: `single-thread/HashSet/hstr/10000`
Measurement mode: CodSpeed/Callgrind simulation, measured benchmark region
Primary metric: `Ir`, with `accesses` and `estimated_cycles` tracked from the same Callgrind events
Baseline command: local `codspeed run -m simulation --profile-folder <dir> -- cargo codspeed run --bench libs -m simulation 'single-thread/HashSet'`

- `03e0a131a0`, `single-thread/HashSet/hstr/10000`
- `caf9244729`, `single-thread/HashSet/hstr/10000`: `cargo fmt --all`; `cargo test -p hstr --features serde`; `cargo clippy -p hstr --all-targets --features serde -- -D warnings`; `cargo clippy --all --all-targets -- -D warnings`. … `TaggedValue` representation. This did not meet a 5% product-code target.
- `e8a21e83e8`, `single-thread/HashSet/hstr/10000`: `cargo fmt --all`; `cargo clippy -p hstr --all-targets --features serde -- -D warnings`; `cargo codspeed build -p hstr --bench libs -m simulation`
- `9472db84ff`, `single-thread/HashSet/hstr/10000`: `cargo fmt --all`; `cargo test -p hstr --features serde`; `cargo clippy -p hstr --all-targets --features serde -- -D warnings`; `cargo clippy --all --all-targets -- -D warnings`; local CodSpeed simulation. … `single-thread/HashSet/hstr/1000` at -5.006% Ir and no hstr benchmark Ir regressions in the measured hstr cases.

Notes:

- `9472db84ff` is the corrected HashSet measured region with the benchmark harness fix kept and the product-code changes reverted locally. This separates hstr runtime improvement from benchmark measurement correction.
- `03e0a131a0` showed `Ir` decreasing by 528,959 (0.014853%), but that includes Criterion harness/setup/random-data work and is not the metric used for this PR.
- … `codspeed_criterion_compat` to get measured-region Callgrind dumps. The harness change is included in this PR so local CodSpeed simulation and CI measure the same benchmark region.
- `ThinArc` header hash reads, storing the dynamic pointer at the hash field, full tag-byte comparison, `write_u32`/`write_usize` hashing, `FxHasher` preimage hashing, and accessor inlining, did not improve the simulation metric and were not kept.

BREAKING CHANGE:
None.
Related issue (if exists):
None.
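
Illustration of the equality hot path:

For readers who have not looked at the atom layout, the equality change described above can be pictured with a simplified model. This is a sketch, not the hstr implementation: `SketchAtom`, `Entry`, the tag constants, and the 7-byte inline limit are assumptions made for illustration, and it uses `Rc` and `DefaultHasher` from std rather than hstr's own `ThinArc` and hashing. It only shows the idea of comparing the raw tagged word first and falling back to a hash-and-content comparison when both sides are dynamic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical tag layout for this sketch; hstr's real constants may differ.
const TAG_MASK: u64 = 0b11;
const DYNAMIC_TAG: u64 = 0b00; // raw word is the address of a heap entry
const INLINE_TAG: u64 = 0b01; // raw word carries the string bytes directly
const MAX_INLINE_LEN: usize = 7;

// Heap entry for strings too long to inline (stand-in for a ThinArc payload).
struct Entry {
    hash: u64,
    text: String,
}

#[derive(Clone)]
struct SketchAtom {
    raw: u64,
    // Keeps the heap entry alive; always None for inline atoms.
    entry: Option<Rc<Entry>>,
}

impl SketchAtom {
    fn new(text: &str) -> Self {
        let bytes = text.as_bytes();
        if bytes.len() <= MAX_INLINE_LEN {
            // Byte 0 holds the tag and the length; bytes 1..=7 hold the data.
            let mut raw = INLINE_TAG | ((bytes.len() as u64) << 4);
            for (i, b) in bytes.iter().enumerate() {
                raw |= (*b as u64) << (8 * (i + 1));
            }
            SketchAtom { raw, entry: None }
        } else {
            let mut hasher = DefaultHasher::new();
            text.hash(&mut hasher);
            let entry = Rc::new(Entry {
                hash: hasher.finish(),
                text: text.to_owned(),
            });
            // Entry is 8-byte aligned, so the low tag bits of its address are zero.
            let raw = Rc::as_ptr(&entry) as usize as u64;
            debug_assert_eq!(raw & TAG_MASK, DYNAMIC_TAG);
            SketchAtom { raw, entry: Some(entry) }
        }
    }
}

impl PartialEq for SketchAtom {
    #[inline]
    fn eq(&self, other: &Self) -> bool {
        // Fast path: identical raw words mean the same inline payload
        // or the same heap allocation.
        if self.raw == other.raw {
            return true;
        }
        // Inline atoms with different words are different strings, and an
        // inline atom never equals a dynamic one because short strings are
        // always inlined by `new`. Only two dynamic atoms can still match.
        let both_dynamic = (self.raw & TAG_MASK) == DYNAMIC_TAG
            && (other.raw & TAG_MASK) == DYNAMIC_TAG;
        if !both_dynamic {
            return false;
        }
        match (&self.entry, &other.entry) {
            (Some(a), Some(b)) => a.hash == b.hash && a.text == b.text,
            _ => false,
        }
    }
}
```

In this model, `SketchAtom::new("foo") == SketchAtom::new("foo")` is decided by the single word comparison, while two separately allocated long strings with equal contents take the slow path and compare the stored hash before the text.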