perf(hstr): trim atom hash and benchmark hashset lookups#11874

Closed
hardfist wants to merge 5 commits into swc-project:main from hardfist:perf/micro-optimize-hstr
@hardfist hardfist commented May 20, 2026

Description:

Draft investigation of hstr micro-optimizations. CodSpeed/Callgrind simulation is the source of truth for this PR; Criterion wall-time is not used as the acceptance metric.

The original calc_hash change in 03e0a131a0 regressed the measured simulation region, so the current head reverts that product-code change. The current product-code optimization keeps the dynamic atom hash path intact and instead tightens the equality hot path: Atom/Wtf8Atom equality is inlined, TaggedValue equality compares the raw tagged value directly, and partial_eq! avoids the redundant fallback after the current dynamic/inline tag cases have been resolved.
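
To make the equality fast path concrete, here is a minimal sketch, not the hstr code itself. `Tagged`, `atom_eq`, the tag constants, and the 7-byte inline layout are all hypothetical stand-ins. It assumes that short strings are always stored inline, so an inline value and a dynamic value can never be equal, and that only two distinct dynamic entries need a content comparison.

```rust
/// Hypothetical tagged word. Low 2 bits: tag. Remaining bits: payload.
#[derive(Clone, Copy, Debug)]
pub struct Tagged(u64);

const TAG_MASK: u64 = 0b11;
const TAG_DYNAMIC: u64 = 0b00; // payload is the address of a heap entry
const TAG_INLINE: u64 = 0b01; // payload is the string bytes themselves

impl Tagged {
    /// Packs up to 7 bytes into the word; longer strings are rejected.
    pub fn inline(bytes: &[u8]) -> Option<Self> {
        if bytes.len() > 7 {
            return None;
        }
        let mut word = [0u8; 8];
        // Byte 0 holds the tag (bits 0..2) and the length (bits 4..8).
        word[0] = TAG_INLINE as u8 | ((bytes.len() as u8) << 4);
        word[1..1 + bytes.len()].copy_from_slice(bytes);
        Some(Tagged(u64::from_le_bytes(word)))
    }

    /// Wraps an address that is assumed to be at least 4-byte aligned.
    pub fn dynamic(addr: u64) -> Self {
        assert_eq!(addr & TAG_MASK, 0, "low bits must be free for the tag");
        Tagged(addr | TAG_DYNAMIC)
    }

    pub fn tag(self) -> u64 {
        self.0 & TAG_MASK
    }
}

impl PartialEq for Tagged {
    #[inline(always)]
    fn eq(&self, other: &Self) -> bool {
        // Compare the raw word once instead of decoding tag and payload.
        self.0 == other.0
    }
}

impl Eq for Tagged {}

/// Equality with a content fallback only where it can still matter.
/// `slow_compare` stands in for the hash-then-bytes comparison.
#[inline]
pub fn atom_eq(a: Tagged, b: Tagged, slow_compare: impl FnOnce() -> bool) -> bool {
    if a == b {
        return true; // identical word: same inline bytes or same entry
    }
    if a.tag() == TAG_DYNAMIC && b.tag() == TAG_DYNAMIC {
        return slow_compare(); // two allocations may hold the same text
    }
    false // inline mismatch or mixed tags: no fallback needed
}
```

In this sketch the fallback closure runs only for two different dynamic words; every other case is decided by a single integer comparison.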

This PR also includes a benchmark harness correction for the HashSet group: it uses iter_batched_ref so setup and drop/keep-alive costs are outside the measured region and the benchmark measures the lookup loop consistently for every library. That benchmark-scope correction is listed separately below and is not counted as product-code speedup.
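
The effect of the harness change can be illustrated without Criterion. The sketch below is a std-only approximation of the batched-by-reference pattern, not the harness in this PR: `time_batched_ref` and `count_hits` are hypothetical names, and wall-clock timing stands in for the Callgrind measured region.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: `setup` builds a fresh input, `routine` borrows it
/// mutably, and both the output and the input are dropped only after the
/// timer has stopped.
pub fn time_batched_ref<I, O>(
    iters: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(&mut I) -> O,
) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        let mut input = setup(); // not measured
        let start = Instant::now();
        let out = black_box(routine(&mut input)); // measured region
        total += start.elapsed();
        drop(out); // not measured
        drop(input); // not measured
    }
    total
}

/// The lookup loop that should be the only work inside the measured region.
pub fn count_hits(set: &HashSet<String>, keys: &[String]) -> usize {
    keys.iter().filter(|k| set.contains(k.as_str())).count()
}
```

With Criterion, the equivalent shape is `Bencher::iter_batched_ref(setup, routine, BatchSize)`, which passes the input by mutable reference so that its drop falls outside the timed section.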

Micro-Optimization Progress

Target benchmark: single-thread/HashSet/hstr/10000
Measurement mode: CodSpeed/Callgrind simulation, measured benchmark region
Primary metric: Ir, with accesses and estimated_cycles tracked from the same Callgrind events
Baseline command: local codspeed run -m simulation --profile-folder <dir> -- cargo codspeed run --bench libs -m simulation 'single-thread/HashSet'

| Commit | Benchmark | Mode | Ir Before | Ir After | Ir Delta | Accesses Before | Accesses After | Accesses Delta | Estimated Cycles Before | Estimated Cycles After | Estimated Cycles Delta | Checks | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 03e0a131a0 | single-thread/HashSet/hstr/10000 | CodSpeed/Callgrind simulation | 4,939,194 | 4,956,885 | +17,691 (+0.358%) | 7,413,610 | 7,434,365 | +20,755 (+0.280%) | 21,894,280 | 21,933,745 | +39,465 (+0.180%) | CI passed for hstr benchmark | Not a simulation improvement; superseded by current head. |
| caf9244729 | single-thread/HashSet/hstr/10000 | CodSpeed/Callgrind simulation | 4,939,194 | 4,898,269 | -40,925 (-0.829%) | 7,413,610 | 7,371,354 | -42,256 (-0.570%) | 21,894,280 | 21,863,064 | -31,216 (-0.143%) | cargo fmt --all; cargo test -p hstr --features serde; cargo clippy -p hstr --all-targets --features serde -- -D warnings; cargo clippy --all --all-targets -- -D warnings | Product-code improvement from avoiding repeated atom tag extraction and using direct pointer casts in the default TaggedValue representation. This did not meet a 5% product-code target. |
| e8a21e83e8 | single-thread/HashSet/hstr/10000 | CodSpeed/Callgrind simulation | 4,898,269 | 3,984,583 | -913,686 (-18.653%) | 7,371,354 | 4,701,453 | -2,669,901 (-36.220%) | 21,863,064 | 17,408,473 | -4,454,591 (-20.375%) | cargo fmt --all; cargo clippy -p hstr --all-targets --features serde -- -D warnings; cargo codspeed build -p hstr --bench libs -m simulation | Benchmark-scope correction: moves HashSet setup/drop/keep-alive work out of the measured region for all libraries. This is not counted as product-code speedup. |
| 9472db84ff | single-thread/HashSet/hstr/10000 | CodSpeed/Callgrind simulation | 4,025,411 | 3,794,702 | -230,709 (-5.731%) | 4,744,054 | 4,387,796 | -356,258 (-7.510%) | 17,457,889 | 17,108,326 | -349,563 (-2.002%) | cargo fmt --all; cargo test -p hstr --features serde; cargo clippy -p hstr --all-targets --features serde -- -D warnings; cargo clippy --all --all-targets -- -D warnings; local CodSpeed simulation | Product-code delta only, using the corrected HashSet measured region for both before and after. The same run shows single-thread/HashSet/hstr/1000 at -5.006% Ir and no hstr benchmark Ir regressions in the measured hstr cases. |
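
For readers checking the delta columns, each entry is the signed difference followed by the change relative to the "Before" value. A small sketch of that arithmetic follows; `percent_delta` and `format_delta` are hypothetical helper names, and the output omits the thousands separators used in the table.

```rust
/// Relative change from `before` to `after`, in percent.
pub fn percent_delta(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

/// Formats a delta in the style of the table, e.g. "-40925 (-0.829%)".
pub fn format_delta(before: u64, after: u64) -> String {
    let diff = after as i64 - before as i64;
    format!("{:+} ({:+.3}%)", diff, percent_delta(before, after))
}
```

For example, `format_delta(4_939_194, 4_898_269)` reproduces the Ir delta for caf9244729, and `format_delta(4_025_411, 3_794_702)` reproduces the one for 9472db84ff. The percentage is always taken against that row's own "Before" value, which is why the last row uses a different baseline from the "After" value of the row above it.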

Notes:

  • The final product-code baseline for 9472db84ff is the corrected HashSet measured region with the benchmark harness fix kept and the product-code changes reverted locally. This separates hstr runtime improvement from benchmark measurement correction.
  • A full-process direct Callgrind run for 03e0a131a0 showed Ir decreasing by 528,959 (0.014853%), but that includes Criterion harness/setup/random-data work and is not the metric used for this PR.
  • Local simulation required porting the hstr bench harness to codspeed_criterion_compat to get measured-region Callgrind dumps. The harness change is included in this PR so local CodSpeed simulation and CI measure the same benchmark region.
  • Additional attempted product-code candidates, including direct raw ThinArc header hash reads, storing the dynamic pointer at the hash field, full tag-byte comparison, write_u32/write_usize hashing, FxHasher preimage hashing, and accessor inlining, did not improve the simulation metric and were not kept.

BREAKING CHANGE:

None.

Related issue (if exists):

None.

changeset-bot Bot commented May 20, 2026

⚠️ No Changeset found

Latest commit: e8a21e8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

codspeed-hq Bot commented May 20, 2026

Merging this PR will improve performance by 2.47%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 13 improved benchmarks
✅ 206 untouched benchmarks
🆕 228 new benchmarks
⏩ 31 skipped benchmarks (see footnote 1)

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| es/large/minify/libraries/d3 | 493.1 ms | 479.8 ms | +2.79% |
| es/large/all/es2018 | 112.8 ms | 110.2 ms | +2.28% |
| es/large/all/es3 | 180.4 ms | 176.8 ms | +2.03% |
| es/large/minify/libraries/lodash | 142 ms | 138.4 ms | +2.65% |
| es/large/minify/libraries/vue | 177.9 ms | 174.3 ms | +2.06% |
| es/oxc/benches/assets/parser.ts/sourceMap=false/reactDev=false | 48.8 ms | 47.7 ms | +2.22% |
| es/minifier/libs/d3 | 377.4 ms | 365.5 ms | +3.26% |
| es/minifier/libs/jquery | 89.8 ms | 87.8 ms | +2.26% |
| es/minifier/libs/moment | 57.8 ms | 56.5 ms | +2.4% |
| es/minifier/libs/react | 18.5 ms | 18.2 ms | +2.03% |
| es/minifier/libs/lodash | 108.3 ms | 105.1 ms | +3.1% |
| es/minifier/libs/vue | 135.7 ms | 132.4 ms | +2.46% |
| es/resolver_with_hygiene/typescript | 757.7 ms | 738.4 ms | +2.61% |
| 🆕 compact_str[1024] | N/A | 741.1 ns | N/A |
| 🆕 compact_str[128] | N/A | 741.1 ns | N/A |
| 🆕 compact_str[32] | N/A | 770.3 ns | N/A |
| 🆕 compact_str[4] | N/A | 745.3 ns | N/A |
| 🆕 smartstring[16] | N/A | 768.9 ns | N/A |
| 🆕 kstring[1024] | N/A | 611.9 ns | N/A |
| 🆕 compact_str[512] | N/A | 741.1 ns | N/A |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing hardfist:perf/micro-optimize-hstr (9472db8) with main (aa5b539)

Open in CodSpeed

Footnotes

  1. 31 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@hardfist hardfist changed the title perf(hstr): hash dynamic atom bytes directly perf(hstr): trim atom hash and equality hot paths May 20, 2026
@hardfist hardfist changed the title perf(hstr): trim atom hash and equality hot paths perf(hstr): trim atom hash and benchmark hashset lookups May 20, 2026
@hardfist hardfist closed this May 20, 2026
@github-actions github-actions Bot added this to the Planned milestone May 20, 2026