bench: add reps (median ± spread) to compare_bench.py#30

Merged
cjfields merged 1 commit into main from bench/compare-reps
Jun 8, 2026

cjfields commented Jun 8, 2026

Summary

Makes within-node benchmark noise visible. A label's PATH can now be a
comma-separated list of rep CSVs/dirs — repeated runs of the same config —
reduced to the median per metric, with the relative half-range shown as
median ±N%. Δ% compares medians.
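
The reduction described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `median_and_spread` is hypothetical, the sample values are made up, and it assumes "relative half-range" means half of (max - min) divided by the median.

```python
from statistics import median


def median_and_spread(values):
    """Return (median, relative half-range) for one metric across reps.

    Assumption: relative half-range = ((max - min) / 2) / |median|.
    A single rep (or a zero median) reports a spread of 0.0.
    """
    med = median(values)
    if len(values) < 2 or med == 0:
        return med, 0.0
    half_range = (max(values) - min(values)) / 2
    return med, half_range / abs(med)


# Illustrative values only: three reps of one metric, in GB.
reps = [13.0, 13.25, 13.5]
med, spread = median_and_spread(reps)
print(f"{med:.2f} GB ±{spread * 100:.0f}%")  # 13.25 GB ±2%
```

The median is used rather than the mean so that one contended run does not drag the reported value; the half-range then shows how far the extremes sit from it.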

Motivated by an earlier investigation: a scary "+19% wall" turned out to be
concurrent-job contention on a shared node, not code. Reps surface that kind of
run-to-run variance directly instead of letting a single noisy run mislead.

Changes

  • parse_run: LABEL=PATH[,PATH...] — each path a CSV file or a directory.
  • load_reps: median + spread (relative half-range) per (key, metric) across reps.
  • print_table: median ±N% cells when reps>1; banner shows label (n=N).
  • --json: emits {nreps, median, spread} per run.
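
For illustration, the `LABEL=PATH[,PATH...]` syntax could be split roughly like this. It is a sketch under assumptions, not the `parse_run` merged here: the real function may also expand directories into their CSVs and validate that paths exist, which this version does not.

```python
from pathlib import Path


def parse_run(spec):
    """Split 'LABEL=PATH[,PATH...]' into (label, [Path, ...]).

    Hypothetical sketch: does no filesystem checks and does not
    expand directories; it only handles the argument syntax.
    """
    label, sep, paths = spec.partition("=")
    if not sep or not label:
        raise ValueError(f"expected LABEL=PATH[,PATH...], got {spec!r}")
    reps = [Path(p) for p in paths.split(",") if p]
    if not reps:
        raise ValueError(f"no paths given for label {label!r}")
    return label, reps


label, reps = parse_run("main=r1_main,r2_main,r3_main")
print(label, len(reps))  # main 3
```

A single path (`main=r1_main`) yields a one-element list, which matches the unchanged single-path usage.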

Single-path usage is unchanged (no spread shown). Pure stdlib; benchmark tooling only.

Example

compare_bench.py --metric maxrss_kb \
    --baseline main=r1_main,r2_main,r3_main \
    --compare branch=r1_br,r2_br,r3_br

### maxrss_kb  (↓ better)
                     main              branch      Δ%
dada         13.25 GB ±2%         8.80 GB ±2%  -33.6%
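
As a sanity check on the table above, the Δ% column is consistent with a plain relative change between the two medians. The helper name `delta_pct` is hypothetical, and the inputs are the rounded values shown in the table; the tool presumably computes from the raw `maxrss_kb` figures.

```python
def delta_pct(baseline_median, compare_median):
    """Relative change of compare vs. baseline, in percent."""
    return (compare_median - baseline_median) / baseline_median * 100


# Medians as displayed in the example row (GB, rounded).
print(f"{delta_pct(13.25, 8.80):+.1f}%")  # -33.6%
```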

🤖 Generated with Claude Code

Within-node variance bit us once already (a benchmark "+19% wall" that was
concurrent-job contention, not code). Make that noise visible: a label's PATH
may now be a comma-separated list of rep CSVs/dirs — repeated runs of the same
config — reduced to the MEDIAN per metric, with the relative half-range shown as
`median ±N%`. Δ% compares medians.

- parse_run: LABEL=PATH[,PATH...]; each path a CSV or dir.
- load_reps: median + spread per (key, metric) across reps.
- print_table: `median ±N%` cells when reps>1; header shows `label (n=N)`.
- --json: emits {nreps, median, spread} per run.

Single-path usage is unchanged (no spread shown). Pure stdlib.

Usage:
  compare_bench.py --baseline main=r1_main,r2_main,r3_main \
                   --compare branch=r1_br,r2_br,r3_br

cjfields merged commit f3dd315 into main Jun 8, 2026
5 checks passed
cjfields deleted the bench/compare-reps branch June 8, 2026 22:48