
feat: stream WAL verification at constant memory + signature policy#11

Merged
EulBite merged 2 commits into main from streaming-verify-signature-policy on Jun 1, 2026

EulBite (Owner) commented Jun 1, 2026

What

Two verifier-side changes that let an offline auditor verify a very large WAL (months of a bank's log, terabytes on a disk) without running out of memory or time.

Streaming verification (constant memory)

spine-cli verify now streams the WAL one line at a time. Peak memory is flat regardless of size: it holds one line buffer plus the running chain state, never the WAL itself.
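
A minimal sketch of the constant-memory pattern described above, using only the Rust standard library: a single String is reused for every line, so memory is bounded by the longest line plus whatever running state the callback keeps. for_each_line is a hypothetical helper for illustration, not the actual spine-cli code.

```rust
use std::io::{self, BufRead};

/// Streams `reader` one line at a time through a single reused buffer.
/// The whole input is never held in memory; only the current line is.
pub fn for_each_line<R: BufRead, F: FnMut(&str)>(mut reader: R, mut on_line: F) -> io::Result<usize> {
    let mut buf = String::new();
    let mut count = 0;
    loop {
        buf.clear(); // keep the allocation, discard the previous line
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF
        }
        // Hand the line on without its terminator ("\n" or "\r\n").
        on_line(buf.trim_end_matches(|c: char| c == '\n' || c == '\r'));
        count += 1;
    }
    Ok(count)
}
```

read_line returns an error on invalid UTF-8; a byte-oriented variant would call BufRead::read_until(b'\n', &mut Vec<u8>) with the same reuse pattern.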

The lenient core is refactored into a public incremental LenientVerifier (new / process_line / finish). The buffered byte API (verify_wal_bytes*, used by the wasm playground and the cross-language vectors) is unchanged and drives the same state machine, so the streaming and buffered paths cannot diverge. Strict mode (--strict) is capped and stays buffered.
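
The "cannot diverge" property follows from both entry points feeding one state machine. The sketch below illustrates that design under explicit assumptions: only the new / process_line / finish shape comes from this PR; the record format (seq|prev_hash_hex|payload), the names IncrementalVerifier, verify_str and verify_reader, and the DefaultHasher-based record hash (not cryptographic; a stand-in so the sketch needs only std) are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, BufRead};

/// Toy record hash. NOT cryptographic: a std-only stand-in for illustration.
fn record_hash(line: &str) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(line.as_bytes());
    h.finish()
}

/// Drop one trailing "\n", then one trailing "\r", identically on both paths.
fn strip_eol(s: &str) -> &str {
    let s = s.strip_suffix('\n').unwrap_or(s);
    s.strip_suffix('\r').unwrap_or(s)
}

#[derive(Debug, PartialEq)]
pub struct Report {
    pub records: u64,
    pub errors: Vec<String>,
}

/// Hypothetical incremental verifier: O(1) state carried between lines.
pub struct IncrementalVerifier {
    next_seq: u64,
    prev_hash: u64,
    report: Report,
}

impl IncrementalVerifier {
    pub fn new() -> Self {
        IncrementalVerifier { next_seq: 0, prev_hash: 0, report: Report { records: 0, errors: Vec::new() } }
    }

    /// Check one "seq|prev_hash_hex|payload" line. Lenient: record problems and continue.
    pub fn process_line(&mut self, line: &str) {
        let i = self.report.records;
        let mut parts = line.splitn(3, '|');
        let seq = parts.next().and_then(|s| s.parse::<u64>().ok());
        let prev = parts.next().and_then(|s| u64::from_str_radix(s, 16).ok());
        match (seq, prev, parts.next()) {
            (Some(seq), Some(prev), Some(_payload)) => {
                if seq != self.next_seq {
                    self.report.errors.push(format!("record {i}: expected seq {}, got {seq}", self.next_seq));
                }
                if prev != self.prev_hash {
                    self.report.errors.push(format!("record {i}: broken chain link"));
                }
                self.next_seq = seq.saturating_add(1);
            }
            _ => self.report.errors.push(format!("record {i}: malformed line")),
        }
        // The next record must link to this exact line.
        self.prev_hash = record_hash(line);
        self.report.records += 1;
    }

    pub fn finish(self) -> Report {
        self.report
    }
}

/// Buffered entry point: the same state machine over an in-memory string.
pub fn verify_str(data: &str) -> Report {
    let mut v = IncrementalVerifier::new();
    for raw in data.split_inclusive('\n') {
        v.process_line(strip_eol(raw));
    }
    v.finish()
}

/// Streaming entry point: one reused line buffer, the same state machine.
pub fn verify_reader<R: BufRead>(mut reader: R) -> io::Result<Report> {
    let mut v = IncrementalVerifier::new();
    let mut buf = String::new();
    loop {
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        v.process_line(strip_eol(&buf));
    }
    Ok(v.finish())
}
```

Because the two entry points differ only in where lines come from, and both split on "\n" and drop at most one trailing "\r", a parity test reduces to comparing their reports on the same input.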

Signature policy

Per-record Ed25519 verification is the dominant cost on a large WAL (about 93% of full-verify time on the benchmark). Three policies:

  • default: verify every signature (unchanged behaviour)
  • --chain-only: verify chain, sequence, timestamps and root only (about 9x faster); retains tamper-evidence only with an authenticated --expected-root; incompatible with --trusted-pubkey / --keystore
  • --sample-signatures N: verify one record in every N (routine spot-check, explicitly not a defense against a targeted forger)
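
One way to model the three policies and their flag conflicts, as a sketch. The enum, resolve_policy and the fixed stride (records 0, N, 2N, ...) are hypothetical. The PR states only that --chain-only conflicts with --trusted-pubkey / --keystore; treating --chain-only and --sample-signatures as mutually exclusive and rejecting N = 0 are assumptions added here.

```rust
/// Hypothetical model of the signature policies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignaturePolicy {
    /// Verify every signature (the default).
    Full,
    /// Skip all signatures; chain, sequence, timestamp and root checks still run.
    ChainOnly,
    /// Verify one record in every n (n >= 1, enforced by resolve_policy).
    Sample(u64),
}

impl SignaturePolicy {
    /// Whether the record at zero-based `index` gets its signature checked.
    pub fn should_verify(self, index: u64) -> bool {
        match self {
            SignaturePolicy::Full => true,
            SignaturePolicy::ChainOnly => false,
            SignaturePolicy::Sample(n) => index % n == 0,
        }
    }
}

/// Hypothetical flag resolution, returning a usage error on conflicts.
pub fn resolve_policy(
    chain_only: bool,
    sample: Option<u64>,
    trusted_pubkey: bool,
    keystore: bool,
) -> Result<SignaturePolicy, String> {
    match (chain_only, sample) {
        (true, Some(_)) => Err("--chain-only and --sample-signatures are mutually exclusive".into()),
        (true, None) if trusted_pubkey || keystore => {
            Err("--chain-only is incompatible with --trusted-pubkey / --keystore".into())
        }
        (true, None) => Ok(SignaturePolicy::ChainOnly),
        (false, Some(0)) => Err("--sample-signatures N requires N >= 1".into()),
        (false, Some(n)) => Ok(SignaturePolicy::Sample(n)),
        (false, None) => Ok(SignaturePolicy::Full),
    }
}
```

A predictable stride also illustrates why sampling is only a spot-check: a forger who knows which indices are checked can alter the others, and even a random offset would still leave most records unchecked.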

Reduced policies never weaken chain tamper-evidence: chain, sequence, timestamp, hash-format and root checks always run in full. A new signatures_skipped counter and explicit coverage warnings keep a reduced run from being mistaken for a full verify.
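
A sketch of how a signatures_skipped counter can drive the coverage warnings. Apart from signatures_skipped, the field names, the warning wording, and the extra warning for a run that checked no signatures and had no --expected-root are assumptions derived from the policy descriptions above, not the actual CLI output.

```rust
/// Hypothetical per-run coverage summary.
#[derive(Debug, Default)]
pub struct Coverage {
    pub records: u64,
    pub signatures_checked: u64,
    pub signatures_skipped: u64,
}

impl Coverage {
    /// Called once per record after the chain, sequence, timestamp and
    /// hash-format checks (which always run), with the policy's decision.
    pub fn record(&mut self, signature_checked: bool) {
        self.records += 1;
        if signature_checked {
            self.signatures_checked += 1;
        } else {
            self.signatures_skipped += 1;
        }
    }

    /// Warnings that keep a reduced run from reading like a full verify.
    pub fn warnings(&self, expected_root_given: bool) -> Vec<String> {
        let mut out = Vec::new();
        if self.signatures_skipped > 0 {
            out.push(format!(
                "signatures verified on {} of {} records ({} skipped): not a full verify",
                self.signatures_checked, self.records, self.signatures_skipped
            ));
        }
        if self.records > 0 && self.signatures_checked == 0 && !expected_root_given {
            // Without signatures or an authenticated root, a consistently
            // rewritten chain would still pass the chain checks.
            out.push("no signatures checked and no --expected-root: chain is not tamper-evident".to_string());
        }
        out
    }
}
```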

Measured (release CLI, single core, 709.5 MB / 1M-record WAL)

| Run | Wall time | Peak RAM | Throughput |
|---|---|---|---|
| verify (full) | 35.4 s | 4.4 MB | 28,254 ev/s |
| verify --chain-only | 4.0 s | 4.4 MB | 249,875 ev/s |
| verify --sample-signatures 1000 | 4.1 s | 4.5 MB | 246,609 ev/s |
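
A quick arithmetic check on the table, assuming exactly 1,000,000 records (the heading says 1M; the exact count is not given). Throughput is records divided by wall time, and each reported figure is consistent with a wall time rounded to 0.1 s. The full / chain-only ratio, 35.4 / 4.0 = 8.85, matches the "about 9x" figure.

```rust
/// Events per second for `records` processed in `wall_secs`.
pub fn throughput(records: f64, wall_secs: f64) -> f64 {
    records / wall_secs
}

/// True if `reported` ev/s fits a wall time that was rounded to one
/// decimal place, i.e. the true time lies within +/- 0.05 s.
pub fn consistent_with_rounding(records: f64, wall_rounded: f64, reported: f64) -> bool {
    let lo = throughput(records, wall_rounded + 0.05);
    let hi = throughput(records, wall_rounded - 0.05);
    (lo..=hi).contains(&reported)
}

/// Speed-up of a reduced policy relative to the full verify.
pub fn speedup(full_secs: f64, reduced_secs: f64) -> f64 {
    full_secs / reduced_secs
}
```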

The Peak RAM column (4.4 to 4.5 MB) does not grow with the WAL: streaming removes the memory ceiling that buffering imposed (~1.2 GB on this input).

Docs

docs/verifying-at-scale.md records the measured numbers, the threat-model trade-offs of each policy, a designed-but-not-built parallel signature path, and the server-side proof-based model (signed checkpoints, Merkle transparency log with inclusion/consistency proofs) that makes the largest scenarios sub-linear. The proof-emitting side belongs to the production server and is out of scope for this repository.
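
To show why the proof-based model is sub-linear, here is a minimal sketch of Merkle inclusion proofs in the RFC 6962 / RFC 9162 style: the verifier recomputes the root from one leaf and about log2(n) sibling hashes, regardless of how many records the log holds. This is not the production server's design (out of scope here); it uses DefaultHasher as a non-cryptographic stand-in for SHA-256, all function names are hypothetical, and in the full model the expected root would come from a signed checkpoint.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// NON-cryptographic stand-in for SHA-256, to keep the sketch std-only.
fn leaf_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0x00); // domain separation: leaf
    h.write(data);
    h.finish()
}

fn node_hash(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0x01); // domain separation: interior node
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Largest power of two strictly less than n (n >= 2).
fn split(n: usize) -> usize {
    let mut k = 1;
    while k * 2 < n {
        k *= 2;
    }
    k
}

/// Merkle tree hash of a non-empty slice of leaves (server side).
pub fn root(leaves: &[&[u8]]) -> u64 {
    if leaves.len() == 1 {
        return leaf_hash(leaves[0]);
    }
    let k = split(leaves.len());
    node_hash(root(&leaves[..k]), root(&leaves[k..]))
}

/// Audit path for leaf m < leaves.len(), ordered leaf-upward (server side).
pub fn inclusion_proof(m: usize, leaves: &[&[u8]]) -> Vec<u64> {
    if leaves.len() == 1 {
        return Vec::new();
    }
    let k = split(leaves.len());
    let (mut path, sibling) = if m < k {
        (inclusion_proof(m, &leaves[..k]), root(&leaves[k..]))
    } else {
        (inclusion_proof(m - k, &leaves[k..]), root(&leaves[..k]))
    };
    path.push(sibling);
    path
}

/// Auditor side: O(log n) hashes, following the RFC 9162 verification steps.
pub fn verify_inclusion(index: u64, size: u64, leaf: u64, proof: &[u64], expected_root: u64) -> bool {
    if index >= size {
        return false;
    }
    let (mut fnode, mut snode, mut r) = (index, size - 1, leaf);
    for &p in proof {
        if snode == 0 {
            return false;
        }
        if (fnode & 1) == 1 || fnode == snode {
            r = node_hash(p, r);
            // Skip levels where this node has no right sibling.
            while (fnode & 1) == 0 && fnode != 0 {
                fnode >>= 1;
                snode >>= 1;
            }
        } else {
            r = node_hash(r, p);
        }
        fnode >>= 1;
        snode >>= 1;
    }
    snode == 0 && r == expected_root
}
```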

Tests

  • Core: streaming-equals-buffered parity, chain-only still detects a broken link, sampling coverage and its honest partial-coverage cost.
  • CLI: smoke tests for --chain-only / --sample-signatures, flag-conflict usage errors, and multi-segment streaming (segment boundaries must not merge records).
  • Full workspace tests, clippy -D warnings, fmt --check, and a wasm32-unknown-unknown build are green.
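
The multi-segment test guards against a specific pitfall, sketched here under the assumption that a segment's last record may lack a trailing newline: concatenating segment byte streams before splitting lines glues that record onto the next segment's first line, while reading each segment's lines separately keeps them apart. The function names are hypothetical, and both collect into a Vec only to make the difference easy to assert; a streaming verifier would hand each line to the verifier instead.

```rust
use std::io::{self, BufRead, Read};

/// Correct: split lines within each segment, never across a boundary.
pub fn lines_per_segment<R: BufRead>(segments: Vec<R>) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    for seg in segments {
        for line in seg.lines() {
            out.push(line?);
        }
    }
    Ok(out)
}

/// Pitfall: chaining the raw byte streams first merges the boundary records.
pub fn lines_concatenated<R: Read>(a: R, b: R) -> io::Result<Vec<String>> {
    io::BufReader::new(a.chain(b)).lines().collect()
}
```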

EulBite added 2 commits June 1, 2026 09:04
Make `spine-cli verify` stream the WAL one line at a time so peak memory
is flat regardless of size (4.4 MB on a 709 MB, 1M-record WAL, instead of
buffering the whole file). The lenient core is refactored into a public
incremental LenientVerifier (new/process_line/finish); the buffered byte
API used by the wasm playground and the cross-language vectors is
unchanged and drives the same state machine, so the streaming and
buffered paths cannot diverge.

Add a signature policy so an auditor can trade coverage for speed on a
large WAL, where per-record Ed25519 verification is about 93% of the
cost:
  - default: verify every signature (unchanged behaviour)
  - --chain-only: verify chain, sequence, timestamps and root only
    (about 9x faster); requires an authenticated --expected-root for
    tamper-evidence; incompatible with --trusted-pubkey/--keystore
  - --sample-signatures N: verify one record in every N (routine
    spot-check, not a defense against a targeted forger)

Reduced policies never weaken chain tamper-evidence: chain, sequence,
timestamp, hash-format and root checks always run in full. A new
signatures_skipped counter plus explicit coverage warnings keep a
reduced run from being mistaken for a full verify.

docs/verifying-at-scale.md records the measured numbers, the threat-model
trade-offs of each policy, and the server-side proof-based model (signed
checkpoints, Merkle transparency log) for the largest scenarios, which is
out of scope for this repository.

Tests: streaming-equals-buffered parity, chain-only still detects a
broken link, sampling coverage and its honest partial-coverage cost; CLI
smoke tests for the new flags and for multi-segment streaming. Full
workspace tests, clippy -D warnings, fmt, and a wasm32 build are green.
EulBite merged commit 8c41220 into main Jun 1, 2026 (3 checks passed)
EulBite deleted the streaming-verify-signature-policy branch June 1, 2026 07:16