feat: stream WAL verification at constant memory + signature policy #11
Merged
Make `spine-cli verify` stream the WAL one line at a time so peak memory
is flat regardless of size (4.4 MB on a 709 MB, 1M-record WAL, instead of
buffering the whole file). The lenient core is refactored into a public
incremental LenientVerifier (new/process_line/finish); the buffered byte
API used by the wasm playground and the cross-language vectors is
unchanged and drives the same state machine, so the streaming and
buffered paths cannot diverge.
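Here is a minimal sketch of the incremental shape described above. Several parts are assumptions and do not come from the crate: the record format (`<seq> <payload>` per line), the `DefaultHasher`-based running digest (a non-cryptographic stand-in for the real chain hash), and the names `Report`, `verify_reader` and `verify_bytes`. Only `LenientVerifier` and its `new`/`process_line`/`finish` methods are named in this PR, and the signatures shown here are guesses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::{self, BufRead};

/// Hypothetical summary returned by `finish`.
#[derive(Debug, PartialEq)]
pub struct Report {
    pub records: u64,
    pub errors: Vec<String>,
    pub root: u64,
}

/// Incremental verifier: only O(1) state is carried between lines, never the WAL.
#[derive(Default)]
pub struct LenientVerifier {
    next_seq: u64,
    chain: u64, // running digest; toy stand-in for the real chain hash
    errors: Vec<String>,
}

impl LenientVerifier {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feed one record (a line without its terminator).
    /// Lenient: a bad record is collected as an error instead of aborting.
    pub fn process_line(&mut self, line: &str) {
        let seq = line.split(' ').next().and_then(|s| s.parse::<u64>().ok());
        if seq != Some(self.next_seq) {
            self.errors
                .push(format!("expected seq {}, found {:?}", self.next_seq, seq));
        }
        // Fold the line into the chain: chain' = H(chain, line).
        let mut h = DefaultHasher::new();
        self.chain.hash(&mut h);
        line.hash(&mut h);
        self.chain = h.finish();
        self.next_seq += 1;
    }

    pub fn finish(self) -> Report {
        Report { records: self.next_seq, errors: self.errors, root: self.chain }
    }
}

/// Streaming driver: one reused line buffer, so memory is bounded by the longest line.
pub fn verify_reader<R: BufRead>(mut reader: R) -> io::Result<Report> {
    let mut v = LenientVerifier::new();
    let mut buf = String::new();
    loop {
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF
        }
        // Strip "\n" or "\r\n".
        let line = match buf.strip_suffix('\n') {
            Some(l) => l.strip_suffix('\r').unwrap_or(l),
            None => buf.as_str(),
        };
        v.process_line(line);
    }
    Ok(v.finish())
}

/// Buffered entry point: `&[u8]` is itself a `BufRead`, so it drives the same loop.
pub fn verify_bytes(wal: &[u8]) -> io::Result<Report> {
    verify_reader(wal)
}
```

Because `&[u8]` implements `std::io::BufRead`, the buffered entry point in this sketch hands its slice straight to the streaming driver. That is one way to make it structurally impossible for the two paths to diverge.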
Add a signature policy so an auditor can trade coverage for speed on a
large WAL, where per-record Ed25519 verification is about 93% of the
cost:
- default: verify every signature (unchanged behaviour)
- --chain-only: verify chain, sequence, timestamps and root only
(about 9x faster); requires an authenticated --expected-root for
tamper-evidence; incompatible with --trusted-pubkey/--keystore
- --sample-signatures N: verify one record in every N (routine
spot-check, not a defense against a targeted forger)
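The sketch below models the three policies and the flag rules above. `SignaturePolicy`, `should_verify` and `resolve_policy` are hypothetical names, and the fixed stride (`index % n == 0`) is an assumed sampling scheme. Two rejections are also assumptions rather than documented behaviour: `--chain-only` combined with `--sample-signatures`, and `N = 0`. Running `--chain-only` without `--expected-root` produces a warning rather than an error, because the text says tamper-evidence depends on that flag but does not say the CLI refuses to run without it.

```rust
/// Hypothetical mirror of the three CLI policies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignaturePolicy {
    /// Default: verify every Ed25519 signature.
    Full,
    /// `--chain-only`: skip all signature checks.
    ChainOnly,
    /// `--sample-signatures N`: verify one record in every N (N >= 1).
    Sample(u64),
}

impl SignaturePolicy {
    /// Should the signature of the record at 0-based `index` be checked?
    pub fn should_verify(self, index: u64) -> bool {
        match self {
            SignaturePolicy::Full => true,
            SignaturePolicy::ChainOnly => false,
            // Assumed fixed stride; n >= 1 is guaranteed by `resolve_policy`.
            SignaturePolicy::Sample(n) => index % n == 0,
        }
    }
}

/// Map flags to a policy plus coverage warnings, or a usage error.
pub fn resolve_policy(
    chain_only: bool,
    sample_every: Option<u64>,
    has_trusted_keys: bool,
    has_expected_root: bool,
) -> Result<(SignaturePolicy, Vec<String>), String> {
    let mut warnings = Vec::new();
    let policy = match (chain_only, sample_every) {
        (true, Some(_)) => {
            return Err("--chain-only conflicts with --sample-signatures".into())
        }
        (true, None) => {
            if has_trusted_keys {
                return Err("--chain-only conflicts with --trusted-pubkey/--keystore".into());
            }
            if !has_expected_root {
                warnings.push(
                    "--chain-only without an authenticated --expected-root is not tamper-evident"
                        .to_string(),
                );
            }
            SignaturePolicy::ChainOnly
        }
        (false, Some(0)) => return Err("--sample-signatures N requires N >= 1".into()),
        (false, Some(n)) => SignaturePolicy::Sample(n),
        (false, None) => SignaturePolicy::Full,
    };
    Ok((policy, warnings))
}
```

A fixed stride also shows why sampling is only a spot-check. A forger who knows N can alter only unsampled records. Even with random sampling, a single forged record is caught with probability 1/N.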
Reduced policies never weaken chain tamper-evidence: chain, sequence,
timestamp, hash-format and root checks always run in full. A new
signatures_skipped counter plus explicit coverage warnings keep a
reduced run from being mistaken for a full verify.
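The guarantee that reduced policies never touch the chain checks can be built into the code's structure. The sketch below uses hypothetical names throughout: `Record`, `Summary` and `verify` are not the crate's API, `sig_ok` stands in for an Ed25519 result, and a `stride` of `None` means chain-only. The link check runs first and independently of the signature decision. Every skipped signature increments `signatures_skipped`, which lets the summary emit a coverage warning.

```rust
/// Hypothetical record: a hash link plus a stand-in for the signature result.
pub struct Record {
    pub prev_hash: u64,
    pub hash: u64,
    pub sig_ok: bool, // stand-in for "Ed25519 signature verifies"
}

#[derive(Debug, Default, PartialEq)]
pub struct Summary {
    pub records: u64,
    pub chain_breaks: u64,
    pub signatures_verified: u64,
    pub signatures_failed: u64,
    pub signatures_skipped: u64,
}

impl Summary {
    /// Any skipped signature makes the run explicitly partial.
    pub fn coverage_warning(&self) -> Option<String> {
        (self.signatures_skipped > 0).then(|| {
            format!(
                "reduced coverage: {} of {} signatures skipped; not a full verify",
                self.signatures_skipped, self.records
            )
        })
    }
}

/// `stride`: None = chain-only; Some(n) = check one signature in every n (Some(1) = full).
pub fn verify(records: &[Record], stride: Option<u64>) -> Summary {
    let mut s = Summary::default();
    let mut prev = 0u64; // assumed genesis link value
    for (i, r) in records.iter().enumerate() {
        // Chain check: unconditional, independent of the signature policy.
        if r.prev_hash != prev {
            s.chain_breaks += 1;
        }
        prev = r.hash;
        s.records += 1;

        // Signature check: the only step a policy is allowed to reduce.
        let check = match stride {
            None => false,
            Some(n) => (i as u64) % n.max(1) == 0,
        };
        if !check {
            s.signatures_skipped += 1;
        } else if r.sig_ok {
            s.signatures_verified += 1;
        } else {
            s.signatures_failed += 1;
        }
    }
    s
}
```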
docs/verifying-at-scale.md records the measured numbers, the threat-model
trade-offs of each policy, and the server-side proof-based model (signed
checkpoints, Merkle transparency log) for the largest scenarios, which is
out of scope for this repository.
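The server-side model is out of scope here, but a small sketch shows why it is sub-linear. It builds an RFC 6962-style Merkle tree and applies the RFC 9162 inclusion-proof check: proving that one record belongs to a log of n records takes at most ceil(log2 n) node hashes against a root. Some simplifications apply. The toy `DefaultHasher` hash is NOT cryptographic and stands in for SHA-256. Signed checkpoints and consistency proofs are not shown. All function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = [u8; 8];

/// Toy, NON-cryptographic stand-in for SHA-256, with an RFC 6962 prefix byte.
fn h(prefix: u8, parts: &[&[u8]]) -> Digest {
    let mut s = DefaultHasher::new();
    s.write_u8(prefix);
    for p in parts {
        s.write(p);
    }
    s.finish().to_be_bytes()
}

pub fn leaf(data: &[u8]) -> Digest {
    h(0x00, &[data]) // 0x00: leaf domain
}

fn node(left: &Digest, right: &Digest) -> Digest {
    h(0x01, &[&left[..], &right[..]]) // 0x01: interior-node domain
}

/// Largest power of two strictly less than `n` (requires n >= 2).
fn split(n: usize) -> usize {
    let mut k = 1;
    while k * 2 < n {
        k *= 2;
    }
    k
}

/// Merkle tree hash over all leaves.
pub fn root(leaves: &[&[u8]]) -> Digest {
    match leaves.len() {
        0 => DefaultHasher::new().finish().to_be_bytes(), // hash of the empty string
        1 => leaf(leaves[0]),
        n => {
            let k = split(n);
            node(&root(&leaves[..k]), &root(&leaves[k..]))
        }
    }
}

/// Audit path for leaf `m`, deepest sibling first.
pub fn path(m: usize, leaves: &[&[u8]]) -> Vec<Digest> {
    if leaves.len() <= 1 {
        return Vec::new();
    }
    let k = split(leaves.len());
    let (mut p, sibling) = if m < k {
        (path(m, &leaves[..k]), root(&leaves[k..]))
    } else {
        (path(m - k, &leaves[k..]), root(&leaves[..k]))
    };
    p.push(sibling);
    p
}

/// Inclusion check: at most ceil(log2(size)) node hashes.
pub fn verify_inclusion(index: usize, size: usize, leaf_hash: Digest, proof: &[Digest], root_hash: Digest) -> bool {
    if index >= size {
        return false;
    }
    let (mut fi, mut si) = (index, size - 1);
    let mut r = leaf_hash;
    for p in proof {
        if si == 0 {
            return false; // proof is longer than the tree is deep
        }
        if fi & 1 == 1 || fi == si {
            r = node(p, &r); // sibling on the left
            // Right-edge node without a right sibling: climb until it is a right child.
            while fi & 1 == 0 && fi != 0 {
                fi >>= 1;
                si >>= 1;
            }
        } else {
            r = node(&r, p); // sibling on the right
        }
        fi >>= 1;
        si >>= 1;
    }
    si == 0 && r == root_hash
}
```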
Tests: streaming-equals-buffered parity, chain-only still detects a
broken link, sampling coverage and its honest partial-coverage cost; CLI
smoke tests for the new flags and for multi-segment streaming. Full
workspace tests, clippy -D warnings, fmt, and a wasm32 build are green.
What
Two verifier-side changes that let an offline auditor verify a very large WAL (months of a bank's log, terabytes on a disk) without running out of memory or time.
Streaming verification (constant memory)
`spine-cli verify` now streams the WAL one line at a time. Peak memory is flat regardless of size: it holds one line buffer plus the running chain state, never the WAL itself. The lenient core is refactored into a public incremental `LenientVerifier` (`new`/`process_line`/`finish`). The buffered byte API (`verify_wal_bytes*`, used by the wasm playground and the cross-language vectors) is unchanged and drives the same state machine, so the streaming and buffered paths cannot diverge. Strict mode (`--strict`) is capped and stays buffered.
Signature policy
Per-record Ed25519 verification is the dominant cost on a large WAL (about 93% of full-verify time on the benchmark). Three policies:
- default: verify every signature (unchanged behaviour)
- `--chain-only`: verify chain, sequence, timestamps and root only (about 9x faster); retains tamper-evidence only with an authenticated `--expected-root`; incompatible with `--trusted-pubkey`/`--keystore`
- `--sample-signatures N`: verify one record in every N (routine spot-check, explicitly not a defense against a targeted forger)
Reduced policies never weaken chain tamper-evidence: chain, sequence, timestamp, hash-format and root checks always run in full. A new `signatures_skipped` counter and explicit coverage warnings keep a reduced run from being mistaken for a full verify.
Measured (release CLI, single core, 709.5 MB / 1M-record WAL)
Measured configurations: `verify` (full), `verify --chain-only`, and `verify --sample-signatures 1000`.
The 4.4 MB peak memory does not grow with the WAL: streaming removes the memory ceiling that buffering imposed (~1.2 GB on this input).
Docs
`docs/verifying-at-scale.md` records the measured numbers, the threat-model trade-offs of each policy, a designed-but-not-built parallel signature path, and the server-side proof-based model (signed checkpoints, Merkle transparency log with inclusion/consistency proofs) that makes the largest scenarios sub-linear. The proof-emitting side belongs to the production server and is out of scope for this repository.
Tests
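Streaming-equals-buffered parity, chain-only still detects a broken link, and sampling coverage with its honest partial-coverage cost. CLI smoke tests cover `--chain-only`/`--sample-signatures`, flag-conflict usage errors, and multi-segment streaming (segment boundaries must not merge records). Full workspace tests, `clippy -D warnings`, `fmt --check`, and a `wasm32-unknown-unknown` build are green.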
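The multi-segment rule can be sketched as follows, assuming each segment is exposed as its own `BufRead`. The function name `for_each_record` is hypothetical. Each segment gets its own line loop, so a segment whose final record lacks a trailing newline still ends that record at the segment's EOF. Concatenating the raw byte streams first (for example with `std::io::Read::chain`) would instead glue that record to the next segment's first line.

```rust
use std::io::{self, BufRead};

/// Stream several segments as one record sequence, reusing a single line buffer.
pub fn for_each_record<R, F>(segments: Vec<R>, mut on_record: F) -> io::Result<u64>
where
    R: BufRead,
    F: FnMut(&str),
{
    let mut buf = String::new(); // one reusable buffer across all segments
    let mut count = 0;
    for mut segment in segments {
        // A fresh line loop per segment: EOF always terminates the current record.
        loop {
            buf.clear();
            if segment.read_line(&mut buf)? == 0 {
                break;
            }
            on_record(buf.strip_suffix('\n').unwrap_or(buf.as_str()));
            count += 1;
        }
    }
    Ok(count)
}
```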