feat: stream WAL verification at constant memory + signature policy by EulBite · Pull Request #11 · EulBite/spine

EulBite · 2026-06-01T07:05:43Z

What

Two verifier-side changes that let an offline auditor verify a very large WAL (months of a bank's log, terabytes on a disk) without running out of memory or time.

Streaming verification (constant memory)

spine-cli verify now streams the WAL one line at a time. Peak memory is flat regardless of size: it holds one line buffer plus the running chain state, never the WAL itself.

The lenient core is refactored into a public incremental LenientVerifier (new / process_line / finish). The buffered byte API (verify_wal_bytes*, used by the wasm playground and the cross-language vectors) is unchanged and drives the same state machine, so the streaming and buffered paths cannot diverge. Strict (--strict) is capped and stays buffered.
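A minimal sketch of the new / process_line / finish shape described above, assuming nothing about the real `LenientVerifier` signatures. `ToyVerifier`, `ToyReport`, `verify_stream` and `verify_bytes` are hypothetical names, and the chain link uses `DefaultHasher` as a non-cryptographic stand-in for the real hash and signature checks. It only illustrates why a streaming driver and a buffered driver that feed the same state machine produce the same result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::{BufRead, Cursor};

/// Hypothetical stand-in for an incremental verifier (not the real spine type).
pub struct ToyVerifier {
    prev_link: u64, // running chain state: the only thing kept between lines
    records: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ToyReport {
    pub records: u64,
    pub root: u64,
}

impl ToyVerifier {
    pub fn new() -> Self {
        Self { prev_link: 0, records: 0 }
    }

    /// Fold one WAL line into the running chain state.
    pub fn process_line(&mut self, line: &str) {
        let mut h = DefaultHasher::new(); // stand-in, NOT a cryptographic hash
        self.prev_link.hash(&mut h);
        line.hash(&mut h);
        self.prev_link = h.finish();
        self.records += 1;
    }

    pub fn finish(self) -> ToyReport {
        ToyReport { records: self.records, root: self.prev_link }
    }
}

/// Streaming driver: one reusable line buffer plus the verifier state.
pub fn verify_stream<R: BufRead>(mut reader: R) -> std::io::Result<ToyReport> {
    let mut verifier = ToyVerifier::new();
    let mut buf = String::new();
    loop {
        buf.clear(); // reuse the same allocation for every line
        if reader.read_line(&mut buf)? == 0 {
            break; // end of input
        }
        let line = buf.trim_end_matches(|c| c == '\n' || c == '\r');
        if !line.is_empty() {
            verifier.process_line(line);
        }
    }
    Ok(verifier.finish())
}

/// Buffered driver: same state machine, fed from an in-memory byte slice.
pub fn verify_bytes(bytes: &[u8]) -> std::io::Result<ToyReport> {
    verify_stream(Cursor::new(bytes))
}
```

In this sketch, calling `verify_stream` on a `BufReader` over a file and `verify_bytes` on the same file's contents would return equal reports, because both go through `process_line`.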

Signature policy

Per-record Ed25519 verification is the dominant cost on a large WAL (about 93% of full-verify time on the benchmark). Three policies:

- default: verify every signature (unchanged behaviour)
- --chain-only: verify chain, sequence, timestamps and root only (about 9x faster); retains tamper-evidence only with an authenticated --expected-root; incompatible with --trusted-pubkey / --keystore
- --sample-signatures N: verify one record in every N (routine spot-check, explicitly not a defense against a targeted forger)

Reduced policies never weaken chain tamper-evidence: chain, sequence, timestamp, hash-format and root checks always run in full. A new signatures_skipped counter and explicit coverage warnings keep a reduced run from being mistaken for a full verify.
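The three policies and the skipped-signature counter can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `SigPolicy`, `Coverage`, `coverage` and `coverage_warning` are hypothetical names, and which record in each group of N gets sampled (here, zero-based indices divisible by N) is a guess. The sketch uses plain `%`; the follow-up commit on this PR uses `u64::is_multiple_of` for the real sampling check.

```rust
/// Hypothetical policy type mirroring the three CLI modes.
#[derive(Clone, Copy, Debug)]
pub enum SigPolicy {
    All,         // default: verify every signature
    ChainOnly,   // skip all signatures
    Sample(u64), // verify one record in every N
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Coverage {
    pub signatures_verified: u64,
    pub signatures_skipped: u64,
}

impl SigPolicy {
    /// Whether the record at zero-based index `seq` gets its signature checked.
    /// Assumption: sampling picks indices divisible by N.
    pub fn checks_signature(self, seq: u64) -> bool {
        match self {
            SigPolicy::All => true,
            SigPolicy::ChainOnly => false,
            // N == 0 is treated as "sample nothing" to avoid dividing by zero.
            SigPolicy::Sample(n) => n != 0 && seq % n == 0,
        }
    }
}

/// Count verified and skipped signatures over `records` records.
pub fn coverage(policy: SigPolicy, records: u64) -> Coverage {
    let mut c = Coverage::default();
    for seq in 0..records {
        // Chain, sequence, timestamp and root checks would run here for
        // every record, whatever the policy.
        if policy.checks_signature(seq) {
            c.signatures_verified += 1;
        } else {
            c.signatures_skipped += 1;
        }
    }
    c
}

/// A reduced run reports its partial coverage instead of looking like a full verify.
pub fn coverage_warning(c: &Coverage) -> Option<String> {
    if c.signatures_skipped == 0 {
        return None;
    }
    Some(format!(
        "partial signature coverage: {} verified, {} skipped",
        c.signatures_verified, c.signatures_skipped
    ))
}
```

With a fixed sampling pattern like this one, a forger who knows the pattern can place forged signatures on unsampled records, which is why sampling is described as a spot-check only.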

Measured (release CLI, single core, 709.5 MB / 1M-record WAL)

| Run | Wall time | Peak RAM | Throughput |
| --- | --- | --- | --- |
| `verify` (full) | 35.4 s | 4.4 MB | 28,254 ev/s |
| `verify --chain-only` | 4.0 s | 4.4 MB | 249,875 ev/s |
| `verify --sample-signatures 1000` | 4.1 s | 4.5 MB | 246,609 ev/s |

Peak RAM stays at about 4.4 MB and does not grow with the WAL: streaming removes the memory ceiling that buffering imposed (~1.2 GB on this input).

Docs

docs/verifying-at-scale.md records the measured numbers, the threat-model trade-offs of each policy, a designed-but-not-built parallel signature path, and the server-side proof-based model (signed checkpoints, Merkle transparency log with inclusion/consistency proofs) that makes the largest scenarios sub-linear. The proof-emitting side belongs to the production server and is out of scope for this repository.

Tests

- Core: streaming-equals-buffered parity, chain-only still detects a broken link, sampling coverage and its honest partial-coverage cost.
- CLI: smoke tests for --chain-only / --sample-signatures, flag-conflict usage errors, and multi-segment streaming (segment boundaries must not merge records).
- Full workspace tests, clippy -D warnings, fmt --check, and a wasm32-unknown-unknown build are green.
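The segment-boundary concern can be illustrated with a small sketch. It assumes segments are newline-delimited byte streams and that a segment may end without a trailing newline; both are assumptions for illustration, and `records_per_segment` and `records_naive_concat` are hypothetical helpers, not functions from this repository.

```rust
use std::io::{BufRead, Cursor};

/// Read records segment by segment, so the end of one segment always ends
/// the current record, even when the trailing newline is missing.
pub fn records_per_segment(segments: &[&[u8]]) -> std::io::Result<Vec<String>> {
    let mut out = Vec::new();
    for seg in segments {
        for line in Cursor::new(*seg).lines() {
            let line = line?;
            if !line.is_empty() {
                out.push(line);
            }
        }
    }
    Ok(out)
}

/// Naive variant: concatenate raw bytes first, then split on newlines.
/// A segment without a trailing newline merges into the next segment's first record.
pub fn records_naive_concat(segments: &[&[u8]]) -> std::io::Result<Vec<String>> {
    let mut joined = Vec::new();
    for seg in segments {
        joined.extend_from_slice(seg);
    }
    let mut out = Vec::new();
    for line in Cursor::new(joined).lines() {
        let line = line?;
        if !line.is_empty() {
            out.push(line);
        }
    }
    Ok(out)
}
```

Given the segments `"r1\nr2"` and `"r3\n"`, the per-segment reader yields three records, while the naive concatenation yields `r1` and a merged `r2r3`.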

Make `spine-cli verify` stream the WAL one line at a time so peak memory is flat regardless of size (4.4 MB on a 709 MB, 1M-record WAL, instead of buffering the whole file). The lenient core is refactored into a public incremental LenientVerifier (new/process_line/finish); the buffered byte API used by the wasm playground and the cross-language vectors is unchanged and drives the same state machine, so the streaming and buffered paths cannot diverge.

Add a signature policy so an auditor can trade coverage for speed on a large WAL, where per-record Ed25519 verification is about 93% of the cost:

- default: verify every signature (unchanged behaviour)
- --chain-only: verify chain, sequence, timestamps and root only (about 9x faster); requires an authenticated --expected-root for tamper-evidence; incompatible with --trusted-pubkey/--keystore
- --sample-signatures N: verify one record in every N (routine spot-check, not a defense against a targeted forger)

Reduced policies never weaken chain tamper-evidence: chain, sequence, timestamp, hash-format and root checks always run in full. A new signatures_skipped counter plus explicit coverage warnings keep a reduced run from being mistaken for a full verify.

docs/verifying-at-scale.md records the measured numbers, the threat-model trade-offs of each policy, and the server-side proof-based model (signed checkpoints, Merkle transparency log) for the largest scenarios, which is out of scope for this repository.

Tests: streaming-equals-buffered parity, chain-only still detects a broken link, sampling coverage and its honest partial-coverage cost; CLI smoke tests for the new flags and for multi-segment streaming. Full workspace tests, clippy -D warnings, fmt, and a wasm32 build are green.

EulBite added 2 commits June 1, 2026 09:04

fix: use u64::is_multiple_of for sampling check (clippy manual_is_mul…

7a23100

…tiple_of)

EulBite merged commit 8c41220 into main Jun 1, 2026
3 checks passed

EulBite deleted the streaming-verify-signature-policy branch June 1, 2026 07:16
