Bootstrapping an EIP-7864 binary-trie "shadow" Ethereum node for mainnet, plus an equivalence daemon that checks binary↔MPT state-value equivalence per block. The shadow runs p2p-disabled, fed by a local mainnet node; it mirrors the same state in a binary tree (keyed off raw addresses/slots via BLAKE3) instead of the keccak-keyed MPT. State roots differ by design, so the check compares values, not roots.
Built on a fork of ethrex's binary-trie work plus a memory-bound bulk-migration pipeline.
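Because the two state roots differ by design, the per-block check has to compare values key by key. A minimal sketch of that idea, assuming both sides can be read as plain key → value mappings and that the set of keys touched by a block is known; `compare_block` and its arguments are hypothetical names for illustration, not the daemon's actual interface:

```python
def compare_block(touched_keys, mpt_values, binary_values):
    """Compare values for the keys touched in one block.

    touched_keys: iterable of raw keys (addresses or address+slot), hypothetical input.
    mpt_values / binary_values: dict-like views of the two states (assumed shape).
    Returns a list of (key, mpt_value, binary_value) for every disagreement.
    """
    mismatches = []
    for key in touched_keys:
        expected = mpt_values.get(key)   # value according to the MPT oracle
        actual = binary_values.get(key)  # value according to the binary trie
        if expected != actual:
            mismatches.append((key, expected, actual))
    return mismatches


if __name__ == "__main__":
    mpt = {b"a": 1, b"b": 2}
    binary = {b"a": 1, b"b": 5}
    # One mismatch is reported for key b"b"; roots are never compared.
    print(compare_block([b"a", b"b"], mpt, binary))
```

A key missing on one side shows up as a mismatch against `None`, which is how a skipped (preimage-less) entry would surface in this sketch.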
The full mainnet state has been migrated into a binary trie on a commodity (61 GB RAM) box:
Recorded migrated state at block 25,340,000
Binary trie state root: 0x7f29471437843deeb81ddeb09e1121d9c21f5f03fe737936366fd86dbc6715e5
Collection complete: 388,383,489 accounts, 1,563,779,711 storage slots, 3,666,719,467 entries
Known caveat under investigation: ~1.78% of accounts/slots lacked a preimage in our derived
set and were skipped (coverage gap, not a pipeline failure) — see planning/ and
docs/design-rationale.md. Next steps: launch the shadow node + equivalence daemon.
The MPT keys state by keccak256(address) / keccak256(slot) (one-way). EIP-7864 keys its
tree off the raw address/slot, so migrating state needs a complete set of keccak
preimages. Public snapshots don't provide one (reth stores state hashed → zero preimages;
geth's --cache.preimages is only ~77% complete on snap-synced nodes — see git history of
this README and planning/iteration_3/). The working recipe instead:
- values from a geth snapshot export (db export snapshot+code): authoritative current state at the snapshot block, and the MPT oracle the equivalence daemon checks against
- preimages from ethPandaOps Xatu canonical_execution_* parquet (plain addresses + slot keys recorded from block 0), keccak'd into a preimage file
- ethrex migrate joins them to bulk-build the binary trie
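The join in the recipe above can be sketched as follows. This is an illustration under stated assumptions, not the migrate implementation: the snapshot and the preimage file are modelled as in-memory dicts, all function names are hypothetical, and SHA-256 stands in for keccak256 because the Python standard library has no keccak.

```python
import hashlib


def h(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 is a stand-in for keccak256, for illustration only.
    return hashlib.sha256(data).digest()


def join_snapshot_with_preimages(snapshot, preimages):
    """Resolve hashed snapshot keys back to raw keys.

    snapshot:  dict hashed_key -> value   (what a hashed-state export provides)
    preimages: dict hashed_key -> raw_key (what the preimage file provides)
    Returns (resolved, skipped): raw_key -> value, plus hashed keys with no preimage.
    """
    resolved = {}
    skipped = []
    for hashed_key, value in snapshot.items():
        raw_key = preimages.get(hashed_key)
        if raw_key is None:
            skipped.append(hashed_key)  # coverage gap: entry cannot be re-keyed
        else:
            resolved[raw_key] = value
    return resolved, skipped


def build_demo():
    """Five toy accounts; the last one has no preimage on purpose."""
    addrs = [bytes([i]) * 20 for i in range(1, 6)]
    snapshot = {h(a): f"value-{i}".encode() for i, a in enumerate(addrs)}
    preimages = {h(a): a for a in addrs[:-1]}
    return addrs, snapshot, preimages


if __name__ == "__main__":
    _, snapshot, preimages = build_demo()
    resolved, skipped = join_snapshot_with_preimages(snapshot, preimages)
    print(f"resolved={len(resolved)} skipped={len(skipped)}")
```

Entries without a preimage are collected rather than treated as errors, which mirrors how the coverage gap is described: a skipped entry, not a pipeline failure.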
migrate resolves ~2 B leaves against a ~108 GB preimage file. Naively that's ~1 B random
mmap reads (the file can't fit in RAM) → days, and it crashed once. The fix doesn't touch
migrate: re-sort its input. The snapshot's storage is ordered by keccak(addr)+keccak(slot),
so keccak(slot) lookups scatter randomly. snapshot-resorter re-sorts storage by
keccak(slot) → lookups become monotonic → the mmap binary-search streams sequentially →
memory-bound, no extra RAM. Output is provably identical (migrate writes to a temporary
RocksDB column family sorted by tree key, so input order can't change the result). See
snapshot-resorter/.
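The effect of the re-sort can be shown on toy data. The sketch below is not the snapshot-resorter code: it models the preimage file as a sorted in-memory list, uses SHA-256 as a stand-in for keccak256 (an assumption; the standard library has no keccak), and all names are hypothetical. It records where each binary search lands, so the access pattern of both input orders can be compared.

```python
import bisect
import hashlib


def h(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 is a stand-in for keccak256, for illustration only.
    return hashlib.sha256(data).digest()


def build_preimage_table(raw_slots):
    """Parallel lists (hashes, raws) sorted by hash, like a sorted preimage file."""
    pairs = sorted((h(s), s) for s in set(raw_slots))
    return [p[0] for p in pairs], [p[1] for p in pairs]


def resolve(hashes, raws, entries):
    """Binary-search every (account_hash, slot_hash) entry.

    Returns the resolved (account_hash, raw_slot) pairs and the probe positions.
    """
    resolved, positions = [], []
    for account_hash, slot_hash in entries:
        pos = bisect.bisect_left(hashes, slot_hash)
        positions.append(pos)
        if pos < len(hashes) and hashes[pos] == slot_hash:
            resolved.append((account_hash, raws[pos]))
    return resolved, positions


def is_monotonic(positions):
    return all(a <= b for a, b in zip(positions, positions[1:]))


def demo():
    """Four toy accounts with 16 distinct slots each."""
    slots = [i.to_bytes(32, "big") for i in range(64)]
    hashes, raws = build_preimage_table(slots)
    accounts = [h(bytes([a]) * 20) for a in range(4)]
    entries = [
        (acct, h(s))
        for i, acct in enumerate(accounts)
        for s in slots[i * 16:(i + 1) * 16]
    ]
    snapshot_order = sorted(entries)                # hash(addr) then hash(slot)
    resorted = sorted(entries, key=lambda e: e[1])  # hash(slot) only
    return hashes, raws, snapshot_order, resorted


if __name__ == "__main__":
    hashes, raws, snapshot_order, resorted = demo()
    res_a, pos_a = resolve(hashes, raws, snapshot_order)
    res_b, pos_b = resolve(hashes, raws, resorted)
    print("snapshot order monotonic:", is_monotonic(pos_a))
    print("resorted order monotonic:", is_monotonic(pos_b))
    print("same resolved set:", sorted(res_a) == sorted(res_b))
```

In snapshot order the probe position jumps backwards at each account boundary; after the re-sort it only moves forward, which is what lets a file-backed lookup stream through the preimage file. Both orders resolve the same set of entries.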
0 toolchain + builds (ethrex fork, patched geth, duckdb)
1 geth export -> snapshot.rlp + code.rlp
2 Xatu distinct -> distinct storage slots + account addresses (xatu-preimages/)
3 keccak -> preimages.rlp (preimage-builder/)
3.5 snapshot-resort -> snapshot-sorted.rlp (snapshot-resorter/)
4 migrate -> binary trie (RocksDB datadir)
5 launch binary-node (p2p off, fed by mainnet node)
6 feeder + equivalence daemon + Grafana
Full step-by-step: docs/replication.md. Design rationale + rejected approaches:
docs/design-rationale.md. Chronological research/execution journal: planning/.
- snapshot-resorter/: Rust; the memory-bound resort (the key enabler). Tested, and the gethdbdump round-trip is validated.
- preimage-builder/: Rust; reads distinct addr/slot parquet → keccak256 → hash-partitioned, globally-sorted preimages.rlp in gethdbdump format.
- xatu-preimages/: download (download_xatu.py) + bounded-memory distinct extraction (build_addrs_distinct.sh, build_slots_distinct.sh) over ~1 TB of Xatu parquet.
- reth-state-extractor/: shelved; the tool that proved reth stores no preimages (historical).
- docs/, planning/: canonical docs and the journal.
- *.log, gethdump/**/export.log: raw run evidence.
Large data artifacts (snapshots, exports, preimage files, the migrated DB, raw Xatu) are git-ignored — regenerate via the pipeline.
- ethrex (binary-trie node + migrate): lambdaclass/ethrex with our patch at commit b0fe293 (--at-block flag + migration). To be pushed to a public fork / upstreamed.
- patched geth (adds db export code): edg-l/go-ethereum branch feat/export-code (geth v1.17.2 base).