perf(l1): bal optimistic merkleization (or post-state root optimization)#6655
Conversation
|
Lines of code reportTotal lines added: Detailed view |
cea052f to
815f446
Compare
Benchmark Block Execution Results Comparison Against Main
|
01db258 to
43ae7c5
Compare
🤖 Codex Code ReviewFindings
No other consensus/gas-accounting problems stood out in the touched logic. I did not run the test suite; this is a static review of the diff. Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewNow I have a complete picture of the PR. Let me write the review. PR Review:
|
Greptile SummaryThis PR decouples BAL-merkleization from EVM execution on the
Confidence Score: 4/5Safe to merge for the validation path; correctness is gated by the existing post-execution BAL hash check and the state-root comparison against the block header, so an incorrect optimistic trie results in block rejection rather than silent state corruption. The redesign is mechanically sound: per-field optionals avoid zero-fabrication for unchanged account fields, EIP-161 normalization in Stage C correctly handles account deletion without needing an explicit removed flag, and generate_witness_from_account_updates only reads address plus added_storage.keys() so the stripped optimistic_witness is complete for its purpose. Minor structural issues remain: BalSynthesisItem.address is stored but never read after construction, BalStateWorkItem.removed is an always-false dead branch, and the parallel pre-warm opens one state_trie handle per account rather than per thread. The pre-warm block in handle_merkleization_bal_from_updates (blockchain.rs) and the BalSynthesisItem struct definition (block_access_list.rs) deserve a second look.
|
| Filename | Overview |
|---|---|
| crates/blockchain/blockchain.rs | Core change: pre-scope BAL synthesis, optional channel, handle_merkleization_bal_from_updates with parallel pre-warm; BalStateWorkItem.removed is now always false (dead branch) and timing array expanded from 6 to 7 instants in the pipeline result. |
| crates/common/types/block_access_list.rs | New BalSynthesisItem struct and synthesize_bal_updates function with 13 unit tests; BalSynthesisItem.address field is redundant (duplicates the map key). |
| crates/vm/backends/levm/mod.rs | merkleizer: Sender to Option<Sender>; BAL path skips bal_to_account_updates + send; sequential path uses expect() to enforce the invariant that non-BAL callers always provide a Sender. |
| crates/vm/backends/mod.rs | One-line signature change: merkleizer: Sender to Option<Sender> to match LEVM. |
| crates/common/types/mod.rs | Re-exports BalSynthesisItem and synthesize_bal_updates from block_access_list. |
Sequence Diagram
sequenceDiagram
participant Main as Main Thread
participant Exec as EVM Exec Thread
participant Merkle as Merkleizer Thread
participant Warm as Warmer Thread
Note over Main: exec_merkle_start captured
Main->>Main: synthesize_bal_updates(bal)
Main->>Main: build optimistic_witness
Note over Main: thread::scope entered
par Parallel threads
Main->>Exec: "spawn merkleizer=None on BAL path"
Main->>Merkle: "spawn prepared=optimistic_updates"
Main->>Warm: spawn
end
Note over Merkle: merkle_start_instant captured
Merkle->>Merkle: par_iter pre-warm open_state_trie per account
Merkle->>Merkle: Stage B parallel storage root computation
Merkle->>Merkle: Stage C per-shard state trie updates EIP-161
Merkle->>Merkle: Stage D finalize root state_trie_hash
Note over Merkle: merkle_end_instant captured
Exec->>Exec: execute_block_parallel skip bal_to_account_updates
Exec->>Exec: validate_block_access_list_hash
Note over Exec: exec_end_instant captured
Main->>Main: accumulated_updates optimistic_witness OR streaming_witness
Main->>Main: verify state_trie_hash matches block header
Note over Main: exec_merkle_end_instant captured
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
crates/common/types/block_access_list.rs:1600-1608
The `address` field duplicates the map key in `FxHashMap<Address, BalSynthesisItem>`. In `handle_merkleization_bal_from_updates` the key `addr` is always used instead of `item.address`, and in `optimistic_witness` `*addr` is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.
```suggestion
#[derive(Debug, Clone, Default)]
pub struct BalSynthesisItem {
pub balance: Option<U256>,
pub nonce: Option<u64>,
pub code_hash: Option<H256>,
pub code: Option<Code>,
pub added_storage: FxHashMap<H256, U256>,
}
```
### Issue 2 of 3
crates/blockchain/blockchain.rs:319-326
On the BAL synthesis path `removed` is hardcoded to `false` and `handle_merkleization_bal_from_updates` never sets it. The `if item.removed { account_state = AccountState::default(); }` branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.
```suggestion
struct BalStateWorkItem {
hashed_address: H256,
nonce: Option<u64>,
balance: Option<U256>,
code_hash: Option<H256>,
/// Pre-computed storage root from Stage B, or None to keep existing.
storage_root: Option<H256>,
```
### Issue 3 of 3
crates/blockchain/blockchain.rs:921-932
**Pre-warm opens one `state_trie` handle per account**
The rayon closure calls `open_state_trie(parent_state_root)` for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If `open_state_trie` acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that `open_state_trie` is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.
Reviews (1): Last reviewed commit: "perf(l1): BAL optimistic merkleization o..." | Re-trigger Greptile
| #[derive(Debug, Clone, Default)] | ||
| pub struct BalSynthesisItem { | ||
| pub address: Address, | ||
| pub balance: Option<U256>, | ||
| pub nonce: Option<u64>, | ||
| pub code_hash: Option<H256>, | ||
| pub code: Option<Code>, | ||
| pub added_storage: FxHashMap<H256, U256>, | ||
| } |
There was a problem hiding this comment.
The
address field duplicates the map key in FxHashMap<Address, BalSynthesisItem>. In handle_merkleization_bal_from_updates the key addr is always used instead of item.address, and in optimistic_witness *addr is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.
| #[derive(Debug, Clone, Default)] | |
| pub struct BalSynthesisItem { | |
| pub address: Address, | |
| pub balance: Option<U256>, | |
| pub nonce: Option<u64>, | |
| pub code_hash: Option<H256>, | |
| pub code: Option<Code>, | |
| pub added_storage: FxHashMap<H256, U256>, | |
| } | |
| #[derive(Debug, Clone, Default)] | |
| pub struct BalSynthesisItem { | |
| pub balance: Option<U256>, | |
| pub nonce: Option<u64>, | |
| pub code_hash: Option<H256>, | |
| pub code: Option<Code>, | |
| pub added_storage: FxHashMap<H256, U256>, | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/common/types/block_access_list.rs
Line: 1600-1608
Comment:
The `address` field duplicates the map key in `FxHashMap<Address, BalSynthesisItem>`. In `handle_merkleization_bal_from_updates` the key `addr` is always used instead of `item.address`, and in `optimistic_witness` `*addr` is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.
```suggestion
#[derive(Debug, Clone, Default)]
pub struct BalSynthesisItem {
pub balance: Option<U256>,
pub nonce: Option<u64>,
pub code_hash: Option<H256>,
pub code: Option<Code>,
pub added_storage: FxHashMap<H256, U256>,
}
```
How can I resolve this? If you propose a fix, please make it concise.| struct BalStateWorkItem { | ||
| hashed_address: H256, | ||
| info: Option<AccountInfo>, | ||
| nonce: Option<u64>, | ||
| balance: Option<U256>, | ||
| code_hash: Option<H256>, | ||
| removed: bool, | ||
| /// Pre-computed storage root from Stage B, or None to keep existing. | ||
| storage_root: Option<H256>, |
There was a problem hiding this comment.
On the BAL synthesis path
removed is hardcoded to false and handle_merkleization_bal_from_updates never sets it. The if item.removed { account_state = AccountState::default(); } branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.
| struct BalStateWorkItem { | |
| hashed_address: H256, | |
| info: Option<AccountInfo>, | |
| nonce: Option<u64>, | |
| balance: Option<U256>, | |
| code_hash: Option<H256>, | |
| removed: bool, | |
| /// Pre-computed storage root from Stage B, or None to keep existing. | |
| storage_root: Option<H256>, | |
| struct BalStateWorkItem { | |
| hashed_address: H256, | |
| nonce: Option<u64>, | |
| balance: Option<U256>, | |
| code_hash: Option<H256>, | |
| /// Pre-computed storage root from Stage B, or None to keep existing. | |
| storage_root: Option<H256>, |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 319-326
Comment:
On the BAL synthesis path `removed` is hardcoded to `false` and `handle_merkleization_bal_from_updates` never sets it. The `if item.removed { account_state = AccountState::default(); }` branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.
```suggestion
struct BalStateWorkItem {
hashed_address: H256,
nonce: Option<u64>,
balance: Option<U256>,
code_hash: Option<H256>,
/// Pre-computed storage root from Stage B, or None to keep existing.
storage_root: Option<H256>,
```
How can I resolve this? If you propose a fix, please make it concise.| // Warm parent state-trie pages for all touched accounts in parallel before | ||
| // Stage B / Stage C race for them. This replaces the prefetch that the old | ||
| // streaming path got for free via `bal_to_account_updates`. | ||
| accounts | ||
| .par_iter() | ||
| .try_for_each(|(hashed_address, _)| -> Result<(), StoreError> { | ||
| let state_trie = self.storage.open_state_trie(parent_state_root)?; | ||
| let _ = state_trie.get(hashed_address.as_bytes())?; | ||
| Ok(()) | ||
| })?; | ||
|
|
||
| // === Stage B: Parallel per-account storage root computation === |
There was a problem hiding this comment.
Pre-warm opens one
state_trie handle per account
The rayon closure calls open_state_trie(parent_state_root) for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If open_state_trie acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that open_state_trie is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 921-932
Comment:
**Pre-warm opens one `state_trie` handle per account**
The rayon closure calls `open_state_trie(parent_state_root)` for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If `open_state_trie` acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that `open_state_trie` is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.
How can I resolve this? If you propose a fix, please make it concise.10b1aa7 to
61a510f
Compare
ElFantasma
left a comment
There was a problem hiding this comment.
Three inline findings + two body observations, all minor. Not blocking.
-
validate_block_access_list_hashis the only correctness net for the optimistic merkle output, and it's gated onproduced_bal.is_some()atblockchain.rs:569-576. On the BAL path this should always fire (Amsterdam+ produces a BAL), but the correctness story is now load-bearing on a runtime invariant rather than a static one. Worth adebug_assert!(produced_bal.is_some(), "BAL validation path must produce a BAL")on the optimistic branch to surface a future regression early. -
execute_block_pipeline'sif let Some(bal) = header_baldoes NOT checkis_amsterdam(levm/mod.rs:391, pre-existing). Thedebug_assert!(is_amsterdam, …)insideexecute_block_parallelis a no-op in release. This PR adds another caller (the new optimistic merkle path) that depends on parallel execution running only on Amsterdam+. Ifadd_block_pipelineis ever called withSome(bal)on a pre-Amsterdam block in release, the parallel path runs incorrectly AND the synthesized merkle is consumed unchecked. Recommend gating the outerif let Some(bal)onis_amsterdam && header_bal.is_some(), or upgrading the inner debug_assert to a realEvmError. Practical risk is low (header BAL is feature-gated input to engine_newPayload), but the cost-to-fix is tiny.
61a510f to
e7aaec8
Compare
|
Pushed fixes for review comments in e7aaec8:
|
ElFantasma
left a comment
There was a problem hiding this comment.
Follow-up after the fix series (e7aaec8 → aa5f7ad). All 3 inline findings from the previous review landed cleanly, and the produced_bal-gate body suggestion was correctly identified as wrong and reverted in 79cf5327c (the Amsterdam BAL validation path intentionally returns None).
On a fresh-eyes pass over the post-fix code, three small drift items remain — all minor, none blocking, mostly doc-vs-code cleanup that's the natural consequence of an iterative fix series.
Decouple merkleization from EVM execution when the validation path receives a BAL: synthesize per-field deltas from the input BlockAccessList pre-execution and run merkle stages B/C/D in parallel with execution + warming. validate_block_access_list_hash remains the post-execution correctness gate. Closes #6584.
- drop unused BalSynthesisItem.address (duplicate of map key) - drop unreachable BalStateWorkItem.removed branch on synthesis path - chunk parent state-trie pre-warm: one trie open per worker (mirrors Stage B/C) instead of one per account - drop conflicting debug_assert + expect on empty slot_changes; keep defensive skip and update test - replace merkleizer .expect panic with a typed EvmError on the sequential path - gate execute_block_pipeline's header_bal branch on is_amsterdam - hard-error (no debug_assert) when bal validation path produces no BAL to hash-check
The Amsterdam BAL validation path (execute_block_pipeline header_bal branch) intentionally returns produced_bal=None: it uses the header BAL directly to drive execution and does not rebuild a BAL to hash-compare. BAL correctness on that path is enforced inside execute_block_pipeline (header-BAL index/size/withdrawal-index checks plus unread_storage_reads / unaccessed_pure_accounts gates). The gate added in the review pass conflicted with that design and made every Amsterdam block routed through add_block_pipeline fail with "BAL validation path produced no BAL; cannot validate BAL hash" (caught by ef-tests Two-pass pass-2). The hash check just below the removed block already correctly no-ops when produced_bal is None.
handle_merkleization_bal_from_updates was opening the parent state trie and walking each touched account's path. That work is identical to what warm_block_from_bal's Phase 1 (prefetch_accounts) already does on the warmer thread (open_state_trie + state_trie.get(hashed_address) per address). The two paths run concurrently and share the global rayon thread pool, so the duplicate was both wasted I/O and a source of worker contention. Removing it shortens the merkleizer thread by ~0.2 ms/block on the bal fixture (-11% on merkle's concurrent slice) and lets the warmer get its workers back. Wall time unchanged on this fixture (exec is the gate), but the merkleizer now has clean headroom for its real work (Stage B/C/D).
Left over from the merkleizer-side state prefetch removal in 5052c58 (par_chunks call site went away).
61fd8fc to
30778a1
Compare
Summary
Decouples BAL-merkleization from EVM execution on the
engine_newPayloadvalidation path. Synthesizes per-field state deltas from the input BAL pre-thread::scopeand runs merkle stages B/C/D in parallel with execution + warming. BAL-driven state prefetch lives on the warmer thread (Phase 1 ofwarm_block_from_bal), not duplicated on the merkleizer.Closes #6584. Validation path only (builder still streams). Amsterdam+ only.
Design
BalSynthesisItemcarries per-field optionals (noOption<AccountInfo>blob), so a balance-only recipient doesn't fabricate zero-nonce /EMPTY_KECCACK_HASHdeltas.handle_merkleization_bal→handle_merkleization_bal_from_updates; Stage A (channel drain) deleted on the BAL path.execute_block_pipeline/execute_block_paralleltakeOption<Sender>. BAL caller passesNoneand the EVM skipsbal_to_account_updates+merkleizer.send. Outerif let Some(bal) = header_balis also gated onis_amsterdam.warm_block_from_bal's Phase 1 (prefetch_accounts) opens the parent state trie and walks each touched account's path; the merkleizer reuses those warmed pages and does not duplicate the walk.added_storagebefore per-slot inserts; trie walks node-arena paths in order.Correctness
removed/removed_storagenot inferred from BAL; Stage Bvalue.is_zero()+ Stage C EIP-161 collapse handle selfdestruct identically to streaming.execute_block_pipelineviavalidate_header_bal_indices,validate_bal_withdrawal_index, theunread_storage_reads/unaccessed_pure_accountspost-exec checks, andvalidate_block_access_list_size. Downstreamstate_rootvs header still applies. (Pre-Amsterdam / streaming paths still return a produced BAL and the existingvalidate_block_access_list_hashcheck runs over it.)Benchmark
Fixture:
bal-devnet-7-mainnet-mix-460(460 blocks, ~30 Ggas, transfer/EVM mix).release-with-debug,import-bench --with-bal. Baseline =feat/import-bench-bal-tooling(bench tooling only, no perf change).The fixture is exec-bound, so wall savings are bounded by
max(0, merkle_total - exec_window). On mainnet-shaped blocks merkleization is a much larger share of the budget; the structural win scales there.Compare dashboard: https://edgl.dev/share/compare-feat-import-bench-bal-tooling@02a5663c-vs-perf-bal-optimistic-merkleization@ebd60c81.html
Test plan
cargo test -p ethrex-common synthesize_tests(13/13)cargo fmt --check/make lintbalgroup