perf(l1): bal optimistic merkleization (or post-state root optimization) #6655

Merged
edg-l merged 6 commits into main from perf/bal-optimistic-merkleization on May 22, 2026
Conversation

@edg-l
Contributor

@edg-l edg-l commented May 14, 2026

Summary

Decouples BAL merkleization from EVM execution on the engine_newPayload validation path. It synthesizes per-field state deltas from the input BAL before entering thread::scope and runs merkle stages B/C/D in parallel with execution and warming. BAL-driven state prefetch lives on the warmer thread (Phase 1 of warm_block_from_bal) and is not duplicated on the merkleizer.

Closes #6584. Validation path only (builder still streams). Amsterdam+ only.

Design

  • BalSynthesisItem carries per-field optionals (no Option<AccountInfo> blob), so a balance-only recipient doesn't fabricate zero-nonce / EMPTY_KECCACK_HASH deltas.
  • handle_merkleization_bal → handle_merkleization_bal_from_updates; Stage A (channel drain) deleted on the BAL path.
  • EVM side: execute_block_pipeline / execute_block_parallel take Option<Sender>. BAL caller passes None and the EVM skips bal_to_account_updates + merkleizer.send. Outer if let Some(bal) = header_bal is also gated on is_amsterdam.
  • State prefetch: warm_block_from_bal's Phase 1 (prefetch_accounts) opens the parent state trie and walks each touched account's path; the merkleizer reuses those warmed pages and does not duplicate the walk.
  • Hashed-key sort on added_storage before per-slot inserts; trie walks node-arena paths in order.
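The per-field optionals from the first bullet can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `AccountState`, `FieldDelta` and `apply_delta` are hypothetical names, and `u128` / `[u8; 32]` stand in for the real `U256` / `H256` types. The point is only that a field the BAL did not touch stays at its pre-state value instead of being overwritten with a fabricated zero.

```rust
/// Hypothetical stand-in for an account's trie-level state.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AccountState {
    pub nonce: u64,
    pub balance: u128,       // stand-in for U256
    pub code_hash: [u8; 32], // stand-in for H256
}

/// Hypothetical per-field delta: `None` means the BAL did not touch the field.
#[derive(Debug, Clone, Default)]
pub struct FieldDelta {
    pub nonce: Option<u64>,
    pub balance: Option<u128>,
    pub code_hash: Option<[u8; 32]>,
}

/// Overlay only the fields the delta carries onto the pre-state.
pub fn apply_delta(pre: &AccountState, delta: &FieldDelta) -> AccountState {
    AccountState {
        nonce: delta.nonce.unwrap_or(pre.nonce),
        balance: delta.balance.unwrap_or(pre.balance),
        code_hash: delta.code_hash.unwrap_or(pre.code_hash),
    }
}
```

With a balance-only delta, `apply_delta` returns the pre-state nonce and code hash unchanged, which is the behaviour an all-or-nothing `Option<AccountInfo>` blob cannot express.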

Correctness

  • removed / removed_storage not inferred from BAL; Stage B value.is_zero() + Stage C EIP-161 collapse handle selfdestruct identically to streaming.
  • The Amsterdam path doesn't rebuild a BAL to hash-compare; the header BAL drives execution and is validated in execute_block_pipeline via validate_header_bal_indices, validate_bal_withdrawal_index, the unread_storage_reads / unaccessed_pure_accounts post-exec checks, and validate_block_access_list_size. Downstream state_root vs header still applies. (Pre-Amsterdam / streaming paths still return a produced BAL and the existing validate_block_access_list_hash check runs over it.)

Benchmark

Fixture: bal-devnet-7-mainnet-mix-460 (460 blocks, ~30 Ggas, transfer/EVM mix). release-with-debug, import-bench --with-bal. Baseline = feat/import-bench-bal-tooling (bench tooling only, no perf change).

| metric | baseline | this PR | delta |
|---|---|---|---|
| wall time | 8.575 s | 8.437 s | -138 ms (-1.6%) |
| agg Ggas/s | 3.901 | 3.991 | +2.3% |
| avg ms / block | 16.884 | 16.505 | -0.38 ms (-2.2%) |
| merkle drain | 0.479 ms | 0.047 ms | -90% |
| merkle concurrent | 15.571 ms | 1.587 ms | merkle finishes alongside exec |
| merkle overlap | 96.7% | 97.2% | +0.5 pp |
| exec | 15.574 ms | 15.650 ms | flat |
| store | 0.667 ms | 0.648 ms | flat |
| warmer | 1.365 ms | 1.877 ms | +0.5 ms (state prefetch on warmer thread) |

The fixture is exec-bound, so wall savings are bounded by max(0, merkle_total - exec_window). On mainnet-shaped blocks merkleization is a much larger share of the budget; the structural win scales there.
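As a worked form of that bound, here is a small sketch. The function name and the interpretation are assumptions: merkle work only costs wall time to the extent it outlasts the execution window, so hiding it behind execution can save at most that excess.

```rust
/// Upper bound on per-block wall-time savings, in milliseconds:
/// max(0, merkle_total - exec_window).
pub fn wall_saving_bound_ms(merkle_total_ms: f64, exec_window_ms: f64) -> f64 {
    (merkle_total_ms - exec_window_ms).max(0.0)
}
```

With illustrative (not measured) inputs, a block with 10 ms of merkle work and a 15 ms execution window has a bound of 0 ms, while 40 ms of merkle work against the same window leaves up to 25 ms to recover.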

Compare dashboard: https://edgl.dev/share/compare-feat-import-bench-bal-tooling@02a5663c-vs-perf-bal-optimistic-merkleization@ebd60c81.html

Test plan

  • cargo test -p ethrex-common synthesize_tests (13/13)
  • cargo fmt --check / make lint
  • EF blockchain tests (two-pass parallel runner exercises this path on every Amsterdam fixture)
  • Hive bal group

@github-actions github-actions Bot added the L1 (Ethereum client) and performance (Block execution throughput and performance in general) labels May 14, 2026
@github-actions

github-actions Bot commented May 14, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@edg-l edg-l moved this to In Progress in ethrex_l1 May 14, 2026
@edg-l edg-l changed the title from "perf(l1): BAL optimistic merkleization on validation path" to "perf(l1): bal optimistic merkleization on validation path" May 14, 2026
@github-actions

github-actions Bot commented May 14, 2026

Lines of code report

Total lines added: 261
Total lines removed: 0
Total lines changed: 261

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs          | 2551  | +5   |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/types/block_access_list.rs | 1409  | +246 |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/types/mod.rs               | 31    | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/backends/levm/mod.rs           | 2396  | +9   |
+-------------------------------------------------+-------+------+

@edg-l edg-l force-pushed the perf/bal-optimistic-merkleization branch 3 times, most recently from cea052f to 815f446 Compare May 14, 2026 12:02
@github-actions

github-actions Bot commented May 14, 2026

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 67.015 ± 0.525 | 66.401 | 67.980 | 1.01 ± 0.01 |
| head | 66.273 ± 0.188 | 66.047 | 66.702 | 1.00 |

@edg-l edg-l force-pushed the perf/bal-optimistic-merkleization branch 5 times, most recently from 01db258 to 43ae7c5 Compare May 14, 2026 14:43
@edg-l edg-l marked this pull request as ready for review May 15, 2026 14:52
@edg-l edg-l requested a review from a team as a code owner May 15, 2026 14:52
@ethrex-project-sync ethrex-project-sync Bot moved this from In Progress to In Review in ethrex_l1 May 15, 2026
@github-actions

🤖 Codex Code Review

Findings

  1. High: blockchain.rs:478 builds optimistic_witness with only address and added_storage. That vector is later fed into blockchain.rs:1943, but the replay logic in store.rs:1869 only mutates nonce/balance/code hash when update.info is present and only removes accounts when update.removed is set. On the BAL path, cached witnesses will therefore miss balance-only, nonce-only, code-only, account-creation, and removal transitions when --precompute-witnesses is enabled. I would keep the synthesized BAL map for merkleization, but build witness updates from the prestate-aware bal_to_account_updates path (or another full AccountUpdate builder), not from this storage-only skeleton.

  2. Low: blockchain.rs:924 uses par_iter() directly, which runs on Rayon’s global pool, while the existing streaming merkle path is explicitly bounded by blockchain.rs:699 self.merkle_pool. The new BAL prewarm can therefore spill extra work onto the global pool and oversubscribe CPU/DB readers on large BALs, which is the opposite of what you want on engine_newPayload. Running that prewarm inside self.merkle_pool.install(...) or folding it into the scoped workers would keep resource usage predictable.

No other consensus/gas-accounting problems stood out in the touched logic. I did not run the test suite; this is a static review of the diff.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Claude Code Review


PR Review: perf(l1): BAL optimistic merkleization on validation path

Overall: The design is sound and the implementation is careful. Decoupling synthesis from execution is a legitimate optimization, the invariants are well-maintained, and the 13 unit tests cover the critical edge cases. A few items below deserve attention before merging.


Correctness

EIP-161 normalization — correct. Stage C at blockchain.rs:1123-1130 already handles the removed: false hardcode safely:

// EIP-161: remove empty accounts (zero nonce, zero balance, empty code, empty storage)
if account_state != AccountState::default() {
    state_trie.insert(...)?;
} else {
    state_trie.remove(path)?;
}

A selfdestructed account (BAL records balance = 0, no nonce/code change) will be compared against AccountState::default() using pre-state nonce/code. If it was a fresh same-tx creation, the empty check removes it; if it had pre-existing nonce/code, the trie keeps the entry with balance=0. This matches the streaming path behavior as claimed.

Witness correctness — correct. generate_witness_from_account_updates at blockchain.rs:1579-1588 only reads account_update.address and account_update.added_storage.keys() — it does not consume info, code, or removed. The synthesized witness (address + added_storage) is sufficient and matches the actual usage.

Channel invariant — correct. The invariant optimistic_updates.is_some() ↔ rx_for_merkle.is_none() is established cleanly via bal.map(...) vs if bal.is_some(). Both expect() calls are logically justified.
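A minimal sketch of that invariant, using only `std::sync::mpsc`. `Update`, `PipelineInputs`, `build_inputs` and `emit` are hypothetical names chosen for illustration, not the PR's actual types: the BAL path carries pre-synthesized updates and no channel, while the streaming path carries a channel and no pre-synthesized updates.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a per-account update message.
pub type Update = u64;

/// Inputs handed to the workers; exactly one merkle source is populated.
pub struct PipelineInputs {
    pub optimistic_updates: Option<Vec<Update>>, // BAL path: synthesized up front
    pub tx_for_exec: Option<Sender<Update>>,     // streaming path: EVM sends here
    pub rx_for_merkle: Option<Receiver<Update>>, // streaming path: merkleizer drains here
}

pub fn build_inputs(bal: Option<Vec<Update>>) -> PipelineInputs {
    match bal {
        // BAL present: no channel is created at all.
        Some(updates) => PipelineInputs {
            optimistic_updates: Some(updates),
            tx_for_exec: None,
            rx_for_merkle: None,
        },
        // No BAL: fall back to streaming updates over a channel.
        None => {
            let (tx, rx) = channel();
            PipelineInputs {
                optimistic_updates: None,
                tx_for_exec: Some(tx),
                rx_for_merkle: Some(rx),
            }
        }
    }
}

/// EVM side: send only when a merkleizer channel was provided.
pub fn emit(merkleizer: Option<&Sender<Update>>, update: Update) {
    if let Some(tx) = merkleizer {
        // A closed receiver is ignored in this sketch.
        let _ = tx.send(update);
    }
}
```

Because both fields are derived from the same `match`, the two states cannot drift apart, which is what makes an `expect()` on either side defensible.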


Issues

1. duration_since may panic in the metrics path (blockchain.rs:2150-2153)

let merkle_start_delay_ms = merkle_start_instant
    .duration_since(exec_merkle_start)   // panics if exec_merkle_start > merkle_start_instant
    .as_secs_f64()
    * 1000.0;

Instant::duration_since panics on some platforms if the argument is later than self. Although thread scheduling virtually guarantees merkle_start_instant >= exec_merkle_start, a panic inside the logging function would crash a validator node over a metric. Compare with blockchain.rs:2093 which already uses saturating_duration_since for overlap:

.saturating_duration_since(exec_end_instant)

Use saturating_duration_since here as well for consistency and safety.
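For reference, a minimal sketch of the suggested form; the function name is hypothetical. Note that, per the standard library documentation, `Instant::duration_since` panicked in earlier Rust versions when its argument was later than `self`, currently saturates to zero, and may reintroduce the panic in some circumstances, so `saturating_duration_since` is the variant with a stable guarantee.

```rust
use std::time::Instant;

/// Delay from `exec_merkle_start` to `merkle_start`, in milliseconds.
/// Clamps to zero if the two instants are observed out of order.
pub fn merkle_start_delay_ms(merkle_start: Instant, exec_merkle_start: Instant) -> f64 {
    merkle_start
        .saturating_duration_since(exec_merkle_start)
        .as_secs_f64()
        * 1000.0
}
```

Swapping the arguments yields `0.0` rather than a panic, so a scheduling oddity can at worst produce a zero in the metrics.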

2. Pre-warm opens one state_trie per account (blockchain.rs, the new parallel pre-warm block)

accounts
    .par_iter()
    .try_for_each(|(hashed_address, _)| -> Result<(), StoreError> {
        let state_trie = self.storage.open_state_trie(parent_state_root)?;  // one per item
        let _ = state_trie.get(hashed_address.as_bytes())?;
        Ok(())
    })?;

For a block with, say, 400 touched accounts, this creates 400 trie handles and immediately discards them. Stage B opens exactly 16 (one per worker). If open_state_trie does anything heavier than wrapping a pointer — e.g. looking up the root node in RocksDB — this loop creates gratuitous work that may partially negate the warm-up benefit. Consider par_chunks or a per-thread trie via rayon's local storage to amortize the open across all accounts a given thread processes.
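One way to amortize the open, sketched with `std::thread::scope` and `slice::chunks` as a dependency-free stand-in for rayon's `par_chunks`. `TrieHandle`, its methods and `prewarm` are hypothetical; the real `open_state_trie` takes a state root and returns a `Result`, which this sketch omits.

```rust
use std::thread;

/// Hypothetical stand-in for a trie read handle.
pub struct TrieHandle;

impl TrieHandle {
    pub fn open() -> Self {
        TrieHandle
    }

    /// Pretend lookup; a real trie would walk (and thereby warm) the node path.
    pub fn get(&self, _hashed_address: &[u8; 32]) -> Option<Vec<u8>> {
        None
    }
}

/// Warm every account path with at most `workers` threads.
/// Returns how many trie handles were opened (one per chunk).
pub fn prewarm(accounts: &[[u8; 32]], workers: usize) -> usize {
    if accounts.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (accounts.len() + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = accounts
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    let trie = TrieHandle::open(); // one open per chunk, not per account
                    for hashed_address in chunk {
                        let _ = trie.get(hashed_address);
                    }
                    1usize
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("pre-warm worker panicked"))
            .sum::<usize>()
    })
}
```

For the 400-account example with 16 workers this opens 16 handles instead of 400, matching the per-worker pattern the review attributes to Stage B.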

3. Redundant address field in BalSynthesisItem (block_access_list.rs:1594)

pub struct BalSynthesisItem {
    pub address: Address,    // duplicated — always equals the FxHashMap key
    ...
}

The field is set during synthesis but never read from the struct; callers always use the map key addr. Removing it would avoid the duplication and make the struct's role clearer (it's a field-delta bag, not a self-describing entity). Not a bug, but worth cleaning up.


Minor notes

  • The debug_assert! + immediate continue guard for empty slot_changes is the right pattern; the test at synthesize_skips_when_slot_changes_empty correctly annotates the dual debug/release behaviour with #[cfg_attr(debug_assertions, should_panic(...))].

  • The greedy bin-packing weight comment (weight >= 1 invariant) is still accurate for the new synthesis path since removed / removed_storage are gone and weight is purely max(1, added_storage.len()).

  • The [Instant; 7] type alias in BlockExecutionPipelineResult vs [Instant; 8] in log_block_execution_pipeline_info (which appends stored_instant at the call site) is intentional and consistent.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR decouples BAL-merkleization from EVM execution on the engine_newPayload validation path by synthesizing per-field state deltas from the input BlockAccessList before thread::scope, then running merkle stages B/C/D in parallel with EVM execution. The correctness gate (validate_block_access_list_hash) remains post-execution; a hash mismatch discards the optimistic result via ? propagation.

  • BalSynthesisItem replaces the old Option<AccountInfo> blob with per-field optionals, preventing zero-fabrication for unchanged fields.
  • handle_merkleization_bal is replaced by handle_merkleization_bal_from_updates, eliminating Stage A (channel drain); the EVM skips bal_to_account_updates + send when merkleizer = None on the validation path.
  • Parallel state-trie pre-warm and hashed-key-sorted storage inserts are added to improve cache locality for Stage B/C.
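The hashed-key sort in the last bullet can be sketched as follows. This is an assumption-laden illustration: `sorted_by_hashed_key` is a hypothetical helper, `u128` stands in for `U256`, a plain `HashMap` stands in for `FxHashMap`, and `hash_key` is a caller-supplied placeholder for the real key hash (Keccak-256 in Ethereum tries).

```rust
use std::collections::HashMap;

/// Hash each storage key, then order the writes by hashed key so that
/// subsequent trie inserts visit neighbouring paths back to back.
pub fn sorted_by_hashed_key<F>(
    added_storage: &HashMap<[u8; 32], u128>,
    hash_key: F,
) -> Vec<([u8; 32], u128)>
where
    F: Fn(&[u8; 32]) -> [u8; 32],
{
    let mut slots: Vec<([u8; 32], u128)> = added_storage
        .iter()
        .map(|(key, value)| (hash_key(key), *value))
        .collect();
    // Arrays compare lexicographically, which matches trie path order.
    slots.sort_unstable_by_key(|(hashed, _)| *hashed);
    slots
}
```

Iterating a hash map yields keys in arbitrary order, so without the sort consecutive inserts would tend to jump between unrelated trie paths.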

Confidence Score: 4/5

Safe to merge for the validation path; correctness is gated by the existing post-execution BAL hash check and the state-root comparison against the block header, so an incorrect optimistic trie results in block rejection rather than silent state corruption.

The redesign is mechanically sound: per-field optionals avoid zero-fabrication for unchanged account fields, EIP-161 normalization in Stage C correctly handles account deletion without needing an explicit removed flag, and generate_witness_from_account_updates only reads address plus added_storage.keys() so the stripped optimistic_witness is complete for its purpose. Minor structural issues remain: BalSynthesisItem.address is stored but never read after construction, BalStateWorkItem.removed is an always-false dead branch, and the parallel pre-warm opens one state_trie handle per account rather than per thread.

The pre-warm block in handle_merkleization_bal_from_updates (blockchain.rs) and the BalSynthesisItem struct definition (block_access_list.rs) deserve a second look.

Important Files Changed

Filename Overview
crates/blockchain/blockchain.rs Core change: pre-scope BAL synthesis, optional channel, handle_merkleization_bal_from_updates with parallel pre-warm; BalStateWorkItem.removed is now always false (dead branch) and timing array expanded from 6 to 7 instants in the pipeline result.
crates/common/types/block_access_list.rs New BalSynthesisItem struct and synthesize_bal_updates function with 13 unit tests; BalSynthesisItem.address field is redundant (duplicates the map key).
crates/vm/backends/levm/mod.rs merkleizer: Sender to Option<Sender>; BAL path skips bal_to_account_updates + send; sequential path uses expect() to enforce the invariant that non-BAL callers always provide a Sender.
crates/vm/backends/mod.rs One-line signature change: merkleizer: Sender to Option<Sender> to match LEVM.
crates/common/types/mod.rs Re-exports BalSynthesisItem and synthesize_bal_updates from block_access_list.

Sequence Diagram

sequenceDiagram
    participant Main as Main Thread
    participant Exec as EVM Exec Thread
    participant Merkle as Merkleizer Thread
    participant Warm as Warmer Thread

    Note over Main: exec_merkle_start captured
    Main->>Main: synthesize_bal_updates(bal)
    Main->>Main: build optimistic_witness
    Note over Main: thread::scope entered

    par Parallel threads
        Main->>Exec: "spawn merkleizer=None on BAL path"
        Main->>Merkle: "spawn prepared=optimistic_updates"
        Main->>Warm: spawn
    end

    Note over Merkle: merkle_start_instant captured
    Merkle->>Merkle: par_iter pre-warm open_state_trie per account
    Merkle->>Merkle: Stage B parallel storage root computation
    Merkle->>Merkle: Stage C per-shard state trie updates EIP-161
    Merkle->>Merkle: Stage D finalize root state_trie_hash
    Note over Merkle: merkle_end_instant captured

    Exec->>Exec: execute_block_parallel skip bal_to_account_updates
    Exec->>Exec: validate_block_access_list_hash
    Note over Exec: exec_end_instant captured

    Main->>Main: accumulated_updates optimistic_witness OR streaming_witness
    Main->>Main: verify state_trie_hash matches block header
    Note over Main: exec_merkle_end_instant captured

Reviews (1): Last reviewed commit: "perf(l1): BAL optimistic merkleization o..." | Re-trigger Greptile

Comment on lines +1600 to +1608
#[derive(Debug, Clone, Default)]
pub struct BalSynthesisItem {
    pub address: Address,
    pub balance: Option<U256>,
    pub nonce: Option<u64>,
    pub code_hash: Option<H256>,
    pub code: Option<Code>,
    pub added_storage: FxHashMap<H256, U256>,
}

P2 The address field duplicates the map key in FxHashMap<Address, BalSynthesisItem>. In handle_merkleization_bal_from_updates the key addr is always used instead of item.address, and in optimistic_witness *addr is used too — so this field is written but never read. Removing it saves 20 bytes per account in the synthesized map.

Suggested change
  #[derive(Debug, Clone, Default)]
  pub struct BalSynthesisItem {
-     pub address: Address,
      pub balance: Option<U256>,
      pub nonce: Option<u64>,
      pub code_hash: Option<H256>,
      pub code: Option<Code>,
      pub added_storage: FxHashMap<H256, U256>,
  }

Comment on lines 319 to 326
struct BalStateWorkItem {
    hashed_address: H256,
    info: Option<AccountInfo>,
    nonce: Option<u64>,
    balance: Option<U256>,
    code_hash: Option<H256>,
    removed: bool,
    /// Pre-computed storage root from Stage B, or None to keep existing.
    storage_root: Option<H256>,

P2 On the BAL synthesis path removed is hardcoded to false and handle_merkleization_bal_from_updates never sets it. The if item.removed { account_state = AccountState::default(); } branch in the Stage C worker is therefore unreachable on this path. Dropping the field from the struct would make this invariant visible and avoid confusion for future readers.

Suggested change
  struct BalStateWorkItem {
      hashed_address: H256,
-     info: Option<AccountInfo>,
      nonce: Option<u64>,
      balance: Option<U256>,
      code_hash: Option<H256>,
-     removed: bool,
      /// Pre-computed storage root from Stage B, or None to keep existing.
      storage_root: Option<H256>,

Comment thread crates/blockchain/blockchain.rs Outdated
Comment on lines 921 to 932
// Warm parent state-trie pages for all touched accounts in parallel before
// Stage B / Stage C race for them. This replaces the prefetch that the old
// streaming path got for free via `bal_to_account_updates`.
accounts
    .par_iter()
    .try_for_each(|(hashed_address, _)| -> Result<(), StoreError> {
        let state_trie = self.storage.open_state_trie(parent_state_root)?;
        let _ = state_trie.get(hashed_address.as_bytes())?;
        Ok(())
    })?;

// === Stage B: Parallel per-account storage root computation ===

P2 Pre-warm opens one state_trie handle per account

The rayon closure calls open_state_trie(parent_state_root) for each of the N accounts, creating N independent trie handles before Stage B opens another 16 (one per worker). If open_state_trie acquires any internal lock, allocates a per-handle arena, or does RocksDB snapshot work, N parallel opens could serialise or spike allocation far beyond what Stage B already requires. Worth verifying that open_state_trie is a lightweight read-view wrapper, or considering a single trie opened per rayon thread rather than per item.


Base automatically changed from bal-devnet-7-pr to main May 18, 2026 10:17
@edg-l edg-l requested a review from a team as a code owner May 18, 2026 10:17
@edg-l edg-l force-pushed the perf/bal-optimistic-merkleization branch from 10b1aa7 to 61a510f Compare May 18, 2026 11:34
Contributor

@ElFantasma ElFantasma left a comment

Three inline findings + two body observations, all minor. Not blocking.

  • validate_block_access_list_hash is the only correctness net for the optimistic merkle output, and it's gated on produced_bal.is_some() at blockchain.rs:569-576. On the BAL path this should always fire (Amsterdam+ produces a BAL), but the correctness story is now load-bearing on a runtime invariant rather than a static one. Worth a debug_assert!(produced_bal.is_some(), "BAL validation path must produce a BAL") on the optimistic branch to surface a future regression early.

  • execute_block_pipeline's if let Some(bal) = header_bal does NOT check is_amsterdam (levm/mod.rs:391, pre-existing). The debug_assert!(is_amsterdam, …) inside execute_block_parallel is a no-op in release. This PR adds another caller (the new optimistic merkle path) that depends on parallel execution running only on Amsterdam+. If add_block_pipeline is ever called with Some(bal) on a pre-Amsterdam block in release, the parallel path runs incorrectly AND the synthesized merkle is consumed unchecked. Recommend gating the outer if let Some(bal) on is_amsterdam && header_bal.is_some(), or upgrading the inner debug_assert to a real EvmError. Practical risk is low (header BAL is feature-gated input to engine_newPayload), but the cost-to-fix is tiny.
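The second observation can be sketched as a single decision point that behaves the same in debug and release builds. `PipelineError`, `ExecPath` and `choose_path` are hypothetical names, not the crate's real `EvmError` API; the sketch shows the "real error" variant of the recommendation.

```rust
/// Hypothetical error type standing in for the crate's own EVM error enum.
#[derive(Debug, PartialEq, Eq)]
pub enum PipelineError {
    BalBeforeAmsterdam,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExecPath {
    ParallelFromBal,
    Sequential,
}

/// Pick the execution path; a header BAL is only honoured on Amsterdam+ blocks.
pub fn choose_path(is_amsterdam: bool, has_header_bal: bool) -> Result<ExecPath, PipelineError> {
    match (is_amsterdam, has_header_bal) {
        (true, true) => Ok(ExecPath::ParallelFromBal),
        // Unlike debug_assert!, this check is still present in release builds.
        (false, true) => Err(PipelineError::BalBeforeAmsterdam),
        (_, false) => Ok(ExecPath::Sequential),
    }
}
```

The alternative the review mentions, gating the outer `if let` on `is_amsterdam`, would instead map the `(false, true)` case to `ExecPath::Sequential`; either way the parallel path is unreachable before Amsterdam.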

Comment thread crates/blockchain/blockchain.rs Outdated
Comment thread crates/common/types/block_access_list.rs Outdated
Comment thread crates/vm/backends/levm/mod.rs Outdated
@edg-l edg-l force-pushed the perf/bal-optimistic-merkleization branch from 61a510f to e7aaec8 Compare May 19, 2026 07:14
@edg-l
Contributor Author

edg-l commented May 19, 2026

Pushed fixes for review comments in e7aaec8:

  • dropped unused BalSynthesisItem.address
  • dropped dead BalStateWorkItem.removed branch
  • pre-warm now chunks accounts across workers (one trie open per worker, matching Stage B/C) instead of one open per account
  • removed the conflicting debug_assert! + .expect() on empty slot_changes; defensive continue stays, test updated
  • replaced the sequential-path merkleizer.expect(...) with a typed EvmError
  • gated execute_block_pipeline's header_bal branch on is_amsterdam
  • when the BAL validation path produces no BAL, return a ChainError instead of skipping the hash check

@edg-l edg-l requested a review from ElFantasma May 19, 2026 07:20
@edg-l edg-l changed the title from "perf(l1): bal optimistic merkleization on validation path" to "perf(l1): bal optimistic merkleization + state prefetch on validation path" May 19, 2026
@edg-l edg-l changed the title from "perf(l1): bal optimistic merkleization + state prefetch on validation path" to "perf(l1): bal optimistic merkleization" May 19, 2026
@edg-l edg-l changed the title from "perf(l1): bal optimistic merkleization" to "perf(l1): bal optimistic merkleization + state prefetch on validation path" May 19, 2026
@edg-l edg-l changed the title from "perf(l1): bal optimistic merkleization + state prefetch on validation path" to "perf(l1): bal optimistic merkleization" May 19, 2026
@edg-l edg-l changed the title from "perf(l1): bal optimistic merkleization" to "perf(l1): bal optimistic merkleization (or post-state root optimization)" May 19, 2026
Contributor

@ElFantasma ElFantasma left a comment

Follow-up after the fix series (e7aaec8 → aa5f7ad). All 3 inline findings from the previous review landed cleanly, and the produced_bal-gate body suggestion was correctly identified as wrong and reverted in 79cf5327c (the Amsterdam BAL validation path intentionally returns None).

On a fresh-eyes pass over the post-fix code, three small drift items remain — all minor, none blocking, mostly doc-vs-code cleanup that's the natural consequence of an iterative fix series.

Comment thread crates/blockchain/blockchain.rs
Comment thread CHANGELOG.md
Comment thread crates/blockchain/blockchain.rs
Edgar and others added 6 commits May 22, 2026 15:10

Decouple merkleization from EVM execution when the validation path
receives a BAL: synthesize per-field deltas from the input
BlockAccessList pre-execution and run merkle stages B/C/D in parallel
with execution + warming. validate_block_access_list_hash remains the
post-execution correctness gate.

Closes #6584.

- drop unused BalSynthesisItem.address (duplicate of map key)
- drop unreachable BalStateWorkItem.removed branch on synthesis path
- chunk parent state-trie pre-warm: one trie open per worker (mirrors Stage B/C) instead of one per account
- drop conflicting debug_assert + expect on empty slot_changes; keep defensive skip and update test
- replace merkleizer .expect panic with a typed EvmError on the sequential path
- gate execute_block_pipeline's header_bal branch on is_amsterdam
- hard-error (no debug_assert) when bal validation path produces no BAL to hash-check

The Amsterdam BAL validation path (execute_block_pipeline header_bal
branch) intentionally returns produced_bal=None: it uses the header BAL
directly to drive execution and does not rebuild a BAL to hash-compare.
BAL correctness on that path is enforced inside execute_block_pipeline
(header-BAL index/size/withdrawal-index checks plus
unread_storage_reads / unaccessed_pure_accounts gates).

The gate added in the review pass conflicted with that design and made
every Amsterdam block routed through add_block_pipeline fail with
"BAL validation path produced no BAL; cannot validate BAL hash" (caught
by ef-tests Two-pass pass-2). The hash check just below the removed
block already correctly no-ops when produced_bal is None.

handle_merkleization_bal_from_updates was opening the parent state trie
and walking each touched account's path. That work is identical to what
warm_block_from_bal's Phase 1 (prefetch_accounts) already does on the
warmer thread (open_state_trie + state_trie.get(hashed_address) per
address). The two paths run concurrently and share the global rayon
thread pool, so the duplicate was both wasted I/O and a source of
worker contention.

Removing it shortens the merkleizer thread by ~0.2 ms/block on the bal
fixture (-11% on merkle's concurrent slice) and lets the warmer get
its workers back. Wall time unchanged on this fixture (exec is the
gate), but the merkleizer now has clean headroom for its real work
(Stage B/C/D).

Left over from the merkleizer-side state prefetch removal in
5052c58 (par_chunks call site went away).
@edg-l edg-l force-pushed the perf/bal-optimistic-merkleization branch from 61fd8fc to 30778a1 Compare May 22, 2026 13:12
@edg-l edg-l enabled auto-merge May 22, 2026 13:22
@edg-l edg-l added this pull request to the merge queue May 22, 2026
Merged via the queue into main with commit b0fcc77 May 22, 2026
73 of 75 checks passed
@edg-l edg-l deleted the perf/bal-optimistic-merkleization branch May 22, 2026 14:44
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 May 22, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in ethrex_performance May 22, 2026

Labels

L1 (Ethereum client), performance (Block execution throughput and performance in general)

Projects

Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

BAL Optimistic merkleization

4 participants