
feat(l1): reorg re-injection for mempool transactions and blob sidecars#6631

Open
ilitteri wants to merge 8 commits into main from feat/mempool-reorg-reinjection

Conversation

Collaborator

@ilitteri ilitteri commented May 12, 2026

Motivation

When the engine API's forkchoiceUpdated moves the canonical head to a different chain branch, transactions in the orphaned blocks (but not in the new canonical chain) currently disappear from the mempool. They're still valid on the new chain but the node loses them. Every other major EL client re-injects these. This is the biggest remaining correctness gap in ethrex's mempool.

Description

Reorg detection lives in the engine API FCU handler (crates/networking/rpc/engine/fork_choice.rs). The previous canonical head + finalized hash are snapshotted before apply_fork_choice. If the new head's parent is NOT the previous head, the FCU is a genuine reorg.

  • Fast-path: when new_head.parent_hash == previous_head_hash, the FCU is a normal slot advance, not a reorg. Skip the re-injection spawn (and its ~3 header-by-hash DB reads) entirely. Genuine reorgs fall through.
  • Spawned, not awaited: re-injection is tokio::spawn-ed off the FCU response path so the engine API ack returns immediately. A near-cap 64-block reorg with ~10K orphaned txs would otherwise add ~1+ second of ECDSA recovery + KZG re-verification to the FCU response (under the 8s spec timeout but tight enough to destabilize CL timing budgets).
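The two bullets above reduce to a cheap predicate checked before any spawn happens. A minimal sketch, with the illustrative name `is_potential_reorg` (not ethrex's actual API):

```rust
// Hypothetical sketch of the fast-path check; `is_potential_reorg` is an
// illustrative name, not the real ethrex function.
type BlockHash = [u8; 32];

/// A forkchoiceUpdated is only a *candidate* reorg when the new head does
/// not build directly on the previous canonical head. In the common case
/// (a normal slot advance) this returns false and nothing is spawned.
fn is_potential_reorg(new_head_parent: &BlockHash, previous_head: &BlockHash) -> bool {
    new_head_parent != previous_head
}

fn main() {
    let prev = [1u8; 32];
    let extends_prev = [1u8; 32]; // new head's parent == previous head: fast path
    let fork_parent = [9u8; 32];  // new head builds on a different branch
    assert!(!is_potential_reorg(&extends_prev, &prev));
    assert!(is_potential_reorg(&fork_parent, &prev));
}
```

Only when the predicate is true does the re-injection task get spawned off the response path.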

Re-injection goes through Blockchain::reinject_orphaned_transactions: for each orphaned tx not present in the new-canonical set, it calls the regular admission path. Failures are debug-logged and never propagated; the whole pass is best-effort. It honors a --mempool.reorg-depth cap (default 64): deeper reorgs are skipped with a warning. It also honors mempool occupancy: a deep reorg won't evict freshly-arrived txs to make room for orphaned ones.
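The admission loop can be sketched as follows; `TxHash`, the `admit` closure, and the set types are simplified stand-ins for the real mempool API, not ethrex's actual signatures:

```rust
use std::collections::HashSet;

// Illustrative stand-in for a transaction hash.
#[derive(Clone, PartialEq, Eq, Hash)]
struct TxHash(u64);

/// Re-inject orphaned txs that are absent from the new canonical branch.
/// A reorg deeper than `reorg_depth` is skipped entirely, and admission
/// failures are skipped per-tx (best-effort), never propagated.
fn reinject(
    orphaned: &[TxHash],
    new_canonical: &HashSet<TxHash>,
    depth: u64,
    reorg_depth: u64,
    mut admit: impl FnMut(&TxHash) -> Result<(), String>,
) -> u64 {
    if depth > reorg_depth {
        return 0; // logged as a warning in the real code
    }
    let mut reinjected = 0;
    for tx in orphaned {
        if new_canonical.contains(tx) {
            continue; // already included on the new branch
        }
        if admit(tx).is_ok() {
            reinjected += 1;
        } // failures are debug-logged and ignored in the real code
    }
    reinjected
}

fn main() {
    let canon: HashSet<TxHash> = [TxHash(2)].into_iter().collect();
    let orphaned = [TxHash(1), TxHash(2), TxHash(3)];
    // tx 2 is already canonical; tx 3 fails admission transiently.
    let n = reinject(&orphaned, &canon, 4, 64, |t| {
        if t.0 == 3 { Err("pool full".to_string()) } else { Ok(()) }
    });
    assert_eq!(n, 1);
    // A reorg deeper than the cap re-injects nothing.
    assert_eq!(reinject(&orphaned, &canon, 65, 64, |_| Ok(())), 0);
}
```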

Blob sidecar limbo:

  • On block inclusion, sidecars are moved (not cloned) from blobs_bundle_pool into blobs_bundle_limbo. The versioned-hash index cleanup runs separately so the bundle never leaves the write lock without a stable index state.
  • On reorg, orphaned blob txs pull their sidecars from limbo. If add_blob_transaction_to_pool fails for a transient reason (pool full, etc.), the sidecar is re-inserted into limbo via Mempool::insert_blob_limbo_entry rather than dropped permanently — a later re-injection attempt can still recover it.
  • On FCU finalized_block_hash advancement, limbo entries are purged by canonical-block-number range walk (no reorg_depth cap). The cap was a stale defense against malicious inputs, but the input here is a CL-supplied finalized hash, already canonical. The previous cap leaked sidecars whenever a finalization gap exceeded it (fresh-node first finalization, CL downtime catch-up).
  • CLI: --mempool.reorg-depth (default 64).
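The limbo lifecycle in the bullets above can be sketched with plain HashMaps standing in for MempoolInner; `Limbo`, `Bundle`, and the method names are illustrative, not the actual ethrex API:

```rust
use std::collections::HashMap;

type Hash = u64;      // stand-in for a tx hash
type Bundle = Vec<u8>; // stand-in for BlobsBundle

struct Limbo {
    pool: HashMap<Hash, Bundle>,
    limbo: HashMap<Hash, Bundle>,
}

impl Limbo {
    /// On block inclusion: *move* (not clone) the sidecar into limbo.
    fn on_included(&mut self, h: Hash) {
        if let Some(bundle) = self.pool.remove(&h) {
            self.limbo.insert(h, bundle);
        }
    }
    /// On reorg: take the sidecar back for re-injection.
    fn take(&mut self, h: Hash) -> Option<Bundle> {
        self.limbo.remove(&h)
    }
    /// On failed (transient) re-injection: put the sidecar back so a
    /// later attempt can still recover it.
    fn reinsert(&mut self, h: Hash, bundle: Bundle) {
        self.limbo.insert(h, bundle);
    }
    /// On finalization: purge sidecars for now-finalized blocks.
    fn purge(&mut self, finalized: &[Hash]) {
        for h in finalized {
            self.limbo.remove(h);
        }
    }
}

fn main() {
    let mut l = Limbo { pool: HashMap::from([(1, vec![0xAA])]), limbo: HashMap::new() };
    l.on_included(1);
    assert!(l.pool.is_empty() && l.limbo.contains_key(&1));
    let b = l.take(1).unwrap();
    l.reinsert(1, b); // simulated failed re-injection
    assert!(l.limbo.contains_key(&1));
    l.purge(&[1]);
    assert!(l.limbo.is_empty());
}
```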

Behavioral change

  • Txs orphaned by reorgs of depth ≤ reorg_depth are re-injected with full admission re-validation. Deeper reorgs skip re-injection (logged warning).
  • Blob sidecars are retained between inclusion and finalization so blob-tx re-injection works through reorgs.
  • The FCU response time is unaffected by re-injection cost (spawned).

Deferred

End-to-end test driving engine_forkchoiceUpdated through a real multi-branch reorg. The load-bearing logic (common-ancestor, orphan subtraction, depth cap, blob-limbo lifecycle, capacity gate, best-effort failure handling, idempotence) is covered by unit tests.

ilitteri added 3 commits May 12, 2026 13:00
Adds the blockchain-crate building blocks needed to re-inject orphaned-block
transactions into the mempool after a reorg:

- find_common_ancestor walks two heads back to their lowest common
  ancestor and returns the orphaned and new-canonical branches with the
  reorg depth.
- collect_orphaned_transactions returns the set of transactions present
  in orphaned blocks but absent from the new canonical branch, preserving
  intra-block order.
- Blockchain::reinject_orphaned_transactions runs the full admission
  path (add_transaction_to_pool / add_blob_transaction_to_pool) for
  each orphaned tx, logging and skipping failures rather than propagating
  them. Reorgs deeper than the configured cap are skipped with a warning.
- Blockchain::purge_finalized_blob_limbo clears limbo entries for
  blocks that have just been finalized.
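A hedged sketch of the ancestor walk, with an in-memory parent map standing in for header-by-hash store lookups and heights replacing block numbers (both branches are returned tip-first here for simplicity):

```rust
use std::collections::HashMap;

type Hash = u64;

/// Walk both heads back to their lowest common ancestor. Returns
/// (orphaned branch, new-canonical branch); the reorg depth is the
/// orphaned branch's length. The higher side steps first so both
/// walks meet at the same height.
fn find_common_ancestor(
    parents: &HashMap<Hash, (Hash, u64)>, // hash -> (parent_hash, height)
    mut old_head: Hash,
    mut new_head: Hash,
) -> (Vec<Hash>, Vec<Hash>) {
    let mut orphaned = Vec::new();
    let mut canonical = Vec::new();
    let height = |h: Hash| parents.get(&h).map(|p| p.1).unwrap_or(0);
    while old_head != new_head {
        if height(old_head) >= height(new_head) {
            orphaned.push(old_head);
            old_head = parents[&old_head].0;
        } else {
            canonical.push(new_head);
            new_head = parents[&new_head].0;
        }
    }
    (orphaned, canonical)
}

fn main() {
    // genesis = 1; old chain: 1 -> 10 -> 11; new chain: 1 -> 10 -> 20 -> 21
    let parents = HashMap::from([
        (10, (1, 1)),
        (11, (10, 2)),
        (20, (10, 2)),
        (21, (20, 3)),
    ]);
    let (orphaned, canonical) = find_common_ancestor(&parents, 11, 21);
    assert_eq!(orphaned, vec![11]);      // only the old tip is orphaned (depth 1)
    assert_eq!(canonical, vec![21, 20]); // new branch, tip-first
}
```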

Blob sidecars for EIP-4844 transactions are now retained after block
inclusion in a blobs_bundle_limbo map on MempoolInner. The existing
remove_block_transactions_from_pool path now moves sidecars into limbo
instead of dropping them, so they remain available for re-injection
until finalization purges the corresponding block.

A new DEFAULT_MEMPOOL_REORG_DEPTH = 64 constant and matching
BlockchainOptions::reorg_depth field expose the depth cap.

Hooks the new Blockchain helpers into the engine API forkchoiceUpdated
flow. The handler now:

- Snapshots the previous canonical head and previously-finalized hash
  before calling apply_fork_choice.
- After a successful FCU, if the new head differs from the previous head
  it calls Blockchain::reinject_orphaned_transactions to put orphaned
  transactions back in the mempool (with full admission re-validation).
- When finalization advances, it calls
  Blockchain::purge_finalized_blob_limbo to release blob sidecars for
  the now-finalized blocks (they can no longer be reorged).

Re-injection failures and limbo-purge failures are logged at debug level
and never propagated — both are best-effort book-keeping that must not
break fork-choice processing.

Exposes the reorg-depth cap as a CLI option:

- New --mempool.reorg-depth flag (env ETHREX_MEMPOOL_REORG_DEPTH,
  default DEFAULT_MEMPOOL_REORG_DEPTH = 64) wired through both L1 and
  L2 initializers.
- Documented in docs/CLI.md alongside --mempool.maxsize.

Adds unit tests covering:

- common-ancestor walk with same-height branches, unequal-height
  branches, and the degenerate same-block case;
- orphaned-tx subtraction: a transaction that appears in both the
  orphaned and new-canonical branches is not re-injected;
- depth cap: a reorg deeper than the configured cap is skipped;
- best-effort behaviour: an EIP-4844 tx with no sidecar in the limbo is
  silently skipped, without erroring out the wider re-injection call;
- blob limbo lifecycle: included blob sidecars move into limbo,
  purge_blob_limbo_entries drops them, take_blob_limbo_entry returns
  them for re-injection, and purging unrelated hashes is a no-op.

An end-to-end test driven by a real engine_forkchoiceUpdated round trip
would require multi-branch block execution fixtures and is deferred for
follow-up work.
Copilot AI review requested due to automatic review settings May 12, 2026 16:01
@ilitteri ilitteri requested review from a team, ManuelBilbao and avilagaston9 as code owners May 12, 2026 16:01
@github-actions github-actions Bot added the L1 Ethereum client label May 12, 2026
@github-actions

🤖 Kimi Code Review

This PR implements mempool transaction re-injection after reorgs with proper blob sidecar retention (limbo). The implementation is generally correct and secure, with appropriate bounds on reorg depth.

Issues Found:

1. Performance: Unnecessary clone of blob bundles in mempool

  • File: crates/blockchain/mempool.rs
  • Line: 251
  • Issue: inner.blobs_bundle_pool.get(hash).cloned() performs a deep clone of BlobsBundle (up to ~768KB for max blobs). This happens for every blob transaction included in a block.
  • Fix: Use remove to take ownership without cloning:
    // Before:
    if let Some(bundle) = inner.blobs_bundle_pool.get(hash).cloned() {
        inner.blobs_bundle_limbo.insert(*hash, bundle);
    }
    
    // After:
    if let Some(bundle) = inner.blobs_bundle_pool.remove(hash) {
        inner.blobs_bundle_limbo.insert(*hash, bundle);
    }

2. Test compilation without c-kzg feature

  • File: test/tests/blockchain/reorg_reinjection_tests.rs
  • Issue: The test file imports EIP4844Transaction, BlobsBundle, and BYTES_PER_BLOB unconditionally. If the c-kzg feature is disabled, these types may not be available.
  • Fix: Gate the blob-specific tests with #[cfg(feature = "c-kzg")], or ensure the test module is only compiled when the feature is enabled.

3. Missing metric for re-injected transactions

  • File: crates/networking/rpc/engine/fork_choice.rs
  • Line: 338-348
  • Suggestion: Consider adding a Prometheus metric (e.g., reorg_reinjected_transactions) to track how many transactions are re-injected. This aids in monitoring deep reorgs.

Positive observations:

  • The reorg depth cap (default 64) prevents DoS via unbounded chain walks.
  • Blob sidecars are properly retained in limbo with bounded memory usage (max ~

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: feat(l1): reorg re-injection for mempool transactions and blob sidecars

Overview

This PR fills a real correctness gap: transactions from orphaned blocks now re-enter the mempool after a reorg rather than being silently lost. The implementation is well-structured, the blob-limbo design is creative, and the unit tests cover the key code paths. The overall approach mirrors what geth/reth do. A few issues need attention before merge.


Critical — Depth Cap Applied After Unbounded Walk

crates/blockchain/fork_choice.rs, find_common_ancestor

The depth cap in reinject_orphaned_transactions (blockchain.rs line ~173) is checked only after find_common_ancestor returns. However, find_common_ancestor has no depth limit of its own: for a 1,000-block reorg (or two hashes with no common ancestor at all), the function will make thousands of sequential DB lookups before the cap ever fires.

This is exploitable via a forged/malicious FCU call: an attacker can supply two head hashes that share a very deep (or non-existent) ancestor, forcing O(N) storage queries on the FCU hot path before any guard kicks in.

Fix: thread max_depth into find_common_ancestor and return Ok(None) early when orphaned.len() (or steps walked) exceeds the cap.

// suggested signature change
pub async fn find_common_ancestor(
    store: &Store,
    previous_head_hash: BlockHash,
    new_head_hash: BlockHash,
    max_depth: u64,          // new
) -> Result<Option<ReorgBranches>, StoreError>

Bug — Off-by-One in purge_finalized_blob_limbo

crates/blockchain/blockchain.rs, line ~265

let max_walk = self.options.reorg_depth;   // e.g. 64
let mut steps = 0u64;
while current_hash != ... {
    if steps > max_walk {     // fires at steps == 65, not 64
        break;
    }
    // ... process block
    steps += 1;
}

The guard steps > max_walk allows 65 iterations when max_walk == 64. The intent is clearly >= max_walk (or equivalently == max_walk before incrementing). In practice this means one extra block is walked on every purge that hits the cap, and — more importantly — if the finality gap ever exceeds max_walk, the tail of the range goes unpurged, causing the limbo to grow.
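A tiny standalone repro of the guard semantics (illustrative, not the PR's code): with `>` the body runs `max_walk + 1` times, with `>=` exactly `max_walk` times.

```rust
// Count how many "blocks" are processed before the cap guard breaks.
// `strict` selects the buggy `>` guard; `false` selects the `>=` fix.
fn walk(max_walk: u64, strict: bool) -> u64 {
    let mut steps = 0u64;
    let mut processed = 0u64;
    loop {
        let over = if strict { steps > max_walk } else { steps >= max_walk };
        if over {
            break;
        }
        processed += 1; // "process block"
        steps += 1;
    }
    processed
}

fn main() {
    assert_eq!(walk(64, true), 65);  // `>` allows one extra iteration
    assert_eq!(walk(64, false), 64); // `>=` honors the cap
}
```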


Notable — Blob Sidecar Consumed Before Admission Attempt

crates/blockchain/blockchain.rs, line ~199

let Some(blobs_bundle) = self.mempool.take_blob_limbo_entry(&tx_hash)? else {
    continue;
};
match self.add_blob_transaction_to_pool(inner, blobs_bundle).await {
    Ok(_) => reinjected += 1,
    Err(error) => { debug!(...); }   // sidecar is gone
}

take_blob_limbo_entry is destructive: once the sidecar is popped from the limbo, a subsequent failure in add_blob_transaction_to_pool permanently loses it. For transient failures (e.g., mempool temporarily full), this silently drops a valid sidecar with no recovery path.

Consider cloning or using a peek before committing to take, and only removing from limbo on success:

if let Some(bundle) = self.mempool.peek_blob_limbo_entry(&tx_hash)? {
    if self.add_blob_transaction_to_pool(inner, bundle).await.is_ok() {
        self.mempool.take_blob_limbo_entry(&tx_hash)?;
        reinjected += 1;
    }
}

Notable — blobs_bundle_limbo Has No Size Cap

crates/blockchain/mempool.rs, MempoolInner

The limbo has no upper bound on entries. On a network where finality stalls (post-merge outage, etc.), every included block's blob sidecars accumulate indefinitely. At 6 blobs × 128 KB per block, 100 unfinalized blocks is ~75 MB in the limbo alone. Consider either:

  • A hard cap (e.g., reorg_depth × max_blobs_per_block) with LRU/FIFO eviction, or
  • At minimum a metric or warning when the limbo exceeds a threshold.

Notable — previous_finalized_hash Uses Two Sequential Async Calls

crates/networking/rpc/engine/fork_choice.rs, line ~558

let previous_finalized_hash = match context.storage.get_finalized_block_number().await? {
    Some(number) => context.storage.get_canonical_block_hash(number).await?.unwrap_or_default(),
    None => Default::default(),
};

Two round-trips when one get_finalized_block_hash() API call would suffice. This is minor but sits on the FCU critical path.


Minor — Ignored bool Return Value

crates/blockchain/blockchain.rs, remove_block_transactions_from_pool

for tx in &block.body.transactions {
    self.mempool.remove_included_transaction(&tx.hash())?;
}

remove_included_transaction returns Result<bool, StoreError> (was the tx present?), but the caller discards the bool. Either use the value to log "included tx not found in mempool" (useful for diagnostics) or change the return type to Result<(), StoreError> if the information is never needed.


Minor — Test Assertion Too Weak

test/tests/blockchain/reorg_reinjection_tests.rs, line ~1001

// The blob tx must be skipped (no sidecar). The plain tx may also fail
// because the sender has no balance, but the entire call must not error.
// We only assert no panic / no propagated error here.
let _ = reinjected;

The test documents the expected behavior in a comment but doesn't assert it. The plain EIP-1559 tx's add_transaction_to_pool is expected to fail (no balance), so reinjected should be 0. Asserting assert_eq!(reinjected, 0) would make the test self-documenting and catch regressions.


Minor — common_ancestor_handles_unequal_height_branches Comment vs. Test

test/tests/blockchain/reorg_reinjection_tests.rs, line ~839

// Old chain: genesis -> A1 -> A2.
// New chain: genesis -> A1 -> B2 -> B3.
// Shared prefix is A1, depth is 1 (only A2 is orphaned).

The test asserts branches.new_canonical.len() == 2 (B2 and B3), which is correct, but the assertion for orphaned.len() == 1 could be paired with an explicit check that branches.orphaned[0].1 == chain_a[1].hash() (i.e., A2), the same way the equal-height test verifies branch order. This would give the test the same thoroughness as its sibling.


Positive Notes

  • The ReorgBranches struct with depth() convenience method is clean and clear.
  • The fast-path check (prev_hash == new_hash before lockstep walk, depth == 0 check before cap check) avoids wasted work on the common case.
  • Feature-gating the EIP-4844 branch on c-kzg while still handling the transaction type in the #[cfg(not(feature = "c-kzg"))] arm is correct.
  • Moving sidecars to limbo atomically under the write lock (no observable intermediate state) is correct.
  • Test infrastructure (fresh_store, insert_chain, insert_block) is reusable and avoids boilerplate duplication across the nine tests.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

github-actions Bot commented May 12, 2026

Lines of code report

Total lines added: 352
Total lines removed: 0
Total lines changed: 352

Detailed view
+----------------------------------------------------+-------+------+
| File                                               | Lines | Diff |
+----------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/cli.rs                           | 1009  | +12  |
+----------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/initializers.rs                  | 654   | +1   |
+----------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/initializers.rs               | 378   | +1   |
+----------------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs             | 2629  | +139 |
+----------------------------------------------------+-------+------+
| ethrex/crates/blockchain/fork_choice.rs            | 257   | +109 |
+----------------------------------------------------+-------+------+
| ethrex/crates/blockchain/mempool.rs                | 494   | +41  |
+----------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/engine/fork_choice.rs | 601   | +49  |
+----------------------------------------------------+-------+------+

Contributor

Copilot AI left a comment


Pull request overview

This PR adds reorg-aware mempool lifecycle handling for L1 by re-injecting transactions from orphaned blocks back into the mempool, and by retaining EIP-4844 blob sidecars across block inclusion until finalization so blob transactions can also be re-injected after reorgs.

Changes:

  • Detect reorgs during Engine API forkchoiceUpdated handling and trigger best-effort re-injection of orphaned-block transactions.
  • Add a blob-sidecar “limbo” in the mempool to retain sidecars between inclusion and finalization; add purge logic on finalized advancement.
  • Introduce --mempool.reorg-depth (default 64) and add unit tests covering common-ancestor walking, orphaned tx subtraction, depth cap, and blob-limbo lifecycle.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • test/tests/blockchain/reorg_reinjection_tests.rs: Adds unit tests for common ancestor detection, orphaned tx collection, re-injection behavior, and blob-limbo lifecycle.
  • test/tests/blockchain/mod.rs: Registers the new reorg reinjection test module.
  • docs/CLI.md: Documents the new --mempool.reorg-depth CLI option.
  • crates/networking/rpc/engine/fork_choice.rs: Snapshots previous head/finalized state, removes included txs into limbo, triggers re-injection, and purges finalized limbo entries.
  • crates/blockchain/mempool.rs: Adds blobs_bundle_limbo and APIs to move/take/purge limbo sidecars and to remove included transactions.
  • crates/blockchain/fork_choice.rs: Adds common-ancestor reorg branch computation and orphaned-transaction collection logic.
  • crates/blockchain/blockchain.rs: Adds reorg depth option/default, re-injection implementation, finalized-limbo purge implementation, and switches inclusion removal to the new mempool path.
  • cmd/ethrex/l2/initializers.rs: Threads the new reorg-depth option into BlockchainOptions for L2 initialization.
  • cmd/ethrex/initializers.rs: Threads the new reorg-depth option into BlockchainOptions for L1 initialization.
  • cmd/ethrex/cli.rs: Adds --mempool.reorg-depth / ETHREX_MEMPOOL_REORG_DEPTH plumbing and defaults.


Comment on lines +246 to +254
pub fn remove_included_transaction(&self, hash: &H256) -> Result<bool, StoreError> {
    let mut inner = self.write()?;
    let was_present = inner.transaction_pool.contains_key(hash);
    // First retain the bundle in limbo (only if the tx is a blob tx whose
    // sidecar we already have). We do this before removing the transaction
    // because `remove_transaction_with_lock` also drops the bundle.
    if let Some(bundle) = inner.blobs_bundle_pool.get(hash).cloned() {
        inner.blobs_bundle_limbo.insert(*hash, bundle);
    }

Comment on lines +2466 to +2483
Transaction::EIP4844Transaction(inner) => {
    let Some(blobs_bundle) = self.mempool.take_blob_limbo_entry(&tx_hash)? else {
        debug!(
            tx_hash = %tx_hash,
            "Blob sidecar missing from limbo; cannot re-inject orphaned EIP-4844 tx",
        );
        continue;
    };
    match self.add_blob_transaction_to_pool(inner, blobs_bundle).await {
        Ok(_) => reinjected += 1,
        Err(error) => {
            debug!(
                tx_hash = %tx_hash,
                %error,
                "Failed to re-inject orphaned EIP-4844 transaction",
            );
        }
    }

Comment thread crates/blockchain/blockchain.rs Outdated
Comment on lines +2524 to +2540
// Walk from new_finalized back to previous_finalized, collecting tx
// hashes for every block we pass through. We bound the walk by the
// configured reorg depth so a misconfigured / malicious previous_hash
// cannot cause unbounded work.
let max_walk = self.options.reorg_depth;
let mut current_hash = new_finalized_hash;
let mut steps = 0u64;
let mut hashes_to_purge: Vec<H256> = Vec::new();
while current_hash != previous_finalized_hash && !current_hash.is_zero() {
    if steps > max_walk {
        debug!(
            steps,
            cap = max_walk,
            "Blob limbo purge walked past the reorg-depth cap; stopping",
        );
        break;
    }
@github-actions

🤖 Codex Code Review

Findings

  1. crates/blockchain/blockchain.rs: purge_finalized_blob_limbo incorrectly caps the finalized-range walk by reorg_depth. Finality can advance by much more than 64 blocks after downtime or a delayed FCU; when that happens, the loop stops early and the older limbo entries are never visited again, so those blob sidecars leak permanently. This bound makes sense for reorg reinjection, not for monotonic finalized-chain cleanup.

  2. crates/networking/rpc/engine/fork_choice.rs only removes transactions from the new head block, but crates/blockchain/fork_choice.rs applies an entire branch as canonical. On any FCU that jumps multiple blocks forward or reorgs onto a longer side branch, transactions included in intermediate newly-canonical blocks stay in the active mempool, and blob sidecars for those txs never move to limbo. That leaves the mempool state inconsistent with the canonical chain.

  3. crates/blockchain/mempool.rs clones the full BlobsBundle before moving it to limbo. Since crates/common/types/blobs_bundle.rs owns the full blob payload bytes, this adds a large copy/allocation on the FCU hot path while the mempool write lock is held. It should move the bundle out of blobs_bundle_pool instead of cloning it.
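Finding 2 suggests walking every newly-canonical block rather than just the head; a minimal sketch under assumed simplified types (not ethrex's real API):

```rust
use std::collections::HashSet;

type TxHash = u64;

struct Block {
    txs: Vec<TxHash>,
}

/// Drop included transactions for *every* block the FCU made canonical,
/// not only the new head, so the mempool stays consistent with the chain.
fn remove_branch_transactions(mempool: &mut HashSet<TxHash>, new_canonical: &[Block]) {
    for block in new_canonical {
        for tx in &block.txs {
            // In ethrex this would also move any blob sidecar into limbo.
            mempool.remove(tx);
        }
    }
}

fn main() {
    let mut pool: HashSet<TxHash> = [1, 2, 3].into_iter().collect();
    // An FCU that jumps two blocks forward: both blocks' txs must leave the pool.
    let branch = [Block { txs: vec![1] }, Block { txs: vec![2] }];
    remove_branch_transactions(&mut pool, &branch);
    let expected: HashSet<TxHash> = [3].into_iter().collect();
    assert_eq!(pool, expected);
}
```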

Notes

  • The helper tests in test/tests/blockchain/reorg_reinjection_tests.rs are useful, but they do not cover handle_forkchoice, so the two integration cases above currently have no regression coverage.
  • I could not run cargo test here because rustup tried to write under read-only /home/runner/.rustup/tmp.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR closes a major mempool correctness gap by re-injecting orphaned transactions back into the mempool when forkchoiceUpdated triggers a chain reorganization. Blob sidecars for EIP-4844 transactions are held in a new in-memory limbo map between block inclusion and finalization, enabling them to survive a reorg.

  • Reorg detection (blockchain.rs / fork_choice.rs): find_common_ancestor walks both chain branches to find the common ancestor, then collect_orphaned_transactions subtracts transactions already present in the new canonical branch before re-injection with full admission validation.
  • Blob limbo (mempool.rs): blobs_bundle_limbo retains sidecars from included blocks until finalization; entries are purged synchronously via purge_finalized_blob_limbo when finalized_block_hash advances.
  • Configurable depth cap (--mempool.reorg-depth, default 64): reorgs deeper than the cap skip re-injection with a warning; 9 unit tests cover the common-ancestor walk, orphaned-tx subtraction, cap enforcement, and limbo lifecycle.

Confidence Score: 4/5

Safe to merge with minor fixes; the core reorg logic is correct and well-tested, but a few small issues in the depth-cap enforcement and FCU hot path are worth addressing.

The algorithm for finding the common ancestor and collecting orphaned transactions is correct, the blob limbo lifecycle is consistent, and 9 unit tests cover the key code paths. Three non-critical issues reduce confidence: find_common_ancestor performs the full branch walk before the depth cap is checked; the purge_finalized_blob_limbo walk guard uses > instead of >=, processing one extra block per purge; and the reorg detection condition fires on every normal FCU call, adding a few unnecessary DB reads per slot.

The files needing the most attention are crates/blockchain/blockchain.rs (purge off-by-one, blob sidecar consumed before re-injection succeeds) and crates/blockchain/fork_choice.rs (unbounded ancestor walk relative to depth cap).

Important Files Changed

  • crates/blockchain/blockchain.rs: Adds reinject_orphaned_transactions and purge_finalized_blob_limbo; off-by-one in the purge walk's depth cap guard (> should be >=), and blob sidecars are consumed from limbo before re-injection success is confirmed.
  • crates/blockchain/fork_choice.rs: Adds find_common_ancestor and collect_orphaned_transactions; the ancestor walk is unbounded by the depth cap — deep reorgs perform the full traversal before the caller's cap check discards the result.
  • crates/blockchain/mempool.rs: Adds blobs_bundle_limbo map and its three accessors; logic is sound and the limbo initializes correctly via Default.
  • crates/networking/rpc/engine/fork_choice.rs: Wires reorg re-injection and finalized limbo purge into the FCU handler; the reorg condition fires on every normal chain extension, incurring extra DB reads per slot even when no reorg occurred.
  • test/tests/blockchain/reorg_reinjection_tests.rs: Nine new unit tests covering common-ancestor walk, orphaned-tx subtraction, depth-cap skip, blob-limbo lifecycle, and noop purge.
  • cmd/ethrex/cli.rs: Adds --mempool.reorg-depth CLI flag with default wired to DEFAULT_MEMPOOL_REORG_DEPTH; straightforward plumbing with no issues.

Sequence Diagram

sequenceDiagram
    participant CL as Consensus Layer
    participant FCU as FCU Handler
    participant BC as Blockchain
    participant MP as Mempool
    participant ST as Store

    CL->>FCU: forkchoiceUpdated(head, finalized)
    FCU->>ST: get_latest_canonical_block_hash()
    FCU->>ST: get_finalized_block_number()
    FCU->>ST: apply_fork_choice(head)
    ST-->>FCU: new_head
    FCU->>BC: remove_block_transactions_from_pool(block)
    BC->>MP: remove_included_transaction (sidecars moved to limbo)

    alt "prev_head != new_head"
        FCU->>BC: reinject_orphaned_transactions
        BC->>ST: find_common_ancestor
        alt "depth == 0"
            BC-->>FCU: Ok(0)
        else "depth > cap"
            BC-->>FCU: Ok(0) skip
        else reorg
            BC->>ST: collect_orphaned_transactions
            BC->>MP: take_blob_limbo_entry per blob tx
            BC->>BC: add_transaction_to_pool per tx
            BC-->>FCU: Ok(count)
        end
    end

    alt "new_finalized != prev_finalized"
        FCU->>BC: purge_finalized_blob_limbo
        BC->>MP: purge_blob_limbo_entries
    end

    FCU-->>CL: PayloadStatus valid

Comments Outside Diff (2)

  1. crates/blockchain/fork_choice.rs, line 347-428 (link)

    P2 Unbounded ancestor walk before the depth cap is checked

    find_common_ancestor traverses the full reorg on both branches before returning. The depth cap in reinject_orphaned_transactions is checked after this function returns, so for a reorg deeper than the configured cap the code completes the entire walk (potentially hundreds of get_block_header_by_hash DB reads on each side) and then discards the result immediately. The depth cap should either be passed into this function as a bail-out argument, or the walk should be bounded by reorg_depth during traversal to avoid unnecessary I/O on the FCU hot path.

  2. crates/blockchain/blockchain.rs, line 197-215 (link)

    P2 Blob sidecar permanently lost on failed re-injection

    take_blob_limbo_entry removes the sidecar from limbo before add_blob_transaction_to_pool is called. If admission fails for any reason (e.g., sender balance changed, or a transient store error), the sidecar is gone from limbo with no way to recover it. In a multi-step reorg scenario where the same blob tx is orphaned again by a subsequent FCU, the sidecar will be unavailable for the second re-injection attempt. Consider peeking first and only removing on success, or re-inserting the sidecar back into limbo on a failed admission.

Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
crates/blockchain/fork_choice.rs:347-428
**Unbounded ancestor walk before the depth cap is checked**

`find_common_ancestor` traverses the full reorg on both branches before returning. The depth cap in `reinject_orphaned_transactions` is checked _after_ this function returns, so for a reorg deeper than the configured cap the code completes the entire walk (potentially hundreds of `get_block_header_by_hash` DB reads on each side) and then discards the result immediately. The depth cap should either be passed into this function as a bail-out argument, or the walk should be bounded by `reorg_depth` during traversal to avoid unnecessary I/O on the FCU hot path.

### Issue 2 of 4
crates/blockchain/blockchain.rs:197-215
**Blob sidecar permanently lost on failed re-injection**

`take_blob_limbo_entry` removes the sidecar from limbo _before_ `add_blob_transaction_to_pool` is called. If admission fails for any reason (e.g., sender balance changed, or a transient store error), the sidecar is gone from limbo with no way to recover it. In a multi-step reorg scenario where the same blob tx is orphaned again by a subsequent FCU, the sidecar will be unavailable for the second re-injection attempt. Consider peeking first and only removing on success, or re-inserting the sidecar back into limbo on a failed admission.

### Issue 3 of 4
crates/networking/rpc/engine/fork_choice.rs:332-349
**Reorg detection fires on every non-reorg FCU**

The condition `previous_head_hash != head.hash()` is `true` for every normal slot (new block ≠ previous head), so `reinject_orphaned_transactions` is called on every successful FCU — not only during genuine reorgs. The comment claims the walk "exits immediately when the previous head is the parent of the new head," but the walk still performs ~3 DB reads (both tip headers + one parent header) before returning `depth == 0`. Adding a cheap pre-check using the block that was already fetched above (e.g., `block.header.parent_hash != previous_head_hash`) would limit the ancestor walk to real reorg scenarios.
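The fast-path amounts to a one-line guard. A minimal sketch of the predicate, with u64 stand-ins for the hashes and hypothetical parameter names mirroring the snippet:

```rust
type Hash = u64; // stand-in for H256

/// True only when the new head does NOT directly extend the previous head,
/// i.e. the only cases worth the ancestor walk / re-injection spawn.
fn is_potential_reorg(previous_head: Hash, new_head: Hash, new_head_parent: Hash) -> bool {
    previous_head != 0                       // zero hash: no prior head snapshot
        && previous_head != new_head         // same head re-announced
        && new_head_parent != previous_head  // fast-path: plain slot advance
}

fn main() {
    // Slot advance: new head's parent IS the previous head -> not a reorg.
    assert!(!is_potential_reorg(10, 11, 10));
    // Genuine reorg: new head extends some other block.
    assert!(is_potential_reorg(10, 21, 9));
    // Idempotent FCU for the current head.
    assert!(!is_potential_reorg(10, 10, 9));
    println!("ok");
}
```

Because `block` (and hence its `parent_hash`) was already fetched earlier in the handler, this check costs no extra DB reads.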

### Issue 4 of 4
crates/blockchain/blockchain.rs:2528-2533
**Off-by-one in limbo purge walk**

`steps` starts at 0 and is incremented _after_ each block is processed, so the guard `steps > max_walk` first triggers when `steps == max_walk + 1`, meaning `max_walk + 1` blocks are processed before the break. This is inconsistent with the configured `reorg_depth` cap. Changing to `>=` means at most `max_walk` iterations are processed.

```suggestion
        let max_walk = self.options.reorg_depth;
        let mut current_hash = new_finalized_hash;
        let mut steps = 0u64;
        let mut hashes_to_purge: Vec<H256> = Vec::new();
        while current_hash != previous_finalized_hash && !current_hash.is_zero() {
            if steps >= max_walk {
```
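The off-by-one can be seen in isolation: with a check-before-process loop and a post-increment counter, a `>` guard admits `max_walk + 1` iterations while `>=` admits exactly `max_walk`. A standalone sketch (names are illustrative; `max_walk` mirrors `reorg_depth`):

```rust
/// Count how many blocks a guard of the given strictness lets through.
/// `strict == true` models the original `steps > max_walk` guard.
fn blocks_processed(max_walk: u64, strict: bool) -> u64 {
    let mut steps = 0u64;
    let mut processed = 0u64;
    // Simulate an unbounded chain so only the guard terminates the walk.
    loop {
        let stop = if strict { steps > max_walk } else { steps >= max_walk };
        if stop {
            break;
        }
        processed += 1; // "process" one block
        steps += 1;
    }
    processed
}

fn main() {
    // `steps > max_walk` admits one extra block; `>=` caps exactly at the limit.
    assert_eq!(blocks_processed(64, true), 65);
    assert_eq!(blocks_processed(64, false), 64);
    println!("ok");
}
```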

Reviews (1): Last reviewed commit: "feat(l1): add --mempool.reorg-depth flag..."

Comment on lines +332 to +349
```rust
if !previous_head_hash.is_zero() && previous_head_hash != head.hash() {
    match context
        .blockchain
        .reinject_orphaned_transactions(previous_head_hash, head.hash())
        .await
    {
        Ok(0) => {}
        Ok(count) => {
            debug!(
                reinjected = count,
                "Re-injected orphaned transactions into mempool"
            );
        }
        Err(error) => {
            debug!(%error, "Failed to re-inject orphaned transactions");
        }
    }
}
```
P2 Reorg detection fires on every non-reorg FCU

The condition previous_head_hash != head.hash() is true for every normal slot (new block ≠ previous head), so reinject_orphaned_transactions is called on every successful FCU — not only during genuine reorgs. The comment claims the walk "exits immediately when the previous head is the parent of the new head," but the walk still performs ~3 DB reads (both tip headers + one parent header) before returning depth == 0. Adding a cheap pre-check using the block that was already fetched above (e.g., block.header.parent_hash != previous_head_hash) would limit the ancestor walk to real reorg scenarios.


Comment thread crates/blockchain/blockchain.rs Outdated
Comment on lines +2528 to +2533
```rust
let max_walk = self.options.reorg_depth;
let mut current_hash = new_finalized_hash;
let mut steps = 0u64;
let mut hashes_to_purge: Vec<H256> = Vec::new();
while current_hash != previous_finalized_hash && !current_hash.is_zero() {
    if steps > max_walk {
```
P2 Off-by-one in limbo purge walk: steps starts at 0 and is incremented after each block is processed, so the guard steps > max_walk first triggers when steps == max_walk + 1, meaning max_walk + 1 blocks are processed before the break. This is inconsistent with the configured reorg_depth cap. Changing to >= means at most max_walk iterations are processed.

Suggested change

```diff
         let max_walk = self.options.reorg_depth;
         let mut current_hash = new_finalized_hash;
         let mut steps = 0u64;
         let mut hashes_to_purge: Vec<H256> = Vec::new();
         while current_hash != previous_finalized_hash && !current_hash.is_zero() {
-            if steps > max_walk {
+            if steps >= max_walk {
```

ilitteri added 5 commits May 12, 2026 13:54
A near-cap 64-block reorg can carry ~10K orphaned txs through full
admission validation (ECDSA recovery + balance check + KZG re-verify
on blob txs). Awaiting that inline before responding to
`engine_forkchoiceUpdated` would add roughly a second of latency to
the CL handshake.

The re-injection is already documented as best-effort and its
failures are logged at debug level, so the FCU response doesn't
depend on its result. Spawn it via `tokio::spawn` so the FCU
acknowledgment returns immediately.
The previous parent-hash walk was bounded by `options.reorg_depth`
(default 64) to guard against malicious inputs. But the input here is
a CL-supplied finalized hash — already canonical, never adversarial —
and that cap leaked sidecars in two real-world cases:

1. Fresh node first-finalization advance: `previous_finalized_hash` is
   zero, the walk runs until it hits genesis or the cap. Anything
   above the cap leaks.
2. CL downtime closes a finalization gap larger than the cap. Only the
   most recent `reorg_depth` blocks get drained; everything older
   stays in limbo forever (next call uses the new
   `previous_finalized_hash` as the boundary).

Each blob sidecar is ~131 KB × up to 6 blobs per tx, so the leak can
grow to multi-GB. Switch to a canonical-number range walk from
`previous_number + 1..=new_number` (no parent_hash chasing, no cap).

Also gate reorg re-injection on `Mempool::is_full`: a deep reorg
should not evict newly-arrived txs to make room for orphaned ones.
Skipped txs are counted and surfaced via a single warn log at the end.
…l gate

Address the Phase 2 cross-check on PR #6631:
- `reinject_is_best_effort_when_admission_fails` now asserts
  `reinjected == 0` and re-runs the call to confirm the path is
  idempotent (no hidden state mutation on best-effort failures).
- New `mempool_is_full_gates_capacity` test verifies the `is_full`
  accessor used by re-injection to skip rather than evict freshly-
  arrived txs at capacity.
… re-inject failure

Phase 2 review (Copilot):
1. `remove_included_transaction` cloned the (potentially ~800 KB)
   BlobsBundle when moving it from `blobs_bundle_pool` into
   `blobs_bundle_limbo`. The clone happens on every blob-tx inclusion
   — an unnecessary allocation/copy on the hot path.
2. Reorg re-injection's blob path called `take_blob_limbo_entry`
   BEFORE attempting `add_blob_transaction_to_pool`. If admission failed
   for a transient reason (pool full, replacement rules, etc.) the
   sidecar was dropped permanently, and the blob tx could never be
   re-injected on a subsequent attempt.

Single rewrite:
- Factor the versioned-hash index cleanup out of `remove_blob_bundle`
  into a shared `clear_blob_versioned_hash_index` helper.
- `remove_included_transaction` now `.remove()`s the bundle out of
  `blobs_bundle_pool` (no clone), runs the index-only cleanup, then
  inserts the bundle into limbo. The subsequent
  `remove_transaction_with_lock` observes an empty pool entry and is
  a no-op for the bundle.
- New `Mempool::insert_blob_limbo_entry` lets re-injection re-park a
  sidecar on transient admission failure. The reorg-re-inject loop
  clones the bundle right before the admission attempt (smaller
  window than holding it across the entire pool insertion) and
  re-inserts on `Err`.
Phase 2 review (greptile P2): the FCU handler's reorg-detection block
guarded only on `previous_head_hash != head.hash()`, which is true for
EVERY successful FCU including normal slot advances. Even though
re-injection is now spawned (so the cost is off the FCU response
path), the spawn itself walks ~3 header-by-hash DB reads before
concluding "no reorg, depth = 0".

Fast-path: when `head.parent_hash == previous_head_hash`, the new
head extends the previous head directly — by definition not a reorg.
Skip the spawn entirely in that case. Genuine reorgs (where parent
differs from previous head) still fall through to the existing
spawned `reinject_orphaned_transactions` path.

Labels: L1 Ethereum client
Projects: No status
2 participants