Skip to content

feat(l1): reject gapped-nonce txs under heavy mempool pressure#6609

Open
ilitteri wants to merge 2 commits into
mainfrom
feat/mempool-gap-admission
Open

feat(l1): reject gapped-nonce txs under heavy mempool pressure#6609
ilitteri wants to merge 2 commits into
mainfrom
feat/mempool-gap-admission

Conversation

@ilitteri
Copy link
Copy Markdown
Collaborator

@ilitteri ilitteri commented May 12, 2026

Motivation

Even after the per-sender slot cap (#6603) lands, a flood of small senders submitting gapped-nonce transactions can pin pool budget — those nonces are unreachable until intermediate nonces land, so they sit forever while productive contiguous-nonce txs from other senders are evicted.

Reject gapped-nonce admissions when the mempool is heavily occupied; in steady-state (low occupancy) the pool tolerates gapped nonces, but under pressure it prefers contiguous, executable nonces.

Description

  • New Mempool::occupancy_pct(&self) -> u8 returns the pool occupancy as an integer percentage in [0, 100]. Returns 0 for unlimited pools (max_mempool_size == 0) so pressure-gated rules treat unbounded capacity as "never under pressure".
  • New Mempool::is_heavily_occupied(threshold_pct: u8) -> bool uses integer arithmetic (pool_len * 100 >= max * threshold_pct) so threshold boundary comparisons are deterministic. A threshold of 100 disables the check; an unlimited pool short-circuits to false regardless of threshold.
  • Blockchain::validate_transaction rejects incoming txs whose nonce isn't the on-chain next (gapped) when occupancy is at or above the configured threshold. Replacement txs (same (sender, nonce) as a pool entry) bypass the gate since they're not gapped.
  • Reads occupancy ONCE per admission and reuses the value for both the gate decision and the MempoolError::GapAdmissionDeniedUnderPressure { occupancy_pct, nonce_gap } error message — no TOCTOU between the decision and the reported value.
  • New CLI flag --mempool.gap-admit-occupancy-threshold (env ETHREX_MEMPOOL_GAP_ADMIT_OCCUPANCY_THRESHOLD, default 90). Setting it to 100 disables the gate.

Behavioral change

When the pool is at or above the configured occupancy threshold, gapped-nonce admissions return GapAdmissionDeniedUnderPressure. Operators can disable the gate by setting the threshold to 100.

Deferred

Origin-aware exemption (skipping the gate for TxOrigin::Local) waits for #6608's TxOrigin enum to merge.

When the mempool is at or above a configurable occupancy threshold
(default 90%), incoming transactions whose nonce is not contiguous with
the sender's on-chain nonce are rejected with a new
MempoolError::GapAdmissionDeniedUnderPressure error. Replacements
(same-nonce txs already in the pool) bypass the rule since they are not
gapped. The threshold is exposed via the new
--mempool.gap-admit-occupancy-threshold CLI flag
(ETHREX_MEMPOOL_GAP_ADMIT_OCCUPANCY_THRESHOLD env var) and plumbed
through both L1 and L2 initializers. Setting it to 100 disables the
check.

Also adds Mempool::occupancy_ratio and Mempool::is_heavily_occupied
helpers, unit tests for them, and integration tests covering the
gap-admission rule's accept/reject paths and the replacement bypass.
Copilot AI review requested due to automatic review settings May 12, 2026 02:42
@ilitteri ilitteri requested review from a team, ManuelBilbao and avilagaston9 as code owners May 12, 2026 02:42
@github-actions github-actions Bot added the L1 Ethereum client label May 12, 2026
@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

The PR implements a mempool admission control mechanism to prevent DoS via gapped-nonce transaction spam when the pool is under pressure. Overall, the implementation is correct, well-tested, and follows Rust best practices.

Minor optimization (non-blocking):

crates/blockchain/blockchain.rs:2533-2541 - The gap admission check acquires the mempool lock twice (once in is_heavily_occupied() and again in occupancy_ratio()). Consider refactoring to return both the boolean and the percentage from a single locked operation to reduce contention:

// Instead of:
if self.mempool.is_heavily_occupied(threshold)? { 
    let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8;
    // ...
}

// Could use:
if let Some(occupancy_pct) = self.mempool.check_occupancy_threshold(threshold)? {
    // ...
}

Code correctness verification:

  • Logic: Correctly identifies nonce gaps (nonce > sender_acc_nonce) only after the NonceTooLow check, and properly bypasses the restriction for replacements (tx_to_replace_hash.is_none()).
  • Edge cases: Handles max_mempool_size == 0 (unlimited) by treating occupancy as 0%, and threshold of 100 correctly disables the check.
  • Type safety: Appropriate use of u8 with clap's range validation (0..=100) and saturating_sub for the nonce gap calculation.
  • Error handling: The new GapAdmissionDeniedUnderPressure error variant provides useful diagnostic context (occupancy percentage and gap size).

Tests:

The integration tests in test/tests/blockchain/mempool_tests.rs comprehensively cover:

  1. Rejection above threshold
  2. Acceptance below threshold
  3. Disabled check at 100%
  4. Contiguous nonces accepted under pressure
  5. Replacements bypassing the rule

Documentation:

CLI help text and docs/CLI.md clearly explain the behavior and the 100% threshold semantics.

Conclusion: LGTM. The double-lock is a minor inefficiency but not a correctness issue.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

Lines of code report

Total lines added: 121
Total lines removed: 0
Total lines changed: 121

Detailed view
+----------------------------------------+-------+------+
| File                                   | Lines | Diff |
+----------------------------------------+-------+------+
| ethrex/cmd/ethrex/cli.rs               | 1008  | +11  |
+----------------------------------------+-------+------+
| ethrex/cmd/ethrex/initializers.rs      | 654   | +1   |
+----------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/initializers.rs   | 378   | +1   |
+----------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs | 2505  | +15  |
+----------------------------------------+-------+------+
| ethrex/crates/blockchain/error.rs      | 149   | +2   |
+----------------------------------------+-------+------+
| ethrex/crates/blockchain/mempool.rs    | 544   | +91  |
+----------------------------------------+-------+------+

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have a complete picture. Here is my review:


PR #6609feat(l1): reject gapped-nonce txs under heavy mempool pressure

Overview

This PR adds a configurable gate that rejects incoming transactions whose nonce is ahead of the sender's on-chain nonce when the mempool is at or above a configurable occupancy threshold (default 90%). The motivation is anti-spam: gapped-nonce txs cannot be mined until missing predecessor nonces arrive, so under heavy pressure they waste pool capacity. The implementation adds Mempool::occupancy_ratio() / is_heavily_occupied(), wires the threshold through BlockchainOptions, exposes a CLI flag, and ships unit + integration tests.

The code is cleanly structured and the intent is sound. There are, however, a few issues ranging from a meaningful correctness concern down to minor nits.


Correctness — Key Finding

The contiguity reference should be the pending-pool nonce, not the on-chain nonce.

blockchain.rs:2533 checks nonce != sender_acc_nonce (the on-chain confirmed nonce). But the mempool already maintains per-sender pending nonces via Mempool::get_nonce() (mempool.rs:358), which returns highest_pooled_nonce + 1. Using the on-chain nonce means a user who has submitted nonces 0, 1, 2 to the pool (on-chain nonce still 0) cannot submit nonce 3 under high pressure — every nonce above on-chain is treated as "gapped" even when the chain is perfectly contiguous through pending state.

Concretely:

sender on-chain nonce: 0
pool already holds sender txs: nonces 0, 1, 2
user submits: nonce 3  ← legitimate sequential tx
check fires: nonce (3) != sender_acc_nonce (0) → rejected

This is a regression for any active sender trying to queue multiple transactions, even non-spammy ones. An adversary who fills the pool to 90% now also prevents any user with pending txs from extending their queue — effectively a second-order DoS vector.

Fix: Replace the on-chain-nonce reference with the pending-pool next nonce:

// blockchain.rs, in validate_transaction
let pending_next_nonce = self.mempool.get_nonce(&sender)?
    .unwrap_or(sender_acc_nonce);
let threshold = self.options.gap_admit_occupancy_threshold;
if tx_to_replace_hash.is_none()
    && nonce != pending_next_nonce          // true gap relative to pool state
    && self.mempool.is_heavily_occupied(threshold)?
{
    ...
}

pending_next_nonce equals sender_acc_nonce when no txs are pending, so the guard for purely on-chain-gapped submissions is preserved.


Double Lock Acquisition

blockchain.rs:2534–2536:

&& self.mempool.is_heavily_occupied(threshold)?       // lock #1 acquired + released
{
    let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8; // lock #2

These are two separate read-lock acquisitions. Between the two, another thread can add or drain transactions, so the occupancy_pct in the error message can be inconsistent with what triggered the check — e.g., reporting 89% even though the check fired at ≥ 90%. A reader debugging log output will be confused.

Suggestion: expose a single fn occupancy_snapshot() -> Result<f64, MempoolError> that holds the lock once and returns the ratio, used by both the guard and the error construction.


Minor Issues

saturating_sub is unnecessary (blockchain.rs:2537):

let nonce_gap = nonce.saturating_sub(sender_acc_nonce);

At this point nonce > sender_acc_nonce is guaranteed (the NonceTooLow check earlier at line 2498 catches nonce < sender_acc_info.nonce, and the predicate nonce != sender_acc_nonce further implies strict inequality). Plain nonce - sender_acc_nonce is equally safe and removes the defensive noise.

No in-code TODO for TxOrigin exemption: The PR description mentions wiring in exemptions for locally-originated transactions in a follow-up. Without a // TODO(feat/mempool-tx-origin) comment near the gate, future readers have no signal that the rule is intentionally incomplete.

Edge case with threshold = 0 + unlimited pool: When max_mempool_size = 0, occupancy_ratio returns 0.0 as a sentinel. But is_heavily_occupied(0) evaluates 0.0 * 100.0 >= 0.0true, so gap admission is denied even for an unlimited pool. The docstring on occupancy_ratio claims the sentinel "signals that pressure-gated admission rules should treat the pool as empty" — but that is only accurate for thresholds > 0. This is a pathological configuration, but the claim in the docstring is misleading.


Tests

The test suite is comprehensive for the intended semantics: empty/half/full occupancy ratios, threshold-100 disable path, contiguous-nonce acceptance, and replacement bypass. The only gap (pun intended) is that no test covers the scenario described in the main correctness finding above — a sender with pending pool txs trying to extend their queue under high pressure. Adding that case would both document the current behavior and catch a future fix.


Summary

The implementation is well-structured and the deployment knobs are thoughtfully exposed. The primary concern is that nonce != sender_acc_nonce is a broader predicate than "gapped-nonce" — it also catches legitimate sequential submissions from users with pending pool txs. Switching the reference to mempool.get_nonce(&sender) narrows the gate to genuine nonce gaps. The double lock acquisition is a minor but real source of confusing diagnostics.


Automated review by Claude (Anthropic) · sonnet · custom prompt

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a pressure-based admission control to the L1 mempool to mitigate spam from gapped-nonce transactions by rejecting them once the pool is sufficiently full, while preserving existing replacement semantics and making the threshold configurable via CLI/env.

Changes:

  • Introduces Mempool::occupancy_ratio() and Mempool::is_heavily_occupied() helpers and uses them to gate gapped-nonce admission under high occupancy.
  • Adds BlockchainOptions::gap_admit_occupancy_threshold (default 90) and wires it through L1/L2 node initialization + CLI/env + docs.
  • Adds test coverage for the new admission rule and for the mempool occupancy helpers.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/tests/blockchain/mempool_tests.rs Adds integration-style tests for gap-admission behavior at/under threshold, disabled-at-100, contiguous nonce, and replacement bypass.
docs/CLI.md Documents the new --mempool.gap-admit-occupancy-threshold flag (including env var + default) for both node modes.
crates/blockchain/mempool.rs Adds occupancy helpers and unit tests for their behavior.
crates/blockchain/error.rs Adds MempoolError::GapAdmissionDeniedUnderPressure { occupancy_pct, nonce_gap }.
crates/blockchain/blockchain.rs Adds default threshold constant, plumbs option, and enforces the new pressure gate in validate_transaction.
cmd/ethrex/l2/initializers.rs Wires CLI option into L2 BlockchainOptions.
cmd/ethrex/initializers.rs Wires CLI option into L1 BlockchainOptions.
cmd/ethrex/cli.rs Adds CLI/env flag, default, and validation for the new threshold option.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/blockchain/mempool.rs Outdated
Comment on lines +402 to +403
let ratio = self.occupancy_ratio()?;
Ok(ratio * 100.0 >= threshold_pct as f64)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread crates/blockchain/blockchain.rs Outdated
Comment on lines +2532 to +2536
if tx_to_replace_hash.is_none()
&& nonce != sender_acc_nonce
&& self.mempool.is_heavily_occupied(threshold)?
{
let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR adds a configurable occupancy-based gate that rejects incoming gapped-nonce transactions when the mempool is at or above a threshold (default 90%), guarding against cheap spam that pins pool budget under heavy load. The gate is bypassed for replacements at an already-pooled (sender, nonce) pair, and operators can set the threshold to 100 to disable it entirely.

  • Adds occupancy_ratio() and is_heavily_occupied(threshold_pct) helpers to Mempool, with unit tests; the gate is applied inside validate_transaction after find_tx_to_replace so the replacement-bypass path is naturally handled.
  • Plumbs the new gap_admit_occupancy_threshold field through BlockchainOptions → CLI flag (--mempool.gap-admit-occupancy-threshold) → both L1 and L2 node initializers, with a clap range validator (0–100) and env-var support.
  • Five async integration tests cover the primary scenarios (above/below threshold, disabled at 100, contiguous nonce, and replacement bypass).

Confidence Score: 4/5

Safe to merge for the default configuration; two edge-case issues exist in non-default operator settings.

The core admission logic, CLI plumbing, and integration tests look solid for the default configuration (threshold=90, bounded pool). Two issues exist: the occupancy percentage in the rejection error can differ from the value that triggered the check due to separate lock acquisitions, and is_heavily_occupied(0) returns true even for unlimited pools, contradicting the treat-as-empty promise documented on occupancy_ratio.

crates/blockchain/mempool.rs and crates/blockchain/blockchain.rs — both issues live in the new is_heavily_occupied helper and its call site in validate_transaction.

Important Files Changed

Filename Overview
crates/blockchain/blockchain.rs Adds gap-admission gate in validate_transaction: extracts sender_acc_nonce, calls find_tx_to_replace, then rejects gapped-nonce txs under heavy pressure; TOCTOU between the admission check and the error-message occupancy read.
crates/blockchain/mempool.rs Adds occupancy_ratio() and is_heavily_occupied() helpers plus unit tests; is_heavily_occupied(0) with unlimited pool silently breaks the treat-as-empty contract documented on occupancy_ratio.
crates/blockchain/error.rs Adds GapAdmissionDeniedUnderPressure variant to MempoolError; straightforward addition.
cmd/ethrex/cli.rs Adds --mempool.gap-admit-occupancy-threshold CLI flag with clap range validation (0-100) and env-var support; looks correct.
cmd/ethrex/initializers.rs Threads gap_admit_occupancy_threshold from CLI options into BlockchainOptions for L1; correct plumbing.
cmd/ethrex/l2/initializers.rs Threads gap_admit_occupancy_threshold into BlockchainOptions for L2 node; mirrors L1 plumbing correctly.
test/tests/blockchain/mempool_tests.rs Adds five async integration tests covering rejection above threshold, acceptance below threshold, disabled-at-100, contiguous nonce, and replacement bypass; comprehensive and well-structured.
docs/CLI.md Documents the new --mempool.gap-admit-occupancy-threshold flag for both L1 and L2 CLI sections.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[validate_transaction called] --> B[Fetch sender on-chain nonce]
    B --> C{Account found?}
    C -- No --> ERR1[Err: NotEnoughBalance]
    C -- Yes --> D{nonce < on-chain nonce or nonce == u64::MAX?}
    D -- Yes --> ERR2[Err: NonceTooLow]
    D -- No --> E{Balance sufficient?}
    E -- No --> ERR3[Err: NotEnoughBalance]
    E -- Yes --> F[find_tx_to_replace sender+nonce+tx]
    F --> G{Existing tx at same sender+nonce?}
    G -- Yes higher fees --> H[tx_to_replace_hash = Some]
    G -- No existing tx --> I[tx_to_replace_hash = None]
    G -- Lower fees --> ERR4[Err: UnderpricedReplacement]
    H --> J[Chain ID check]
    I --> J
    J --> K{tx_to_replace_hash is None AND nonce != on-chain nonce?}
    K -- Gapped no replacement --> L{is_heavily_occupied threshold%?}
    L -- Yes --> ERR5[Err: GapAdmissionDeniedUnderPressure]
    L -- No below threshold --> OK[Ok tx_to_replace_hash]
    K -- Replacement or contiguous nonce --> OK
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/blockchain/blockchain.rs:2532-2541
**TOCTOU between admission check and error-message occupancy**

`is_heavily_occupied` acquires and releases the read lock internally (via `occupancy_ratio`), then the `if`-body calls `occupancy_ratio` a second time under a fresh lock. Between those two lock acquisitions the pool can drain or fill, so the `occupancy_pct` printed in the error can be lower than the threshold — e.g. "Mempool 88% full; rejecting gapped-nonce tx" when the pool was actually at 91% when the check fired. Operators using this message to diagnose admission behaviour will see inconsistent numbers. Consider capturing the ratio from inside `is_heavily_occupied` (return it alongside the bool, or compute both from a single lock hold) so that the same snapshot drives both the decision and the diagnostic.

### Issue 2 of 2
crates/blockchain/mempool.rs:398-404
**`is_heavily_occupied(0)` always returns `true` even for unlimited pools**

When `max_mempool_size == 0` (unlimited), `occupancy_ratio()` returns `0.0` — intentionally, per the doc-comment, to signal that pressure-gated rules should treat the pool as empty. However, `is_heavily_occupied(0)` evaluates `0.0 * 100.0 >= 0.0`, which is `true`, so all gapped transactions are rejected even when the pool has no capacity limit. The "treat as empty" contract documented on `occupancy_ratio` is silently broken for the edge case of threshold=0 combined with an unlimited pool. Adding a fast-path `if inner.max_mempool_size == 0 { return Ok(false); }` at the top of `is_heavily_occupied` would make the unlimited-pool guarantee explicit and consistent with the `occupancy_ratio` doc-comment.

Reviews (1): Last reviewed commit: "feat(l1): reject gapped-nonce txs under ..." | Re-trigger Greptile

Comment thread crates/blockchain/blockchain.rs Outdated
Comment on lines +2532 to +2541
if tx_to_replace_hash.is_none()
&& nonce != sender_acc_nonce
&& self.mempool.is_heavily_occupied(threshold)?
{
let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8;
let nonce_gap = nonce.saturating_sub(sender_acc_nonce);
return Err(MempoolError::GapAdmissionDeniedUnderPressure {
occupancy_pct,
nonce_gap,
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 TOCTOU between admission check and error-message occupancy

is_heavily_occupied acquires and releases the read lock internally (via occupancy_ratio), then the if-body calls occupancy_ratio a second time under a fresh lock. Between those two lock acquisitions the pool can drain or fill, so the occupancy_pct printed in the error can be lower than the threshold — e.g. "Mempool 88% full; rejecting gapped-nonce tx" when the pool was actually at 91% when the check fired. Operators using this message to diagnose admission behaviour will see inconsistent numbers. Consider capturing the ratio from inside is_heavily_occupied (return it alongside the bool, or compute both from a single lock hold) so that the same snapshot drives both the decision and the diagnostic.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 2532-2541

Comment:
**TOCTOU between admission check and error-message occupancy**

`is_heavily_occupied` acquires and releases the read lock internally (via `occupancy_ratio`), then the `if`-body calls `occupancy_ratio` a second time under a fresh lock. Between those two lock acquisitions the pool can drain or fill, so the `occupancy_pct` printed in the error can be lower than the threshold — e.g. "Mempool 88% full; rejecting gapped-nonce tx" when the pool was actually at 91% when the check fired. Operators using this message to diagnose admission behaviour will see inconsistent numbers. Consider capturing the ratio from inside `is_heavily_occupied` (return it alongside the bool, or compute both from a single lock hold) so that the same snapshot drives both the decision and the diagnostic.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +398 to +404
pub fn is_heavily_occupied(&self, threshold_pct: u8) -> Result<bool, MempoolError> {
if threshold_pct >= 100 {
return Ok(false);
}
let ratio = self.occupancy_ratio()?;
Ok(ratio * 100.0 >= threshold_pct as f64)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 is_heavily_occupied(0) always returns true even for unlimited pools

When max_mempool_size == 0 (unlimited), occupancy_ratio() returns 0.0 — intentionally, per the doc-comment, to signal that pressure-gated rules should treat the pool as empty. However, is_heavily_occupied(0) evaluates 0.0 * 100.0 >= 0.0, which is true, so all gapped transactions are rejected even when the pool has no capacity limit. The "treat as empty" contract documented on occupancy_ratio is silently broken for the edge case of threshold=0 combined with an unlimited pool. Adding a fast-path if inner.max_mempool_size == 0 { return Ok(false); } at the top of is_heavily_occupied would make the unlimited-pool guarantee explicit and consistent with the occupancy_ratio doc-comment.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/mempool.rs
Line: 398-404

Comment:
**`is_heavily_occupied(0)` always returns `true` even for unlimited pools**

When `max_mempool_size == 0` (unlimited), `occupancy_ratio()` returns `0.0` — intentionally, per the doc-comment, to signal that pressure-gated rules should treat the pool as empty. However, `is_heavily_occupied(0)` evaluates `0.0 * 100.0 >= 0.0`, which is `true`, so all gapped transactions are rejected even when the pool has no capacity limit. The "treat as empty" contract documented on `occupancy_ratio` is silently broken for the edge case of threshold=0 combined with an unlimited pool. Adding a fast-path `if inner.max_mempool_size == 0 { return Ok(false); }` at the top of `is_heavily_occupied` would make the unlimited-pool guarantee explicit and consistent with the `occupancy_ratio` doc-comment.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

  1. High: crates/blockchain/blockchain.rs:2532 rejects any nonce different from the account’s on-chain nonce once the pool crosses the threshold. That also rejects the next sequential nonce after a pending tx is already in the pool. Example: if nonce 0 is pending, eth_getTransactionCount(..., "pending") still advances to 1 via crates/networking/rpc/eth/account.rs:166 and crates/blockchain/mempool.rs:358, but submitting nonce 1 will now fail because the state nonce is still 0. That makes the node advertise a pending nonce it will not accept and breaks normal multi-tx flows under load. The gate should compare against the highest contiguous nonce already staged for the sender, not the raw state nonce. The added coverage misses this case; it only checks nonce == on-chain nonce and same-nonce replacement in test/tests/blockchain/mempool_tests.rs:616.

  2. Medium: crates/blockchain/mempool.rs:383 documents max_mempool_size == 0 as “unlimited capacity”, but crates/blockchain/mempool.rs:159 and crates/blockchain/mempool.rs:97 treat 0 as permanently full, and the CLI still accepts 0 at cmd/ethrex/cli.rs:177. Either reject zero during config parsing or implement the advertised “unlimited” behavior before relying on it in occupancy-based admission.

I couldn’t run cargo test in this sandbox because cargo/rustup need writable home directories and external dependency access. Aside from the two points above, the option plumbing and test additions look straightforward.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@ilitteri ilitteri mentioned this pull request May 12, 2026
12 tasks
Phase 2 review (Copilot + greptile):
- `is_heavily_occupied` used `ratio * 100.0 >= threshold_pct as f64`,
  which can misclassify near-boundary values due to f64 rounding.
- `is_heavily_occupied(0)` on an unlimited pool (`max_mempool_size == 0`)
  evaluated `0.0 * 100.0 >= 0.0 = true`, so EVERY gapped-nonce tx would
  be rejected even though the pool has no capacity limit.
- The gap-admission caller read occupancy twice (once via
  `is_heavily_occupied`, once for the error message), under separate
  read locks, allowing TOCTOU drift between the gate decision and
  the reported value.

Single rewrite:
- Replace `occupancy_ratio() -> f64` with `occupancy_pct() -> u8` and
  use integer arithmetic (`pool_len * 100 >= max * threshold_pct`) so
  threshold comparisons are deterministic.
- Guard `max_mempool_size == 0` in `is_heavily_occupied` so unlimited
  pools never report heavy occupancy.
- In `Blockchain::validate_transaction`, compute `occupancy_pct` once
  and reuse it for both the threshold comparison and the error
  message; gate the whole branch on `threshold < 100` so the disabled
  case short-circuits cleanly.

Test names + assertions updated accordingly; added a new
`is_heavily_occupied_on_unlimited_pool_is_false` test covering the
zero-max guard at every threshold.
Copy link
Copy Markdown
Contributor

@ElFantasma ElFantasma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The admission flow at blockchain.rs:2537-2546 is structurally identical to the cumulative-balance check I flagged in #6606 — both take a mempool read lock during validation, then the caller takes a write lock for insert. Two parallel submissions from the same sender could both pass the gate at the same occupancy snapshot, then both insert. Probably acceptable here since the gate fires under heavy load (so race timing is roughly favorable) and the worst case is two gapped txs admitted instead of one. Calling out for consistency with #6606's TOCTOU note — not duplicating the suggestion here.

/// 0 on an unlimited pool (`max_mempool_size == 0`) also returns
/// `false`, treating "no cap" as "never under pressure" so gapped
/// admission isn't blanket-rejected when capacity is unbounded.
pub fn is_heavily_occupied(&self, threshold_pct: u8) -> Result<bool, MempoolError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_heavily_occupied is exposed as a public API and has dedicated unit tests, but it's not called from anywhere outside its own tests. The admission gate at blockchain.rs:2537-2546 deliberately uses occupancy_pct() directly + inline >= threshold comparison (per the TOCTOU comment), which is the right call.

So this method's only consumer is its own tests. Two options:

  1. Drop it. Keep occupancy_pct() and let callers do the comparison inline (matching what the admission gate already does). One fewer method to maintain.
  2. Use it. Replace let occupancy_pct = self.mempool.occupancy_pct()?; if occupancy_pct >= threshold { ... } with if self.mempool.is_heavily_occupied(threshold)? { ... } — but then you lose the occupancy_pct value for the error message and would need to re-read (the TOCTOU concern the comment cites).

Leaning toward (1) — the inline comparison is cheaper and the percentage value is needed for the error message regardless. Not blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants