feat(l1): reject gapped-nonce txs under heavy mempool pressure#6609
feat(l1): reject gapped-nonce txs under heavy mempool pressure#6609ilitteri wants to merge 2 commits into
Conversation
When the mempool is at or above a configurable occupancy threshold (default 90%), incoming transactions whose nonce is not contiguous with the sender's on-chain nonce are rejected with a new MempoolError::GapAdmissionDeniedUnderPressure error. Replacements (same-nonce txs already in the pool) bypass the rule since they are not gapped. The threshold is exposed via the new --mempool.gap-admit-occupancy-threshold CLI flag (ETHREX_MEMPOOL_GAP_ADMIT_OCCUPANCY_THRESHOLD env var) and plumbed through both L1 and L2 initializers. Setting it to 100 disables the check. Also adds Mempool::occupancy_ratio and Mempool::is_heavily_occupied helpers, unit tests for them, and integration tests covering the gap-admission rule's accept/reject paths and the replacement bypass.
🤖 Kimi Code ReviewThe PR implements a mempool admission control mechanism to prevent DoS via gapped-nonce transaction spam when the pool is under pressure. Overall, the implementation is correct, well-tested, and follows Rust best practices. Minor optimization (non-blocking):
// Instead of:
if self.mempool.is_heavily_occupied(threshold)? {
let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8;
// ...
}
// Could use:
if let Some(occupancy_pct) = self.mempool.check_occupancy_threshold(threshold)? {
// ...
}Code correctness verification:
Tests: The integration tests in
Documentation: CLI help text and Conclusion: LGTM. The double-lock is a minor inefficiency but not a correctness issue. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
Lines of code reportTotal lines added: Detailed view |
🤖 Claude Code ReviewNow I have a complete picture. Here is my review: PR #6609 —
|
There was a problem hiding this comment.
Pull request overview
Adds a pressure-based admission control to the L1 mempool to mitigate spam from gapped-nonce transactions by rejecting them once the pool is sufficiently full, while preserving existing replacement semantics and making the threshold configurable via CLI/env.
Changes:
- Introduces
Mempool::occupancy_ratio()andMempool::is_heavily_occupied()helpers and uses them to gate gapped-nonce admission under high occupancy. - Adds
BlockchainOptions::gap_admit_occupancy_threshold(default 90) and wires it through L1/L2 node initialization + CLI/env + docs. - Adds test coverage for the new admission rule and for the mempool occupancy helpers.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/tests/blockchain/mempool_tests.rs | Adds integration-style tests for gap-admission behavior at/under threshold, disabled-at-100, contiguous nonce, and replacement bypass. |
| docs/CLI.md | Documents the new --mempool.gap-admit-occupancy-threshold flag (including env var + default) for both node modes. |
| crates/blockchain/mempool.rs | Adds occupancy helpers and unit tests for their behavior. |
| crates/blockchain/error.rs | Adds MempoolError::GapAdmissionDeniedUnderPressure { occupancy_pct, nonce_gap }. |
| crates/blockchain/blockchain.rs | Adds default threshold constant, plumbs option, and enforces the new pressure gate in validate_transaction. |
| cmd/ethrex/l2/initializers.rs | Wires CLI option into L2 BlockchainOptions. |
| cmd/ethrex/initializers.rs | Wires CLI option into L1 BlockchainOptions. |
| cmd/ethrex/cli.rs | Adds CLI/env flag, default, and validation for the new threshold option. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let ratio = self.occupancy_ratio()?; | ||
| Ok(ratio * 100.0 >= threshold_pct as f64) |
| if tx_to_replace_hash.is_none() | ||
| && nonce != sender_acc_nonce | ||
| && self.mempool.is_heavily_occupied(threshold)? | ||
| { | ||
| let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8; |
Greptile SummaryThis PR adds a configurable occupancy-based gate that rejects incoming gapped-nonce transactions when the mempool is at or above a threshold (default 90%), guarding against cheap spam that pins pool budget under heavy load. The gate is bypassed for replacements at an already-pooled
Confidence Score: 4/5Safe to merge for the default configuration; two edge-case issues exist in non-default operator settings. The core admission logic, CLI plumbing, and integration tests look solid for the default configuration (threshold=90, bounded pool). Two issues exist: the occupancy percentage in the rejection error can differ from the value that triggered the check due to separate lock acquisitions, and is_heavily_occupied(0) returns true even for unlimited pools, contradicting the treat-as-empty promise documented on occupancy_ratio. crates/blockchain/mempool.rs and crates/blockchain/blockchain.rs — both issues live in the new is_heavily_occupied helper and its call site in validate_transaction.
|
| Filename | Overview |
|---|---|
| crates/blockchain/blockchain.rs | Adds gap-admission gate in validate_transaction: extracts sender_acc_nonce, calls find_tx_to_replace, then rejects gapped-nonce txs under heavy pressure; TOCTOU between the admission check and the error-message occupancy read. |
| crates/blockchain/mempool.rs | Adds occupancy_ratio() and is_heavily_occupied() helpers plus unit tests; is_heavily_occupied(0) with unlimited pool silently breaks the treat-as-empty contract documented on occupancy_ratio. |
| crates/blockchain/error.rs | Adds GapAdmissionDeniedUnderPressure variant to MempoolError; straightforward addition. |
| cmd/ethrex/cli.rs | Adds --mempool.gap-admit-occupancy-threshold CLI flag with clap range validation (0-100) and env-var support; looks correct. |
| cmd/ethrex/initializers.rs | Threads gap_admit_occupancy_threshold from CLI options into BlockchainOptions for L1; correct plumbing. |
| cmd/ethrex/l2/initializers.rs | Threads gap_admit_occupancy_threshold into BlockchainOptions for L2 node; mirrors L1 plumbing correctly. |
| test/tests/blockchain/mempool_tests.rs | Adds five async integration tests covering rejection above threshold, acceptance below threshold, disabled-at-100, contiguous nonce, and replacement bypass; comprehensive and well-structured. |
| docs/CLI.md | Documents the new --mempool.gap-admit-occupancy-threshold flag for both L1 and L2 CLI sections. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[validate_transaction called] --> B[Fetch sender on-chain nonce]
B --> C{Account found?}
C -- No --> ERR1[Err: NotEnoughBalance]
C -- Yes --> D{nonce < on-chain nonce or nonce == u64::MAX?}
D -- Yes --> ERR2[Err: NonceTooLow]
D -- No --> E{Balance sufficient?}
E -- No --> ERR3[Err: NotEnoughBalance]
E -- Yes --> F[find_tx_to_replace sender+nonce+tx]
F --> G{Existing tx at same sender+nonce?}
G -- Yes higher fees --> H[tx_to_replace_hash = Some]
G -- No existing tx --> I[tx_to_replace_hash = None]
G -- Lower fees --> ERR4[Err: UnderpricedReplacement]
H --> J[Chain ID check]
I --> J
J --> K{tx_to_replace_hash is None AND nonce != on-chain nonce?}
K -- Gapped no replacement --> L{is_heavily_occupied threshold%?}
L -- Yes --> ERR5[Err: GapAdmissionDeniedUnderPressure]
L -- No below threshold --> OK[Ok tx_to_replace_hash]
K -- Replacement or contiguous nonce --> OK
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
crates/blockchain/blockchain.rs:2532-2541
**TOCTOU between admission check and error-message occupancy**
`is_heavily_occupied` acquires and releases the read lock internally (via `occupancy_ratio`), then the `if`-body calls `occupancy_ratio` a second time under a fresh lock. Between those two lock acquisitions the pool can drain or fill, so the `occupancy_pct` printed in the error can be lower than the threshold — e.g. "Mempool 88% full; rejecting gapped-nonce tx" when the pool was actually at 91% when the check fired. Operators using this message to diagnose admission behaviour will see inconsistent numbers. Consider capturing the ratio from inside `is_heavily_occupied` (return it alongside the bool, or compute both from a single lock hold) so that the same snapshot drives both the decision and the diagnostic.
### Issue 2 of 2
crates/blockchain/mempool.rs:398-404
**`is_heavily_occupied(0)` always returns `true` even for unlimited pools**
When `max_mempool_size == 0` (unlimited), `occupancy_ratio()` returns `0.0` — intentionally, per the doc-comment, to signal that pressure-gated rules should treat the pool as empty. However, `is_heavily_occupied(0)` evaluates `0.0 * 100.0 >= 0.0`, which is `true`, so all gapped transactions are rejected even when the pool has no capacity limit. The "treat as empty" contract documented on `occupancy_ratio` is silently broken for the edge case of threshold=0 combined with an unlimited pool. Adding a fast-path `if inner.max_mempool_size == 0 { return Ok(false); }` at the top of `is_heavily_occupied` would make the unlimited-pool guarantee explicit and consistent with the `occupancy_ratio` doc-comment.
Reviews (1): Last reviewed commit: "feat(l1): reject gapped-nonce txs under ..." | Re-trigger Greptile
| if tx_to_replace_hash.is_none() | ||
| && nonce != sender_acc_nonce | ||
| && self.mempool.is_heavily_occupied(threshold)? | ||
| { | ||
| let occupancy_pct = (self.mempool.occupancy_ratio()? * 100.0).round() as u8; | ||
| let nonce_gap = nonce.saturating_sub(sender_acc_nonce); | ||
| return Err(MempoolError::GapAdmissionDeniedUnderPressure { | ||
| occupancy_pct, | ||
| nonce_gap, | ||
| }); |
There was a problem hiding this comment.
TOCTOU between admission check and error-message occupancy
is_heavily_occupied acquires and releases the read lock internally (via occupancy_ratio), then the if-body calls occupancy_ratio a second time under a fresh lock. Between those two lock acquisitions the pool can drain or fill, so the occupancy_pct printed in the error can be lower than the threshold — e.g. "Mempool 88% full; rejecting gapped-nonce tx" when the pool was actually at 91% when the check fired. Operators using this message to diagnose admission behaviour will see inconsistent numbers. Consider capturing the ratio from inside is_heavily_occupied (return it alongside the bool, or compute both from a single lock hold) so that the same snapshot drives both the decision and the diagnostic.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 2532-2541
Comment:
**TOCTOU between admission check and error-message occupancy**
`is_heavily_occupied` acquires and releases the read lock internally (via `occupancy_ratio`), then the `if`-body calls `occupancy_ratio` a second time under a fresh lock. Between those two lock acquisitions the pool can drain or fill, so the `occupancy_pct` printed in the error can be lower than the threshold — e.g. "Mempool 88% full; rejecting gapped-nonce tx" when the pool was actually at 91% when the check fired. Operators using this message to diagnose admission behaviour will see inconsistent numbers. Consider capturing the ratio from inside `is_heavily_occupied` (return it alongside the bool, or compute both from a single lock hold) so that the same snapshot drives both the decision and the diagnostic.
How can I resolve this? If you propose a fix, please make it concise.| pub fn is_heavily_occupied(&self, threshold_pct: u8) -> Result<bool, MempoolError> { | ||
| if threshold_pct >= 100 { | ||
| return Ok(false); | ||
| } | ||
| let ratio = self.occupancy_ratio()?; | ||
| Ok(ratio * 100.0 >= threshold_pct as f64) | ||
| } |
There was a problem hiding this comment.
is_heavily_occupied(0) always returns true even for unlimited pools
When max_mempool_size == 0 (unlimited), occupancy_ratio() returns 0.0 — intentionally, per the doc-comment, to signal that pressure-gated rules should treat the pool as empty. However, is_heavily_occupied(0) evaluates 0.0 * 100.0 >= 0.0, which is true, so all gapped transactions are rejected even when the pool has no capacity limit. The "treat as empty" contract documented on occupancy_ratio is silently broken for the edge case of threshold=0 combined with an unlimited pool. Adding a fast-path if inner.max_mempool_size == 0 { return Ok(false); } at the top of is_heavily_occupied would make the unlimited-pool guarantee explicit and consistent with the occupancy_ratio doc-comment.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/mempool.rs
Line: 398-404
Comment:
**`is_heavily_occupied(0)` always returns `true` even for unlimited pools**
When `max_mempool_size == 0` (unlimited), `occupancy_ratio()` returns `0.0` — intentionally, per the doc-comment, to signal that pressure-gated rules should treat the pool as empty. However, `is_heavily_occupied(0)` evaluates `0.0 * 100.0 >= 0.0`, which is `true`, so all gapped transactions are rejected even when the pool has no capacity limit. The "treat as empty" contract documented on `occupancy_ratio` is silently broken for the edge case of threshold=0 combined with an unlimited pool. Adding a fast-path `if inner.max_mempool_size == 0 { return Ok(false); }` at the top of `is_heavily_occupied` would make the unlimited-pool guarantee explicit and consistent with the `occupancy_ratio` doc-comment.
How can I resolve this? If you propose a fix, please make it concise.
🤖 Codex Code Review
I couldn’t run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Phase 2 review (Copilot + greptile): - `is_heavily_occupied` used `ratio * 100.0 >= threshold_pct as f64`, which can misclassify near-boundary values due to f64 rounding. - `is_heavily_occupied(0)` on an unlimited pool (`max_mempool_size == 0`) evaluated `0.0 * 100.0 >= 0.0 = true`, so EVERY gapped-nonce tx would be rejected even though the pool has no capacity limit. - The gap-admission caller read occupancy twice (once via `is_heavily_occupied`, once for the error message), under separate read locks, allowing TOCTOU drift between the gate decision and the reported value. Single rewrite: - Replace `occupancy_ratio() -> f64` with `occupancy_pct() -> u8` and use integer arithmetic (`pool_len * 100 >= max * threshold_pct`) so threshold comparisons are deterministic. - Guard `max_mempool_size == 0` in `is_heavily_occupied` so unlimited pools never report heavy occupancy. - In `Blockchain::validate_transaction`, compute `occupancy_pct` once and reuse it for both the threshold comparison and the error message; gate the whole branch on `threshold < 100` so the disabled case short-circuits cleanly. Test names + assertions updated accordingly; added a new `is_heavily_occupied_on_unlimited_pool_is_false` test covering the zero-max guard at every threshold.
ElFantasma
left a comment
There was a problem hiding this comment.
The admission flow at blockchain.rs:2537-2546 is structurally identical to the cumulative-balance check I flagged in #6606 — both take a mempool read lock during validation, then the caller takes a write lock for insert. Two parallel submissions from the same sender could both pass the gate at the same occupancy snapshot, then both insert. Probably acceptable here since the gate fires under heavy load (so race timing is roughly favorable) and the worst case is two gapped txs admitted instead of one. Calling out for consistency with #6606's TOCTOU note — not duplicating the suggestion here.
| /// 0 on an unlimited pool (`max_mempool_size == 0`) also returns | ||
| /// `false`, treating "no cap" as "never under pressure" so gapped | ||
| /// admission isn't blanket-rejected when capacity is unbounded. | ||
| pub fn is_heavily_occupied(&self, threshold_pct: u8) -> Result<bool, MempoolError> { |
There was a problem hiding this comment.
is_heavily_occupied is exposed as a public API and has dedicated unit tests, but it's not called from anywhere outside its own tests. The admission gate at blockchain.rs:2537-2546 deliberately uses occupancy_pct() directly + inline >= threshold comparison (per the TOCTOU comment), which is the right call.
So this method's only consumer is its own tests. Two options:
- Drop it. Keep
occupancy_pct()and let callers do the comparison inline (matching what the admission gate already does). One fewer method to maintain. - Use it. Replace
let occupancy_pct = self.mempool.occupancy_pct()?; if occupancy_pct >= threshold { ... }withif self.mempool.is_heavily_occupied(threshold)? { ... }— but then you lose theoccupancy_pctvalue for the error message and would need to re-read (the TOCTOU concern the comment cites).
Leaning toward (1) — the inline comparison is cheaper and the percentage value is needed for the error message regardless. Not blocking.
Motivation
Even after the per-sender slot cap (#6603) lands, a flood of small senders submitting gapped-nonce transactions can pin pool budget — those nonces are unreachable until intermediate nonces land, so they sit forever while productive contiguous-nonce txs from other senders are evicted.
Reject gapped-nonce admissions when the mempool is heavily occupied; in steady-state (low occupancy) the pool tolerates gapped nonces, but under pressure it prefers contiguous, executable nonces.
Description
Mempool::occupancy_pct(&self) -> u8returns the pool occupancy as an integer percentage in[0, 100]. Returns 0 for unlimited pools (max_mempool_size == 0) so pressure-gated rules treat unbounded capacity as "never under pressure".Mempool::is_heavily_occupied(threshold_pct: u8) -> booluses integer arithmetic (pool_len * 100 >= max * threshold_pct) so threshold boundary comparisons are deterministic. A threshold of100disables the check; an unlimited pool short-circuits tofalseregardless of threshold.Blockchain::validate_transactionrejects incoming txs whose nonce isn't the on-chain next (gapped) when occupancy is at or above the configured threshold. Replacement txs (same(sender, nonce)as a pool entry) bypass the gate since they're not gapped.MempoolError::GapAdmissionDeniedUnderPressure { occupancy_pct, nonce_gap }error message — no TOCTOU between the decision and the reported value.--mempool.gap-admit-occupancy-threshold(envETHREX_MEMPOOL_GAP_ADMIT_OCCUPANCY_THRESHOLD, default 90). Setting it to 100 disables the gate.Behavioral change
When the pool is at or above the configured occupancy threshold, gapped-nonce admissions return
GapAdmissionDeniedUnderPressure. Operators can disable the gate by setting the threshold to 100.Deferred
Origin-aware exemption (skipping the gate for
TxOrigin::Local) waits for #6608's TxOrigin enum to merge.