perf(l1): two-CF receipts migration #6598
Conversation
Change the RECEIPTS key from RLP-encoded (BlockHash, u64) to raw block_hash (32 B) || index (8 B big-endian u64). This enables cursor-based prefix iteration by block hash, replacing the previous point-lookup loop.
- Add a receipt_key() helper for the new fixed-width key format
- Rewrite get_receipts_for_block_from_index to use prefix_iterator
- Add a v1→v2 migration (batch-processes old RLP keys, crash-safe)
- Bump STORE_SCHEMA_VERSION to 2
- Remove benchmark code from the previous iteration
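The fixed-width layout can be sketched as below. The helper name `receipt_key` comes from the PR; this exact signature is an assumption:

```rust
// Sketch of the fixed-width key layout: 32-byte block hash followed by an
// 8-byte big-endian index. The name `receipt_key` is from the PR; the
// signature is an assumption.
fn receipt_key(block_hash: &[u8; 32], index: u64) -> [u8; 40] {
    let mut key = [0u8; 40];
    key[..32].copy_from_slice(block_hash);
    // Big-endian so lexicographic key order matches numeric index order,
    // which is what makes prefix iteration return receipts in tx order.
    key[32..].copy_from_slice(&index.to_be_bytes());
    key
}

fn main() {
    let hash = [0xab; 32];
    let k1 = receipt_key(&hash, 1);
    let k10 = receipt_key(&hash, 10);
    assert_eq!(k1.len(), 40);
    // Byte-wise comparison agrees with numeric order (1 < 10).
    assert!(k1 < k10);
    println!("ok");
}
```

Because the index is big-endian, a cursor positioned at the 32-byte hash prefix walks receipts 0, 1, 2, … in order, which is what the point-lookup loop previously simulated with N separate `get` calls.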
Switch get_all_block_rpc_receipts and get_all_block_receipts from per-receipt point lookups to a single get_receipts_for_block() call, which uses prefix_iterator for cursor-based batch retrieval.
The cursor-based batch retrieval is slower for RPC because iterators bypass RocksDB block-cache optimizations that point lookups benefit from. So keep cursor iteration only for p2p (get_receipts_for_block) and restore per-receipt get() for the RPC handlers.
eth_getTransactionReceipt previously fetched ALL receipts for a block (N point lookups) just to return one. It now uses cursor iteration with a max_count limit to fetch only receipts 0..=index, stopping the cursor early. For a block with 200 txs and a target at index 10, this fetches 11 receipts instead of 200.
- Add a max_count parameter to get_receipts_for_block_from_index
- Add a target_index parameter to get_all_block_rpc_receipts
- eth_getTransactionReceipt passes Some(index) to stop early
- eth_getBlockReceipts passes None to fetch all
- get_all_block_receipts uses the cursor for raw receipt retrieval
Without a RocksDB prefix extractor, prefix_iterator_cf seeks to the correct position but doesn't stop at the prefix boundary. The loop was iterating through the entire remaining TRANSACTION_LOCATIONS table after finding the match, causing eth_getTransactionReceipt to take seconds instead of milliseconds.
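A minimal model of that boundary check, with a `BTreeMap` standing in for the sorted column family (the real code uses RocksDB's `prefix_iterator_cf`, which only seeks to the prefix and keeps walking past it unless a prefix extractor is configured):

```rust
use std::collections::BTreeMap;

// `range(prefix..)` mimics an iterator that has seeked to the prefix but
// does not stop at its boundary; the explicit `starts_with` break is the
// fix described above.
fn scan_prefix(db: &BTreeMap<Vec<u8>, u32>, prefix: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    for (k, v) in db.range(prefix.to_vec()..) {
        if !k.starts_with(prefix) {
            break; // without this, iteration continues into unrelated keys
        }
        out.push(*v);
    }
    out
}

fn main() {
    let mut db = BTreeMap::new();
    db.insert(b"aa-0".to_vec(), 1);
    db.insert(b"aa-1".to_vec(), 2);
    db.insert(b"zz-0".to_vec(), 3); // belongs to a different "block"
    // Without the break, the scan would also return 3.
    assert_eq!(scan_prefix(&db, b"aa-"), vec![1, 2]);
    println!("ok");
}
```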
- Skip RECEIPTS entries with unexpected key lengths instead of attempting to decode them - Add receipt count validation in get_all_block_rpc_receipts so missing receipts produce an error instead of a silent empty result - Add guard in eth_getTransactionReceipt for short receipt lists
- Remove redundant receipt length check in eth_getTransactionReceipt (already validated upstream in get_all_block_rpc_receipts) - Restructure migrate_1_to_2 to materialize old keys before writing, avoiding dependency on snapshot semantics during concurrent read/write - Add test for migrate_1_to_2 that seeds old RLP keys, runs migration, and verifies new fixed-width keys round-trip correctly
The materialize-first approach loaded all 153M old-format keys + values into a Vec (~13 GB) before re-keying. Replace it with a two-phase approach:
1. A cursor scan dumps only the old-format keys to a length-prefixed binary temp file in the DB directory, then closes the iterator immediately.
2. Keys are read back in batches of 10K; each batch does point lookups for values, writes new keys, and deletes old keys.

This avoids both the memory spike (only one 10K batch in memory at a time) and the snapshot-semantics concern (no concurrent read iterator and writes on the same column family). Crash safety is preserved: metadata stays at v1 until the migration completes, so an interrupted migration restarts from scratch. Point lookups for already-deleted keys return None and are skipped.
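The length-prefixed record format described in step 1 can be sketched roughly like this (all names are illustrative; the real code streams to a temp file in the DB directory rather than an in-memory buffer):

```rust
use std::io::Read;

// Each record: a little-endian u32 length, then the raw key bytes, so
// variable-length RLP keys can be streamed back without re-parsing RLP.
fn dump_keys(keys: &[Vec<u8>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for k in keys {
        buf.extend_from_slice(&(k.len() as u32).to_le_bytes());
        buf.extend_from_slice(k);
    }
    buf
}

// Read back at most `max` keys per call; EOF ends the stream cleanly,
// modeling the 10K-batch reads of step 2.
fn read_batch(mut src: impl Read, max: usize) -> std::io::Result<Vec<Vec<u8>>> {
    let mut batch = Vec::new();
    while batch.len() < max {
        let mut len_buf = [0u8; 4];
        match src.read_exact(&mut len_buf) {
            Ok(()) => {}
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let mut key = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        src.read_exact(&mut key)?;
        batch.push(key);
    }
    Ok(batch)
}

fn main() -> std::io::Result<()> {
    let keys = vec![b"short".to_vec(), b"a-longer-rlp-key".to_vec(), b"x".to_vec()];
    let file = dump_keys(&keys); // stands in for the on-disk temp file
    let mut cursor = std::io::Cursor::new(file);
    let first = read_batch(&mut cursor, 2)?;
    let second = read_batch(&mut cursor, 2)?;
    assert_eq!(first.len(), 2);
    assert_eq!(second, vec![b"x".to_vec()]);
    println!("ok");
    Ok(())
}
```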
Replace the two-phase temp-file migration with a single-pass cursor-held approach. The RocksDB iterator holds a snapshot so it sees a consistent pre-migration view; writes are accumulated in batches of 10K and flushed without closing the iterator. This avoids both the temp file on disk and materializing all keys in memory, while keeping the code simpler.
Replace the in-place cursor-held migration (re-key + per-key deletes on `receipts`) with a two-CF approach: copy entries from `receipts` to `receipts_v2` with fixed-width keys, then the old CF gets dropped on a subsequent startup.

Advantages over the cursor-held approach:
- No per-key delete tombstones (saves 150M deletes → less compaction)
- Old CF dropped atomically by RocksDB (drops SST files directly)
- One read pass + one write pass; no mixed reads/writes on the same CF

Benchmark (150M synthetic entries):
- Two-CF (this PR): 88 s, 4.6 GB peak RSS
- Cursor-held (prev): 487 s
- Temp-file dump: 221 s

Also adds seed_migration_test and bench_migration binaries for reproducible migration benchmarking.
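A toy model of the two-CF copy, with `BTreeMap`s standing in for the column families and a `(hash, index)` tuple standing in for the old RLP key (so the decode step is simulated, not real). The point it illustrates: one read-only pass over the old CF, writes flushed in fixed-size batches, and no deletes on the old CF:

```rust
use std::collections::BTreeMap;

// Copies every entry from the old CF into the new CF under a fixed-width
// 40-byte key, flushing accumulated writes in batches (the PR uses 10K).
// Returns the new CF and the number of batch flushes.
fn migrate(
    old_cf: &BTreeMap<([u8; 32], u64), Vec<u8>>,
    batch_size: usize,
) -> (BTreeMap<[u8; 40], Vec<u8>>, usize) {
    let mut new_cf = BTreeMap::new();
    let mut batch = Vec::with_capacity(batch_size);
    let mut flushes = 0;
    for (&(hash, index), value) in old_cf {
        let mut key = [0u8; 40];
        key[..32].copy_from_slice(&hash);
        key[32..].copy_from_slice(&index.to_be_bytes());
        batch.push((key, value.clone()));
        if batch.len() >= batch_size {
            new_cf.extend(batch.drain(..)); // models a write-batch flush
            flushes += 1;
        }
    }
    if !batch.is_empty() {
        new_cf.extend(batch.drain(..));
        flushes += 1;
    }
    // Note: the old CF is never touched with deletes — it is dropped
    // wholesale later, which is what avoids the 150M tombstones.
    (new_cf, flushes)
}

fn main() {
    let mut old_cf = BTreeMap::new();
    for i in 0..25u64 {
        old_cf.insert(([7u8; 32], i), vec![i as u8]);
    }
    let (new_cf, flushes) = migrate(&old_cf, 10);
    assert_eq!(new_cf.len(), 25);
    assert_eq!(flushes, 3); // 10 + 10 + 5
    println!("ok");
}
```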
…migration — merge commit; conflicts: CHANGELOG.md
🤖 Claude Code Review

PR #6598 review:
Greptile Summary

This PR replaces the in-place cursor-held receipts migration with a two-CF strategy: old RLP-keyed entries from `receipts` are copied into `receipts_v2` under fixed-width `block_hash || index` keys, with the old CF intended to be dropped by the startup cleanup.
Confidence Score: 3/5

The migration logic and RPC refactoring are sound, but the old `receipts` CF is never actually dropped: `RECEIPTS` is still listed in `TABLES`, so the startup cleanup never removes it and the old-format data is retained indefinitely.
| Filename | Overview |
|---|---|
| crates/storage/api/tables.rs | Adds RECEIPTS_V2 constant and doc comment; RECEIPTS incorrectly left in TABLES, preventing the old CF from ever being dropped by the cleanup logic. |
| crates/storage/migrations.rs | Adds migrate_1_to_2 that copies old RLP-keyed receipts to receipts_v2 in batches; crash-safe and idempotent, but doc-comment claims RECEIPTS is removed from TABLES when it is not. |
| crates/storage/backend/rocksdb.rs | Switches special CF options and compressible table list from RECEIPTS to RECEIPTS_V2; cleanup drop loop is correct but never fires for RECEIPTS because it is still in TABLES. |
| crates/storage/store.rs | All receipt read/write paths switched to RECEIPTS_V2 with new fixed-width key; get_receipts_for_block_from_index correctly moved to spawn_blocking but uses a linear prefix scan instead of a seek for start_index > 0. |
| crates/networking/rpc/eth/block.rs | Refactors get_all_block_rpc_receipts to accept an optional target_index and batch-fetch receipts via prefix iteration; receipt-count validation added; changes look correct. |
| crates/networking/rpc/eth/transaction.rs | Uses receipts.last() instead of receipts.get(index as usize) — functionally equivalent given the count check but fragile. |
| crates/storage/bench_migration.rs | New benchmark binary for manual migration timing; not part of the production path. |
| crates/storage/seed_migration_test.rs | New seeding binary that writes 150M synthetic old-format RLP receipt entries for benchmarking; not part of normal operation. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Node as ethrex Node
    participant RDB as RocksDBBackend::open()
    participant Mig as migrations::run_pending_migrations()
    participant OLD as receipts CF (v1)
    participant NEW as receipts_v2 CF (v2)
    Node->>RDB: open(path)
    RDB->>RDB: union(existing_cfs, TABLES) -> all_cfs_to_open
    Note over RDB: Both receipts and receipts_v2 opened
    RDB->>RDB: cleanup: drop CFs not in TABLES
    Note over RDB: receipts IS in TABLES, so NOT dropped
    Node->>Mig: run_pending_migrations(backend, path, v1)
    loop batches of 10,000
        Mig->>OLD: prefix_iterator: read entries
        OLD-->>Mig: (rlp_key, value)
        Mig->>Mig: decode (H256, u64), build fixed-width key
        Mig->>NEW: put_batch(new_key, value)
    end
    Mig->>Mig: write_metadata(version + 1)
    Mig-->>Node: Ok, schema_version=2
    Note over OLD: Still alive, never dropped
    Node->>Node: Normal operation: reads/writes to receipts_v2 only
```
Comments Outside Diff (1)
crates/storage/api/tables.rs, lines 110-131 (link)

**`RECEIPTS` left in `TABLES` — old CF is never dropped**

`RECEIPTS` is still in the `TABLES` array at line 120. The cleanup loop in `RocksDBBackend::open()` only drops column families whose names are not in `TABLES`:

```rust
if cf_name != "default" && !TABLES.contains(&cf_name.as_str()) { drop(cf) }
```

Because `"receipts"` remains in `TABLES`, this condition is always `false` and the old CF is silently kept alive across every restart — the 16 GB of old-format data on production will never be reclaimed. The doc comment on `RECEIPTS` ("dropped automatically on next startup") and the migration function comment ("since `RECEIPTS` is no longer listed in `TABLES`") both contradict the actual code.

The fix is to remove `RECEIPTS` from `TABLES`. The migration can still access the CF during migration because `RocksDBBackend::open()` unions `existing_cfs` with `TABLES` before opening, so a pre-migration database's `"receipts"` CF will still be opened from `existing_cfs`. On the startup after the successful migration (schema v2), `"receipts"` will be in `existing_cfs` but no longer in `TABLES`, and the cleanup loop will drop it.
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
crates/storage/api/tables.rs:110-131
**`RECEIPTS` left in `TABLES` — old CF is never dropped**
`RECEIPTS` is still in the `TABLES` array at line 120. The cleanup loop in `RocksDBBackend::open()` only drops column families whose names are **not** in `TABLES`:
```
if cf_name != "default" && !TABLES.contains(&cf_name.as_str()) { drop(cf) }
```
Because `"receipts"` remains in `TABLES`, this condition is always `false` and the old CF is silently kept alive across every restart — the 16 GB of old-format data on production will never be reclaimed. The doc-comment on `RECEIPTS` ("dropped automatically on next startup") and the migration function comment ("since `RECEIPTS` is no longer listed in `TABLES`") both contradict the actual code.
The fix is to remove `RECEIPTS` from `TABLES`. The migration can still access the CF during migration because `RocksDBBackend::open()` unions `existing_cfs` with `TABLES` before opening, so a pre-migration database's `"receipts"` CF will still be opened from `existing_cfs`. On the startup after the successful migration (schema v2), `"receipts"` will be in `existing_cfs` but no longer in `TABLES`, and the cleanup loop will drop it.
### Issue 2 of 3
crates/networking/rpc/eth/transaction.rs:310
`receipts.last()` is correct only as long as `fetch_count == index + 1` and the count check above enforces that exact length. Using `receipts.get(index as usize)` is an explicit, index-stable access that doesn't silently return the wrong receipt if the count logic is ever adjusted.
```suggestion
serde_json::to_value(receipts.get(index as usize)).map_err(|error| RpcErr::Internal(error.to_string()))
```
### Issue 3 of 3
crates/storage/store.rs:1098-1136
**EIP-7975 `start_index` performs a full linear scan from index 0**
For the EIP-7975 partial-receipt path, `start_index` can be a large offset into the block. The current implementation issues a `prefix_iterator` from the beginning of the block-hash prefix and then iterates + skips every entry with `idx < start_index`. Because keys are stored as big-endian u64, they are lexicographically sorted, so a `seek` directly to the composite key `block_hash || start_index.to_be_bytes()` would jump straight to the target. With the current linear scan, a request for receipts starting at index N always reads the first N entries before returning any results.
Reviews (1): Last reviewed commit: "fix: update stale Cargo.lock files after..."
```diff
-serde_json::to_value(receipts.get(index as usize))
-    .map_err(|error| RpcErr::Internal(error.to_string()))
+serde_json::to_value(receipts.last()).map_err(|error| RpcErr::Internal(error.to_string()))
```
receipts.last() is correct only as long as fetch_count == index + 1 and the count check above enforces that exact length. Using receipts.get(index as usize) is an explicit, index-stable access that doesn't silently return the wrong receipt if the count logic is ever adjusted.
```suggestion
serde_json::to_value(receipts.get(index as usize)).map_err(|error| RpcErr::Internal(error.to_string()))
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/rpc/eth/transaction.rs
Line: 310
Comment:
`receipts.last()` is correct only as long as `fetch_count == index + 1` and the count check above enforces that exact length. Using `receipts.get(index as usize)` is an explicit, index-stable access that doesn't silently return the wrong receipt if the count logic is ever adjusted.
```suggestion
serde_json::to_value(receipts.get(index as usize)).map_err(|error| RpcErr::Internal(error.to_string()))
```
crates/storage/store.rs — `get_receipts_for_block_from_index` (diff excerpt):

```diff
-        let txn = self.backend.begin_read()?;
-        loop {
-            let key = (*block_hash, index).encode_to_vec();
-            match txn.get(RECEIPTS, key.as_slice())? {
-                Some(receipt_bytes) => {
-                    let receipt = Receipt::decode(receipt_bytes.as_slice())?;
-                    receipts.push(receipt);
-                    index += 1;
-                }
-                None => break,
-            }
-        }
-        Ok(receipts)
+        let block_hash = *block_hash;
+        tokio::task::spawn_blocking(move || {
+            let txn = backend.begin_read()?;
+            let prefix = block_hash.as_bytes().to_vec();
+            let iter = txn.prefix_iterator(RECEIPTS_V2, &prefix)?;
+            let mut receipts = Vec::new();
+            for result in iter {
+                let (k, v) = result?;
+                if !k.starts_with(&prefix) {
+                    break;
+                }
+                if k.len() != 40 {
+                    continue;
+                }
+                // Skip entries before start_index (for eth/70 partial requests)
+                if start_index > 0 {
+                    let idx_bytes: [u8; 8] = k[32..40]
+                        .try_into()
+                        .expect("slice is exactly 8 bytes (checked k.len() == 40)");
+                    let idx = u64::from_be_bytes(idx_bytes);
+                    if idx < start_index {
+                        continue;
+                    }
+                }
+                receipts.push(Receipt::decode(v.as_ref())?);
+                if let Some(max) = max_count
+                    && receipts.len() >= max
+                {
+                    break;
+                }
+            }
+            Ok(receipts)
+        })
+        .await
+        .map_err(|e| StoreError::Custom(format!("Task panicked: {e}")))?
+    }

         // Snap State methods
```
**EIP-7975 `start_index` performs a full linear scan from index 0**
For the EIP-7975 partial-receipt path, start_index can be a large offset into the block. The current implementation issues a prefix_iterator from the beginning of the block-hash prefix and then iterates + skips every entry with idx < start_index. Because keys are stored as big-endian u64, they are lexicographically sorted, so a seek directly to the composite key block_hash || start_index.to_be_bytes() would jump straight to the target. With the current linear scan, a request for receipts starting at index N always reads the first N entries before returning any results.
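A minimal model of the suggested seek, using `BTreeMap::range` to stand in for a RocksDB seek to the composite key `block_hash || start_index.to_be_bytes()`:

```rust
use std::collections::BTreeMap;

// Seek directly to the composite (hash, start_index) key instead of
// iterating from the start of the prefix and skipping. Because the index
// is big-endian, the seek lands exactly on the first wanted entry.
fn receipts_from(
    cf: &BTreeMap<Vec<u8>, u32>,
    block_hash: &[u8; 32],
    start_index: u64,
) -> Vec<u32> {
    let mut seek_key = block_hash.to_vec();
    seek_key.extend_from_slice(&start_index.to_be_bytes());
    cf.range(seek_key..)
        .take_while(|(k, _)| k.starts_with(block_hash.as_slice()))
        .map(|(_, v)| *v)
        .collect()
}

fn main() {
    let mut cf = BTreeMap::new();
    let hash = [1u8; 32];
    for i in 0..5u64 {
        let mut k = hash.to_vec();
        k.extend_from_slice(&i.to_be_bytes());
        cf.insert(k, i as u32);
    }
    // Seeking to index 3 never visits entries 0..=2.
    assert_eq!(receipts_from(&cf, &hash, 3), vec![3, 4]);
    println!("ok");
}
```

With the real store, the equivalent would be seeking the iterator to the 40-byte composite key rather than the 32-byte hash prefix; the entry count read from disk then scales with the receipts returned, not with `start_index`.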
Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed results (all unchanged): BubbleSort, ERC20Approval, ERC20Mint, ERC20Transfer, Factorial, FactorialRecursive, Fibonacci, FibonacciRecursive, ManyHashes, MstoreBench, Push, SstoreBench_no_opt.
```diff
     PENDING_BLOCKS,
     TRANSACTION_LOCATIONS,
     RECEIPTS,
+    RECEIPTS_V2,
```
The old RECEIPTS CF won't auto-drop. The migration doc (migrations.rs:104-106) and the PR description both say:
"the old
receiptsCF is not deleted here — it will be dropped automatically by the auto-cleanup inRocksDBBackend::open()on the next startup (sinceRECEIPTSis no longer listed inTABLES)."
But RECEIPTS IS still in TABLES (line 120 above this addition). The auto-cleanup at backend/rocksdb.rs only drops CFs whose name is NOT in TABLES:
```rust
for cf_name in &existing_cfs {
    if cf_name != "default" && !TABLES.contains(&cf_name.as_str()) {
        warn!("Dropping obsolete column family: {}", cf_name);
        let _ = db.drop_cf(cf_name) ...
    }
}
```

So with both RECEIPTS and RECEIPTS_V2 in TABLES, the old CF survives forever. After migration both CFs hold the same data — on srv1's 16 GB receipts CF, that's 16 GB of duplicate state that never goes away.
The test at migrations.rs:250-254 confirms the intent ("Old keys should still be in RECEIPTS (dropped at startup)") but doesn't actually verify the drop happens.
Fix: remove `RECEIPTS,` from the TABLES array (line 120). The migration code still references it via tables::RECEIPTS for read-only access, which is fine — prefix_iterator against a CF name that has been opened (via the auto-create path) works regardless of whether the constant is in TABLES. The auto-cleanup loop will then see "receipts" is unlisted and call drop_cf on the next startup.
Worth a quick test of this scenario before merge — e.g., a unit test that runs migrate + reopens the backend + asserts receipts CF is gone.
Summary
Replaces the in-place cursor-held migration (re-key + per-key deletes on `receipts`) with a two-CF approach: copy entries from `receipts` to `receipts_v2` with fixed-width keys, then the old CF gets dropped on a subsequent startup.

Benchmark (150M synthetic entries): two-CF (this PR) completes in 88 s at 4.6 GB peak RSS, vs 487 s for cursor-held and 221 s for the temp-file dump.
Files changed
- `tables.rs` — Add `RECEIPTS_V2` constant, keep `RECEIPTS` for migration reads
- `rocksdb.rs` — Switch CF options to `RECEIPTS_V2`
- `migrations.rs` — Rewrite `migrate_1_to_2` to copy to the new CF (no deletes)
- `store.rs` — All receipt read/write ops use `RECEIPTS_V2`
- `bench_migration.rs` and `seed_migration_test.rs` binaries for benchmarking

Test plan
- `cargo test -p ethrex-storage --features rocksdb -- migrations` passes
- `cargo clippy -p ethrex-storage --features rocksdb` clean
- `cargo check` (full workspace) passes