fix: optimize `Block::from_filecoin_tipset` #7062
Conversation
Walkthrough

Refactors the LRU cache key trait and constructors, updates many call sites to pass cache identifiers as string literals, generalizes TipsetStateCache into ForestLruCache<K, V> and wires it into StateManager, and reworks Ethereum RPC block construction to cache and return Arc<Block> via ForestLruCache.

Changes: LRU Cache Infrastructure and Ethereum Block Caching
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/state_manager/cache.rs (1)
113-121: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clean up `pending` on compute failure.

When `compute().await` returns `Err`, the key remains in `pending` because cleanup only happens in `insert`/`remove`. This can cause unbounded growth of `pending` for repeated failures across unique keys.

Proposed fix
```diff
-        let value = compute().await?;
-        // Write back to cache, release lock and return value
-        self.insert(key.clone(), value.clone());
-        Ok(value)
+        match compute().await {
+            Ok(value) => {
+                // Write back to cache, release lock and return value
+                self.insert(key.clone(), value.clone());
+                Ok(value)
+            }
+            Err(err) => {
+                // Avoid leaking failed keys in pending map
+                self.cache.write().pending.remove(key);
+                Err(err)
+            }
+        }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/state_manager/cache.rs` around lines 113 - 121, The failure path currently leaves the key in pending when compute().await returns Err, causing pending to grow; modify the None branch so you capture the result of compute().await into a variable, then match on it: on Ok(value) call self.insert(key.clone(), value.clone()) and return Ok(value), and on Err(err) call self.remove(&key) (or otherwise remove the key from pending) before returning Err(err); ensure pending cleanup happens on both success and failure paths and reference the existing pending, compute(), insert, and remove symbols to locate and update the code.
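To make the invariant concrete, here is a minimal, self-contained sketch of the fix. The `SketchCache` type is hypothetical (the real `ForestLruCache` is async and lock-protected), but the contract is the same: the key must leave `pending` on both the `Ok` and `Err` paths of the compute closure.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the cache discussed above.
struct SketchCache {
    cache: HashMap<String, String>,
    pending: HashSet<String>,
}

impl SketchCache {
    fn new() -> Self {
        Self { cache: HashMap::new(), pending: HashSet::new() }
    }

    fn get_or_compute(
        &mut self,
        key: &str,
        compute: impl FnOnce() -> Result<String, String>,
    ) -> Result<String, String> {
        if let Some(v) = self.cache.get(key) {
            return Ok(v.clone());
        }
        self.pending.insert(key.to_string());
        match compute() {
            Ok(value) => {
                // Success: clear the pending marker and write back to the cache.
                self.pending.remove(key);
                self.cache.insert(key.to_string(), value.clone());
                Ok(value)
            }
            Err(err) => {
                // Failure: also clear the marker, otherwise repeated failures
                // across unique keys grow `pending` without bound.
                self.pending.remove(key);
                Err(err)
            }
        }
    }
}
```

With the `?` operator (the pre-fix code), the `Err` arm never runs, which is exactly the leak the review describes.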
🧹 Nitpick comments (1)
src/state_manager/cache.rs (1)
97-99: ⚡ Quick win

Use a cache-specific metric label instead of `STATE_MANAGER_TIPSET`.

`ForestLruCache` is now generic, but hit/miss metrics are still hardcoded to a tipset-specific label. This makes metrics inaccurate when the cache is reused for other key/value domains.

Also applies to: 107-109, 115-117
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/state_manager/cache.rs` around lines 97 - 99, The LRU cache metrics are using the hardcoded label STATE_MANAGER_TIPSET which makes ForestLruCache's hit/miss counters inaccurate when reused; update ForestLruCache to accept or derive a cache-specific metric label (e.g., add a label field or generic parameter on ForestLruCache) and replace the direct uses of crate::metrics::values::STATE_MANAGER_TIPSET in the crate::metrics::LRU_CACHE_HIT and crate::metrics::LRU_CACHE_MISS calls (locations around the inc() calls at the hit/miss sites) to use that cache-specific label instead, ensuring the constructor or factory for ForestLruCache sets the appropriate label for each cache instance.
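One way to carry a per-cache label is to capture it in the constructor and use it at the hit/miss sites. The sketch below is hypothetical: it uses plain counter fields and a formatted metric line in place of Forest's actual Prometheus metrics, but it shows the shape of the change.

```rust
// Hypothetical sketch: the constructor captures a per-cache label once, so the
// hit/miss sites report `self.label` instead of a hardcoded STATE_MANAGER_TIPSET.
struct LabeledLruCache {
    label: &'static str,
    hits: u64,
    misses: u64,
}

impl LabeledLruCache {
    fn with_label(label: &'static str) -> Self {
        Self { label, hits: 0, misses: 0 }
    }

    fn record_hit(&mut self) {
        // In Forest this would increment the real LRU hit counter keyed by
        // `self.label`; here we just count locally.
        self.hits += 1;
    }

    fn record_miss(&mut self) {
        self.misses += 1;
    }

    fn hit_metric_line(&self) -> String {
        format!("lru_cache_hit{{kind=\"{}\"}} {}", self.label, self.hits)
    }
}
```

Each cache instance (tipset state, eth blocks, etc.) then constructs with its own label, and the shared hit/miss code needs no per-domain special casing.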
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1580e29f-62bf-49f9-9e75-85cf4bac19ab
📒 Files selected for processing (16)
- src/beacon/drand.rs
- src/chain/store/chain_store.rs
- src/chain/store/index.rs
- src/chain_sync/bad_block_cache.rs
- src/db/blockstore_with_read_cache.rs
- src/db/car/mod.rs
- src/message_pool/msgpool/msg_pool.rs
- src/rpc/methods/chain.rs
- src/rpc/methods/eth.rs
- src/rpc/methods/eth/types.rs
- src/rpc/methods/f3.rs
- src/state_manager/cache.rs
- src/state_manager/mod.rs
- src/state_migration/common/mod.rs
- src/utils/cache/lru.rs
- src/utils/cache/mod.rs
Codecov Report

❌ Patch coverage is
Additional details and impacted files
... and 8 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
040c058 to d1531d8
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/rpc/methods/eth.rs (1)
486-499: ⚡ Quick win

Add a regression test for full/hash cache isolation.

The non-obvious contract here is that a hash-only request must never mutate the cached full-tx block. A focused test that fetches `TxInfo::Full` and then `TxInfo::Hash` for the same tipset, and asserts the original full block still contains `Transactions::Full`, would lock this in.

Also applies to: 593-598
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rpc/methods/eth.rs` around lines 486 - 499, Add a regression test that verifies hash-only requests don't mutate cached full-tx blocks: call the existing method that uses TxInfo::Full (which triggers from_filecoin_tipset_with_full_tx and caches a full block), then call the path that uses TxInfo::Hash (which uses ETH_BLOCK_HASH_TX_CACHE and downcast_full_transaction_to_hash), and finally assert the originally returned full block still contains Transactions::Full (i.e., its transaction entries remain full, not converted to hashes). Repeat the same pattern for the other code path referenced (the similar TxInfo handling around the other cache usage) to ensure both cache branches preserve full/hash isolation.
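The core assertion of such a test can be sketched in isolation. The types below are hypothetical stand-ins for eth.rs's `Block`/`Transactions` (the real test would exercise the RPC paths named above), but they capture the contract: downcasting to hashes must build a new block, never mutate the `Arc` held by the cache.

```rust
use std::sync::Arc;

// Hypothetical simplified stand-ins for the real Block/Transactions types.
#[derive(Clone, Debug, PartialEq)]
enum Transactions {
    Full(Vec<String>),
    Hash(Vec<String>),
}

#[derive(Clone, Debug, PartialEq)]
struct Block {
    transactions: Transactions,
}

// The downcast returns a fresh Block; the cached Arc<Block> is read-only here.
fn downcast_full_transaction_to_hash(cached: &Arc<Block>) -> Block {
    match &cached.transactions {
        Transactions::Full(txs) => Block {
            transactions: Transactions::Hash(
                txs.iter().map(|tx| format!("hash-of-{tx}")).collect(),
            ),
        },
        other => Block { transactions: other.clone() },
    }
}
```

Because the cache hands out `Arc<Block>`, any accidental in-place conversion would be observable by every other holder of that `Arc`, which is exactly what the regression test should rule out.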
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 220b1b85-e462-4128-bb0f-96edb4532652
📒 Files selected for processing (16)
- src/beacon/drand.rs
- src/chain/store/chain_store.rs
- src/chain/store/index.rs
- src/chain_sync/bad_block_cache.rs
- src/db/blockstore_with_read_cache.rs
- src/db/car/mod.rs
- src/message_pool/msgpool/msg_pool.rs
- src/rpc/methods/chain.rs
- src/rpc/methods/eth.rs
- src/rpc/methods/eth/types.rs
- src/rpc/methods/f3.rs
- src/state_manager/cache.rs
- src/state_manager/mod.rs
- src/state_migration/common/mod.rs
- src/utils/cache/lru.rs
- src/utils/cache/mod.rs
✅ Files skipped from review due to trivial changes (6)
- src/utils/cache/mod.rs
- src/chain_sync/bad_block_cache.rs
- src/db/car/mod.rs
- src/message_pool/msgpool/msg_pool.rs
- src/chain/store/chain_store.rs
- src/rpc/methods/chain.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- src/rpc/methods/f3.rs
- src/beacon/drand.rs
- src/rpc/methods/eth/types.rs
- src/state_manager/mod.rs
- src/utils/cache/lru.rs
- src/state_manager/cache.rs
```diff
 static ETH_BLOCK_HASH_TX_CACHE: LazyLock<ForestLruCache<CidWrapper, Arc<Block>>> =
     LazyLock::new(|| {
-        const DEFAULT_CACHE_SIZE: NonZeroUsize = nonzero!(500usize);
-        let cache_size = std::env::var("FOREST_ETH_BLOCK_CACHE_SIZE")
-            .ok()
-            .and_then(|s| s.parse().ok())
-            .unwrap_or(DEFAULT_CACHE_SIZE);
-        SizeTrackingLruCache::new_with_metrics("eth_block".into(), cache_size)
+        ForestLruCache::with_size("eth_block_hash_tx", Block::block_cache_size())
     });
```
FOREST_ETH_BLOCK_CACHE_SIZE now budgets two independent caches.
Both eth_block_hash_tx and eth_block_full_tx are initialized with Block::block_cache_size(), so a setting of N now allows up to 2N cached blocks. On this path that can materially increase resident memory compared to the previous single-cache budget.
Please either split the existing budget across the two caches or introduce separate knobs with explicit names.
Also applies to: 508-510, 582-590
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/rpc/methods/eth.rs` around lines 481 - 484, The two caches
ETH_BLOCK_HASH_TX_CACHE and ETH_BLOCK_FULL_TX_CACHE are both using
Block::block_cache_size(), doubling the intended budget; update initialization
to either split the existing FOREST_ETH_BLOCK_CACHE_SIZE between them (e.g., use
Block::block_cache_size() / 2 for each) or introduce separate configuration
knobs (e.g., FOREST_ETH_BLOCK_HASH_CACHE_SIZE and
FOREST_ETH_BLOCK_FULL_CACHE_SIZE) and use those values when constructing
ForestLruCache in ETH_BLOCK_HASH_TX_CACHE and ETH_BLOCK_FULL_TX_CACHE (and the
other similar cache inits around the same file) so the total memory matches the
intended budget and each cache uses the correct size parameter.
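Under the first option (splitting the single knob), the sizing logic could look like the sketch below. The env-var parsing mirrors the removed code shown above, but the helper names and the halving logic are illustrative, not Forest's actual implementation.

```rust
use std::num::NonZeroUsize;

// Read FOREST_ETH_BLOCK_CACHE_SIZE (defaulting to 500, as in the removed code)
// so the combined budget of the two caches stays at the configured N.
fn eth_block_cache_budget() -> NonZeroUsize {
    const DEFAULT_CACHE_SIZE: usize = 500;
    let n = std::env::var("FOREST_ETH_BLOCK_CACHE_SIZE")
        .ok()
        .and_then(|s| s.parse::<usize>().ok())
        .unwrap_or(DEFAULT_CACHE_SIZE);
    NonZeroUsize::new(n.max(1)).expect("max(1) guarantees non-zero")
}

// Split the budget across the hash-tx and full-tx caches, clamped to 1 so
// tiny budgets still yield valid NonZeroUsize capacities.
fn split_budget(total: NonZeroUsize) -> (NonZeroUsize, NonZeroUsize) {
    let half = NonZeroUsize::new((total.get() / 2).max(1)).expect("non-zero");
    (half, half)
}
```

The two statics would then each be constructed with their half, keeping resident memory at roughly the pre-refactor level; the alternative (separate knobs per cache) trades that simplicity for finer operator control.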
```diff
 static ETH_BLOCK_HASH_TX_CACHE: LazyLock<ForestLruCache<CidWrapper, Arc<Block>>> =
     LazyLock::new(|| {
-        const DEFAULT_CACHE_SIZE: NonZeroUsize = nonzero!(500usize);
-        let cache_size = std::env::var("FOREST_ETH_BLOCK_CACHE_SIZE")
```
this must be removed from the docs, no?
Ah no, I see it's now used for full and hash.
LesnyRumcajs left a comment:
Looks good, let's have some numbers to confirm the optimization.
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Outside contributions
Summary by CodeRabbit
New Features
Refactor