[seekdb][alloc] Optimize chunk cache idle eviction#925

Open
hnwyllmm wants to merge 1 commit into master from task/2026062200116871942

@hnwyllmm

Member

Task Description

This change aims to address the trade-off between memory usage and performance when using memory_chunk_cache_size. Setting it to 0M keeps the AChunkMgr cache unlimited, which maintains good sysbench performance but can lead to high idle memory after a workload. Setting it to a small value (e.g., 2M) bounds the cache but causes a sharp performance drop. The goal is to retain useful chunk cache during active workloads while opportunistically releasing idle cached chunks after they have not been reused for a period, without introducing new timer threads or configuration items.

Solution Description

This MR implements an opportunistic idle eviction mechanism within AChunkMgr:

  • Adds AChunk::cache_ts_ to record the timestamp when a chunk enters the AChunkMgr cache.
  • Adds AChunkList::pop_expired to remove one expired cached chunk from a slot.
  • In AChunkMgr::free_chunk, after caching the returned chunk as usual, the method attempts to evict one expired chunk from a normal slot.
  • Uses a fixed 30-second TTL (CACHE_EXPIRE_US) and a rotating cursor over normal slots for eviction attempts.
  • Uses evicting_ to prevent concurrent eviction scans.
  • Adds expired_unmaps_ to the chunk manager's diagnostic dump output.
  • Includes unit test coverage for the expired chunk eviction logic.
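
The steps above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this MR: `Chunk`, `ChunkSlot`, `ChunkCache`, `kCacheExpireUs`, and all method signatures are hypothetical stand-ins, timestamps are passed in explicitly rather than read from a clock, and it assumes each slot is ordered oldest-first so that only the front chunk needs checking.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// 30-second TTL in microseconds, mirroring the CACHE_EXPIRE_US value described above.
constexpr int64_t kCacheExpireUs = 30LL * 1000 * 1000;

// Hypothetical stand-in for AChunk: only the cache timestamp matters here.
struct Chunk {
  int id;
  int64_t cache_ts_us;  // when the chunk entered the cache
};

// One cache slot. Chunks are appended at the back, so the front is the oldest.
class ChunkSlot {
public:
  void push(Chunk c, int64_t now_us) {
    c.cache_ts_us = now_us;
    std::lock_guard<std::mutex> guard(mu_);
    chunks_.push_back(c);
  }
  // Removes at most one chunk cached for at least ttl_us; returns true on success.
  bool pop_expired(int64_t now_us, int64_t ttl_us, Chunk &out) {
    std::lock_guard<std::mutex> guard(mu_);
    if (chunks_.empty() || now_us - chunks_.front().cache_ts_us < ttl_us) {
      return false;
    }
    out = chunks_.front();
    chunks_.pop_front();
    return true;
  }
  size_t size() const {
    std::lock_guard<std::mutex> guard(mu_);
    return chunks_.size();
  }
private:
  mutable std::mutex mu_;
  std::deque<Chunk> chunks_;
};

class ChunkCache {
public:
  explicit ChunkCache(size_t slot_count)
      : slots_(slot_count), cursor_(0), evicting_(false), expired_unmaps_(0) {
    assert(slot_count > 0);
  }
  // Caches the chunk first, then tries to evict one expired chunk from the
  // slot under the rotating cursor. Returns true and fills `evicted` when a
  // chunk was evicted (a real caller would unmap it at that point).
  bool free_chunk(size_t slot, Chunk c, int64_t now_us, Chunk &evicted) {
    slots_[slot % slots_.size()].push(c, now_us);
    return try_evict_one(now_us, evicted);
  }
  size_t cached(size_t slot) const { return slots_[slot % slots_.size()].size(); }
  int64_t expired_unmaps() const { return expired_unmaps_.load(); }
private:
  bool try_evict_one(int64_t now_us, Chunk &evicted) {
    bool expected = false;
    // Only one thread scans at a time; others skip eviction instead of waiting.
    if (!evicting_.compare_exchange_strong(expected, true)) {
      return false;
    }
    const size_t idx = cursor_.fetch_add(1) % slots_.size();
    const bool found = slots_[idx].pop_expired(now_us, kCacheExpireUs, evicted);
    if (found) {
      expired_unmaps_.fetch_add(1);
    }
    evicting_.store(false);
    return found;
  }
  std::vector<ChunkSlot> slots_;
  std::atomic<size_t> cursor_;
  std::atomic<bool> evicting_;
  std::atomic<int64_t> expired_unmaps_;
};
```

In this sketch each call to `free_chunk` inspects exactly one slot, so the extra cost on the free path stays bounded, and a thread that loses the `evicting_` race simply skips eviction. The trade-off is that expired chunks are released only while chunks are still being freed; a fully idle process triggers no eviction.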

This branch also includes a change in context.h that forces use_pm=false when creating an ObAllocator. Test data indicates this change slightly improves QPS in one run but significantly increases RSS. This specific change requires careful review.

Passed Regressions

  • Unit test: ./build_release/deps/oblib/unittest/lib/test_chunk_mgr - All 10 tests passed.
  • Build: cmake --build build_release --target seekdb -j 8 - Passed.
  • Sysbench Performance:
    • Parent commit (242683f, memory_chunk_cache_size=0M): QPS/TPS: 77,062.28/s, avg latency: 0.21 ms, p95 latency: 0.55 ms, errors: 0.
    • Current commit (87fd9fb, memory_chunk_cache_size=0M): QPS/TPS: 105,154.90/s, errors: 0.
    • Current commit with context.h use_pm=false: QPS/TPS: 138,616.99/s, avg latency: 0.11 ms, p95 latency: 0.17 ms, errors: 0.
    • Variant without use_pm=false (only idle eviction): QPS/TPS: 133,008.36/s, avg latency: 0.12 ms, p95 latency: 0.18 ms, errors: 0.

Upgrade Compatibility

  • No on-disk format changes.
  • No new configuration items introduced.
  • Runtime behavior changes affect allocator caching and chunk manager diagnostics.
  • The context.h use_pm=false change can broadly affect memory behavior and requires careful review.

Other Information

MR Commit: a single commit, 87fd9fbfde0 ("Optimize chunk cache idle eviction").

Changed Files:

  • deps/oblib/src/lib/alloc/alloc_struct.h
  • deps/oblib/src/lib/rc/context.h
  • deps/oblib/src/lib/resource/achunk_mgr.cpp
  • deps/oblib/src/lib/resource/achunk_mgr.h
  • deps/oblib/unittest/lib/alloc/test_chunk_mgr.cpp

Conclusion from Tests: The optimization's effect is currently limited. Observations show very few fully-free chunks in the AChunkMgr freelists. Even if all cached fully-free chunks were released, the recoverable upper bound was only about 20-30 MB in the observed idle state. Most RSS is likely held by non-empty chunks, live objects, fragmentation inside ObjectSet/BlockSet, or other contexts/modules, which AChunkMgr-only eviction cannot reclaim.

Release Note

Optimizes AChunkMgr to opportunistically evict cached memory chunks that have been idle for a fixed period (30 seconds), without requiring new configuration items or timer threads. The aim is to reduce the idle memory retained after a workload while maintaining performance. Also adds expired_unmaps_ to the chunk manager's diagnostic dump. Note: a related change in context.h forces use_pm=false for ObAllocator, which may affect memory usage and requires review.

@hnwyllmm

Member Author

The corresponding Dima issue is about optimizing the reserved chunk cache size.
