Skip to content

fix(tool): surface MessagePool service task errors in api_cmd test ctx#6966

Open
0xDevNinja wants to merge 1 commit into
ChainSafe:mainfrom
0xDevNinja:0xdevninja/issue-6505-joinset-errors
Open

fix(tool): surface MessagePool service task errors in api_cmd test ctx#6966
0xDevNinja wants to merge 1 commit into
ChainSafe:mainfrom
0xDevNinja:0xdevninja/issue-6505-joinset-errors

Conversation

@0xDevNinja
Copy link
Copy Markdown

@0xDevNinja 0xDevNinja commented Apr 24, 2026

Summary of changes

Changes introduced in this pull request:

  • Own the JoinSet in test_snapshot::ctx and generate_test_snapshot::ctx instead of passing &mut JoinSet::new() into MessagePool::new. The inline temporary was dropped on return, aborting every spawned mpool service task and silently swallowing any errors they emitted pre-cancellation.
  • Move the owned JoinSet into a detached tokio::spawn(drain_mpool_services(...)) that drains it with the canonical while let Some(result) = services.join_next().await loop from daemon/mod.rs::propagate_error, logging errors and non-cancel panics via tracing::warn!.
  • Add a shared drain_mpool_services helper in test_snapshot.rs (visibility pub(super)) and call it from both ctx() sites.

Detaching (vs. awaiting inline) is deliberate: the mpool services are long-lived, so an inline while let Some(...) would never return — that is precisely what caused codecov CI to time out and led to the earlier attempt being reverted from #6493. A detached drainer surfaces errors without blocking the caller.

Reference issue to close (if applicable)

Closes #6505

Other information and links

  • Canonical drain pattern: src/daemon/mod.rs:842-851 (propagate_error).
  • MessagePool::new signature confirmed at src/message_pool/msgpool/msg_pool.rs:~721 — takes services: &mut JoinSet<anyhow::Result<()>>, which matches the generic used here.
  • Prior related PR (different file, same pattern): test: unit test for start_rpc #6493.
  • No CHANGELOG.md entry — this is an internal test utility change and is not user-facing per the changelog guide at the top of that file.
  • No new test: the fix is observability-only (previously-silent warnings now surface via tracing); asserting on log output would be flaky. Existing rpc_regression_tests_print_uncovered passes locally; the heavier regression harness defers to CI.

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

AI Usage Disclosure

This PR was prepared with assistance from Claude Code (Anthropic). Extent:

  • Issue analysis, implementation, self-review, and PR drafting were AI-assisted.
  • All changes were compiled and tested locally by me before pushing — cargo fmt --all -- --check, cargo clippy --all-targets --no-deps -- -D warnings (with FOREST_F3_SIDECAR_FFI_BUILD_OPT_OUT=1), cargo check --lib --tests, and cargo test --lib rpc_regression_tests_print_uncovered (1/1 pass) all green.
  • I have reviewed every diff line and take full responsibility for correctness.

Summary by CodeRabbit

  • Tests
    • Added explicit tracking and draining of background message-pool tasks used during snapshot tests.
    • Background tasks are now aborted promptly after setup and then awaited so their errors or panics are surfaced.
    • Improved logging for task errors and panics to make test failures and diagnostics clearer.

Review Change Stack

@0xDevNinja 0xDevNinja requested a review from a team as a code owner April 24, 2026 06:33
@0xDevNinja 0xDevNinja requested review from LesnyRumcajs and hanabi1224 and removed request for a team April 24, 2026 06:33
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 24, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4cef0c0e-8fde-477f-9134-23be3c604d0d

📥 Commits

Reviewing files that changed from the base of the PR and between e005443 and 6309c54.

📒 Files selected for processing (2)
  • src/tool/subcommands/api_cmd/generate_test_snapshot.rs
  • src/tool/subcommands/api_cmd/test_snapshot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/tool/subcommands/api_cmd/test_snapshot.rs
  • src/tool/subcommands/api_cmd/generate_test_snapshot.rs

Walkthrough

A shared JoinSet<anyhow::Result<()>> is created and passed into MessagePool::new, immediately aborted, and then drained by a new drain_mpool_services async function that logs task errors and panics while ignoring cancellations.

Changes

JoinSet Error Handling in Test Utilities

Layer / File(s) Summary
Create shared JoinSet and abort after MessagePool construction
src/tool/subcommands/api_cmd/test_snapshot.rs, src/tool/subcommands/api_cmd/generate_test_snapshot.rs
Instantiates a shared JoinSet<anyhow::Result<()>>, passes it into MessagePool::new, and calls abort_all() immediately after construction; spawns a background task to drain the join set.
Drain JoinSet and log outcomes
src/tool/subcommands/api_cmd/test_snapshot.rs
Adds pub(super) async fn drain_mpool_services(mut services: JoinSet<anyhow::Result<()>>) which loops on join_next(), logs warnings for task Err/panic results, and ignores cancelled joins.

Sequence Diagram

sequenceDiagram
  participant ctx
  participant MessagePool
  participant JoinSet
  participant drain_task

  ctx->>JoinSet: create JoinSet
  ctx->>MessagePool: MessagePool::new(&mut JoinSet)
  ctx->>JoinSet: abort_all()
  ctx->>drain_task: spawn drain_mpool_services(JoinSet)
  drain_task->>JoinSet: join_next() loop
  JoinSet-->>drain_task: Result (Ok / Err / Cancelled)
  drain_task->>drain_task: log warnings for Err and panic, ignore Cancelled
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • hanabi1224
  • LesnyRumcajs
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: surfacing MessagePool service task errors in the api_cmd test context by implementing proper error handling.
Linked Issues check ✅ Passed The PR fully implements the requirements from issue #6505: owns the JoinSet, adopts the canonical draining pattern with error logging, applies fixes to both target files, and prevents CI timeouts via abort_all() and detached draining.
Out of Scope Changes check ✅ Passed All changes are directly related to resolving issue #6505. Modifications to both api_cmd test files focus solely on implementing proper JoinSet error handling and draining.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@akaladarshi akaladarshi added the RPC requires calibnet RPC checks to run on CI label Apr 28, 2026
@LesnyRumcajs
Copy link
Copy Markdown
Member

Seems to timeout the tests. Please first assert locally that it works.

@LesnyRumcajs LesnyRumcajs marked this pull request as draft April 29, 2026 11:03
@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch from c0581cb to 63bad6a Compare May 13, 2026 14:18
@0xDevNinja
Copy link
Copy Markdown
Author

Hey, thanks for catching that — local repro confirmed your concern. With just the detached drain_mpool_services spawn, the mpool background reactors stayed alive for the duration of each snapshot test. Running the full snapshot suite (370 tests) with --test-threads=8 had 5 tests stall on the parallel runtime under contention, exactly matching the codecov pattern from #6493.

Pushed a follow-up: services.abort_all() right after MessagePool::new(...) in both test_snapshot::ctx and generate_test_snapshot::ctx. The mpool services were never meant to run in this snapshot ctx (the inherited &mut JoinSet::new() pattern was a temporary that dropped immediately — same end state). The detached drain still polls the aborted JoinSet so any pre-abort spawn error or panic is surfaced via tracing::warn! instead of being silently dropped.

Locally:

$ FOREST_TEST_SKIP_PROOF_PARAM_CHECK=1 cargo test --lib --features cargo-test cargo_test_rpc_snapshot_test_filecoin_ -- --test-threads=8
test result: ok. 370 passed; 0 failed; 0 ignored; 0 measured; 2045 filtered out; finished in 31.87s

No over 60 seconds warnings, no panics. The single-commit branch is rebased on main and CI should pick it up shortly — happy to undraft once you take a look.

@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch from 63bad6a to 63a18e4 Compare May 14, 2026 07:50
Both `test_snapshot::ctx` and `generate_test_snapshot::ctx` passed
`&mut JoinSet::new()` to `MessagePool::new`. The temporary was dropped
on return, aborting every service task spawned by the pool and silently
swallowing any error they emitted pre-cancellation.

Own the `JoinSet` locally and move it into a detached `tokio::spawn`
that drains it with the canonical `while let Some(result) =
services.join_next().await` loop from `daemon/mod.rs::propagate_error`,
logging errors and non-cancel panics via `tracing::warn!`. Detaching
(vs. awaiting inline) avoids the codecov-timeout regression that caused
the earlier attempt to be reverted in ChainSafe#6493: the mpool services are
long-lived and would never complete the inline await.

Refs ChainSafe#6505
@0xDevNinja 0xDevNinja force-pushed the 0xdevninja/issue-6505-joinset-errors branch from 63a18e4 to 6309c54 Compare May 14, 2026 10:39
@0xDevNinja 0xDevNinja marked this pull request as ready for review May 14, 2026 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RPC requires calibnet RPC checks to run on CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle JoinSet task errors in test utility functions

4 participants