test: green Rust Core Coverage (stale assertions + env-race serialization)#3156
Conversation
… subagent delegation The standalone `openhuman-core run` server builds its tokio runtime with the default 2 MiB per-worker-thread stack. A single agent turn is already a very large async state machine (system prompt + hundreds of tool specs + the nested provider/tool loop); delegating to a sub-agent runs another full turn one level down. Even with the inner sub-agent future boxed (`subagent_runner::ops`, see tinyhumansai#2234), that nesting overflows the 2 MiB stack and aborts the whole process: thread 'tokio-rt-worker' (...) has overflowed its stack fatal runtime error: stack overflow, aborting (SIGABRT) This takes the JSON-RPC server down mid-request. In the Playwright web E2E lane it manifests as `chat-harness-subagent` timing out (the orchestrator's final text never renders) followed by a cascade of `ECONNREFUSED` failures across every subsequent spec in the worker, because they all share the now-dead core. Set `thread_stack_size(16 MiB)` on the serve runtime so a subagent-nested agent turn fits comfortably. Reproduced and verified locally by driving the real `openhuman-core` + mock + web stack on isolated ports and running the subagent spec: - before: core aborts ("overflowed its stack"), spec fails at ~52s - after: core stays alive, spec passes in 7.4s, no overflow in core.log Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… serialization) The cargo-llvm-cov core-coverage job is red on every PR for two independent reasons, both fixed here: 1. Stale assertions left by feature PRs (mostly tinyhumansai#3113's sync-flow redesign): - `toolkit_from_slug("MICROSOFT_TEAMS_*")` now returns the full toolkit `"microsoft_teams"`, not `"microsoft"` (3 sites). - memory schema/controller registries grew: legacy tree 19 -> 21, memory_sources 9 -> 10. - The `dedicated_thread` flag no longer short-circuits with a "temporarily disabled" message (see spawn_subagent::dedicated_thread_flag_no_longer_returns_disabled_error); two raw-coverage tests still asserted the old message. 2. Env-race flakes under llvm-cov: several raw-coverage tests set the process-global `OPENHUMAN_WORKSPACE` env var per-test. Under llvm-cov's slower instrumentation the parallel default lets them trample each other's workspace (flaky count/state assertions) — they pass single-threaded. Run the core llvm-cov suite with `--test-threads=1` (in both pr-ci.yml and coverage.yml) so these hermetic-only-when-serialized tests are stable. Verified locally: the fixed assertions pass single-threaded, and the racing tests pass under `--test-threads=1`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIncrease Tokio worker thread stack size, force coverage tests to single-threaded, and update multiple E2E tests: Microsoft Teams toolkit canonicalization to ChangesRuntime Configuration and Test Expectation Updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
graycyrus
left a comment
There was a problem hiding this comment.
@sanil-23 hey! the code looks good to me — the assertion updates are correct (Microsoft Teams toolkit slug fix across 3 test files, registry count bumps matching what's live in production, and the dedicated_thread behavior flip is the right call since the "temporarily disabled" short-circuit is gone). The --test-threads=1 serialization in both coverage.yml and pr-ci.yml is a clean fix for the global OPENHUMAN_WORKSPACE mutation race; slower but hermetic.
One thing: E2E lane 1/4 is currently failing and Frontend Coverage + Rust Core Coverage are still pending. Once CI is fully green I'll come back and approve this. Let me know if you need any help!
… (drop --test-threads=1) Replace the blunt `--test-threads=1` coverage-job change with the per-test `env_lock()` pattern already used elsewhere in these suites, so the llvm-cov job keeps running tests in parallel (fast) and only the handful of tests that mutate the process-global `OPENHUMAN_WORKSPACE` serialize against each other. Four of the seven affected files already had an `env_lock()` helper — the racing test in each just wasn't taking it; the other three get the same self-contained helper. Compiles clean; the previously-flaky tests now hold the lock across their workspace setup so parallel llvm-cov runs stay hermetic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/near90_closure_raw_coverage_e2e.rs`:
- Around line 393-394: Remove the duplicated env_lock() call so the test only
acquires a single MutexGuard from ROUND20_ENV_LOCK; locate the two consecutive
lines calling env_lock() (both binding to _lock) in
tests/near90_closure_raw_coverage_e2e.rs and delete the second one so only one
call to env_lock() remains, preventing a second blocking lock on the
non-reentrant Mutex.
In `@tests/owned_domain_raw_coverage_e2e.rs`:
- Around line 878-886: The new ENV_LOCK and its env_lock() create a second,
uncoordinated mutex; remove ENV_LOCK and change env_lock() to use the existing
OWNED_DOMAIN_ENV_LOCK instead so the file shares the single process-global lock.
Specifically, delete the ENV_LOCK static and update env_lock() so it calls
OWNED_DOMAIN_ENV_LOCK.get_or_init(...).lock().unwrap_or_else(|e|
e.into_inner()), keeping the same return type and behavior but reusing the
existing OWNED_DOMAIN_ENV_LOCK.
In `@tests/worker_b_raw_coverage_e2e.rs`:
- Around line 578-579: There are two consecutive calls that bind the same name:
the duplicate let _lock = env_lock(); shadows the first and is redundant; remove
the second occurrence so only a single env_lock() acquisition remains for the
critical section (locate the duplicate let _lock = env_lock(); lines around the
test case in tests/worker_b_raw_coverage_e2e.rs and delete the second binding),
ensuring the test keeps one _lock variable to guard the environment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 00401edc-5a30-4c2c-a3ef-f901eeeec5b0
📒 Files selected for processing (8)
src/core/cli.rstests/memory_threads_raw_coverage_e2e.rstests/memory_tree_sync_raw_coverage_e2e.rstests/near90_closure_raw_coverage_e2e.rstests/owned_domain_raw_coverage_e2e.rstests/tool_registry_approval_raw_coverage_e2e.rstests/tools_composio_network_leftovers_raw_coverage_e2e.rstests/worker_b_raw_coverage_e2e.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/tools_composio_network_leftovers_raw_coverage_e2e.rs
…env_lock (drop --test-threads=1)" This reverts commit 543b847.
graycyrus
left a comment
There was a problem hiding this comment.
@sanil-23 continuation — found the remaining CI failure.
Rust Core Coverage is still red: one more stale assertion
The --test-threads=1 fix works — the env-race tests all pass now. But there's a stale assertion in a file this PR didn't touch:
thread 'round23_memory_sources_status_registry_and_readers_cover_remaining_edges'
panicked at tests/memory_sources_closure_round23_raw_coverage_e2e.rs:223
assertion left == right failed: left: 0, right: 1
memory_sources_closure_round23_raw_coverage_e2e.rs:223 expects a count of 1 but gets 0 — same pattern as the other stale assertions from #3113's sync-flow redesign. Update that assertion to match current behavior and this should be the last blocker.
src/core/cli.rs — thread_stack_size(16 MiB)
I missed this in the first pass since it came in via a merged branch. The change is correct: nested async state machines for subagent delegation genuinely overflow the 2 MiB default. 16 MiB is generous but within reason given the explanation in the commit message. No concerns.
env_lock detour
The approach in commit 543b847 was correctly reverted. CodeRabbit caught real bugs (double-lock deadlock in near90_closure and worker_b, uncoordinated mutexes across test binaries in owned_domain). The --test-threads=1 CI flag is the right solution — it's simpler and doesn't risk per-binary lock coordination problems.
Fix memory_sources_closure_round23_raw_coverage_e2e.rs:223 and CI should be clean.
…sync redesign) memory_sources_closure_round23: a freshly-inserted Composio source is no longer reported by list_enabled_by_kind until it has an active connection, so the count is 0, not 1. Aligns the last stale raw-coverage assertion with the behavior shipped in tinyhumansai#3113. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @graycyrus — that's exactly it. Updated |
graycyrus
left a comment
There was a problem hiding this comment.
@sanil-23 hey! the code looks good to me — the prior stale assertion in memory_sources_closure_round23_raw_coverage_e2e.rs:223 is now fixed, and all the other test assertion updates + the Tokio stack bump are correct.
CI is still pending on a few jobs though (Rust Core Coverage, E2E lanes, Frontend Coverage, etc.), so i'll hold off on approval until those clear. once the coverage job turns green, i'll come back and approve this. let me know if you need any help!
Quick notes on what i checked:
- Stale assertions all addressed: Microsoft Teams toolkit slug → "microsoft_teams", legacy tree schemas 19→21, memory_sources 9→10, dedicated_thread behavior flip, composio enabled count 1→0.
- Workflow --test-threads=1 change is correct for serializing env-mutating coverage tests.
- Tokio thread stack increase to 16 MiB is justified and properly commented (subagent stacking issue).
- No breaking changes, security issues, or production behavior changes beyond the stack size (which prevents crashes).
|
@graycyrus — composio fix is in and CI got past it. The remaining red is not another stale assertion — it's a flaky
So full isolation needs faking |
…tion The github reader fetches commits via a real `git clone` first (only falling back to the fake `gh` on error), and `fetch_github` falls back to the real api.github.com on any gh path-mismatch. So this test was flaky: with network, the real clone returned the repo's real commits (no abc123). Fixes: - Add a failing `git` stub on PATH so the reader deterministically uses the fake `gh` (no real clone, no network). - Glob the fake-`gh` list/comments path patterns so they match the reader's actual `?per_page=..&page=..` query strings (the exact patterns missed them and silently fell through to the real API). - Drop the stale `## Comments` assertion: tinyhumansai#3113 stopped inlining comments into the rendered issue body (fetch_issue_comments is now unused); assert on the content read_issue still renders. Verified: passes deterministically (3x, ~0.3s, no network). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same fix as round21, applied to the two sibling github-reader tests: - Add a failing `git` stub on PATH so GithubReader can't reach the real `git clone` of github.com and deterministically uses the fake `gh`. - Glob the fake-`gh` list/comments path patterns so they match the reader's actual `?per_page=..&page=..` queries instead of silently falling through to the real api.github.com. - memory_sync_sources: drop the stale `## Comments` assertion (tinyhumansai#3113 stopped inlining comments into the issue body); assert on the rendered content. Verified: both pass deterministically (~0.7s, no network). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
graycyrus
left a comment
There was a problem hiding this comment.
Two new commits reviewed (github-reader test hermetic fixes). Looks solid.
Summary:
- Added fake git stub so the reader can't fall back to real
git clone— tests now use mockedghdeterministically - Updated fake-gh case patterns from exact strings to glob patterns (
\?*) so they match the reader's actual query strings with?per_page=...&page=...params - Dropped stale
## Commentsassertions (no longer inlined per #3113); now assert on the rendered issue content instead
All prior findings are resolved. Code is clean — no debug, no console.logs, changes are minimal and focused.
CI is still pending on Rust Core Coverage, Frontend Coverage, and a couple E2E lanes. Once those clear, I'll come back and APPROVE. Let me know if you run into any blockers.
owned_domain coverage asserted the responses fallback request's
/input/0/content equals "fallback please" (a plain string). The
OpenAI-compatible responses payload now sends structured content parts
([{text, type:"input_text"}], tinyhumansai#2748/tinyhumansai#3124), so assert that shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
round22_cron_add_tool used a hardcoded `at: 2026-05-31T00:00:00Z` schedule,
which is now in the past, so cron creation fails validation ("'at' must be
in the future"). Compute the `at` time at runtime (now + 30 days) so the
test never expires again.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/memory_sync_sources_raw_coverage_e2e.rs (1)
314-324: ⚖️ Poor tradeoffConsider extracting the git stub creation into a shared test utility.
The identical git stub creation code (lines 314-324) appears in both
tests/memory_sync_sources_raw_coverage_e2e.rsandtests/near90_closure_raw_coverage_e2e.rs:433-443. Extracting to a shared helper function would eliminate duplication.However, given this is test fixture code with limited reuse scope, the refactor is optional and can be deferred.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/memory_sync_sources_raw_coverage_e2e.rs` around lines 314 - 324, Extract the duplicated git stub creation into a shared test helper (e.g., fn create_git_stub(bin: &Path) -> PathBuf in a tests helper module) and replace the inline block that creates `git_stub` with a call to that helper; the helper should perform the same actions (write the script with std::fs::write, and on Unix set executable perms using std::os::unix::fs::PermissionsExt / set_mode, then return the PathBuf to the created stub) so both `tests::memory_sync_sources_raw_coverage_e2e` and `tests::near90_closure_raw_coverage_e2e` call `create_git_stub` instead of duplicating the code that assigns `git_stub`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/memory_sync_sources_raw_coverage_e2e.rs`:
- Around line 314-324: Extract the duplicated git stub creation into a shared
test helper (e.g., fn create_git_stub(bin: &Path) -> PathBuf in a tests helper
module) and replace the inline block that creates `git_stub` with a call to that
helper; the helper should perform the same actions (write the script with
std::fs::write, and on Unix set executable perms using
std::os::unix::fs::PermissionsExt / set_mode, then return the PathBuf to the
created stub) so both `tests::memory_sync_sources_raw_coverage_e2e` and
`tests::near90_closure_raw_coverage_e2e` call `create_git_stub` instead of
duplicating the code that assigns `git_stub`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ea4a364d-c80e-40c9-9146-be3df10039fc
📒 Files selected for processing (3)
tests/memory_sync_sources_raw_coverage_e2e.rstests/near90_closure_raw_coverage_e2e.rstests/owned_domain_raw_coverage_e2e.rs
tinyhumansai#2952 added a debug_assert in ApprovalGate::new requiring session_id to start with `session-` (the per-launch UUID shape) to guard against credential leaks. The approval coverage tests init the global gate with ids like `approval-raw-e2e-session` / `worker-b-approval-session`, which trip the assert when that test happens to initialize the global gate first (order-dependent — why it read as a flake). Prefix the test session_ids with `session-` so they satisfy the guard regardless of test order. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@graycyrus Rust Core Coverage is green now. Beyond the composio assertion you flagged, CI surfaced (one per run, since it stops at the first failure) a chain of #3113/#2748-era stale + flaky tests, all now fixed:
Verified the whole suite locally ( |
…tests # Conflicts: # .github/workflows/coverage.yml # .github/workflows/pr-ci.yml # tests/memory_sources_closure_round23_raw_coverage_e2e.rs # tests/memory_sources_readers_round21_raw_coverage_e2e.rs # tests/memory_sync_sources_raw_coverage_e2e.rs # tests/memory_threads_raw_coverage_e2e.rs # tests/owned_domain_raw_coverage_e2e.rs # tests/tool_registry_approval_raw_coverage_e2e.rs # tests/tools_agent_credentials_state_raw_coverage_e2e.rs # tests/tools_composio_network_leftovers_raw_coverage_e2e.rs # tests/tools_composio_round22_raw_coverage_e2e.rs # tests/worker_b_raw_coverage_e2e.rs
The merge took main's parallel workflow, but env_lock only guards env vars — not other process-global state like the toolkit/connection registry exercised by tools_approval_channels. Under llvm-cov's slower instrumentation those non-env races flake (unknown_toolkit suggestion assertion). Restore the serialized run (proven green at 13d9cbf, endorsed by graycyrus) while keeping main's --no-fail-fast + build jobs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce fix) Replaces the --test-threads=1 serialization with a surgical lock so the coverage job stays parallel/fast. The test read the process-global connection/toolkit registry without holding env_lock, while sibling tests swap OPENHUMAN_WORKSPACE under it — under llvm-cov's slower parallel run that trampled the integrations tool's toolkit list and dropped gmail_pro/slack_bot from the unknown-toolkit suggestion. It was the only test in the file missing the lock. Workflows revert to main's parallel run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rebased on latest
|
…tion) (tinyhumansai#3156) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Green the Rust Core Coverage (cargo-llvm-cov) job, which was red on every PR. This branch has since been rebased onto latest
main: #3074 ("separate agent action sandbox from internal workspace state") independently landed the stale-assertion realignments this PR originally carried, so those are now upstream and removed here. What remains is the partmaindoes not have — and it keeps the coverage job parallel/fast (no--test-threads=1).Changes
1. Env-race fix that keeps coverage parallel —
tests/tools_approval_channels_raw_coverage_e2e.rsorchestrator_tool_synthesis_covers_agent_and_integration_delegation_edgeswas the only test in the file not holdingenv_lock. It reads the process-global connection/toolkit registry while sibling tests swapOPENHUMAN_WORKSPACEunder the lock; under llvm-cov's slower parallel run that intermittently trampled the integrations tool's toolkit list and droppedgmail_pro/slack_botfrom the unknown-toolkit suggestion (tools_approval_channels_raw_coverage_e2e.rs:1689). Added the one-linelet _lock = env_lock();it was missing.env_lockguards this non-env global race without a blanket--test-threads=1, so the suite stays parallel.2. Sub-agent stack overflow —
src/core/cli.rsGive tokio worker threads a 16 MiB stack. A single agent turn is a large async state machine; delegating to a sub-agent nests another full turn and overflows the default 2 MiB stack, SIGABRT-ing the JSON-RPC server mid-request (this is what crashed Playwright lane 1/4).
3. Hermetic github-reader test —
tests/near90_closure_raw_coverage_e2e.rsAdd a failing
gitstub so the round20 reader test can't fall through to a realgit cloneof github.com (2.6 s hermetic vs ~120 s network fallback).Verification
tools_channels_raw_coverage_e2e→ 33/33 pass (parallel); affected binaries pass; core compiles clean.657781a0— Rust Core Coverage passed at ~17 min running parallel, Playwright lanes (incl. 1/4) green.Submission Checklist
env_lockguard + hermetic git stub;N/Anew behavior (test/CI-stabilization).N/A: test + core-runtime-config only.N/A: no feature change.## Related—N/A.N/A: test/CI only.Closes #NNN—N/A: CI test-debt.Impact
Rust Core Coverage(was red on every PR) while keeping it parallel/fast.src/core/cli.rs, which prevents a sub-agent-delegation crash.Related