test: green Rust Core Coverage (stale assertions + env-race serialization) by sanil-23 · Pull Request #3156 · tinyhumansai/openhuman

sanil-23 · 2026-06-01T15:11:09Z

Summary

Green the Rust Core Coverage (cargo-llvm-cov) job, which was red on every PR. This branch has since been rebased onto latest main: #3074 ("separate agent action sandbox from internal workspace state") independently landed the stale-assertion realignments this PR originally carried, so those are now upstream and removed here. What remains is the part main does not have — and it keeps the coverage job parallel/fast (no --test-threads=1).

Changes

1. Env-race fix that keeps coverage parallel — tests/tools_approval_channels_raw_coverage_e2e.rs
orchestrator_tool_synthesis_covers_agent_and_integration_delegation_edges was the only test in the file not holding env_lock. It reads the process-global connection/toolkit registry while sibling tests swap OPENHUMAN_WORKSPACE under the lock; under llvm-cov's slower parallel run that intermittently trampled the integrations tool's toolkit list and dropped gmail_pro/slack_bot from the unknown-toolkit suggestion (tools_approval_channels_raw_coverage_e2e.rs:1689). Added the one-line let _lock = env_lock(); it was missing. env_lock guards this non-env global race without a blanket --test-threads=1, so the suite stays parallel.

2. Sub-agent stack overflow — src/core/cli.rs
Give tokio worker threads a 16 MiB stack. A single agent turn is a large async state machine; delegating to a sub-agent nests another full turn and overflows the default 2 MiB stack, SIGABRT-ing the JSON-RPC server mid-request (this is what crashed Playwright lane 1/4).

3. Hermetic github-reader test — tests/near90_closure_raw_coverage_e2e.rs
Add a failing git stub so the round20 reader test can't fall through to a real git clone of github.com (2.6 s hermetic vs ~120 s network fallback).

Verification

Locally: tools_channels_raw_coverage_e2e → 33/33 pass (parallel); affected binaries pass; core compiles clean.
CI: all required checks green on 657781a0 — Rust Core Coverage passed at ~17 min running parallel, Playwright lanes (incl. 1/4) green.

Submission Checklist

Tests added or updated — adds the missing env_lock guard + hermetic git stub; N/A new behavior (test/CI-stabilization).
Diff coverage ≥ 80% — N/A: test + core-runtime-config only.
Coverage matrix updated — N/A: no feature change.
All affected feature IDs listed under ## Related — N/A.
No new external network dependencies introduced — the git stub removes a real-network fallback.
Manual smoke checklist updated — N/A: test/CI only.
Linked issue closed via Closes #NNN — N/A: CI test-debt.

Impact

CI: stabilizes Rust Core Coverage (was red on every PR) while keeping it parallel/fast.
Product: the only runtime change is the larger tokio worker stack in src/core/cli.rs, which prevents a sub-agent-delegation crash.

… subagent delegation The standalone `openhuman-core run` server builds its tokio runtime with the default 2 MiB per-worker-thread stack. A single agent turn is already a very large async state machine (system prompt + hundreds of tool specs + the nested provider/tool loop); delegating to a sub-agent runs another full turn one level down. Even with the inner sub-agent future boxed (`subagent_runner::ops`, see tinyhumansai#2234), that nesting overflows the 2 MiB stack and aborts the whole process: thread 'tokio-rt-worker' (...) has overflowed its stack fatal runtime error: stack overflow, aborting (SIGABRT) This takes the JSON-RPC server down mid-request. In the Playwright web E2E lane it manifests as `chat-harness-subagent` timing out (the orchestrator's final text never renders) followed by a cascade of `ECONNREFUSED` failures across every subsequent spec in the worker, because they all share the now-dead core. Set `thread_stack_size(16 MiB)` on the serve runtime so a subagent-nested agent turn fits comfortably. Reproduced and verified locally by driving the real `openhuman-core` + mock + web stack on isolated ports and running the subagent spec: - before: core aborts ("overflowed its stack"), spec fails at ~52s - after: core stays alive, spec passes in 7.4s, no overflow in core.log Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… serialization) The cargo-llvm-cov core-coverage job is red on every PR for two independent reasons, both fixed here: 1. Stale assertions left by feature PRs (mostly tinyhumansai#3113's sync-flow redesign): - `toolkit_from_slug("MICROSOFT_TEAMS_*")` now returns the full toolkit `"microsoft_teams"`, not `"microsoft"` (3 sites). - memory schema/controller registries grew: legacy tree 19 -> 21, memory_sources 9 -> 10. - The `dedicated_thread` flag no longer short-circuits with a "temporarily disabled" message (see spawn_subagent::dedicated_thread_flag_no_longer_returns_disabled_error); two raw-coverage tests still asserted the old message. 2. Env-race flakes under llvm-cov: several raw-coverage tests set the process-global `OPENHUMAN_WORKSPACE` env var per-test. Under llvm-cov's slower instrumentation the parallel default lets them trample each other's workspace (flaky count/state assertions) — they pass single-threaded. Run the core llvm-cov suite with `--test-threads=1` (in both pr-ci.yml and coverage.yml) so these hermetic-only-when-serialized tests are stable. Verified locally: the fixed assertions pass single-threaded, and the racing tests pass under `--test-threads=1`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-01T15:11:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Increase Tokio worker thread stack size, force coverage tests to single-threaded, and update multiple E2E tests: Microsoft Teams toolkit canonicalization to microsoft_teams, schema/source counts, SpawnSubagentTool dedicated-thread output assertions, Composio enabled-status, and GitHub reader fixtures/URL patterns.

Changes

Runtime Configuration and Test Expectation Updates

Layer / File(s)	Summary
Tokio Worker Thread Stack Size Configuration `src/core/cli.rs`	Tokio multi-threaded runtime worker threads are configured with 16MiB stack size in `run_server_command` to prevent stack overflow when nested agent turns execute large async state machines.
CI Test Serialization via --test-threads=1 `.github/workflows/coverage.yml`, `.github/workflows/pr-ci.yml`	Both coverage and PR CI workflows are updated to run `cargo llvm-cov` with `-- --test-threads=1`, serializing test execution to avoid coverage instrumentation interference from process-global `OPENHUMAN_WORKSPACE` under parallel runs.
Toolkit Slug Canonicalization Test Expectations `tests/memory_raw_coverage_e2e.rs`, `tests/memory_threads_raw_coverage_e2e.rs`, `tests/memory_tree_sync_raw_coverage_e2e.rs`	Test assertions updated to expect `microsoft_teams` as the canonical toolkit identifier for Microsoft Teams send slugs (handles whitespace/case variants).
SpawnSubagentTool Dedicated Thread Output Behavior `tests/tools_agent_credentials_state_raw_coverage_e2e.rs`, `tests/tools_composio_network_leftovers_raw_coverage_e2e.rs`	Two E2E tests now assert the "temporarily disabled" message is absent from tool outputs for the dedicated_thread path and add comments documenting the changed expectation.
Memory Schema and Source Registry Count Assertions `tests/memory_threads_raw_coverage_e2e.rs`	Test assertions increased: controller schema count from 19 to 21 and memory sources controller schema count from 9 to 10.
Composio Source Enabled Status Behavior `tests/memory_sources_closure_round23_raw_coverage_e2e.rs`	E2E expectation updated: newly inserted Composio sources are not considered enabled until they have an active connection (enabled count changed from 1 to 0).
GitHub Reader Fixture and Fake `gh` Script `tests/memory_sources_readers_round21_raw_coverage_e2e.rs`, `tests/memory_sync_sources_raw_coverage_e2e.rs`, `tests/near90_closure_raw_coverage_e2e.rs`, `tests/memory_sync_sources_raw_coverage_e2e.rs`, `tests/owned_domain_raw_coverage_e2e.rs`	Add failing `git` stub to PATH in tests, relax `gh api` route patterns to wildcard query forms, add specific issue/PR-comments handlers, update issue-body assertions to no longer expect inlined comments, and adapt Responses-API request payload expectation to structured content parts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

tinyhumansai/openhuman#1957: Related changes to SpawnSubagentTool/subagent dedicated_thread behavior and tests.
tinyhumansai/openhuman#3047: Related GitHub reader rendering and comment-handling changes.

Suggested reviewers

graycyrus

Poem

🐰 A rabbit hops through test suites with care,
Stacks widened so agents breathe fresh air,
Threads march single-file and fixtures align,
Teams find their proper name, tidy and fine,
Tests skip the "disabled" whisper — all is fair.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: stabilizing Rust Core Coverage CI by addressing stale assertions and adding test serialization to prevent environment-race flakes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…tests

graycyrus

@sanil-23 hey! the code looks good to me — the assertion updates are correct (Microsoft Teams toolkit slug fix across 3 test files, registry count bumps matching what's live in production, and the dedicated_thread behavior flip is the right call since the "temporarily disabled" short-circuit is gone). The --test-threads=1 serialization in both coverage.yml and pr-ci.yml is a clean fix for the global OPENHUMAN_WORKSPACE mutation race; slower but hermetic.

One thing: E2E lane 1/4 is currently failing and Frontend Coverage + Rust Core Coverage are still pending. Once CI is fully green I'll come back and approve this. Let me know if you need any help!

… (drop --test-threads=1) Replace the blunt `--test-threads=1` coverage-job change with the per-test `env_lock()` pattern already used elsewhere in these suites, so the llvm-cov job keeps running tests in parallel (fast) and only the handful of tests that mutate the process-global `OPENHUMAN_WORKSPACE` serialize against each other. Four of the seven affected files already had an `env_lock()` helper — the racing test in each just wasn't taking it; the other three get the same self-contained helper. Compiles clean; the previously-flaky tests now hold the lock across their workspace setup so parallel llvm-cov runs stay hermetic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/near90_closure_raw_coverage_e2e.rs`:
- Around line 393-394: Remove the duplicated env_lock() call so the test only
acquires a single MutexGuard from ROUND20_ENV_LOCK; locate the two consecutive
lines calling env_lock() (both binding to _lock) in
tests/near90_closure_raw_coverage_e2e.rs and delete the second one so only one
call to env_lock() remains, preventing a second blocking lock on the
non-reentrant Mutex.

In `@tests/owned_domain_raw_coverage_e2e.rs`:
- Around line 878-886: The new ENV_LOCK and its env_lock() create a second,
uncoordinated mutex; remove ENV_LOCK and change env_lock() to use the existing
OWNED_DOMAIN_ENV_LOCK instead so the file shares the single process-global lock.
Specifically, delete the ENV_LOCK static and update env_lock() so it calls
OWNED_DOMAIN_ENV_LOCK.get_or_init(...).lock().unwrap_or_else(|e|
e.into_inner()), keeping the same return type and behavior but reusing the
existing OWNED_DOMAIN_ENV_LOCK.

In `@tests/worker_b_raw_coverage_e2e.rs`:
- Around line 578-579: There are two consecutive calls that bind the same name:
the duplicate let _lock = env_lock(); shadows the first and is redundant; remove
the second occurrence so only a single env_lock() acquisition remains for the
critical section (locate the duplicate let _lock = env_lock(); lines around the
test case in tests/worker_b_raw_coverage_e2e.rs and delete the second binding),
ensuring the test keeps one _lock variable to guard the environment.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 00401edc-5a30-4c2c-a3ef-f901eeeec5b0

📥 Commits

Reviewing files that changed from the base of the PR and between d5d5a64 and 543b847.

📒 Files selected for processing (8)

src/core/cli.rs
tests/memory_threads_raw_coverage_e2e.rs
tests/memory_tree_sync_raw_coverage_e2e.rs
tests/near90_closure_raw_coverage_e2e.rs
tests/owned_domain_raw_coverage_e2e.rs
tests/tool_registry_approval_raw_coverage_e2e.rs
tests/tools_composio_network_leftovers_raw_coverage_e2e.rs
tests/worker_b_raw_coverage_e2e.rs

🚧 Files skipped from review as they are similar to previous changes (1)

tests/tools_composio_network_leftovers_raw_coverage_e2e.rs

…env_lock (drop --test-threads=1)" This reverts commit 543b847.

graycyrus

@sanil-23 continuation — found the remaining CI failure.

Rust Core Coverage is still red: one more stale assertion

The --test-threads=1 fix works — the env-race tests all pass now. But there's a stale assertion in a file this PR didn't touch:

thread 'round23_memory_sources_status_registry_and_readers_cover_remaining_edges'
  panicked at tests/memory_sources_closure_round23_raw_coverage_e2e.rs:223
  assertion left == right failed: left: 0, right: 1

memory_sources_closure_round23_raw_coverage_e2e.rs:223 expects a count of 1 but gets 0 — same pattern as the other stale assertions from #3113's sync-flow redesign. Update that assertion to match current behavior and this should be the last blocker.

src/core/cli.rs — thread_stack_size(16 MiB)

I missed this in the first pass since it came in via a merged branch. The change is correct: nested async state machines for subagent delegation genuinely overflow the 2 MiB default. 16 MiB is generous but within reason given the explanation in the commit message. No concerns.

env_lock detour

The approach in commit 543b847 was correctly reverted. CodeRabbit caught real bugs (double-lock deadlock in near90_closure and worker_b, uncoordinated mutexes across test binaries in owned_domain). The --test-threads=1 CI flag is the right solution — it's simpler and doesn't risk per-binary lock coordination problems.

Fix memory_sources_closure_round23_raw_coverage_e2e.rs:223 and CI should be clean.

…sync redesign) memory_sources_closure_round23: a freshly-inserted Composio source is no longer reported by list_enabled_by_kind until it has an active connection, so the count is 0, not 1. Aligns the last stale raw-coverage assertion with the behavior shipped in tinyhumansai#3113. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sanil-23 · 2026-06-01T16:22:47Z

Thanks @graycyrus — that's exactly it. Updated memory_sources_closure_round23_raw_coverage_e2e.rs:223 to assert 0 (a freshly-inserted Composio source isn't reported as enabled until it has an active connection, per #3113). Verified locally (passes single-threaded). And agreed on all three points — --test-threads=1 over the per-binary env_lock, and the thread_stack_size fix. Should be green now.

graycyrus

@sanil-23 hey! the code looks good to me — the prior stale assertion in memory_sources_closure_round23_raw_coverage_e2e.rs:223 is now fixed, and all the other test assertion updates + the Tokio stack bump are correct.

CI is still pending on a few jobs though (Rust Core Coverage, E2E lanes, Frontend Coverage, etc.), so i'll hold off on approval until those clear. once the coverage job turns green, i'll come back and approve this. let me know if you need any help!

Quick notes on what i checked:

Stale assertions all addressed: Microsoft Teams toolkit slug → "microsoft_teams", legacy tree schemas 19→21, memory_sources 9→10, dedicated_thread behavior flip, composio enabled count 1→0.
Workflow --test-threads=1 change is correct for serializing env-mutating coverage tests.
Tokio thread stack increase to 16 MiB is justified and properly commented (subagent stacking issue).
No breaking changes, security issues, or production behavior changes beyond the stack size (which prevents crashes).

sanil-23 · 2026-06-01T17:12:58Z

@graycyrus — composio fix is in and CI got past it. The remaining red is not another stale assertion — it's a flaky round21_github_reader test (and its 2 siblings memory_sync_sources / near90_closure share the pattern). Root cause:

GithubReader::list_items fetches commits via git clone → gh api → real reqwest to api.github.com, in that fallback order. These tests only stub gh (and not fully). So on a CI runner with network:

The real git clone --bare of tinyhumansai/openhuman succeeds → returns the repo's real commits (no commit:abc123) → assertion fails. (Confirmed: the failing runs take ~20s+ on this one test; passing runs are instant. Same commit c2af5bc7 passed in run 26767540663 and failed in 26767542022 — textbook flake.)
Stubbing git alone isn't enough — the gh fallback then hits real api.github.com (the fake gh only handles api/--version, not auth status, and the fixture's per_page=30 may not match the reader's call), so it still misses abc123.

So full isolation needs faking git and completing the gh stub (auth status + matching per_page) across all three tests. That's reader-testability work I didn't want to half-do blind — happy to take a crack if you'd like, but flagging since it's pre-existing and separate from the #3113 stale-assertion fixes this PR targets. All the deterministic stale assertions are now fixed.

…tion The github reader fetches commits via a real `git clone` first (only falling back to the fake `gh` on error), and `fetch_github` falls back to the real api.github.com on any gh path-mismatch. So this test was flaky: with network, the real clone returned the repo's real commits (no abc123). Fixes: - Add a failing `git` stub on PATH so the reader deterministically uses the fake `gh` (no real clone, no network). - Glob the fake-`gh` list/comments path patterns so they match the reader's actual `?per_page=..&page=..` query strings (the exact patterns missed them and silently fell through to the real API). - Drop the stale `## Comments` assertion: tinyhumansai#3113 stopped inlining comments into the rendered issue body (fetch_issue_comments is now unused); assert on the content read_issue still renders. Verified: passes deterministically (3x, ~0.3s, no network). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Same fix as round21, applied to the two sibling github-reader tests: - Add a failing `git` stub on PATH so GithubReader can't reach the real `git clone` of github.com and deterministically uses the fake `gh`. - Glob the fake-`gh` list/comments path patterns so they match the reader's actual `?per_page=..&page=..` queries instead of silently falling through to the real api.github.com. - memory_sync_sources: drop the stale `## Comments` assertion (tinyhumansai#3113 stopped inlining comments into the issue body); assert on the rendered content. Verified: both pass deterministically (~0.7s, no network). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

graycyrus

Two new commits reviewed (github-reader test hermetic fixes). Looks solid.

Summary:

Added fake git stub so the reader can't fall back to real git clone — tests now use mocked gh deterministically
Updated fake-gh case patterns from exact strings to glob patterns (\?*) so they match the reader's actual query strings with ?per_page=...&page=... params
Dropped stale ## Comments assertions (no longer inlined per #3113); now assert on the rendered issue content instead

All prior findings are resolved. Code is clean — no debug, no console.logs, changes are minimal and focused.

CI is still pending on Rust Core Coverage, Frontend Coverage, and a couple E2E lanes. Once those clear, I'll come back and APPROVE. Let me know if you run into any blockers.

owned_domain coverage asserted the responses fallback request's /input/0/content equals "fallback please" (a plain string). The OpenAI-compatible responses payload now sends structured content parts ([{text, type:"input_text"}], tinyhumansai#2748/tinyhumansai#3124), so assert that shape. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

round22_cron_add_tool used a hardcoded `at: 2026-05-31T00:00:00Z` schedule, which is now in the past, so cron creation fails validation ("'at' must be in the future"). Compute the `at` time at runtime (now + 30 days) so the test never expires again. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

tests/memory_sync_sources_raw_coverage_e2e.rs (1)
314-324: ⚖️ Poor tradeoff

Consider extracting the git stub creation into a shared test utility.

The identical git stub creation code (lines 314-324) appears in both tests/memory_sync_sources_raw_coverage_e2e.rs and tests/near90_closure_raw_coverage_e2e.rs:433-443. Extracting to a shared helper function would eliminate duplication.

However, given this is test fixture code with limited reuse scope, the refactor is optional and can be deferred.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/memory_sync_sources_raw_coverage_e2e.rs` around lines 314 - 324,
Extract the duplicated git stub creation into a shared test helper (e.g., fn
create_git_stub(bin: &Path) -> PathBuf in a tests helper module) and replace the
inline block that creates `git_stub` with a call to that helper; the helper
should perform the same actions (write the script with std::fs::write, and on
Unix set executable perms using std::os::unix::fs::PermissionsExt / set_mode,
then return the PathBuf to the created stub) so both
`tests::memory_sync_sources_raw_coverage_e2e` and
`tests::near90_closure_raw_coverage_e2e` call `create_git_stub` instead of
duplicating the code that assigns `git_stub`.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/memory_sync_sources_raw_coverage_e2e.rs`:
- Around line 314-324: Extract the duplicated git stub creation into a shared
test helper (e.g., fn create_git_stub(bin: &Path) -> PathBuf in a tests helper
module) and replace the inline block that creates `git_stub` with a call to that
helper; the helper should perform the same actions (write the script with
std::fs::write, and on Unix set executable perms using
std::os::unix::fs::PermissionsExt / set_mode, then return the PathBuf to the
created stub) so both `tests::memory_sync_sources_raw_coverage_e2e` and
`tests::near90_closure_raw_coverage_e2e` call `create_git_stub` instead of
duplicating the code that assigns `git_stub`.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ea4a364d-c80e-40c9-9146-be3df10039fc

📥 Commits

Reviewing files that changed from the base of the PR and between 695bacc and be29029.

📒 Files selected for processing (3)

tests/memory_sync_sources_raw_coverage_e2e.rs
tests/near90_closure_raw_coverage_e2e.rs
tests/owned_domain_raw_coverage_e2e.rs

tinyhumansai#2952 added a debug_assert in ApprovalGate::new requiring session_id to start with `session-` (the per-launch UUID shape) to guard against credential leaks. The approval coverage tests init the global gate with ids like `approval-raw-e2e-session` / `worker-b-approval-session`, which trip the assert when that test happens to initialize the global gate first (order-dependent — why it read as a flake). Prefix the test session_ids with `session-` so they satisfy the guard regardless of test order. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sanil-23 · 2026-06-01T19:30:28Z

@graycyrus Rust Core Coverage is green now. Beyond the composio assertion you flagged, CI surfaced (one per run, since it stops at the first failure) a chain of #3113/#2748-era stale + flaky tests, all now fixed:

3 github-reader tests — the reader does a real git clone first then falls back to the real api.github.com on any gh mismatch, so they were flaky. Stubbed git (forces the fake-gh path) + globbed the fake-gh path patterns + dropped the now-removed ## Comments inlining assertion.
inference payload — responses-API input content is structured parts now (Add Codex OAuth provider support #2748/fix(inference): accept reasoning field alias for thinking-mode providers #3124), not a bare string.
cron — a time-bomb: hardcoded at: 2026-05-31 is now in the past; switched to runtime now + 30d.
ApprovalGate session_id — feat(approval): per-launch UUID session_id + scrub legacy bearer leaks #2952's session- prefix guard; the test ids didn't have it (order-dependent on which test inits the global gate first).

Verified the whole suite locally (--no-fail-fast --test-threads=1): only the macOS-cfg seatbelt test 'fails', which can't run on Linux. Ready for another look.

…tests # Conflicts: # .github/workflows/coverage.yml # .github/workflows/pr-ci.yml # tests/memory_sources_closure_round23_raw_coverage_e2e.rs # tests/memory_sources_readers_round21_raw_coverage_e2e.rs # tests/memory_sync_sources_raw_coverage_e2e.rs # tests/memory_threads_raw_coverage_e2e.rs # tests/owned_domain_raw_coverage_e2e.rs # tests/tool_registry_approval_raw_coverage_e2e.rs # tests/tools_agent_credentials_state_raw_coverage_e2e.rs # tests/tools_composio_network_leftovers_raw_coverage_e2e.rs # tests/tools_composio_round22_raw_coverage_e2e.rs # tests/worker_b_raw_coverage_e2e.rs

The merge took main's parallel workflow, but env_lock only guards env vars — not other process-global state like the toolkit/connection registry exercised by tools_approval_channels. Under llvm-cov's slower instrumentation those non-env races flake (unknown_toolkit suggestion assertion). Restore the serialized run (proven green at 13d9cbf, endorsed by graycyrus) while keeping main's --no-fail-fast + build jobs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ce fix) Replaces the --test-threads=1 serialization with a surgical lock so the coverage job stays parallel/fast. The test read the process-global connection/toolkit registry without holding env_lock, while sibling tests swap OPENHUMAN_WORKSPACE under it — under llvm-cov's slower parallel run that trampled the integrations tool's toolkit list and dropped gmail_pro/slack_bot from the unknown-toolkit suggestion. It was the only test in the file missing the lock. Workflows revert to main's parallel run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sanil-23 · 2026-06-01T21:26:40Z

Rebased on latest `main` — conflicts resolved, CI green ✅

Merged the latest upstream/main (which advanced past #3074 "separate agent action sandbox from internal workspace state") and resolved all 12 conflicting test files.

Conflict resolution

#3074 independently fixed the same stale raw-coverage assertions this PR targeted, and its versions are canonical (proper uuid::Uuid::new_v4() session gates, far-future cron literals, robust !output().is_empty() checks, globbed commits\?*) gh-stub matching). I took upstream/main's version for all 12 conflicts — those stale-test fixes are now upstream, so this PR no longer duplicates them.

What this PR still uniquely carries (diff vs `main`)

tests/tools_approval_channels_raw_coverage_e2e.rs — orchestrator_tool_synthesis_covers_agent_and_integration_delegation_edges was the only test in the file not holding env_lock. It reads the process-global connection/toolkit registry while sibling tests swap OPENHUMAN_WORKSPACE under the lock, so under cargo-llvm-cov's slower parallel run it intermittently saw the wrong toolkit set and dropped gmail_pro/slack_bot from the unknown-toolkit suggestion. Added the one-line let _lock = env_lock(); it was missing. This keeps the coverage job parallel and fast — env_lock covers the non-env global race that a blanket --test-threads=1 would otherwise be needed for.
src/core/cli.rs — 16 MiB tokio worker stack so nested sub-agent turns don't overflow the default 2 MiB stack and SIGABRT the JSON-RPC server (fixes the Playwright lane-1/4 subagent crash).
tests/near90_closure_raw_coverage_e2e.rs — failing git stub so the round20 github-reader test is hermetic (2.6 s vs ~120 s real-network fallback).

CI

All required checks green on 657781a0 (Rust Core Coverage passed at ~17 min, parallel; Playwright lanes incl. 1/4 green; one unrelated settings-feature-preferences E2E flake cleared on re-run).

@graycyrus — the earlier CHANGES_REQUESTED was against the reverted env_lock infra (which caused the double-lock deadlocks you flagged). That approach is gone; the only env-lock change now is the single missing-lock line above. Could you take another look? 🙏

…tion) (tinyhumansai#3156) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sanil-23 and others added 2 commits June 1, 2026 20:10

sanil-23 requested a review from a team June 1, 2026 15:11

coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. working A PR that is being worked on by the team. labels Jun 1, 2026

coderabbitai Bot previously approved these changes Jun 1, 2026

View reviewed changes

sanil-23 added 2 commits June 1, 2026 20:55

Merge branch 'fix/subagent-stack-overflow' into fix/coverage-stale-tests

b389418

Merge remote-tracking branch 'upstream/main' into fix/coverage-stale-…

4f6f891

…tests

graycyrus reviewed Jun 1, 2026

View reviewed changes

sanil-23 dismissed coderabbitai[bot]’s stale review via 543b847 June 1, 2026 15:36

coderabbitai Bot requested changes Jun 1, 2026

View reviewed changes

Comment thread tests/near90_closure_raw_coverage_e2e.rs Outdated

Comment thread tests/owned_domain_raw_coverage_e2e.rs Outdated

Comment thread tests/worker_b_raw_coverage_e2e.rs Outdated

Revert "test(coverage): serialize env-racing raw-coverage tests with …

99276a3

…env_lock (drop --test-threads=1)" This reverts commit 543b847.

sanil-23 mentioned this pull request Jun 1, 2026

fix(core): 16 MiB worker stack so subagent delegation doesn't crash the JSON-RPC server #3155

Closed

7 tasks

graycyrus requested changes Jun 1, 2026

View reviewed changes

coderabbitai Bot previously approved these changes Jun 1, 2026

View reviewed changes

graycyrus reviewed Jun 1, 2026

View reviewed changes

sanil-23 dismissed coderabbitai[bot]’s stale review via 695bacc June 1, 2026 17:47

coderabbitai Bot previously approved these changes Jun 1, 2026

View reviewed changes

sanil-23 dismissed coderabbitai[bot]’s stale review via a1b5484 June 1, 2026 17:50

graycyrus reviewed Jun 1, 2026

View reviewed changes

sanil-23 and others added 2 commits June 1, 2026 23:49

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

sanil-23 and others added 3 commits June 2, 2026 01:14

sanil-23 requested a review from graycyrus June 1, 2026 21:26

sanil-23 merged commit 511ad84 into tinyhumansai:main Jun 1, 2026
35 of 36 checks passed

senamakel pushed a commit to senamakel/openhuman that referenced this pull request Jun 2, 2026

test: green Rust Core Coverage (stale assertions + env-race serializa…

ea1a5b8

…tion) (tinyhumansai#3156) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot mentioned this pull request Jun 2, 2026

refactor(subconscious): replace task system with agent-per-tick model #3079

Merged

12 tasks

sanil-23 mentioned this pull request Jun 2, 2026

bug: Rust Core Coverage CI broken on main since PR #3023 — inference_local_admin_raw_coverage_e2e regressions #3061

Closed

Conversation

sanil-23 commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Verification

Submission Checklist

Impact

Related

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

sanil-23 commented Jun 1, 2026

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

sanil-23 commented Jun 1, 2026

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sanil-23 commented Jun 1, 2026

Uh oh!

sanil-23 commented Jun 1, 2026

Rebased on latest main — conflicts resolved, CI green ✅

Conflict resolution

What this PR still uniquely carries (diff vs main)

CI

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sanil-23 commented Jun 1, 2026 •

edited

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

Rebased on latest `main` — conflicts resolved, CI green ✅

What this PR still uniquely carries (diff vs `main`)