fix(agent): halt agent loop on first backend user-state failure (TAURI-RUST-5KG)#3334
Conversation
…s Sentry (TAURI-RUST-5KG) `web_search_tool` flooded Sentry with 1860 events / 9 users for backend 400 "Insufficient balance" — the agent retried 19 times in a single turn, and each `Ok(Err(e))` in `agent::harness::engine::tools::run_one_tool` called the always-capture `report_error`. The integrations breadcrumb classifier already knew "Insufficient balance" was a user-state error (see tinyhumansai#2809 / TAURI-RUST-4ZF), but that distinction was lost at the `?` boundary because `IntegrationClient::{post,get}` flattened every non-2xx into an opaque `anyhow::Error`. Fix it at the architectural boundary that knows the difference: - Add `BackendUserStateError` (Display + std::error::Error) and the `is_backend_user_state_error(&anyhow::Error)` helper to the integrations client. Wrap the four bail sites (post/get × non-2xx / envelope-failure) with the typed marker when `expected_error_kind` classifies the message as `BackendUserError`, `BudgetExhausted`, or `ProviderUserState`. Display string is preserved verbatim, so the ~30 existing callers that stringify the error see no behaviour change. - In `agent::harness::engine::tools::run_one_tool`, downcast the bubbled error: when the typed marker is present, log a warn breadcrumb (no Sentry capture) and surface the message to the LLM via the existing `(text, false)` failure tuple. The circuit breaker still records the failure so 19 retries with the same user-state message trip out. Real failures (5xx, transport bugs, unknown envelope shapes) keep their `report_error` Sentry path unchanged. Scope: every tool that calls the integrations backend benefits — search/parallel/tinyfish, all composio, twilio, apify, google_places, stock_prices, web3, linkedin_enrichment — not just `web_search_tool`. Tests: 7 new cases in `client_tests.rs` pin both halves of the contract: typed wrapping fires for 400/insufficient-balance, 403/toolkit-not-enabled, and 2xx-envelope user-state failures; 500s remain un-typed; Display verbatim; `is_backend_user_state_error` walks the chain so `.context()` wraps don't silently re-enable capture.
…I-RUST-5KG) The previous attempt (tinyhumansai#3318, closed) routed `BackendUserStateError` away from `report_error` so identical retries didn't flood Sentry. That silenced the alarm but left the actual bug — the agent kept retrying a terminally-failing tool 19 times per turn — running. Real fix: 1. Embed a stable `BACKEND_USER_STATE_MARKER` in the tool result text whenever `run_one_tool` sees a `BackendUserStateError` propagated from the integrations client. The marker rides through `ToolRunResult.text` exactly like the existing `POLICY_BLOCKED_MARKER` / `POLICY_DENIED_MARKER` pattern. 2. Extend `RepeatFailureGuard::record` to halt on the **first** occurrence of the marker — not the second identical attempt (`HARD_REJECT_REPEAT_THRESHOLD = 2`), not the third (`REPEAT_FAILURE_THRESHOLD = 3`), not the sixth consecutive (`NO_PROGRESS_FAILURE_THRESHOLD = 6`). The underlying condition (empty wallet, disabled toolkit, expired session) is *global* — varying the query or pivoting to a different paid tool cannot resolve it. Halt immediately with a clear "user must act" summary so the agent reports back instead of grinding. 3. Replace the closed PR's silent skip with `report_error_or_expected`. The existing breadcrumb classifier demotes the `BackendUserError` / `BudgetExhausted` / `ProviderUserState` buckets to a single warn-level breadcrumb — observability still sees one classified event per turn instead of 19 always-error captures, but errors are not *suppressed*. Net effect on TAURI-RUST-5KG (~1860 Sentry hits / 9 users): ~19 always-captured errors per turn → 1 classifier-demoted breadcrumb per turn, AND the agent stops retrying and surfaces the actionable message to the user. Includes 4 new `RepeatFailureGuard` tests pinning: halt on first occurrence; halt regardless of `(tool, args)`; precedence over the generic 3-attempt threshold; unmarked failures still use the 3-attempt threshold (no behaviour widening).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR classifies backend "user-state" failures at the integration layer, returns a typed marker error, embeds a stable marker into tool error text in run_one_tool, and makes the RepeatFailureGuard halt immediately when that marker is seen. ChangesBackend user-state marker and halt mechanism
Sequence DiagramssequenceDiagram
participant Backend as Backend Response
participant IntClient as IntegrationClient
participant Classify as classify_as_user_state
participant RunTool as run_one_tool
participant ToolLoop as RepeatFailureGuard
Backend->>IntClient: !status.is_success() or success:false
IntClient->>Classify: classify_as_user_state(message)
alt is user-state failure
Classify-->>IntClient: BackendUserStateError
IntClient-->>RunTool: Err(user_state)
RunTool->>RunTool: is_backend_user_state_error?
RunTool->>RunTool: embed BACKEND_USER_STATE_MARKER
RunTool-->>ToolLoop: tool error text + marker
else non-user-state
Classify-->>IntClient: None
IntClient-->>RunTool: Err(generic)
RunTool-->>ToolLoop: tool error text (no marker)
end
ToolLoop->>ToolLoop: detect marker in text?
alt marker present
ToolLoop->>ToolLoop: halt immediately
ToolLoop-->>ToolLoop: return Some(Stopping: ...)
else no marker
ToolLoop->>ToolLoop: apply retry threshold
ToolLoop-->>ToolLoop: return None or continue
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/harness/tool_loop.rs`:
- Around line 143-151: The halt message currently uses truncate_for_halt(result)
which may contain the internal token "[backend-user-state]"; strip or remove
that marker from the truncated text before formatting the user-visible halt
reason. In the block that constructs the message (the return Some(format!(...))
in tool_loop.rs), take the output of truncate_for_halt(result), remove the exact
substring "[backend-user-state]" (or use a safe replace/strip function) and then
interpolate the cleaned string into the format call so no internal routing token
is leaked to users.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eb8570c7-81eb-4cb2-8dad-ab1141edfdd1
📒 Files selected for processing (6)
src/openhuman/agent/harness/engine/tools.rssrc/openhuman/agent/harness/tool_loop.rssrc/openhuman/agent/harness/tool_loop_tests.rssrc/openhuman/integrations/client.rssrc/openhuman/integrations/client_tests.rssrc/openhuman/integrations/mod.rs
YellowSnnowmann
left a comment
There was a problem hiding this comment.
Review
Walkthrough
This PR correctly addresses TAURI-RUST-5KG at the root: the runaway retry loop (not just the Sentry noise). Typing the error at the HTTP boundary, embedding BACKEND_USER_STATE_MARKER, and halting RepeatFailureGuard on first occurrence is the right architecture — it mirrors the existing POLICY_BLOCKED_MARKER/POLICY_DENIED_MARKER pattern cleanly. Design is solid, CI is fully green (all 20 checks pass), and the test coverage is thorough.
One open issue needs to be addressed before merge — it's the same item CodeRabbit flagged.
🔴 Blocker — src/openhuman/agent/harness/tool_loop.rs:151
Internal routing token leaks into the user/LLM-visible halt message.
At the point the halt branch fires, result is:
[backend-user-state] Error executing web_search_tool: Backend returned 400 …: Insufficient balance
Passing it directly to truncate_for_halt(result) embeds the raw [backend-user-state] token in the message the model (and user) sees. The marker is an internal routing token — it has no meaning outside of RepeatFailureGuard and will read as noise or cause confusion.
Fix:
// before
if result.contains(BACKEND_USER_STATE_MARKER) {
return Some(format!(
"Stopping: the `{tool}` call returned a backend user-state error \
— this is a deterministic condition that requires user action …\
Reason:\n{}\n\n…",
truncate_for_halt(result),
));
}
// after
if result.contains(BACKEND_USER_STATE_MARKER) {
let clean_reason = result
.replace(BACKEND_USER_STATE_MARKER, "")
.trim()
.to_string();
return Some(format!(
"Stopping: the `{tool}` call returned a backend user-state error \
— this is a deterministic condition that requires user action …\
Reason:\n{}\n\n…",
truncate_for_halt(&clean_reason),
));
}🟡 Minor — tool_loop_tests.rs — missing negative assertion in backend_user_state_marker_halts_on_first_occurrence
The three existing assert!(msg.contains(…)) checks pass even when the marker is present in the output. Once the fix above is applied, add a negative assertion to lock in the clean-output behavior:
assert!(
!msg.contains(BACKEND_USER_STATE_MARKER),
"internal routing token must not leak into user-visible halt message; got: {msg}"
);Verified / looks good
classify_as_user_statecorrectly narrows toBackendUserError/BudgetExhausted/ProviderUserState— transport and session error kinds are excluded, keeping existing retry/re-auth machinery intact ✅is_backend_user_state_errorchain-walks anyhow so future.context()wraps don't silently re-enable Sentry capture ✅- All 4 bail sites in
client.rs(POST non-2xx, POST envelope, GET non-2xx, GET envelope) covered ✅ Displayimpl onBackendUserStateErroris identical to the originalanyhow::bail!string — additive, no stringify callers broken ✅backend_user_state_unmarked_failures_use_normal_thresholdis the critical regression guard — ordinary 3-attempt threshold untouched ✅outcome=failed_user_stateSentry tag is a clean dashboard win ✅
Just the one fix in tool_loop.rs + the test pin and this is ready.
…ssage The internal routing token was passed through truncate_for_halt(result) verbatim into the LLM/user-visible halt summary. Strip it with .replace() + .trim() before formatting so the message reads cleanly without the [backend-user-state] prefix. Also pins the clean-output invariant with a negative assertion in backend_user_state_marker_halts_on_first_occurrence — addresses @coderabbitai and @YellowSnnowmann review (TAURI-RUST-5KG).
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/openhuman/agent/harness/tool_loop.rs (1)
142-156: ⚡ Quick winConsider adding debug-level logging when the marker triggers a halt.
The circuit breaker has multiple halt conditions (backend user-state, hard rejects, repeat threshold, no-progress) but only the caller logs a generic "circuit breaker tripped" message. Adding a debug log here when the marker is detected would help distinguish marker-based halts from other rules during troubleshooting, especially given the global-halt semantic is new.
🔍 Example logging addition
if result.contains(BACKEND_USER_STATE_MARKER) { + tracing::debug!( + tool, + "[RepeatFailureGuard] backend user-state marker detected — halting on first occurrence" + ); let clean_reason = result .replace(BACKEND_USER_STATE_MARKER, "") .trim()As per coding guidelines: "Add substantial debug-level logs while implementing features or fixes in Rust using
log/tracingcrate with stable prefixes and correlation fields (request IDs, method names, entity IDs) for grep-friendly tracing"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/harness/tool_loop.rs` around lines 142 - 156, The BACKEND_USER_STATE_MARKER branch in the tool-halt logic (inside the if result.contains(BACKEND_USER_STATE_MARKER) block) should emit a debug-level log before returning so marker-based halts are distinguishable; add a tracing::debug! (or log::debug!) call that logs a stable prefix like "circuit_breaker:backend_user_state" and includes correlation fields (request ID, method / handler name, tool name via `tool`, and a truncated reason using `truncate_for_halt(&clean_reason)`) so operators can grep and correlate these events; place this single debug call immediately after computing `clean_reason` and before the existing return.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/agent/harness/tool_loop.rs`:
- Around line 142-156: The BACKEND_USER_STATE_MARKER branch in the tool-halt
logic (inside the if result.contains(BACKEND_USER_STATE_MARKER) block) should
emit a debug-level log before returning so marker-based halts are
distinguishable; add a tracing::debug! (or log::debug!) call that logs a stable
prefix like "circuit_breaker:backend_user_state" and includes correlation fields
(request ID, method / handler name, tool name via `tool`, and a truncated reason
using `truncate_for_halt(&clean_reason)`) so operators can grep and correlate
these events; place this single debug call immediately after computing
`clean_reason` and before the existing return.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 081fc44c-3355-4b45-a464-7c6b54c1f259
📒 Files selected for processing (2)
src/openhuman/agent/harness/tool_loop.rssrc/openhuman/agent/harness/tool_loop_tests.rs
… fires Adds a tracing::debug! call at [circuit_breaker:backend_user_state] so marker-triggered halts are distinguishable from generic repeat-threshold or no-progress halts in operator traces — addresses CodeRabbit nitpick on TAURI-RUST-5KG.
|
as we spoke. serious errors like out of balance need to fail loudly. |
Summary
BackendUserStateErroraway fromreport_errorso identical retries stopped flooding Sentry — silencing the alarm without putting out the fire. The actual bug is upstream: whenweb_search_toolhits400 Insufficient balance, the agent retries 19× per turn because the underlying condition (empty wallet) is global and the existing(tool, args)-coupled breaker can't catch varied queries.BACKEND_USER_STATE_MARKERin the failed tool result and halt the agent loop on the first occurrence with a "user must act" summary — same idea as the existingPOLICY_BLOCKED_MARKER/POLICY_DENIED_MARKERpattern, but more aggressive because the condition can never be resolved by retry.report_error_or_expectedinstead of silently skipping. The existing breadcrumb classifier already knowsBackendUserError/BudgetExhausted/ProviderUserState→ demoted to warn-level. Net: ~19 always-error captures per turn → 1 classified breadcrumb per turn, AND the agent stops retrying.Problem
Sentry TAURI-RUST-5KG —
tool=web_search_tool,operation=execute,iteration=19, ~1860 events / 9 users on Windows releases 0.56.0 → 0.57.13. The closed PR #3318 correctly identified that the integrations breadcrumb classifier already handledInsufficient balanceas a user-state error, then flattened the classification whenIntegrationClient::{post,get}bailed withanyhow!. It typed the error at the boundary (BackendUserStateError) and maderun_one_tooldowncast.But the closed PR's chosen exit was to skip
report_errorfor the user-state case. That makes Sentry quiet — and leaves the agent retrying 19 times because the per-(tool, args)REPEAT_FAILURE_THRESHOLD = 3doesn't trip when the LLM varies the search query, and the consecutiveNO_PROGRESS_FAILURE_THRESHOLD = 6resets on any interleaved success. The user objected that "ignoring an error is not fixing it" — they are right; the bug is the retry loop, not the captured event count.Solution
Three coordinated changes, building on the typed
BackendUserStateErrorboundary work from the closed PR (cherry-picked into the first commit of this PR so the boundary classification remains the source of truth):src/openhuman/agent/harness/tool_loop.rspub(crate) const BACKEND_USER_STATE_MARKER: &str = "[backend-user-state]". Module docstring spells out why it halts on first occurrence (unlikeHARD_REJECT_REPEAT_THRESHOLD, which lets the model see one error and pivot) — the condition is global, retries with different args or different paid tools can't help, only the user can.RepeatFailureGuard::recordnow checks for the marker before the existing(tool, args)repeat-threshold gate and the hard-reject branch. When present andsuccess == false, it returns a halt summary on the very first occurrence with the actionable error preserved.src/openhuman/agent/harness/engine/tools.rsOk(Err(e))branch: whenis_backend_user_state_error(&e)is true, prefix the LLM-visible result text withBACKEND_USER_STATE_MARKER(so the breaker sees it) AND callreport_error_or_expectedinstead ofreport_error. The classifier demotesBackendUserError/BudgetExhausted/ProviderUserStateto warn-level — Sentry still sees one breadcrumb per turn (we're not suppressing observability), just classified correctly. System / 5xx / non-classifiable errors continue usingreport_errorwith the always-capture path, unchanged.outcome=failed_user_statetag distinguishes the demoted path fromoutcome=failedin dashboards.src/openhuman/agent/harness/tool_loop_tests.rsFour new
RepeatFailureGuardtests, all in the same style as the existinghard_reject_*tests:backend_user_state_marker_halts_on_first_occurrencebackend_user_state_marker_halts_regardless_of_args(tool, args)repetition (the closed-PR shortcoming)backend_user_state_marker_takes_precedence_over_generic_thresholdcount=1 < REPEAT_FAILURE_THRESHOLD=3still tripsbackend_user_state_unmarked_failures_use_normal_thresholdDesign notes
ToolRunResultwith a typedfailure_kindenum so the contract mirrors the existingPOLICY_BLOCKED_MARKER/POLICY_DENIED_MARKERconvention exactly. The breaker already string-scans for those; adding a third marker is a 6-line change.run_one_tool'sOk(Err(e))branch — the same branch the integrations tools surface through via?. Other tool paths (e.g. tools that wrap errors asOk(ToolResult { is_error: true, … })) are untouched here; they don't carry the typedBackendUserStateErrorbecause the type info is already lost by the time they format the body, and the existing thresholds catch their identical retries fine. If a future tool needs the typed marker, the fix is at its boundary, not here.HARD_REJECT_REPEAT_THRESHOLD = 2. Hard rejects let the model see one block and pivot (the LLM might try a different file, different command). For backend-user-state, pivot is futile — the wallet is empty for all paid tools. The halt summary explicitly tells the model to surface, not retry.report_error_or_expectedruns the same classifier the integrations client already used at the breadcrumb site; if the body somehow doesn't classify (network bug, malformed wrap), it falls through toreport_error_messageand Sentry sees the unclassified event. Defense in depth — no silent black-hole.Submission Checklist
RepeatFailureGuardtests pin the first-occurrence halt path, the (tool, args) decoupling, the precedence ordering vs. the generic threshold, and the negative-case regression guard. 7 client tests from the cherry-picked typed-error commit cover the four wrapped bail sites + Display preservation + chain-walk discrimination + 5xx negative.docs/TEST-COVERAGE-MATRIX.md(no new user-visible feature; tightens an existing safety net)## Relateddocs/RELEASE-MANUAL-SMOKE.md)Closes #NNNin the## RelatedsectionImpact
src/openhuman/agent/harness/tool_loop.rs+src/openhuman/agent/harness/engine/tools.rs+ the cherry-pickedsrc/openhuman/integrations/client.rstyped-error boundary. No frontend, no Tauri shell, no schema changes.web_search_toolcall now stop on the first failure and surface a "out of credits — please add credits" message to the user. Identical error text still flows through; the difference is the loop terminating immediately instead of generating 18 more identical failures.is_backend_user_state_error(&e)downcast and one extraresult.contains(BACKEND_USER_STATE_MARKER)string scan per failed tool call. Both run only on the error path.[policy-blocked]today.Related
web_search_toolat all when the wallet is empty — strictly orthogonal to this PR, which addresses the retry loop directly.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/sentry-tauri-rust-5kg-halt-on-user-stateValidation Run
pnpm --filter openhuman-app format:checkpnpm typecheckcargo test --manifest-path Cargo.toml --lib agent::harness::tool_loop(26/26 pass, including 4 new),cargo test --manifest-path Cargo.toml --lib integrations::client(26/26 pass, no regression from cherry-picked typed-error commit),cargo test --manifest-path Cargo.toml --lib agent::harness(423/423 pass),cargo test --manifest-path Cargo.toml --lib observability::(142/142 pass, classifier contract preserved)cargo fmt --manifest-path Cargo.toml -- --checkclean;cargo check --manifest-path Cargo.toml --testscleanValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
BackendUserStateErrorfrom the integrations client (insufficient balance, missing required field, toolkit not enabled) now halt the loop on the first occurrence with a "user must act" summary, instead of grinding through up to 19 retries before the iteration cap.Parity Contract
REPEAT_FAILURE_THRESHOLD, 2-attemptHARD_REJECT_REPEAT_THRESHOLD, and 6-consecutiveNO_PROGRESS_FAILURE_THRESHOLDgates verbatim. The marker check is added before those gates; when absent, control falls through and behaviour is unchanged.backend_user_state_unmarked_failures_use_normal_thresholdpins that ordinary failures still need 3 identical retries (no widening);hard_reject_blocked_halts_on_first_repeat_not_thirdandhard_reject_denied_halts_on_first_repeatcontinue to pass unchanged, confirming the marker check doesn't perturb the existing hard-reject paths.Duplicate / Superseded PR Handling
Summary by CodeRabbit
Bug Fixes
Tests