Skip to content

fix(agent): halt agent loop on first backend user-state failure (TAURI-RUST-5KG) #3334

Closed
CodeGhost21 wants to merge 4 commits into tinyhumansai:main from CodeGhost21:fix/sentry-tauri-rust-5kg-halt-on-user-state


@CodeGhost21 CodeGhost21 commented Jun 4, 2026

Summary

  • The previous attempt at TAURI-RUST-5KG (#3318, closed) routed BackendUserStateError away from report_error so identical retries stopped flooding Sentry — silencing the alarm without putting out the fire. The actual bug is upstream: when web_search_tool hits 400 Insufficient balance, the agent retries 19× per turn because the underlying condition (empty wallet) is global and the existing (tool, args)-coupled breaker can't catch varied queries.
  • Real fix: stop the runaway. Embed a BACKEND_USER_STATE_MARKER in the failed tool result and halt the agent loop on the first occurrence with a "user must act" summary — same idea as the existing POLICY_BLOCKED_MARKER / POLICY_DENIED_MARKER pattern, but more aggressive because the condition can never be resolved by retry.
  • Route the (single) terminal capture through report_error_or_expected instead of silently skipping. The existing breadcrumb classifier already knows BackendUserError / BudgetExhausted / ProviderUserState → demoted to warn-level. Net: ~19 always-error captures per turn → 1 classified breadcrumb per turn, AND the agent stops retrying.

Problem

Sentry TAURI-RUST-5KG: tool=web_search_tool, operation=execute, iteration=19, ~1860 events / 9 users on Windows releases 0.56.0 → 0.57.13. The closed PR #3318 correctly identified that the integrations breadcrumb classifier already handled Insufficient balance as a user-state error, but that the classification was flattened when IntegrationClient::{post,get} bailed with anyhow!. That PR typed the error at the boundary (BackendUserStateError) and made run_one_tool downcast.

But the closed PR's chosen exit was to skip report_error for the user-state case. That makes Sentry quiet — and leaves the agent retrying 19 times because the per-(tool, args) REPEAT_FAILURE_THRESHOLD = 3 doesn't trip when the LLM varies the search query, and the consecutive NO_PROGRESS_FAILURE_THRESHOLD = 6 resets on any interleaved success. The user objected that "ignoring an error is not fixing it" — they are right; the bug is the retry loop, not the captured event count.
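To make the gap concrete, here is a minimal sketch of why neither count-based gate trips. `CountingGuard` and `replay_varied_queries` are hypothetical stand-ins, not the real `RepeatFailureGuard`; only the two threshold values come from the description above, and the spacing of the interleaved successes is an assumption chosen for illustration.

```rust
use std::collections::HashMap;

// Threshold values quoted in the PR description.
const REPEAT_FAILURE_THRESHOLD: u32 = 3;
const NO_PROGRESS_FAILURE_THRESHOLD: u32 = 6;

/// Hypothetical, simplified model of the two count-based gates.
#[derive(Default)]
struct CountingGuard {
    per_call_failures: HashMap<(String, String), u32>,
    consecutive_failures: u32,
}

impl CountingGuard {
    /// Records one tool call and returns true when either gate trips.
    fn record(&mut self, tool: &str, args: &str, success: bool) -> bool {
        if success {
            // Any success resets the no-progress counter.
            self.consecutive_failures = 0;
            return false;
        }
        self.consecutive_failures += 1;
        let count = self
            .per_call_failures
            .entry((tool.to_string(), args.to_string()))
            .or_insert(0);
        *count += 1;
        let failures_for_call = *count;
        failures_for_call >= REPEAT_FAILURE_THRESHOLD
            || self.consecutive_failures >= NO_PROGRESS_FAILURE_THRESHOLD
    }
}

/// Replays one turn: 19 failed searches, each with a different query, and
/// an unrelated successful call after every fifth failure (assumed spacing).
/// Returns whether the breaker ever tripped.
fn replay_varied_queries() -> bool {
    let mut guard = CountingGuard::default();
    let mut tripped = false;
    for i in 0..19 {
        let query = format!("query variant {i}");
        tripped |= guard.record("web_search_tool", &query, false);
        if (i + 1) % 5 == 0 {
            tripped |= guard.record("read_file", "notes.md", true);
        }
    }
    tripped
}
```

In this model every `(tool, args)` pair fails exactly once and the consecutive-failure counter never exceeds five, so all 19 failures pass through without a halt. Three failures with identical arguments would still trip the first gate.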

Solution

Three coordinated changes, building on the typed BackendUserStateError boundary work from the closed PR (cherry-picked into the first commit of this PR so the boundary classification remains the source of truth):

src/openhuman/agent/harness/tool_loop.rs

  • New pub(crate) const BACKEND_USER_STATE_MARKER: &str = "[backend-user-state]". Module docstring spells out why it halts on first occurrence (unlike HARD_REJECT_REPEAT_THRESHOLD, which lets the model see one error and pivot) — the condition is global, retries with different args or different paid tools can't help, only the user can.
  • RepeatFailureGuard::record now checks for the marker before the existing (tool, args) repeat-threshold gate and the hard-reject branch. When present and success == false, it returns a halt summary on the very first occurrence with the actionable error preserved.
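The ordering described in these two bullets can be sketched roughly as follows. `GuardSketch` is a hypothetical reduction of `RepeatFailureGuard`: the halt wording is illustrative, the hard-reject and no-progress gates are omitted, and the marker is stripped from the summary as the review further down requests.

```rust
use std::collections::HashMap;

// Marker value as given in the PR description.
const BACKEND_USER_STATE_MARKER: &str = "[backend-user-state]";
const REPEAT_FAILURE_THRESHOLD: u32 = 3;

/// Hypothetical, reduced stand-in for RepeatFailureGuard.
#[derive(Default)]
struct GuardSketch {
    failures: HashMap<(String, String), u32>,
}

impl GuardSketch {
    /// Returns Some(halt summary) when the agent loop should stop.
    fn record(&mut self, tool: &str, args: &str, success: bool, result: &str) -> Option<String> {
        if success {
            return None;
        }
        // 1. Marker check runs first, before any counting.
        if result.contains(BACKEND_USER_STATE_MARKER) {
            // Strip the internal routing token so it does not leak
            // into the model-visible summary.
            let reason = result
                .replace(BACKEND_USER_STATE_MARKER, "")
                .trim()
                .to_string();
            return Some(format!(
                "Stopping: the `{tool}` call failed with a condition only \
                 the user can resolve. Reason:\n{reason}"
            ));
        }
        // 2. Otherwise fall through to the per-(tool, args) threshold.
        let count = self
            .failures
            .entry((tool.to_string(), args.to_string()))
            .or_insert(0);
        *count += 1;
        let n = *count;
        if n >= REPEAT_FAILURE_THRESHOLD {
            return Some(format!(
                "Stopping: `{tool}` failed {n} times with identical arguments."
            ));
        }
        None
    }
}
```

Because the marker branch returns before any counter is touched, a marked failure halts regardless of how the arguments vary, while unmarked failures keep the existing three-strike behaviour.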

src/openhuman/agent/harness/engine/tools.rs

  • Ok(Err(e)) branch: when is_backend_user_state_error(&e) is true, prefix the LLM-visible result text with BACKEND_USER_STATE_MARKER (so the breaker sees it) AND call report_error_or_expected instead of report_error. The classifier demotes BackendUserError / BudgetExhausted / ProviderUserState to warn-level — Sentry still sees one breadcrumb per turn (we're not suppressing observability), just classified correctly. System / 5xx / non-classifiable errors continue using report_error with the always-capture path, unchanged.
  • New outcome=failed_user_state tag distinguishes the demoted path from outcome=failed in dashboards.
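A rough sketch of the branch described above, using only the standard library. The real code works on `anyhow::Error` and calls `report_error_or_expected` / `report_error`; here `render_tool_failure` and the `Report` enum are hypothetical names that merely record which path would be taken.

```rust
use std::error::Error;
use std::fmt;

const BACKEND_USER_STATE_MARKER: &str = "[backend-user-state]";

/// Stand-in for the typed boundary error (fields are assumed).
#[derive(Debug)]
struct BackendUserStateError(String);

impl fmt::Display for BackendUserStateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for BackendUserStateError {}

/// Hypothetical: which reporting path the failure would take.
#[derive(Debug, PartialEq)]
enum Report {
    /// Classified path (report_error_or_expected in the PR).
    Demoted,
    /// Always-capture path (report_error in the PR).
    Captured,
}

/// Builds the LLM-visible failure text and picks the reporting path.
fn render_tool_failure(tool: &str, err: &(dyn Error + 'static)) -> (String, Report) {
    let text = format!("Error executing {tool}: {err}");
    if err.downcast_ref::<BackendUserStateError>().is_some() {
        // Prefix the marker so the loop breaker can see it.
        (format!("{BACKEND_USER_STATE_MARKER} {text}"), Report::Demoted)
    } else {
        (text, Report::Captured)
    }
}
```

Only typed user-state errors get the prefix and the demoted path; anything else, such as a transport error, keeps its text unmarked and stays on the always-capture path.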

src/openhuman/agent/harness/tool_loop_tests.rs

Four new RepeatFailureGuard tests, all in the same style as the existing hard_reject_* tests:

| Test | Pins |
| --- | --- |
| backend_user_state_marker_halts_on_first_occurrence | first-occurrence halt + halt-summary shape (labels failure class, tells model to surface, preserves root cause) |
| backend_user_state_marker_halts_regardless_of_args | not coupled to (tool, args) repetition (the closed-PR shortcoming) |
| backend_user_state_marker_takes_precedence_over_generic_threshold | ordering: marker check fires before the count-based gates, so count=1 < REPEAT_FAILURE_THRESHOLD=3 still trips |
| backend_user_state_unmarked_failures_use_normal_threshold | regression guard: ordinary failures still need 3 identical retries to trip (no behaviour widening) |

Design notes

  • Marker pattern was chosen over extending ToolRunResult with a typed failure_kind enum so the contract mirrors the existing POLICY_BLOCKED_MARKER / POLICY_DENIED_MARKER convention exactly. The breaker already string-scans for those; adding a third marker is a 6-line change.
  • The marker is added in run_one_tool's Ok(Err(e)) branch — the same branch the integrations tools surface through via ?. Other tool paths (e.g. tools that wrap errors as Ok(ToolResult { is_error: true, … })) are untouched here; they don't carry the typed BackendUserStateError because the type info is already lost by the time they format the body, and the existing thresholds catch their identical retries fine. If a future tool needs the typed marker, the fix is at its boundary, not here.
  • Halting on first occurrence is intentionally more aggressive than HARD_REJECT_REPEAT_THRESHOLD = 2. Hard rejects let the model see one block and pivot (the LLM might try a different file, different command). For backend-user-state, pivot is futile — the wallet is empty for all paid tools. The halt summary explicitly tells the model to surface, not retry.
  • No agent-side suppression. report_error_or_expected runs the same classifier the integrations client already used at the breadcrumb site; if the body somehow doesn't classify (network bug, malformed wrap), it falls through to report_error_message and Sentry sees the unclassified event. Defense in depth — no silent black-hole.
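The fall-through in the last note can be illustrated with a small sketch. The three kind names come from this PR; the substring rules in `classify`, the mapping of each message to a kind, and the `Sink` enum are assumptions made for illustration and are not the real classifier.

```rust
/// Kind names as listed in the PR description.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ExpectedKind {
    BackendUserError,
    BudgetExhausted,
    ProviderUserState,
}

/// Hypothetical: where the event ends up.
#[derive(Debug, PartialEq)]
enum Sink {
    /// Demoted to a warn-level breadcrumb.
    WarnBreadcrumb(ExpectedKind),
    /// Unclassified: captured as a full error event.
    ErrorCapture,
}

/// Assumed substring rules; the real matching logic is not shown in the PR.
fn classify(message: &str) -> Option<ExpectedKind> {
    let m = message.to_lowercase();
    if m.contains("insufficient balance") {
        Some(ExpectedKind::BudgetExhausted)
    } else if m.contains("not enabled") {
        Some(ExpectedKind::ProviderUserState)
    } else if m.contains("missing required field") {
        Some(ExpectedKind::BackendUserError)
    } else {
        None
    }
}

/// Classified messages are demoted; everything else is still captured.
fn route(message: &str) -> Sink {
    match classify(message) {
        Some(kind) => Sink::WarnBreadcrumb(kind),
        None => Sink::ErrorCapture,
    }
}
```

The point of the shape is that there is no third outcome: a message either classifies and is demoted, or it does not and is captured, so nothing is dropped silently.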

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — 4 new RepeatFailureGuard tests pin the first-occurrence halt path, the (tool, args) decoupling, the precedence ordering vs. the generic threshold, and the negative-case regression guard. 7 client tests from the cherry-picked typed-error commit cover the four wrapped bail sites + Display preservation + chain-walk discrimination + 5xx negative.
  • N/A: behaviour-only change — Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md (no new user-visible feature; tightens an existing safety net)
  • N/A: behaviour-only change — All affected feature IDs from the matrix are listed in the PR description under ## Related
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: not a release-cut surface — Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md)
  • N/A: Sentry-tracked issue (TAURI-RUST-5KG), no GitHub issue — Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime: desktop core only — src/openhuman/agent/harness/tool_loop.rs + src/openhuman/agent/harness/engine/tools.rs + the cherry-picked src/openhuman/integrations/client.rs typed-error boundary. No frontend, no Tauri shell, no schema changes.
  • User-visible: agents that previously ground through 19 retries on an empty-wallet web_search_tool call now stop on the first failure and surface an "out of credits, please add credits" message to the user. Identical error text still flows through; the difference is that the loop terminates immediately instead of generating 18 more identical failures.
  • Performance: negligible — one extra is_backend_user_state_error(&e) downcast and one extra result.contains(BACKEND_USER_STATE_MARKER) string scan per failed tool call. Both run only on the error path.
  • Security/migration/compat: none. The marker prefix is purely additive in the LLM-visible result text and matches an existing string convention. Existing callers that scan tool results don't recognise the marker; they treat it as an opaque error prefix exactly as they treat [policy-blocked] today.

Related

  • Closes: N/A (Sentry TAURI-RUST-5KG, no linked GitHub issue)
  • Supersedes: #3318 (closed — silenced Sentry rather than fixing the runaway retry; the typed-error commit from that PR is preserved here as the first commit because the boundary classification is still correct)
  • Follow-up PR(s)/TODOs: a backend-driven pre-flight balance check could prevent the agent from picking web_search_tool at all when the wallet is empty — strictly orthogonal to this PR, which addresses the retry loop directly.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A (Sentry TAURI-RUST-5KG)

Commit & Branch

  • Branch: fix/sentry-tauri-rust-5kg-halt-on-user-state
  • Commit SHA: see PR head

Validation Run

  • N/A: Rust-only change — pnpm --filter openhuman-app format:check
  • N/A: Rust-only change — pnpm typecheck
  • Focused tests: cargo test --manifest-path Cargo.toml --lib agent::harness::tool_loop (26/26 pass, including 4 new), cargo test --manifest-path Cargo.toml --lib integrations::client (26/26 pass, no regression from cherry-picked typed-error commit), cargo test --manifest-path Cargo.toml --lib agent::harness (423/423 pass), cargo test --manifest-path Cargo.toml --lib observability:: (142/142 pass, classifier contract preserved)
  • Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml -- --check clean; cargo check --manifest-path Cargo.toml --tests clean
  • N/A: shell not touched — Tauri fmt/check (if changed)

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: agents that hit a typed BackendUserStateError from the integrations client (insufficient balance, missing required field, toolkit not enabled) now halt the loop on the first occurrence with a "user must act" summary, instead of grinding through up to 19 retries before the iteration cap.
  • User-visible effect: terminal "out of credits / disabled toolkit / sign in" failures surface immediately to the user instead of being preceded by 18 indistinguishable retry events. Error messages themselves are unchanged.

Parity Contract

  • Legacy behavior preserved: non-backend-user-state failures continue to use the existing 3-attempt REPEAT_FAILURE_THRESHOLD, 2-attempt HARD_REJECT_REPEAT_THRESHOLD, and 6-consecutive NO_PROGRESS_FAILURE_THRESHOLD gates verbatim. The marker check is added before those gates; when absent, control falls through and behaviour is unchanged.
  • Guard/fallback/dispatch parity checks: backend_user_state_unmarked_failures_use_normal_threshold pins that ordinary failures still need 3 identical retries (no widening); hard_reject_blocked_halts_on_first_repeat_not_third and hard_reject_denied_halts_on_first_repeat continue to pass unchanged, confirming the marker check doesn't perturb the existing hard-reject paths.

Duplicate / Superseded PR Handling

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of deterministic backend "user must act" failures (e.g., insufficient balance, toolkit disabled): operations now halt immediately on first detection and present a clear, actionable error instead of repeated retries.
    • Preserves the original actionable message while preventing repeated retry loops.
  • Tests

    • Added unit and integration tests to validate classification, halt-on-first-occurrence behavior, and correctness of user-facing error text.

…s Sentry (TAURI-RUST-5KG)

`web_search_tool` flooded Sentry with 1860 events / 9 users for backend
400 "Insufficient balance" — the agent retried 19 times in a single
turn, and each `Ok(Err(e))` in `agent::harness::engine::tools::run_one_tool`
called the always-capture `report_error`. The integrations breadcrumb
classifier already knew "Insufficient balance" was a user-state error
(see tinyhumansai#2809 / TAURI-RUST-4ZF), but that distinction was lost at the
`?` boundary because `IntegrationClient::{post,get}` flattened every
non-2xx into an opaque `anyhow::Error`.

Fix it at the architectural boundary that knows the difference:

- Add `BackendUserStateError` (Display + std::error::Error) and the
  `is_backend_user_state_error(&anyhow::Error)` helper to the
  integrations client. Wrap the four bail sites (post/get × non-2xx /
  envelope-failure) with the typed marker when
  `expected_error_kind` classifies the message as `BackendUserError`,
  `BudgetExhausted`, or `ProviderUserState`. Display string is
  preserved verbatim, so the ~30 existing callers that stringify the
  error see no behaviour change.

- In `agent::harness::engine::tools::run_one_tool`, downcast the
  bubbled error: when the typed marker is present, log a warn
  breadcrumb (no Sentry capture) and surface the message to the LLM
  via the existing `(text, false)` failure tuple. The circuit breaker
  still records the failure so 19 retries with the same user-state
  message trip out. Real failures (5xx, transport bugs, unknown
  envelope shapes) keep their `report_error` Sentry path unchanged.

Scope: every tool that calls the integrations backend benefits —
search/parallel/tinyfish, all composio, twilio, apify, google_places,
stock_prices, web3, linkedin_enrichment — not just `web_search_tool`.

Tests: 7 new cases in `client_tests.rs` pin both halves of the
contract: typed wrapping fires for 400/insufficient-balance,
403/toolkit-not-enabled, and 2xx-envelope user-state failures; 500s
remain un-typed; Display verbatim; `is_backend_user_state_error`
walks the chain so `.context()` wraps don't silently re-enable
capture.
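
For readers unfamiliar with the chain-walk mentioned above, here is a minimal standard-library sketch. The real helper takes `&anyhow::Error`; `WithContext` is a hypothetical wrapper standing in for what `.context()` produces, and the struct fields are assumed.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the typed error; Display prints the message verbatim.
#[derive(Debug)]
struct BackendUserStateError {
    message: String,
}

impl fmt::Display for BackendUserStateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for BackendUserStateError {}

/// Hypothetical context wrapper that keeps the original error as its source.
#[derive(Debug)]
struct WithContext {
    context: String,
    inner: Box<dyn Error + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.inner)
    }
}

/// Walks the source chain and reports whether any link is the typed error.
fn is_backend_user_state_error(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if e.downcast_ref::<BackendUserStateError>().is_some() {
            return true;
        }
        current = e.source();
    }
    false
}
```

Checking only the outermost error would miss the typed error once a caller adds context, which is the failure mode the commit message warns about. With `anyhow`, the same walk is typically written with `Error::chain()`.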
…I-RUST-5KG)

The previous attempt (tinyhumansai#3318, closed) routed `BackendUserStateError`
away from `report_error` so identical retries didn't flood Sentry.
That silenced the alarm but left the actual bug — the agent kept
retrying a terminally-failing tool 19 times per turn — running.

Real fix:

1. Embed a stable `BACKEND_USER_STATE_MARKER` in the tool result text
   whenever `run_one_tool` sees a `BackendUserStateError` propagated
   from the integrations client. The marker rides through
   `ToolRunResult.text` exactly like the existing
   `POLICY_BLOCKED_MARKER` / `POLICY_DENIED_MARKER` pattern.

2. Extend `RepeatFailureGuard::record` to halt on the **first**
   occurrence of the marker — not the second identical attempt
   (`HARD_REJECT_REPEAT_THRESHOLD = 2`), not the third
   (`REPEAT_FAILURE_THRESHOLD = 3`), not the sixth consecutive
   (`NO_PROGRESS_FAILURE_THRESHOLD = 6`). The underlying condition
   (empty wallet, disabled toolkit, expired session) is *global* —
   varying the query or pivoting to a different paid tool cannot
   resolve it. Halt immediately with a clear "user must act" summary
   so the agent reports back instead of grinding.

3. Replace the closed PR's silent skip with
   `report_error_or_expected`. The existing breadcrumb classifier
   demotes the `BackendUserError` / `BudgetExhausted` /
   `ProviderUserState` buckets to a single warn-level breadcrumb —
   observability still sees one classified event per turn instead
   of 19 always-error captures, but errors are not *suppressed*.

Net effect on TAURI-RUST-5KG (~1860 Sentry hits / 9 users):
~19 always-captured errors per turn → 1 classifier-demoted
breadcrumb per turn, AND the agent stops retrying and surfaces
the actionable message to the user.

Includes 4 new `RepeatFailureGuard` tests pinning: halt on first
occurrence; halt regardless of `(tool, args)`; precedence over the
generic 3-attempt threshold; unmarked failures still use the
3-attempt threshold (no behaviour widening).
@CodeGhost21 CodeGhost21 requested a review from a team June 4, 2026 07:39

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 558ec2c5-afba-49e3-8478-7cfce5ef40aa

📥 Commits

Reviewing files that changed from the base of the PR and between 63332ee and 7c6e883.

📒 Files selected for processing (1)
  • src/openhuman/agent/harness/tool_loop.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/agent/harness/tool_loop.rs

📝 Walkthrough

Walkthrough

This PR classifies backend "user-state" failures at the integration layer, returns a typed marker error, embeds a stable marker into tool error text in run_one_tool, and makes the RepeatFailureGuard halt immediately when that marker is seen.

Changes

Backend user-state marker and halt mechanism

| Layer / File(s) | Summary |
| --- | --- |
| User-state error classification and type definition: src/openhuman/integrations/client.rs, src/openhuman/integrations/mod.rs | Adds BackendUserStateError, classify_as_user_state, and is_backend_user_state_error; re-exports the new symbols. |
| Integration request failure handling and tests: src/openhuman/integrations/client.rs, src/openhuman/integrations/client_tests.rs | IntegrationClient::post/get build a bail_message, classify failures, and return typed BackendUserStateError for matching HTTP/envelope error cases; tests cover 400/403/500, envelope success:false, display format, and wrapped error detection. |
| Tool execution: embed marker on user-state errors: src/openhuman/agent/harness/engine/tools.rs | run_one_tool detects typed backend user-state errors, reports them via report_error_or_expected, and embeds BACKEND_USER_STATE_MARKER into the returned tool error text; non-user-state errors use existing reporting and generic messages. |
| Tool loop circuit breaker with marker detection and halt: src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/harness/tool_loop_tests.rs | Adds BACKEND_USER_STATE_MARKER constant and updates RepeatFailureGuard::record to remove the marker and return an immediate Some("Stopping: ...") halt summary when present. Tests verify first-occurrence halt, argument-independence, precedence over generic repeat thresholds, and unchanged behavior for unmarked failures. |

Sequence Diagrams

sequenceDiagram
    participant Backend as Backend Response
    participant IntClient as IntegrationClient
    participant Classify as classify_as_user_state
    participant RunTool as run_one_tool
    participant ToolLoop as RepeatFailureGuard

    Backend->>IntClient: !status.is_success() or success:false
    IntClient->>Classify: classify_as_user_state(message)
    alt is user-state failure
        Classify-->>IntClient: BackendUserStateError
        IntClient-->>RunTool: Err(user_state)
        RunTool->>RunTool: is_backend_user_state_error?
        RunTool->>RunTool: embed BACKEND_USER_STATE_MARKER
        RunTool-->>ToolLoop: tool error text + marker
    else non-user-state
        Classify-->>IntClient: None
        IntClient-->>RunTool: Err(generic)
        RunTool-->>ToolLoop: tool error text (no marker)
    end

    ToolLoop->>ToolLoop: detect marker in text?
    alt marker present
        ToolLoop->>ToolLoop: halt immediately
        ToolLoop-->>ToolLoop: return Some(Stopping: ...)
    else no marker
        ToolLoop->>ToolLoop: apply retry threshold
        ToolLoop-->>ToolLoop: return None or continue
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • oxoxDev
  • graycyrus

Poem

🐇 A tiny marker I did find,
tucked in an error, neatly signed.
"User must act" — the loop now knows,
it stops at once where caution grows.
Hops of code, a tidy bind.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main fix: halting the agent loop on first backend user-state failure, which is the core objective of all changes across five modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. sentry-traced-bug Bug identified via Sentry triage bug labels Jun 4, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/tool_loop.rs`:
- Around line 143-151: The halt message currently uses truncate_for_halt(result)
which may contain the internal token "[backend-user-state]"; strip or remove
that marker from the truncated text before formatting the user-visible halt
reason. In the block that constructs the message (the return Some(format!(...))
in tool_loop.rs), take the output of truncate_for_halt(result), remove the exact
substring "[backend-user-state]" (or use a safe replace/strip function) and then
interpolate the cleaned string into the format call so no internal routing token
is leaked to users.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb8570c7-81eb-4cb2-8dad-ab1141edfdd1

📥 Commits

Reviewing files that changed from the base of the PR and between 87a91ae and 97bb991.

📒 Files selected for processing (6)
  • src/openhuman/agent/harness/engine/tools.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/harness/tool_loop_tests.rs
  • src/openhuman/integrations/client.rs
  • src/openhuman/integrations/client_tests.rs
  • src/openhuman/integrations/mod.rs

Comment thread src/openhuman/agent/harness/tool_loop.rs

@YellowSnnowmann YellowSnnowmann left a comment


Review

Walkthrough

This PR correctly addresses TAURI-RUST-5KG at the root: the runaway retry loop (not just the Sentry noise). Typing the error at the HTTP boundary, embedding BACKEND_USER_STATE_MARKER, and halting RepeatFailureGuard on first occurrence is the right architecture — it mirrors the existing POLICY_BLOCKED_MARKER/POLICY_DENIED_MARKER pattern cleanly. Design is solid, CI is fully green (all 20 checks pass), and the test coverage is thorough.

One open issue needs to be addressed before merge — it's the same item CodeRabbit flagged.


🔴 Blocker — src/openhuman/agent/harness/tool_loop.rs:151

Internal routing token leaks into the user/LLM-visible halt message.

At the point the halt branch fires, result is:

[backend-user-state] Error executing web_search_tool: Backend returned 400 …: Insufficient balance

Passing it directly to truncate_for_halt(result) embeds the raw [backend-user-state] token in the message the model (and user) sees. The marker is an internal routing token — it has no meaning outside of RepeatFailureGuard and will read as noise or cause confusion.

Fix:

// before
if result.contains(BACKEND_USER_STATE_MARKER) {
    return Some(format!(
        "Stopping: the `{tool}` call returned a backend user-state error \
         — this is a deterministic condition that requires user action …\
         Reason:\n{}\n\n…",
        truncate_for_halt(result),
    ));
}

// after
if result.contains(BACKEND_USER_STATE_MARKER) {
    let clean_reason = result
        .replace(BACKEND_USER_STATE_MARKER, "")
        .trim()
        .to_string();
    return Some(format!(
        "Stopping: the `{tool}` call returned a backend user-state error \
         — this is a deterministic condition that requires user action …\
         Reason:\n{}\n\n…",
        truncate_for_halt(&clean_reason),
    ));
}

🟡 Minor — tool_loop_tests.rs — missing negative assertion in backend_user_state_marker_halts_on_first_occurrence

The three existing assert!(msg.contains(…)) checks pass even when the marker is present in the output. Once the fix above is applied, add a negative assertion to lock in the clean-output behavior:

assert!(
    !msg.contains(BACKEND_USER_STATE_MARKER),
    "internal routing token must not leak into user-visible halt message; got: {msg}"
);

Verified / looks good

  • classify_as_user_state correctly narrows to BackendUserError / BudgetExhausted / ProviderUserState — transport and session error kinds are excluded, keeping existing retry/re-auth machinery intact ✅
  • is_backend_user_state_error chain-walks anyhow so future .context() wraps don't silently re-enable Sentry capture ✅
  • All 4 bail sites in client.rs (POST non-2xx, POST envelope, GET non-2xx, GET envelope) covered ✅
  • Display impl on BackendUserStateError is identical to the original anyhow::bail! string — additive, no stringify callers broken ✅
  • backend_user_state_unmarked_failures_use_normal_threshold is the critical regression guard — ordinary 3-attempt threshold untouched ✅
  • outcome=failed_user_state Sentry tag is a clean dashboard win ✅

Just the one fix in tool_loop.rs + the test pin and this is ready.

…ssage

The internal routing token was passed through truncate_for_halt(result)
verbatim into the LLM/user-visible halt summary. Strip it with .replace()
+ .trim() before formatting so the message reads cleanly without the
[backend-user-state] prefix.

Also pins the clean-output invariant with a negative assertion in
backend_user_state_marker_halts_on_first_occurrence — addresses
@coderabbitai and @YellowSnnowmann review (TAURI-RUST-5KG).

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/openhuman/agent/harness/tool_loop.rs (1)

142-156: ⚡ Quick win

Consider adding debug-level logging when the marker triggers a halt.

The circuit breaker has multiple halt conditions (backend user-state, hard rejects, repeat threshold, no-progress) but only the caller logs a generic "circuit breaker tripped" message. Adding a debug log here when the marker is detected would help distinguish marker-based halts from other rules during troubleshooting, especially given the global-halt semantic is new.

🔍 Example logging addition
         if result.contains(BACKEND_USER_STATE_MARKER) {
+            tracing::debug!(
+                tool,
+                "[RepeatFailureGuard] backend user-state marker detected — halting on first occurrence"
+            );
             let clean_reason = result
                 .replace(BACKEND_USER_STATE_MARKER, "")
                 .trim()

As per coding guidelines: "Add substantial debug-level logs while implementing features or fixes in Rust using log / tracing crate with stable prefixes and correlation fields (request IDs, method names, entity IDs) for grep-friendly tracing"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/tool_loop.rs` around lines 142 - 156, The
BACKEND_USER_STATE_MARKER branch in the tool-halt logic (inside the if
result.contains(BACKEND_USER_STATE_MARKER) block) should emit a debug-level log
before returning so marker-based halts are distinguishable; add a
tracing::debug! (or log::debug!) call that logs a stable prefix like
"circuit_breaker:backend_user_state" and includes correlation fields (request
ID, method / handler name, tool name via `tool`, and a truncated reason using
`truncate_for_halt(&clean_reason)`) so operators can grep and correlate these
events; place this single debug call immediately after computing `clean_reason`
and before the existing return.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 081fc44c-3355-4b45-a464-7c6b54c1f259

📥 Commits

Reviewing files that changed from the base of the PR and between 97bb991 and 63332ee.

📒 Files selected for processing (2)
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/harness/tool_loop_tests.rs

coderabbitai Bot previously approved these changes Jun 4, 2026
… fires

Adds a tracing::debug! call at [circuit_breaker:backend_user_state] so
marker-triggered halts are distinguishable from generic repeat-threshold
or no-progress halts in operator traces — addresses CodeRabbit nitpick
on TAURI-RUST-5KG.
@senamakel
Member

as we spoke. serious errors like out of balance need to fail loudly.

@senamakel senamakel closed this Jun 4, 2026

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage


3 participants