fix(channels): distinguish rate-limit sources in chat error classifier (#2364) #2371

Open
CodeGhost21 wants to merge 2 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/2364-rate-limit-classification

Conversation

@CodeGhost21
Contributor

@CodeGhost21 CodeGhost21 commented May 20, 2026

Summary

  • classify_inference_error in web.rs used to tag any error string containing "rate limit" as rate_limited, including the SecurityPolicy hourly-cap strings emitted by built-in tools — so users saw "Your AI provider is rate-limiting you" copy for an OpenHuman-local safety cap.
  • Split the catch-all branch into three sources (action_budget_exceeded, max_iterations, rate_limited) and surface Retry-After seconds in the user-facing message when the upstream supplies them.
  • No frontend change needed — the new copy flows through the existing chat_error.message forwarder in ChatRuntimeProvider.

Problem

Issue #2364 reports that one chat thread shows "You're being rate-limited" while a fresh thread works. Tracing the frontend (ChatRuntimeProvider.onError) showed no per-thread sticky state — inferenceStatus, inferenceTurnLifecycle, and activeThreadId all clear on chat_error. The stickiness is perceived, driven by:

  1. Generic copy that doesn't say "you can retry in this thread".
  2. No retry-after window, so the user has no idea when to try again.
  3. The same generic copy used regardless of whether the limit came from the upstream LLM provider, the SecurityPolicy budget on tool calls, or the agent-loop max-iterations cap.

Solution

Three new / reshaped branches in classify_inference_error:

| Source | Classification | User-facing copy |
| --- | --- | --- |
| SecurityPolicy hourly action budget (web_fetch / curl / http_request / etc.) | action_budget_exceeded | "You've hit OpenHuman's per-hour action budget — this is a local safety cap, not your AI provider. The window decays gradually; you can keep chatting in this thread …" |
| Agent-loop max-iterations cap ("Agent exceeded maximum tool iterations") | max_iterations | "The agent ran the maximum number of tool steps for one turn … You can retry the same question in this thread once the underlying limit clears." |
| Upstream provider 429 / "rate limit" | rate_limited (enriched) | "Your AI provider is rate-limiting requests. This is a transient upstream limit, not a thread-level block — you can retry in this thread. Try again in 30 seconds." Retry-after is parsed from Retry-After: / retry_after: header bodies; fractional values are rounded up. |

The SecurityPolicy and max-iterations checks run BEFORE the provider-429 branch, so the "rate limit" substring in their error strings no longer leaks into the wrong bucket. Existing rate_limited callers (FE handler, tests) continue to receive the same error_type token; only the human-readable message changed.
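
The ordering can be illustrated with a minimal sketch. It is not the code in web.rs: the function name `classify_source` is hypothetical, and the real classifier also handles timeout, auth, and other branches that are omitted here. The matched strings and returned tokens are the ones quoted in this PR.

```rust
/// Hypothetical, simplified stand-in for the branch ordering described above.
/// Returns the `error_type` token only; message copy is omitted.
fn classify_source(err: &str) -> &'static str {
    let lower = err.to_lowercase();

    // 1. OpenHuman-local SecurityPolicy strings. These contain "rate limit",
    //    so they must be checked before the provider branch.
    const ACTION_BUDGET: [&str; 3] = [
        "rate limit exceeded: action budget exhausted",
        "rate limit exceeded: too many actions in the last hour",
        "action blocked: rate limit exceeded",
    ];
    if ACTION_BUDGET.iter().any(|p| lower.contains(*p)) {
        return "action_budget_exceeded";
    }

    // 2. Agent-loop cap on tool steps per turn.
    if lower.contains("agent exceeded maximum tool iterations") {
        return "max_iterations";
    }

    // 3. Upstream provider throttling.
    if lower.contains("rate limit") || lower.contains("429") {
        return "rate_limited";
    }

    // Anything else falls through to the generic bucket.
    "inference"
}
```

If the provider check ran first, the SecurityPolicy strings would match its "rate limit" test and be tagged rate_limited, which is the misclassification this PR removes.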

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason.

  • Tests added or updated (happy path + at least one failure / edge case) — 5 new tests + 2 pre-existing classifier tests; cargo test --lib classify_inference_error → 7 passed.
  • Diff coverage ≥ 80% — every new branch and the new retry-after helper has at least one focused test. No untested branches added.
  • Coverage matrix updated — N/A: behaviour-only change to existing classifier output; no new feature row added/removed/renamed.
  • All affected feature IDs from the matrix are listed in ## Related — N/A: no matrix row touched.
  • No new external network dependencies introduced — pure string classification, no IO.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: error-message wording only; happy-path chat behaviour is unchanged.
  • Linked issue closed via Closes #NNN in the ## Related section — see below.

Impact

  • Runtime/platform: backend chat error path (channels::providers::web::classify_inference_error).
  • User-visible: when the agent hits a SecurityPolicy budget cap, the chat surface now says "OpenHuman's per-hour action budget" instead of mislabeling the cap as an upstream provider 429. Provider 429s now include a parsed retry-after window when the upstream supplies one.
  • Performance / security: negligible runtime cost (in-memory string classification only, no I/O). No new error data is exposed; the retry-after parser only reads what's already in the error body.
  • Migration / compatibility: error_type tokens are unchanged for upstream consumers; only the message body is reworded. FE error forwarder picks up the new copy automatically.

Tests

cargo test --lib classify_inference_error → 7 passed, 0 failed (5 new, 2 pre-existing).

| New test | Covers |
| --- | --- |
| classify_inference_error_distinguishes_action_budget_from_provider_429 | all three SecurityPolicy strings classify as action_budget_exceeded; copy says "local safety cap" + "can keep chatting in this thread" |
| classify_inference_error_max_iterations_gets_dedicated_branch | flattened web-channel error wrapper resolves to max_iterations with the same-thread recovery hint |
| classify_inference_error_rate_limited_surfaces_retry_after_seconds | 30-second retry-after appears verbatim; thread-recovery hint present |
| classify_inference_error_rate_limited_no_retry_after_omits_hint | 429 without Retry-After does NOT hallucinate a window |
| classify_inference_error_rate_limited_handles_fractional_and_minute_windows | 2.4s rounds to 3s; 180s renders as "about 3 minutes" |

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

  • Branch: fix/2364-rate-limit-classification (branched from origin/main after fresh fetch)
  • Commit SHA: see PR head

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no frontend changes.
  • pnpm typecheck — N/A: no TypeScript changes.
  • Focused tests: cargo test --lib classify_inference_error → 7 passed, 0 failed.
  • Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml applied; cargo check clean.
  • Tauri fmt/check (if changed): N/A: no Tauri shell changes.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: error classification distinguishes OpenHuman SecurityPolicy budget caps from upstream provider 429s; retry-after seconds appear in the user-facing message when parseable.
  • User-visible effect: chat error bubbles for rate-limit-style errors now name the actual cause and (where known) a concrete retry window.

Parity Contract

  • Legacy behavior preserved: error_type tokens are unchanged for the existing rate_limited consumers; only message body changed. Generic-inference fallback unchanged. Order of remaining branches (timeout / auth / budget / provider / model_unavailable / config-rejection) is unchanged.
  • Guard/fallback/dispatch parity checks: the new SecurityPolicy / max-iterations branches are ordered BEFORE the generic 429 branch, so the "rate limit" substring in their error strings no longer leaks into the wrong bucket; this is locked in by _distinguishes_action_budget_from_provider_429 and _max_iterations_gets_dedicated_branch.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none.
  • Canonical PR: this PR.
  • Resolution: N/A.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error messages when action budget is exhausted, clarifying that the limit is local and the thread remains usable
    • Enhanced rate-limiting responses with parsed Retry-After timing information to provide clearer retry guidance
    • Better error handling and messaging for agent tool iteration limit exceeded scenarios
  • Tests

    • Added comprehensive tests verifying correct error classification and user messaging across multiple failure scenarios

Review Change Stack

tinyhumansai#2364)

User-perception bug from tinyhumansai#2364: any agent-loop error string containing
"rate limit" was classified as `rate_limited` and the user saw
"You're being rate-limited. Please wait a moment and try again." —
copy that implies the AI provider is throttling them and gives no
hint about which thread is affected or how long to wait. When the
real cause was the SecurityPolicy hourly action cap on built-in
tools (web_fetch / curl / http_request), the message was misleading;
users opened new threads, saw those "work" (because the new thread
didn't trigger a tool-call storm), and concluded the original thread
was "stuck".

Fix: split the catch-all rate-limit branch in `classify_inference_error`
into three sources and surface retry-after when available.

Classifier (src/openhuman/channels/providers/web.rs)
- `action_budget_exceeded` (new): catches the SecurityPolicy strings
  emitted by the built-in tools (`Rate limit exceeded: action budget
  exhausted`, `Rate limit exceeded: too many actions in the last
  hour`, `Action blocked: rate limit exceeded`). User-facing copy:
  *"You've hit OpenHuman's per-hour action budget — this is a local
  safety cap, not your AI provider. The window decays gradually;
  you can keep chatting in this thread …"*. Ordered BEFORE the
  provider-429 branch so its substring `rate limit` no longer leaks
  into the wrong bucket.
- `max_iterations` (new): catches the canonical agent-loop cap
  string ("Agent exceeded maximum tool iterations") via the existing
  `is_max_iterations_error` predicate so the user sees: *"The agent
  ran the maximum number of tool steps for one turn … You can
  retry the same question in this thread once the underlying limit
  clears."* — previously this fell through to the opaque generic
  `inference` bucket.
- `rate_limited` (kept, enriched): now extracts `Retry-After` /
  `retry_after` seconds from the error body and appends a concrete
  hint to the user message ("Try again in 30 seconds" / "Try again
  in about 3 minutes" for windows ≥90s / "You can retry immediately"
  for 0). Fractional values round up so we never under-promise.
  Copy also now states the limit is upstream and that retrying in
  the same thread is fine.
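
As a rough sketch of the retry-after extraction described in the last bullet (not the helper in web.rs; the name `parse_retry_after_secs` and the exact scanning rules are assumptions made for illustration), the idea is to find a `retry-after:` / `retry_after:` prefix in the error text, read the number that follows, and round fractional values up:

```rust
/// Hypothetical sketch: extract a retry-after value, in whole seconds rounded
/// up, from a free-form error string. Returns `None` when no usable number
/// follows a recognised prefix.
fn parse_retry_after_secs(err: &str) -> Option<u64> {
    // ASCII lowercasing keeps byte offsets stable for slicing below.
    let lower = err.to_ascii_lowercase();
    for prefix in ["retry-after:", "retry_after:"] {
        if let Some(pos) = lower.find(prefix) {
            let rest = lower[pos + prefix.len()..].trim_start();
            // Take the leading run of digits and decimal points, e.g. "2.4".
            let num: String = rest
                .chars()
                .take_while(|c| c.is_ascii_digit() || *c == '.')
                .collect();
            if let Ok(v) = num.parse::<f64>() {
                if v.is_finite() && v >= 0.0 {
                    // Round up so the hint never under-promises the wait.
                    return Some(v.ceil() as u64);
                }
            }
        }
    }
    None
}
```

Under these assumptions, `"429 Too Many Requests; Retry-After: 2.4"` yields `Some(3)`, and an error with no retry-after prefix yields `None`, so no window is invented.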

Tests (src/openhuman/channels/providers/web_tests.rs)
- `classify_inference_error_distinguishes_action_budget_from_provider_429`
  — all three SecurityPolicy strings classify as
  `action_budget_exceeded` and the copy says "local safety cap" +
  "can keep chatting in this thread".
- `classify_inference_error_max_iterations_gets_dedicated_branch`
  — the flattened web-channel error wrapper resolves to
  `max_iterations` with the same-thread recovery hint.
- `classify_inference_error_rate_limited_surfaces_retry_after_seconds`
  — 30-second retry-after appears verbatim and the thread-recovery
  hint is present.
- `classify_inference_error_rate_limited_no_retry_after_omits_hint`
  — 429 without Retry-After does NOT hallucinate a window.
- `classify_inference_error_rate_limited_handles_fractional_and_minute_windows`
  — 2.4s rounds to 3s; 180s renders as "about 3 minutes".

`cargo test --lib classify_inference_error` → 7 passed, 0 failed.
`cargo check` clean. `cargo fmt` applied.

What's intentionally out of scope
- A real per-thread rate-limit Redux state: the trace shows none
  exists; `inferenceStatus` / `lifecycle` / `activeThread` all clear
  on `chat_error` in `ChatRuntimeProvider.onError`, and the FE test
  suite already exercises this.
- A countdown timer in the composer: separate UI work; this PR
  keeps the fix server-side so the existing FE forwarder picks it
  up automatically.
@CodeGhost21 CodeGhost21 requested a review from a team May 20, 2026 20:04
@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

The web channel error classifier now detects OpenHuman action-budget exhaustion, agent max-iteration tool-step exhaustion, and provider rate limits. It adds helpers to parse/format Retry-After timing, reorders classification priority to surface budgets first, and updates rate-limited messages to include optional retry timing hints.

Changes

Error classification improvements with retry messaging

| Layer / File(s) | Summary |
| --- | --- |
| Rate-limit parsing and message formatting (src/openhuman/channels/providers/web.rs) | parse_retry_after_secs_from_str extracts numeric Retry-After values (ceiling seconds) from free-form error strings; retry_after_hint renders Option<u64> into user-facing suffixes (immediate, N seconds, or "about N minutes"). |
| Action-budget detection (src/openhuman/channels/providers/web.rs) | is_action_budget_exhausted matches OpenHuman-local action-budget phrases in lowercased error text to classify local safety budget exhaustion. |
| Error classification logic and orchestration (src/openhuman/channels/providers/web.rs) | classify_inference_error is reordered to check action-budget first, then is_max_iterations_error, then provider rate limit/429; the rate_limited branch now uses parsed Retry-After hints and updated transient-provider messaging. |
| Test suite for error classification paths (src/openhuman/channels/providers/web_tests.rs) | Adds tests validating action-budget classification, max-iterations classification, and several rate-limit scenarios (with/without Retry-After, fractional-second rounding-up, minute-threshold formatting, pluralization, and parsing Retry-After inside JSON bodies). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

working

Suggested reviewers

  • graycyrus

Poem

A rabbit hops through logs at night, 🐰
Counting seconds till retry feels right,
Budgets and iterations neatly pried,
Retry-After hints now stand beside—
Chat resumes when timers say "alright."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: distinguishing rate-limit sources in the chat error classifier, which is the core objective of the PR. |
| Linked Issues check | ✅ Passed | The PR directly addresses all key requirements from issue #2364: distinguishes rate-limit sources (provider 429s, action-budget, max-iterations), provides retry-after parsing and recovery guidance, prevents thread-level state stickiness through accurate classification, and adds comprehensive tests covering transient limits and same-thread retry behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the inference error classifier in src/openhuman/channels/providers/web.rs and its corresponding tests; no unrelated alterations or dependencies outside the stated objectives are present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 20, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/channels/providers/web.rs`:
- Around line 271-278: The retry hint currently floors minutes in the Some(n)
branch of the match on secs, producing "about 1 minutes" for 90–119s and
understating the wait; change the minutes calculation to round up (e.g., minutes
= (n + 59) / 60) in the Some(n) => branch and format the message using correct
singular/plural ("minute"/"minutes") when building the string for "Try again in
about {mins} minute(s).".
- Around line 243-263: The current prefix list in the retry-after parsing loop
misses quoted JSON keys like "\"retry_after\": 30"; update the prefix matching
in the loop (where variables lower and err are used and the num_str parsing
occurs) to also check for quoted key patterns (e.g. "\"retry_after\":",
"\"retry-after\":", "\"retry_after\" :", "\"retry-after\" :") or use a small
regex that matches optional surrounding quotes, optional whitespace, the key
(retry_after|retry-after) and the colon, then extract the same trailing numeric
substring and parse it as before so quoted JSON bodies return the retry hint.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3857d90f-f32f-4d2a-9bfa-1dd877069a70

📥 Commits

Reviewing files that changed from the base of the PR and between fa8d75f and 9f7a74e.

📒 Files selected for processing (2)
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/providers/web_tests.rs

Comment thread src/openhuman/channels/providers/web.rs
Comment thread src/openhuman/channels/providers/web.rs
…oted JSON retry_after + round minutes up

Two CodeRabbit findings on src/openhuman/channels/providers/web.rs:

1. **Minor — quoted JSON retry_after.** A serialised provider body
   like `{"retry_after": 30}` would miss every prefix because the
   surrounding quote stopped `lower.find("retry_after:")` from
   matching, and the user lost the retry hint the upstream
   actually supplied. Normalise by stripping double quotes from
   the lowercased scan buffer before searching for prefixes.
   New test `classify_inference_error_rate_limited_parses_quoted_json_retry_after`.

2. **Minor — minute hint rounds up + uses singular/plural.** The
   `Some(n)` arm used `n / 60` (integer floor) and a hard-coded
   "minutes" suffix, so 90–119s rendered as "about 1 minutes" —
   both grammatically wrong and an instruction to retry sooner
   than the upstream allows. Round up via
   `(n / 60) + u64::from(n % 60 != 0)` and pick singular vs
   plural. New test
   `classify_inference_error_rate_limited_minute_window_uses_singular_and_rounds_up`
   pins 90s → "about 2 minutes" and 119s → "about 2 minutes".

`cargo test --lib classify_inference_error` → 9 passed, 0 failed
(7 pre-existing + 2 new).
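
The two fixes can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the helper names `normalise_for_scan` and `ceil_minutes` are hypothetical, the trailing periods and the singular "second" form are assumed, and the 90-second threshold is taken from the first commit message. The ceiling expression is the one quoted above.

```rust
/// Hypothetical helper: lowercase and strip double quotes so a JSON body such
/// as `{"retry_after": 30}` scans like `{retry_after: 30}`.
fn normalise_for_scan(err: &str) -> String {
    err.to_ascii_lowercase().replace('"', "")
}

/// Ceiling division of seconds into minutes, using the expression quoted in
/// the commit message.
fn ceil_minutes(n: u64) -> u64 {
    (n / 60) + u64::from(n % 60 != 0)
}

/// Render an optional retry-after window as a user-facing hint.
/// Wording and punctuation here are assumptions for illustration.
fn retry_after_hint(secs: Option<u64>) -> String {
    match secs {
        // No window supplied upstream: emit nothing rather than guess.
        None => String::new(),
        Some(0) => "You can retry immediately.".to_string(),
        Some(n) if n < 90 => {
            let unit = if n == 1 { "second" } else { "seconds" };
            format!("Try again in {n} {unit}.")
        }
        Some(n) => {
            let mins = ceil_minutes(n);
            let unit = if mins == 1 { "minute" } else { "minutes" };
            format!("Try again in about {mins} {unit}.")
        }
    }
}
```

With floor division, 90 to 119 seconds would have rendered as 1 minute and told the user to retry too early; the ceiling form maps both 90 and 119 seconds to 2 minutes.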
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/openhuman/channels/providers/web.rs (1)

305-309: 💤 Low value

Consider case-sensitivity and pattern fragility.

The is_action_budget_exhausted function receives an already-lowercased string (err_lower) from the caller, but the function signature doesn't enforce or document this contract. If a caller passes a non-lowercased string, the match will silently fail.

Additionally, these exact substring matches are brittle if the SecurityPolicy error messages ever change slightly (e.g., "Action budget exhausted" vs "action budget exhausted").

🔧 Optional: Add defensive lowercasing and doc comment
 /// Detect the SecurityPolicy global hourly action-budget signal
 /// emitted by the built-in tools (`web_fetch`, `curl`, `http_request`,
 /// `polymarket`, `composio`, etc.) — see `src/openhuman/security/
 /// policy.rs::SecurityPolicy::is_rate_limited`.
+///
+/// # Arguments
+/// * `err_lower` - The error string, expected to be pre-lowercased by the caller.
 ///
 /// We match the canonical English strings those tools emit. This is
 /// load-bearing for issue `#2364`: before this check ran, any string
 /// containing "rate limit" was misclassified as a provider 429 and
 /// the user saw the generic "You're being rate-limited" copy, which
 /// hides that the cap is OpenHuman's own per-hour safety budget,
 /// not the upstream LLM provider.
 fn is_action_budget_exhausted(err_lower: &str) -> bool {
+    // Defensive: ensure lowercase even if caller forgets
+    let err_lower = err_lower.to_ascii_lowercase();
     err_lower.contains("rate limit exceeded: action budget exhausted")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/providers/web.rs` around lines 305 - 309, The helper
is_action_budget_exhausted currently assumes its input is lowercased and uses
brittle exact substrings; change it to accept &str, perform defensive
normalization inside (e.g., let s = err.to_lowercase();) and run the contains
checks against s so callers need not lowercase; add a short doc comment above
is_action_budget_exhausted indicating it normalizes the input and what error
substrings it checks for, and consider keeping the three existing substrings but
document they are lowercased for future maintainers.
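
A self-normalising variant of the helper might look like the sketch below. It assumes the three SecurityPolicy strings quoted in the first commit message are the complete set; the body is illustrative and is not taken from web.rs.

```rust
/// Sketch of the suggested change: the helper lowercases its own input, so
/// callers do not need to pre-normalise. The needles are kept in lowercase
/// form to match the normalised text.
fn is_action_budget_exhausted(err: &str) -> bool {
    const NEEDLES: [&str; 3] = [
        "rate limit exceeded: action budget exhausted",
        "rate limit exceeded: too many actions in the last hour",
        "action blocked: rate limit exceeded",
    ];
    let lower = err.to_lowercase();
    NEEDLES.iter().any(|needle| lower.contains(*needle))
}
```

The trade-off is one extra allocation per call when the caller has already lowercased the string, in exchange for removing the implicit contract on the argument.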

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8ec2649e-61d8-49d3-90de-081fb75b091d

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7a74e and 553ec29.

📒 Files selected for processing (2)
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/providers/web_tests.rs


Labels

working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rate limit state sticks to one chat thread

1 participant