fix(retry): honor minute/hour and compound retry-after hints#1052
fix(retry): honor minute/hour and compound retry-after hints#1052ozpool wants to merge 2 commits into
Conversation
parse_retry_after only matched s/ms units, so upstream rate-limit messages expressed in minutes or hours (e.g. "try again in 6m0s", "try again in 20m") failed to match and returned None. handle_rate_limit then fell back to sub-second exponential backoff and re-selected the still-rate-limited account almost immediately. Parse the full contiguous run of <number><unit> components after the "try again in" hint, summing hours/minutes/seconds/milliseconds (plus their word forms) to total seconds.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
codex-lb/app/core/utils/retry.py
Line 44 in 60a0392
This commit changes the retry/cooldown contract by teaching parse_retry_after to honor minute/hour and compound upstream hints, but I checked the commit diff and it only modifies app/core/utils/retry.py plus tests with no openspec/changes/<slug>/ artifact. That violates /workspace/codex-lb/AGENTS.md lines 90-96: “OpenSpec is a hard gate for behavior... changes” and “Code/tests alone are not enough when OpenSpec is required,” so the PR is not ready until the behavior change is captured and validated in OpenSpec.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _RETRY_PATTERN = re.compile(rf"(?i)try again in\s*((?:\d+(?:\.\d+)?\s*(?:{_UNIT_ALTERNATION})\s*)+)") | ||
| _DURATION_TOKEN = re.compile(rf"(?i)(\d+(?:\.\d+)?)\s*({_UNIT_ALTERNATION})") |
There was a problem hiding this comment.
Prevent longer units from being parsed as minutes
Because the unit alternation is not followed by a token boundary, unsupported longer units that begin with a supported compact unit are partially accepted; for example, parse_retry_after("Try again in 2 months") now matches the leading 2m and returns 120.0. If an upstream monthly/quota message uses month(s), handle_rate_limit will retry after minutes instead of respecting the larger hint or falling back, so add a boundary/negative lookahead around units while preserving compact forms like 1m30s.
Useful? React with 👍 / 👎.
The retry/cooldown contract change in this PR (parse_retry_after now honors minute/hour and compound upstream hints) needs an OpenSpec change per AGENTS.md. Adds openspec/changes/honor-upstream-retry-after-hint-units/ with the proposal, tasks, and an ADDED account-routing requirement plus scenarios covering compound, minute-only, and unparseable retry hints.
|
Good catch on the missing OpenSpec artifact. Added ruff check + ruff format --check clean, |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cd08c2f5d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def test_parse_retry_after_compound_minutes_seconds(): | ||
| assert parse_retry_after("Please try again in 6m0s.") == 360.0 | ||
| assert parse_retry_after("Try again in 1m30s") == 90.0 |
There was a problem hiding this comment.
Add product-path coverage for cooldown handling
AGENTS.md's PR trapdoor requires bug fixes to add regression coverage at the externally failing product path, but this change only tests the parsing helper. Because the actual failure is handle_rate_limit setting cooldown_until and select_account skipping that account, a future wiring regression where the parsed minute/compound delay is ignored would still pass these tests; add a load-balancer-level test for a 6m0s/20m 429 that verifies the cooldown and non-reselection behavior.
Useful? React with 👍 / 👎.
| # Capture the contiguous run of ``<number><unit>`` components that immediately | ||
| # follows "try again in" so compound hints such as ``6m0s`` or ``1h2m3s`` are | ||
| # read in full instead of stopping at the first unit. | ||
| _RETRY_PATTERN = re.compile(rf"(?i)try again in\s*((?:\d+(?:\.\d+)?\s*(?:{_UNIT_ALTERNATION})\s*)+)") |
There was a problem hiding this comment.
Delimit unit matches before accepting abbreviations
When an upstream hint uses an unsupported word that starts with a supported one-letter unit, such as try again in 1 month, this pattern accepts only the leading m and schedules a 60-second cooldown instead of treating the hint as unparseable. That bypasses the new fallback promised for hints with no recognizable unit token and can reselect an account far earlier than the upstream intended; require a token boundary/negative lookahead around the unit alternatives so abbreviations only match complete units.
Useful? React with 👍 / 👎.
Closes #1051.
Problem
parse_retry_afteronly matcheds/msunits:Upstream rate-limit messages that express the cooldown in minutes or hours (
Please try again in 6m0s.,Try again in 20m, compound1h2m3s) failed to match, so the function returnedNone. The sole callerhandle_rate_limitthen fell back tobackoff_seconds(state.error_count)(~0.2s for a first error), re-selecting an account the upstream asked us to pause for minutes — amplifying 429s against a still-rate-limited account.Fix
Parse the full contiguous run of
<number><unit>components that follows "try again in" and sum them to total seconds. Handlesh/m/s/msand their word forms (hours,minutes,seconds,milliseconds), including compound Go-duration strings.Try again in 1.2sTry again in 500msTry again in 20mPlease try again in 6m0s.Try again in 1m30sTry again in 1h2m3sno retry infoTests
Added minute, compound minute+second, compound-hour, and word-unit cases to
tests/unit/test_retry.py. Existing second/millisecond/missing cases unchanged.ruff check+ruff format --checkclean.