-
Notifications
You must be signed in to change notification settings - Fork 316
fix(retry): honor minute/hour and compound retry-after hints #1052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ozpool
wants to merge
3
commits into
Soju06:main
Choose a base branch
from
ozpool:fix/retry-after-minute-units
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
38 changes: 38 additions & 0 deletions
38
openspec/changes/honor-upstream-retry-after-hint-units/proposal.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| # Honor upstream Retry-After hint units in account cooldown | ||
|
|
||
| ## Problem | ||
|
|
||
| When an upstream Codex/ChatGPT response rate-limits an account (HTTP 429), its | ||
| message often carries a "try again in <duration>" hint. `parse_retry_after` | ||
| only recognized second and millisecond units, so any minute, hour, or compound | ||
| hint (for example `20m`, `6m0s`, `1h2m3s`) failed to match and returned `None`. | ||
| `handle_rate_limit` then fell back to `backoff_seconds(state.error_count)`, a | ||
| sub-second-to-few-second backoff, and set `cooldown_until` far earlier than the | ||
| upstream asked. The balancer re-selected the still-rate-limited account almost | ||
| immediately, re-sending traffic into the same 429 and amplifying the rate limit | ||
| instead of waiting the cooldown out. | ||
|
|
||
| ## Solution | ||
|
|
||
| Teach `parse_retry_after` to parse the full contiguous run of `<number><unit>` | ||
| tokens after "try again in", summing hour, minute, second, and millisecond | ||
| components (and their word forms). A longest-match-first unit alternation keeps | ||
| `ms` distinct from `m` and `minutes` from `m`. When no token is recognized the | ||
| function still returns `None`, so `handle_rate_limit` keeps its existing backoff | ||
| fallback. The account then stays in cooldown for the duration the upstream | ||
| actually requested. | ||
|
|
||
| ## Changes | ||
|
|
||
| - Parse minute, hour, and compound `<num><unit>` retry hints in | ||
| `parse_retry_after`, in addition to seconds and milliseconds | ||
| - Sum compound durations (for example `1h2m3s`) into a single seconds value | ||
| - Preserve the `None` result for unparseable hints so the error-count backoff | ||
| fallback in `handle_rate_limit` is unchanged | ||
| - Add unit coverage for minute, hour, compound, and word-form hints | ||
|
|
||
| ## Out of scope | ||
|
|
||
| - Changing the `backoff_seconds` fallback schedule | ||
| - Changing how `reset_at` is extracted or clamped | ||
| - Changing the selector's user-visible "Try again in {N}s" hint ceiling |
40 changes: 40 additions & 0 deletions
40
...pec/changes/honor-upstream-retry-after-hint-units/specs/account-routing/spec.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| ## ADDED Requirements | ||
|
|
||
| ### Requirement: Upstream rate-limit cooldown honors the Retry-After hint duration | ||
|
|
||
| When an upstream rate-limit error carries a "try again in" hint, the account | ||
| cooldown SHALL last for the full duration the hint expresses. The parser SHALL | ||
| recognize hour, minute, second, and millisecond units, including their word | ||
| forms, and SHALL sum compound hints such as `1h2m3s` into a single duration. | ||
| A unit token SHALL be recognized only when it is not immediately followed by | ||
| another letter, so an unsupported longer word whose prefix matches a unit (for | ||
| example `month`, where `m` prefixes the word) is not mis-read as that shorter | ||
| unit. When the hint contains no recognizable unit token, the system SHALL fall | ||
| back to the error-count backoff schedule. A rate-limited account SHALL NOT be | ||
| re-selected before its cooldown elapses. | ||
|
|
||
| #### Scenario: Compound minute-and-second hint sets the full cooldown | ||
|
|
||
| - **GIVEN** an upstream 429 whose message says "try again in 6m0s" | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the account cooldown lasts 360 seconds | ||
| - **AND** the account is not re-selected until that cooldown elapses | ||
|
|
||
| #### Scenario: Minutes-only hint is honored | ||
|
|
||
| - **GIVEN** an upstream 429 whose message says "try again in 20m" | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the account cooldown lasts 1200 seconds | ||
|
|
||
| #### Scenario: Unparseable hint falls back to backoff | ||
|
|
||
| - **GIVEN** an upstream 429 whose message has no recognizable "try again in" duration | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the cooldown uses the error-count backoff schedule instead | ||
|
|
||
| #### Scenario: Unsupported longer word is not mis-read as a shorter unit | ||
|
|
||
| - **GIVEN** an upstream 429 whose message says "try again in 1 month" | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the `month` token is not read as a 1-minute hint | ||
| - **AND** the cooldown uses the error-count backoff schedule instead |
10 changes: 10 additions & 0 deletions
10
openspec/changes/honor-upstream-retry-after-hint-units/tasks.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| # Tasks | ||
|
|
||
| - [x] Extend `parse_retry_after` to honor minute/hour and compound `<num><unit>` retry hints | ||
| - [x] Keep the `None` fallback for unparseable hints so `handle_rate_limit` backoff is unchanged | ||
| - [x] Add regression coverage for minute, hour, compound, and word-form hints in `tests/unit/test_retry.py` | ||
| - [x] Require a non-letter boundary after each unit so an unsupported longer word (`month` -> `m`) is not mis-read as a shorter unit | ||
| - [x] Add product-path regression coverage in `tests/unit/test_load_balancer.py` proving `handle_rate_limit` sets the cooldown from a word-unit hint and falls back to backoff for an unsupported longer word | ||
| - [x] Document the account-routing cooldown requirement delta (proposal + ADDED requirement with GIVEN/WHEN/THEN scenarios, including the unit-boundary scenario) | ||
| - [x] Run `uv run --frozen ruff check .` and `uv run --frozen ruff format --check .` | ||
| - [x] Run `uv run --frozen pytest tests/unit/test_retry.py` |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AGENTS.md's PR trapdoor requires bug fixes to add regression coverage at the externally failing product path, but this change only tests the parsing helper. Because the actual failure is
handle_rate_limitsettingcooldown_untilandselect_accountskipping that account, a future wiring regression where the parsed minute/compound delay is ignored would still pass these tests; add a load-balancer-level test for a6m0s/20m429 that verifies the cooldown and non-reselection behavior.Useful? React with 👍 / 👎.