-
Notifications
You must be signed in to change notification settings - Fork 315
fix(retry): honor minute/hour and compound retry-after hints #1052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,7 +3,38 @@ | |
| import random | ||
| import re | ||
|
|
||
| _RETRY_PATTERN = re.compile(r"(?i)try again in\s*(\d+(?:\.\d+)?)\s*(s|ms|seconds?)") | ||
| # Units the retry hint may use, longest-first within each family so that, for | ||
| # example, ``ms`` is preferred over ``m`` and ``minutes`` over ``m``. | ||
| _UNIT_ALTERNATION = r"ms|milliseconds?|hours?|hrs?|h|minutes?|mins?|m|seconds?|secs?|s" | ||
|
|
||
| # Seconds-per-unit for every literal ``_UNIT_ALTERNATION`` can capture. | ||
| _UNIT_SECONDS: dict[str, float] = { | ||
| "ms": 0.001, | ||
| "millisecond": 0.001, | ||
| "milliseconds": 0.001, | ||
| "h": 3600.0, | ||
| "hr": 3600.0, | ||
| "hrs": 3600.0, | ||
| "hour": 3600.0, | ||
| "hours": 3600.0, | ||
| "m": 60.0, | ||
| "min": 60.0, | ||
| "mins": 60.0, | ||
| "minute": 60.0, | ||
| "minutes": 60.0, | ||
| "s": 1.0, | ||
| "sec": 1.0, | ||
| "secs": 1.0, | ||
| "second": 1.0, | ||
| "seconds": 1.0, | ||
| } | ||
|
|
||
| # Capture the contiguous run of ``<number><unit>`` components that immediately | ||
| # follows "try again in" so compound hints such as ``6m0s`` or ``1h2m3s`` are | ||
| # read in full instead of stopping at the first unit. | ||
| _RETRY_PATTERN = re.compile(rf"(?i)try again in\s*((?:\d+(?:\.\d+)?\s*(?:{_UNIT_ALTERNATION})\s*)+)") | ||
| _DURATION_TOKEN = re.compile(rf"(?i)(\d+(?:\.\d+)?)\s*({_UNIT_ALTERNATION})") | ||
|
Comment on lines
+35
to
+36
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Because the unit alternation is not followed by a token boundary, unsupported longer units that begin with a supported compact unit are partially accepted; for example, Useful? React with 👍 / 👎. |
||
|
|
||
| _BACKOFF_INITIAL_DELAY_MS = 200 | ||
| _BACKOFF_FACTOR = 2.0 | ||
| _BACKOFF_JITTER_MIN = 0.9 | ||
|
|
@@ -14,11 +45,20 @@ def parse_retry_after(message: str) -> float | None: | |
| match = _RETRY_PATTERN.search(message or "") | ||
| if not match: | ||
| return None | ||
| value = float(match.group(1)) | ||
| unit = match.group(2).lower() | ||
| if unit == "ms": | ||
| return value / 1000 | ||
| return value | ||
| total = 0.0 | ||
| matched = False | ||
| for value, unit in _DURATION_TOKEN.findall(match.group(1)): | ||
| multiplier = _UNIT_SECONDS.get(unit.lower()) | ||
| if multiplier is None: | ||
| continue | ||
| if unit.lower() == "ms": | ||
| total += float(value) / 1000 | ||
| else: | ||
| total += float(value) * multiplier | ||
| matched = True | ||
| if not matched: | ||
| return None | ||
| return total | ||
|
|
||
|
|
||
| def backoff_seconds(attempt: int) -> float: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| # Honor upstream Retry-After hint units in account cooldown | ||
|
|
||
| ## Problem | ||
|
|
||
| When an upstream Codex/ChatGPT response rate-limits an account (HTTP 429), its | ||
| message often carries a "try again in <duration>" hint. `parse_retry_after` | ||
| only recognized second and millisecond units, so any minute, hour, or compound | ||
| hint (for example `20m`, `6m0s`, `1h2m3s`) failed to match and returned `None`. | ||
| `handle_rate_limit` then fell back to `backoff_seconds(state.error_count)`, a | ||
| sub-second-to-few-second backoff, and set `cooldown_until` far earlier than the | ||
| upstream asked. The balancer re-selected the still-rate-limited account almost | ||
| immediately, re-sending traffic into the same 429 and amplifying the rate limit | ||
| instead of waiting the cooldown out. | ||
|
|
||
| ## Solution | ||
|
|
||
| Teach `parse_retry_after` to parse the full contiguous run of `<number><unit>` | ||
| tokens after "try again in", summing hour, minute, second, and millisecond | ||
| components (and their word forms). A longest-match-first unit alternation keeps | ||
| `ms` distinct from `m` and `minutes` from `m`. When no token is recognized the | ||
| function still returns `None`, so `handle_rate_limit` keeps its existing backoff | ||
| fallback. The account then stays in cooldown for the duration the upstream | ||
| actually requested. | ||
|
|
||
| ## Changes | ||
|
|
||
| - Parse minute, hour, and compound `<num><unit>` retry hints in | ||
| `parse_retry_after`, in addition to seconds and milliseconds | ||
| - Sum compound durations (for example `1h2m3s`) into a single seconds value | ||
| - Preserve the `None` result for unparseable hints so the error-count backoff | ||
| fallback in `handle_rate_limit` is unchanged | ||
| - Add unit coverage for minute, hour, compound, and word-form hints | ||
|
|
||
| ## Out of scope | ||
|
|
||
| - Changing the `backoff_seconds` fallback schedule | ||
| - Changing how `reset_at` is extracted or clamped | ||
| - Changing the selector's user-visible "Try again in {N}s" hint ceiling |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| ## ADDED Requirements | ||
|
|
||
| ### Requirement: Upstream rate-limit cooldown honors the Retry-After hint duration | ||
|
|
||
| When an upstream rate-limit error carries a "try again in" hint, the account | ||
| cooldown SHALL last for the full duration the hint expresses. The parser SHALL | ||
| recognize hour, minute, second, and millisecond units, including their word | ||
| forms, and SHALL sum compound hints such as `1h2m3s` into a single duration. | ||
| When the hint contains no recognizable unit token, the system SHALL fall back to | ||
| the error-count backoff schedule. A rate-limited account SHALL NOT be | ||
| re-selected before its cooldown elapses. | ||
|
|
||
| #### Scenario: Compound minute-and-second hint sets the full cooldown | ||
|
|
||
| - **GIVEN** an upstream 429 whose message says "try again in 6m0s" | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the account cooldown lasts 360 seconds | ||
| - **AND** the account is not re-selected until that cooldown elapses | ||
|
|
||
| #### Scenario: Minutes-only hint is honored | ||
|
|
||
| - **GIVEN** an upstream 429 whose message says "try again in 20m" | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the account cooldown lasts 1200 seconds | ||
|
|
||
| #### Scenario: Unparseable hint falls back to backoff | ||
|
|
||
| - **GIVEN** an upstream 429 whose message has no recognizable "try again in" duration | ||
| - **WHEN** the balancer records the rate limit for the account | ||
| - **THEN** the cooldown uses the error-count backoff schedule instead |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| # Tasks | ||
|
|
||
| - [x] Extend `parse_retry_after` to honor minute/hour and compound `<num><unit>` retry hints | ||
| - [x] Keep the `None` fallback for unparseable hints so `handle_rate_limit` backoff is unchanged | ||
| - [x] Add regression coverage for minute, hour, compound, and word-form hints in `tests/unit/test_retry.py` | ||
| - [x] Document the account-routing cooldown requirement delta (proposal + ADDED requirement with GIVEN/WHEN/THEN scenarios) | ||
| - [x] Run `uv run --frozen ruff check .` and `uv run --frozen ruff format --check .` | ||
| - [x] Run `uv run --frozen pytest tests/unit/test_retry.py` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,3 +17,21 @@ def test_parse_retry_after_milliseconds(): | |
|
|
||
| def test_parse_retry_after_missing(): | ||
| assert parse_retry_after("no retry info") is None | ||
|
|
||
|
|
||
| def test_parse_retry_after_minutes(): | ||
| assert parse_retry_after("Try again in 20m") == 1200.0 | ||
|
|
||
|
|
||
| def test_parse_retry_after_compound_minutes_seconds(): | ||
| assert parse_retry_after("Please try again in 6m0s.") == 360.0 | ||
| assert parse_retry_after("Try again in 1m30s") == 90.0 | ||
|
Comment on lines
+26
to
+28
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
AGENTS.md's PR trapdoor requires bug fixes to add regression coverage at the externally failing product path, but this change only tests the parsing helper. Because the actual failure is Useful? React with 👍 / 👎. |
||
|
|
||
|
|
||
| def test_parse_retry_after_compound_hours(): | ||
| assert parse_retry_after("Try again in 1h2m3s") == 3723.0 | ||
|
|
||
|
|
||
| def test_parse_retry_after_word_units(): | ||
| assert parse_retry_after("Try again in 30 seconds") == 30.0 | ||
| assert parse_retry_after("Try again in 2 minutes") == 120.0 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When an upstream hint uses an unsupported word that starts with a supported one-letter unit, such as
try again in 1 month, this pattern accepts only the leadingmand schedules a 60-second cooldown instead of treating the hint as unparseable. That bypasses the new fallback promised for hints with no recognizable unit token and can reselect an account far earlier than the upstream intended; require a token boundary/negative lookahead around the unit alternatives so abbreviations only match complete units.Useful? React with 👍 / 👎.