Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 46 additions & 6 deletions app/core/utils/retry.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,38 @@
import random
import re

_RETRY_PATTERN = re.compile(r"(?i)try again in\s*(\d+(?:\.\d+)?)\s*(s|ms|seconds?)")
# Units the retry hint may use, longest-first within each family so that, for
# example, ``ms`` is preferred over ``m`` and ``minutes`` over ``m``.
_UNIT_ALTERNATION = r"ms|milliseconds?|hours?|hrs?|h|minutes?|mins?|m|seconds?|secs?|s"

# Seconds-per-unit for every literal ``_UNIT_ALTERNATION`` can capture.
_UNIT_SECONDS: dict[str, float] = {
"ms": 0.001,
"millisecond": 0.001,
"milliseconds": 0.001,
"h": 3600.0,
"hr": 3600.0,
"hrs": 3600.0,
"hour": 3600.0,
"hours": 3600.0,
"m": 60.0,
"min": 60.0,
"mins": 60.0,
"minute": 60.0,
"minutes": 60.0,
"s": 1.0,
"sec": 1.0,
"secs": 1.0,
"second": 1.0,
"seconds": 1.0,
}

# Capture the contiguous run of ``<number><unit>`` components that immediately
# follows "try again in" so compound hints such as ``6m0s`` or ``1h2m3s`` are
# read in full instead of stopping at the first unit.
_RETRY_PATTERN = re.compile(rf"(?i)try again in\s*((?:\d+(?:\.\d+)?\s*(?:{_UNIT_ALTERNATION})\s*)+)")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Delimit unit matches before accepting abbreviations

When an upstream hint uses an unsupported word that starts with a supported one-letter unit, such as try again in 1 month, this pattern accepts only the leading m and schedules a 60-second cooldown instead of treating the hint as unparseable. That bypasses the new fallback promised for hints with no recognizable unit token and can reselect an account far earlier than the upstream intended; require a token boundary/negative lookahead around the unit alternatives so abbreviations only match complete units.

Useful? React with 👍 / 👎.

_DURATION_TOKEN = re.compile(rf"(?i)(\d+(?:\.\d+)?)\s*({_UNIT_ALTERNATION})")
Comment on lines +35 to +36

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prevent longer units from being parsed as minutes

Because the unit alternation is not followed by a token boundary, unsupported longer units that begin with a supported compact unit are partially accepted; for example, parse_retry_after("Try again in 2 months") now matches the leading 2m and returns 120.0. If an upstream monthly/quota message uses month(s), handle_rate_limit will retry after minutes instead of respecting the larger hint or falling back, so add a boundary/negative lookahead around units while preserving compact forms like 1m30s.

Useful? React with 👍 / 👎.


_BACKOFF_INITIAL_DELAY_MS = 200
_BACKOFF_FACTOR = 2.0
_BACKOFF_JITTER_MIN = 0.9
Expand All @@ -14,11 +45,20 @@ def parse_retry_after(message: str) -> float | None:
match = _RETRY_PATTERN.search(message or "")
if not match:
return None
value = float(match.group(1))
unit = match.group(2).lower()
if unit == "ms":
return value / 1000
return value
total = 0.0
matched = False
for value, unit in _DURATION_TOKEN.findall(match.group(1)):
multiplier = _UNIT_SECONDS.get(unit.lower())
if multiplier is None:
continue
if unit.lower() == "ms":
total += float(value) / 1000
else:
total += float(value) * multiplier
matched = True
if not matched:
return None
return total


def backoff_seconds(attempt: int) -> float:
Expand Down
38 changes: 38 additions & 0 deletions openspec/changes/honor-upstream-retry-after-hint-units/proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# Honor upstream Retry-After hint units in account cooldown

## Problem

When an upstream Codex/ChatGPT response rate-limits an account (HTTP 429), its
message often carries a "try again in <duration>" hint. `parse_retry_after`
only recognized second and millisecond units, so any minute, hour, or compound
hint (for example `20m`, `6m0s`, `1h2m3s`) failed to match and returned `None`.
`handle_rate_limit` then fell back to `backoff_seconds(state.error_count)`, a
sub-second-to-few-second backoff, and set `cooldown_until` far earlier than the
upstream asked. The balancer re-selected the still-rate-limited account almost
immediately, re-sending traffic into the same 429 and amplifying the rate limit
instead of waiting the cooldown out.

## Solution

Teach `parse_retry_after` to parse the full contiguous run of `<number><unit>`
tokens after "try again in", summing hour, minute, second, and millisecond
components (and their word forms). A longest-match-first unit alternation keeps
`ms` distinct from `m` and `minutes` from `m`. When no token is recognized the
function still returns `None`, so `handle_rate_limit` keeps its existing backoff
fallback. The account then stays in cooldown for the duration the upstream
actually requested.

## Changes

- Parse minute, hour, and compound `<num><unit>` retry hints in
`parse_retry_after`, in addition to seconds and milliseconds
- Sum compound durations (for example `1h2m3s`) into a single seconds value
- Preserve the `None` result for unparseable hints so the error-count backoff
fallback in `handle_rate_limit` is unchanged
- Add unit coverage for minute, hour, compound, and word-form hints

## Out of scope

- Changing the `backoff_seconds` fallback schedule
- Changing how `reset_at` is extracted or clamped
- Changing the selector's user-visible "Try again in {N}s" hint ceiling
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
## ADDED Requirements

### Requirement: Upstream rate-limit cooldown honors the Retry-After hint duration

When an upstream rate-limit error carries a "try again in" hint, the account
cooldown SHALL last for the full duration the hint expresses. The parser SHALL
recognize hour, minute, second, and millisecond units, including their word
forms, and SHALL sum compound hints such as `1h2m3s` into a single duration.
When the hint contains no recognizable unit token, the system SHALL fall back to
the error-count backoff schedule. A rate-limited account SHALL NOT be
re-selected before its cooldown elapses.

#### Scenario: Compound minute-and-second hint sets the full cooldown

- **GIVEN** an upstream 429 whose message says "try again in 6m0s"
- **WHEN** the balancer records the rate limit for the account
- **THEN** the account cooldown lasts 360 seconds
- **AND** the account is not re-selected until that cooldown elapses

#### Scenario: Minutes-only hint is honored

- **GIVEN** an upstream 429 whose message says "try again in 20m"
- **WHEN** the balancer records the rate limit for the account
- **THEN** the account cooldown lasts 1200 seconds

#### Scenario: Unparseable hint falls back to backoff

- **GIVEN** an upstream 429 whose message has no recognizable "try again in" duration
- **WHEN** the balancer records the rate limit for the account
- **THEN** the cooldown uses the error-count backoff schedule instead
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Tasks

- [x] Extend `parse_retry_after` to honor minute/hour and compound `<num><unit>` retry hints
- [x] Keep the `None` fallback for unparseable hints so `handle_rate_limit` backoff is unchanged
- [x] Add regression coverage for minute, hour, compound, and word-form hints in `tests/unit/test_retry.py`
- [x] Document the account-routing cooldown requirement delta (proposal + ADDED requirement with GIVEN/WHEN/THEN scenarios)
- [x] Run `uv run --frozen ruff check .` and `uv run --frozen ruff format --check .`
- [x] Run `uv run --frozen pytest tests/unit/test_retry.py`
18 changes: 18 additions & 0 deletions tests/unit/test_retry.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,21 @@ def test_parse_retry_after_milliseconds():

def test_parse_retry_after_missing():
assert parse_retry_after("no retry info") is None


def test_parse_retry_after_minutes():
assert parse_retry_after("Try again in 20m") == 1200.0


def test_parse_retry_after_compound_minutes_seconds():
assert parse_retry_after("Please try again in 6m0s.") == 360.0
assert parse_retry_after("Try again in 1m30s") == 90.0
Comment on lines +26 to +28

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add product-path coverage for cooldown handling

AGENTS.md's PR trapdoor requires bug fixes to add regression coverage at the externally failing product path, but this change only tests the parsing helper. Because the actual failure is handle_rate_limit setting cooldown_until and select_account skipping that account, a future wiring regression where the parsed minute/compound delay is ignored would still pass these tests; add a load-balancer-level test for a 6m0s/20m 429 that verifies the cooldown and non-reselection behavior.

Useful? React with 👍 / 👎.



def test_parse_retry_after_compound_hours():
assert parse_retry_after("Try again in 1h2m3s") == 3723.0


def test_parse_retry_after_word_units():
assert parse_retry_after("Try again in 30 seconds") == 30.0
assert parse_retry_after("Try again in 2 minutes") == 120.0