Skip to content

fix(http-bridge): clean stale inflight session futures#1044

Open
zvladru wants to merge 2 commits into
Soju06:mainfrom
zvladru:fix/http-bridge-stale-inflight-cleanup
Open

fix(http-bridge): clean stale inflight session futures#1044
zvladru wants to merge 2 commits into
Soju06:mainfrom
zvladru:fix/http-bridge-stale-inflight-cleanup

Conversation

@zvladru

@zvladru zvladru commented Jun 17, 2026

Copy link
Copy Markdown

Problem

The HTTP bridge tracks in-flight session creation futures in _http_bridge_inflight_sessions. If a creation path leaves behind a completed or stale pending future, drain/status can keep reporting an in-flight session create without enough context to explain or recover from it.

That makes graceful maintenance harder: a restart can appear blocked by a stale bridge create rather than by real queued work.

Changes

  • Timestamp newly-created HTTP bridge in-flight session futures.
  • Add a non-blocking cleanup pass to http_bridge_activity_snapshot_nowait().
  • Remove completed in-flight session-create futures from the tracking map.
  • Treat very old pending in-flight futures as stale, remove them, and complete them with a capacity-style ProxyResponseError so waiters are released.
  • Add snapshot counters for oldest age, stale count, and cleanup count.
  • Log structured cleanup diagnostics with bridge kind/key hash, age, threshold, done/cancelled state, and cleanup reason.

Impact

Normal bridge session creation is unchanged. The new path only affects done leftovers or pending futures older than the stale threshold (max(120s, admission_timeout * 6)). Drain/status can now self-heal stale in-flight session-create bookkeeping and explain what was cleaned up.

Tests

  • uv run pytest -q tests/unit/test_proxy_http_bridge.py::test_http_bridge_activity_snapshot_counts_pending_and_inflight_sessions tests/unit/test_proxy_http_bridge.py::test_http_bridge_activity_snapshot_cleans_stale_inflight_session tests/unit/test_proxy_http_bridge.py::test_get_or_create_http_bridge_session_inflight_wait_times_out tests/unit/test_proxy_http_bridge.py::test_get_or_create_http_bridge_session_capacity_wait_times_out

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da7e9f4b2f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/_service/http_bridge/mixin.py
@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jun 17, 2026
@zvladru

zvladru commented Jun 18, 2026

Copy link
Copy Markdown
Author

Addressed the automated review feedback in b895c568 by wiring the HTTP bridge activity snapshot into /internal/drain/status.

@codex review

@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jun 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b895c5684f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

)
if cleanup_reason is None:
continue
self._http_bridge_inflight_sessions.pop(key, None)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep inflight cleanup under the registry lock

When /internal/drain/status runs while _get_or_create_http_bridge_session is suspended inside _http_bridge_lock on owner/ring/durable awaits, this cleanup can still pop and set exceptions on _http_bridge_inflight_sessions without taking that lock. In that overlap a request that is about to read the inflight future under the lock can instead see it disappear, skipping the terminal inflight error or starting a duplicate bridge create; make this cleanup try-acquire/use the same registry lock, or skip mutation when it cannot.

Useful? React with 👍 / 👎.

@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🤖 codex: needs work [@codex review] raised an issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants