fix(ui): return 409 for overlapping chat turns by fallintoplace · Pull Request #1304 · amd/gaia

fallintoplace · 2026-05-30T11:16:15Z

Summary

return 409 when a second /api/chat/send request arrives for a session that already has an active turn
remove the unsafe force-release path for held session locks
update the chat concurrency unit test to assert the conflict behavior we already expect in stress coverage

Why

The old code would wait 5 seconds, call release() on the per-session asyncio.Lock, and then continue. That is not safe because asyncio.Lock does not track ownership. A slow but healthy request could still be holding the lock when a second request force-unlocked it, which opened the door to overlapping turns mutating the same session.

This change keeps the contract simple: one active turn per session, and the second request gets a 409 instead of barging through.

Closes #1303.

Testing

uv run pytest tests/unit/chat/ui/test_chat_concurrency.py -q
uv run python -m compileall src/gaia/ui/routers/chat.py tests/unit/chat/ui/test_chat_concurrency.py

github-actions · 2026-05-30T11:20:13Z

Correct fix for a real safety violation — removing the force-release path eliminates the risk of two coroutines writing to the same session concurrently, and the 409 response gives the client a clear, actionable signal. The change is small, focused, and the test update matches the new contract precisely.

Issues Found

🟢 Minor — TOCTOU window is safe in asyncio, but worth a short note (`chat.py:92-98`)

The locked() check and acquire() are not atomic at the language level, but they are safe here because asyncio's cooperative model never yields between them (no await between the two lines). A future reader who is unfamiliar with asyncio's single-threaded scheduler might worry about a race. Consider a one-liner comment to pre-empt the question:

    # No await between the check and acquire — safe under asyncio's cooperative scheduler.
    if session_lock.locked():
        raise HTTPException(
            status_code=409,
            detail="A chat request is already in progress for this session. "
            "Please wait for it to finish.",
        )
    await session_lock.acquire()

🟢 Minor — Frontend 409 handling not in scope but should be tracked

The router now returns 409 where it previously succeeded (or eventually succeeded after 5 s). If the Agent UI frontend doesn't already handle 409 on /api/chat/send, users will see an unhandled error instead of a "please wait" message. This PR is backend-only, so it doesn't need to fix this — but it's worth opening a follow-up or checking src/gaia/apps/webui/ for the error-handling path before this lands in a release.

Strengths

Diagnosis is precise. The PR description correctly identifies why the force-release was unsafe (asyncio.Lock has no ownership tracking), and the removal is targeted — nothing else in the concurrency logic is disturbed.
Test cleanup is correct. The old test omitted lock.release() because it expected the force-release path to do it. The new test adds the explicit lock.release() before loop.close(), which is exactly right.
Comment quality. The inline comment in chat.py names the invariant being enforced ("force-releasing … is unsafe because the lock has no ownership tracking") without exceeding one line — follows CLAUDE.md's comment guideline precisely.

Verdict

Approve. The nit above is optional; the fix is sound and safe to merge as-is.

itomek

Approve. Verified locally; this is a correct, well-scoped fix.

Thanks @fallintoplace — the diagnosis in the body matches what's in the diff, and the new path at chat.py:92-96 is the right shape.

Independent verification (since this is a fork PR and CI only ran metadata checks at the time of local review — full suite has now completed green):

409 on conflict — confirmed at chat.py:92-96, asserted in the renamed TestSessionLockConflict test (409 + "already in progress").
Force-release path fully removed — asyncio.wait_for, the bare release(), and the inner except RuntimeError: pass are all gone.
Test updated and passing — 9/9 in test_chat_concurrency.py, 672 passing across the broader chat/ui suite. Lint (black + isort) clean on both files.

Two things worth affirming beyond the surface diff:

Removing the old release() in the timeout handler closes a real concurrency hazard, not just a style issue. asyncio.Lock doesn't track ownership, so calling release() from a coroutine that never held the lock could hand the lock to a waiter prematurely and corrupt another turn's critical section. Replacing that with a 409 is the correct fix.
The deleted except RuntimeError: pass also retires a pre-existing fail-quietly pattern, which moves this code in the fail-loudly direction the project wants.

The two pre-existing except Exception: pass blocks at chat.py:44-49 and chat.py:199 are out of scope here — fine to leave for a separate change.

No blocking findings. Deferring to @kovtcharov-amd for the final merge.

fix(ui): reject overlapping chat turns

e538f07

github-actions Bot added the tests Test changes label May 30, 2026

fallintoplace marked this pull request as ready for review May 30, 2026 11:17

fallintoplace requested a review from kovtcharov-amd as a code owner May 30, 2026 11:17

itomek self-assigned this Jun 1, 2026

itomek added this to the v0.19 — Test & CI Hardening [OSS] milestone Jun 1, 2026

itomek approved these changes Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ui): return 409 for overlapping chat turns#1304

fix(ui): return 409 for overlapping chat turns#1304
fallintoplace wants to merge 1 commit into
amd:mainfrom
fallintoplace:fix/session-lock-conflict-1303

fallintoplace commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

itomek left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fallintoplace commented May 30, 2026

Summary

Why

Testing

Uh oh!

github-actions Bot commented May 30, 2026

Issues Found

🟢 Minor — TOCTOU window is safe in asyncio, but worth a short note (chat.py:92-98)

🟢 Minor — Frontend 409 handling not in scope but should be tracked

Strengths

Verdict

Uh oh!

itomek left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🟢 Minor — TOCTOU window is safe in asyncio, but worth a short note (`chat.py:92-98`)