fix(ui): return 409 for overlapping chat turns #1304
Correct fix for a real safety violation: removing the force-release path eliminates the risk of two coroutines writing to the same session concurrently, and the 409 response gives the client a clear, actionable signal. The change is small, focused, and the test update matches the new contract precisely.
Issues Found
🟢 Minor: TOCTOU window is safe in asyncio, but worth a short note (
itomek left a comment
Approve. Verified locally; this is a correct, well-scoped fix.
Thanks @fallintoplace — the diagnosis in the body matches what's in the diff, and the new path at chat.py:92-96 is the right shape.
Independent verification (since this is a fork PR and CI only ran metadata checks at the time of local review — full suite has now completed green):
- 409 on conflict: confirmed at chat.py:92-96, asserted in the renamed TestSessionLockConflict test (409 + "already in progress").
- Force-release path fully removed: asyncio.wait_for, the bare release(), and the inner except RuntimeError: pass are all gone.
- Test updated and passing: 9/9 in test_chat_concurrency.py, 672 passing across the broader chat/ui suite. Lint (black + isort) clean on both files.
Two things worth affirming beyond the surface diff:
- Removing the old release() in the timeout handler closes a real concurrency hazard, not just a style issue. asyncio.Lock doesn't track ownership, so calling release() from a coroutine that never held the lock could hand the lock to a waiter prematurely and corrupt another turn's critical section. Replacing that with a 409 is the correct fix.
- The deleted except RuntimeError: pass also retires a pre-existing fail-quietly pattern, which moves this code in the fail-loudly direction the project wants.
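For anyone who wants to see the hazard in isolation, here is a minimal sketch. It is not the code from chat.py: the coroutine names, timeouts, and sleep durations are made up for illustration, and only the wait-then-release() shape mirrors what the PR body describes.

```python
import asyncio


async def demo_force_release():
    """Illustrative only: a non-holder releases an asyncio.Lock."""
    lock = asyncio.Lock()
    events: list[str] = []

    async def slow_turn() -> None:
        # A healthy but slow request that holds the lock for its whole turn.
        async with lock:
            events.append("slow: start")
            await asyncio.sleep(0.05)
            events.append("slow: end")

    async def impatient_turn() -> None:
        # Hypothetical stand-in for the removed path: wait briefly, then
        # force-release a lock this coroutine never acquired.
        try:
            await asyncio.wait_for(lock.acquire(), timeout=0.01)
        except asyncio.TimeoutError:
            lock.release()  # no ownership check, so this succeeds
            await lock.acquire()
        try:
            events.append("impatient: start")
            await asyncio.sleep(0.01)
            events.append("impatient: end")
        finally:
            lock.release()

    # return_exceptions=True so the RuntimeError from slow_turn is visible
    # instead of aborting the demo.
    results = await asyncio.gather(
        slow_turn(), impatient_turn(), return_exceptions=True
    )
    return events, results


if __name__ == "__main__":
    events, results = asyncio.run(demo_force_release())
    print(events)
    print(results)
```

In this sketch the second coroutine enters its critical section while the first is still inside its own, and the first then fails on exit with a RuntimeError because the lock is no longer held. That late RuntimeError is the kind of error an except RuntimeError: pass would hide.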
The two pre-existing except Exception: pass blocks at chat.py:44-49 and chat.py:199 are out of scope here — fine to leave for a separate change.
No blocking findings. Deferring to @kovtcharov-amd for the final merge.
Summary
- Return 409 when a second /api/chat/send request arrives for a session that already has an active turn
Why
The old code would wait 5 seconds, call release() on the per-session asyncio.Lock, and then continue. That is not safe because asyncio.Lock does not track ownership. A slow but healthy request could still be holding the lock when a second request force-unlocked it, which opened the door to overlapping turns mutating the same session.
This change keeps the contract simple: one active turn per session, and the second request gets a 409 instead of barging through.
Closes #1303.
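The contract above can be sketched roughly as follows. This is a framework-free approximation, not the actual handler: SessionTurns, TurnInProgress, run_turn, and the exact error wording are hypothetical, and the use of lock.locked() as the conflict check is an assumption about how the new path at chat.py:92-96 works. The exception stands in for whatever the router uses to produce the HTTP 409.

```python
import asyncio
from typing import Awaitable, Callable


class TurnInProgress(Exception):
    """Hypothetical stand-in for an HTTP 409 response."""

    status_code = 409


class SessionTurns:
    """Hypothetical registry of one asyncio.Lock per session."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = {}

    async def run_turn(
        self, session_id: str, work: Callable[[], Awaitable[str]]
    ) -> str:
        lock = self._locks.setdefault(session_id, asyncio.Lock())
        # Reject instead of waiting. There is no suspension point between
        # this check and taking the uncontended lock below, so on a single
        # event loop no other coroutine can slip in between them.
        if lock.locked():
            raise TurnInProgress("A chat turn is already in progress")
        async with lock:
            return await work()


async def demo() -> list:
    turns = SessionTurns()

    async def slow_work() -> str:
        await asyncio.sleep(0.05)
        return "done"

    first = asyncio.create_task(turns.run_turn("s1", slow_work))
    await asyncio.sleep(0)  # let the first turn start and take the lock
    try:
        second = await turns.run_turn("s1", slow_work)
    except TurnInProgress as exc:
        second = exc.status_code
    return [await first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The first turn runs to completion untouched, and the overlapping request is rejected immediately rather than waiting or force-unlocking. Only the coroutine that acquired the lock ever releases it.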
Testing
- uv run pytest tests/unit/chat/ui/test_chat_concurrency.py -q
- uv run python -m compileall src/gaia/ui/routers/chat.py tests/unit/chat/ui/test_chat_concurrency.py