fix: stop holding pooled DB connections across embedder/LLM/reranker calls#2434
Draft
zommiommy wants to merge 2 commits into
Draft
fix: stop holding pooled DB connections across embedder/LLM/reranker calls#2434zommiommy wants to merge 2 commits into
zommiommy wants to merge 2 commits into
Conversation
3691864 to
91c4e30
Compare
…LLM calls The background consolidation worker held a single pooled PostgreSQL connection for an entire LLM batch -- across the consolidation recall, the consolidation LLM call, per-action embeddings, and the dedup recall/embed/LLM. Each MemoryEngine pool is bounded (default max 20), so under load these multi-second external calls park the worker's whole pool and the poller and remaining batches block on connection acquisition: the worker saturates its own pool. Idle-but-checked-out connections also add needless pressure on the server's global max_connections. Acquire connections short-lived, only around SQL; run embeddings, the LLM call, and dedup adjudication with no connection held. The action executors and dedup helpers now take the backend (pool) and self-acquire around their own writes. Moving the slow calls off the connection widens the decision-to-write window, so the held-transaction serialization is replaced with explicit guards that preserve the same correctness: - Each source-liveness check is paired with its write in one short transaction (atomic; previously autocommit-per-statement on PostgreSQL, normalized on Oracle). - Dedup CREATE/UPDATE folds are RETURNING-gated and re-filter live source memories inside the fold transaction (FOR SHARE, sources-before- observation lock order, matching _create_observation_directly and _execute_update_action), so a twin or source deleted during the now connection-free window cannot drop a CREATE or fold a dead source id. - Skip embedding entirely when every source memory is already dead. - _execute_create_action reports its action so _process_memory_batch counts a memory as created only when an observation was actually written. The deterministic tests need no DB or LLM and pin the fold guards (RETURNING gate, live-source filtering, created/skipped propagation) and the pre-embed short-circuits directly.
update_memory_unit re-embedded mid-transaction and update_mental_model embedded inside the acquired connection, so an interactive edit held a pooled connection across the embedder call. update_mental_model now embeds before acquiring (its text depends only on its arguments). update_memory_unit is split into two phases: - Phase 1 reads, resolves, and embeds off any connection, into typed _MemoryEditPlan / _MemoryRevertPlan. - Phase 2 is a short write transaction that re-locks the row (FOR UPDATE), filters surviving entities, and applies the precomputed embedding. The wider read-to-write window is covered by explicit guards: - Abort (rollback and retry) if a concurrent edit changed an embedding- or resolution-input column between the phases. - Reclaim orphan entities when an entity edit fails or is skipped, and re-lock the resolved set FOR UPDATE before linking to avoid an FK race with concurrent graph-maintenance pruning. One bounded exception remains by design: when a concurrent prune (or a same-unit entity-only edit) lands between the phases, Phase 2 and the revert path re-embed once under the lock so the stored vector stays consistent with unit_entities. There is no retry wrapper around update_memory_unit, so this is preferred over surfacing a transient error, and it is pinned by the curation tests. Tests cover the two-phase edit, revert, concurrent-edit abort, orphan reclaim, the prune-recovery re-embeds, observation invalidation, and document transfer under the new acquisition model.
91c4e30 to
7ac205a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Several memory-engine paths held a pooled PostgreSQL connection checked out for the entire duration of a slow external call — an embedder, reranker, or LLM round-trip.
The pools are already bounded (
asyncpg, defaultmax_size=20) and per-process — the API and the background worker each build their ownMemoryEngine/pool — so this is not a leak or unbounded growth. The failure mode is saturation:_process_one_llm_batchheld one connection for a whole batch — across the consolidation recall, the consolidation LLM call, per-action embeddings, and the dedup recall/embed/LLM. Under load, enough concurrent batches park the worker's entire pool on these multi-second calls; the poller and remaining batches then block untilacquire_timeout. The worker starves itself.update_memory_unitre-embedded mid-transaction, andupdate_mental_modelembedded inside the acquired connection.max_connections(the sum of every process's pool) for no useful work.(The hot recall/rerank path was already clean: the query is embedded before the connection is acquired, and reranking runs after retrieval returns.)
Why a bounded / bulkhead pool wasn't enough
The obvious smaller fix — bound or bulkhead the pool — doesn't address the real failure:
update_memory_unit/update_mental_modelhold-across-embed paths untouched.To keep consolidation throughput decoupled from connection count, and to shrink the connection footprint everywhere, the connection has to be released across the slow call. That's what this PR does.
Approach
Invariant enforced across the changed paths: no pooled DB connection is held across an embedder / reranker / LLM
await.pool) and self-acquire a short connection around their SQL. Each source-liveness check is paired with its write in one tiny transaction (this strengthens PG — previously autocommit-per-statement — and normalizes Oracle). Dedup folds areRETURNING-gated and re-filter live sources inside the fold transaction (FOR SHARE, sources-before-observation lock order, matching the normal write paths), so a twin or source deleted during the now connection-free window can't drop aCREATEor fold a dead source id.update_memory_unit: split into a read/resolve + embed phase (off-connection, into typed_MemoryEditPlan/_MemoryRevertPlan) and a short write transaction that re-locks the row (FOR UPDATE), filters surviving entities, and applies the precomputed embedding — aborting (rollback) if a concurrent edit changed an embedding-input column between the phases.update_mental_model: embeds before acquiring (its text depends only on its arguments).This is a large change — and why
Moving the embed/LLM out of the held connection widens the read→write window, so this isn't a one-line "release the connection." The held transaction used to provide serialization for free; replacing it requires new guards to preserve the same correctness:
RETURNING-gated, live-source-filtered dedup folds with a consistent (sources-first) lock order,update_memory_unitwith re-lock, abort/retry on inter-phase edits, and orphan-entity reclaim.That's why the diff is substantial (~1.5k lines), concentrated in the two core engine modules and their test suites.
One deliberate exception
A few rare recovery paths re-embed inside the Phase-2 transaction: when a concurrent graph-maintenance prune (or a same-unit entity-only edit) lands between the phases,
update_memory_unit/ revert re-embed under the lock so the stored vector stays consistent withunit_entities. These are bounded to a single re-embed and pinned by tests. There is no retry wrapper aroundupdate_memory_unit(aRuntimeErrorsurfaces as HTTP 500), so converting them to abort/retry would regress interactive edits — the bounded in-txn re-embed is the intentional fallback. It can be converted to strict abort/retry later if a "never embed in-transaction" guarantee is preferred.Verification
ruff+tyclean (changed files).RETURNING-gating, live-source filtering, created/skipped propagation) and the pre-embed guards directly.