fix(codex): use a stable prompt_cache_key instead of a per-request uuid4 to enable Codex prompt-cache reuse#2390
Open
sumleo wants to merge 1 commit into
Open
Conversation
A fresh uuid4 per request makes OpenAI/Codex prompt-cache prefix routing miss on every call, so the stable system-instructions + tools[] prefix is never reused. Derive the key once per provider instance from (account, model) in CodexLLM.call() and call_with_tools().
Author
|
Hi @nicoloboschi — gentle follow-up on this one (open since mid-June). It swaps the per-request |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
In the Codex provider,
prompt_cache_keyis set to a freshuuid.uuid4()on every request:prompt_cache_keyis OpenAI/Codex's explicit hint for routing a request to a cached-prefix backend, so it needs to stay constant across calls that share the sameinstructions+tools[]prefix. A new random value per request means that prefix is never matched, giving a ~100% prompt-cache miss on the hottest LLM path (recall/reflect/retain all funnel through these two methods) with no functional upside.Fix
Derive the key once per provider instance from
(account_id, model)so it is stable across requests but still distinct per account/model, and use it at both call sites. Tiny, self-contained change; the now-unuseduuidimport is dropped.If you'd prefer the key to be scoped more tightly (e.g. per bank/session/reflect-mission) rather than per provider instance, happy to adjust — I kept it to identifiers already in scope to stay minimal.
How this was found
Spotted via static analysis of prompt-cache anti-patterns (a tool I'm experimenting with, CacheLint) and then confirmed by hand on
mainat9dafadc(lines ~391 and ~714). I have not been able to run the full integration suite against a live Codex account, so a maintainer sanity-check on the cache-key semantics would be appreciated.