Skip to content

fix(codex): use a stable prompt_cache_key instead of a per-request uuid4 to enable Codex prompt-cache reuse#2390

Open
sumleo wants to merge 1 commit into
vectorize-io:mainfrom
sumleo:cachelint/hindsight-ap3
Open

fix(codex): use a stable prompt_cache_key instead of a per-request uuid4 to enable Codex prompt-cache reuse#2390
sumleo wants to merge 1 commit into
vectorize-io:mainfrom
sumleo:cachelint/hindsight-ap3

Conversation

@sumleo

@sumleo sumleo commented Jun 25, 2026

Copy link
Copy Markdown

What

In the Codex provider, prompt_cache_key is set to a fresh uuid.uuid4() on every request:

# hindsight-api-slim/hindsight_api/engine/providers/codex_llm.py
"prompt_cache_key": str(uuid.uuid4()),   # in both call() and call_with_tools()

prompt_cache_key is OpenAI/Codex's explicit hint for routing a request to a cached-prefix backend, so it needs to stay constant across calls that share the same instructions + tools[] prefix. A new random value per request means that prefix is never matched, giving a ~100% prompt-cache miss on the hottest LLM path (recall/reflect/retain all funnel through these two methods) with no functional upside.

Fix

Derive the key once per provider instance from (account_id, model) so it is stable across requests but still distinct per account/model, and use it at both call sites. Tiny, self-contained change; the now-unused uuid import is dropped.

@functools.cached_property
def _prompt_cache_key(self) -> str:
    seed = f"{self.account_id}:{self.model}"
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]

If you'd prefer the key to be scoped more tightly (e.g. per bank/session/reflect-mission) rather than per provider instance, happy to adjust — I kept it to identifiers already in scope to stay minimal.

How this was found

Spotted via static analysis of prompt-cache anti-patterns (a tool I'm experimenting with, CacheLint) and then confirmed by hand on main at 9dafadc (lines ~391 and ~714). I have not been able to run the full integration suite against a live Codex account, so a maintainer sanity-check on the cache-key semantics would be appreciated.

A fresh uuid4 per request makes OpenAI/Codex prompt-cache prefix routing
miss on every call, so the stable system-instructions + tools[] prefix is
never reused. Derive the key once per provider instance from (account,
model) in CodexLLM.call() and call_with_tools().
@sumleo

sumleo commented Jun 27, 2026

Copy link
Copy Markdown
Author

Hi @nicoloboschi — gentle follow-up on this one (open since mid-June). It swaps the per-request uuid4 prompt_cache_key for a stable sha256(account:model) key so Codex prompt-cache reuse actually kicks in. CI is green and it's a small, self-contained change. Would you have a moment to take a look, or let me know if you'd like anything adjusted? Happy to rebase. Thanks for your time and for the project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant