feat(stats): opt-out entity-link aggregation via include_entity_links param#2428
Open
chrislatimer wants to merge 10 commits into
Open
feat(stats): opt-out entity-link aggregation via include_entity_links param#2428chrislatimer wants to merge 10 commits into
chrislatimer wants to merge 10 commits into
Conversation
… param
Under multi-tenant workloads where one bank holds many memories with many
distinct entities per memory, the entity-link aggregation in
`get_bank_stats` (a join + group on `unit_entities ⨝ memory_units`)
dominates the response time of `GET /v1/default/banks/{bank_id}/stats`.
Add an optional `include_entity_links` query parameter (default true,
preserving existing behavior) so callers that don't surface the
per-entity slice can skip the aggregation entirely. When false, the
"entity" key in `link_counts` is omitted — matching the existing "no
entity edges yet" rendering — so downstream readers that already tolerate
a missing key see no behavior change.
Engine
------
- `MemoryEngineInterface.get_bank_stats` and the concrete `MemoryEngine`
impl now accept `include_entity_links: bool = True`. The flag is
threaded through to `_compute_bank_stats`, which gates the entity-link
CTE on it.
- `BankStatsCache` extended to accept a `key_suffix` so semantically
distinct variants of the same (schema, bank_id) get separate cache
slots. `invalidate` clears every variant for a bank by matching the
(schema, bank_id) prefix — writers don't have to know which suffixes
the read path is using.
- `get_bank_stats` passes `(include_entity_links,)` as the suffix, so a
True-variant cache hit can never be served to a False caller (or vice
versa).
HTTP
----
- `GET /v1/default/banks/{bank_id}/stats` accepts `?include_entity_links=false`.
Default is true. Response schema is unchanged: the optional "entity"
key in `links_by_link_type` is omitted when skipped, which is the same
shape the existing "0 → key absent" convention emits today.
Tests
-----
Unit (`tests/test_bank_stats_cache.py`):
- `test_key_suffix_separates_cache_slots`: True and False variants
produce distinct cache entries; cross-variant reads never run the
other loader.
- `test_default_key_suffix_is_distinct_slot`: the empty default suffix
is its own slot — does not collide with `(True,)`.
- `test_invalidate_clears_all_variants_for_bank`: a single invalidate
call wipes both variants so a post-write read reloads regardless of
which the client asks for next.
- `test_invalidate_keeps_other_banks`: invalidation is scoped — never
bleeds across banks.
- `test_concurrent_misses_coalesce_per_variant`: two concurrent True
callers share one load; a concurrent False caller runs independently.
Integration (`tests/test_bank_stats.py`):
- `test_bank_stats_default_includes_entity_link_count`: default path
computes the entity-link slice and returns a well-shaped response.
- `test_bank_stats_include_entity_links_false_skips_entity_aggregation`:
the query param threads through to `_compute_bank_stats` with
`include_entity_links=False`, the response omits the "entity" key, and
every other field is still present.
- `test_bank_stats_caches_true_and_false_separately`: at the HTTP layer,
True and False variants each cache once and are served independently
on repeat calls; the payloads agree on every field except the optional
entity slot.
- `test_bank_stats_invalidate_clears_both_variants`: a retain that
invalidates the cache forces both variants to reload from fresh SQL.
Cache unit tests: 17/17 passing. Lint + type check clean. Integration
tests require a running Postgres + LLM and are validated by the
reviewer.
Generated
---------
- `hindsight-docs/static/openapi.json` regenerated; the diff is strictly
additive — new optional query param + expanded route description.
Response schema is unchanged.
- `skills/hindsight-docs/references/openapi.json` regenerated to match
(the generate-docs-skill.sh pre-commit hook syncs the two).
Note: SDK regeneration (`./scripts/generate-clients.sh`) is not included
in this commit because the script requires the Rust toolchain
(`cargo`) which isn't available in this dev env. Reviewer should run it
before merge so the Python / TypeScript / Go / Rust clients pick up the
new optional query param.
…ents
Addresses the test-api and verify-generated-files failures on this PR.
Test fixes
----------
- `test_bank_stats_served_from_cache_on_repeat_call`: the pre-existing
`counting_compute` wrapper took a bare `(bank_id: str)` signature. After
this PR `get_bank_stats` now passes `include_entity_links=` through to
`_compute_bank_stats`, so calling the patched wrapper raised
TypeError("got an unexpected keyword argument 'include_entity_links'")
and the route returned 500. Updated to forward the kwarg.
- `test_bank_stats_invalidate_clears_both_variants`: the original test
triggered invalidation via a retain, on the wrong assumption that
retain invalidates the bank-stats cache. It doesn't — only mutations
that change the counts the endpoint reports (delete_memory_unit,
delete_document, clear_observations, etc.) do. So the cache stayed
warm, the counting wrapper never fired, and `calls == []`. Rewrote to
call `_bank_stats_cache.invalidate(get_current_schema(), test_bank_id)`
directly, which exercises the prefix-clear contract end-to-end
through HTTP and matches what real writers actually do.
Generated clients
-----------------
Ran `./scripts/generate-clients.sh` so the four client SDKs (Python,
TypeScript, Go, Rust) pick up the new optional `include_entity_links`
query parameter. Diff is strictly additive: a new optional `query` field
on the `getAgentStats` operation plus the expanded route description.
No existing type or behavior changes.
…lModelTriggerInput Two compile failures the Rust toolchain caught after the client regen: 1. `api.rs:155` — `get_agent_stats(agent_id, None)` now takes 3 args. The new optional `include_entity_links: Option<bool>` slots between `bank_id` and `authorization`. Pass `None` so we get the server's default (true), preserving the historical CLI behavior. 2. `mental_model.rs:120` + `:198` — `MentalModelTriggerInput` literals were missing `refresh_cron`. The field was added to the spec in PR #2377 (cron-scheduled mental model refresh), the Rust client regenerates from openapi.json at build time, but the CLI was never updated to set it. Existed latently on `main` and was exposed when this PR regenerated all clients in lockstep. Setting to None preserves the existing CLI behavior (no cron schedule).
…_append_mode The repo's pre-commit lint hook reformats these two files but they were not re-committed on `main` (this drift was already present on `main` when this branch was cut). CI's verify-generated-files step runs `./scripts/hooks/lint.sh` after generation, sees the format diff, and fails. Picking up the format drift here so this PR's CI is green; `main` will pick it up on the next regen/merge. Pure formatter changes — no behavior change.
`test_extraction_failure_at_retry_cap_fails_terminally` (added in #2418, guarding the recovered-worker path from #2413) asserts that when fact extraction fails terminally, the original exception message survives into `async_operations.error_message` so an operator can tell apart a structured-JSON parse failure from a rate-limit reset from a network 5xx — all of which can surface as the same exception types in different code paths. The formatter was joining only `type(err).__name__`, producing rows like "chunk 0: RuntimeError". The exception message was discarded, leaving worker failures unactionable and silently defeating the test. The test ran for the first time on this branch (its original PR's test-api job was skipped) and surfaced the bug. Add the message to the summary: "chunk 0: RuntimeError: structured JSON parse failed after all retain_extract_facts attempts". Same shape, just the field the test was added to enforce. Drive-by: pre-existing, unrelated to the include_entity_links work in this PR — but the test is wired in now and CI won't go green without it.
…entity_data param
Extends the graph endpoint with the same opt-out shape this PR added to
/stats. Under workloads where observations carry large source_memory_ids
arrays — multiple thousand UUIDs per visible unit at typical limits —
the entity_lookup_ids IN-list balloons and the unit_entities ⨝ entities
join dominates the response. The downstream in-memory entity-link
inference (sliding-window pairing across every entity's per-unit list)
adds another large constant on top of that.
Callers that don't render entity coloring on nodes or entity-typed
edges (e.g. table-style data views) can pass include_entity_data=false
to skip both. Default true preserves the existing response shape and
historical behavior for SDK consumers + the OSS control plane.
Engine
------
- `MemoryEngine.get_graph_data` accepts `include_entity_data: bool = True`.
When False:
- the `unit_entities ⨝ entities` lookup is skipped → entity_map is empty,
so nodes get the existing "no entities" sentinel
- the entity-link inference (sliding-window pairing) is explicitly gated
even though entity_to_units_visible is necessarily empty without the
lookup — the explicit gate documents the intent
- Semantic-link inference for observations (via shared source memories)
is unrelated to entities and stays unconditional.
HTTP
----
- `GET /v1/default/banks/{bank_id}/graph` accepts
`?include_entity_data=false`. Default true. Response schema unchanged —
nodes still carry an `entities` field, just empty/None when skipped.
CLI
---
- `hindsight-cli` regenerated against the new client signature; passing
None for the new arg preserves existing CLI behavior (server default
= true). Args are alphabetical per progenitor, so the new arg slots
between document_id and limit; a one-line `None` keeps the chain
compiling.
Tests
-----
- `test_graph_default_includes_entity_edges` — default still produces
entity surface (edges of type=entity OR non-empty entities on nodes).
- `test_graph_include_entity_data_false_skips_entity_edges` — no edges
of type=entity; nodes + table_rows still populated.
- `test_graph_include_entity_data_false_omits_node_entities` — nodes'
`entities` field collapses to the empty sentinel.
Generated artifacts regenerated (openapi.json + Python/TS/Go/Rust
clients). The diff is strictly additive: one new optional query param
and the matching client argument.
…ng fixtures The LLMProvider constructor was made config-free in #2405 (no longer reads global HindsightConfig for Vertex AI / litellmrouter fields), and that PR updated test_vertexai_provider.py and test_gemini_safety_settings.py to the explicit-args contract — but two other env-loading fixtures were missed: - tests/test_fact_extraction_analysis.py::llm_config — used by test_fact_extraction_basic_analysis. Now errors at setup with "HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID is required" whenever CI configures Vertex AI as the provider, because the fixture didn't thread the project ID through. - tests/test_llm_provider.py::_make_llm — used by the LLM-acceptance tests. Same root cause for the vertexai matrix variant; the litellmrouter matrix variant hits the same shape, raising "LITELLMROUTER_CONFIG required". Both fixtures now pass the provider-specific required fields explicitly, mirroring the patterns already in MemoryEngine's retain-config build and LLMProvider.from_env() — same env vars, same parser helper, no new attack surface vs the path the constructor used to take itself before #2405. No behaviour change for non-Vertex / non-litellmrouter provider setups (the new args are all None-by-default and only consulted by those branches).
The previous commit landed with multi-line wrapping that the project's ruff formatter rewraps onto a single line. Apply the formatter so verify-generated-files stops drifting.
…rovider errors Mirror the @pytest.mark.flaky(reruns=2, reruns_delay=2) marker that test_llm_api_methods already carries. Both tests hit the same _make_llm fixture and the same upstream provider, and both observe the same class of transient failures — Gemini's 503 UNAVAILABLE during demand spikes, provider rate-limit blips, network hiccups in the CI runner. Without the marker the matrix fails the whole job for a one-off 503; with it pytest-rerunfailures absorbs up to two retries (with a 2s delay) and a real regression still surfaces after the third attempt. Test-only change. No production retry/backoff added (the LLM provider already has its own configurable retry loop for application traffic).
…n-determinism Two layers of LLM non-determinism stack in this test: 1. The fact-extractor sometimes leaves "Yesterday" literal in the When field of the extracted fact instead of resolving it to 2024-11-12 (the day before the supplied event_date). 2. The downstream LLM judge then correctly rejects the literal "Yesterday" as ambiguous, because the criteria explicitly require either the absolute date or an unambiguous reference. Either single layer would already produce occasional failures; the product of the two is higher. The sibling test_date_field_calculation_last_night handles the same class of flake with an in-function max_retries=3 loop — we take the simpler @pytest.mark.flaky(reruns=2, reruns_delay=2) path here to match the convention used for test_llm_memory_operations / test_llm_api_methods. A real regression in temporal handling still surfaces after the third attempt. Test-only change; no production code touched.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Under multi-tenant workloads where one bank holds many memories with many distinct entities per memory, the entity-link aggregation in
get_bank_stats(a join + group onunit_entities ⨝ memory_units) dominates the response time ofGET /v1/default/banks/{bank_id}/stats. The aggregation is an approximation (per-entity count capped at 10) feeding a single dashboard slice, but it's the only piece of the stats endpoint that scales with the cross-product of memories × entities rather than just memories.Add an optional
include_entity_linksquery parameter (defaulttrue, preserving existing behavior) so callers that don't surface the per-entity slice can skip the aggregation entirely. Whenfalse, theentitykey inlink_countsis omitted — matching the existing "no entity edges yet" rendering — so downstream readers that already tolerate a missing key see no behavior change.This is purely additive: no existing endpoint contract, response field, default, or behavior changes. Existing callers (including the OSS control plane, the CLI's
bank statsrenderer, and all generated SDK consumers that don't pass the new param) see identical responses.Changes
Engine
MemoryEngineInterface.get_bank_statsand the concreteMemoryEngineimpl now acceptinclude_entity_links: bool = True. The flag is threaded through to_compute_bank_stats, which gates the entity-link CTE on it.BankStatsCacheextended to accept akey_suffixso semantically distinct variants of the same(schema, bank_id)get separate cache slots.invalidateclears every variant for a bank by matching the(schema, bank_id)prefix — writers don't have to know which suffixes the read path is using.get_bank_statspasses(include_entity_links,)as the suffix, so aTrue-variant cache hit can never be served to aFalsecaller and vice versa.HTTP
GET /v1/default/banks/{bank_id}/statsaccepts?include_entity_links=false. Default istrue. The route description is expanded so clients see the option in the docs.Generated artifacts
hindsight-docs/static/openapi.jsonregenerated. Diff is strictly additive: new optional query param + expanded route description. Response schema unchanged.skills/hindsight-docs/references/openapi.jsonregenerated to match (thegenerate-docs-skill.shpre-commit hook syncs the two)../scripts/generate-clients.sh) not included — the script invokescargo buildfor the Rust client which isn't available in this dev environment. Maintainer should run it before merge so Python / TypeScript / Go / Rust clients all pick up the new optional query param.Test plan
Unit tests (
tests/test_bank_stats_cache.py) — added, all 17 passing locallytest_key_suffix_separates_cache_slots—TrueandFalsevariants produce distinct cache entries; cross-variant reads never run the other loader.test_default_key_suffix_is_distinct_slot— the empty default suffix is its own slot, does not collide with(True,).test_invalidate_clears_all_variants_for_bank— a single invalidate call wipes both variants so a post-write read reloads regardless of which variant the client asks for next.test_invalidate_keeps_other_banks— invalidation is scoped; never bleeds across banks.test_concurrent_misses_coalesce_per_variant— two concurrentTruecallers share one load; a concurrentFalsecaller runs independently.Integration tests (
tests/test_bank_stats.py) — added, require a running Postgres + LLMtest_bank_stats_default_includes_entity_link_count— default path computes the entity-link slice and returns a well-shaped response.test_bank_stats_include_entity_links_false_skips_entity_aggregation— query param threads through to_compute_bank_statswithinclude_entity_links=False; response omits theentitykey; every other field still present.test_bank_stats_caches_true_and_false_separately— at the HTTP layer,TrueandFalsevariants each cache once and are served independently on repeat calls; payloads agree on every field except the optional entity slot.test_bank_stats_invalidate_clears_both_variants— a retain that invalidates the cache forces both variants to reload from fresh SQL.Local results
ruff check: clean on changed filesty check: clean on changed filespg0+ LLM environment to exercise; reviewer should runcd hindsight-api-slim && uv run pytest tests/test_bank_stats.py -vcargoavailable; reviewer should run./scripts/generate-clients.sh