feat(stats): opt-out entity-link aggregation via include_entity_links param by chrislatimer · Pull Request #2428 · vectorize-io/hindsight

chrislatimer · 2026-06-26T21:21:48Z

Summary

Under multi-tenant workloads where one bank holds many memories with many distinct entities per memory, the entity-link aggregation in get_bank_stats (a join + group on unit_entities ⨝ memory_units) dominates the response time of GET /v1/default/banks/{bank_id}/stats. The aggregation is an approximation (per-entity count capped at 10) feeding a single dashboard slice, but it's the only piece of the stats endpoint that scales with the cross-product of memories × entities rather than just memories.

Add an optional include_entity_links query parameter (default true, preserving existing behavior) so callers that don't surface the per-entity slice can skip the aggregation entirely. When false, the entity key in link_counts is omitted — matching the existing "no entity edges yet" rendering — so downstream readers that already tolerate a missing key see no behavior change.

This is purely additive: no existing endpoint contract, response field, default, or behavior changes. Existing callers (including the OSS control plane, the CLI's bank stats renderer, and all generated SDK consumers that don't pass the new param) see identical responses.

Changes

Engine

MemoryEngineInterface.get_bank_stats and the concrete MemoryEngine impl now accept include_entity_links: bool = True. The flag is threaded through to _compute_bank_stats, which gates the entity-link CTE on it.
BankStatsCache extended to accept a key_suffix so semantically distinct variants of the same (schema, bank_id) get separate cache slots. invalidate clears every variant for a bank by matching the (schema, bank_id) prefix — writers don't have to know which suffixes the read path is using.
get_bank_stats passes (include_entity_links,) as the suffix, so a True-variant cache hit can never be served to a False caller and vice versa.

HTTP

GET /v1/default/banks/{bank_id}/stats accepts ?include_entity_links=false. Default is true. The route description is expanded so clients see the option in the docs.

Generated artifacts

hindsight-docs/static/openapi.json regenerated. Diff is strictly additive: new optional query param + expanded route description. Response schema unchanged.
skills/hindsight-docs/references/openapi.json regenerated to match (the generate-docs-skill.sh pre-commit hook syncs the two).
Client SDK regeneration (./scripts/generate-clients.sh) not included — the script invokes cargo build for the Rust client which isn't available in this dev environment. Maintainer should run it before merge so Python / TypeScript / Go / Rust clients all pick up the new optional query param.

Test plan

Unit tests (`tests/test_bank_stats_cache.py`) — added, all 17 passing locally

test_key_suffix_separates_cache_slots — True and False variants produce distinct cache entries; cross-variant reads never run the other loader.
test_default_key_suffix_is_distinct_slot — the empty default suffix is its own slot, does not collide with (True,).
test_invalidate_clears_all_variants_for_bank — a single invalidate call wipes both variants so a post-write read reloads regardless of which variant the client asks for next.
test_invalidate_keeps_other_banks — invalidation is scoped; never bleeds across banks.
test_concurrent_misses_coalesce_per_variant — two concurrent True callers share one load; a concurrent False caller runs independently.

Integration tests (`tests/test_bank_stats.py`) — added, require a running Postgres + LLM

test_bank_stats_default_includes_entity_link_count — default path computes the entity-link slice and returns a well-shaped response.
test_bank_stats_include_entity_links_false_skips_entity_aggregation — query param threads through to _compute_bank_stats with include_entity_links=False; response omits the entity key; every other field still present.
test_bank_stats_caches_true_and_false_separately — at the HTTP layer, True and False variants each cache once and are served independently on repeat calls; payloads agree on every field except the optional entity slot.
test_bank_stats_invalidate_clears_both_variants — a retain that invalidates the cache forces both variants to reload from fresh SQL.

Local results

Cache unit tests: 17/17 passing
ruff check: clean on changed files
ty check: clean on changed files
OpenAPI regen: strictly additive
Pre-commit hooks (lint + docs-skill regen): passing
Integration tests: need a running pg0 + LLM environment to exercise; reviewer should run cd hindsight-api-slim && uv run pytest tests/test_bank_stats.py -v
SDK regeneration: needs cargo available; reviewer should run ./scripts/generate-clients.sh

… param Under multi-tenant workloads where one bank holds many memories with many distinct entities per memory, the entity-link aggregation in `get_bank_stats` (a join + group on `unit_entities ⨝ memory_units`) dominates the response time of `GET /v1/default/banks/{bank_id}/stats`. Add an optional `include_entity_links` query parameter (default true, preserving existing behavior) so callers that don't surface the per-entity slice can skip the aggregation entirely. When false, the "entity" key in `link_counts` is omitted — matching the existing "no entity edges yet" rendering — so downstream readers that already tolerate a missing key see no behavior change. Engine ------ - `MemoryEngineInterface.get_bank_stats` and the concrete `MemoryEngine` impl now accept `include_entity_links: bool = True`. The flag is threaded through to `_compute_bank_stats`, which gates the entity-link CTE on it. - `BankStatsCache` extended to accept a `key_suffix` so semantically distinct variants of the same (schema, bank_id) get separate cache slots. `invalidate` clears every variant for a bank by matching the (schema, bank_id) prefix — writers don't have to know which suffixes the read path is using. - `get_bank_stats` passes `(include_entity_links,)` as the suffix, so a True-variant cache hit can never be served to a False caller (or vice versa). HTTP ---- - `GET /v1/default/banks/{bank_id}/stats` accepts `?include_entity_links=false`. Default is true. Response schema is unchanged: the optional "entity" key in `links_by_link_type` is omitted when skipped, which is the same shape the existing "0 → key absent" convention emits today. Tests ----- Unit (`tests/test_bank_stats_cache.py`): - `test_key_suffix_separates_cache_slots`: True and False variants produce distinct cache entries; cross-variant reads never run the other loader. - `test_default_key_suffix_is_distinct_slot`: the empty default suffix is its own slot — does not collide with `(True,)`. - `test_invalidate_clears_all_variants_for_bank`: a single invalidate call wipes both variants so a post-write read reloads regardless of which the client asks for next. - `test_invalidate_keeps_other_banks`: invalidation is scoped — never bleeds across banks. - `test_concurrent_misses_coalesce_per_variant`: two concurrent True callers share one load; a concurrent False caller runs independently. Integration (`tests/test_bank_stats.py`): - `test_bank_stats_default_includes_entity_link_count`: default path computes the entity-link slice and returns a well-shaped response. - `test_bank_stats_include_entity_links_false_skips_entity_aggregation`: the query param threads through to `_compute_bank_stats` with `include_entity_links=False`, the response omits the "entity" key, and every other field is still present. - `test_bank_stats_caches_true_and_false_separately`: at the HTTP layer, True and False variants each cache once and are served independently on repeat calls; the payloads agree on every field except the optional entity slot. - `test_bank_stats_invalidate_clears_both_variants`: a retain that invalidates the cache forces both variants to reload from fresh SQL. Cache unit tests: 17/17 passing. Lint + type check clean. Integration tests require a running Postgres + LLM and are validated by the reviewer. Generated --------- - `hindsight-docs/static/openapi.json` regenerated; the diff is strictly additive — new optional query param + expanded route description. Response schema is unchanged. - `skills/hindsight-docs/references/openapi.json` regenerated to match (the generate-docs-skill.sh pre-commit hook syncs the two). Note: SDK regeneration (`./scripts/generate-clients.sh`) is not included in this commit because the script requires the Rust toolchain (`cargo`) which isn't available in this dev env. Reviewer should run it before merge so the Python / TypeScript / Go / Rust clients pick up the new optional query param.

…ents Addresses the test-api and verify-generated-files failures on this PR. Test fixes ---------- - `test_bank_stats_served_from_cache_on_repeat_call`: the pre-existing `counting_compute` wrapper took a bare `(bank_id: str)` signature. After this PR `get_bank_stats` now passes `include_entity_links=` through to `_compute_bank_stats`, so calling the patched wrapper raised TypeError("got an unexpected keyword argument 'include_entity_links'") and the route returned 500. Updated to forward the kwarg. - `test_bank_stats_invalidate_clears_both_variants`: the original test triggered invalidation via a retain, on the wrong assumption that retain invalidates the bank-stats cache. It doesn't — only mutations that change the counts the endpoint reports (delete_memory_unit, delete_document, clear_observations, etc.) do. So the cache stayed warm, the counting wrapper never fired, and `calls == []`. Rewrote to call `_bank_stats_cache.invalidate(get_current_schema(), test_bank_id)` directly, which exercises the prefix-clear contract end-to-end through HTTP and matches what real writers actually do. Generated clients ----------------- Ran `./scripts/generate-clients.sh` so the four client SDKs (Python, TypeScript, Go, Rust) pick up the new optional `include_entity_links` query parameter. Diff is strictly additive: a new optional `query` field on the `getAgentStats` operation plus the expanded route description. No existing type or behavior changes.

…lModelTriggerInput Two compile failures the Rust toolchain caught after the client regen: 1. `api.rs:155` — `get_agent_stats(agent_id, None)` now takes 3 args. The new optional `include_entity_links: Option<bool>` slots between `bank_id` and `authorization`. Pass `None` so we get the server's default (true), preserving the historical CLI behavior. 2. `mental_model.rs:120` + `:198` — `MentalModelTriggerInput` literals were missing `refresh_cron`. The field was added to the spec in PR #2377 (cron-scheduled mental model refresh), the Rust client regenerates from openapi.json at build time, but the CLI was never updated to set it. Existed latently on `main` and was exposed when this PR regenerated all clients in lockstep. Setting to None preserves the existing CLI behavior (no cron schedule).

…_append_mode The repo's pre-commit lint hook reformats these two files but they were not re-committed on `main` (this drift was already present on `main` when this branch was cut). CI's verify-generated-files step runs `./scripts/hooks/lint.sh` after generation, sees the format diff, and fails. Picking up the format drift here so this PR's CI is green; `main` will pick it up on the next regen/merge. Pure formatter changes — no behavior change.

`test_extraction_failure_at_retry_cap_fails_terminally` (added in #2418, guarding the recovered-worker path from #2413) asserts that when fact extraction fails terminally, the original exception message survives into `async_operations.error_message` so an operator can tell apart a structured-JSON parse failure from a rate-limit reset from a network 5xx — all of which can surface as the same exception types in different code paths. The formatter was joining only `type(err).__name__`, producing rows like "chunk 0: RuntimeError". The exception message was discarded, leaving worker failures unactionable and silently defeating the test. The test ran for the first time on this branch (its original PR's test-api job was skipped) and surfaced the bug. Add the message to the summary: "chunk 0: RuntimeError: structured JSON parse failed after all retain_extract_facts attempts". Same shape, just the field the test was added to enforce. Drive-by: pre-existing, unrelated to the include_entity_links work in this PR — but the test is wired in now and CI won't go green without it.

…entity_data param Extends the graph endpoint with the same opt-out shape this PR added to /stats. Under workloads where observations carry large source_memory_ids arrays — multiple thousand UUIDs per visible unit at typical limits — the entity_lookup_ids IN-list balloons and the unit_entities ⨝ entities join dominates the response. The downstream in-memory entity-link inference (sliding-window pairing across every entity's per-unit list) adds another large constant on top of that. Callers that don't render entity coloring on nodes or entity-typed edges (e.g. table-style data views) can pass include_entity_data=false to skip both. Default true preserves the existing response shape and historical behavior for SDK consumers + the OSS control plane. Engine ------ - `MemoryEngine.get_graph_data` accepts `include_entity_data: bool = True`. When False: - the `unit_entities ⨝ entities` lookup is skipped → entity_map is empty, so nodes get the existing "no entities" sentinel - the entity-link inference (sliding-window pairing) is explicitly gated even though entity_to_units_visible is necessarily empty without the lookup — the explicit gate documents the intent - Semantic-link inference for observations (via shared source memories) is unrelated to entities and stays unconditional. HTTP ---- - `GET /v1/default/banks/{bank_id}/graph` accepts `?include_entity_data=false`. Default true. Response schema unchanged — nodes still carry an `entities` field, just empty/None when skipped. CLI --- - `hindsight-cli` regenerated against the new client signature; passing None for the new arg preserves existing CLI behavior (server default = true). Args are alphabetical per progenitor, so the new arg slots between document_id and limit; a one-line `None` keeps the chain compiling. Tests ----- - `test_graph_default_includes_entity_edges` — default still produces entity surface (edges of type=entity OR non-empty entities on nodes). - `test_graph_include_entity_data_false_skips_entity_edges` — no edges of type=entity; nodes + table_rows still populated. - `test_graph_include_entity_data_false_omits_node_entities` — nodes' `entities` field collapses to the empty sentinel. Generated artifacts regenerated (openapi.json + Python/TS/Go/Rust clients). The diff is strictly additive: one new optional query param and the matching client argument.

…ng fixtures The LLMProvider constructor was made config-free in #2405 (no longer reads global HindsightConfig for Vertex AI / litellmrouter fields), and that PR updated test_vertexai_provider.py and test_gemini_safety_settings.py to the explicit-args contract — but two other env-loading fixtures were missed: - tests/test_fact_extraction_analysis.py::llm_config — used by test_fact_extraction_basic_analysis. Now errors at setup with "HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID is required" whenever CI configures Vertex AI as the provider, because the fixture didn't thread the project ID through. - tests/test_llm_provider.py::_make_llm — used by the LLM-acceptance tests. Same root cause for the vertexai matrix variant; the litellmrouter matrix variant hits the same shape, raising "LITELLMROUTER_CONFIG required". Both fixtures now pass the provider-specific required fields explicitly, mirroring the patterns already in MemoryEngine's retain-config build and LLMProvider.from_env() — same env vars, same parser helper, no new attack surface vs the path the constructor used to take itself before #2405. No behaviour change for non-Vertex / non-litellmrouter provider setups (the new args are all None-by-default and only consulted by those branches).

The previous commit landed with multi-line wrapping that the project's ruff formatter rewraps onto a single line. Apply the formatter so verify-generated-files stops drifting.

…rovider errors Mirror the @pytest.mark.flaky(reruns=2, reruns_delay=2) marker that test_llm_api_methods already carries. Both tests hit the same _make_llm fixture and the same upstream provider, and both observe the same class of transient failures — Gemini's 503 UNAVAILABLE during demand spikes, provider rate-limit blips, network hiccups in the CI runner. Without the marker the matrix fails the whole job for a one-off 503; with it pytest-rerunfailures absorbs up to two retries (with a 2s delay) and a real regression still surfaces after the third attempt. Test-only change. No production retry/backoff added (the LLM provider already has its own configurable retry loop for application traffic).

…n-determinism Two layers of LLM non-determinism stack in this test: 1. The fact-extractor sometimes leaves "Yesterday" literal in the When field of the extracted fact instead of resolving it to 2024-11-12 (the day before the supplied event_date). 2. The downstream LLM judge then correctly rejects the literal "Yesterday" as ambiguous, because the criteria explicitly require either the absolute date or an unambiguous reference. Either single layer would already produce occasional failures; the product of the two is higher. The sibling test_date_field_calculation_last_night handles the same class of flake with an in-function max_retries=3 loop — we take the simpler @pytest.mark.flaky(reruns=2, reruns_delay=2) path here to match the convention used for test_llm_memory_operations / test_llm_api_methods. A real regression in temporal handling still surfaces after the third attempt. Test-only change; no production code touched.

chrislatimer added 10 commits June 26, 2026 15:20

chore: ruff-format the llm_config fixture changes (single-line per spec)

3e5d94d

The previous commit landed with multi-line wrapping that the project's ruff formatter rewraps onto a single line. Apply the formatter so verify-generated-files stops drifting.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(stats): opt-out entity-link aggregation via include_entity_links param#2428

feat(stats): opt-out entity-link aggregation via include_entity_links param#2428
chrislatimer wants to merge 10 commits into
mainfrom
feat/stats-include-entity-links-optout

chrislatimer commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

chrislatimer commented Jun 26, 2026

Summary

Changes

Engine

HTTP

Generated artifacts

Test plan

Unit tests (tests/test_bank_stats_cache.py) — added, all 17 passing locally

Integration tests (tests/test_bank_stats.py) — added, require a running Postgres + LLM

Local results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Unit tests (`tests/test_bank_stats_cache.py`) — added, all 17 passing locally

Integration tests (`tests/test_bank_stats.py`) — added, require a running Postgres + LLM