Skip to content

feat(stats): opt-out entity-link aggregation via include_entity_links param#2428

Open
chrislatimer wants to merge 10 commits into
mainfrom
feat/stats-include-entity-links-optout
Open

feat(stats): opt-out entity-link aggregation via include_entity_links param#2428
chrislatimer wants to merge 10 commits into
mainfrom
feat/stats-include-entity-links-optout

Conversation

@chrislatimer

Copy link
Copy Markdown
Contributor

Summary

Under multi-tenant workloads where one bank holds many memories with many distinct entities per memory, the entity-link aggregation in get_bank_stats (a join + group on unit_entities ⨝ memory_units) dominates the response time of GET /v1/default/banks/{bank_id}/stats. The aggregation is an approximation (per-entity count capped at 10) feeding a single dashboard slice, but it's the only piece of the stats endpoint that scales with the cross-product of memories × entities rather than just memories.

Add an optional include_entity_links query parameter (default true, preserving existing behavior) so callers that don't surface the per-entity slice can skip the aggregation entirely. When false, the entity key in link_counts is omitted — matching the existing "no entity edges yet" rendering — so downstream readers that already tolerate a missing key see no behavior change.

This is purely additive: no existing endpoint contract, response field, default, or behavior changes. Existing callers (including the OSS control plane, the CLI's bank stats renderer, and all generated SDK consumers that don't pass the new param) see identical responses.

Changes

Engine

  • MemoryEngineInterface.get_bank_stats and the concrete MemoryEngine impl now accept include_entity_links: bool = True. The flag is threaded through to _compute_bank_stats, which gates the entity-link CTE on it.
  • BankStatsCache extended to accept a key_suffix so semantically distinct variants of the same (schema, bank_id) get separate cache slots. invalidate clears every variant for a bank by matching the (schema, bank_id) prefix — writers don't have to know which suffixes the read path is using.
  • get_bank_stats passes (include_entity_links,) as the suffix, so a True-variant cache hit can never be served to a False caller and vice versa.

HTTP

  • GET /v1/default/banks/{bank_id}/stats accepts ?include_entity_links=false. Default is true. The route description is expanded so clients see the option in the docs.

Generated artifacts

  • hindsight-docs/static/openapi.json regenerated. Diff is strictly additive: new optional query param + expanded route description. Response schema unchanged.
  • skills/hindsight-docs/references/openapi.json regenerated to match (the generate-docs-skill.sh pre-commit hook syncs the two).
  • Client SDK regeneration (./scripts/generate-clients.sh) not included — the script invokes cargo build for the Rust client which isn't available in this dev environment. Maintainer should run it before merge so Python / TypeScript / Go / Rust clients all pick up the new optional query param.

Test plan

Unit tests (tests/test_bank_stats_cache.py) — added, all 17 passing locally

  • test_key_suffix_separates_cache_slotsTrue and False variants produce distinct cache entries; cross-variant reads never run the other loader.
  • test_default_key_suffix_is_distinct_slot — the empty default suffix is its own slot, does not collide with (True,).
  • test_invalidate_clears_all_variants_for_bank — a single invalidate call wipes both variants so a post-write read reloads regardless of which variant the client asks for next.
  • test_invalidate_keeps_other_banks — invalidation is scoped; never bleeds across banks.
  • test_concurrent_misses_coalesce_per_variant — two concurrent True callers share one load; a concurrent False caller runs independently.

Integration tests (tests/test_bank_stats.py) — added, require a running Postgres + LLM

  • test_bank_stats_default_includes_entity_link_count — default path computes the entity-link slice and returns a well-shaped response.
  • test_bank_stats_include_entity_links_false_skips_entity_aggregation — query param threads through to _compute_bank_stats with include_entity_links=False; response omits the entity key; every other field still present.
  • test_bank_stats_caches_true_and_false_separately — at the HTTP layer, True and False variants each cache once and are served independently on repeat calls; payloads agree on every field except the optional entity slot.
  • test_bank_stats_invalidate_clears_both_variants — a retain that invalidates the cache forces both variants to reload from fresh SQL.

Local results

  • Cache unit tests: 17/17 passing
  • ruff check: clean on changed files
  • ty check: clean on changed files
  • OpenAPI regen: strictly additive
  • Pre-commit hooks (lint + docs-skill regen): passing
  • Integration tests: need a running pg0 + LLM environment to exercise; reviewer should run cd hindsight-api-slim && uv run pytest tests/test_bank_stats.py -v
  • SDK regeneration: needs cargo available; reviewer should run ./scripts/generate-clients.sh

… param

Under multi-tenant workloads where one bank holds many memories with many
distinct entities per memory, the entity-link aggregation in
`get_bank_stats` (a join + group on `unit_entities ⨝ memory_units`)
dominates the response time of `GET /v1/default/banks/{bank_id}/stats`.

Add an optional `include_entity_links` query parameter (default true,
preserving existing behavior) so callers that don't surface the
per-entity slice can skip the aggregation entirely. When false, the
"entity" key in `link_counts` is omitted — matching the existing "no
entity edges yet" rendering — so downstream readers that already tolerate
a missing key see no behavior change.

Engine
------
- `MemoryEngineInterface.get_bank_stats` and the concrete `MemoryEngine`
  impl now accept `include_entity_links: bool = True`. The flag is
  threaded through to `_compute_bank_stats`, which gates the entity-link
  CTE on it.
- `BankStatsCache` extended to accept a `key_suffix` so semantically
  distinct variants of the same (schema, bank_id) get separate cache
  slots. `invalidate` clears every variant for a bank by matching the
  (schema, bank_id) prefix — writers don't have to know which suffixes
  the read path is using.
- `get_bank_stats` passes `(include_entity_links,)` as the suffix, so a
  True-variant cache hit can never be served to a False caller (or vice
  versa).

HTTP
----
- `GET /v1/default/banks/{bank_id}/stats` accepts `?include_entity_links=false`.
  Default is true. Response schema is unchanged: the optional "entity"
  key in `links_by_link_type` is omitted when skipped, which is the same
  shape the existing "0 → key absent" convention emits today.

Tests
-----
Unit (`tests/test_bank_stats_cache.py`):
- `test_key_suffix_separates_cache_slots`: True and False variants
  produce distinct cache entries; cross-variant reads never run the
  other loader.
- `test_default_key_suffix_is_distinct_slot`: the empty default suffix
  is its own slot — does not collide with `(True,)`.
- `test_invalidate_clears_all_variants_for_bank`: a single invalidate
  call wipes both variants so a post-write read reloads regardless of
  which the client asks for next.
- `test_invalidate_keeps_other_banks`: invalidation is scoped — never
  bleeds across banks.
- `test_concurrent_misses_coalesce_per_variant`: two concurrent True
  callers share one load; a concurrent False caller runs independently.

Integration (`tests/test_bank_stats.py`):
- `test_bank_stats_default_includes_entity_link_count`: default path
  computes the entity-link slice and returns a well-shaped response.
- `test_bank_stats_include_entity_links_false_skips_entity_aggregation`:
  the query param threads through to `_compute_bank_stats` with
  `include_entity_links=False`, the response omits the "entity" key, and
  every other field is still present.
- `test_bank_stats_caches_true_and_false_separately`: at the HTTP layer,
  True and False variants each cache once and are served independently
  on repeat calls; the payloads agree on every field except the optional
  entity slot.
- `test_bank_stats_invalidate_clears_both_variants`: a retain that
  invalidates the cache forces both variants to reload from fresh SQL.

Cache unit tests: 17/17 passing. Lint + type check clean. Integration
tests require a running Postgres + LLM and are validated by the
reviewer.

Generated
---------
- `hindsight-docs/static/openapi.json` regenerated; the diff is strictly
  additive — new optional query param + expanded route description.
  Response schema is unchanged.
- `skills/hindsight-docs/references/openapi.json` regenerated to match
  (the generate-docs-skill.sh pre-commit hook syncs the two).

Note: SDK regeneration (`./scripts/generate-clients.sh`) is not included
in this commit because the script requires the Rust toolchain
(`cargo`) which isn't available in this dev env. Reviewer should run it
before merge so the Python / TypeScript / Go / Rust clients pick up the
new optional query param.
…ents

Addresses the test-api and verify-generated-files failures on this PR.

Test fixes
----------
- `test_bank_stats_served_from_cache_on_repeat_call`: the pre-existing
  `counting_compute` wrapper took a bare `(bank_id: str)` signature. After
  this PR `get_bank_stats` now passes `include_entity_links=` through to
  `_compute_bank_stats`, so calling the patched wrapper raised
  TypeError("got an unexpected keyword argument 'include_entity_links'")
  and the route returned 500. Updated to forward the kwarg.
- `test_bank_stats_invalidate_clears_both_variants`: the original test
  triggered invalidation via a retain, on the wrong assumption that
  retain invalidates the bank-stats cache. It doesn't — only mutations
  that change the counts the endpoint reports (delete_memory_unit,
  delete_document, clear_observations, etc.) do. So the cache stayed
  warm, the counting wrapper never fired, and `calls == []`. Rewrote to
  call `_bank_stats_cache.invalidate(get_current_schema(), test_bank_id)`
  directly, which exercises the prefix-clear contract end-to-end
  through HTTP and matches what real writers actually do.

Generated clients
-----------------
Ran `./scripts/generate-clients.sh` so the four client SDKs (Python,
TypeScript, Go, Rust) pick up the new optional `include_entity_links`
query parameter. Diff is strictly additive: a new optional `query` field
on the `getAgentStats` operation plus the expanded route description.
No existing type or behavior changes.
…lModelTriggerInput

Two compile failures the Rust toolchain caught after the client regen:

1. `api.rs:155` — `get_agent_stats(agent_id, None)` now takes 3 args.
   The new optional `include_entity_links: Option<bool>` slots between
   `bank_id` and `authorization`. Pass `None` so we get the server's
   default (true), preserving the historical CLI behavior.

2. `mental_model.rs:120` + `:198` — `MentalModelTriggerInput` literals
   were missing `refresh_cron`. The field was added to the spec in
   PR #2377 (cron-scheduled mental model refresh), the Rust client
   regenerates from openapi.json at build time, but the CLI was never
   updated to set it. Existed latently on `main` and was exposed when
   this PR regenerated all clients in lockstep. Setting to None
   preserves the existing CLI behavior (no cron schedule).
…_append_mode

The repo's pre-commit lint hook reformats these two files but they were
not re-committed on `main` (this drift was already present on `main`
when this branch was cut). CI's verify-generated-files step runs
`./scripts/hooks/lint.sh` after generation, sees the format diff, and
fails. Picking up the format drift here so this PR's CI is green;
`main` will pick it up on the next regen/merge.

Pure formatter changes — no behavior change.
`test_extraction_failure_at_retry_cap_fails_terminally` (added in #2418,
guarding the recovered-worker path from #2413) asserts that when fact
extraction fails terminally, the original exception message survives
into `async_operations.error_message` so an operator can tell apart a
structured-JSON parse failure from a rate-limit reset from a network
5xx — all of which can surface as the same exception types in different
code paths.

The formatter was joining only `type(err).__name__`, producing rows like
"chunk 0: RuntimeError". The exception message was discarded, leaving
worker failures unactionable and silently defeating the test. The test
ran for the first time on this branch (its original PR's test-api job
was skipped) and surfaced the bug.

Add the message to the summary: "chunk 0: RuntimeError: structured JSON
parse failed after all retain_extract_facts attempts". Same shape, just
the field the test was added to enforce.

Drive-by: pre-existing, unrelated to the include_entity_links work in
this PR — but the test is wired in now and CI won't go green without it.
…entity_data param

Extends the graph endpoint with the same opt-out shape this PR added to
/stats. Under workloads where observations carry large source_memory_ids
arrays — multiple thousand UUIDs per visible unit at typical limits —
the entity_lookup_ids IN-list balloons and the unit_entities ⨝ entities
join dominates the response. The downstream in-memory entity-link
inference (sliding-window pairing across every entity's per-unit list)
adds another large constant on top of that.

Callers that don't render entity coloring on nodes or entity-typed
edges (e.g. table-style data views) can pass include_entity_data=false
to skip both. Default true preserves the existing response shape and
historical behavior for SDK consumers + the OSS control plane.

Engine
------
- `MemoryEngine.get_graph_data` accepts `include_entity_data: bool = True`.
  When False:
  - the `unit_entities ⨝ entities` lookup is skipped → entity_map is empty,
    so nodes get the existing "no entities" sentinel
  - the entity-link inference (sliding-window pairing) is explicitly gated
    even though entity_to_units_visible is necessarily empty without the
    lookup — the explicit gate documents the intent
- Semantic-link inference for observations (via shared source memories)
  is unrelated to entities and stays unconditional.

HTTP
----
- `GET /v1/default/banks/{bank_id}/graph` accepts
  `?include_entity_data=false`. Default true. Response schema unchanged —
  nodes still carry an `entities` field, just empty/None when skipped.

CLI
---
- `hindsight-cli` regenerated against the new client signature; passing
  None for the new arg preserves existing CLI behavior (server default
  = true). Args are alphabetical per progenitor, so the new arg slots
  between document_id and limit; a one-line `None` keeps the chain
  compiling.

Tests
-----
- `test_graph_default_includes_entity_edges` — default still produces
  entity surface (edges of type=entity OR non-empty entities on nodes).
- `test_graph_include_entity_data_false_skips_entity_edges` — no edges
  of type=entity; nodes + table_rows still populated.
- `test_graph_include_entity_data_false_omits_node_entities` — nodes'
  `entities` field collapses to the empty sentinel.

Generated artifacts regenerated (openapi.json + Python/TS/Go/Rust
clients). The diff is strictly additive: one new optional query param
and the matching client argument.
…ng fixtures

The LLMProvider constructor was made config-free in #2405 (no longer reads
global HindsightConfig for Vertex AI / litellmrouter fields), and that PR
updated test_vertexai_provider.py and test_gemini_safety_settings.py to the
explicit-args contract — but two other env-loading fixtures were missed:

- tests/test_fact_extraction_analysis.py::llm_config — used by
  test_fact_extraction_basic_analysis. Now errors at setup with
  "HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID is required" whenever CI configures
  Vertex AI as the provider, because the fixture didn't thread the project
  ID through.
- tests/test_llm_provider.py::_make_llm — used by the LLM-acceptance tests.
  Same root cause for the vertexai matrix variant; the litellmrouter matrix
  variant hits the same shape, raising "LITELLMROUTER_CONFIG required".

Both fixtures now pass the provider-specific required fields explicitly,
mirroring the patterns already in MemoryEngine's retain-config build and
LLMProvider.from_env() — same env vars, same parser helper, no new attack
surface vs the path the constructor used to take itself before #2405.

No behaviour change for non-Vertex / non-litellmrouter provider setups
(the new args are all None-by-default and only consulted by those branches).
The previous commit landed with multi-line wrapping that the project's ruff
formatter rewraps onto a single line. Apply the formatter so
verify-generated-files stops drifting.
…rovider errors

Mirror the @pytest.mark.flaky(reruns=2, reruns_delay=2) marker that
test_llm_api_methods already carries. Both tests hit the same _make_llm
fixture and the same upstream provider, and both observe the same class
of transient failures — Gemini's 503 UNAVAILABLE during demand spikes,
provider rate-limit blips, network hiccups in the CI runner. Without
the marker the matrix fails the whole job for a one-off 503; with it
pytest-rerunfailures absorbs up to two retries (with a 2s delay) and a
real regression still surfaces after the third attempt.

Test-only change. No production retry/backoff added (the LLM provider
already has its own configurable retry loop for application traffic).
…n-determinism

Two layers of LLM non-determinism stack in this test:

1. The fact-extractor sometimes leaves "Yesterday" literal in the When
   field of the extracted fact instead of resolving it to 2024-11-12 (the
   day before the supplied event_date).
2. The downstream LLM judge then correctly rejects the literal "Yesterday"
   as ambiguous, because the criteria explicitly require either the
   absolute date or an unambiguous reference.

Either single layer would already produce occasional failures; the product
of the two is higher. The sibling test_date_field_calculation_last_night
handles the same class of flake with an in-function max_retries=3 loop —
we take the simpler @pytest.mark.flaky(reruns=2, reruns_delay=2) path here
to match the convention used for test_llm_memory_operations /
test_llm_api_methods. A real regression in temporal handling still
surfaces after the third attempt.

Test-only change; no production code touched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant