Read document status from a local snapshot (fix is_* N+1) - SDK 1.2.6 #411

Merged: Adityav369 merged 1 commit into main from adi/document-status-snapshot on Jun 19, 2026

@Adityav369

Collaborator

Summary

Fixes an N+1 in the SDK: every read of Document.is_failed / is_processing / is_ingested / status / error made a get_document_status API call, so building a list of dicts over N documents fired ~3N extra requests and erased the speed win from field projection.

These properties now read the status the document already carries (system_metadata), with zero network calls. Measured against a live server: 0 extra calls for 50 docs (was ~150).
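For readers who want to see the arithmetic, here is a minimal sketch of the old and new access patterns. It is not the SDK code: `CountingClient`, `LiveDoc`, `SnapshotDoc`, and the status strings ("completed", "processing", "failed") are hypothetical stand-ins chosen for illustration.

```python
class CountingClient:
    """Hypothetical stand-in for the SDK client; counts status requests."""

    def __init__(self):
        self.calls = 0

    def get_document_status(self, doc_id):
        self.calls += 1
        return {"status": "completed"}  # assumed status value


class LiveDoc:
    """Old behavior (sketch): every is_* access asks the server."""

    def __init__(self, doc_id, client):
        self.doc_id = doc_id
        self._client = client

    def _fetch(self):
        return self._client.get_document_status(self.doc_id)["status"]

    @property
    def is_ingested(self):
        return self._fetch() == "completed"

    @property
    def is_processing(self):
        return self._fetch() == "processing"

    @property
    def is_failed(self):
        return self._fetch() == "failed"


class SnapshotDoc:
    """New behavior (sketch): is_* read the metadata already on the object."""

    def __init__(self, doc_id, system_metadata):
        self.doc_id = doc_id
        self.system_metadata = system_metadata

    def _local(self):
        return self.system_metadata.get("status")

    @property
    def is_ingested(self):
        return self._local() == "completed"

    @property
    def is_processing(self):
        return self._local() == "processing"

    @property
    def is_failed(self):
        return self._local() == "failed"


def summarize(docs):
    # Three status reads per document, as in a typical "list of dicts" loop.
    return [
        {"id": d.doc_id, "ingested": d.is_ingested,
         "processing": d.is_processing, "failed": d.is_failed}
        for d in docs
    ]


def extra_calls(make_doc, n=50):
    """Build n docs, summarize them, and return how many requests were made."""
    client = CountingClient()
    docs = [make_doc(f"doc-{i}", client) for i in range(n)]
    summarize(docs)
    return client.calls


live = extra_calls(LiveDoc)  # 3 reads x 50 docs = 150 requests
local = extra_calls(lambda doc_id, _client: SnapshotDoc(doc_id, {"status": "completed"}))  # 0 requests
```

With three flag reads per document, the live variant issues 3N requests, which matches the ~150 figure quoted for 50 docs.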

Changes

SDK (models.py)

  • status / is_* / error read local system_metadata instead of calling the server.
  • status returns as_of (when the snapshot was pulled) and source ("local" / "not_loaded"). If status wasn't fetched (e.g. projected away), is_* return False and make no call.
  • New Document.refresh() for an explicit live re-fetch; docstrings point to refresh() / wait_for_completion() for the current status.
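A rough sketch of the shape these SDK changes describe, under stated assumptions: `SnapshotDocument` and `StubClient` are hypothetical names, the dict layout returned by `status` is a guess at how `as_of` and `source` might be exposed, and the real `models.py` may differ.

```python
from datetime import datetime, timezone


class StubClient:
    """Hypothetical client; returns a fixed live status."""

    def get_document_status(self, doc_id):
        return {"status": "completed"}  # assumed status value


class SnapshotDocument:
    """Sketch of a document whose status is a local snapshot."""

    def __init__(self, doc_id, system_metadata=None, client=None):
        self.doc_id = doc_id
        self.system_metadata = dict(system_metadata or {})
        self._client = client
        self._as_of = datetime.now(timezone.utc)  # when the snapshot was pulled

    @property
    def status(self):
        value = self.system_metadata.get("status")
        return {
            "status": value,
            "as_of": self._as_of,
            # "not_loaded" when status was never fetched (e.g. projected away)
            "source": "local" if value is not None else "not_loaded",
        }

    @property
    def is_processing(self):
        # No status loaded -> False, and no network call either way.
        return self.system_metadata.get("status") == "processing"

    def refresh(self):
        """Explicit live re-fetch; the only path here that hits the client."""
        live = self._client.get_document_status(self.doc_id)
        self.system_metadata.update(live)
        self._as_of = datetime.now(timezone.utc)
        return self


doc = SnapshotDocument("doc-1", client=StubClient())
before = doc.status["source"]           # status was not loaded
after = doc.refresh().status["source"]  # status now carried locally
```

The point of the design is that the network cost moves from an implicit property read to an explicit `refresh()` call the caller can see.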

Server (postgres_database.py, routes/utils.py)

  • status (and error/timestamps) is now a cheap projectable field: list_documents(fields=[..., "status"]) reads system_metadata->>'status' via JSON-path (no document text) and returns it in a slim system_metadata, so is_* resolve locally with zero extra calls.
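To illustrate the projection idea, here is a minimal sketch of mapping requested fields to SQL select expressions. It is an assumption-heavy illustration, not the code in postgres_database.py: `build_select`, `slim_system_metadata`, the `documents` table name, and the plain column names are all hypothetical. Only the `system_metadata->>'status'` expression comes from the description above.

```python
# Fields served from inside the system_metadata JSON column via JSON-path.
JSON_PATH_FIELDS = {
    "status": "system_metadata->>'status'",
    "error": "system_metadata->>'error'",  # assumed key name
}

# Assumed plain columns, for illustration only.
PLAIN_COLUMNS = {"external_id", "filename", "content_type"}


def build_select(fields, table="documents"):
    """Build a SELECT that pulls only the requested fields."""
    exprs = []
    for field in fields:
        if field in JSON_PATH_FIELDS:
            # Extract one key instead of loading the full metadata blob.
            exprs.append(f"{JSON_PATH_FIELDS[field]} AS {field}")
        elif field in PLAIN_COLUMNS:
            exprs.append(field)
        else:
            raise ValueError(f"unsupported projection field: {field}")
    return f"SELECT {', '.join(exprs)} FROM {table}"


def slim_system_metadata(row):
    """Reassemble a small system_metadata dict from the projected columns."""
    return {k: row[k] for k in JSON_PATH_FIELDS if row.get(k) is not None}


sql = build_select(["external_id", "status"])
meta = slim_system_metadata({"external_id": "a", "status": "completed", "error": None})
```

Field names are checked against a fixed allowlist before being interpolated, so the string building in this sketch does not accept arbitrary user input.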

Release: bump SDK to 1.2.6 (published to PyPI), CHANGELOG updated.

Behavior note

status / is_* now reflect the document as fetched (a snapshot), not a live value. Code that polls with while doc.is_processing: never sees the snapshot change, so it should switch to doc.wait_for_completion() (unchanged; it re-fetches) or call the new doc.refresh() inside the loop.
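A small sketch of what a refresh-based polling loop could look like after this change. `FakeDoc` and `poll_until_done` are hypothetical helpers for illustration; the real `wait_for_completion()` is not reproduced here and its signature is not assumed.

```python
import time


class FakeDoc:
    """Hypothetical doc whose refresh() steps through scripted server states."""

    def __init__(self, server_states):
        self._server_states = iter(server_states)
        self.system_metadata = {"status": next(self._server_states)}

    @property
    def is_processing(self):
        # Reads the local snapshot only; never changes without refresh().
        return self.system_metadata.get("status") == "processing"

    def refresh(self):
        # Pull the next scripted state; keep the current one when exhausted.
        current = self.system_metadata["status"]
        self.system_metadata["status"] = next(self._server_states, current)
        return self


def poll_until_done(doc, interval=0.0, max_attempts=10):
    """Re-fetch explicitly on each pass instead of spinning on a stale flag."""
    for _ in range(max_attempts):
        if not doc.is_processing:
            return doc
        time.sleep(interval)
        doc.refresh()
    raise TimeoutError("document still processing after max_attempts")


done = poll_until_done(FakeDoc(["processing", "processing", "completed"]))
```

Bounding the loop with `max_attempts` is a choice made for this sketch, so a document that never leaves "processing" raises instead of looping forever.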

Tests

  • SDK: test_document_status.py incl. a call-counting regression guard asserting is_* make zero client calls (the gap that let this bug through).
  • Server: test_document_projection.py: status resolves to a JSON-path (not the full blob), the slim system_metadata is reassembled, and app-layer projection keeps it.
  • Full core unit suite green; live e2e confirms 0 extra calls. wait_for_completion verified unaffected.
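A call-counting regression guard of the kind described in the list above might look roughly like this. It is a sketch using the standard library's `unittest.mock`, with a hypothetical `SnapshotDoc` standing in for the SDK model; the actual test in test_document_status.py is not shown in this PR page.

```python
from unittest.mock import MagicMock


class SnapshotDoc:
    """Hypothetical stand-in for the SDK Document; holds a client but
    resolves is_* from local system_metadata only."""

    def __init__(self, doc_id, system_metadata, client):
        self.doc_id = doc_id
        self.system_metadata = system_metadata
        self._client = client

    @property
    def is_processing(self):
        return self.system_metadata.get("status") == "processing"

    @property
    def is_ingested(self):
        return self.system_metadata.get("status") == "completed"

    @property
    def is_failed(self):
        return self.system_metadata.get("status") == "failed"


def test_is_flags_make_zero_client_calls():
    client = MagicMock()
    doc = SnapshotDoc("doc-1", {"status": "processing"}, client)

    assert doc.is_processing is True
    assert doc.is_ingested is False
    assert doc.is_failed is False

    # The guard: no method on the client was invoked by any of the reads.
    assert not client.mock_calls
    client.get_document_status.assert_not_called()
```

Asserting on `mock_calls` catches a call to any client method, not just `get_document_status`, so the guard still holds if the status lookup is later routed through a different method.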

🤖 Generated with Claude Code

…calls

Document.status / is_processing / is_ingested / is_failed / error now read the status
already carried on the document (system_metadata) instead of calling get_document_status on
every access, which caused an N+1 when iterating documents. status now also returns `as_of`
and `source` ("local"/"not_loaded"); when status was not fetched (e.g. projected away),
is_* return False and make no network call.

- Add Document.refresh() for an explicit live re-fetch; docstrings point to refresh()/
  wait_for_completion() for the current status.
- Add a cheap `status` projection: list_documents(fields=[..., "status"]) reads
  system_metadata->>'status' via JSON-path (no document text), so is_* resolve locally.
- Bump SDK to 1.2.6; add SDK + server projection tests incl. a call-counting regression guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adityav369 merged commit 0a3ec31 into main on Jun 19, 2026
9 checks passed
Adityav369 deleted the adi/document-status-snapshot branch on June 19, 2026 at 21:04