
Stop large list pages from running an unbounded total-count query#1233

Open
Copilot wants to merge 4 commits into main from copilot/add-query-param-with-counts

Copilot AI commented Apr 15, 2026

Contributor

Summary

Every paginated list endpoint runs a COUNT(*) over the filtered result set on each request to populate the count field. On large, densely-filtered tables that count can dominate the request: the page of rows is fast and indexed, but counting every matching row is not. This PR makes that cost predictable. The count stays exact up to a threshold (default 10,000); beyond it the endpoint returns the threshold as a lower bound flagged as inexact, so the UI can render "10,000+" instead of the server scanning the whole table. Callers that don't need a total at all can pass ?with_counts=false to skip the count query entirely.

The default response stays backward-compatible: small result sets return the same exact integer count they always did. The only addition is a count_is_exact field alongside count.

This is one lever in the wider list-performance effort. It bounds the worst case of the count query; it is not a speedup for every endpoint (see "What we still need to verify").

List of Changes

| # | Change (effect) | How (implementation) |
| --- | --- | --- |
| 1 | Large list pages no longer run an unbounded COUNT(*); the count's cost is bounded to roughly the threshold regardless of table size. | queryset.order_by()[:N+1].count(), which runs as SELECT COUNT(*) FROM (SELECT 1 … LIMIT N+1) sub. The list view's ORDER BY is stripped first so the LIMIT can short-circuit instead of forcing a top-N sort that would scan the whole set anyway. |
| 2 | Totals past the threshold are shown as a lower bound rather than dropped. The response returns count = 10000 with count_is_exact: false; the UI renders this as e.g. "10,000+". | New per-request count_is_exact flag; an _OVER_CAP sentinel distinguishes "exactly N rows" from "more than N". |
| 3 | Callers can skip the total entirely with ?with_counts=false; no count query runs at all. | Returns count: null and count_is_exact: null. |
| 4 | next / previous links stay correct when there is no reliable total. | When the count is inexact or skipped, the paginator fetches one extra row beyond the page and uses its presence to decide the next link, instead of deriving links from count. |
| 5 | API consumers can tell exact counts from capped ones. | count_is_exact (true / false / null) documented in the OpenAPI schema; count marked nullable. |
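
The capped count and the sentinel described in rows 1 and 2 can be illustrated without a database. This is a minimal sketch, not the code in ami/base/pagination.py: `capped_count` and `OVER_CAP` are hypothetical names, and a plain Python iterable stands in for the queryset.

```python
from itertools import islice

# Hypothetical sentinel, standing in for the PR's _OVER_CAP.
OVER_CAP = object()


def capped_count(rows, threshold):
    """Count rows exactly up to `threshold`; return OVER_CAP beyond it.

    Reads at most threshold + 1 rows, which mirrors a LIMIT threshold + 1
    subquery: the extra row is what proves there are "more than N".
    """
    seen = sum(1 for _ in islice(rows, threshold + 1))
    return OVER_CAP if seen > threshold else seen


print(capped_count(range(7), threshold=10))               # 7
print(capped_count(range(10), threshold=10))              # 10 (exactly at the boundary)
print(capped_count(range(11), threshold=10) is OVER_CAP)  # True
```

Reading one row past the threshold is what lets "exactly N rows" stay an exact count while anything larger is reported as capped.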

Behavior and compatibility:

  • Default behavior is unchanged for normal-sized result sets: an exact integer count with count_is_exact: true.
  • The threshold constant is named COUNT_PRECISION_THRESHOLD (default 10,000) to reflect that it bounds count precision, not merely "large querysets".
  • null is reserved for the explicit with_counts=false opt-out; over-the-cap responses return the capped lower bound, not null.
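
The three response shapes listed above can be summarised in one small function. This is a sketch under stated assumptions: `count_fields` is a hypothetical helper, and it assumes the count step reports "over the cap" as `None` rather than via the PR's `_OVER_CAP` sentinel.

```python
def count_fields(with_counts, capped, threshold):
    """Map a capped-count result onto the `count` / `count_is_exact` pair.

    `capped` is an int when the total is within the threshold, or None when
    more than `threshold` rows were found (an assumed convention for this
    sketch). It is ignored when `with_counts` is False.
    """
    if not with_counts:
        # Explicit opt-out: no count query ran, so both fields are null.
        return {"count": None, "count_is_exact": None}
    if capped is None:
        # Over the cap: report the threshold as a lower bound.
        return {"count": threshold, "count_is_exact": False}
    # Normal-sized result set: same exact integer as before.
    return {"count": capped, "count_is_exact": True}


print(count_fields(True, 42, 10_000))     # {'count': 42, 'count_is_exact': True}
print(count_fields(True, None, 10_000))   # {'count': 10000, 'count_is_exact': False}
print(count_fields(False, None, 10_000))  # {'count': None, 'count_is_exact': None}
```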

All changes are in ami/base/pagination.py (LimitOffsetPaginationWithPermissions, the project-wide default paginator). Tests in ami/main/tests.py::TestPaginationWithCounts.

What we still need to verify

  • The performance numbers so far are single-run query plans on a dev environment loaded with a production data snapshot, not production APM. Rough magnitudes: on a dense filter the count query ran about 27× faster; on a selective filter there was no improvement. This is expected: the cap bounds the rows returned by the count subquery, not the rows Postgres must scan to find matches, so it helps dense result sets and does little for selective ones. Confirm against production traffic before relying on it.
  • Confirm django-cachalot still caches the capped-count subquery on the default path.
  • Spot-check a paginated UI page against this branch: count behavior should match main for normal-sized result sets.

Frontend follow-up (not in this PR)

  • The UI must tolerate count: null and render the lower bound when count_is_exact: false (e.g. "10,000+"). Until that lands, the server default stays with_counts=true so existing components keep working unchanged.
  • Optionally add a per-view override so small, cheap endpoints (projects, pipelines, processing services) can keep returning exact counts by default even if the global default is later flipped.

Test plan

  • ami/main/tests.py::TestPaginationWithCounts (8 tests): exact count below the cap, exact at the boundary, capped + inexact over the cap, with_counts=false null opt-out, and next/previous on first/middle/last pages. 8/8 pass.
  • Production latency comparison, before vs after, on a large densely-filtered list endpoint.

Update — query cleanup + minimal UI (after review + measurement)

Two follow-ups were added after a structural review and a measurement pass on a dev environment loaded with a production-sized data snapshot (one project with ~2.9M captures).

Query correctness (ami/base/pagination.py, ami/main/api/views.py): the capped count is now computed over a stripped queryset (ordering removed, projection narrowed to the primary key) via a _count_queryset seam. An unsliced COUNT(*) already drops the correlated-subquery annotations the list orderings add (e.g. last_processed on captures), but the LIMIT used for the cap would otherwise re-project them. This also makes the old per-view ProjectPagination.get_count override redundant, so it was removed and folded into the base paginator.
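
The stripped-queryset count described above might look roughly like the following. This is a sketch, not the PR's `_count_queryset` implementation: `capped_count` is a hypothetical name, and `FakeQuerySet` is an in-memory stand-in included only so the example runs without Django. The chained calls (`order_by()` with no arguments, `values("pk")`, slicing, `count()`) are standard Django QuerySet methods.

```python
def capped_count(queryset, threshold):
    """Return (count, is_exact), reading at most threshold + 1 rows.

    Expects a Django-style queryset: order_by() with no arguments clears the
    ordering, values("pk") narrows the projection to the primary key, and the
    slice becomes a LIMIT on the counted subquery.
    """
    n = queryset.order_by().values("pk")[: threshold + 1].count()
    if n > threshold:
        return threshold, False
    return n, True


class FakeQuerySet:
    """Hypothetical in-memory stand-in so the sketch runs without Django."""

    def __init__(self, pks):
        self._pks = list(pks)

    def order_by(self, *fields):
        return self  # ordering does not change how many rows there are

    def values(self, *fields):
        return self  # projection does not change how many rows there are

    def __getitem__(self, key):
        return FakeQuerySet(self._pks[key])

    def count(self):
        return len(self._pks)


print(capped_count(FakeQuerySet(range(4)), threshold=10))   # (4, True)
print(capped_count(FakeQuerySet(range(25)), threshold=10))  # (10, False)
```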

Minimal UI handling of the capped count: the four high-volume list views (captures, occurrences, species, sessions) now surface count_is_exact and render a capped total as e.g. "10000+" in the pagination info label. totalIsExact is an optional prop defaulting to true, so other lists are unchanged. The numbered page buttons still derive from the capped total, so pages beyond the cap are not reachable from the bar until these lists move to cursor pagination (tracked separately) — a deliberate, documented limitation.

Measurements (single-run, dev bench with a production snapshot — not production APM)

| Measurement | Before (no cap) | With cap |
| --- | --- | --- |
| count query, one ~2.9M-row project (EXPLAIN ANALYZE) | ~1140 ms | ~5 ms (~175×) |
| full captures list endpoint, cold cache | ~2.0–2.5 s | ~1.0–1.5 s |

Caveats worth knowing before relying on this:

  • The win is on the count. A follow-up measurement showed the rest of the cold endpoint time is dominated by per-object permission checks during serialization (and, on the dev bench, by debug instrumentation) rather than data fetching — so it is untouched here and tracked as separate work.
  • Counts are cached (django-cachalot); warm counts are already fast. The cap helps the cold / cache-miss path, which dominates while a project is actively ingesting (writes invalidate the count cache). On a quiet table the cap rarely triggers.
  • The cap helps dense result sets (it bounds rows returned); it does little for selective filters (it does not bound rows scanned to find matches).


netlify Bot commented Apr 15, 2026


Deploy Preview for antenna-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 9516b77 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/6a3e0fc8eaa57c00081e8911 |
| 😎 Deploy Preview | https://deploy-preview-1233--antenna-preview.netlify.app |
Lighthouse
1 paths audited
Performance: 57 (🔴 down 8 from production)
Accessibility: 81 (🔴 down 8 from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)

netlify Bot commented Apr 15, 2026


Deploy Preview for antenna-ssec ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 9516b77 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/6a3e0fc8af649a00088e8ffb |
| 😎 Deploy Preview | https://deploy-preview-1233--antenna-ssec.netlify.app |

Copilot AI requested a review from mihow April 15, 2026 02:04
@mihow mihow changed the title feat: capped COUNT(*) safety valve for with_counts=true requests feat: speed up list views by deferring big counts Apr 15, 2026
@mihow mihow changed the title feat: speed up list views by deferring big counts feat: opt-out with_counts param + capped COUNT(*) for paginated list endpoints Apr 17, 2026

mihow commented Jun 25, 2026

Collaborator

Claude says: Pushed a small fix to this branch in 50379194 and wanted to share the measurement behind it, plus one scoping note. This came out of a benchmarking pass on a production-snapshot dataset for the #928 slow-counts work.

Fix: strip the ordering from the capped count subquery. _get_capped_count sliced the queryset (queryset[:N].count()) but kept the list view's ORDER BY. When that ordering isn't served by an index, Postgres top-N sorts the entire filtered set before the LIMIT can stop it — which defeats the early exit the cap is supposed to give. On a dense filter with a non-index-served order we measured the capped count running slower than the plain uncapped one:

| Query | Time |
| --- | --- |
| uncapped COUNT(*) | 75 ms |
| capped, ordering kept (before this fix) | 110 ms |
| capped, ordering dropped (after) | 17 ms |

The order is irrelevant to a count, so queryset.order_by()[:N].count() is a pure win. I also corrected the docstring, which claimed the slice was "always O(N)" — that's only true once the ordering is removed.
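
The effect of keeping the ordering can be shown with a toy model that counts how many source rows are read. This is only an illustration under simplifying assumptions: Python's `sorted` is a full sort, whereas Postgres would use a top-N sort, but both must read every input row before emitting the first result when no index serves the order. All function names here are hypothetical.

```python
from itertools import islice


def counting_source(n, consumed):
    """Yield n unsorted rows, recording in consumed[0] how many were read."""
    for i in range(n):
        consumed[0] += 1
        yield (i * 7919) % n  # arbitrary scrambled values


def rows_read_with_order(n, limit):
    """ORDER BY then LIMIT: every row is read before the slice applies."""
    consumed = [0]
    top = sorted(counting_source(n, consumed))[:limit]
    return consumed[0], len(top)


def rows_read_without_order(n, limit):
    """LIMIT only: reading stops as soon as `limit` rows are produced."""
    consumed = [0]
    top = list(islice(counting_source(n, consumed), limit))
    return consumed[0], len(top)


print(rows_read_with_order(100_000, 1_000))     # (100000, 1000)
print(rows_read_without_order(100_000, 1_000))  # (1000, 1000)
```

Both variants return the same number of rows to count, which is why dropping the ordering changes the cost but not the result.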

Where the cap shines (and where it doesn't): on a genuinely large dense result set the cap is dramatic — an unbounded source-image COUNT(*) over ~23M rows went from ~11 s to ~423 ms, and under concurrent load (8 clients) the p99 dropped from ~28 s to ~1.2 s. That's the case this PR is built for, and it's a big win.

The one thing to keep in mind for how we describe/use it: the cap bounds output rows, not input scanned, so it does nothing for a sparse/selective filter — e.g. a filter matching only ~0.7% of rows measured 64 ms uncapped vs 62 ms capped, because the LIMIT 10001 never trips. Those selective-filter counts need a covering index, not the cap. So this PR is the right fix for unbounded counts on large lists; it just isn't the general answer to "slow filtered counts," which is worth being explicit about so we don't over-claim its reach. No change needed here for that — just flagging it for the epic framing.
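
The "output rows, not input scanned" distinction can be reproduced with a toy sequential scan. This sketch assumes no index helps the filter, so every row is inspected in order; `capped_scan` and the two example filters are hypothetical, chosen so that one matches half the table and the other about 0.7% of it.

```python
def capped_scan(rows, matches, threshold):
    """Return (matches_found, rows_scanned), stopping after threshold + 1 matches."""
    found = scanned = 0
    for row in rows:
        scanned += 1
        if matches(row):
            found += 1
            if found > threshold:
                break  # the LIMIT trips: no need to look further
    return found, scanned


TABLE = range(1_000_000)
THRESHOLD = 10_000

# Dense filter: half the rows match, so the cap trips early.
print(capped_scan(TABLE, lambda r: r % 2 == 0, THRESHOLD))    # (10001, 20001)

# Selective filter: about 0.7% of rows match, so the cap never trips
# and the whole table is scanned anyway.
print(capped_scan(TABLE, lambda r: r % 143 == 0, THRESHOLD))  # (6994, 1000000)
```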


mihow commented Jun 25, 2026

Collaborator

Claude says: Pushed 1a3b108e reframing the cap as a precision cap, which removes almost all of the frontend work this PR would otherwise need — at no backend cost.

What changed: previously, when a result set exceeded the threshold the response returned count: null. The UI's data?.count ?? 0 then collapses to 0 and pagination gets disabled — so the biggest projects (exactly the ones we're optimizing for) silently lost their pager. That made a frontend change a hard prerequisite for flipping this on.

Now, over the threshold the response returns the threshold as a lower-bound count plus a flag:

{ "count": 10000, "count_is_exact": false, "next": "...", "previous": "..." }

So the existing UI shows "10,000" with a working pager today, and the only follow-up frontend change is cosmetic: append a "+" when count_is_exact === false to render "10,000+". No null-count handling required.
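
The cosmetic rule is small enough to state as code. The real change belongs in the frontend; Python is used here only to stay consistent with the backend code in this PR, and `format_total` is a hypothetical name.

```python
def format_total(count: int, count_is_exact: bool) -> str:
    """Render a pagination total, appending "+" when it is a lower bound."""
    label = f"{count:,}"
    return label if count_is_exact else f"{label}+"


print(format_total(9421, True))     # 9,421
print(format_total(10000, False))   # 10,000+
```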

Details:

  • Renamed LARGE_QUERYSET_THRESHOLD to COUNT_PRECISION_THRESHOLD.
  • count: null is now reserved for the explicit with_counts=false opt-out (with count_is_exact: null) — a real "I don't want a count" signal, distinct from "the count is approximate."
  • Because the capped value is a lower bound, next/previous are computed from the one-extra-row probe, not from count — so paging past the threshold keeps working (setting count to the cap and trusting it for paging would dead-end navigation at offset 10,000).
  • count_is_exact is additive to the response schema (documented in get_paginated_response_schema); existing clients that ignore it are unaffected.
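
The one-extra-row probe mentioned above can be sketched as follows. This is an illustration over an in-memory list, not the paginator's code: `paginate_with_probe` is a hypothetical name, and it returns offsets where the real paginator would build `next` / `previous` URLs.

```python
def paginate_with_probe(rows, limit, offset):
    """Return (page, next_offset, previous_offset) without using a total count."""
    # Fetch one row more than the page needs; it is never returned to the
    # caller, its presence only signals that another page exists.
    window = rows[offset : offset + limit + 1]
    page = window[:limit]
    has_next = len(window) > limit
    next_offset = offset + limit if has_next else None
    previous_offset = max(offset - limit, 0) if offset > 0 else None
    return page, next_offset, previous_offset


rows = list(range(25))
for offset in (0, 10, 20):
    page, nxt, prev = paginate_with_probe(rows, limit=10, offset=offset)
    print(offset, len(page), nxt, prev)
# 0 10 10 None
# 10 10 20 0
# 20 5 None 10
```

Because the decision depends only on the probe row, navigation keeps working past the threshold even though `count` stays pinned at the cap.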

Backend cost is unchanged — same order_by()[:N+1].count() query (the ORDER-BY strip from the previous commit still applies); only the mapping of its result to the response differs.

ProjectPagination inherits all of this (it only overrides default_limit). Tests updated to pin the new behavior: exact below the cap, capped+count_is_exact:false above it, null only on opt-out, and a boundary case at exactly the threshold. 8/8 pass in isolation.

@mihow mihow force-pushed the copilot/add-query-param-with-counts branch from 1a3b108 to c7b8e0e Compare June 26, 2026 01:29
@mihow mihow changed the title feat: opt-out with_counts param + capped COUNT(*) for paginated list endpoints Stop large list pages from running an unbounded total-count query Jun 26, 2026
@mihow mihow marked this pull request as ready for review June 26, 2026 01:29
Copilot AI review requested due to automatic review settings June 26, 2026 01:29
Every paginated list endpoint runs a COUNT(*) over the filtered result
set to populate `count`. On large, densely-filtered tables that count can
dominate the request even when the page query itself is fast. This bounds
the worst case.

- Counts stay exact up to COUNT_PRECISION_THRESHOLD (default 10,000).
  Beyond it the response returns the threshold as a lower bound with
  `count_is_exact: false`, which the UI renders as e.g. "10,000+", instead
  of scanning the whole table.
- The capped count strips the queryset's ORDER BY first so the LIMIT can
  short-circuit instead of forcing a top-N sort that would scan the whole
  set anyway: `SELECT COUNT(*) FROM (SELECT 1 ... LIMIT N) sub`.
- Callers can skip the total entirely with `?with_counts=false`, which
  returns `count: null` and runs no count query.
- `next`/`previous` fall back to a one-extra-row probe whenever the count
  is inexact or skipped, preserving the pagination contract.

Default behavior is unchanged for normal-sized result sets: an exact
integer count with `count_is_exact: true`. New `count_is_exact` field
documented in the OpenAPI schema.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mihow mihow force-pushed the copilot/add-query-param-with-counts branch from c7b8e0e to 942bfa3 Compare June 26, 2026 01:32

Copilot AI left a comment

Contributor


Pull request overview

This PR updates the project-wide DRF paginator to avoid unbounded COUNT(*) queries on large, densely-filtered list endpoints by capping count precision (exact up to a threshold, then returning the threshold as a lower bound) and by allowing callers to opt out of counts entirely via ?with_counts=false. It also adds a count_is_exact field to help API consumers distinguish exact vs capped vs skipped totals, and introduces tests to validate the new behaviors.

Changes:

  • Implement capped counting and with_counts=false opt-out in LimitOffsetPaginationWithPermissions, plus probe-based next/previous logic when totals are inexact/skipped.
  • Extend the paginated response shape with count_is_exact and mark count nullable in the response schema.
  • Add API tests covering exact counts, capped/inexact counts, opt-out behavior, and navigation links.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| ami/base/pagination.py | Adds capped-count + opt-out logic to the default paginator and extends response/schema with count_is_exact. |
| ami/main/tests.py | Adds TestPaginationWithCounts to validate exact/capped/skipped counts and next/previous behavior. |


Comment thread ami/base/pagination.py
Comment on lines +61 to +66
capped = self._get_capped_count(queryset)
if capped is self._OVER_CAP:
    # Over the precision cap: report the threshold as an approximate
    # lower bound. It must not drive next/previous (the true total is
    # higher), so fall back to the probe-based links.
    self.count = self.COUNT_PRECISION_THRESHOLD
Comment thread ami/base/pagination.py
Comment on lines +122 to +136
def get_paginated_response_schema(self, schema):
    paginated_schema = super().get_paginated_response_schema(schema)
    # count is the exact total, the precision cap (a lower bound), or null
    # when the caller passed with_counts=false.
    paginated_schema["properties"]["count"]["nullable"] = True
    paginated_schema["properties"]["count_is_exact"] = {
        "type": "boolean",
        "nullable": True,
        "description": (
            "True when `count` is exact; false when it is the precision cap "
            '(a lower bound, render as e.g. "10,000+"); null when the count '
            "was skipped via with_counts=false."
        ),
    }
    return paginated_schema
mihow and others added 3 commits June 25, 2026 22:23
…on.get_count

The capped count is computed over a stripped queryset (ordering removed,
projection narrowed to the primary key) via a new `_count_queryset` seam.
An unsliced COUNT(*) already drops the correlated-subquery annotations the
list orderings add (e.g. `last_processed` on captures), but the LIMIT used
for the precision cap would otherwise re-project them and run the subquery
per scanned row. Counting `values("pk")` keeps the COUNT over a bare
primary-key scan.

This also makes `ProjectPagination.get_count` redundant: the per-view
override existed only to strip those annotations before counting, which the
base paginator now does for every endpoint. Removed it.

Verified on a database snapshot (~2.88M captures in one project): the count
query stays ~5 ms whether or not the annotations are present, and EXPLAIN
confirms the detection subquery is not scanned.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
When the API caps the total count (count_is_exact: false on large result
sets), surface it through the high-volume list hooks (captures, occurrences,
species, sessions) and render the total as e.g. "10000+" in the pagination
info label. The numbered page buttons still derive from the capped total, so
pages beyond the cap are not reachable from the bar until these lists move to
cursor pagination; this change keeps the label honest in the meantime.

totalIsExact is an optional prop defaulting to true, so list views not wired
to it (small tables that never reach the cap) are unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
3 participants