Stop large list pages from running an unbounded total-count query by Copilot · Pull Request #1233 · RolnickLab/antenna

Copilot · 2026-04-15T02:01:49Z

Summary

Every paginated list endpoint runs a COUNT(*) over the filtered result set on each request to populate the count field. On large, densely-filtered tables that count can dominate the request: the page of rows is fast and indexed, but counting every matching row is not. This PR makes that cost predictable. The count stays exact up to a threshold (default 10,000); beyond it the endpoint returns the threshold as a lower bound flagged as inexact, so the UI can render "10,000+" instead of the server scanning the whole table. Callers that don't need a total at all can pass ?with_counts=false to skip the count query entirely.

The default response stays backward-compatible: small result sets return the same exact integer count they always did. The only addition is a count_is_exact field alongside count.

This is one lever in the wider list-performance effort. It bounds the worst case of the count query; it is not a speedup for every endpoint (see "What we still need to verify").

List of Changes

#	Change (effect)	How (implementation)
1	Large list pages no longer run an unbounded `COUNT(*)`; the count stops after at most N+1 matching rows regardless of table size.	`queryset.order_by()[:N+1].count()`, which reaches Postgres as `SELECT COUNT(*) FROM (SELECT 1 … LIMIT N+1) sub`. The list view's `ORDER BY` is stripped first so the `LIMIT` can short-circuit instead of forcing a top-N sort that would scan the whole set anyway.
2	Totals past the threshold are shown as a lower bound rather than dropped. The response returns `count = 10000` with `count_is_exact: false`; the UI renders this as e.g. "10,000+".	New per-request `count_is_exact` flag; an `_OVER_CAP` sentinel distinguishes "exactly N rows" from "more than N".
3	Callers can skip the total entirely with `?with_counts=false` — no count query runs at all.	Returns `count: null` and `count_is_exact: null`.
4	`next` / `previous` links stay correct when there is no reliable total.	When the count is inexact or skipped, the paginator fetches one extra row beyond the page and uses its presence to decide the `next` link, instead of deriving links from `count`.
5	API consumers can tell exact counts from capped ones.	`count_is_exact` (`true` / `false` / `null`) documented in the OpenAPI schema; `count` marked nullable.
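
The capped count in row 1 can be sketched outside Django. This is a minimal illustration using the standard-library `sqlite3` module as a stand-in for Postgres, not the code in `ami/base/pagination.py`; the `captures` table, the `project_id` column, and the `capped_count` helper are hypothetical names chosen for the example.

```python
import sqlite3


def capped_count(conn, project_id, threshold):
    """Count matching rows, but never look past threshold + 1 of them.

    Returns (count, is_exact). The inner SELECT has no ORDER BY, so the
    LIMIT can stop the scan as soon as threshold + 1 rows are found.
    """
    sql = (
        "SELECT COUNT(*) FROM ("
        "SELECT 1 FROM captures WHERE project_id = ? LIMIT ?"
        ") AS sub"
    )
    (n,) = conn.execute(sql, (project_id, threshold + 1)).fetchone()
    if n > threshold:
        # More rows exist than the cap: report the cap as a lower bound.
        return threshold, False
    return n, True


# Hypothetical data: 25 rows in project 1, 3 rows in project 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE captures (id INTEGER PRIMARY KEY, project_id INTEGER)")
conn.executemany(
    "INSERT INTO captures (project_id) VALUES (?)",
    [(1,)] * 25 + [(2,)] * 3,
)
```

Fetching one row more than the threshold is what lets the caller tell "exactly N rows" apart from "more than N".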

Behavior and compatibility:

Default behavior is unchanged for normal-sized result sets: an exact integer count with count_is_exact: true.
The threshold constant is named COUNT_PRECISION_THRESHOLD (default 10,000) to reflect that it bounds count precision, not merely "large querysets".
null is reserved for the explicit with_counts=false opt-out; over-the-cap responses return the capped lower bound, not null.
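
The three response states described above (exact, capped, skipped) can be written as a small mapping. This is a sketch under assumptions: `count_fields` and the `probe` argument are hypothetical names, and `probe` stands for the result of counting at most threshold + 1 rows, or `None` when the caller passed `with_counts=false`.

```python
from typing import Optional, TypedDict


class CountFields(TypedDict):
    count: Optional[int]
    count_is_exact: Optional[bool]


def count_fields(probe: Optional[int], threshold: int = 10_000) -> CountFields:
    """Map a capped-count result onto the `count` / `count_is_exact` pair."""
    if probe is None:
        # Explicit opt-out: no count query ran, so both fields are null.
        return {"count": None, "count_is_exact": None}
    if probe > threshold:
        # Over the cap: the threshold is only a lower bound.
        return {"count": threshold, "count_is_exact": False}
    # At or below the cap: the number is the true total.
    return {"count": probe, "count_is_exact": True}
```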

All changes are in ami/base/pagination.py (LimitOffsetPaginationWithPermissions, the project-wide default paginator). Tests in ami/main/tests.py::TestPaginationWithCounts.

What we still need to verify

The performance numbers so far are single-run query plans on a dev environment loaded with a production data snapshot, not production APM. Rough magnitudes: on a dense filter the count dropped about 27×; on a selective filter there was no improvement. This is expected — the cap bounds the rows returned by the count subquery, not the rows the planner must scan to find matches, so it helps dense result sets and does little for selective ones. Confirm against production traffic before relying on it.
Confirm django-cachalot still caches the capped-count subquery on the default path.
Spot-check a paginated UI page against this branch: count behavior should match main for normal-sized result sets.

Frontend follow-up (not in this PR)

The UI must tolerate count: null and render the lower bound when count_is_exact: false (e.g. "10,000+"). Until that lands, the server default stays with_counts=true so existing components keep working unchanged.
Optionally add a per-view override so small, cheap endpoints (projects, pipelines, processing services) can keep returning exact counts by default even if the global default is later flipped.

Test plan

ami/main/tests.py::TestPaginationWithCounts (8 tests): exact count below the cap, exact at the boundary, capped + inexact over the cap, with_counts=false null opt-out, and next/previous on first/middle/last pages. 8/8 pass.
Production latency comparison, before vs after, on a large densely-filtered list endpoint.

Update — query cleanup + minimal UI (after review + measurement)

Two follow-ups were added after a structural review and a measurement pass on a dev environment loaded with a production-sized data snapshot (one project with ~2.9M captures).

Query correctness (ami/base/pagination.py, ami/main/api/views.py): the capped count is now computed over a stripped queryset (ordering removed, projection narrowed to the primary key) via a _count_queryset seam. An unsliced COUNT(*) already drops the correlated-subquery annotations the list orderings add (e.g. last_processed on captures), but the LIMIT used for the cap would otherwise re-project them. This also makes the old per-view ProjectPagination.get_count override redundant, so it was removed and folded into the base paginator.

Minimal UI handling of the capped count: the four high-volume list views (captures, occurrences, species, sessions) now surface count_is_exact and render a capped total as e.g. "10000+" in the pagination info label. totalIsExact is an optional prop defaulting to true, so other lists are unchanged. The numbered page buttons still derive from the capped total, so pages beyond the cap are not reachable from the bar until these lists move to cursor pagination (tracked separately) — a deliberate, documented limitation.

Measurements (single-run, dev bench with a production snapshot — not production APM)

Measurement	before (no cap)	with cap
count query, one ~2.9M-row project (`EXPLAIN ANALYZE`)	~1140 ms	~5 ms (~175×)
full captures list endpoint, cold cache	~2.0–2.5 s	~1.0–1.5 s

Caveats worth knowing before relying on this:

The win is on the count. A follow-up measurement showed the rest of the cold endpoint time is dominated by per-object permission checks during serialization (and, on the dev bench, by debug instrumentation) rather than data fetching — so it is untouched here and tracked as separate work.
Counts are cached (django-cachalot); warm counts are already fast. The cap helps the cold / cache-miss path, which dominates while a project is actively ingesting (writes invalidate the count cache). On a quiet table the cap rarely triggers.
The cap helps dense result sets (it bounds rows returned); it does little for selective filters (it does not bound rows scanned to find matches).

netlify · 2026-04-15T02:01:54Z

✅ Deploy Preview for antenna-preview ready!

Name	Link
🔨 Latest commit	`9516b77`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-preview/deploys/6a3e0fc8eaa57c00081e8911
😎 Deploy Preview	https://deploy-preview-1233--antenna-preview.netlify.app
Lighthouse	1 path audited. Performance: 57 (🔴 down 8 from production); Accessibility: 81 (🔴 down 8 from production); Best Practices: 92 (🔴 down 8 from production); SEO: 92 (no change from production); PWA: 80 (no change from production)

netlify · 2026-04-15T02:01:55Z

✅ Deploy Preview for antenna-ssec ready!

Name	Link
🔨 Latest commit	`9516b77`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-ssec/deploys/6a3e0fc8af649a00088e8ffb
😎 Deploy Preview	https://deploy-preview-1233--antenna-ssec.netlify.app

mihow · 2026-06-25T20:47:22Z

Claude says: Pushed a small fix to this branch in 50379194 and wanted to share the measurement behind it, plus one scoping note. This came out of a benchmarking pass on a production-snapshot dataset for the #928 slow-counts work.

Fix: strip the ordering from the capped count subquery. _get_capped_count sliced the queryset (queryset[:N].count()) but kept the list view's ORDER BY. When that ordering isn't served by an index, Postgres top-N sorts the entire filtered set before the LIMIT can stop it — which defeats the early exit the cap is supposed to give. On a dense filter with a non-index-served order we measured the capped count running slower than the plain uncapped one:

Count variant	time
uncapped `COUNT(*)`	75 ms
capped, ordering kept (before this fix)	110 ms
capped, ordering dropped (after)	17 ms

The order is irrelevant to a count, so queryset.order_by()[:N].count() is a pure win. I also corrected the docstring, which claimed the slice was "always O(N)" — that's only true once the ordering is removed.

Where the cap shines (and where it doesn't): on a genuinely large dense result set the cap is dramatic — an unbounded source-image COUNT(*) over ~23M rows went from ~11 s to ~423 ms, and under concurrent load (8 clients) the p99 dropped from ~28 s to ~1.2 s. That's the case this PR is built for, and it's a big win.

The one thing to keep in mind for how we describe/use it: the cap bounds output rows, not input scanned, so it does nothing for a sparse/selective filter — e.g. a filter matching only ~0.7% of rows measured 64 ms uncapped vs 62 ms capped, because the LIMIT 10001 never trips. Those selective-filter counts need a covering index, not the cap. So this PR is the right fix for unbounded counts on large lists; it just isn't the general answer to "slow filtered counts," which is worth being explicit about so we don't over-claim its reach. No change needed here for that — just flagging it for the epic framing.

mihow · 2026-06-25T21:46:00Z

Claude says: Pushed 1a3b108e reframing the cap as a precision cap, which removes almost all of the frontend work this PR would otherwise need — at no backend cost.

What changed: previously, when a result set exceeded the threshold the response returned count: null. The UI's data?.count ?? 0 then collapses to 0 and pagination gets disabled — so the biggest projects (exactly the ones we're optimizing for) silently lost their pager. That made a frontend change a hard prerequisite for flipping this on.

Now, over the threshold the response returns the threshold as a lower-bound count plus a flag:

{ "count": 10000, "count_is_exact": false, "next": "...", "previous": "..." }

So the existing UI shows "10,000" with a working pager today, and the only follow-up frontend change is cosmetic: append a "+" when count_is_exact === false to render "10,000+". No null-count handling required.
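
For API consumers, the "+" rule fits in a few lines. The sketch below is in Python for consistency with the backend code in this PR; it is not the frontend implementation, and `format_total` is a hypothetical helper. It assumes a payload without `count_is_exact` (an older server) should be treated as exact.

```python
from typing import Any, Mapping


def format_total(payload: Mapping[str, Any]) -> str:
    """Render the total for a pagination label, e.g. "10,000+"."""
    count = payload.get("count")
    if count is None:
        # Count was skipped (with_counts=false): nothing to show.
        return ""
    label = f"{count:,}"
    # Only an explicit False marks a lower bound; a missing or null flag
    # is treated as exact so older responses render unchanged.
    if payload.get("count_is_exact") is False:
        label += "+"
    return label
```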

Details:

Renamed LARGE_QUERYSET_THRESHOLD → COUNT_PRECISION_THRESHOLD.
count: null is now reserved for the explicit with_counts=false opt-out (with count_is_exact: null) — a real "I don't want a count" signal, distinct from "the count is approximate."
Because the capped value is a lower bound, next/previous are computed from the one-extra-row probe, not from count — so paging past the threshold keeps working (setting count to the cap and trusting it for paging would dead-end navigation at offset 10,000).
count_is_exact is additive to the response schema (documented in get_paginated_response_schema); existing clients that ignore it are unaffected.
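
The one-extra-row probe mentioned in the details above can be illustrated on a plain sequence. This is a sketch of the idea, not the paginator's code: `page_with_probe` is a hypothetical name, and a real implementation would slice a queryset instead of a list.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def page_with_probe(rows: Sequence[T], offset: int, limit: int) -> Tuple[List[T], bool, bool]:
    """Return (page, has_next, has_previous) without knowing the total."""
    # Fetch limit + 1 rows: the extra row only signals that more data exists.
    window = list(rows[offset : offset + limit + 1])
    has_next = len(window) > limit
    has_previous = offset > 0
    # The probe row is dropped from the page that is returned.
    return window[:limit], has_next, has_previous
```

Because `has_next` comes from the probe row rather than from `count`, paging keeps working past the threshold even when `count` is only a lower bound or is absent.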

Backend cost is unchanged — same order_by()[:N+1].count() query (the ORDER-BY strip from the previous commit still applies); only the mapping of its result to the response differs.

ProjectPagination inherits all of this (it only overrides default_limit). Tests updated to pin the new behavior: exact below the cap, capped+count_is_exact:false above it, null only on opt-out, and a boundary case at exactly the threshold. 8/8 pass in isolation.

Every paginated list endpoint runs a COUNT(*) over the filtered result set to populate `count`. On large, densely-filtered tables that count can dominate the request even when the page query itself is fast. This bounds the worst case.

- Counts stay exact up to COUNT_PRECISION_THRESHOLD (default 10,000). Beyond it the response returns the threshold as a lower bound with `count_is_exact: false`, which the UI renders as e.g. "10,000+", instead of scanning the whole table.
- The capped count strips the queryset's ORDER BY first so the LIMIT can short-circuit instead of forcing a top-N sort that would scan the whole set anyway: `SELECT COUNT(*) FROM (SELECT 1 ... LIMIT N) sub`.
- Callers can skip the total entirely with `?with_counts=false`, which returns `count: null` and runs no count query.
- `next`/`previous` fall back to a one-extra-row probe whenever the count is inexact or skipped, preserving the pagination contract.

Default behavior is unchanged for normal-sized result sets: an exact integer count with `count_is_exact: true`. New `count_is_exact` field documented in the OpenAPI schema.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates the project-wide DRF paginator to avoid unbounded COUNT(*) queries on large, densely-filtered list endpoints by capping count precision (exact up to a threshold, then returning the threshold as a lower bound) and by allowing callers to opt out of counts entirely via ?with_counts=false. It also adds a count_is_exact field to help API consumers distinguish exact vs capped vs skipped totals, and introduces tests to validate the new behaviors.

Changes:

Implement capped counting and with_counts=false opt-out in LimitOffsetPaginationWithPermissions, plus probe-based next/previous logic when totals are inexact/skipped.
Extend the paginated response shape with count_is_exact and mark count nullable in the response schema.
Add API tests covering exact counts, capped/inexact counts, opt-out behavior, and navigation links.
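
As a rough illustration of the opt-out, the sketch below reads `with_counts` from a request URL with the standard library. It is not the paginator's parsing code: `wants_counts`, the example path, and the set of strings treated as false are assumptions made for the example. The one behavior taken from this PR is that counts stay on unless the caller opts out.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: which spellings count as "false" is not specified in this PR.
_FALSE_VALUES = {"false", "0", "no"}


def wants_counts(url: str) -> bool:
    """Return False only when the URL explicitly opts out of the total."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("with_counts")
    if not values:
        # Parameter absent (or blank): keep the backward-compatible default.
        return True
    return values[-1].strip().lower() not in _FALSE_VALUES
```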

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`ami/base/pagination.py`	Adds capped-count + opt-out logic to the default paginator and extends response/schema with `count_is_exact`.
`ami/main/tests.py`	Adds `TestPaginationWithCounts` to validate exact/capped/skipped counts and `next`/`previous` behavior.

+        capped = self._get_capped_count(queryset)
+        if capped is self._OVER_CAP:
+            # Over the precision cap: report the threshold as an approximate
+            # lower bound. It must not drive next/previous (the true total is
+            # higher), so fall back to the probe-based links.
+            self.count = self.COUNT_PRECISION_THRESHOLD


+    def get_paginated_response_schema(self, schema):
+        paginated_schema = super().get_paginated_response_schema(schema)
+        # count is the exact total, the precision cap (a lower bound), or null
+        # when the caller passed with_counts=false.
+        paginated_schema["properties"]["count"]["nullable"] = True
+        paginated_schema["properties"]["count_is_exact"] = {
+            "type": "boolean",
+            "nullable": True,
+            "description": (
+                "True when `count` is exact; false when it is the precision cap "
+                '(a lower bound, render as e.g. "10,000+"); null when the count '
+                "was skipped via with_counts=false."
+            ),
+        }
+        return paginated_schema


…on.get_count

The capped count is computed over a stripped queryset (ordering removed, projection narrowed to the primary key) via a new `_count_queryset` seam. An unsliced COUNT(*) already drops the correlated-subquery annotations the list orderings add (e.g. `last_processed` on captures), but the LIMIT used for the precision cap would otherwise re-project them and run the subquery per scanned row. Counting `values("pk")` keeps the COUNT over a bare primary-key scan.

This also makes `ProjectPagination.get_count` redundant: the per-view override existed only to strip those annotations before counting, which the base paginator now does for every endpoint. Removed it.

Verified on a database snapshot (~2.88M captures in one project): the count query stays ~5 ms whether or not the annotations are present, and EXPLAIN confirms the detection subquery is not scanned.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

When the API caps the total count (count_is_exact: false on large result sets), surface it through the high-volume list hooks (captures, occurrences, species, sessions) and render the total as e.g. "10000+" in the pagination info label. The numbered page buttons still derive from the capped total, so pages beyond the cap are not reachable from the bar until these lists move to cursor pagination; this change keeps the label honest in the meantime.

totalIsExact is an optional prop defaulting to true, so list views not wired to it (small tables that never reach the cap) are unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI assigned Copilot and mihow Apr 15, 2026

Copilot created this pull request from a session on behalf of mihow April 15, 2026 02:01

Copilot started work on behalf of mihow April 15, 2026 02:03

Copilot finished work on behalf of mihow April 15, 2026 02:04

Copilot AI requested a review from mihow April 15, 2026 02:04

mihow changed the title ~~feat: capped COUNT(*) safety valve for with_counts=true requests~~ feat: speed up list views by deferring big counts Apr 15, 2026

mihow changed the title ~~feat: speed up list views by deferring big counts~~ feat: opt-out with_counts param + capped COUNT(*) for paginated list endpoints Apr 17, 2026

mihow mentioned this pull request Jun 25, 2026

Make large lists fast to browse and stable while new data streams in #1352 (Open, 7 tasks)

mihow force-pushed the copilot/add-query-param-with-counts branch from 1a3b108 to c7b8e0e Compare June 26, 2026 01:29

mihow changed the title ~~feat: opt-out with_counts param + capped COUNT(*) for paginated list endpoints~~ Stop large list pages from running an unbounded total-count query Jun 26, 2026

mihow marked this pull request as ready for review June 26, 2026 01:29

Copilot AI review requested due to automatic review settings June 26, 2026 01:29

Copilot started reviewing on behalf of mihow June 26, 2026 01:30

mihow force-pushed the copilot/add-query-param-with-counts branch from c7b8e0e to 942bfa3 Compare June 26, 2026 01:32

Copilot AI reviewed Jun 26, 2026

mihow and others added 3 commits June 25, 2026 22:23

style(ui): prettier 2.8.4 formatting for pagination N+ wiring

9516b77

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot wants to merge 4 commits into main from copilot/add-query-param-with-counts
