From 7b14b8d3fc58d4dd1c932276f30fe30799254ead Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:00:22 -0700
Subject: [PATCH 01/13] docs: design spec for captures processed/not-processed filter

Co-Authored-By: Claude
---
 ...-05-28-captures-processed-filter-design.md | 80 +++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 docs/claude/planning/2026-05-28-captures-processed-filter-design.md

diff --git a/docs/claude/planning/2026-05-28-captures-processed-filter-design.md b/docs/claude/planning/2026-05-28-captures-processed-filter-design.md
new file mode 100644
index 000000000..d6fbc5b82
--- /dev/null
+++ b/docs/claude/planning/2026-05-28-captures-processed-filter-design.md
@@ -0,0 +1,80 @@
+# Captures list — "Processed / Not processed" filter
+
+Date: 2026-05-28
+Status: design approved, pending spec review
+Scope: first of several planned captures-list filters; this PR ships the processed filter only.
+
+## Goal
+
+Add a "Processing status" filter to the Captures (SourceImage) list view, letting users
+narrow to captures that have been processed, not processed, or all (no filter). Lay the
+groundwork (a planned filter set) for additional filters in later PRs.
+
+"Processed" = the image has been run through detection. Because PR #1093 writes a null
+Detection marker for the "processed, found nothing" case, the presence of *any* Detection
+row is an accurate signal of "was processed."
+
+## Backend — no change required
+
+The filter already exists and is exercised by the list endpoint:
+
+- `ami/main/api/views.py:630-636` — `SourceImageViewSet.filter_by_has_detections`
+  handles `?has_detections=true|false` by annotating
+  `Exists(Detection.objects.filter(source_image=OuterRef("pk")))` and filtering on it.
+  (`SourceImageViewSet` at `views.py:528`.)
+- Called from `get_queryset` only for the `list` action (`views.py:600`), which is what
+  the captures list uses.
+
+Decision: reuse the existing `has_detections` query param. Zero backend change, already
+tested behavior. The param name (`has_detections`) means "was processed" because of the
+null-marker convention; we surface it to users with the label "Processing status" and keep
+`has_detections` as the internal query key. This name/meaning gap is the one known wart and
+is documented here rather than fixed (a `was_processed` alias was considered and rejected to
+avoid extra surface area).
+
+## Frontend — four wiring changes
+
+1. **New component** `ui/src/components/filtering/filters/processing-status-filter.tsx`.
+   Model on `verification-status-filter.tsx`. Two options: "Processed" (true) /
+   "Not processed" (false). Wire `onValueChange={onAdd}` directly so both true and false
+   are settable. (The generic `BooleanFilter` is unusable here: its "No" branch calls
+   `onClear()` instead of filtering to false — see `boolean-filter.tsx:21-27`.)
+   Use a translated label string for the two options (add to `utils/language` if needed).
+
+2. **Register the component** in `ui/src/components/filtering/filter-control.tsx`
+   `ComponentMap`: `has_detections: ProcessingStatusFilter`.
+
+3. **Register the filter** in `ui/src/utils/useFilters.ts` `AVAILABLE_FILTERS`:
+   `{ label: 'Processing status', field: 'has_detections', tooltip: { text: ... } }`.
+
+4. **Render it** on the captures page `ui/src/pages/captures/captures.tsx` (inside the
+   existing `FilterSection`, alongside `deployment` and `collections`):
+   ``.
+
+State, URL params, page reset, and the clear-X ("All") behavior all come from the existing
+`useFilters` machinery — no changes there.
+
+## Data flow
+
+UI select -> `addFilter('has_detections', 'true'|'false')` -> URL search param ->
+`useFilters` -> `useCaptures` builds `?has_detections=...` via `getFetchUrl`
+(`ui/src/data-services/utils.ts`) -> DRF `filter_by_has_detections` -> filtered queryset.
+Clear-X removes the param -> "All".
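The data flow above can be pictured with a small, self-contained Python sketch. It is an illustration only, not the project's code: the `CAPTURES` list is a hypothetical in-memory stand-in for SourceImage/Detection rows, and `build_query`, `parse_tristate`, and the plain-Python `filter_by_has_detections` are assumed names that mimic the query-param round trip and the "any Detection row counts as processed" rule described in this spec.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode

# Hypothetical stand-ins: each capture lists its Detection rows.
# A row with bbox=None models the null marker ("processed, found nothing").
CAPTURES = [
    {"id": 1, "detections": [{"bbox": (0.1, 0.1, 0.2, 0.2)}]},
    {"id": 2, "detections": [{"bbox": None}]},
    {"id": 3, "detections": []},
]


def build_query(filters: dict[str, str]) -> str:
    """UI side: serialize the active filters into a query string."""
    return urlencode(filters)


def parse_tristate(query: str, key: str) -> bool | None:
    """Return True/False for 'true'/'false', or None when the param is absent."""
    values = parse_qs(query).get(key)
    if not values:
        return None
    return values[0].lower() == "true"


def filter_by_has_detections(captures: list[dict], query: str) -> list[dict]:
    """Backend side: keep captures whose 'has any Detection row' flag matches."""
    wanted = parse_tristate(query, "has_detections")
    if wanted is None:  # cleared filter -> "All"
        return captures
    return [c for c in captures if bool(c["detections"]) == wanted]


if __name__ == "__main__":
    for filters in ({"has_detections": "true"}, {"has_detections": "false"}, {}):
        query = build_query(filters)
        matched = [c["id"] for c in filter_by_has_detections(CAPTURES, query)]
        print(repr(query), matched)
```

Note that capture 2, which holds only a null marker, falls on the "processed" side. That is the name/meaning gap this spec calls out for `has_detections`.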
+
+## Testing
+
+- Backend: verify existing coverage for `?has_detections=true|false` on the captures list
+  endpoint; add a test if missing (both branches + absent param).
+- Frontend: manual verification against the running stack — select Processed, Not processed,
+  and clear; confirm result counts change and the URL param round-trips.
+
+## Out of scope (planned follow-up PRs)
+
+To live in a collapsible "Advanced" `FilterSection` on the captures page later:
+
+- **Date range** — `date_start`/`date_end` already in the FE registry with a `DateFilter`
+  component, but the SourceImage viewset needs backend support mapping them to a `timestamp`
+  range (new work).
+- **Station** — already available via the existing `deployment` filter.
+- **Site** — add `deployment__research_site` to `filterset_fields` + a Site filter component.
+- **Device** — add `deployment__device` to `filterset_fields` + a Device filter component.

From 9a41c7d4bd34116b7a15733b6e9abe80382916ae Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:06:08 -0700
Subject: [PATCH 02/13] docs: implementation plan for captures processed filter

Co-Authored-By: Claude
---
 ...26-05-28-captures-processed-filter-plan.md | 299 ++++++++++++++++++
 1 file changed, 299 insertions(+)
 create mode 100644 docs/claude/planning/2026-05-28-captures-processed-filter-plan.md

diff --git a/docs/claude/planning/2026-05-28-captures-processed-filter-plan.md b/docs/claude/planning/2026-05-28-captures-processed-filter-plan.md
new file mode 100644
index 000000000..1ca53c38a
--- /dev/null
+++ b/docs/claude/planning/2026-05-28-captures-processed-filter-plan.md
@@ -0,0 +1,299 @@
+# Captures "Processed / Not processed" Filter — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a "Processing status" filter (Processed / Not processed / All) to the Captures list view, reusing the existing `has_detections` backend query param.
+
+**Architecture:** Backend already filters captures by `?has_detections=true|false` via `Exists(Detection)` (null Detection markers make "has any detection" == "was processed"). No backend logic change; add a regression test only. Frontend adds a 3-state select component (modeled on the verified filter) and wires it through the existing filter registry/URL-param machinery.
+
+**Tech Stack:** Django 4.2 + DRF (backend), React 18 + TypeScript + nova-ui-kit Select (frontend).
+
+Design spec: `docs/claude/planning/2026-05-28-captures-processed-filter-design.md`
+
+---
+
+## Task 1: Backend regression test for `has_detections` filter
+
+No existing test covers `?has_detections=` on the captures list. Add one to lock in the behavior we depend on. This is the only backend change.
+
+**Files:**
+- Test: `ami/main/tests.py` (add a new `APITestCase` class near the other captures/list tests, e.g. after `TestProjectRequiredOnListEndpoints` ~line 1392)
+
+- [ ] **Step 1: Write the failing test**
+
+Add this class to `ami/main/tests.py`. The fixtures `setup_test_project`, `create_captures`, and `create_detections` are already imported / available (`create_detections` lives in `ami.tests.fixtures.main` — add it to the existing import on line 41 if not present).
+
+Update the import line 41 to include `create_detections`:
+
+```python
+from ami.tests.fixtures.main import (
+    create_captures,
+    create_detections,
+    create_occurrences,
+    create_taxa,
+    setup_test_project,
+)
+```
+
+Then add the test class:
+
+```python
+class TestCapturesProcessedFilter(APITestCase):
+    """
+    The captures list supports ?has_detections=true|false, which the UI surfaces
+    as the "Processing status" filter. A capture is "processed" when it has any
+    Detection row (including null markers for "processed, found nothing").
+    """
+
+    def setUp(self) -> None:
+        self.project, self.deployment = setup_test_project(reuse=False)
+        self.captures = create_captures(self.deployment, num_nights=1, images_per_night=4)
+        # Mark the first two captures as processed by giving them a detection.
+        for capture in self.captures[:2]:
+            create_detections(capture, bboxes=[(0.1, 0.1, 0.2, 0.2)])
+        self.user = User.objects.create_user(email="proc-filter@insectai.org", is_staff=True)  # type: ignore
+        self.client.force_authenticate(user=self.user)
+        self.list_url = f"/api/v2/captures/?project_id={self.project.pk}"
+        return super().setUp()
+
+    def test_no_filter_returns_all_captures(self):
+        response = self.client.get(self.list_url)
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertEqual(response.json()["count"], 4)
+
+    def test_has_detections_true_returns_only_processed(self):
+        response = self.client.get(f"{self.list_url}&has_detections=true")
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertEqual(response.json()["count"], 2)
+
+    def test_has_detections_false_returns_only_unprocessed(self):
+        response = self.client.get(f"{self.list_url}&has_detections=false")
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertEqual(response.json()["count"], 2)
+```
+
+- [ ] **Step 2: Run the test to verify it passes (it should — behavior already exists)**
+
+Run:
+```bash
+docker compose run --rm django python manage.py test ami.main.tests.TestCapturesProcessedFilter --keepdb -v 2
+```
+Expected: 3 tests PASS. (This is a characterization test for existing behavior; if `test_has_detections_false` fails returning 4 instead of 2, that means null-marker handling differs — stop and investigate before touching the UI.)
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add ami/main/tests.py
+git commit -m "test: cover has_detections filter on captures list endpoint
+
+Co-Authored-By: Claude "
+```
+
+---
+
+## Task 2: Add label strings for the processing-status filter
+
+**Files:**
+- Modify: `ui/src/utils/language.ts` (the `STRING` enum and the string map)
+
+- [ ] **Step 1: Add enum keys**
+
+In the `STRING` enum in `ui/src/utils/language.ts`, add two keys (place them alphabetically near `PROCESSING`/`NOT_VERIFIED` neighbors — exact position is cosmetic):
+
+```typescript
+  NOT_PROCESSED,
+  PROCESSED,
+```
+
+- [ ] **Step 2: Add the string values**
+
+In the string-map object (where entries like `[STRING.NOT_VERIFIED]: 'Not verified',` live), add:
+
+```typescript
+  [STRING.NOT_PROCESSED]: 'Not processed',
+  [STRING.PROCESSED]: 'Processed',
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add ui/src/utils/language.ts
+git commit -m "feat(ui): add Processed / Not processed label strings
+
+Co-Authored-By: Claude "
+```
+
+---
+
+## Task 3: Create the `ProcessingStatusFilter` component
+
+A 3-state select: empty (= All, cleared via the FilterControl X button), true (Processed), false (Not processed). Modeled on `verification-status-filter.tsx`. Do NOT reuse `BooleanFilter` — its "No" branch calls `onClear()` and cannot filter to false (`boolean-filter.tsx:21-27`).
+
+**Files:**
+- Create: `ui/src/components/filtering/filters/processing-status-filter.tsx`
+
+- [ ] **Step 1: Write the component**
+
+```tsx
+import { Select } from 'nova-ui-kit'
+import { STRING, translate } from 'utils/language'
+import { booleanToString, stringToBoolean } from '../utils'
+import { FilterProps } from './types'
+
+export const ProcessingStatusFilter = ({ value: string, onAdd }: FilterProps) => {
+  const value = stringToBoolean(string)
+  const options = [
+    { value: true, label: translate(STRING.PROCESSED) },
+    { value: false, label: translate(STRING.NOT_PROCESSED) },
+  ]
+
+  return (
+
+
+
+
+
+      {options.map((option) => (
+
+          {option.label}
+
+      ))}
+
+
+  )
+}
+```
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add ui/src/components/filtering/filters/processing-status-filter.tsx
+git commit -m "feat(ui): add ProcessingStatusFilter select component
+
+Co-Authored-By: Claude "
+```
+
+---
+
+## Task 4: Register the component and the filter field
+
+**Files:**
+- Modify: `ui/src/components/filtering/filter-control.tsx` (import + `ComponentMap`)
+- Modify: `ui/src/utils/useFilters.ts` (`AVAILABLE_FILTERS`)
+
+- [ ] **Step 1: Import the component in filter-control.tsx**
+
+Add near the other filter imports (after the `PipelineFilter` import, ~line 10):
+
+```typescript
+import { ProcessingStatusFilter } from './filters/processing-status-filter'
+```
+
+- [ ] **Step 2: Register in `ComponentMap`**
+
+In `ui/src/components/filtering/filter-control.tsx`, add to the `ComponentMap` object (keep keys alphabetical-ish; place after `pipeline:`):
+
+```typescript
+  has_detections: ProcessingStatusFilter,
+```
+
+- [ ] **Step 3: Register the filter field in `AVAILABLE_FILTERS`**
+
+In `ui/src/utils/useFilters.ts`, add an entry to the array returned by `AVAILABLE_FILTERS` (e.g. after the `event` entry, ~line 138):
+
+```typescript
+  {
+    label: 'Processing status',
+    field: 'has_detections',
+    tooltip: {
+      text: 'Filter captures by whether they have been processed by a detection pipeline.',
+    },
+  },
+```
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add ui/src/components/filtering/filter-control.tsx ui/src/utils/useFilters.ts
+git commit -m "feat(ui): register processing-status filter (has_detections)
+
+Co-Authored-By: Claude "
+```
+
+---
+
+## Task 5: Render the filter on the captures page
+
+**Files:**
+- Modify: `ui/src/pages/captures/captures.tsx` (the `FilterSection`, ~lines 65-68)
+
+- [ ] **Step 1: Add the FilterControl**
+
+In `ui/src/pages/captures/captures.tsx`, inside the existing ``, add the new control below `collections`:
+
+```tsx
+
+
+
+
+
+```
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add ui/src/pages/captures/captures.tsx
+git commit -m "feat(ui): show processing-status filter on captures list
+
+Co-Authored-By: Claude "
+```
+
+---
+
+## Task 6: Type-check, lint, and manual verification
+
+**Files:** none (verification only)
+
+- [ ] **Step 1: TypeScript type check**
+
+Run:
+```bash
+cd ui && yarn tsc --noEmit
+```
+Expected: no errors. (If `yarn tsc` is not a script, use `npx tsc --noEmit`.)
+
+- [ ] **Step 2: Lint + format the touched files**
+
+Run:
+```bash
+cd ui && yarn lint && yarn format
+```
+Expected: clean (or auto-fixed). Re-commit if format changed anything.
+
+- [ ] **Step 3: Manual verification against the running stack**
+
+Start the stack (`docker compose up -d` from repo root; for worktree code-only changes use the bind-mount Option A in CLAUDE.md if testing against the main stack). Then in the UI at `http://localhost:4000`, open a project's Captures list and:
+  - Select **Processed** → URL gains `?has_detections=true`, result count drops to processed captures only.
+  - Select **Not processed** → `has_detections=false`, count shows unprocessed only.
+  - Click the **X** clear button → param removed, all captures return.
+  - Confirm the page resets to page 1 when the filter changes (handled by `useFilters.addFilter`).
+
+- [ ] **Step 4: Final commit (only if lint/format changed files)**
+
+```bash
+git add -A
+git commit -m "chore(ui): lint/format for processing-status filter
+
+Co-Authored-By: Claude "
+```
+
+---
+
+## Self-Review notes
+
+- **Spec coverage:** backend reuse (Task 1 verifies), new component (Task 3), registry wiring (Task 4), page render (Task 5), label strings (Task 2), testing (Tasks 1 + 6). All spec sections covered.
+- **Type consistency:** component named `ProcessingStatusFilter` in Tasks 3 and 4; query field `has_detections` in Tasks 1, 4, 5; STRING keys `PROCESSED` / `NOT_PROCESSED` in Tasks 2 and 3.
+- **Out of scope (later PRs):** date range, site, device filters — see design doc.

From d18392f147ac2cc26794c02d967bf717d0069fed Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:11:52 -0700
Subject: [PATCH 03/13] test: cover has_detections filter on captures list endpoint

Co-Authored-By: Claude
---
 ami/main/tests.py | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/ami/main/tests.py b/ami/main/tests.py
index 58277d186..7aa79cae5 100644
--- a/ami/main/tests.py
+++ b/ami/main/tests.py
@@ -38,7 +38,13 @@
 from ami.ml.models.pipeline import Pipeline
 from ami.ml.models.processing_service import ProcessingService
 from ami.ml.models.project_pipeline_config import ProjectPipelineConfig
-from ami.tests.fixtures.main import create_captures, create_occurrences, create_taxa, setup_test_project
+from ami.tests.fixtures.main import (
+    create_captures,
+    create_detections,
+    create_occurrences,
+    create_taxa,
+    setup_test_project,
+)
 from ami.tests.fixtures.storage import populate_bucket
 from ami.users.models import User
 from ami.users.roles import BasicMember, Identifier, MLDataManager, ProjectManager, create_roles_for_project
@@ -1390,6 +1396,40 @@ def test_unrelated_list_endpoints_still_work_without_project_id(self):
         self.assertEqual(response.status_code, status.HTTP_200_OK, path)
 
 
+class TestCapturesProcessedFilter(APITestCase):
+    """
+    The captures list supports ?has_detections=true|false, which the UI surfaces
+    as the "Processing status" filter. A capture is "processed" when it has any
+    Detection row (including null markers for "processed, found nothing").
+    """
+
+    def setUp(self) -> None:
+        self.project, self.deployment = setup_test_project(reuse=False)
+        self.captures = create_captures(self.deployment, num_nights=1, images_per_night=4)
+        # Mark the first two captures as processed by giving them a detection.
+        for capture in self.captures[:2]:
+            create_detections(capture, bboxes=[(0.1, 0.1, 0.2, 0.2)])
+        self.user = User.objects.create_user(email="proc-filter@insectai.org", is_staff=True)  # type: ignore
+        self.client.force_authenticate(user=self.user)
+        self.list_url = f"/api/v2/captures/?project_id={self.project.pk}"
+        return super().setUp()
+
+    def test_no_filter_returns_all_captures(self):
+        response = self.client.get(self.list_url)
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertEqual(response.json()["count"], 4)
+
+    def test_has_detections_true_returns_only_processed(self):
+        response = self.client.get(f"{self.list_url}&has_detections=true")
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertEqual(response.json()["count"], 2)
+
+    def test_has_detections_false_returns_only_unprocessed(self):
+        response = self.client.get(f"{self.list_url}&has_detections=false")
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertEqual(response.json()["count"], 2)
+
+
 class TestProjectOwnerAutoAssignment(APITestCase):
     def setUp(self) -> None:
         self.user_1 = User.objects.create_user(email="testuser@insectai.org", is_staff=True, is_superuser=True)

From 98e259b295d42ef84176724414f0751aeb10c001 Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:13:34 -0700
Subject: [PATCH 04/13] feat(ui): add Processed / Not processed label strings

Co-Authored-By: Claude
---
 ui/src/utils/language.ts | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/ui/src/utils/language.ts b/ui/src/utils/language.ts
index e432a04d7..9a26a9a62 100644
--- a/ui/src/utils/language.ts
+++ b/ui/src/utils/language.ts
@@ -305,10 +305,12 @@ export enum STRING {
   MOST_OBSERVED_TAXA,
   NEW_ID,
   NOT_CONNECTED,
+  NOT_PROCESSED,
   NOT_VERIFIED,
   OR,
   OVERVIEW,
   PIPELINES,
+  PROCESSED,
   RECENT,
   REJECT_ID_SHORT,
   REJECT_ID,
@@ -689,10 +691,12 @@ const ENGLISH_STRINGS: { [key in STRING]: string } = {
   [STRING.MOST_OBSERVED_TAXA]: 'Most observed taxa',
   [STRING.NEW_ID]: 'New ID',
   [STRING.NOT_CONNECTED]: 'Not connected',
+  [STRING.NOT_PROCESSED]: 'Not processed',
   [STRING.NOT_VERIFIED]: 'Not verified',
   [STRING.OR]: 'Or',
   [STRING.OVERVIEW]: 'Overview',
   [STRING.PIPELINES]: 'Pipelines',
+  [STRING.PROCESSED]: 'Processed',
   [STRING.RECENT]: 'Recent',
   [STRING.REJECT_ID_SHORT]: 'Reject',
   [STRING.REJECT_ID]: 'Reject ID',

From 741daa8090303bba50acaae79fe035dd66c54da3 Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:13:46 -0700
Subject: [PATCH 05/13] feat(ui): add ProcessingStatusFilter select component

Co-Authored-By: Claude
---
 .../filters/processing-status-filter.tsx | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 ui/src/components/filtering/filters/processing-status-filter.tsx

diff --git a/ui/src/components/filtering/filters/processing-status-filter.tsx b/ui/src/components/filtering/filters/processing-status-filter.tsx
new file mode 100644
index 000000000..f76657670
--- /dev/null
+++ b/ui/src/components/filtering/filters/processing-status-filter.tsx
@@ -0,0 +1,30 @@
+import { Select } from 'nova-ui-kit'
+import { STRING, translate } from 'utils/language'
+import { booleanToString, stringToBoolean } from '../utils'
+import { FilterProps } from './types'
+
+export const ProcessingStatusFilter = ({ value: string, onAdd }: FilterProps) => {
+  const value = stringToBoolean(string)
+  const options = [
+    { value: true, label: translate(STRING.PROCESSED) },
+    { value: false, label: translate(STRING.NOT_PROCESSED) },
+  ]
+
+  return (
+
+
+
+
+
+      {options.map((option) => (
+
+          {option.label}
+
+      ))}
+
+
+  )
+}

From 0d855741dcd0889a7d8b2c8012ba15c10b33b22f Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:14:14 -0700
Subject: [PATCH 06/13] feat(ui): register processing-status filter (has_detections)

Co-Authored-By: Claude
---
 ui/src/components/filtering/filter-control.tsx | 2 ++
 ui/src/utils/useFilters.ts                     | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/ui/src/components/filtering/filter-control.tsx b/ui/src/components/filtering/filter-control.tsx
index 36bf832ae..8c7826808 100644
--- a/ui/src/components/filtering/filter-control.tsx
+++ b/ui/src/components/filtering/filter-control.tsx
@@ -16,6 +16,7 @@ import { TaxaListFilter } from './filters/taxa-list-filter'
 import { TaxonFilter } from './filters/taxon-filter'
 import { TypeFilter } from './filters/type-filter'
 import { FilterProps } from './filters/types'
+import { ProcessingStatusFilter } from './filters/processing-status-filter'
 import { VerificationStatusFilter } from './filters/verification-status-filter'
 import { VerifiedByFilter } from './filters/verified-by-filter'
 
@@ -30,6 +31,7 @@ const ComponentMap: {
   deployment: StationFilter,
   detections__source_image: ImageFilter,
   event: SessionFilter,
+  has_detections: ProcessingStatusFilter,
   include_unobserved: BooleanFilter,
   job_type_key: TypeFilter,
   not_algorithm: NotAlgorithmFilter,
diff --git a/ui/src/utils/useFilters.ts b/ui/src/utils/useFilters.ts
index 656e5c6f0..d24cee1ff 100644
--- a/ui/src/utils/useFilters.ts
+++ b/ui/src/utils/useFilters.ts
@@ -136,6 +136,13 @@ export const AVAILABLE_FILTERS = (projectId: string): FilterConfig[] => [
       },
     },
   },
+  {
+    label: 'Processing status',
+    field: 'has_detections',
+    tooltip: {
+      text: 'Filter captures by whether they have been processed by a detection pipeline.',
+    },
+  },
   {
     label: 'Pipeline',
     field: 'pipeline',

From 8e71fd5746ca50e557c617bf109c985bbf2b679f Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:14:32 -0700
Subject: [PATCH 07/13] feat(ui): show processing-status filter on captures list

Co-Authored-By: Claude
---
 ui/src/pages/captures/captures.tsx | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ui/src/pages/captures/captures.tsx b/ui/src/pages/captures/captures.tsx
index 00f4b2558..26babfbe0 100644
--- a/ui/src/pages/captures/captures.tsx
+++ b/ui/src/pages/captures/captures.tsx
@@ -65,6 +65,7 @@ export const Captures = () => {
+
From 658d58cd217d8d3b687f7f3ea4fef15ea99b7da5 Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Thu, 28 May 2026 21:57:17 -0700
Subject: [PATCH 08/13] feat: split processed status from real-detections; add Last processed column

Captures filter now uses a new ?processed= param (any Detection row,
including the null markers that mean 'processed, found nothing'), and
the Processing status UI filter points at it. ?has_detections= is scoped
to real detections only (NULL_DETECTIONS_FILTER excluded), matching how
the capture set list separates its processed count from its detections
count.

Deployments list gains a sortable 'Last processed' column: the latest
detection created_at across the deployment's captures, via a correlated
subquery annotation (stable row count for pagination), exposed as
last_processed and orderable.

Tests cover the processed vs has_detections split including a
null-marker-only capture, plus the deployment last_processed field and
ordering.

Co-Authored-By: Claude
---
 ami/main/api/serializers.py                   |  3 +
 ami/main/api/views.py                         | 43 ++++++++++-
 ami/main/tests.py                             | 72 +++++++++++++++----
 .../components/filtering/filter-control.tsx   |  2 +-
 ui/src/data-services/models/deployment.ts     |  6 ++
 ui/src/pages/captures/captures.tsx            |  2 +-
 .../pages/deployments/deployment-columns.tsx  |  6 ++
 ui/src/pages/deployments/deployments.tsx      |  1 +
 ui/src/utils/language.ts                      |  2 +
 ui/src/utils/useFilters.ts                    |  2 +-
 10 files changed, 121 insertions(+), 18 deletions(-)

diff --git a/ami/main/api/serializers.py b/ami/main/api/serializers.py
index 5d73b4823..e7f703019 100644
--- a/ami/main/api/serializers.py
+++ b/ami/main/api/serializers.py
@@ -167,6 +167,8 @@ class DeploymentListSerializer(DefaultSerializer):
     device = DeviceNestedSerializer(read_only=True)
     research_site = SiteNestedSerializer(read_only=True)
     jobs = JobStatusSerializer(many=True, read_only=True)
+    # Annotated in DeploymentViewSet.get_queryset (latest detection created_at).
+    last_processed = serializers.DateTimeField(read_only=True)
 
     class Meta:
         model = Deployment
@@ -188,6 +190,7 @@ class Meta:
             "longitude",
             "first_date",
             "last_date",
+            "last_processed",
             "device",
             "research_site",
             "jobs",
diff --git a/ami/main/api/views.py b/ami/main/api/views.py
index 488a31dad..b701522f1 100644
--- a/ami/main/api/views.py
+++ b/ami/main/api/views.py
@@ -296,6 +296,7 @@ class DeploymentViewSet(DefaultViewSet, ProjectMixin):
         "taxa_count",
         "first_capture_timestamp",
         "last_capture_timestamp",
+        "last_processed",
         "name",
     ]
 
@@ -315,6 +316,19 @@ def get_queryset(self) -> QuerySet:
         project = self.get_active_project()
         if project:
             qs = qs.filter(project=project)
+
+        # "Last processed" = the most recent detection created_at across this
+        # deployment's captures. A correlated subquery (rather than a join +
+        # Max) keeps the row count stable for pagination. Null when the
+        # deployment has no detections yet; NullsLastOrderingFilter sorts those last.
+        qs = qs.annotate(
+            last_processed=models.Subquery(
+                Detection.objects.filter(source_image__deployment=models.OuterRef("pk"))
+                .order_by("-created_at")
+                .values("created_at")[:1]
+            )
+        )
+
         num_example_captures = 10
         if self.action == "retrieve":
             qs = qs.prefetch_related(
@@ -597,6 +611,7 @@ def get_queryset(self) -> QuerySet:
 
         if self.action == "list":
             # It's cumbersome to override the default list view, so customize the queryset here
+            queryset = self.filter_by_processed(queryset)
             queryset = self.filter_by_has_detections(queryset)
 
         elif self.action == "retrieve":
@@ -627,12 +642,38 @@ def get_queryset(self) -> QuerySet:
 
         return queryset
 
+    def filter_by_processed(self, queryset: QuerySet) -> QuerySet:
+        """
+        Filter by whether a capture has been processed by a detection pipeline.
+
+        "Processed" means the capture has *any* Detection row, including the null
+        markers (``NULL_DETECTIONS_FILTER``) that record a "processed, found nothing"
+        result. This mirrors how the capture set list separates the processed count
+        from the (real) detections count. Use ``has_detections`` to filter on real
+        detections only.
+        """
+        processed = self.request.query_params.get("processed")
+        if processed is not None:
+            processed = BooleanField(required=False).clean(processed)
+            queryset = queryset.annotate(
+                processed=models.Exists(Detection.objects.filter(source_image=models.OuterRef("pk"))),
+            ).filter(processed=processed)
+        return queryset
+
     def filter_by_has_detections(self, queryset: QuerySet) -> QuerySet:
+        """
+        Filter by whether a capture has any *real* detections (a detection with a
+        bounding box). Null detection markers are excluded, so a capture that was
+        processed but yielded nothing returns ``has_detections=false``. Use the
+        ``processed`` param to filter on processing status regardless of findings.
+        """
         has_detections = self.request.query_params.get("has_detections")
         if has_detections is not None:
             has_detections = BooleanField(required=False).clean(has_detections)
             queryset = queryset.annotate(
-                has_detections=models.Exists(Detection.objects.filter(source_image=models.OuterRef("pk"))),
+                has_detections=models.Exists(
+                    Detection.objects.filter(source_image=models.OuterRef("pk")).exclude(NULL_DETECTIONS_FILTER)
+                ),
             ).filter(has_detections=has_detections)
         return queryset
 
diff --git a/ami/main/tests.py b/ami/main/tests.py
index 7aa79cae5..63979a5fc 100644
--- a/ami/main/tests.py
+++ b/ami/main/tests.py
@@ -1398,36 +1398,80 @@ def test_unrelated_list_endpoints_still_work_without_project_id(self):
 
 class TestCapturesProcessedFilter(APITestCase):
     """
-    The captures list supports ?has_detections=true|false, which the UI surfaces
-    as the "Processing status" filter. A capture is "processed" when it has any
-    Detection row (including null markers for "processed, found nothing").
+    The captures list distinguishes two related filters:
+
+    - ``?processed=true|false`` (the UI "Processing status" filter): a capture is
+      "processed" when it has *any* Detection row, including the null markers that
+      record a "processed, found nothing" result.
+    - ``?has_detections=true|false``: a capture has *real* detections (a detection
+      with a bounding box). Null markers are excluded.
+
+    Fixture: 4 captures — 2 with a real detection, 1 with only a null marker
+    (processed but found nothing), 1 untouched. So:
+        processed=true  -> 3      has_detections=true  -> 2
+        processed=false -> 1      has_detections=false -> 2
     """
 
     def setUp(self) -> None:
         self.project, self.deployment = setup_test_project(reuse=False)
         self.captures = create_captures(self.deployment, num_nights=1, images_per_night=4)
-        # Mark the first two captures as processed by giving them a detection.
+        # Two captures get a real detection (bounding box present).
         for capture in self.captures[:2]:
             create_detections(capture, bboxes=[(0.1, 0.1, 0.2, 0.2)])
+        # One capture gets only a null marker: processed, but nothing found.
+        Detection.objects.create(
+            source_image=self.captures[2],
+            bbox=None,
+            timestamp=self.captures[2].timestamp,
+        )
+        # self.captures[3] is left untouched (never processed).
         self.user = User.objects.create_user(email="proc-filter@insectai.org", is_staff=True)  # type: ignore
         self.client.force_authenticate(user=self.user)
         self.list_url = f"/api/v2/captures/?project_id={self.project.pk}"
         return super().setUp()
 
-    def test_no_filter_returns_all_captures(self):
-        response = self.client.get(self.list_url)
+    def _count(self, query: str = "") -> int:
+        response = self.client.get(f"{self.list_url}{query}")
         self.assertEqual(response.status_code, status.HTTP_200_OK)
-        self.assertEqual(response.json()["count"], 4)
+        return response.json()["count"]
 
-    def test_has_detections_true_returns_only_processed(self):
-        response = self.client.get(f"{self.list_url}&has_detections=true")
-        self.assertEqual(response.status_code, status.HTTP_200_OK)
-        self.assertEqual(response.json()["count"], 2)
+    def test_processed_counts_null_markers(self):
+        # The null-marker capture counts as processed (2 real + 1 marker); its
+        # complement is the single untouched capture.
+        self.assertEqual(self._count("&processed=true"), 3)
+        self.assertEqual(self._count("&processed=false"), 1)
+
+    def test_has_detections_excludes_null_markers(self):
+        # Only the 2 real-detection captures; the processed-but-empty capture
+        # falls on the has_detections=false side.
+        self.assertEqual(self._count("&has_detections=true"), 2)
+        self.assertEqual(self._count("&has_detections=false"), 2)
+
+
+class TestDeploymentLastProcessed(APITestCase):
+    """
+    The deployments list annotates and can order by ``last_processed`` — the most
+    recent detection created_at across the deployment's captures.
+ """ + + def setUp(self) -> None: + self.project, self.deployment = setup_test_project(reuse=False) + self.captures = create_captures(self.deployment, num_nights=1, images_per_night=2) + create_detections(self.captures[0], bboxes=[(0.1, 0.1, 0.2, 0.2)]) + self.user = User.objects.create_user(email="lastproc@insectai.org", is_staff=True) # type: ignore + self.client.force_authenticate(user=self.user) + self.url = f"/api/v2/deployments/?project_id={self.project.pk}" + return super().setUp() + + def _deployment_row(self, data: dict) -> dict: + return next(d for d in data["results"] if d["id"] == self.deployment.pk) - def test_has_detections_false_returns_only_unprocessed(self): - response = self.client.get(f"{self.list_url}&has_detections=false") + def test_last_processed_annotated_and_orderable(self): + # One request exercises the annotation, the serializer field, and the + # ordering registration together. + response = self.client.get(f"{self.url}&ordering=-last_processed") self.assertEqual(response.status_code, status.HTTP_200_OK) - self.assertEqual(response.json()["count"], 2) + self.assertIsNotNone(self._deployment_row(response.json())["last_processed"]) class TestProjectOwnerAutoAssignment(APITestCase): diff --git a/ui/src/components/filtering/filter-control.tsx b/ui/src/components/filtering/filter-control.tsx index 8c7826808..0e0944919 100644 --- a/ui/src/components/filtering/filter-control.tsx +++ b/ui/src/components/filtering/filter-control.tsx @@ -31,7 +31,7 @@ const ComponentMap: { deployment: StationFilter, detections__source_image: ImageFilter, event: SessionFilter, - has_detections: ProcessingStatusFilter, + processed: ProcessingStatusFilter, include_unobserved: BooleanFilter, job_type_key: TypeFilter, not_algorithm: NotAlgorithmFilter, diff --git a/ui/src/data-services/models/deployment.ts b/ui/src/data-services/models/deployment.ts index 069896bf8..17b1015f0 100644 --- a/ui/src/data-services/models/deployment.ts +++ 
b/ui/src/data-services/models/deployment.ts @@ -74,6 +74,12 @@ export class Deployment extends Entity { return this._deployment.taxa_count } + get lastProcessed(): Date | undefined { + return this._deployment.last_processed + ? new Date(this._deployment.last_processed) + : undefined + } + get device(): Entity | undefined { if (this._deployment.device) { return new Entity(this._deployment.device) diff --git a/ui/src/pages/captures/captures.tsx b/ui/src/pages/captures/captures.tsx index 26babfbe0..6a16e1ff0 100644 --- a/ui/src/pages/captures/captures.tsx +++ b/ui/src/pages/captures/captures.tsx @@ -65,7 +65,7 @@ export const Captures = () => { - +
diff --git a/ui/src/pages/deployments/deployment-columns.tsx b/ui/src/pages/deployments/deployment-columns.tsx index f29cb2ab1..8191486de 100644 --- a/ui/src/pages/deployments/deployment-columns.tsx +++ b/ui/src/pages/deployments/deployment-columns.tsx @@ -225,6 +225,12 @@ export const columns = ({ sortField: 'updated_at', renderCell: (item: Deployment) => , }, + { + id: 'last-processed', + name: translate(STRING.FIELD_LABEL_LAST_PROCESSED), + sortField: 'last_processed', + renderCell: (item: Deployment) => , + }, { id: 'actions', name: '', diff --git a/ui/src/pages/deployments/deployments.tsx b/ui/src/pages/deployments/deployments.tsx index 1389dff62..393cc8054 100644 --- a/ui/src/pages/deployments/deployments.tsx +++ b/ui/src/pages/deployments/deployments.tsx @@ -20,6 +20,7 @@ export const Deployments = () => { taxa: true, 'first-date': true, 'last-date': true, + 'last-processed': true, }) const { sort, setSort } = useSort({ field: 'name', diff --git a/ui/src/utils/language.ts b/ui/src/utils/language.ts index 9a26a9a62..ab045774a 100644 --- a/ui/src/utils/language.ts +++ b/ui/src/utils/language.ts @@ -116,6 +116,7 @@ export enum STRING { FIELD_LABEL_KEY, FIELD_LABEL_LAST_SEEN, FIELD_LABEL_LAST_DATE, + FIELD_LABEL_LAST_PROCESSED, FIELD_LABEL_LAST_SYNCED, FIELD_LABEL_LATITUDE, FIELD_LABEL_LOCATION, @@ -442,6 +443,7 @@ const ENGLISH_STRINGS: { [key in STRING]: string } = { [STRING.FIELD_LABEL_KEY]: 'Key', [STRING.FIELD_LABEL_LAST_SEEN]: 'Last seen', [STRING.FIELD_LABEL_LAST_DATE]: 'Last date', + [STRING.FIELD_LABEL_LAST_PROCESSED]: 'Last processed', [STRING.FIELD_LABEL_LAST_SYNCED]: 'Last synced with data source', [STRING.FIELD_LABEL_LATITUDE]: 'Latitude', [STRING.FIELD_LABEL_LOCATION]: 'Location', diff --git a/ui/src/utils/useFilters.ts b/ui/src/utils/useFilters.ts index d24cee1ff..b028a2592 100644 --- a/ui/src/utils/useFilters.ts +++ b/ui/src/utils/useFilters.ts @@ -138,7 +138,7 @@ export const AVAILABLE_FILTERS = (projectId: string): FilterConfig[] => [ }, { 
label: 'Processing status', - field: 'has_detections', + field: 'processed', tooltip: { text: 'Filter captures by whether they have been processed by a detection pipeline.', }, From ca47bb0f07494443acade0214ef1eadeb3f70edc Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Thu, 28 May 2026 22:00:03 -0700 Subject: [PATCH 09/13] refactor(ui): clearer prop naming in ProcessingStatusFilter Avoid renaming the value prop to `string` (shadows the global type); use a `booleanValue` local instead. Addresses CodeRabbit review comment. Co-Authored-By: Claude --- .../filtering/filters/processing-status-filter.tsx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ui/src/components/filtering/filters/processing-status-filter.tsx b/ui/src/components/filtering/filters/processing-status-filter.tsx index f76657670..edfe33af9 100644 --- a/ui/src/components/filtering/filters/processing-status-filter.tsx +++ b/ui/src/components/filtering/filters/processing-status-filter.tsx @@ -3,15 +3,15 @@ import { STRING, translate } from 'utils/language' import { booleanToString, stringToBoolean } from '../utils' import { FilterProps } from './types' -export const ProcessingStatusFilter = ({ value: string, onAdd }: FilterProps) => { - const value = stringToBoolean(string) +export const ProcessingStatusFilter = ({ value, onAdd }: FilterProps) => { + const booleanValue = stringToBoolean(value) const options = [ { value: true, label: translate(STRING.PROCESSED) }, { value: false, label: translate(STRING.NOT_PROCESSED) }, ] return ( - + From 097e5d2c073327583c4e2d93c4ee40de0eb038e4 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Thu, 28 May 2026 22:07:24 -0700 Subject: [PATCH 10/13] refactor(captures): reuse with_was_processed() for the processed filter Keeps the 'processed' definition in one place (the existing SourceImageQuerySet annotation) instead of duplicating the Exists(Detection) subquery in the viewset. 
Co-Authored-By: Claude --- ami/main/api/views.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/ami/main/api/views.py b/ami/main/api/views.py index b701522f1..04b15bb03 100644 --- a/ami/main/api/views.py +++ b/ami/main/api/views.py @@ -651,13 +651,14 @@ def filter_by_processed(self, queryset: QuerySet) -> QuerySet: result. This mirrors how the capture set list separates the processed count from the (real) detections count. Use ``has_detections`` to filter on real detections only. + + Reuses the ``with_was_processed`` queryset annotation so the "processed" + definition stays in one place. """ processed = self.request.query_params.get("processed") if processed is not None: processed = BooleanField(required=False).clean(processed) - queryset = queryset.annotate( - processed=models.Exists(Detection.objects.filter(source_image=models.OuterRef("pk"))), - ).filter(processed=processed) + queryset = queryset.with_was_processed().filter(was_processed=processed) return queryset def filter_by_has_detections(self, queryset: QuerySet) -> QuerySet: From dae73c586f8564722ccebd58b4af97911f16a32e Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Fri, 29 May 2026 09:23:29 -0700 Subject: [PATCH 11/13] feat(captures): move Last processed column to the captures list MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous commit added a sortable "Last processed" column to the deployments (Stations) list. Move it to the captures list instead, where it shows the most recent detection created_at for each capture — i.e. when that capture was last run through a detection pipeline. Exposed as the `last_processed` annotation on the captures endpoint and orderable via ?ordering=last_processed. Reverts the deployments-list column. A correlated subquery (LIMIT 1) keeps the pagination row count stable. 
A new composite index on Detection(source_image, -created_at) makes the per-capture lookup an index-only scan, so the column stays fast for the scoped event/deployment/collection views that the UI actually uses, without denormalizing a timestamp field onto SourceImage. Co-Authored-By: Claude --- ami/main/api/serializers.py | 6 ++-- ami/main/api/views.py | 36 ++++++++++++------- .../0088_detection_det_srcimg_created_idx.py | 16 +++++++++ ami/main/models.py | 5 +++ ami/main/tests.py | 21 ++++++----- ui/src/data-services/models/capture.ts | 6 ++++ ui/src/data-services/models/deployment.ts | 6 ---- ui/src/pages/captures/capture-columns.tsx | 7 ++++ ui/src/pages/captures/captures.tsx | 1 + .../pages/deployments/deployment-columns.tsx | 6 ---- ui/src/pages/deployments/deployments.tsx | 1 - 11 files changed, 74 insertions(+), 37 deletions(-) create mode 100644 ami/main/migrations/0088_detection_det_srcimg_created_idx.py diff --git a/ami/main/api/serializers.py b/ami/main/api/serializers.py index e7f703019..64c721cbd 100644 --- a/ami/main/api/serializers.py +++ b/ami/main/api/serializers.py @@ -167,8 +167,6 @@ class DeploymentListSerializer(DefaultSerializer): device = DeviceNestedSerializer(read_only=True) research_site = SiteNestedSerializer(read_only=True) jobs = JobStatusSerializer(many=True, read_only=True) - # Annotated in DeploymentViewSet.get_queryset (latest detection created_at). - last_processed = serializers.DateTimeField(read_only=True) class Meta: model = Deployment @@ -190,7 +188,6 @@ class Meta: "longitude", "first_date", "last_date", - "last_processed", "device", "research_site", "jobs", @@ -1101,6 +1098,8 @@ class SourceImageListSerializer(DefaultSerializer): deployment = DeploymentNestedSerializer(read_only=True) event = EventNestedSerializer(read_only=True) project = serializers.PrimaryKeyRelatedField(queryset=Project.objects.all(), required=False) + # Annotated in SourceImageViewSet.get_queryset (latest detection created_at). 
+ last_processed = serializers.DateTimeField(read_only=True) # file = serializers.ImageField(allow_empty_file=False, use_url=True) class Meta: @@ -1121,6 +1120,7 @@ class Meta: "detections_count", "occurrences_count", "taxa_count", + "last_processed", "detections", "project", ] diff --git a/ami/main/api/views.py b/ami/main/api/views.py index 04b15bb03..1bfa55d74 100644 --- a/ami/main/api/views.py +++ b/ami/main/api/views.py @@ -296,7 +296,6 @@ class DeploymentViewSet(DefaultViewSet, ProjectMixin): "taxa_count", "first_capture_timestamp", "last_capture_timestamp", - "last_processed", "name", ] @@ -317,18 +316,6 @@ def get_queryset(self) -> QuerySet: if project: qs = qs.filter(project=project) - # "Last processed" = the most recent detection created_at across this - # deployment's captures. A correlated subquery (rather than a join + - # Max) keeps the row count stable for pagination. Null when the - # deployment has no detections yet; NullsLastOrderingFilter sorts those last. - qs = qs.annotate( - last_processed=models.Subquery( - Detection.objects.filter(source_image__deployment=models.OuterRef("pk")) - .order_by("-created_at") - .values("created_at")[:1] - ) - ) - num_example_captures = 10 if self.action == "retrieve": qs = qs.prefetch_related( @@ -575,6 +562,7 @@ class SourceImageViewSet(DefaultViewSet, ProjectMixin): "deployment__name", "event__start", "path", + "last_processed", ] permission_classes = [ObjectPermission] @@ -613,12 +601,14 @@ def get_queryset(self) -> QuerySet: # It's cumbersome to override the default list view, so customize the queryset here queryset = self.filter_by_processed(queryset) queryset = self.filter_by_has_detections(queryset) + queryset = self.annotate_last_processed(queryset) elif self.action == "retrieve": # For detail view, include storage info and additional prefetches with_counts_default = True queryset = queryset.prefetch_related("jobs", "collections") queryset = self.add_adjacent_captures(queryset) + queryset = 
self.annotate_last_processed(queryset) with_detections_default = True with_detections = self.request.query_params.get("with_detections", with_detections_default) @@ -678,6 +668,26 @@ def filter_by_has_detections(self, queryset: QuerySet) -> QuerySet: ).filter(has_detections=has_detections) return queryset + def annotate_last_processed(self, queryset: QuerySet) -> QuerySet: + """ + Annotate each capture with ``last_processed`` — the most recent detection + ``created_at`` for that capture, i.e. when it was last run through a + detection pipeline. Null when the capture has never been processed; + NullsLastOrderingFilter sorts those last. + + A correlated subquery (rather than a join + Max) keeps the row count stable + for pagination. The supporting index on Detection(source_image, -created_at) + makes the per-row lookup an index scan, so this stays cheap without + denormalizing a timestamp onto SourceImage. + """ + return queryset.annotate( + last_processed=models.Subquery( + Detection.objects.filter(source_image=models.OuterRef("pk")) + .order_by("-created_at") + .values("created_at")[:1] + ) + ) + def prefetch_detections(self, queryset: QuerySet, project: Project | None = None) -> QuerySet: """ Return all detections for source images, but only include occurrence data diff --git a/ami/main/migrations/0088_detection_det_srcimg_created_idx.py b/ami/main/migrations/0088_detection_det_srcimg_created_idx.py new file mode 100644 index 000000000..185d5621a --- /dev/null +++ b/ami/main/migrations/0088_detection_det_srcimg_created_idx.py @@ -0,0 +1,16 @@ +# Generated by Django 4.2.10 on 2026-05-29 12:14 + +from django.db import migrations, models + + +class Migration(migrations.Migration): + dependencies = [ + ("main", "0087_taxon_parents_json_gin_index"), + ] + + operations = [ + migrations.AddIndex( + model_name="detection", + index=models.Index(fields=["source_image", "-created_at"], name="det_srcimg_created_idx"), + ), + ] diff --git a/ami/main/models.py 
b/ami/main/models.py index 1de9a01ee..04210cbe6 100644 --- a/ami/main/models.py +++ b/ami/main/models.py @@ -2852,6 +2852,11 @@ class Meta: "frame_num", "timestamp", ] + indexes = [ + # Supports the "last processed" subquery on the captures list: the + # latest detection created_at per source image (index scan, top 1). + models.Index(fields=["source_image", "-created_at"], name="det_srcimg_created_idx"), + ] def best_classification(self): # @TODO where is this used? diff --git a/ami/main/tests.py b/ami/main/tests.py index 63979a5fc..5292a5f07 100644 --- a/ami/main/tests.py +++ b/ami/main/tests.py @@ -1448,30 +1448,35 @@ def test_has_detections_excludes_null_markers(self): self.assertEqual(self._count("&has_detections=false"), 2) -class TestDeploymentLastProcessed(APITestCase): +class TestCapturesLastProcessed(APITestCase): """ - The deployments list annotates and can order by ``last_processed`` — the most - recent detection created_at across the deployment's captures. + The captures list annotates and can order by ``last_processed`` — the most + recent detection created_at for each capture. Captures that were never + processed expose ``last_processed = None``. """ def setUp(self) -> None: self.project, self.deployment = setup_test_project(reuse=False) self.captures = create_captures(self.deployment, num_nights=1, images_per_night=2) + # First capture is processed (has a detection); the second is left untouched. 
create_detections(self.captures[0], bboxes=[(0.1, 0.1, 0.2, 0.2)]) - self.user = User.objects.create_user(email="lastproc@insectai.org", is_staff=True) # type: ignore + self.user = User.objects.create_user(email="cap-lastproc@insectai.org", is_staff=True) # type: ignore self.client.force_authenticate(user=self.user) - self.url = f"/api/v2/deployments/?project_id={self.project.pk}" + self.url = f"/api/v2/captures/?project_id={self.project.pk}" return super().setUp() - def _deployment_row(self, data: dict) -> dict: - return next(d for d in data["results"] if d["id"] == self.deployment.pk) + def _row(self, data: dict, capture_id: int) -> dict: + return next(c for c in data["results"] if c["id"] == capture_id) def test_last_processed_annotated_and_orderable(self): # One request exercises the annotation, the serializer field, and the # ordering registration together. response = self.client.get(f"{self.url}&ordering=-last_processed") self.assertEqual(response.status_code, status.HTTP_200_OK) - self.assertIsNotNone(self._deployment_row(response.json())["last_processed"]) + data = response.json() + # Processed capture has a timestamp; the untouched one is null. + self.assertIsNotNone(self._row(data, self.captures[0].pk)["last_processed"]) + self.assertIsNone(self._row(data, self.captures[1].pk)["last_processed"]) class TestProjectOwnerAutoAssignment(APITestCase): diff --git a/ui/src/data-services/models/capture.ts b/ui/src/data-services/models/capture.ts index c2f80bdec..054c6b1e7 100644 --- a/ui/src/data-services/models/capture.ts +++ b/ui/src/data-services/models/capture.ts @@ -96,6 +96,12 @@ export class Capture { }) } + get lastProcessed(): Date | undefined { + return this._capture.last_processed + ? new Date(this._capture.last_processed) + : undefined + } + get deploymentId(): string | undefined { return this._capture.deployment ? 
`${this._capture.deployment.id}` diff --git a/ui/src/data-services/models/deployment.ts b/ui/src/data-services/models/deployment.ts index 17b1015f0..069896bf8 100644 --- a/ui/src/data-services/models/deployment.ts +++ b/ui/src/data-services/models/deployment.ts @@ -74,12 +74,6 @@ export class Deployment extends Entity { return this._deployment.taxa_count } - get lastProcessed(): Date | undefined { - return this._deployment.last_processed - ? new Date(this._deployment.last_processed) - : undefined - } - get device(): Entity | undefined { if (this._deployment.device) { return new Entity(this._deployment.device) diff --git a/ui/src/pages/captures/capture-columns.tsx b/ui/src/pages/captures/capture-columns.tsx index ca27ef4d1..efb60be38 100644 --- a/ui/src/pages/captures/capture-columns.tsx +++ b/ui/src/pages/captures/capture-columns.tsx @@ -3,6 +3,7 @@ import { Capture } from 'data-services/models/capture' import { BasicTableCell, CellTheme, + DateTableCell, ImageCellTheme, ImageTableCell, TableColumn, @@ -151,6 +152,12 @@ export const columns = ({ sortField: 'path', renderCell: (item: Capture) => , }, + { + id: 'last-processed', + name: translate(STRING.FIELD_LABEL_LAST_PROCESSED), + sortField: 'last_processed', + renderCell: (item: Capture) => , + }, { id: 'occurrences', name: translate(STRING.FIELD_LABEL_OCCURRENCES), diff --git a/ui/src/pages/captures/captures.tsx b/ui/src/pages/captures/captures.tsx index 6a16e1ff0..71f2009dc 100644 --- a/ui/src/pages/captures/captures.tsx +++ b/ui/src/pages/captures/captures.tsx @@ -37,6 +37,7 @@ export const Captures = () => { dimensions: true, filename: false, path: false, + 'last-processed': true, }) const { selectedView, setSelectedView } = useSelectedView('table') const { filters } = useFilters() diff --git a/ui/src/pages/deployments/deployment-columns.tsx b/ui/src/pages/deployments/deployment-columns.tsx index 8191486de..f29cb2ab1 100644 --- a/ui/src/pages/deployments/deployment-columns.tsx +++ 
b/ui/src/pages/deployments/deployment-columns.tsx @@ -225,12 +225,6 @@ export const columns = ({ sortField: 'updated_at', renderCell: (item: Deployment) => , }, - { - id: 'last-processed', - name: translate(STRING.FIELD_LABEL_LAST_PROCESSED), - sortField: 'last_processed', - renderCell: (item: Deployment) => , - }, { id: 'actions', name: '', diff --git a/ui/src/pages/deployments/deployments.tsx b/ui/src/pages/deployments/deployments.tsx index 393cc8054..1389dff62 100644 --- a/ui/src/pages/deployments/deployments.tsx +++ b/ui/src/pages/deployments/deployments.tsx @@ -20,7 +20,6 @@ export const Deployments = () => { taxa: true, 'first-date': true, 'last-date': true, - 'last-processed': true, }) const { sort, setSort } = useSort({ field: 'name', From a3fab5580c6fd12da63f0d9c52b234e7cbbae080 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Fri, 29 May 2026 10:15:15 -0700 Subject: [PATCH 12/13] perf: count processed/has_detections captures by subtraction The captures list COUNT for ?processed= and ?has_detections= ran an EXISTS / NOT EXISTS subquery with no LIMIT to prune, so on a large project (~900k captures) NOT EXISTS became a full anti-join over the wide source-image table (~12s for processed=false, ~11s for has_detections=false). SourceImagePagination now counts the captures *with* detections off the Detection table (COUNT(DISTINCT source_image_id), a small index scan) and subtracts from the project total: processed=true -> processed_count processed=false -> total - processed_count Both counts are exact and cost scales with the number of detection rows rather than the processed/unprocessed ratio, so it is fast in both directions (measured: false ~1.7s vs 12.8s, true ~1.5s vs 4.8s on a 929k-capture project). Any other filter combination falls back to the default count. 
Co-Authored-By: Claude --- ami/main/api/views.py | 69 +++++++++++++++++++++++++++++++++++++++++++ ami/main/tests.py | 17 +++++++++++ 2 files changed, 86 insertions(+) diff --git a/ami/main/api/views.py b/ami/main/api/views.py index 1bfa55d74..429e74b5e 100644 --- a/ami/main/api/views.py +++ b/ami/main/api/views.py @@ -154,6 +154,59 @@ def get_count(self, queryset): return super().get_count(queryset.order_by().values("pk")) +class SourceImagePagination(LimitOffsetPaginationWithPermissions): + """ + Pagination for the captures list that computes the COUNT for the ``processed`` + and ``has_detections`` filters by subtraction instead of counting an + EXISTS / NOT EXISTS subquery directly. + + On a large project (~900k captures) the page of rows is cheap because the LIMIT + prunes early, but the pagination COUNT has no LIMIT to prune: ``NOT EXISTS`` + becomes a full anti-join over the wide source-image table (~12s). Instead we + count the captures *with* detections off the Detection table (a small index + scan, COUNT(DISTINCT source_image_id)) and subtract from the project total: + + processed=true -> processed_count + processed=false -> total - processed_count + + Both counts are exact and the cost scales with the number of detection rows, + not the processed/unprocessed ratio, so it is fast in both directions. Any + other filter (or no filter) falls back to the default count. + """ + + def paginate_queryset(self, queryset, request, view=None): + # DRF sets self.request *after* get_count() runs, but get_count() needs the + # query params and the view, so stash them here first. + self.request = request + self._view = view + return super().paginate_queryset(queryset, request, view=view) + + def get_count(self, queryset): + params = self.request.query_params + processed = params.get("processed") + has_detections = params.get("has_detections") + + # Only the single-existence-filter case can be expressed as a subtraction. 
+ # If both are set (or neither), fall back to counting the queryset directly. + if (processed is None) == (has_detections is None): + return super().get_count(queryset) + + base = self._view.get_count_base_queryset() + total = base.order_by().values("pk").count() + + detections = Detection.objects.filter(source_image__in=base.values("pk")) + if has_detections is not None: + # has_detections counts *real* detections only (null markers excluded). + detections = detections.exclude(NULL_DETECTIONS_FILTER) + raw = has_detections + else: + raw = processed + processed_count = detections.values("source_image_id").distinct().count() + + want_true = BooleanField(required=False).clean(raw) + return processed_count if want_true else total - processed_count + + class ProjectViewSet(DefaultViewSet, ProjectMixin): """ API endpoint that allows projects to be viewed or edited. @@ -539,6 +592,7 @@ class SourceImageViewSet(DefaultViewSet, ProjectMixin): require_project_for_list = True # Unfiltered list scans are too expensive on this table queryset = SourceImage.objects.all() + pagination_class = SourceImagePagination serializer_class = SourceImageSerializer filterset_fields = [ @@ -632,6 +686,21 @@ def get_queryset(self) -> QuerySet: return queryset + def get_count_base_queryset(self) -> QuerySet: + """ + Captures scoped by the same project/event/deployment/collection filters as + the list, but *without* the ``processed`` / ``has_detections`` predicate. + + Used by ``SourceImagePagination`` to count via subtraction. Only the + ``DjangoFilterBackend`` is applied (not ordering/search) — ordering would + reference the ``last_processed`` annotation, which isn't present here and + doesn't affect the count anyway. 
+ """ + qs = SourceImage.objects.all() + if isinstance(qs, BaseQuerySet): + qs = qs.visible_for_user(self.request.user) # type: ignore[attr-defined] + return DjangoFilterBackend().filter_queryset(self.request, qs, self) + def filter_by_processed(self, queryset: QuerySet) -> QuerySet: """ Filter by whether a capture has been processed by a detection pipeline. diff --git a/ami/main/tests.py b/ami/main/tests.py index 5292a5f07..3d3aefd2a 100644 --- a/ami/main/tests.py +++ b/ami/main/tests.py @@ -1447,6 +1447,23 @@ def test_has_detections_excludes_null_markers(self): self.assertEqual(self._count("&has_detections=true"), 2) self.assertEqual(self._count("&has_detections=false"), 2) + def test_processed_count_respects_narrowing_filter(self): + # The count is computed by subtraction (total - processed) off a base + # queryset that must apply the same narrowing filters as the list. Add a + # second deployment whose captures are all unprocessed, then scope to it: + # the processed/false counts must reflect only that deployment, not the + # whole project. + other_deployment = Deployment.objects.create(project=self.project, name="Empty deployment") + create_captures(other_deployment, num_nights=1, images_per_night=2) + scope = f"&deployment={other_deployment.pk}" + # None of the second deployment's captures are processed. + self.assertEqual(self._count(f"{scope}&processed=true"), 0) + self.assertEqual(self._count(f"{scope}&processed=false"), 2) + # The first deployment is unchanged (3 processed, 1 not). 
+ first_scope = f"&deployment={self.deployment.pk}" + self.assertEqual(self._count(f"{first_scope}&processed=true"), 3) + self.assertEqual(self._count(f"{first_scope}&processed=false"), 1) + class TestCapturesLastProcessed(APITestCase): """ From 7c4d1da4073953c3b078c0922f9ad91dbf217212 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Fri, 29 May 2026 13:51:16 -0700 Subject: [PATCH 13/13] revert: drop processed/has_detections count-by-subtraction paginator Deploy-time benchmarking on the Serbia dev box (hardware comparable to production) showed the slow COUNT that motivated the subtraction paginator does not reproduce there. The 12.8s NOT EXISTS anti-join for processed=false on the 929k-capture project was a local-environment artifact (cold cache, small RAM); Serbia runs the same anti-join in ~1.4s. The index added for the last_processed sort is not used by the count plan either way, and the subtraction's detection-side IN-subquery had its own cold-plan spike (7.71s on a smaller project's first disk touch). Revert to the default DRF count for these filters. The general fix for slow counts on large filtered lists is the estimated-count paginator (#1328), not a per-filter bespoke count. The subtraction strategy, implementation notes, and full benchmarks are kept for reference in docs/claude/reference/captures-processed-count-strategies.md. 
Co-Authored-By: Claude --- ami/main/api/views.py | 69 --------- ami/main/tests.py | 17 --- .../captures-processed-count-strategies.md | 132 ++++++++++++++++++ 3 files changed, 132 insertions(+), 86 deletions(-) create mode 100644 docs/claude/reference/captures-processed-count-strategies.md diff --git a/ami/main/api/views.py b/ami/main/api/views.py index 429e74b5e..1bfa55d74 100644 --- a/ami/main/api/views.py +++ b/ami/main/api/views.py @@ -154,59 +154,6 @@ def get_count(self, queryset): return super().get_count(queryset.order_by().values("pk")) -class SourceImagePagination(LimitOffsetPaginationWithPermissions): - """ - Pagination for the captures list that computes the COUNT for the ``processed`` - and ``has_detections`` filters by subtraction instead of counting an - EXISTS / NOT EXISTS subquery directly. - - On a large project (~900k captures) the page of rows is cheap because the LIMIT - prunes early, but the pagination COUNT has no LIMIT to prune: ``NOT EXISTS`` - becomes a full anti-join over the wide source-image table (~12s). Instead we - count the captures *with* detections off the Detection table (a small index - scan, COUNT(DISTINCT source_image_id)) and subtract from the project total: - - processed=true -> processed_count - processed=false -> total - processed_count - - Both counts are exact and the cost scales with the number of detection rows, - not the processed/unprocessed ratio, so it is fast in both directions. Any - other filter (or no filter) falls back to the default count. - """ - - def paginate_queryset(self, queryset, request, view=None): - # DRF sets self.request *after* get_count() runs, but get_count() needs the - # query params and the view, so stash them here first. 
- self.request = request - self._view = view - return super().paginate_queryset(queryset, request, view=view) - - def get_count(self, queryset): - params = self.request.query_params - processed = params.get("processed") - has_detections = params.get("has_detections") - - # Only the single-existence-filter case can be expressed as a subtraction. - # If both are set (or neither), fall back to counting the queryset directly. - if (processed is None) == (has_detections is None): - return super().get_count(queryset) - - base = self._view.get_count_base_queryset() - total = base.order_by().values("pk").count() - - detections = Detection.objects.filter(source_image__in=base.values("pk")) - if has_detections is not None: - # has_detections counts *real* detections only (null markers excluded). - detections = detections.exclude(NULL_DETECTIONS_FILTER) - raw = has_detections - else: - raw = processed - processed_count = detections.values("source_image_id").distinct().count() - - want_true = BooleanField(required=False).clean(raw) - return processed_count if want_true else total - processed_count - - class ProjectViewSet(DefaultViewSet, ProjectMixin): """ API endpoint that allows projects to be viewed or edited. @@ -592,7 +539,6 @@ class SourceImageViewSet(DefaultViewSet, ProjectMixin): require_project_for_list = True # Unfiltered list scans are too expensive on this table queryset = SourceImage.objects.all() - pagination_class = SourceImagePagination serializer_class = SourceImageSerializer filterset_fields = [ @@ -686,21 +632,6 @@ def get_queryset(self) -> QuerySet: return queryset - def get_count_base_queryset(self) -> QuerySet: - """ - Captures scoped by the same project/event/deployment/collection filters as - the list, but *without* the ``processed`` / ``has_detections`` predicate. - - Used by ``SourceImagePagination`` to count via subtraction. 
Only the - ``DjangoFilterBackend`` is applied (not ordering/search) — ordering would - reference the ``last_processed`` annotation, which isn't present here and - doesn't affect the count anyway. - """ - qs = SourceImage.objects.all() - if isinstance(qs, BaseQuerySet): - qs = qs.visible_for_user(self.request.user) # type: ignore[attr-defined] - return DjangoFilterBackend().filter_queryset(self.request, qs, self) - def filter_by_processed(self, queryset: QuerySet) -> QuerySet: """ Filter by whether a capture has been processed by a detection pipeline. diff --git a/ami/main/tests.py b/ami/main/tests.py index 3d3aefd2a..5292a5f07 100644 --- a/ami/main/tests.py +++ b/ami/main/tests.py @@ -1447,23 +1447,6 @@ def test_has_detections_excludes_null_markers(self): self.assertEqual(self._count("&has_detections=true"), 2) self.assertEqual(self._count("&has_detections=false"), 2) - def test_processed_count_respects_narrowing_filter(self): - # The count is computed by subtraction (total - processed) off a base - # queryset that must apply the same narrowing filters as the list. Add a - # second deployment whose captures are all unprocessed, then scope to it: - # the processed/false counts must reflect only that deployment, not the - # whole project. - other_deployment = Deployment.objects.create(project=self.project, name="Empty deployment") - create_captures(other_deployment, num_nights=1, images_per_night=2) - scope = f"&deployment={other_deployment.pk}" - # None of the second deployment's captures are processed. - self.assertEqual(self._count(f"{scope}&processed=true"), 0) - self.assertEqual(self._count(f"{scope}&processed=false"), 2) - # The first deployment is unchanged (3 processed, 1 not). 
-        first_scope = f"&deployment={self.deployment.pk}"
-        self.assertEqual(self._count(f"{first_scope}&processed=true"), 3)
-        self.assertEqual(self._count(f"{first_scope}&processed=false"), 1)
-
 
 class TestCapturesLastProcessed(APITestCase):
     """
diff --git a/docs/claude/reference/captures-processed-count-strategies.md b/docs/claude/reference/captures-processed-count-strategies.md
new file mode 100644
index 000000000..85cfe4e35
--- /dev/null
+++ b/docs/claude/reference/captures-processed-count-strategies.md
@@ -0,0 +1,132 @@
+# Captures list: `processed` / `has_detections` COUNT strategies
+
+**Created:** 2026-05-29 (PR #1326). **Status:** reference — records a strategy that
+was prototyped, benchmarked, and deliberately *not* shipped.
+
+## Context
+
+The captures list (`SourceImageViewSet`, `ami/main/api/views.py`) supports two
+existence filters:
+
+- `?processed=true|false` — capture has *any* `Detection` row, including the null
+  markers (`NULL_DETECTIONS_FILTER = Q(bbox__isnull=True) | Q(bbox=[])`) that record
+  a "processed, found nothing" result.
+- `?has_detections=true|false` — capture has a *real* detection (bounding box
+  present). Null markers excluded.
+
+Both translate to an `EXISTS` / `NOT EXISTS` subquery against `main_detection`. The
+**page of rows** is cheap (the `LIMIT` prunes early), but the **pagination COUNT**
+has no `LIMIT`, so `NOT EXISTS` becomes an anti-join over the whole source-image
+table.
+
+## What shipped (PR #1326)
+
+- `?processed=` / `?has_detections=` filters.
+- Sortable `last_processed` column (correlated subquery: most recent detection
+  `created_at`).
+- Index `det_srcimg_created_idx` on `Detection(source_image, -created_at)`
+  (migration `0088`) — supports the `last_processed` **sort**.
+- The pagination COUNT uses the **default DRF count** (the plain anti-join). No
+  custom count strategy.
+
+## The strategy that was NOT shipped: count by subtraction
+
+Prototype (reverted): a `SourceImagePagination` whose `get_count` computed the
+existence-filter count without the anti-join:
+
+```
+total = COUNT(*) over the project/event/deployment/collection-scoped captures
+processed_count = COUNT(DISTINCT source_image_id) off main_detection, scoped to the same captures
+                  (has_detections: also .exclude(NULL_DETECTIONS_FILTER))
+
+processed=true -> processed_count
+processed=false -> total - processed_count
+```
+
+Both counts are exact. The cost scales with the number of *detection* rows rather
+than the processed/unprocessed ratio, so it is symmetric (fast in both directions).
+Implementation notes if it is ever revived:
+
+- DRF sets `paginator.request` *after* `get_count()` runs, so `paginate_queryset`
+  must stash `request` + `view` first.
+- The base queryset (scoped, but *without* the processed/has_detections predicate)
+  was rebuilt in the view by applying only `DjangoFilterBackend` — *not* the
+  ordering backend, which would reference the absent `last_processed` annotation.
+- The detection-side count used `Detection.objects.filter(source_image__in=base.values("pk"))`.
+  This is an `IN (subquery)` semi-join; its plan is less predictable than a direct
+  `source_image__project_id=` join (see cold-spike below).
+
+## Why it was reverted
+
+The original justification was a **12.8s** COUNT for `processed=false` on the
+929k-capture project. Deploy-time benchmarking on the Serbia dev box (hardware
+comparable to production) showed that number does not reproduce there.
+
+### Benchmarks
+
+Local dev box (cold, low RAM, 8 GB source-image table not cached) — the numbers
+that originally motivated subtraction:
+
+| project | filter | default anti-join | subtraction |
+|---|---|---|---|
+| 18 / 929k (local) | processed=false | **12.8s** | ~1.7s |
+| 18 / 929k (local) | processed=true | 4.8s | ~1.7s |
+| 18 / 929k (local) | has_detections=false | 11.5s | ~1.9s |
+| 18 / 929k (local) | has_detections=true | 3.5s | 0.2s |
+
+Serbia dev box (cold), real data — the numbers that changed the decision:
+
+| project | filter | default anti-join | subtraction |
+|---|---|---|---|
+| 18 / 929k | processed=false | **1.38s** | 0.58s |
+| 18 / 929k | processed=true | 1.52s | 0.58s |
+| 20 / 105k | processed=true | 0.44s | **7.71s cold** / 0.01s warm |
+| 20 / 105k | processed=false | 0.27s | 0.04s |
+
+Counts matched exactly across both approaches (subtraction is correct):
+project 18 → 17938 / 910996; project 20 → 8517 / 96574 (processed),
+8476 / 96615 (has_detections).
+
+### Findings
+
+1. **The 12.8s was environment-dependent, not algorithmic.** `EXPLAIN (ANALYZE)`
+   for `processed=false` on project 18 (Serbia):
+
+   ```
+   Finalize Aggregate (actual time=1541..1567)
+     -> Parallel Hash Right Anti Join (rows=303665)
+          -> Parallel Seq Scan on main_detection (rows=455239)
+          -> Parallel Hash
+               -> Parallel Seq Scan on main_sourceimage (Filter: project_id = 18)
+   Execution Time: 1609 ms
+   ```
+
+   The anti-join seq-scans the wide source-image table. Serbia's RAM / OS cache /
+   parallel workers do it in ~1.6s; the local box did it in 12.8s cold. Serbia ≈
+   production, so the real-world cost is far smaller than the local measurement.
+
+2. **`det_srcimg_created_idx` is not used by the COUNT** — the anti-join plan
+   ignores it. It only helps the `last_processed` sort. So the index already in the
+   PR does nothing for the count either way.
+
+3. **Subtraction has its own cold-plan risk.** On the *smaller* project 20 the
+   detection-side `IN (subquery)` distinct spiked to 7.71s on first disk touch
+   (cold seq scan of `main_detection`), settling to sub-second warm — *slower* than
+   the 0.44s default for that case. `EXPLAIN` (warm) = 634ms via a nested-loop +
+   pkey memoize + distinct.
+
+Net: subtraction is a modest, real win on the largest project (0.58 vs 1.38s) and
+would protect a cold / memory-pressured environment, but it adds a custom paginator
++ base-queryset rebuild + a second query and an unpredictable cold-plan, for a
+benefit that is small on production-class hardware. Not worth it for this PR.
+
+## General direction
+
+The durable fix for "COUNT is slow on huge filtered lists" is not per-filter
+bespoke counting — it is an **estimated-count paginator** (ticket #1328): use the
+PostgreSQL planner's row estimate (`EXPLAIN (FORMAT JSON)` → `Plan["Plan Rows"]`,
+<15ms, ~3% accurate where it matters) with an exact-count fallback below a
+threshold. That handles *any* filter, not just existence filters. Subtraction
+(exact, existence-filters-only) remains a possible fast path to layer underneath it
+if exactness is required. See also the annotation-strip count trick in
+`ProjectPagination.get_count` and PR #1317.
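
Reviewer sketch (not part of the patch): the estimated-count idea in "General direction" could look roughly like the following in plain Python. It assumes the caller has already run `EXPLAIN (FORMAT JSON)` on the filtered list query and passes the JSON text in. The names `estimated_count` and `exact_count`, and the `threshold` default, are hypothetical and chosen only for illustration; nothing here is taken from the codebase or from ticket #1328.

```python
import json
from typing import Callable


def estimated_count(
    explain_json: str,
    exact_count: Callable[[], int],
    threshold: int = 10_000,  # assumed cutoff, not a value from the source
) -> int:
    """Return the planner's row estimate, or an exact count for small results."""
    # EXPLAIN (FORMAT JSON) yields a one-element list whose "Plan" key holds
    # the root plan node; "Plan Rows" is the planner's estimate for that node.
    root_plan = json.loads(explain_json)[0]["Plan"]
    estimate = int(root_plan["Plan Rows"])

    # Below the threshold an exact COUNT is cheap and estimate error is most
    # visible to users, so defer to the caller-supplied exact count.
    if estimate < threshold:
        return exact_count()
    return estimate
```

The `EXPLAIN` would need to be taken from the filtered `SELECT` itself rather than from a `COUNT(*)` wrapper, since an aggregate root node reports a single row. Passing the exact count as a callable keeps the expensive query from running unless the estimate falls under the threshold.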