Skip to content

Framework for admins to trigger and review post-processing methods#1289

Merged
mihow merged 23 commits into
mainfrom
feat/post-processing-admin-scaffolding
Jun 26, 2026
Merged

Framework for admins to trigger and review post-processing methods#1289
mihow merged 23 commits into
mainfrom
feat/post-processing-admin-scaffolding

Conversation

@mihow

@mihow mihow commented May 1, 2026

Copy link
Copy Markdown
Collaborator

Summary

Establishes the shared pattern for running post-processing filters from the Django admin. Post-processing filters adjust or clean up data after ML processing; the first one (a small-size detection filter) ships here as the worked example. Until now there was no consistent way to find, configure, or run them — you edited code or called them by hand.

With this change an operator can, from the Django admin:

  • See every registered post-processing filter
  • Trigger one, and get an intermediate form to configure its options (the filter's arguments) before it runs
  • Have the run wrapped in a Job, so it appears in the jobs list with progress and logs like any other ML task
  • Trigger a filter on a single occurrence from the Occurrence admin — the fast spot-check path for developing and tuning a filter without processing a whole capture set
  • See run metrics on the job (detections_checked, detections_flagged, occurrences_updated), so a run's effect is visible without reading the log

This is a scaffold/precursor PR: it sets the approach every future post-processing method plugs into (class masking #999, tracking #1272, rank rollup, …), rather than shipping one large feature on its own. The size filter is included to prove the pattern end to end. The original scope was pure scaffolding plus a refactor of the existing small-size-filter trigger to expose its already-coded size_threshold knob; during review the per-occurrence trigger and stage metrics were added on top as small, migration-free extensions of the same pattern.

image image

List of Changes

Change (user-effect) How (implementation)
Operators see every registered post-processing filter as an admin action Admin-trigger pattern on BasePostProcessingTask; tasks registered in ami/ml/post_processing/registry.py
Each filter declares its options as a typed schema; bad inputs are rejected before any job runs Pydantic v1 config_schema contract, validated at task __init__ (extra="forbid")
Configure a filter's options in an intermediate form before it runs BasePostProcessingActionForm with to_config() → dict contract + parameterized confirmation template (admin/post_processing/confirmation.html + _form_fieldset.html partial)
Runs are tracked as Jobs with progress and logs to_config() result feeds Job.params['config']
Small-size filter's threshold is now adjustable from the admin (was hardcoded) Migrated SmallSizeFilterTask: SmallSizeFilterConfig schema + SmallSizeFilterActionForm (0 < x < 1 validation); same 0.0008 default preserved
A filter can be run on one occurrence, not just a whole capture set Discriminated scope on the config (source_image_collection_id XOR occurrence_id, enforced by a root validator) + OccurrenceAdmin action using the same factory with scope_resolver=lambda occ: {"occurrence_id": occ.pk}; SmallSizeFilterTask._scoped_detections() resolves the detection queryset from whichever scope is set
Operators see what a run examined and changed, on the job's detail page BasePostProcessingTask.report_stage_metrics() writes named stage params via JobProgress.add_or_update_stage_param; falls back to a log line when running jobless
Covered by tests Tests across ami/ml/post_processing/tests/ and ami/ml/tests.py::TestPostProcessingTasks, plus the admin changelist tests below, kept cheap for CI (see Test plan)

Admin niceties for reviewing post-processing

Purpose: make the Django admin a practical place to review and test post-processing on a production-sized database (millions of detections and classifications). Together these support the full review loop an operator runs when checking a post-processing method: find an occurrence, run a post-processing filter on it, click through to the detections and classifications it produced to inspect exactly what changed (including the applied_to provenance chain), optionally edit a classification by hand to probe how it moves the result, then recompute the occurrence's determination to see the effect — all without leaving the admin or looking up ids by hand.

Before these changes that loop was impractical: the occurrence/detection/classification changelists timed out (or ran the database out of memory) on real data, the classification change page stalled rendering preloaded dropdowns, there were no links between an occurrence and its detections/classifications, and nothing recomputed an occurrence's determination after a manual classification edit. Each change below removes one of those blockers. They share no code with the framework above and are independent of it.

Change (user-effect) How (implementation)
occurrences_updated counts only occurrences whose determination actually changed Capture the determination before the save and compare after, instead of counting every re-saved occurrence (small_size_filter.py)
After running a filter, the admin success message links to the created Job(s) format_html link to each job's admin change page, so progress and any failure reason are one click away
The Occurrence and Detection changelists load in well under a second on millions of rows (were ~15s, and the detection list could error out of memory) Count related rows with a correlated subquery instead of Count(...) + GROUP BY, and order by the indexed -id instead of an unindexed -created_at
Search occurrences and detections by id (detections also by source-image path) get_search_results treats an all-digit term as an exact id lookup
The Classification change page opens quickly instead of stalling FK fields (taxon, detection, algorithm, category_map, applied_to) render as autocomplete widgets instead of <select>s preloaded with every row
The Classification changelist no longer transfers thousands of floats per row scores/logits counted in SQL (cardinality) with the arrays deferred
Navigate occurrence → detection → classification by clicking through show_change_link on the detection and classification inlines
Recompute an occurrence's determination after editing its classifications by hand New OccurrenceAdmin action — editing classifications doesn't trigger the recompute that Occurrence/Identification saves do

Adding a new post-processing filter

Three pieces, then one wiring line per admin entry point. Using the upcoming rank rollup and class masking (#999) filters as examples:

  1. Task — subclass BasePostProcessingTask in ami/ml/post_processing/<your_filter>.py; declare key, name, and a pydantic config_schema; implement run(). Call self.update_progress(...) and self.report_stage_metrics({...}) as it works. Add the class to the POSTPROCESSING_TASKS dict in ami/ml/post_processing/registry.py.
  2. Config schema — every knob and scope id the task accepts, e.g. RankRollupConfig(source_image_collection_id, target_rank, min_score) or ClassMaskingConfig(source_image_collection_id | occurrence_id, taxa_list_id). Scope ids follow the discriminated pattern shipped here: declare each as optional and enforce exactly-one with a root validator (copy SmallSizeFilterConfig).
  3. Form — subclass BasePostProcessingActionForm in ami/ml/post_processing/admin/<your_filter>_form.py, declaring only the tunable fields — scope ids come from the selected admin rows, not the form. Class masking's taxa-list picker becomes a ModelChoiceField here. Value bounds belong in the schema, not the form.

Then wire it into each admin it should be triggerable from:

# ami/main/admin.py — e.g. on SourceImageCollectionAdmin
run_rank_rollup = make_post_processing_action(
    RankRollupTask,
    RankRollupActionForm,
    scope_resolver=lambda collection: {"source_image_collection_id": collection.pk},
)
actions = [..., run_rank_rollup]

scope_resolver states what each selected row contributes to the config; the form supplies the tunables. An occurrence-scoped filter (class masking) wires the same factory onto OccurrenceAdmin with scope_resolver=lambda occ: {"occurrence_id": occ.pk}. A task whose row→job mapping isn't 1:1 (tracking partitions events across projects, #1272) passes a custom build_jobs callable instead of scope_resolver. The confirmation page, validation, job creation, and metrics display all come from the shared scaffolding — no per-filter admin code beyond the lines above.

Out of scope (deferred to follow-ups)

  • Project-partitioning helper (defer to whichever multi-scope adopter lands first — likely feat(tracking): revive occurrence tracking as a post-processing task #1272's tracking, which partitions events across projects)
  • REST API surface (eventual primary trigger; admin not future-primary)
  • Rank rollup, class masking, tracking — they stay in their own PRs
  • Surfacing a run's parameters in the occurrence prediction history. Two complementary directions, both deferred to keep this PR migration-free: a Job FK on Classification/Detection (existing ticket PSv2: add Job relationship to a Classification and Detection objects #1156, where post-processing tasks would set classification.job), and a flexible JSON metadata field on Classification for pipeline-specific extras and per-classification transformation notes (needs its own ticket)

Rebase impact for downstream PRs

Smoke test (manual, against live stack)

End-to-end validated against project 18 capture set 175 (10 images):

  1. Action dropdown lists "Run Small size filter post-processing task (async)" ✓
  2. Click Go → renders intermediate confirmation page (template admin/post_processing/confirmation.html rendered) ✓
  3. Form shows size_threshold field with default 0.0008 and help text ✓

Confirmation page

  1. Submit → redirect with success message; Job 1571 created and enqueued; celeryworker picked up; ran to SUCCESS ✓
  2. Job.params payload validated typed: {'task': 'small_size_filter', 'config': {'size_threshold': 0.0008, 'source_image_collection_id': 175}}

Success message

The occurrence-scope trigger and stage metrics are covered by the automated tests below, and the branch is deployed on a shared dev box for hands-on review.

Test plan

  • test_base_schema.py — subclass without schema raises, abstract intermediaries exempt, valid config builds BaseModel instance, defaults apply, scope rules (occurrence accepted / both raise / neither raises), out-of-range threshold raises, unknown keys rejected (extra="forbid")
  • test_admin_form.pyto_config() round-trip, type coercion at the form layer, range enforcement delegated to the schema
  • test_action_factory.py — action naming and labels, one Job per row with task key/scope/project FK, schema errors mapped to form fields, mid-creation failure rolls back all jobs, build_jobs override hook
  • test_small_size_filter_admin.py — intermediate page renders (no Job created), select_across refused, invalid threshold re-renders with error, valid POST creates Job with config, occurrence-scoped POST sets occurrence_id (and not the collection id), multi-collection POST creates per-collection jobs with correct project FKs
  • ami/ml/tests.py::TestPostProcessingTasks — filter flags small detections and updates determinations, occurrence scope leaves sibling occurrences untouched, stage metrics land on the job's progress params
  • Manual smoke: trigger from Django admin against existing test data; verify Job runs to completion with custom threshold

A fixture-cost pass keeps CI time in check: the admin/factory tests build bare Project/Collection rows once per class (setUpTestData) instead of calling the full setup_test_project fixture per test, and the pure form tests run without the DB (SimpleTestCase). The module's 29 tests run in ~5 s locally; this also removed a double-fixture setup that intermittently hit a statement timeout in CI.

Design doc

docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md (committed in this branch).

Notes

  • Pydantic v1 syntax throughout (repo pins pydantic<2.0).
  • Pyright will surface __fields__ and validator as deprecated in some envs — these come from local v2 stubs; container is v1 and the syntax is correct.

Summary by CodeRabbit

  • New Features

    • Two-step admin post-processing action for the Small Size Filter with confirmation UI, per-selection scoping (collection or occurrence), per-item job creation, and a size-threshold parameter validated server-side.
    • Tasks now validate typed configs and can report stage metrics (detections checked/flagged, occurrences updated).
  • Tests

    • Expanded coverage for forms, schema contract, admin confirmation flow, multi-selection job creation, transactional safety, and metrics.
  • Documentation

    • Design doc describing the post-processing admin scaffolding and migration plan.

Context: PRs #999 (class masking) and #1272 (tracking) each grew their own
admin-trigger plumbing (registry registration, intermediate confirmation
page with form, freeform dict config). This precursor extracts the shared
pattern using SmallSizeFilterTask as the migration consumer — proves the
abstraction without carving up another contributor's open PR.

Spec covers: pydantic config_schema contract on BasePostProcessingTask,
BasePostProcessingActionForm class, parameterized confirmation template +
partial, SmallSizeFilterTask migration (exposes existing size_threshold
knob via admin), test plan, rebase impact for #999 + #1272.

Co-Authored-By: Claude <noreply@anthropic.com>
@netlify

netlify Bot commented May 1, 2026

Copy link
Copy Markdown

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit cdbb4e4
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a3dcf1f1542fb00088f7d1c

@netlify

netlify Bot commented May 1, 2026

Copy link
Copy Markdown

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit cdbb4e4
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/6a3dcf1fa16cdc0008f74cc8

@coderabbitai

coderabbitai Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Pydantic config contract for post-processing tasks, a shared admin action factory with a two-step confirmation flow, typed SmallSizeFilter config/task, admin forms and templates, wiring in ami/main/admin.py, and comprehensive tests.

Changes

Post-Processing Admin Scaffolding

Layer / File(s) Summary
BasePostProcessingTask schema contract
ami/ml/post_processing/base.py, ami/ml/post_processing/tests/test_base_schema.py
BasePostProcessingTask requires config_schema (Pydantic BaseModel) and constructs self.config = self.config_schema(**config) during init; logs serialize via self.config.dict(). Tests validate contract and error paths.
SmallSizeFilterConfig and task schema binding
ami/ml/post_processing/small_size_filter.py, ami/ml/tests.py
Adds SmallSizeFilterConfig (size_threshold in (0,1), extra="forbid") and sets SmallSizeFilterTask.config_schema; run() reads typed config fields, resolves scoped detections via _scoped_detections(), and reports detections_checked, detections_flagged, and occurrences_updated.
Admin form base and SmallSizeFilter form
ami/ml/post_processing/admin/forms.py, ami/ml/post_processing/admin/small_size_filter_form.py, ami/ml/post_processing/tests/test_admin_form.py
BasePostProcessingActionForm.to_config() maps cleaned_data → config dict. SmallSizeFilterActionForm exposes size_threshold: FloatField with initial value taken from SmallSizeFilterConfig. Tests cover cleaned output, initial defaults, non-numeric rejection, and passthrough behavior.
Confirmation page and form fieldset templates
ami/templates/admin/post_processing/_form_fieldset.html, ami/templates/admin/post_processing/confirmation.html
New admin templates: a reusable _form_fieldset.html that renders fields, labels, help text, and errors, and confirmation.html for the two-step confirmation form with CSRF, hidden selection fields, and submit/cancel controls.
Shared admin action factory and admin wiring
ami/ml/post_processing/admin/actions.py, ami/main/admin.py, ami/ml/post_processing/tests/test_action_factory.py
Introduces make_post_processing_action (with default_build_jobs, pydantic→form error mapping, atomic job creation, and confirmation rendering). SourceImageCollectionAdmin.run_small_size_filter and OccurrenceAdmin.run_small_size_filter are replaced/added as factory-created actions bound to SmallSizeFilterTask and SmallSizeFilterActionForm. Tests cover registration, default behavior, error mapping/atomicity, and the build_jobs override hook.
Form and admin integration tests
ami/ml/post_processing/tests/test_admin_form.py, ami/ml/post_processing/tests/test_small_size_filter_admin.py, ami/ml/tests.py
Adds tests for admin form typing/cleaning and comprehensive admin action flows: intermediate confirmation rendering, validation error re-rendering with no job creation, confirmed job creation with correct params and per-job project_id, multi-collection job creation, occurrence-scoped runs, and task run metrics reporting.
Design specification
docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md
Design doc describing the Pydantic schema contract, admin form contract, confirmation template/flow, refactor into make_post_processing_action(...), and associated tests and rebase notes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

backend, ml, data-processing

Suggested reviewers

  • loppear
  • annavik

Poem

A rabbit hops through admin halls so grand,
With Pydantic schemas held in paw,
Forms and confirmations tidy the land,
Jobs queued neatly without a flaw.
Hooray—small filters march in order now! 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.70% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: establishing a framework for admins to trigger and review post-processing methods, which is the core feature added across all the modified and new files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description is well structured and covers summary, changes, testing, screenshots, and deployment notes, with only minor template sections left incomplete.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/post-processing-admin-scaffolding

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

mihow and others added 2 commits May 1, 2026 15:13
Adds the shared infrastructure that PRs #999 (class masking) and #1272
(tracking) each grew independently. Uses SmallSizeFilterTask as the
migration consumer to prove the pattern without carving up another open PR.

Changes:
- BasePostProcessingTask gains required pydantic config_schema class attr;
  validates Job.params['config'] at task __init__ via self.config_schema(**config).
  self.config is now a BaseModel instance (was: freeform dict).
- SmallSizeFilterTask: adds SmallSizeFilterConfig schema (size_threshold +
  source_image_collection_id, validators for 0<x<1, extra=forbid). Existing
  default 0.0008 preserved.
- New admin form base BasePostProcessingActionForm with to_config() contract,
  plus SmallSizeFilterActionForm exposing size_threshold knob (currently
  unreachable — admin trigger hardcoded empty config).
- Parameterized confirmation template at admin/post_processing/confirmation.html
  with overridable {% block intro %} and a _form_fieldset.html partial.
- run_small_size_filter admin action rewrites onto new pattern: intermediate
  confirmation page on first POST, validate + enqueue on confirm. Per-collection
  Job creation preserved (each Job gets correct project FK).
- 17 tests covering schema contract, form validation, intermediate page render,
  multi-collection partitioning by project FK.

Pydantic v1 syntax throughout (repo pins pydantic<2.0). Memory note:
container is v1, .dict() / .__fields__ are correct here.

Out of scope (explicitly deferred):
- Project-partitioning helper (defer to whichever multi-scope adopter lands
  first, likely #1272's tracking which partitions events across projects)
- REST API trigger surface (eventual primary; admin not future-primary)
- Rank rollup, class masking, tracking tasks themselves — stay in their PRs

Design doc: docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md

Co-Authored-By: Claude <noreply@anthropic.com>
Confirmation page (intermediate page rendering with size_threshold form
field) and success message (after submitting form, Job 1571 enqueued
with typed config payload).

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread ami/main/admin.py Outdated
mihow and others added 2 commits June 4, 2026 16:35
Screenshots for the PR description now live in the ami-media-staging
object store rather than docs/claude/screenshots/. Keeps binary assets
out of the repo history.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow changed the title feat(post-processing): admin scaffolding precursor (pydantic schema, form base, parameterized template) Let admins set the size threshold when running the small size filter Jun 4, 2026
@mihow mihow changed the title Let admins set the size threshold when running the small size filter Trigger, configure, and track post-processing filters from the Django admin Jun 4, 2026
@mihow mihow changed the title Trigger, configure, and track post-processing filters from the Django admin Framework for admins to trigger and review post-processing methods Jun 5, 2026
@mihow mihow marked this pull request as ready for review June 5, 2026 00:08
Copilot AI review requested due to automatic review settings June 5, 2026 00:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Establishes a reusable Django-admin pattern for triggering ML post-processing tasks with an intermediate configuration form, validates config via Pydantic schemas, and migrates the existing Small Size Filter to the new framework (including admin-adjustable size_threshold and new tests).

Changes:

  • Add a config_schema (Pydantic v1) contract to BasePostProcessingTask and validate task config at construction time.
  • Introduce a base admin action form + shared confirmation templates for rendering/confirming post-processing parameters.
  • Migrate the Small Size Filter admin action to the new confirmation + schema-validated config flow and add test coverage for the new behavior.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md Design doc describing the new admin/scaffolding pattern and migration plan.
ami/ml/post_processing/base.py Enforces config_schema and validates config via Pydantic on task initialization.
ami/ml/post_processing/small_size_filter.py Adds SmallSizeFilterConfig schema and switches task to typed config access.
ami/ml/post_processing/admin/forms.py Adds BasePostProcessingActionForm with a to_config() contract.
ami/ml/post_processing/admin/small_size_filter_form.py Adds SmallSizeFilterActionForm exposing size_threshold in admin.
ami/templates/admin/post_processing/confirmation.html Adds shared intermediate confirmation template for admin actions.
ami/templates/admin/post_processing/_form_fieldset.html Adds shared partial to render form fields + errors.
ami/main/admin.py Rewrites run_small_size_filter to render intermediate form and enqueue schema-validated jobs.
ami/ml/post_processing/tests/test_base_schema.py Tests config_schema enforcement and schema validation behavior.
ami/ml/post_processing/tests/test_admin_form.py Tests base form to_config() and size threshold validation.
ami/ml/post_processing/tests/test_small_size_filter_admin.py Tests admin confirmation flow + job creation/config payload behavior.
ami/ml/post_processing/admin/init.py Creates admin module package.
ami/ml/post_processing/tests/init.py Creates tests package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ami/main/admin.py Outdated
Comment thread ami/ml/post_processing/tests/test_small_size_filter_admin.py
Comment thread ami/templates/admin/post_processing/confirmation.html
Address review feedback on the repetitive run_/render pattern. The
confirm -> render -> validate -> enqueue flow is now built by
make_post_processing_action() in ami/ml/post_processing/admin/actions.py,
so each task declares only what varies: its task class, knob form, and how
a selected row maps to a Job (scope_resolver / project_resolver /
name_resolver). Tasks whose row->Job mapping isn't one-Job-per-row (e.g.
PR #1272's per-project event partitioning) pass a custom build_jobs
callable; PR #999 likewise inherits the form/template instead of
hand-rolling HTML.

Config validation now has a single source of truth: the task's pydantic
config_schema. The knob form no longer re-encodes the (0, 1) bound, and the
FloatField min/max that contradicted the exclusive interval is gone. Schema
errors raised while building Jobs are mapped back onto the form so the
operator sees them inline on the confirmation page (non-field errors now
render too). This also removes the near-dead per-row try/except in the old
admin action.

Adds test_action_factory.py (action registration, default builder, schema
mapping, all-or-nothing, build_jobs override hook) and updates the form
tests to reflect that range enforcement moved to the schema layer.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
ami/ml/post_processing/base.py (1)

30-35: ⚡ Quick win

Enforce config_schema type, not just presence.

__init_subclass__ currently allows non-Pydantic values for config_schema; that fails later in __init__ instead of at class definition time.

Proposed tightening
     def __init_subclass__(cls, **kwargs):
         super().__init_subclass__(**kwargs)
         required_attrs = ["key", "name", "config_schema"]
         for attr in required_attrs:
             if not hasattr(cls, attr) or getattr(cls, attr) is None:
                 raise TypeError(f"{cls.__name__} must define '{attr}' class attribute")
+        if not isinstance(cls.config_schema, type) or not issubclass(
+            cls.config_schema, pydantic.BaseModel
+        ):
+            raise TypeError(
+                f"{cls.__name__}.config_schema must be a subclass of pydantic.BaseModel"
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/post_processing/base.py` around lines 30 - 35, The current
__init_subclass__ only checks presence of "config_schema" but not its type;
update __init_subclass__ to also validate that cls.config_schema is a class/type
and a subclass of pydantic.BaseModel (i.e. import pydantic.BaseModel and use
isinstance/issubclass checks), and raise a TypeError if it's missing or not a
Pydantic model; keep the existing checks for "key" and "name" and reference the
same attribute names (config_schema, key, name) and the __init_subclass__ method
when making the change.
ami/templates/admin/post_processing/_form_fieldset.html (1)

1-8: ⚡ Quick win

Render form.non_field_errors before field rows (future-proofing the shared partial).

ami/templates/admin/post_processing/_form_fieldset.html renders only per-field field.errors; it never outputs form.non_field_errors. For the current SmallSizeFilterActionForm, validation happens in clean_size_threshold() (field-level), so operators already see errors via field.errors—but this partial will hide non-field errors if a form later adds clean() / add_error(None, ...)-style validation.

Suggested patch
+{% if form.non_field_errors %}
+  {% for error in form.non_field_errors %}<p class="errornote">{{ error }}</p>{% endfor %}
+{% endif %}
 {% for field in form %}
   <div class="form-row">
     {{ field.label_tag }}
     {{ field }}
     {% if field.help_text %}<p class="help">{{ field.help_text }}</p>{% endif %}
     {% for error in field.errors %}<p class="errornote">{{ error }}</p>{% endfor %}
   </div>
 {% endfor %}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/templates/admin/post_processing/_form_fieldset.html` around lines 1 - 8,
The partial currently only renders per-field errors; modify
ami/templates/admin/post_processing/_form_fieldset.html to output
form.non_field_errors (e.g., {{ form.non_field_errors }}) before the for loop so
any form-level validation added via clean() or add_error(None, ...) (as might
happen in SmallSizeFilterActionForm) will be visible to operators; ensure
non-field errors render before the field rows for future-proofing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/ml/post_processing/admin/small_size_filter_form.py`:
- Around line 18-20: Replace the Unicode multiplication sign (`×`) in the help
string "Minimum bounding-box area as a fraction of the source image area (width
× height). Detections smaller than this are flagged as 'Not identifiable'.
Default 0.0008 ≈ 0.08% of frame area." with a plain ASCII "x" so the text reads
"(width x height)"; update that literal help text in small_size_filter_form.py
(the same string shown in the diff) to satisfy the Ruff RUF001 lint rule.

In `@ami/ml/post_processing/tests/test_small_size_filter_admin.py`:
- Around line 18-21: Replace the hardcoded password string used in
User.objects.create_superuser by avoiding a literal: call
User.objects.create_superuser(email="ssfadmin@example.com", password=None) (or
any non-literal), then immediately call self.superuser.set_unusable_password()
and self.superuser.save() to ensure no secret literal remains; update references
to self.superuser accordingly.

In `@docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md`:
- Around line 50-70: The fenced code block that lists the project tree starting
with "ami/ml/post_processing/" is missing a language tag; update the opening
triple-backtick to include a language identifier (e.g., ```text) so the block
becomes ```text and the closing backticks remain, ensuring markdownlint MD040 is
satisfied for the tree block in the document.

---

Nitpick comments:
In `@ami/ml/post_processing/base.py`:
- Around line 30-35: The current __init_subclass__ only checks presence of
"config_schema" but not its type; update __init_subclass__ to also validate that
cls.config_schema is a class/type and a subclass of pydantic.BaseModel (i.e.
import pydantic.BaseModel and use isinstance/issubclass checks), and raise a
TypeError if it's missing or not a Pydantic model; keep the existing checks for
"key" and "name" and reference the same attribute names (config_schema, key,
name) and the __init_subclass__ method when making the change.

In `@ami/templates/admin/post_processing/_form_fieldset.html`:
- Around line 1-8: The partial currently only renders per-field errors; modify
ami/templates/admin/post_processing/_form_fieldset.html to output
form.non_field_errors (e.g., {{ form.non_field_errors }}) before the for loop so
any form-level validation added via clean() or add_error(None, ...) (as might
happen in SmallSizeFilterActionForm) will be visible to operators; ensure
non-field errors render before the field rows for future-proofing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98c48d31-30bc-478f-af01-8862514274eb

📥 Commits

Reviewing files that changed from the base of the PR and between e3f662d and 93bd529.

📒 Files selected for processing (13)
  • ami/main/admin.py
  • ami/ml/post_processing/admin/__init__.py
  • ami/ml/post_processing/admin/forms.py
  • ami/ml/post_processing/admin/small_size_filter_form.py
  • ami/ml/post_processing/base.py
  • ami/ml/post_processing/small_size_filter.py
  • ami/ml/post_processing/tests/__init__.py
  • ami/ml/post_processing/tests/test_admin_form.py
  • ami/ml/post_processing/tests/test_base_schema.py
  • ami/ml/post_processing/tests/test_small_size_filter_admin.py
  • ami/templates/admin/post_processing/_form_fieldset.html
  • ami/templates/admin/post_processing/confirmation.html
  • docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md

Comment thread ami/ml/post_processing/admin/small_size_filter_form.py Outdated
Comment thread ami/ml/post_processing/tests/test_small_size_filter_admin.py Outdated
Comment thread docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md Outdated
Addresses review feedback. When an operator uses Django admin's "select all
across pages", the action receives the entire filtered table as its queryset,
which would serialize every pk into hidden _selected_action inputs on the
confirmation page (a very large POST body, potentially over request limits).
This trigger is for explicit, bounded selections, so the across-pages case is
now refused with a clear message instead of rendering an unbounded form.

Also folds in two cosmetic review nits: replace the Unicode multiplication
sign in the size-threshold help text with a plain "x", and add a language tag
to the module-layout code fence in the design doc.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow

mihow commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

Note: the previous automated review covered commit 93bd5299, which predates the controller-abstraction refactor. The current head (9b4b7b37) moves the admin trigger into a factory at ami/ml/post_processing/admin/actions.py, single-sources config validation on the pydantic schema, and adds a select_across guard. Re-reviewing against head so the findings reflect the code that will actually merge.

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== ami/ml/post_processing/admin/actions.py ===" 
cat ami/ml/post_processing/admin/actions.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/admin/forms.py ==="
cat ami/ml/post_processing/admin/forms.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/admin/small_size_filter_form.py ==="
cat ami/ml/post_processing/admin/small_size_filter_form.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/base.py ==="
cat ami/ml/post_processing/base.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/small_size_filter.py ==="
cat ami/ml/post_processing/small_size_filter.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/main/admin.py (post_processing section) ==="
grep -n "small_size_filter\|post_processing\|SmallSize\|select_across\|BasePostProcessing\|run_small_size" ami/main/admin.py | head -80

echo ""
echo "=== ami/main/admin.py full relevant section ==="
grep -n "" ami/main/admin.py | grep -A5 -B5 "small_size_filter\|SmallSizeFilter\|post_processing\|select_across" | head -120

echo ""
echo "=== ami/templates/admin/post_processing/confirmation.html ==="
cat ami/templates/admin/post_processing/confirmation.html 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/templates/admin/post_processing/_form_fieldset.html ==="
cat ami/templates/admin/post_processing/_form_fieldset.html 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/tests/ tree ==="
find ami/ml/post_processing/tests -type f | sort

echo ""
echo "=== ami/ml/post_processing/tests/test_admin_form.py ==="
cat ami/ml/post_processing/tests/test_admin_form.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/tests/test_base_schema.py ==="
cat ami/ml/post_processing/tests/test_base_schema.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/tests/test_small_size_filter_admin.py ==="
cat ami/ml/post_processing/tests/test_small_size_filter_admin.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/__init__.py ==="
cat ami/ml/post_processing/__init__.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/admin/__init__.py ==="
cat ami/ml/post_processing/admin/__init__.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/ml/post_processing/tests/__init__.py ==="
cat ami/ml/post_processing/tests/__init__.py 2>/dev/null || echo "NOT FOUND"

Length of output: 38389


🏁 Script executed:

#!/bin/bash
echo "=== ami/ml/post_processing/tests/test_action_factory.py ==="
cat ami/ml/post_processing/tests/test_action_factory.py 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ami/main/admin.py lines around run_small_size_filter (full context) ==="
sed -n '640,680p' ami/main/admin.py

echo ""
echo "=== Check for select_across in confirmation template ==="
grep -rn "select_across" ami/templates/ ami/ml/ ami/main/admin.py

echo ""
echo "=== Check if select_across POST value is preserved across re-renders ==="
grep -n "select_across" ami/ml/post_processing/admin/actions.py

echo ""
echo "=== Check Job.enqueue() signature ==="
ast-grep --pattern 'def enqueue($$$)'

echo ""
echo "=== Check if config_schema attribute check handles abstract intermediary correctly ==="
grep -n "__init_subclass__\|abc.ABC\|ABCMeta\|abstractmethod" ami/ml/post_processing/base.py

echo ""
echo "=== Check how SmallSizeFilterConfig.dict() serializes (pydantic v1 vs v2) ==="
grep -rn "pydantic" ami/ml/post_processing/ --include="*.py" | grep -v "test_\|\.pyc"

echo ""
echo "=== Check for any pydantic version pinning ==="
grep -rn "pydantic" requirements*.txt setup.cfg pyproject.toml 2>/dev/null | head -20

echo ""
echo "=== Check if confirm value is '1' or truthy in the action ==="
grep -n "confirm" ami/ml/post_processing/admin/actions.py

echo ""
echo "=== Check the select_across guard return value ==="
grep -n -A3 "select_across" ami/ml/post_processing/admin/actions.py

echo ""
echo "=== Check queryset.count() call in render_confirmation - N+1 risk ==="
grep -n "count\|values_list" ami/ml/post_processing/admin/actions.py

echo ""
echo "=== Check SmallSizeFilterTask name attribute (has space vs underscore for key) ==="
grep -n "^    key\|^    name\|^    config_schema" ami/ml/post_processing/small_size_filter.py

echo ""
echo "=== Check if test_base_schema tests hit the DB (Algorithm.objects.get_or_create) ==="
grep -n "get_or_create\|Algorithm" ami/ml/post_processing/base.py

echo ""
echo "=== Check if there is any transaction.atomic around job creation ==="
grep -n "atomic\|transaction" ami/ml/post_processing/admin/actions.py

echo ""
echo "=== Check what happens when build_jobs raises ConfigValidationErrors but form has errors too ==="
grep -n "add_error\|non_field_errors\|form.errors" ami/ml/post_processing/admin/actions.py

echo ""
echo "=== Check if 'name = Small size filter' has lowercase 's' which differs from action label 'Small Size Filter' ==="
grep -n "name\s*=\s*['\"]" ami/ml/post_processing/small_size_filter.py ami/ml/post_processing/admin/actions.py

Length of output: 15296



@mihow: The PR images show a clean confirmation page (Django admin "Run Small Size Filter" form with size_threshold: 0.0008) and a green success banner ("Queued Small Size Filter for 1 capture set(s). Jobs: [1571]"). The end-to-end UX is solid.

Here is the review against head (9b4b7b37).


Review of PR #1289 — Post-Processing Admin Scaffolding

🔴 Important — default_build_jobs is not atomic

default_build_jobs in ami/ml/post_processing/admin/actions.py validates all rows first (good, all-or-nothing), but the creation loop is not wrapped in transaction.atomic():

for obj, model in validated:
    job = Job.objects.create(...)   # auto-commits immediately
    job.enqueue()                   # transaction.on_commit fires right away
    job_pks.append(job.pk)

Because Job.enqueue() uses transaction.on_commit, jobs committed before a mid-loop exception are already in the DB and will be dispatched to the queue. An operator selecting 10 collections gets jobs 1–7 running while jobs 8–10 were never created.

Suggested fix:

from django.db import transaction

# inside default_build_jobs, replace the creation loop with:
with transaction.atomic():
    job_pks: list[int] = []
    for obj, model in validated:
        job = Job.objects.create(
            name=name_resolver(task_cls, obj),
            project=project_resolver(obj),
            job_type_key="post_processing",
            params={"task": task_cls.key, "config": model.dict()},
        )
        job.enqueue()   # on_commit fires only after the atomic block commits
        job_pks.append(job.pk)
return job_pks

🟡 Medium — Pydantic v1 API with no version pin

The code uses pydantic v1-only surface:

  • @pydantic.validator (small_size_filter.py Line 13)
  • model.dict() (actions.py Line 119)
  • SmallSizeFilterConfig.__fields__["size_threshold"].default (small_size_filter_form.py Line 9)

No pydantic version constraint was found in requirements*.txt, setup.cfg, or pyproject.toml. If pydantic v2 is installed, these work under the compatibility shim but emit deprecation warnings. Consider either pinning pydantic>=1,<2 or migrating to v2 (@field_validator, .model_dump(), .model_fields).


🟡 Medium — __init_subclass__ blocks abstract intermediary classes

BasePostProcessingTask.__init_subclass__ fires on every subclass, including abstract ones. Any future attempt to add a shared abstract mixin like:

class AbstractFilterTask(BasePostProcessingTask):
    """Common logic for size/shape filters."""
    # config_schema intentionally deferred to concrete subclass
    ...

will immediately raise TypeError. A simple guard:

def __init_subclass__(cls, **kwargs):
    super().__init_subclass__(**kwargs)
    import inspect
    if inspect.isabstract(cls):
        return
    for attr in ["key", "name", "config_schema"]:
        if not hasattr(cls, attr) or getattr(cls, attr) is None:
            raise TypeError(f"{cls.__name__} must define '{attr}' class attribute")

🟡 Medium — Two DB hits in render_confirmation

Lines 151–152 of actions.py issue two separate queries:

"selected_count": queryset.count(),
"selected_pks": [str(pk) for pk in queryset.values_list("pk", flat=True)],

One pass is enough:

selected_pks = [str(pk) for pk in queryset.values_list("pk", flat=True)]
...
"selected_count": len(selected_pks),
"selected_pks": selected_pks,

🟡 Medium — Custom build_jobs receives scope_resolver=None

When make_post_processing_action is called with only build_jobs (no scope_resolver), the factory still passes scope_resolver=None in the kwargs dict forwarded to the custom runner:

kwargs: dict[str, Any] = dict(
    ...
    scope_resolver=scope_resolver,   # None when build_jobs is provided
    ...
)
runner(**kwargs)

A custom build_jobs that blindly unpacks all kwargs and calls scope_resolver(obj) would get an obscure TypeError. The cleanest fix is to omit scope_resolver from kwargs when it is None, or document that custom implementations must use **kwargs and not call scope_resolver directly.


🟡 Medium — TestBuildJobsOverrideHook test doesn't actually invoke the action

test_custom_build_jobs_is_used_instead_of_default only checks action.__name__ and constructs the action — it never calls the returned callable or asserts calls was populated. The custom_build_jobs is never actually exercised:

action = make_post_processing_action(...)
self.assertEqual(action.__name__, "run_small_size_filter")
# ← no call to action(...) or assertion on `calls`

Consider adding a call through the admin client (like the other test classes do) or at minimum invoking action(mock_admin, mock_request, qs) and asserting calls was populated.


🟢 Low — name capitalisation inconsistency

SmallSizeFilterTask.name = "Small size filter" (lowercase 's') while the custom title="Run Small Size Filter" and description="Run Small Size Filter ..." in admin.py use title case. The job name produced by name_resolver will say "Small size filter" while the UI says "Small Size Filter". Not user-facing for now, but could confuse operators reading job logs. Aligning to name = "Small Size Filter" would keep everything consistent.


✅ What's working well

  • The factory pattern in make_post_processing_action cleanly separates task definition from admin wiring — new tasks are genuinely one declaration away.
  • The all-or-nothing schema validation before any Job.objects.create() is the right default.
  • select_across guard with a clear user-facing message is a good defensive touch.
  • _schema_errors_to_form_fields mapping pydantic error locs back to Django form fields is elegant and keeps the UX tight.
  • Test coverage across test_action_factory.py, test_small_size_filter_admin.py, test_base_schema.py, and test_admin_form.py is comprehensive.
  • Job.enqueue() already uses transaction.on_commit, so wrapping the creation loop in atomic() will make enqueues correctly deferred with zero extra effort.

Follow-up to the head-commit CodeRabbit re-review:

- Wrap the Job-creation loop in default_build_jobs in transaction.atomic().
  Admin requests are already atomic (ATOMIC_REQUESTS=True), but this helper
  can also be called outside a request (e.g. a management command), so the
  explicit block keeps creation all-or-nothing there too. Job.enqueue() uses
  transaction.on_commit, so the (millisecond) block only guards row creation;
  the long-running task still runs asynchronously in the worker after commit.
- Guard BasePostProcessingTask.__init_subclass__ with inspect.isabstract so an
  abstract intermediary task class isn't forced to declare key/name/config_schema.
- Resolve the selection once in render_confirmation (len of the materialized
  pk list) instead of a separate .count() query.
- Omit scope_resolver from the build_jobs kwargs when it isn't set, so a custom
  build_jobs supplied without one never receives a None it might call.
- Derive the admin action's labels (title, submit, dropdown description) from
  task.name instead of hardcoding title-case strings, so the operator-facing
  label and the Job name stay consistent and there are fewer per-task strings.
- Make the build_jobs override-hook test actually invoke the action and assert
  the custom builder ran (and that scope_resolver was not forwarded as None).

24/24 post_processing tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow

mihow commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

Claude says: Thanks — the re-review against head caught real issues. Addressed in abcb446e:

  • 🔴 default_build_jobs not atomic — wrapped the creation loop in transaction.atomic(). Worth noting the dispatch-partial scenario doesn't actually arise in the admin path: ATOMIC_REQUESTS = True (config/settings/base.py:59) already wraps the whole admin request, so a mid-loop failure rolls back and the on_commit enqueues never fire. The explicit block is for callers outside a request (e.g. a future management command via feat(tracking): revive occurrence tracking as a post-processing task #1272's custom build_jobs), where there's no ambient transaction. Since enqueue() only registers an on_commit, the block guards row creation (ms) — the long task still runs async in the worker after commit.
  • 🟡 __init_subclass__ blocks abstract intermediaries — guarded with inspect.isabstract(cls); concrete tasks still must declare key/name/config_schema.
  • 🟡 Two DB hits in render_confirmation — selection resolved once; count is len() of the materialized pk list.
  • 🟡 Custom build_jobs gets scope_resolver=Nonescope_resolver is now omitted from the forwarded kwargs when unset.
  • 🟢 name casing — labels (title, submit, dropdown) now derive from task.name rather than hardcoded title-case strings, so the operator-facing label and the Job name stay consistent (both "Small size filter"). Avoided renaming task.name itself since it's the lookup key in Algorithm.objects.get_or_create(name=...).
  • Test gapTestBuildJobsOverrideHook now invokes the action through the confirm leg and asserts the custom builder ran (and that scope_resolver isn't forwarded as None).

One correction: the "pydantic v1, no version pin" finding is a false negative — it's pinned at requirements/base.txt:17 (pydantic<2.0 # ... django pydantic field), and the CI container runs 1.10.26. The search looked at top-level requirements*.txt and missed the requirements/ directory. The v1 syntax is correct and enforced.

24/24 post_processing tests green.

mihow and others added 2 commits June 5, 2026 16:52
…tract guard

End-of-PR test audit (separate from the TDD scaffolding):

- Remove test_default_threshold_applies_when_form_uses_initial: size_threshold is
  a required field and can't be omitted, so it just re-POSTed the default value —
  identical mechanism to the existing valid-post test. The default is already
  covered by the form-initial and schema-default unit tests.
- Rewrite the mislabeled "all-or-nothing" test into a real one. The old version
  gave every row the same out-of-range threshold, so both failed at validation and
  the creation loop was never reached. The new test passes validation, forces a
  failure while creating the second Job, and asserts the first is rolled back —
  actually exercising the transaction.atomic() wrap.
- Add coverage for the inspect.isabstract guard in __init_subclass__: an abstract
  intermediary may defer key/name/config_schema to its concrete subclasses.

24 post_processing tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Make post-processing filters easier to develop against and observe.

Per-occurrence trigger (the dev/spot path): SmallSizeFilterConfig now
accepts either source_image_collection_id or occurrence_id (discriminated
scope, exactly-one enforced by a pydantic root_validator), and the task
resolves its detection queryset from whichever scope is set. OccurrenceAdmin
gains a run_small_size_filter action via the same make_post_processing_action
factory, so an operator can run a filter on a single selected occurrence
without standing up a whole capture set. This discriminated-scope shape is
the pattern future per-occurrence / per-event tasks (#999, #1272) copy.

Job stage metrics (run observability): BasePostProcessingTask.report_stage_metrics
writes named counters onto the job's post_processing stage (falling back to a
log line when run without a job). The small size filter emits detections_checked,
detections_flagged and occurrences_updated at each flush and on completion, so
the Jobs admin page shows what a run examined and changed.

Tests: schema scope cases (occurrence-only valid, both-scopes raise, no-scope
raises); the OccurrenceAdmin action enqueues a job scoped to occurrence_id;
run-time occurrence-scope isolation (sibling occurrences untouched); and
stage-metric reporting on a real job.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/ml/post_processing/small_size_filter.py`:
- Around line 149-153: The current batch flush loop uses the batch-local
occcurrences_to_update set and then clears it, which allows the same occurrence
to be counted multiple times across batches; introduce a persistent set (e.g.,
updated_occurrence_ids) at the scope of the processing function/class to record
IDs of occurrences already counted, then when iterating occcurrences_to_update
only call occ.save(...) and increment modified_occurrences / occurrences_updated
for occurrences whose id is not in updated_occurrence_ids, add their ids to
updated_occurrence_ids, and finally clear the batch set; apply the same change
to the other flush block that uses occcurrences_to_update so counts are
deduplicated across all batches.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b26c680-56a7-47e7-b05e-ff240b54866c

📥 Commits

Reviewing files that changed from the base of the PR and between abcb446 and beacb3f.

📒 Files selected for processing (7)
  • ami/main/admin.py
  • ami/ml/post_processing/base.py
  • ami/ml/post_processing/small_size_filter.py
  • ami/ml/post_processing/tests/test_action_factory.py
  • ami/ml/post_processing/tests/test_base_schema.py
  • ami/ml/post_processing/tests/test_small_size_filter_admin.py
  • ami/ml/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • ami/ml/post_processing/base.py
  • ami/ml/post_processing/tests/test_action_factory.py

Comment thread ami/ml/post_processing/small_size_filter.py
mihow and others added 2 commits June 8, 2026 14:36
The metric incremented from a batch-local set that is cleared each flush, so
an occurrence whose detections span more than one batch was counted once per
batch. Track occurrence ids in a persistent set and report its size instead.
Detections never recur across batches, so detections_flagged was already
correct. Addresses CodeRabbit review on beacb3f.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow and others added 3 commits June 9, 2026 17:08
Replace per-test setup_test_project calls with minimal Project/Collection/
Occurrence rows and class-level setUpTestData in the admin-action and factory
tests; the admin flow only reads FKs, so the full fixture (storage source,
deployment, processing service per call) was wasted cost. Convert the pure
form tests to SimpleTestCase (no DB) and drop one redundant valid-path form
test already covered end to end by the admin and factory flows.
TestPostProcessingTasks now builds its images/events/collection once per
class instead of once per test.

Module wall-clock roughly halves locally (8.8s -> 4.5s warm), and the
double setup_test_project in the multi-collection admin test that
intermittently hit a statement timeout in CI is gone.

Co-Authored-By: Claude <noreply@anthropic.com>
… select_across refusal

The post-processing confirmation form rendered each field's validation errors
with the page-level `errornote` banner style. Switch the form-fieldset partial
to Django's native `{{ field.errors }}` and `{{ form.non_field_errors }}` so
errors render inline as `errorlist`, matching Django admin conventions. The
invalid-threshold test now asserts the inline `errorlist` class.

Also reframe the "select all across pages" comment: the reason to refuse it is
that the action would apply to the entire filtered table rather than the rows
the operator explicitly selected. The oversized POST body is a secondary
symptom, not the primary concern.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow added the post-processing Post-processing task framework and tasks: size filter, class masking, rank rollup, tracking label Jun 23, 2026
mihow and others added 3 commits June 22, 2026 19:37
The small size filter reported `occurrences_updated` as the number of
occurrences it re-saved, but re-saving recomputes the determination in
place. An occurrence pinned to a human identification keeps its taxon when
its detection is flagged "Not identifiable", so it was counted even though
nothing about it actually changed.

Capture each occurrence's determination before the save and count it only
when the determination changes after. Add a regression test pairing an
occurrence that carries a human identification (determination unchanged, not
counted) with an un-identified one (determination flips, counted).

Co-Authored-By: Claude <noreply@anthropic.com>
…b(s)

After an operator runs a post-processing task from the admin, the success
message now links each created Job to its admin change page instead of just
printing the raw pks. From there the operator can follow the job's progress
and read any failure reason, which is otherwise only visible in the worker
log.

The link points at the admin change page because that page is always
reachable from this admin action; the public UI host is not reliably known
in this context. The message is built with format_html so the links render
as anchors.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow

mihow commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Claude says: Two small fixes landed on top of the main-merge (5e2438bb, b3577df7); branch is up to date with main.

  1. occurrences_updated now counts only occurrences whose determination actually changed. The small size filter re-saves each affected occurrence, which recomputes the determination in place — an occurrence pinned to a human identification keeps its taxon, so it was previously counted even when nothing changed. Regression test added.
  2. The admin action's success message now links each created Job to its admin change page, so an operator can follow progress and read any failure reason there rather than only in the worker log.

The framework was also exercised end-to-end on production-shaped data on a dev box: the admin → Job → Celery path drives small_size_filter and class_masking (#999) on real captures — including the human-ID-precedence case, the excluded-class-can't-win case, and the all-excluded guard (clean job failure, no side effects). 31 post-processing tests pass locally; makemigrations --check, lint, and CodeRabbit are green.

Ready for a final look — blocked only on an approve.

mihow added a commit that referenced this pull request Jun 25, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
The occurrence admin changelist is the entry point for the per-occurrence
post-processing trigger, but on a large table (~1.3M occurrences) it took ~15s
to load a single page. Two causes, both fixed here:

- get_queryset counted detections with a JOIN + GROUP BY. A grouped count must
  aggregate the whole occurrence x detection join before the changelist's
  ORDER BY ... LIMIT can take a page, so it scanned every row just to show 25.
  Replaced with a correlated subquery (Coalesced to 0 for occurrences with no
  detections, matching the old JOIN count), which runs only for the rows on the
  page. Measured 14.9s -> 0.04s for one page on the large table.
- The list ordered by -created_at, which has no index, forcing a full sort of
  the table to find the newest page. Switched to -id, the indexed primary key,
  which increases with insertion time so newest-first is preserved.

Also let an all-digit search term act as an exact occurrence-id lookup, so an
operator can jump straight to an occurrence by id; non-numeric terms still use
the determination-name search.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 25, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
…om occurrences

The detection admin changelist counted classifications with Count() + GROUP BY
and ordered by an unindexed -created_at. On a large table (~1.4M detections,
~2.2M classifications) the grouped aggregate runs over the whole detection x
classification join before the page LIMIT applies — slow enough to exhaust
work_mem and error out. Replaced with a correlated subquery (Coalesced to 0)
and -id ordering, mirroring the occurrence changelist fix.

Also set show_change_link on the detection inline of the occurrence page, so an
operator can open a detection's change page — where the classifications inline
shows which algorithms were applied, including post-processing — without
searching for its id.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 25, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
Add a search box to the detection changelist: an all-digit term is an exact id
lookup (jump straight to a detection by id), other terms search the source
image path. Mirrors the occurrence changelist search.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 25, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
…ation

More admin niceties for reviewing post-processing results on a large database:

- Classification change page: FK fields (taxon, detection, algorithm,
  category_map, applied_to) render as autocomplete widgets instead of <select>s
  preloaded with every row, which made the page unusable.
- Classification changelist: count the scores/logits arrays in SQL (cardinality)
  and defer the arrays, so the list no longer transfers thousands of floats per
  row; order by -id; add an id / taxon-name search box.
- Link the detection page's classification inline rows to their change page.
- New OccurrenceAdmin action "Recompute determination": editing classifications
  by hand doesn't refresh the occurrence determination (only Occurrence and
  Identification saves do), so this re-derives it for the selected occurrences.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 26, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
… count, guard huge ids

Three review follow-ups on the admin niceties added for reviewing post-processing:

- Extract the duplicated digit-term id search from the Occurrence, Detection, and
  Classification admins into a shared IdSearchAdminMixin. The three copies were
  identical; the mixin is now the single place that behavior lives.

- Guard an out-of-range numeric term. A digit string larger than the bigint primary
  key raised a database DataError (500) before; it now returns no results.

- Set show_full_result_count = False on the three large-table admins, so the
  changelist no longer runs a second, unfiltered COUNT(*) over the whole table for
  its footer total -- as expensive as the page query itself on millions of rows.

Tests: add the out-of-range id case. The shared mixin is now tested once in
OccurrenceAdminChangelistTest, so the redundant per-admin copy on the Detection
admin is removed (its docstring points to the canonical test).

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow merged commit 08ca0a4 into main Jun 26, 2026
7 checks passed
@mihow mihow deleted the feat/post-processing-admin-scaffolding branch June 26, 2026 01:17
mihow added a commit that referenced this pull request Jun 26, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 30, 2026
…dmin framework

Rework of #999 onto the #1289 post-processing framework. The branch was cut from
an old main with its own hand-rolled admin action; ClassMaskingTask and
RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas
and are triggered through make_post_processing_action (collection scope on
SourceImageCollection, single-occurrence scope on Occurrence for class masking).

Correctness fixes from the review threads:
- Class masking selects the top class from an -inf-masked softmax, so a class
  excluded by the taxa list can never win even when it had the highest logit;
  raises when the taxa list excludes every class in the category map. Stored
  logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0).
- The masked-output Algorithm is one per (source algorithm, taxa list) and its
  category map is persisted (previously set in memory only, so masked
  classifications referenced a null map).
- applied_to is populated on new masked classifications (the provenance the API
  exposes was left blank).
- Rank rollup preloads category-map labels in two queries and select_relates the
  per-row relations instead of dereferencing category_map per classification.

Surfaces provenance in the API: applied_to is added to the Classification
serializers, and applied_to__algorithm is prefetched in the occurrence
list/detail prefetch and the classification viewset to avoid an N+1 on render.

Tests: pydantic config validation, admin trigger for both scopes, the masking
maths (including the excluded-class guarantee and the all-excluded error),
ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests;
full post_processing suite and occurrence query-count tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

post-processing Post-processing task framework and tasks: size filter, class masking, rank rollup, tracking

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants