Framework for admins to trigger and review post-processing methods#1289
Conversation
Context: PRs #999 (class masking) and #1272 (tracking) each grew their own admin-trigger plumbing (registry registration, intermediate confirmation page with form, freeform dict config). This precursor extracts the shared pattern using SmallSizeFilterTask as the migration consumer — proves the abstraction without carving up another contributor's open PR. Spec covers: pydantic config_schema contract on BasePostProcessingTask, BasePostProcessingActionForm class, parameterized confirmation template + partial, SmallSizeFilterTask migration (exposes existing size_threshold knob via admin), test plan, rebase impact for #999 + #1272. Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for antenna-preview canceled.
|
✅ Deploy Preview for antenna-ssec canceled.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Pydantic config contract for post-processing tasks, a shared admin action factory with a two-step confirmation flow, typed SmallSizeFilter config/task, admin forms and templates, wiring in ChangesPost-Processing Admin Scaffolding
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Adds the shared infrastructure that PRs #999 (class masking) and #1272 (tracking) each grew independently. Uses SmallSizeFilterTask as the migration consumer to prove the pattern without carving up another open PR. Changes: - BasePostProcessingTask gains required pydantic config_schema class attr; validates Job.params['config'] at task __init__ via self.config_schema(**config). self.config is now a BaseModel instance (was: freeform dict). - SmallSizeFilterTask: adds SmallSizeFilterConfig schema (size_threshold + source_image_collection_id, validators for 0<x<1, extra=forbid). Existing default 0.0008 preserved. - New admin form base BasePostProcessingActionForm with to_config() contract, plus SmallSizeFilterActionForm exposing size_threshold knob (currently unreachable — admin trigger hardcoded empty config). - Parameterized confirmation template at admin/post_processing/confirmation.html with overridable {% block intro %} and a _form_fieldset.html partial. - run_small_size_filter admin action rewrites onto new pattern: intermediate confirmation page on first POST, validate + enqueue on confirm. Per-collection Job creation preserved (each Job gets correct project FK). - 17 tests covering schema contract, form validation, intermediate page render, multi-collection partitioning by project FK. Pydantic v1 syntax throughout (repo pins pydantic<2.0). Memory note: container is v1, .dict() / .__fields__ are correct here. Out of scope (explicitly deferred): - Project-partitioning helper (defer to whichever multi-scope adopter lands first, likely #1272's tracking which partitions events across projects) - REST API trigger surface (eventual primary; admin not future-primary) - Rank rollup, class masking, tracking tasks themselves — stay in their PRs Design doc: docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md Co-Authored-By: Claude <noreply@anthropic.com>
Confirmation page (intermediate page rendering with size_threshold form field) and success message (after submitting form, Job 1571 enqueued with typed config payload). Co-Authored-By: Claude <noreply@anthropic.com>
Screenshots for the PR description now live in the ami-media-staging object store rather than docs/claude/screenshots/. Keeps binary assets out of the repo history. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Establishes a reusable Django-admin pattern for triggering ML post-processing tasks with an intermediate configuration form, validates config via Pydantic schemas, and migrates the existing Small Size Filter to the new framework (including admin-adjustable size_threshold and new tests).
Changes:
- Add a
config_schema(Pydantic v1) contract toBasePostProcessingTaskand validate task config at construction time. - Introduce a base admin action form + shared confirmation templates for rendering/confirming post-processing parameters.
- Migrate the Small Size Filter admin action to the new confirmation + schema-validated config flow and add test coverage for the new behavior.
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md | Design doc describing the new admin/scaffolding pattern and migration plan. |
| ami/ml/post_processing/base.py | Enforces config_schema and validates config via Pydantic on task initialization. |
| ami/ml/post_processing/small_size_filter.py | Adds SmallSizeFilterConfig schema and switches task to typed config access. |
| ami/ml/post_processing/admin/forms.py | Adds BasePostProcessingActionForm with a to_config() contract. |
| ami/ml/post_processing/admin/small_size_filter_form.py | Adds SmallSizeFilterActionForm exposing size_threshold in admin. |
| ami/templates/admin/post_processing/confirmation.html | Adds shared intermediate confirmation template for admin actions. |
| ami/templates/admin/post_processing/_form_fieldset.html | Adds shared partial to render form fields + errors. |
| ami/main/admin.py | Rewrites run_small_size_filter to render intermediate form and enqueue schema-validated jobs. |
| ami/ml/post_processing/tests/test_base_schema.py | Tests config_schema enforcement and schema validation behavior. |
| ami/ml/post_processing/tests/test_admin_form.py | Tests base form to_config() and size threshold validation. |
| ami/ml/post_processing/tests/test_small_size_filter_admin.py | Tests admin confirmation flow + job creation/config payload behavior. |
| ami/ml/post_processing/admin/init.py | Creates admin module package. |
| ami/ml/post_processing/tests/init.py | Creates tests package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address review feedback on the repetitive run_/render pattern. The confirm -> render -> validate -> enqueue flow is now built by make_post_processing_action() in ami/ml/post_processing/admin/actions.py, so each task declares only what varies: its task class, knob form, and how a selected row maps to a Job (scope_resolver / project_resolver / name_resolver). Tasks whose row->Job mapping isn't one-Job-per-row (e.g. PR #1272's per-project event partitioning) pass a custom build_jobs callable; PR #999 likewise inherits the form/template instead of hand-rolling HTML. Config validation now has a single source of truth: the task's pydantic config_schema. The knob form no longer re-encodes the (0, 1) bound, and the FloatField min/max that contradicted the exclusive interval is gone. Schema errors raised while building Jobs are mapped back onto the form so the operator sees them inline on the confirmation page (non-field errors now render too). This also removes the near-dead per-row try/except in the old admin action. Adds test_action_factory.py (action registration, default builder, schema mapping, all-or-nothing, build_jobs override hook) and updates the form tests to reflect that range enforcement moved to the schema layer. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
ami/ml/post_processing/base.py (1)
30-35: ⚡ Quick winEnforce
config_schematype, not just presence.
__init_subclass__currently allows non-Pydantic values forconfig_schema; that fails later in__init__instead of at class definition time.Proposed tightening
def __init_subclass__(cls, **kwargs): super().__init_subclass__(**kwargs) required_attrs = ["key", "name", "config_schema"] for attr in required_attrs: if not hasattr(cls, attr) or getattr(cls, attr) is None: raise TypeError(f"{cls.__name__} must define '{attr}' class attribute") + if not isinstance(cls.config_schema, type) or not issubclass( + cls.config_schema, pydantic.BaseModel + ): + raise TypeError( + f"{cls.__name__}.config_schema must be a subclass of pydantic.BaseModel" + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/ml/post_processing/base.py` around lines 30 - 35, The current __init_subclass__ only checks presence of "config_schema" but not its type; update __init_subclass__ to also validate that cls.config_schema is a class/type and a subclass of pydantic.BaseModel (i.e. import pydantic.BaseModel and use isinstance/issubclass checks), and raise a TypeError if it's missing or not a Pydantic model; keep the existing checks for "key" and "name" and reference the same attribute names (config_schema, key, name) and the __init_subclass__ method when making the change.ami/templates/admin/post_processing/_form_fieldset.html (1)
1-8: ⚡ Quick winRender
form.non_field_errorsbefore field rows (future-proofing the shared partial).
ami/templates/admin/post_processing/_form_fieldset.htmlrenders only per-fieldfield.errors; it never outputsform.non_field_errors. For the currentSmallSizeFilterActionForm, validation happens inclean_size_threshold()(field-level), so operators already see errors viafield.errors—but this partial will hide non-field errors if a form later addsclean()/add_error(None, ...)-style validation.Suggested patch
+{% if form.non_field_errors %} + {% for error in form.non_field_errors %}<p class="errornote">{{ error }}</p>{% endfor %} +{% endif %} {% for field in form %} <div class="form-row"> {{ field.label_tag }} {{ field }} {% if field.help_text %}<p class="help">{{ field.help_text }}</p>{% endif %} {% for error in field.errors %}<p class="errornote">{{ error }}</p>{% endfor %} </div> {% endfor %}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/templates/admin/post_processing/_form_fieldset.html` around lines 1 - 8, The partial currently only renders per-field errors; modify ami/templates/admin/post_processing/_form_fieldset.html to output form.non_field_errors (e.g., {{ form.non_field_errors }}) before the for loop so any form-level validation added via clean() or add_error(None, ...) (as might happen in SmallSizeFilterActionForm) will be visible to operators; ensure non-field errors render before the field rows for future-proofing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/ml/post_processing/admin/small_size_filter_form.py`:
- Around line 18-20: Replace the Unicode multiplication sign (`×`) in the help
string "Minimum bounding-box area as a fraction of the source image area (width
× height). Detections smaller than this are flagged as 'Not identifiable'.
Default 0.0008 ≈ 0.08% of frame area." with a plain ASCII "x" so the text reads
"(width x height)"; update that literal help text in small_size_filter_form.py
(the same string shown in the diff) to satisfy the Ruff RUF001 lint rule.
In `@ami/ml/post_processing/tests/test_small_size_filter_admin.py`:
- Around line 18-21: Replace the hardcoded password string used in
User.objects.create_superuser by avoiding a literal: call
User.objects.create_superuser(email="ssfadmin@example.com", password=None) (or
any non-literal), then immediately call self.superuser.set_unusable_password()
and self.superuser.save() to ensure no secret literal remains; update references
to self.superuser accordingly.
In `@docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md`:
- Around line 50-70: The fenced code block that lists the project tree starting
with "ami/ml/post_processing/" is missing a language tag; update the opening
triple-backtick to include a language identifier (e.g., ```text) so the block
becomes ```text and the closing backticks remain, ensuring markdownlint MD040 is
satisfied for the tree block in the document.
---
Nitpick comments:
In `@ami/ml/post_processing/base.py`:
- Around line 30-35: The current __init_subclass__ only checks presence of
"config_schema" but not its type; update __init_subclass__ to also validate that
cls.config_schema is a class/type and a subclass of pydantic.BaseModel (i.e.
import pydantic.BaseModel and use isinstance/issubclass checks), and raise a
TypeError if it's missing or not a Pydantic model; keep the existing checks for
"key" and "name" and reference the same attribute names (config_schema, key,
name) and the __init_subclass__ method when making the change.
In `@ami/templates/admin/post_processing/_form_fieldset.html`:
- Around line 1-8: The partial currently only renders per-field errors; modify
ami/templates/admin/post_processing/_form_fieldset.html to output
form.non_field_errors (e.g., {{ form.non_field_errors }}) before the for loop so
any form-level validation added via clean() or add_error(None, ...) (as might
happen in SmallSizeFilterActionForm) will be visible to operators; ensure
non-field errors render before the field rows for future-proofing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 98c48d31-30bc-478f-af01-8862514274eb
📒 Files selected for processing (13)
ami/main/admin.pyami/ml/post_processing/admin/__init__.pyami/ml/post_processing/admin/forms.pyami/ml/post_processing/admin/small_size_filter_form.pyami/ml/post_processing/base.pyami/ml/post_processing/small_size_filter.pyami/ml/post_processing/tests/__init__.pyami/ml/post_processing/tests/test_admin_form.pyami/ml/post_processing/tests/test_base_schema.pyami/ml/post_processing/tests/test_small_size_filter_admin.pyami/templates/admin/post_processing/_form_fieldset.htmlami/templates/admin/post_processing/confirmation.htmldocs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md
Addresses review feedback. When an operator uses Django admin's "select all across pages", the action receives the entire filtered table as its queryset, which would serialize every pk into hidden _selected_action inputs on the confirmation page (a very large POST body, potentially over request limits). This trigger is for explicit, bounded selections, so the across-pages case is now refused with a clear message instead of rendering an unbounded form. Also folds in two cosmetic review nits: replace the Unicode multiplication sign in the size-threshold help text with a plain "x", and add a language tag to the module-layout code fence in the design doc. Co-Authored-By: Claude <noreply@anthropic.com>
|
@coderabbitai review Note: the previous automated review covered commit |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== ami/ml/post_processing/admin/actions.py ==="
cat ami/ml/post_processing/admin/actions.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/admin/forms.py ==="
cat ami/ml/post_processing/admin/forms.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/admin/small_size_filter_form.py ==="
cat ami/ml/post_processing/admin/small_size_filter_form.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/base.py ==="
cat ami/ml/post_processing/base.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/small_size_filter.py ==="
cat ami/ml/post_processing/small_size_filter.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/main/admin.py (post_processing section) ==="
grep -n "small_size_filter\|post_processing\|SmallSize\|select_across\|BasePostProcessing\|run_small_size" ami/main/admin.py | head -80
echo ""
echo "=== ami/main/admin.py full relevant section ==="
grep -n "" ami/main/admin.py | grep -A5 -B5 "small_size_filter\|SmallSizeFilter\|post_processing\|select_across" | head -120
echo ""
echo "=== ami/templates/admin/post_processing/confirmation.html ==="
cat ami/templates/admin/post_processing/confirmation.html 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/templates/admin/post_processing/_form_fieldset.html ==="
cat ami/templates/admin/post_processing/_form_fieldset.html 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/tests/ tree ==="
find ami/ml/post_processing/tests -type f | sort
echo ""
echo "=== ami/ml/post_processing/tests/test_admin_form.py ==="
cat ami/ml/post_processing/tests/test_admin_form.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/tests/test_base_schema.py ==="
cat ami/ml/post_processing/tests/test_base_schema.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/tests/test_small_size_filter_admin.py ==="
cat ami/ml/post_processing/tests/test_small_size_filter_admin.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/__init__.py ==="
cat ami/ml/post_processing/__init__.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/admin/__init__.py ==="
cat ami/ml/post_processing/admin/__init__.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/ml/post_processing/tests/__init__.py ==="
cat ami/ml/post_processing/tests/__init__.py 2>/dev/null || echo "NOT FOUND"Length of output: 38389 🏁 Script executed: #!/bin/bash
echo "=== ami/ml/post_processing/tests/test_action_factory.py ==="
cat ami/ml/post_processing/tests/test_action_factory.py 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== ami/main/admin.py lines around run_small_size_filter (full context) ==="
sed -n '640,680p' ami/main/admin.py
echo ""
echo "=== Check for select_across in confirmation template ==="
grep -rn "select_across" ami/templates/ ami/ml/ ami/main/admin.py
echo ""
echo "=== Check if select_across POST value is preserved across re-renders ==="
grep -n "select_across" ami/ml/post_processing/admin/actions.py
echo ""
echo "=== Check Job.enqueue() signature ==="
ast-grep --pattern 'def enqueue($$$)'
echo ""
echo "=== Check if config_schema attribute check handles abstract intermediary correctly ==="
grep -n "__init_subclass__\|abc.ABC\|ABCMeta\|abstractmethod" ami/ml/post_processing/base.py
echo ""
echo "=== Check how SmallSizeFilterConfig.dict() serializes (pydantic v1 vs v2) ==="
grep -rn "pydantic" ami/ml/post_processing/ --include="*.py" | grep -v "test_\|\.pyc"
echo ""
echo "=== Check for any pydantic version pinning ==="
grep -rn "pydantic" requirements*.txt setup.cfg pyproject.toml 2>/dev/null | head -20
echo ""
echo "=== Check if confirm value is '1' or truthy in the action ==="
grep -n "confirm" ami/ml/post_processing/admin/actions.py
echo ""
echo "=== Check the select_across guard return value ==="
grep -n -A3 "select_across" ami/ml/post_processing/admin/actions.py
echo ""
echo "=== Check queryset.count() call in render_confirmation - N+1 risk ==="
grep -n "count\|values_list" ami/ml/post_processing/admin/actions.py
echo ""
echo "=== Check SmallSizeFilterTask name attribute (has space vs underscore for key) ==="
grep -n "^ key\|^ name\|^ config_schema" ami/ml/post_processing/small_size_filter.py
echo ""
echo "=== Check if test_base_schema tests hit the DB (Algorithm.objects.get_or_create) ==="
grep -n "get_or_create\|Algorithm" ami/ml/post_processing/base.py
echo ""
echo "=== Check if there is any transaction.atomic around job creation ==="
grep -n "atomic\|transaction" ami/ml/post_processing/admin/actions.py
echo ""
echo "=== Check what happens when build_jobs raises ConfigValidationErrors but form has errors too ==="
grep -n "add_error\|non_field_errors\|form.errors" ami/ml/post_processing/admin/actions.py
echo ""
echo "=== Check if 'name = Small size filter' has lowercase 's' which differs from action label 'Small Size Filter' ==="
grep -n "name\s*=\s*['\"]" ami/ml/post_processing/small_size_filter.py ami/ml/post_processing/admin/actions.pyLength of output: 15296
Here is the review against head ( Review of PR
|
Follow-up to the head-commit CodeRabbit re-review: - Wrap the Job-creation loop in default_build_jobs in transaction.atomic(). Admin requests are already atomic (ATOMIC_REQUESTS=True), but this helper can also be called outside a request (e.g. a management command), so the explicit block keeps creation all-or-nothing there too. Job.enqueue() uses transaction.on_commit, so the (millisecond) block only guards row creation; the long-running task still runs asynchronously in the worker after commit. - Guard BasePostProcessingTask.__init_subclass__ with inspect.isabstract so an abstract intermediary task class isn't forced to declare key/name/config_schema. - Resolve the selection once in render_confirmation (len of the materialized pk list) instead of a separate .count() query. - Omit scope_resolver from the build_jobs kwargs when it isn't set, so a custom build_jobs supplied without one never receives a None it might call. - Derive the admin action's labels (title, submit, dropdown description) from task.name instead of hardcoding title-case strings, so the operator-facing label and the Job name stay consistent and there are fewer per-task strings. - Make the build_jobs override-hook test actually invoke the action and assert the custom builder ran (and that scope_resolver was not forwarded as None). 24/24 post_processing tests green. Co-Authored-By: Claude <noreply@anthropic.com>
|
Claude says: Thanks — the re-review against head caught real issues. Addressed in
One correction: the "pydantic v1, no version pin" finding is a false negative — it's pinned at 24/24 |
…tract guard End-of-PR test audit (separate from the TDD scaffolding): - Remove test_default_threshold_applies_when_form_uses_initial: size_threshold is a required field and can't be omitted, so it just re-POSTed the default value — identical mechanism to the existing valid-post test. The default is already covered by the form-initial and schema-default unit tests. - Rewrite the mislabeled "all-or-nothing" test into a real one. The old version gave every row the same out-of-range threshold, so both failed at validation and the creation loop was never reached. The new test passes validation, forces a failure while creating the second Job, and asserts the first is rolled back — actually exercising the transaction.atomic() wrap. - Add coverage for the inspect.isabstract guard in __init_subclass__: an abstract intermediary may defer key/name/config_schema to its concrete subclasses. 24 post_processing tests green. Co-Authored-By: Claude <noreply@anthropic.com>
Make post-processing filters easier to develop against and observe. Per-occurrence trigger (the dev/spot path): SmallSizeFilterConfig now accepts either source_image_collection_id or occurrence_id (discriminated scope, exactly-one enforced by a pydantic root_validator), and the task resolves its detection queryset from whichever scope is set. OccurrenceAdmin gains a run_small_size_filter action via the same make_post_processing_action factory, so an operator can run a filter on a single selected occurrence without standing up a whole capture set. This discriminated-scope shape is the pattern future per-occurrence / per-event tasks (#999, #1272) copy. Job stage metrics (run observability): BasePostProcessingTask.report_stage_metrics writes named counters onto the job's post_processing stage (falling back to a log line when run without a job). The small size filter emits detections_checked, detections_flagged and occurrences_updated at each flush and on completion, so the Jobs admin page shows what a run examined and changed. Tests: schema scope cases (occurrence-only valid, both-scopes raise, no-scope raises); the OccurrenceAdmin action enqueues a job scoped to occurrence_id; run-time occurrence-scope isolation (sibling occurrences untouched); and stage-metric reporting on a real job. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/ml/post_processing/small_size_filter.py`:
- Around line 149-153: The current batch flush loop uses the batch-local
occcurrences_to_update set and then clears it, which allows the same occurrence
to be counted multiple times across batches; introduce a persistent set (e.g.,
updated_occurrence_ids) at the scope of the processing function/class to record
IDs of occurrences already counted, then when iterating occcurrences_to_update
only call occ.save(...) and increment modified_occurrences / occurrences_updated
for occurrences whose id is not in updated_occurrence_ids, add their ids to
updated_occurrence_ids, and finally clear the batch set; apply the same change
to the other flush block that uses occcurrences_to_update so counts are
deduplicated across all batches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b26c680-56a7-47e7-b05e-ff240b54866c
📒 Files selected for processing (7)
ami/main/admin.pyami/ml/post_processing/base.pyami/ml/post_processing/small_size_filter.pyami/ml/post_processing/tests/test_action_factory.pyami/ml/post_processing/tests/test_base_schema.pyami/ml/post_processing/tests/test_small_size_filter_admin.pyami/ml/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
- ami/ml/post_processing/base.py
- ami/ml/post_processing/tests/test_action_factory.py
The metric incremented from a batch-local set that is cleared each flush, so an occurrence whose detections span more than one batch was counted once per batch. Track occurrence ids in a persistent set and report its size instead. Detections never recur across batches, so detections_flagged was already correct. Addresses CodeRabbit review on beacb3f. Co-Authored-By: Claude <noreply@anthropic.com>
…admin-scaffolding
Replace per-test setup_test_project calls with minimal Project/Collection/ Occurrence rows and class-level setUpTestData in the admin-action and factory tests; the admin flow only reads FKs, so the full fixture (storage source, deployment, processing service per call) was wasted cost. Convert the pure form tests to SimpleTestCase (no DB) and drop one redundant valid-path form test already covered end to end by the admin and factory flows. TestPostProcessingTasks now builds its images/events/collection once per class instead of once per test. Module wall-clock roughly halves locally (8.8s -> 4.5s warm), and the double setup_test_project in the multi-collection admin test that intermittently hit a statement timeout in CI is gone. Co-Authored-By: Claude <noreply@anthropic.com>
…admin-scaffolding
… select_across refusal
The post-processing confirmation form rendered each field's validation errors
with the page-level `errornote` banner style. Switch the form-fieldset partial
to Django's native `{{ field.errors }}` and `{{ form.non_field_errors }}` so
errors render inline as `errorlist`, matching Django admin conventions. The
invalid-threshold test now asserts the inline `errorlist` class.
Also reframe the "select all across pages" comment: the reason to refuse it is
that the action would apply to the entire filtered table rather than the rows
the operator explicitly selected. The oversized POST body is a secondary
symptom, not the primary concern.
Co-Authored-By: Claude <noreply@anthropic.com>
…admin-scaffolding
The small size filter reported `occurrences_updated` as the number of occurrences it re-saved, but re-saving recomputes the determination in place. An occurrence pinned to a human identification keeps its taxon when its detection is flagged "Not identifiable", so it was counted even though nothing about it actually changed. Capture each occurrence's determination before the save and count it only when the determination changes after. Add a regression test pairing an occurrence that carries a human identification (determination unchanged, not counted) with an un-identified one (determination flips, counted). Co-Authored-By: Claude <noreply@anthropic.com>
…b(s) After an operator runs a post-processing task from the admin, the success message now links each created Job to its admin change page instead of just printing the raw pks. From there the operator can follow the job's progress and read any failure reason, which is otherwise only visible in the worker log. The link points at the admin change page because that page is always reachable from this admin action; the public UI host is not reliably known in this context. The message is built with format_html so the links render as anchors. Co-Authored-By: Claude <noreply@anthropic.com>
|
Claude says: Two small fixes landed on top of the main-merge (
The framework was also exercised end-to-end on production-shaped data on a dev box: the admin → Job → Celery path drives Ready for a final look — blocked only on an approve. |
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
The occurrence admin changelist is the entry point for the per-occurrence post-processing trigger, but on a large table (~1.3M occurrences) it took ~15s to load a single page. Two causes, both fixed here: - get_queryset counted detections with a JOIN + GROUP BY. A grouped count must aggregate the whole occurrence x detection join before the changelist's ORDER BY ... LIMIT can take a page, so it scanned every row just to show 25. Replaced with a correlated subquery (Coalesced to 0 for occurrences with no detections, matching the old JOIN count), which runs only for the rows on the page. Measured 14.9s -> 0.04s for one page on the large table. - The list ordered by -created_at, which has no index, forcing a full sort of the table to find the newest page. Switched to -id, the indexed primary key, which increases with insertion time so newest-first is preserved. Also let an all-digit search term act as an exact occurrence-id lookup, so an operator can jump straight to an occurrence by id; non-numeric terms still use the determination-name search. Co-Authored-By: Claude <noreply@anthropic.com>
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
…om occurrences The detection admin changelist counted classifications with Count() + GROUP BY and ordered by an unindexed -created_at. On a large table (~1.4M detections, ~2.2M classifications) the grouped aggregate runs over the whole detection x classification join before the page LIMIT applies — slow enough to exhaust work_mem and error out. Replaced with a correlated subquery (Coalesced to 0) and -id ordering, mirroring the occurrence changelist fix. Also set show_change_link on the detection inline of the occurrence page, so an operator can open a detection's change page — where the classifications inline shows which algorithms were applied, including post-processing — without searching for its id. Co-Authored-By: Claude <noreply@anthropic.com>
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
Add a search box to the detection changelist: an all-digit term is an exact id lookup (jump straight to a detection by id), other terms search the source image path. Mirrors the occurrence changelist search. Co-Authored-By: Claude <noreply@anthropic.com>
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
…ation More admin niceties for reviewing post-processing results on a large database: - Classification change page: FK fields (taxon, detection, algorithm, category_map, applied_to) render as autocomplete widgets instead of <select>s preloaded with every row, which made the page unusable. - Classification changelist: count the scores/logits arrays in SQL (cardinality) and defer the arrays, so the list no longer transfers thousands of floats per row; order by -id; add an id / taxon-name search box. - Link the detection page's classification inline rows to their change page. - New OccurrenceAdmin action "Recompute determination": editing classifications by hand doesn't refresh the occurrence determination (only Occurrence and Identification saves do), so this re-derives it for the selected occurrences. Co-Authored-By: Claude <noreply@anthropic.com>
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
… count, guard huge ids Three review follow-ups on the admin niceties added for reviewing post-processing: - Extract the duplicated digit-term id search from the Occurrence, Detection, and Classification admins into a shared IdSearchAdminMixin. The three copies were identical; the mixin is now the single place that behavior lives. - Guard an out-of-range numeric term. A digit string larger than the bigint primary key raised a database DataError (500) before; it now returns no results. - Set show_full_result_count = False on the three large-table admins, so the changelist no longer runs a second, unfiltered COUNT(*) over the whole table for its footer total -- as expensive as the page query itself on millions of rows. Tests: add the out-of-range id case. The shared mixin is now tested once in OccurrenceAdminChangelistTest, so the redundant per-admin copy on the Detection admin is removed (its docstring points to the canonical test). Co-Authored-By: Claude <noreply@anthropic.com>
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
…dmin framework Rework of #999 onto the #1289 post-processing framework. The branch was cut from an old main with its own hand-rolled admin action; ClassMaskingTask and RankRollupTask now subclass BasePostProcessingTask with pydantic config schemas and are triggered through make_post_processing_action (collection scope on SourceImageCollection, single-occurrence scope on Occurrence for class masking). Correctness fixes from the review threads: - Class masking selects the top class from an -inf-masked softmax, so a class excluded by the taxa list can never win even when it had the highest logit; raises when the taxa list excludes every class in the category map. Stored logits stay raw (JSON-safe) and the mask is captured in scores (excluded -> 0). - The masked-output Algorithm is one per (source algorithm, taxa list) and its category map is persisted (previously set in memory only, so masked classifications referenced a null map). - applied_to is populated on new masked classifications (the provenance the API exposes was left blank). - Rank rollup preloads category-map labels in two queries and select_relates the per-row relations instead of dereferencing category_map per classification. Surfaces provenance in the API: applied_to is added to the Classification serializers, and applied_to__algorithm is prefetched in the occurrence list/detail prefetch and the classification viewset to avoid an N+1 on render. Tests: pydantic config validation, admin trigger for both scopes, the masking maths (including the excluded-class guarantee and the all-excluded error), ClassMaskingTask.run() end to end for both scopes, and rank rollup. 20 new tests; full post_processing suite and occurrence query-count tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Establishes the shared pattern for running post-processing filters from the Django admin. Post-processing filters adjust or clean up data after ML processing; the first one (a small-size detection filter) ships here as the worked example. Until now there was no consistent way to find, configure, or run them — you edited code or called them by hand.
With this change an operator can, from the Django admin:
detections_checked,detections_flagged,occurrences_updated), so a run's effect is visible without reading the logThis is a scaffold/precursor PR: it sets the approach every future post-processing method plugs into (class masking #999, tracking #1272, rank rollup, …), rather than shipping one large feature on its own. The size filter is included to prove the pattern end to end. The original scope was pure scaffolding plus a refactor of the existing small-size-filter trigger to expose its already-coded
size_thresholdknob; during review the per-occurrence trigger and stage metrics were added on top as small, migration-free extensions of the same pattern.List of Changes
BasePostProcessingTask; tasks registered inami/ml/post_processing/registry.pyconfig_schemacontract, validated at task__init__(extra="forbid")BasePostProcessingActionFormwithto_config()→ dict contract + parameterized confirmation template (admin/post_processing/confirmation.html+_form_fieldset.htmlpartial)to_config()result feedsJob.params['config']SmallSizeFilterTask:SmallSizeFilterConfigschema +SmallSizeFilterActionForm(0 < x < 1validation); same0.0008default preservedsource_image_collection_idXORoccurrence_id, enforced by a root validator) +OccurrenceAdminaction using the same factory withscope_resolver=lambda occ: {"occurrence_id": occ.pk};SmallSizeFilterTask._scoped_detections()resolves the detection queryset from whichever scope is setBasePostProcessingTask.report_stage_metrics()writes named stage params viaJobProgress.add_or_update_stage_param; falls back to a log line when running joblessami/ml/post_processing/tests/andami/ml/tests.py::TestPostProcessingTasks, plus the admin changelist tests below, kept cheap for CI (see Test plan)Admin niceties for reviewing post-processing
Purpose: make the Django admin a practical place to review and test post-processing on a production-sized database (millions of detections and classifications). Together these support the full review loop an operator runs when checking a post-processing method: find an occurrence, run a post-processing filter on it, click through to the detections and classifications it produced to inspect exactly what changed (including the
applied_toprovenance chain), optionally edit a classification by hand to probe how it moves the result, then recompute the occurrence's determination to see the effect — all without leaving the admin or looking up ids by hand.Before these changes that loop was impractical: the occurrence/detection/classification changelists timed out (or ran the database out of memory) on real data, the classification change page stalled rendering preloaded dropdowns, there were no links between an occurrence and its detections/classifications, and nothing recomputed an occurrence's determination after a manual classification edit. Each change below removes one of those blockers. They share no code with the framework above and are independent of it.
occurrences_updatedcounts only occurrences whose determination actually changedsmall_size_filter.py)format_htmllink to each job's admin change page, so progress and any failure reason are one click awayCount(...)+GROUP BY, and order by the indexed-idinstead of an unindexed-created_atget_search_resultstreats an all-digit term as an exact id lookup<select>s preloaded with every rowscores/logitscounted in SQL (cardinality) with the arrays deferredshow_change_linkon the detection and classification inlinesOccurrenceAdminaction — editing classifications doesn't trigger the recompute that Occurrence/Identification saves doAdding a new post-processing filter
Three pieces, then one wiring line per admin entry point. Using the upcoming rank rollup and class masking (#999) filters as examples:
BasePostProcessingTaskinami/ml/post_processing/<your_filter>.py; declarekey,name, and a pydanticconfig_schema; implementrun(). Callself.update_progress(...)andself.report_stage_metrics({...})as it works. Add the class to thePOSTPROCESSING_TASKSdict inami/ml/post_processing/registry.py.RankRollupConfig(source_image_collection_id, target_rank, min_score)orClassMaskingConfig(source_image_collection_id | occurrence_id, taxa_list_id). Scope ids follow the discriminated pattern shipped here: declare each as optional and enforce exactly-one with a root validator (copySmallSizeFilterConfig).BasePostProcessingActionForminami/ml/post_processing/admin/<your_filter>_form.py, declaring only the tunable fields — scope ids come from the selected admin rows, not the form. Class masking's taxa-list picker becomes aModelChoiceFieldhere. Value bounds belong in the schema, not the form.Then wire it into each admin it should be triggerable from:
scope_resolverstates what each selected row contributes to the config; the form supplies the tunables. An occurrence-scoped filter (class masking) wires the same factory ontoOccurrenceAdminwithscope_resolver=lambda occ: {"occurrence_id": occ.pk}. A task whose row→job mapping isn't 1:1 (tracking partitions events across projects, #1272) passes a custombuild_jobscallable instead ofscope_resolver. The confirmation page, validation, job creation, and metrics display all come from the shared scaffolding — no per-filter admin code beyond the lines above.Out of scope (deferred to follow-ups)
classification.job), and a flexible JSON metadata field on Classification for pipeline-specific extras and per-classification transformation notes (needs its own ticket)Rebase impact for downstream PRs
admin_forms.py→admin/tracking_form.py. Form gainsto_config(). AddsTrackingConfigschema. Per-project partition loop becomes a custombuild_jobs.<select>becomesClassMaskingActionForm(BasePostProcessingActionForm)withModelChoiceFields. AddsClassMaskingConfig+RankRollupConfigschemas. The occurrence-scoped discriminated config shipped here is the template to copy. Will message author once this lands.Smoke test (manual, against live stack)
End-to-end validated against project 18 capture set 175 (10 images):
admin/post_processing/confirmation.htmlrendered) ✓size_thresholdfield with default0.0008and help text ✓Job.paramspayload validated typed:{'task': 'small_size_filter', 'config': {'size_threshold': 0.0008, 'source_image_collection_id': 175}}✓The occurrence-scope trigger and stage metrics are covered by the automated tests below, and the branch is deployed on a shared dev box for hands-on review.
Test plan
test_base_schema.py— subclass without schema raises, abstract intermediaries exempt, valid config buildsBaseModelinstance, defaults apply, scope rules (occurrence accepted / both raise / neither raises), out-of-range threshold raises, unknown keys rejected (extra="forbid")test_admin_form.py—to_config()round-trip, type coercion at the form layer, range enforcement delegated to the schematest_action_factory.py— action naming and labels, one Job per row with task key/scope/project FK, schema errors mapped to form fields, mid-creation failure rolls back all jobs,build_jobsoverride hooktest_small_size_filter_admin.py— intermediate page renders (no Job created),select_acrossrefused, invalid threshold re-renders with error, valid POST creates Job with config, occurrence-scoped POST setsoccurrence_id(and not the collection id), multi-collection POST creates per-collection jobs with correct project FKsami/ml/tests.py::TestPostProcessingTasks— filter flags small detections and updates determinations, occurrence scope leaves sibling occurrences untouched, stage metrics land on the job's progress paramsA fixture-cost pass keeps CI time in check: the admin/factory tests build bare Project/Collection rows once per class (
setUpTestData) instead of calling the fullsetup_test_projectfixture per test, and the pure form tests run without the DB (SimpleTestCase). The module's 29 tests run in ~5 s locally; this also removed a double-fixture setup that intermittently hit a statement timeout in CI.Design doc
docs/claude/planning/2026-05-01-post-processing-admin-scaffolding-design.md(committed in this branch).Notes
pydantic<2.0).__fields__andvalidatoras deprecated in some envs — these come from local v2 stubs; container is v1 and the syntax is correct.Summary by CodeRabbit
New Features
Tests
Documentation