
Add COCO json export type #1270

Draft: vanessavmac wants to merge 3 commits into main from feat/coco-export


vanessavmac (Collaborator) commented Apr 28, 2026

Summary

This adds a third occurrence export format, COCO JSON, alongside the existing simple CSV and API JSON exports. A project member exporting their occurrences can now download a single JSON file shaped like a COCO object-detection dataset: an images list (the source captures), an annotations list (one bounding box per occurrence, taken from that occurrence's best detection), and a categories list (the taxa those occurrences were identified as). The goal is to let Antenna occurrence data feed ML tooling that already speaks COCO without a manual reshaping step.

The COCO rows reuse the same occurrence fields as the simple CSV export and add the source-image metadata a detection dataset needs: image id, capture path, width/height, and capture timestamp. Each annotation carries the determination as its COCO category_id, plus the supporting model-prediction metadata (determination score, verification status, best machine prediction, etc.) as extra fields.

This is a draft. The export file builds and its structure is checked by a test, but several format and scope decisions are still open and at least one merge blocker remains (see "Open questions before merge" below).

List of Changes

| # | What changed (user-facing) | How (implementation) |
|---|---|---|
| 1 | New "Occurrences (COCO JSON)" option in the export type dropdown | CocoJSONExporter registered as occurrences_coco_json in ami/exports/registry.py; added to SERVER_EXPORT_TYPES and the label map in ui/src/data-services/models/export.ts |
| 2 | The exported file is a COCO-style dataset (info / images / annotations / categories) | build_coco_dict_from_occurrence_rows() and corner_bbox_to_coco_bbox() in ami/exports/format_types.py; bounding boxes are converted from [x1, y1, x2, y2] corners to COCO [x, y, width, height] plus area |
| 3 | Each occurrence's best detection becomes one annotation, attached to its source capture | OccurrenceCocoTabularSerializer extends the CSV serializer with source-image fields; CocoJSONExporter.get_queryset() |
| 4 | Categories carry taxon rank and parent lineage | The determination and predicted taxa are fetched once by id and serialized from parents_json (_serialize_parents_json) |
| 5 | Export help text now describes three formats | ui/src/utils/language.ts |
| 6 | The best-detection queryset also exposes source-image id, capture timestamp, and capture width/height | Four new subqueries in Occurrence.objects.with_best_detection() (ami/main/models.py) |
| 7 | Drive-by fix: CaptureTaxonSerializer.parents now reads the cached parents_json | Aligns it with the sibling taxon serializers, which already use source="parents_json" on main; affects the live occurrence/capture/detection determination responses, not just the export |
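
The corner-to-COCO conversion in row 2 can be sketched as a standalone function. The name to_coco_bbox and the choice to return None for a missing or zero/negative-size box are assumptions for illustration; the real corner_bbox_to_coco_bbox() may differ in signature and edge-case handling.

```python
from typing import Optional, Sequence


def to_coco_bbox(corners: Optional[Sequence[float]]) -> Optional[tuple[list[float], float]]:
    """Convert [x1, y1, x2, y2] corners to a COCO [x, y, width, height] box plus its area.

    Hypothetical helper (not the PR's corner_bbox_to_coco_bbox). Returns None for a
    missing, malformed, or zero/negative-size box so the caller can skip that
    occurrence instead of emitting an unusable annotation.
    """
    if corners is None or len(corners) != 4:
        return None
    x1, y1, x2, y2 = (float(v) for v in corners)
    width = x2 - x1
    height = y2 - y1
    if width <= 0 or height <= 0:
        return None
    # COCO keeps the top-left corner and stores the size instead of the far corner.
    return [x1, y1, width, height], width * height
```

For example, to_coco_bbox([10, 20, 110, 70]) returns ([10.0, 20.0, 100.0, 50.0], 5000.0). Returning None rather than raising keeps "skip occurrences without a usable bounding box" a simple check in the caller.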

Related Issues

Part of the occurrence data-export work. No tracking issue is linked yet; add "Closes #N" if one exists.

Detailed Description

The export is occurrence-rooted. The queryset is Occurrence.objects.valid().filter(project=...) with the same annotation chain the CSV export uses, plus the new source-image columns. For every valid occurrence with a determination and a usable bounding box:

  • the occurrence's best detection contributes one annotation (its bbox, on its source capture);
  • the annotation.id is the occurrence id and the category_id is the determination's taxon id;
  • the source capture becomes one image entry (deduplicated by source-image id), with coco_url set to the public capture URL;
  • the determination taxon (and the best machine-prediction taxon, when present) becomes a category, enriched with rank, parent_id, and the parent lineage from parents_json.

category_id is therefore the determination (human or automated), not a fixed label set; the model-prediction fields on each annotation are provided alongside it for reference.
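
The mapping above, from occurrence rows to images, annotations, and categories, can be sketched in plain Python. This is a minimal sketch, not the PR's build_coco_dict_from_occurrence_rows(): the function name build_coco_dict and the row keys (occurrence_id, determination_id, determination_name, determination_rank, determination_parents, source_image_id, source_image_url, capture_path, width, height, timestamp, bbox, determination_score) are hypothetical, determination_parents is assumed to be a lineage list ordered root first, and categories for the best machine-prediction taxon are omitted.

```python
import os
from typing import Any, Iterable


def build_coco_dict(rows: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical sketch: one annotation per occurrence, images deduplicated by source-image id."""
    images: dict[Any, dict[str, Any]] = {}
    categories: dict[Any, dict[str, Any]] = {}
    annotations: list[dict[str, Any]] = []

    for row in rows:
        taxon_id = row.get("determination_id")
        corners = row.get("bbox")
        # Skip occurrences without a determination or without a usable box.
        if taxon_id is None or not corners or len(corners) != 4:
            continue
        x1, y1, x2, y2 = (float(v) for v in corners)
        width, height = x2 - x1, y2 - y1
        if width <= 0 or height <= 0:
            continue

        image_id = row["source_image_id"]
        if image_id not in images:  # one image entry per source capture
            images[image_id] = {
                "id": image_id,
                "file_name": os.path.basename(row.get("capture_path") or ""),
                "coco_url": row.get("source_image_url"),
                "width": row.get("width"),
                "height": row.get("height"),
                "date_captured": row.get("timestamp"),
            }

        if taxon_id not in categories:  # one category per determination taxon
            lineage = row.get("determination_parents") or []  # assumed root-first
            categories[taxon_id] = {
                "id": taxon_id,
                "name": row["determination_name"],
                "rank": row.get("determination_rank"),
                "parent_id": lineage[-1]["id"] if lineage else None,
                "parents": lineage,
            }

        annotations.append({
            "id": row["occurrence_id"],  # annotation id == occurrence id
            "image_id": image_id,
            "category_id": taxon_id,  # the determination, not a fixed label set
            "bbox": [x1, y1, width, height],
            "area": width * height,
            "determination_score": row.get("determination_score"),  # extra, non-COCO field
        })

    return {
        "info": {"description": "Antenna occurrences (sketch)"},
        "images": list(images.values()),
        "annotations": annotations,
        "categories": list(categories.values()),
    }
```

Keying images and categories by id in dicts gives the deduplication for free while preserving first-seen order in the output lists.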

How to Test the Changes

  • Automated: ami/exports/tests.py::DataExportTest::test_coco_json_export_structure checks that the payload has the four top-level keys, that every annotation references an existing image and category, that bounding boxes have positive width and height, and that the annotation count equals the number of collection occurrences that have a determination.
  • Manual: create a DataExport with format="occurrences_coco_json" for a project, run it, and load the resulting file with pycocotools.coco.COCO(path) to confirm that a standard COCO loader accepts it. This loader round-trip is not yet automated (see below).
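
The structural checks listed above can also be run against a downloaded export file before handing it to pycocotools. A minimal sketch; coco_structure_problems is a hypothetical helper, not the code in DataExportTest, and it does not cover the annotation-count check, which needs database access.

```python
from typing import Any

REQUIRED_KEYS = {"info", "images", "annotations", "categories"}


def coco_structure_problems(payload: dict[str, Any]) -> list[str]:
    """Return human-readable structural problems; an empty list means the checks passed."""
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        return [f"missing top-level keys: {sorted(missing)}"]

    image_ids = {image["id"] for image in payload["images"]}
    category_ids = {category["id"] for category in payload["categories"]}
    problems: list[str] = []
    for ann in payload["annotations"]:
        # Every annotation must point at an image and a category that exist.
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        # COCO boxes are [x, y, width, height]; width and height must be positive.
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            problems.append(f"annotation {ann['id']}: non-positive bbox size {ann['bbox']}")
    return problems
```

Usage: load the file with json.load() and expect coco_structure_problems(payload) == []. Passing these checks says nothing about COCO-spec validity; the pycocotools round-trip is still the real acceptance test.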

Open questions before merge

These are draft-stage decisions and gaps, framed as things to confirm rather than settled choices:

  1. Missing migration (blocker). DataExport.format has choices=get_export_choices(), which is derived from the registry. Registering occurrences_coco_json changes the field's choices, so makemigrations --check --dry-run (enforced in CI) will fail without a new state-only migration. This is the same gotcha as Job.job_type_key. A migration needs to be added before this can merge.
  2. Granularity: occurrence best-detection vs. all detections. Each occurrence contributes exactly one annotation (its single best detection), even though an occurrence can span many detections across many captures. An insect seen in 100 frames yields one annotation on one image. That is a reasonable "representative crop per occurrence" dataset, but it is not the usual "every box on every image" detection-training set. Worth an explicit decision on which the consumer needs.
  3. Validate against a real COCO consumer. The test checks dict structure, not COCO-spec validity. Before merge, the output should load through pycocotools (or the intended downstream tool). licenses is absent and categories omit the conventional supercategory; confirm the consumer tolerates that.
  4. assert in production code. build_coco_dict_from_occurrence_rows() uses assert for the determination-name consistency check. Assertions are stripped under python -O; this should be a raised exception or a logged warning per the project convention.
  5. file_name is os.path.basename(capture_path). Dropping the directory means captures from different deployments or events that share a base name get the same file_name; coco_url keeps the full URL, so confirm consumers key on the URL rather than file_name.
  6. Private buckets. coco_url comes from the public capture URL, which is None for private-bucket projects; those images would export with an empty file_name and null URL. Decide whether to skip or flag them.
  7. Scope of the parents fix (change 7). It is a genuine live-API bugfix but is unrelated to COCO and ships without a test. Consider splitting it out or adding a regression test and calling it out, so it does not ride silently in an export PR.
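
Open questions 4 and 6 could both be handled inside the row-to-COCO loop. A minimal sketch, assuming hypothetical names (CocoExportError, register_category, image_entry_or_skip), hypothetical row keys (source_image_id, source_image_url, width, height), and that the consistency check compares the name exported for a given taxon id across rows. It shows the "skip and log" option for private-bucket images, not the "flag" option.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class CocoExportError(ValueError):
    """Hypothetical exception for rows that cannot be exported consistently."""


def register_category(categories: dict[Any, dict[str, Any]], taxon_id: Any, name: str) -> None:
    """Add a category, or check it against the one already registered.

    Raising instead of using `assert` keeps the check active under `python -O`.
    """
    existing = categories.get(taxon_id)
    if existing is None:
        categories[taxon_id] = {"id": taxon_id, "name": name}
    elif existing["name"] != name:
        raise CocoExportError(
            f"taxon {taxon_id} exported under two names: {existing['name']!r} and {name!r}"
        )


def image_entry_or_skip(row: dict[str, Any], skipped: list[Any]) -> Optional[dict[str, Any]]:
    """Return a COCO image entry, or None (recording the id) when there is no public URL."""
    url = row.get("source_image_url")
    if not url:
        skipped.append(row["source_image_id"])
        logger.warning("Skipping source image %s: no public URL", row["source_image_id"])
        return None
    return {
        "id": row["source_image_id"],
        "coco_url": url,
        "width": row.get("width"),
        "height": row.get("height"),
    }
```

If an image is skipped, the annotations on it must be dropped as well; otherwise they would reference an image_id that is not in the images list.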

Deployment Notes

No data migration. A state-only migration for the DataExport.format choices change is still required (see open question 1). No new model fields are added otherwise; the rest of the change is queryset annotations and serializers.

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

netlify Bot commented Apr 28, 2026


Deploy Preview for antenna-ssec ready!

🔨 Latest commit: 03e5367
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-ssec/deploys/6a3dd28f31095000088268af
😎 Deploy Preview: https://deploy-preview-1270--antenna-ssec.netlify.app

netlify Bot commented Apr 28, 2026


Deploy Preview for antenna-preview ready!

🔨 Latest commit: 03e5367
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/6a3dd28f729d120008f02629
😎 Deploy Preview: https://deploy-preview-1270--antenna-preview.netlify.app
Lighthouse: 1 path audited
Performance: 57 (🔴 down 8 from production)
Accessibility: 81 (🔴 down 8 from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)

coderabbitai Bot commented Apr 28, 2026


Important: review skipped (draft detected).

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1aa52257-d5f8-451f-bb29-4191ce60b03f

