Add COCO JSON export type #1270

Summary
This adds a third occurrence export format, COCO JSON, alongside the existing simple CSV and API JSON exports. A project member exporting their occurrences can now download a single JSON file shaped like a COCO object-detection dataset: an `images` list (the source captures), an `annotations` list (one bounding box per occurrence, taken from that occurrence's best detection), and a `categories` list (the taxa those occurrences were identified as). The goal is to let Antenna occurrence data feed ML tooling that already speaks COCO without a manual reshaping step.
The COCO rows reuse the same occurrence fields as the simple CSV export and add the source-image metadata a detection dataset needs: image id, capture path, width/height, and capture timestamp. Each annotation carries both the determination (as the COCO `category_id`) and the supporting model-prediction metadata (determination score, verification status, best machine prediction, etc.) as extra fields.
This is a draft. The export file is generated and its structure is checked by a test, but several format and scope decisions are still open and at least one merge blocker remains; see "Open questions before merge" below.
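The box conversion at the core of each annotation (from `[x1, y1, x2, y2]` corners to COCO `[x, y, width, height]` plus area) is small enough to sketch in full. This is a hypothetical stand-in for `corner_bbox_to_coco_bbox()`, whose actual signature is not shown here. Returning `None` for missing or degenerate boxes is an assumption modelled on the "usable bounding box" filter the export applies, and `area` is taken as width * height, the usual choice for box-only COCO annotations.

```python
from typing import Optional, Sequence


def corner_to_coco_bbox(
    corners: Optional[Sequence[float]],
) -> Optional[tuple[list[float], float]]:
    """Convert [x1, y1, x2, y2] corners to COCO [x, y, width, height] plus area.

    Hypothetical helper, not the PR's corner_bbox_to_coco_bbox(). Returns None
    when the box is missing, malformed, or has non-positive width/height, so the
    caller can skip the occurrence instead of emitting an unusable annotation.
    """
    if corners is None or len(corners) != 4:
        return None
    x1, y1, x2, y2 = (float(v) for v in corners)
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        return None
    # For box-only annotations, area is conventionally width * height.
    return [x1, y1, width, height], width * height


print(corner_to_coco_bbox([10, 20, 50, 80]))  # ([10.0, 20.0, 40.0, 60.0], 2400.0)
```

Returning `None` keeps the caller's loop simple; if bad boxes should be surfaced instead of skipped, raising `ValueError` is the obvious alternative.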
List of Changes
1. `CocoJSONExporter` registered as `occurrences_coco_json` in `ami/exports/registry.py`; added to `SERVER_EXPORT_TYPES` and the label map in `ui/src/data-services/models/export.ts`
2. COCO payload (`info`/`images`/`annotations`/`categories`): `build_coco_dict_from_occurrence_rows()` and `corner_bbox_to_coco_bbox()` in `ami/exports/format_types.py`; bounding boxes are converted from `[x1, y1, x2, y2]` corners to COCO `[x, y, width, height]` plus area
3. `OccurrenceCocoTabularSerializer` extends the CSV serializer with source-image fields; `CocoJSONExporter.get_queryset()`
4. `parents_json` (`_serialize_parents_json`)
5. `ui/src/utils/language.ts`
6. `Occurrence.objects.with_best_detection()` (`ami/main/models.py`)
7. `CaptureTaxonSerializer.parents` now reads the cached `parents_json`: `source="parents_json"` on `main`; affects the live occurrence/capture/detection determination responses, not just the export
Related Issues
Part of the occurrence data-export work. No tracking issue is linked yet; add `Closes #N` if one exists.
Detailed Description
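The occurrence-rooted assembly this section describes can be modelled on plain dicts. The sketch below is hypothetical and is not the PR's `build_coco_dict_from_occurrence_rows()`: the row keys (`occurrence_id`, `source_image_id`, `capture_path`, `capture_url`, `bbox`, `taxon_id`, and so on) are illustrative names, and `iscrowd: 0` is included only because standard COCO annotations usually carry it. It shows the three rules described here: one annotation per usable occurrence keyed by occurrence id, images deduplicated by source-image id, and categories keyed by the determined taxon.

```python
import os
from typing import Any, Iterable


def build_coco_payload(rows: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Assemble a COCO-style dict from flat, occurrence-rooted rows.

    Hypothetical sketch; row keys are illustrative, not the PR's field names.
    """
    images: dict[Any, dict[str, Any]] = {}
    categories: dict[Any, dict[str, Any]] = {}
    annotations: list[dict[str, Any]] = []

    for row in rows:
        taxon_id = row.get("taxon_id")
        corners = row.get("bbox")
        # Skip occurrences without a determination or a usable box.
        if taxon_id is None or corners is None or len(corners) != 4:
            continue
        x1, y1, x2, y2 = (float(v) for v in corners)
        width, height = x2 - x1, y2 - y1
        if width <= 0 or height <= 0:
            continue

        image_id = row["source_image_id"]
        if image_id not in images:  # one entry per source capture
            images[image_id] = {
                "id": image_id,
                "file_name": os.path.basename(row.get("capture_path") or ""),
                "coco_url": row.get("capture_url"),
                "width": row.get("image_width"),
                "height": row.get("image_height"),
                "date_captured": row.get("timestamp"),
            }
        if taxon_id not in categories:  # one entry per determined taxon
            categories[taxon_id] = {
                "id": taxon_id,
                "name": row.get("taxon_name"),
                "rank": row.get("taxon_rank"),
                "parent_id": row.get("parent_id"),
            }
        annotations.append({
            "id": row["occurrence_id"],  # annotation id is the occurrence id
            "image_id": image_id,
            "category_id": taxon_id,  # the determination, not a fixed label set
            "bbox": [x1, y1, width, height],
            "area": width * height,
            "iscrowd": 0,
            "determination_score": row.get("determination_score"),
        })

    return {
        "info": {"description": "occurrence export (sketch)"},
        "images": list(images.values()),
        "annotations": annotations,
        "categories": list(categories.values()),
    }
```

Keying `images` and `categories` by id keeps the first row's metadata for each. If rows for the same taxon could disagree (for example on the name), that is where a consistency check belongs, raising an exception rather than relying on `assert`, as the open questions below suggest.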
The export is occurrence-rooted. The queryset is `Occurrence.objects.valid().filter(project=...)` with the same annotation chain the CSV export uses, plus the new source-image columns. For every valid occurrence with a determination and a usable bounding box, the export emits:
- one `annotation` (its bbox, on its source capture); `annotation.id` is the occurrence id and `category_id` is the determination's taxon id;
- an `image` entry for its source capture (deduplicated by source-image id), with `coco_url` set to the public capture URL;
- a `category` entry, enriched with `rank`, `parent_id`, and the parent lineage from `parents_json`.
The `category_id` is therefore the determination (human or automated), not a fixed label set; the model-prediction fields on each annotation are provided alongside it for reference.
How to Test the Changes
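The invariants that the structure test checks can also be run against any exported file. The helper below is a hypothetical sketch, not code from the PR: it reports missing top-level keys, annotations pointing at unknown images or categories, and boxes without positive width and height, plus duplicate image ids (an extra check, since images are meant to be deduplicated by source-image id).

```python
from typing import Any

REQUIRED_KEYS = ("info", "images", "annotations", "categories")


def check_coco_payload(payload: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a COCO-style payload (empty if none).

    Hypothetical helper mirroring the invariants described for the structure test.
    """
    problems = [f"missing top-level key: {k}" for k in REQUIRED_KEYS if k not in payload]
    if problems:
        return problems  # the remaining checks need all four keys

    image_ids = [img["id"] for img in payload["images"]]
    if len(image_ids) != len(set(image_ids)):
        problems.append("duplicate image ids")
    known_images = set(image_ids)
    known_categories = {cat["id"] for cat in payload["categories"]}

    for ann in payload["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in known_images:
            problems.append(f"annotation {ann_id}: unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in known_categories:
            problems.append(f"annotation {ann_id}: unknown category_id {ann.get('category_id')}")
        bbox = ann.get("bbox") or []
        # COCO boxes are [x, y, width, height]; width and height must be positive.
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: bbox {bbox} lacks positive width/height")
    return problems
```

To run it on a real export, pass in the result of `json.load()` on the file. This does not replace the loader round-trip: confirming that `pycocotools.coco.COCO(path)` accepts the file would still be a separate check.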
- `ami/exports/tests.py::DataExportTest::test_coco_json_export_structure` checks that the payload has the four top-level keys, that every annotation references an existing image and category, that bounding boxes have positive width and height, and that the annotation count equals the number of collection occurrences that have a determination.
- Manually, create a `DataExport` with `format="occurrences_coco_json"` for a project, run it, and load the resulting file with `pycocotools.coco.COCO(path)` to confirm that a standard COCO loader accepts it. This loader round-trip is not yet automated (see below).
Open questions before merge
These are draft-stage decisions and gaps, framed as things to confirm rather than settled choices:
1. `DataExport.format` has `choices=get_export_choices()`, which is derived from the registry. Registering `occurrences_coco_json` changes the field's choices, so `makemigrations --check --dry-run` (enforced in CI) will fail without a new state-only migration. This is the same gotcha as `Job.job_type_key`. A migration needs to be added before this can merge.
2. The loader round-trip with `pycocotools` (or the intended downstream tool) is not yet automated. `licenses` is absent and `categories` omit the conventional `supercategory`; confirm the consumer tolerates that.
3. `assert` in production code. `build_coco_dict_from_occurrence_rows()` uses `assert` for the determination-name consistency check. Assertions are stripped under `python -O`, so this should be a raised exception or a logged warning, per the project convention.
4. `file_name` is `os.path.basename(capture_path)`. Dropping the directory can cause name collisions across deployments/events; `coco_url` keeps the full URL, so confirm consumers key on the URL rather than `file_name`.
5. `coco_url` comes from the public capture URL, which is `None` for private-bucket projects; those images would export with an empty `file_name` and a null URL. Decide whether to skip or flag them.
6. `parents` fix (change 7). It is a genuine live-API bugfix but is unrelated to COCO and ships without a test. Consider splitting it out, or adding a regression test and calling it out, so it does not ride silently in an export PR.
Deployment Notes
No data migration. A state-only migration for the `DataExport.format` choices change is still required (see open question 1). Otherwise, no new model fields are added; the rest of the change is queryset annotations and serializers.
Checklist