Add common media file definitions for BEP044/BEP047#2367
yarikoptic wants to merge 10 commits into `bids-standard:master`
Conversation
…decar rules) Introduce shared media file infrastructure for BEP044 (stimuli) and BEP047 (behavioral A/V). Both BEPs need overlapping audio/video/image support, so this extracts the common foundation:

- Suffixes: `audio`, `video`, `audiovideo`, `image`
- Extensions: `.wav`, `.mp3`, `.aac`, `.ogg`, `.mp4`, `.avi`, `.mkv`, `.webm`, `.svg`, `.webp`, `.tiff`
- Metadata: `Duration`, `FrameRate`, `Width`, `Height`, `AudioChannelCount`, `AudioSampleRate`, `VideoCodec`, `AudioCodec`, `VideoCodecRFC6381`, `AudioCodecRFC6381`
- Sidecar rules (`media.yaml`): suffix-based rules that auto-apply to any datatype
- Appendix (`media-files.md`): formats, codec identification, privacy, examples

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add spaces between pipes and dashes in all separator rows (e.g., `| --- |` instead of `|---|`) to satisfy the remark-lint table-cell-padding rule. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
effigies
left a comment
I approve in principle. Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.
I tried to search up what you might have in mind here but failed; could you please elaborate?

The main ones that come to mind:
Replace the newly added `Duration` metadata field with the existing
`RecordingDuration` field, which already has the same semantics
("length of the recording in seconds") and unit. This avoids
introducing a near-duplicate field for media files.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a note in the appendix explaining why AudioSampleRate is used instead of the existing SamplingFrequency: audio-video containers need to distinguish the audio sampling rate from the video frame rate, so the Audio prefix is necessary for multi-stream files. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
The existing photo suffix rules use .tif, so document both .tif and .tiff as valid TIFF extensions for image contexts. This ensures consistency when BEPs define file rules for the image suffix. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a section explaining that the media file definitions generalize all media in BIDS. The existing photo suffix covers a narrower use case (still images in electrophysiology/microscopy) and predates this framework. A "photo" could equally be a video with narration, an audio description, or a drawing. The media suffixes should be adopted for new datatypes, and a future proposal may deprecate photo in favor of the broader image suffix with migration tooling. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
ok, pushed some commits which I think bring it very close to a reviewable state. Review the commits, but the 'major' one is adding the relationship to the 'photo' suffix we already have. I think it would be worth a separate PR to introduce that migration if we do proceed with "media files", and it would establish media files potentially even before 044/047, WDYT? IMHO it would make much sense, since it could really be not just a photo, but a sketch, a video, a dictaphone recording -- any media really, IMHO, to associate with data acquisition to describe locations etc.
Well, I guess it's a question of whether we care what the subject of the image is. To compare to another collection of cases in BIDS, single-volume EPI images may have suffix … It leads me to wonder: would it make more sense to treat this as a discussion of permissible formats and codecs and common metadata, but leave the suffixes up to the BEP?
neuromechanist
left a comment
LGTM in principle and +1 for implementing items with shared interest more atomically.
IMO, the tables and requirement levels could benefit from being pulled from the schema.
I bet Claude can figure out the minimal changes to the schema needed to make such an implementation.
| Format | Extension | Description |
| --- | --- | --- |
| Waveform Audio (WAV) | `.wav` | Uncompressed PCM audio; lossless, large files |
| MP3 | `.mp3` | Lossy compressed audio; widely supported |
| Advanced Audio Coding | `.aac` | Lossy compressed audio; successor to MP3 |
| Ogg Vorbis | `.ogg` | Open lossy compressed audio format |
Should these be markdown tables, or schema-rendered macros?
indeed... looked at it, and I think we indeed can produce these out of src/schema/objects/extensions.yaml, thus removing duplication and unifying the descriptions
TODO -- should be auto-rendered from the schema using macros, potentially adjusting descriptions in src/schema/objects/extensions.yaml to be most expressive and consistent.
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `AudioCodec` | `audio`, `audiovideo` | RECOMMENDED |
| `AudioSampleRate` | `audio`, `audiovideo` | RECOMMENDED |
| `AudioChannelCount` | `audio`, `audiovideo` | RECOMMENDED |
| `AudioCodecRFC6381` | `audio`, `audiovideo` | OPTIONAL |
Also, the requirement levels might benefit from being pulled from the schema.
YES! Duplication is evil, and I forgot about this aspect while reviewing this one, although I typically remember when reviewing PRs of others! TODO -- should be auto-rendered from the schema using macros!
The current macros don't support this central column. Some additional design would be needed.
I have yet to think about it more, in particular … but …

So, overall, I feel that those counter-examples are valuable to consider and relate to, but I feel that we still might want to separate the description of "data content" vs "purpose" (stimuli vs capture of beh; description of instrumentation setup as an appendix, or just not expressible in machine-readable form) here, and hence overall this PR for media files.
Replace hand-written metadata tables with MACROS___make_sidecar_table() calls that pull field names, requirement levels, types, and descriptions directly from the schema (rules/sidecars/media.yaml + objects/metadata.yaml). This eliminates duplication between the appendix prose and the schema, addressing review feedback from @neuromechanist and @effigies. The suffix applicability is noted in prose above each table since the existing macro does not render a "Suffix" column. Format/extension tables remain as manual markdown since no macro exists for that layout. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
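For illustration, a hedged sketch of what such a schema-driven call might look like in the appendix source; the rule name `MediaVideo` is an assumption, and the actual name would live in `rules/sidecars/media.yaml`:

```markdown
The following fields apply to the `video` and `audiovideo` suffixes:

{{ MACROS___make_sidecar_table("media.MediaVideo") }}
```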
Add MACROS___make_suffix_table() call in the introduction to render the audio, video, audiovideo, and image suffix definitions directly from the schema, keeping the appendix in sync with suffixes.yaml. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add MACROS___make_extension_table() that renders a table of file extensions from the schema (objects/extensions.yaml), with columns for format name, extension (linked to glossary), and description. Replace the 3 hand-written format tables in media-files.md (audio, video, image) with macro calls, eliminating duplication between the appendix prose and extensions.yaml. Other spec files with similar hand-written extension tables (EEG, iEEG, EMG, MEG appendix) can adopt this macro in follow-up PRs. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Test that the macro correctly renders extension information from the schema, including display names, extension values, glossary links, and proper table structure. Follows the same pattern as existing render table tests. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
h-mayorquin
left a comment
@bendichter asked me to take a look since I have been working with both images and video on the NWB side. I hope my review is useful.
The proposal looks good to me and I think it covers the basics very well.
I have one main suggestion about how to specify the resolution (clarifying Width and Height definitions, particularly for images where the convention is less established than for video) and some minor suggestions about adding extra metadata fields to the sidecars that could be useful for scientific reuse: bit depth, color channels, variable frame rate handling, and frame count. I also suggest including an openness angle in the recommendation for video containers.
There are other concerns I considered but think are too niche for the BIDS proposal: keyframe interval (which determines random access performance for inter-frame codecs), moov atom placement for MP4 and Cues placement for WebM/MKV (which determine whether a file is efficiently streamable over HTTP), and color spaces and gamma correction for images (which would matter for researchers who need precise physical representation of color in their data). I think those can be deferred and dealt with later.
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `VideoCodec` | `video`, `audiovideo` | RECOMMENDED |
| `FrameRate` | `video`, `audiovideo` | RECOMMENDED |
The proposal includes FrameRate as a recommended field, but it should clarify how to handle variable frame rate (VFR) video. With constant frame rate, a single number is sufficient and any frame's timestamp can be computed as frame_number / frame_rate. With VFR, that arithmetic breaks down and each frame needs an explicit timestamp to be aligned with data on other recordings.
The spec should indicate whether FrameRate is expected to be the average rate, the nominal rate, or undefined for VFR files, and whether a boolean field like VariableFrameRate should accompany it so that downstream tools know they cannot rely on uniform spacing.
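A minimal sketch of how this could look in a sidecar, assuming a hypothetical boolean `VariableFrameRate` field (not part of the current proposal) and treating `FrameRate` as the average rate for VFR files:

```python
import json

# Hypothetical sidecar for a variable-frame-rate behavioral video.
# "VariableFrameRate" is a suggested field, not part of the current proposal;
# for VFR files, "FrameRate" is taken here to be the *average* rate.
sidecar = {
    "FrameRate": 29.83,          # average rate; frames are NOT uniformly spaced
    "VariableFrameRate": True,   # downstream tools must use per-frame timestamps
    "RecordingDuration": 600.0,  # seconds
    "VideoCodec": "h264",
}
print(json.dumps(sidecar, indent=2))
```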
because audio-video files require distinguishing the audio sampling rate from the video frame rate. The `Audio` prefix makes this unambiguous in multi-stream containers.

### Visual properties
The proposal groups images and videos under shared "visual properties" with Width and Height fields. For video, this convention is well established: every extraction tool (ffprobe, mediainfo, pymediainfo) reports named width and height fields, and the meaning is unambiguous because video inherits from the display/broadcast tradition where horizontal is width and vertical is height. The spec can operationalize this directly: extract Width and Height from ffprobe -show_streams.
For images, the convention is less clear and can be field-dependent. A 1920x1080 photograph has an obvious width and height, but a 512x512 microscopy image of a tissue slice has no inherent horizontal or vertical axis. Different imaging domains and tools disagree on ordering: TIFF stores ImageLength (height) before ImageWidth, DICOM uses Rows and Columns, and modern microscopy libraries (aicsimageio, nd2) use named dimension labels like X and Y to avoid the ambiguity entirely. A user looking at a microscopy image has no intuitive way to decide which axis is "width."
For videos, Width and Height can be defined operationally: they are the values reported by ffprobe -v quiet -select_streams v:0 -show_entries stream=width,height -of csv=p=0 <file>. This is unambiguous, and is consistent with the proposal already relying on FFmpeg codec names as the authoritative source for codec identification. The same tool is the source of truth for both.
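As a sketch of that operational definition: the `ffprobe` invocation quoted above prints `width,height` as one CSV line, which a tool could map to the sidecar fields like this (a captured sample string is parsed here so the snippet runs without FFmpeg):

```python
# Map the CSV output of the ffprobe command quoted above to sidecar fields.
# Sample line captured from:
#   ffprobe -v quiet -select_streams v:0 \
#       -show_entries stream=width,height -of csv=p=0 video.mp4
def parse_ffprobe_wh(csv_line: str) -> dict:
    width, height = (int(v) for v in csv_line.strip().split(","))
    return {"Width": width, "Height": height}

print(parse_ffprobe_wh("1920,1080"))  # → {'Width': 1920, 'Height': 1080}
```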
For images, there is no equivalent single authoritative tool, so the spec needs a conceptual definition instead. Something like: "Width is the number of columns and Height is the number of rows in the stored pixel grid. These describe the array dimensions of the file, not a physical orientation of the imaged subject." This is necessary because a 512x512 microscopy image has no inherent "wide" or "tall" axis, and different imaging tools disagree on ordering (TIFF stores height before width, PNG stores width before height). The conceptual anchor to columns/rows makes the fields unambiguous regardless of domain. Users extracting values from loaded arrays (NumPy, OpenCV .shape, scikit-image) should note that these libraries return (height, width) order, which is the reverse of the field order defined here.
I have argued on the NWB side that we should use (rows, columns) as unambiguous for images:
NeurodataWithoutBorders/nwb-schema#660 (comment)
But I think because this proposal mixes both videos and images, it can use the video terminology with a clarification for images.
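A small sketch of the array-order caveat, under the conceptual columns/rows definition proposed above (plain tuples stand in for NumPy/OpenCV `.shape` values):

```python
# "Width is the number of columns, Height the number of rows" means array
# shapes from NumPy/OpenCV/scikit-image, which are (rows, cols[, channels]),
# must be reversed when filling the sidecar fields.
def image_wh_from_shape(shape: tuple) -> dict:
    rows, cols = shape[0], shape[1]  # any trailing channel axis is ignored
    return {"Width": cols, "Height": rows}

print(image_wh_from_shape((1080, 1920, 3)))  # a 1920x1080 RGB photograph
print(image_wh_from_shape((512, 512)))       # square microscopy image
```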
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `Width` | `video`, `audiovideo`, `image` | RECOMMENDED |
| `Height` | `video`, `audiovideo`, `image` | RECOMMENDED |

### Video stream properties
I suggest adding a BitDepth field. Bit depth determines how many intensity levels each pixel can represent, which directly affects whether quantitative analyses on pixel values (e.g., delta F/F in calcium imaging, sub-pixel tracking in behavioral videos) have enough precision to be meaningful. Scientific cameras commonly record at 10, 12, or 16-bit, and a researcher reusing the data needs to know this before deciding what analyses are appropriate.
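To make the precision argument concrete, the number of representable intensity levels grows as `2 ** BitDepth`; a rough sketch:

```python
# Distinct intensity levels per pixel is 2 ** BitDepth; the smallest
# resolvable step (as a fraction of full scale) shrinks accordingly, which
# bounds how small a relative change (e.g., delta F/F) can be represented.
for bit_depth in (8, 10, 12, 16):
    levels = 2 ** bit_depth
    smallest_step = 1 / (levels - 1)
    print(f"{bit_depth:2d}-bit: {levels:6d} levels, min relative step {smallest_step:.2e}")
```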
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `VideoCodec` | `video`, `audiovideo` | RECOMMENDED |
| `FrameRate` | `video`, `audiovideo` | RECOMMENDED |
| `VideoCodecRFC6381` | `video`, `audiovideo` | OPTIONAL |
I suggest adding a FrameCount field.
For constant frame rate video, frame count can be derived from FrameRate and RecordingDuration, but for variable frame rate video that derivation is undefined. An explicit frame count is also useful as a basic integrity check: a tool can verify that the number of frames it decodes matches the expected count in the sidecar, catching truncated or corrupted files without needing a full reference.
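A sketch of both uses of a hypothetical `FrameCount` field, the CFR derivation and the integrity check (field names follow the suggestion above, not the current proposal):

```python
# For constant frame rate, FrameCount is derivable; for VFR it is not,
# so the sidecar value becomes the source of truth. Either way it enables
# a cheap integrity check against the number of frames actually decoded.
def expected_frame_count(frame_rate: float, duration_s: float) -> int:
    return round(frame_rate * duration_s)

def check_frame_count(sidecar: dict, decoded_frames: int) -> bool:
    return decoded_frames == sidecar["FrameCount"]

sidecar = {"FrameRate": 30.0, "RecordingDuration": 10.0, "FrameCount": 300}
print(expected_frame_count(30.0, 10.0))                # → 300
print(check_frame_count(sidecar, decoded_frames=299))  # → False (truncated file)
```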
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `Height` | `video`, `audiovideo`, `image` | RECOMMENDED |

### Video stream properties
Whether data is grayscale, color, or has an alpha channel affects which analysis pipelines are applicable: pose estimation and segmentation tools often depend on color, while calcium imaging is inherently single-channel. I suggest separating how this is captured for images and video:
For video, I suggest adding a PixelFormat field using the ffprobe pix_fmt value (e.g., yuv420p, gray16le). This single string already encodes color model, channel count, chroma subsampling, and bit depth, so it avoids defining multiple separate fields that would partially duplicate each other. You could separate these into different fields, but since the proposal already relies on FFmpeg as the authoritative source of truth, this might be fine.
For images, I suggest adding ColorChannels (integer: 1, 3, 4) and ColorMode (e.g., grayscale, RGB, RGBA, LA). Image formats have no equivalent to pix_fmt, and these two fields capture what a researcher needs to know: how many channels the data has and what they represent. The Python Imaging Library (PIL) provides a comprehensive list of image modes that could serve as a reference for the controlled vocabulary.
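A hedged sketch of the suggested image fields, using PIL mode names as the controlled vocabulary (the mapping below is illustrative; PIL itself is not needed):

```python
# Illustrative mapping from PIL image-mode names to the suggested
# ColorChannels/ColorMode fields; "LA" (grayscale + alpha) is included for
# completeness even though it is rarer in practice.
MODE_TO_CHANNELS = {"L": 1, "LA": 2, "RGB": 3, "RGBA": 4}

def color_fields(mode: str) -> dict:
    return {"ColorMode": mode, "ColorChannels": MODE_TO_CHANNELS[mode]}

print(color_fields("RGBA"))  # → {'ColorMode': 'RGBA', 'ColorChannels': 4}
```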
| Format | Extension | Description |
| --- | --- | --- |
| WebP | `.webp` | Modern format supporting lossy and lossless |
| Tag Image File Format | `.tif`, `.tiff` | Lossless format common in scientific imaging |

When choosing a format, consider the trade-off between file size and data fidelity.
The proposal's guidance on format choice mentions the trade-off between file size and data fidelity, but could also mention openness as another axis.
Where it does not add friction to existing pipelines, researchers should prefer open, royalty-free formats: Ogg Vorbis for lossy audio, FLAC for lossless audio, AV1 or VP9 for lossy video, and FFV1 for lossless video.
However, the spec should be honest about the practical reality: MP4 with H.264 works out of the box on every operating system, and MP4 with AV1 is supported on recent systems (Windows 10/11, macOS Ventura+, all major browsers) so I think it stands on equal grounds. MKV and WebM require third-party software (e.g., VLC) on both Windows and macOS, as neither Windows Media Player nor QuickTime supports them natively.
For a standard that aims to make data accessible, acknowledging this trade-off between openness and practical compatibility is important.
See the recent discussion on the NWB side:
NeurodataWithoutBorders/nwbinspector#669 (comment)
Summary
Two BEPs independently define audio/video media file support with significant overlap: BEP044 (`/stimuli/`) and BEP047 (`beh/`). Both need overlapping suffixes (`audio`, `video`), file extensions (`.mp4`, `.wav`, …), and similar technical metadata. Rather than define these independently in each BEP, risking inconsistencies and merge conflicts, this PR extracts the common foundation that both can build upon.

For those who want "anecdotal" argumentation, here is a loose translation of one from Russian:
The French and British were planning the Channel Tunnel and looking for contractors. The Americans proposed digging from both sides, promising to meet in the middle with a maximum 15-meter margin of error. Time: two years. The Japanese agreed to the same plan, but guaranteed a 5-meter accuracy within one year. Then a Russian contractor walks in and says, “We’ll dig from both sides. Two weeks. No guarantees, but in the worst-case scenario, you’ll end up with two tunnels...”
kudos to @vmdocua for reminding me of this one
What this PR adds
- Suffixes: `audio`, `video`, `audiovideo`, `image`
- Extensions: `.wav`, `.mp3`, `.aac`, `.ogg`, `.mp4`, `.avi`, `.mkv`, `.webm`, `.svg`, `.webp`, `.tiff`
    - `.mjpeg` for snapshots from videos for training data for pose estimation? (@talmo? #2057)
- Metadata: `Duration`, `FrameRate`, `Width`, `Height`, `AudioChannelCount`, `AudioSampleRate`, `VideoCodec`, `AudioCodec`, `VideoCodecRFC6381`, `AudioCodecRFC6381`
- `rules/sidecars/media.yaml`: suffix-based rules that auto-apply to any datatype using these suffixes
- `appendices/media-files.md`: supported formats, codec identification (FFmpeg + RFC 6381), privacy considerations, example JSON

Design decisions
- Shared definitions serve both `stimuli` and `beh` datatypes without duplication
- `ffprobe` … (`h264` → multiple profile/level strings)

What each BEP would then add on top
- BEP044: `stimuli` datatype, provenance metadata (license, copyright, URL), stimulus-specific entities
- BEP047: `beh` datatype, device metadata, behavioral entities (task, recording, split)

although we could even shift some, like provenance, into this common one, WDYT?
And both would get the common `media.yaml` sidecar rules for free.

Relation to existing PRs
This branch is based on `master` and is intentionally independent of both BEP PRs.

I can furnish PRs for that after we agree on this being a reasonable (even if not final) common ground!
We can even refine this further until satisfied, and then the first BEP to be accepted would "drag" this PR in as well.
Alternatively, we could keep those PRs separate from this one until we really finalize it, to simplify review of both BEPs by separating "what are media files in BIDS" from "how does datatype X use them."
CC: @bids-standard/bep044 @ree-gupta @neuromechanist @Remi-Gau @effigies @talmo — feedback welcome from both BEP teams and maintainers.
Test plan
- `tools/schemacode` tests (`pytest`)
- `mkdocs serve` renders the appendix correctly

🤖 Generated with Claude Code and "manual touches"