Add common media file definitions for BEP044/BEP047#2367
yarikoptic wants to merge 10 commits into `bids-standard:master`
Conversation
…decar rules) Introduce shared media file infrastructure for BEP044 (stimuli) and BEP047 (behavioral A/V). Both BEPs need overlapping audio/video/image support, so this extracts the common foundation:

- Suffixes: `audio`, `video`, `audiovideo`, `image`
- Extensions: `.wav`, `.mp3`, `.aac`, `.ogg`, `.mp4`, `.avi`, `.mkv`, `.webm`, `.svg`, `.webp`, `.tiff`
- Metadata: `Duration`, `FrameRate`, `Width`, `Height`, `AudioChannelCount`, `AudioSampleRate`, `VideoCodec`, `AudioCodec`, `VideoCodecRFC6381`, `AudioCodecRFC6381`
- Sidecar rules (`media.yaml`): suffix-based rules that auto-apply to any datatype
- Appendix (`media-files.md`): formats, codec identification, privacy, examples

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add spaces between pipes and dashes in all separator rows (e.g., `| --- |` instead of `|---|`) to satisfy the remark-lint table-cell-padding rule. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
effigies
left a comment
I approve in principle. Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.
I tried to search up what you might have in mind here but failed; could you please elaborate?

The main ones that come to mind:
Replace the newly added `Duration` metadata field with the existing
`RecordingDuration` field, which already has the same semantics
("length of the recording in seconds") and unit. This avoids
introducing a near-duplicate field for media files.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a note in the appendix explaining why AudioSampleRate is used instead of the existing SamplingFrequency: audio-video containers need to distinguish the audio sampling rate from the video frame rate, so the Audio prefix is necessary for multi-stream files. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
The existing photo suffix rules use .tif, so document both .tif and .tiff as valid TIFF extensions for image contexts. This ensures consistency when BEPs define file rules for the image suffix. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a section explaining that the media file definitions generalize all media in BIDS. The existing photo suffix covers a narrower use case (still images in electrophysiology/microscopy) and predates this framework. A "photo" could equally be a video with narration, an audio description, or a drawing. The media suffixes should be adopted for new datatypes, and a future proposal may deprecate photo in favor of the broader image suffix with migration tooling. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
ok, pushed some commits which I think bring it very close to a reviewable state. Review the commits, but the 'major' one is adding the relationship to the 'photo' suffix we already have. I think it would be worth a separate PR to introduce that migration if we do proceed with "media files", and it would establish media files potentially even before 044/047, WDYT? IMHO it would make much sense, since it could really be not just a photo, but a sketch, a video, a dictaphone recording -- any media really, IMHO, to associate with data acquisition to describe locations etc.
Well, I guess it's a question of whether we care what the subject of the image is. To compare to another collection of cases in BIDS, single-volume EPI images may have suffix … It leads me to wonder: would it make more sense to treat this as a discussion of permissible formats and codecs and common metadata, but leave the suffixes up to the BEP?
neuromechanist
left a comment
LGTM in principle and +1 for implementing items with shared interest more atomically.
IMO, the tables and requirement levels could benefit from being pulled from the schema.
I bet Claude can figure out the minimal changes to the schema needed to make such an implementation.
| Format | Extension | Description |
| --- | --- | --- |
| Waveform Audio (WAV) | `.wav` | Uncompressed PCM audio; lossless, large files |
| MP3 | `.mp3` | Lossy compressed audio; widely supported |
| Advanced Audio Coding | `.aac` | Lossy compressed audio; successor to MP3 |
| Ogg Vorbis | `.ogg` | Open lossy compressed audio format |
Should these be markdown tables, or schema-rendered macros?
indeed... looked at it, and I think we indeed can produce these out of src/schema/objects/extensions.yaml, thus removing duplication and unifying the descriptions
TODO -- should be auto-rendered from the schema using macros, potentially adjusting descriptions in src/schema/objects/extensions.yaml to be most expressive and consistent.
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `AudioCodec` | `audio`, `audiovideo` | RECOMMENDED |
| `AudioSampleRate` | `audio`, `audiovideo` | RECOMMENDED |
| `AudioChannelCount` | `audio`, `audiovideo` | RECOMMENDED |
| `AudioCodecRFC6381` | `audio`, `audiovideo` | OPTIONAL |
Also, the requirement levels might benefit from being pulled from the schema.
YES! Duplication is evil, and I forgot about this aspect while reviewing this one, although I typically remember when reviewing PRs of others! TODO -- should be auto-rendered from the schema using macros!
The current macros don't support this central column. Some additional design would be needed.
I have yet to think about it more, in particular … but …

So, overall, I feel that those counter-examples are valuable to consider and relate to, but I feel that we still might want to separate the description of "data content" vs "purpose" (stimuli vs capture of beh; description of instrumentation setup as an appendix, or just not expressible in machine-readable form) here, and hence overall this PR for media files.
Replace hand-written metadata tables with MACROS___make_sidecar_table() calls that pull field names, requirement levels, types, and descriptions directly from the schema (rules/sidecars/media.yaml + objects/metadata.yaml). This eliminates duplication between the appendix prose and the schema, addressing review feedback from @neuromechanist and @effigies. The suffix applicability is noted in prose above each table since the existing macro does not render a "Suffix" column. Format/extension tables remain as manual markdown since no macro exists for that layout. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
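For illustration, a hedged sketch of what such a schema-driven call might look like in the appendix source; the rule name `MediaVideo` is an assumption, and the actual name would live in `rules/sidecars/media.yaml`:

```markdown
The following fields apply to the `video` and `audiovideo` suffixes:

{{ MACROS___make_sidecar_table("media.MediaVideo") }}
```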
Add MACROS___make_suffix_table() call in the introduction to render the audio, video, audiovideo, and image suffix definitions directly from the schema, keeping the appendix in sync with suffixes.yaml. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add MACROS___make_extension_table() that renders a table of file extensions from the schema (objects/extensions.yaml), with columns for format name, extension (linked to glossary), and description. Replace the 3 hand-written format tables in media-files.md (audio, video, image) with macro calls, eliminating duplication between the appendix prose and extensions.yaml. Other spec files with similar hand-written extension tables (EEG, iEEG, EMG, MEG appendix) can adopt this macro in follow-up PRs. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Test that the macro correctly renders extension information from the schema, including display names, extension values, glossary links, and proper table structure. Follows the same pattern as existing render table tests. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
h-mayorquin
left a comment
@bendichter asked me to take a look since I have been working with both images and video on the NWB side. I hope my review is useful.
The proposal looks good to me and I think it covers the basics very well.
I have one main suggestion about how to specify the resolution (clarifying Width and Height definitions, particularly for images where the convention is less established than for video) and some minor suggestions about adding extra metadata fields to the sidecars that could be useful for scientific reuse: bit depth, color channels, variable frame rate handling, and frame count. I also suggest including an openness angle in the recommendation for video containers.
There are other concerns I considered but think are too niche for the BIDS proposal: keyframe interval (which determines random access performance for inter-frame codecs), moov atom placement for MP4 and Cues placement for WebM/MKV (which determine whether a file is efficiently streamable over HTTP), and color spaces and gamma correction for images (which would matter for researchers who need precise physical representation of color in their data). I think those can be deferred and dealt with later.
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `VideoCodec` | `video`, `audiovideo` | RECOMMENDED |
| `FrameRate` | `video`, `audiovideo` | RECOMMENDED |
The proposal includes FrameRate as a recommended field, but it should clarify how to handle variable frame rate (VFR) video. With constant frame rate, a single number is sufficient and any frame's timestamp can be computed as frame_number / frame_rate. With VFR, that arithmetic breaks down and each frame needs an explicit timestamp to be aligned with data on other recordings.
The spec should indicate whether FrameRate is expected to be the average rate, the nominal rate, or undefined for VFR files, and whether a boolean field like VariableFrameRate should accompany it so that downstream tools know they cannot rely on uniform spacing.
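A minimal sketch of how this could look in a sidecar, assuming a hypothetical boolean `VariableFrameRate` field (not part of the current proposal) and treating `FrameRate` as the average rate for VFR files:

```python
import json

# Hypothetical sidecar for a variable-frame-rate behavioral video.
# "VariableFrameRate" is a suggested field, not part of the current proposal;
# for VFR files, "FrameRate" is taken here to be the *average* rate.
sidecar = {
    "FrameRate": 29.83,          # average rate; frames are NOT uniformly spaced
    "VariableFrameRate": True,   # downstream tools must use per-frame timestamps
    "RecordingDuration": 600.0,  # seconds
    "VideoCodec": "h264",
}
print(json.dumps(sidecar, indent=2))
```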
because audio-video files require distinguishing the audio sampling rate from the video frame rate. The `Audio` prefix makes this unambiguous in multi-stream containers.

### Visual properties
The proposal groups images and videos under shared "visual properties" with Width and Height fields. For video, this convention is well established: every extraction tool (ffprobe, mediainfo, pymediainfo) reports named width and height fields, and the meaning is unambiguous because video inherits from the display/broadcast tradition where horizontal is width and vertical is height. The spec can operationalize this directly: extract Width and Height from ffprobe -show_streams.
For images, the convention is less clear and can be field-dependent. A 1920x1080 photograph has an obvious width and height, but a 512x512 microscopy image of a tissue slice has no inherent horizontal or vertical axis. Different imaging domains and tools disagree on ordering: TIFF stores ImageLength (height) before ImageWidth, DICOM uses Rows and Columns, and modern microscopy libraries (aicsimageio, nd2) use named dimension labels like X and Y to avoid the ambiguity entirely. A user looking at a microscopy image has no intuitive way to decide which axis is "width."
For videos, Width and Height can be defined operationally: they are the values reported by ffprobe -v quiet -select_streams v:0 -show_entries stream=width,height -of csv=p=0 <file>. This is unambiguous, and is consistent with the proposal already relying on FFmpeg codec names as the authoritative source for codec identification. The same tool is the source of truth for both.
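As a sketch of that operational definition: the `ffprobe` invocation quoted above prints `width,height` as one CSV line, which a tool could map to the sidecar fields like this (a captured sample string is parsed here so the snippet runs without FFmpeg):

```python
# Map the CSV output of the ffprobe command quoted above to sidecar fields.
# Sample line captured from:
#   ffprobe -v quiet -select_streams v:0 \
#       -show_entries stream=width,height -of csv=p=0 video.mp4
def parse_ffprobe_wh(csv_line: str) -> dict:
    width, height = (int(v) for v in csv_line.strip().split(","))
    return {"Width": width, "Height": height}

print(parse_ffprobe_wh("1920,1080"))  # → {'Width': 1920, 'Height': 1080}
```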
For images, there is no equivalent single authoritative tool, so the spec needs a conceptual definition instead. Something like: "Width is the number of columns and Height is the number of rows in the stored pixel grid. These describe the array dimensions of the file, not a physical orientation of the imaged subject." This is necessary because a 512x512 microscopy image has no inherent "wide" or "tall" axis, and different imaging tools disagree on ordering (TIFF stores height before width, PNG stores width before height). The conceptual anchor to columns/rows makes the fields unambiguous regardless of domain. Users extracting values from loaded arrays (NumPy, OpenCV .shape, scikit-image) should note that these libraries return (height, width) order, which is the reverse of the field order defined here.
I have argued on the NWB side that we should use (rows, columns) as unambiguous for images:
NeurodataWithoutBorders/nwb-schema#660 (comment)
But I think because this proposal mixes both videos and images, it can use the video terminology with a clarification for images.
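A small sketch of the array-order caveat, under the conceptual columns/rows definition proposed above (plain tuples stand in for NumPy/OpenCV `.shape` values):

```python
# "Width is the number of columns, Height the number of rows" means array
# shapes from NumPy/OpenCV/scikit-image, which are (rows, cols[, channels]),
# must be reversed when filling the sidecar fields.
def image_wh_from_shape(shape: tuple) -> dict:
    rows, cols = shape[0], shape[1]  # any trailing channel axis is ignored
    return {"Width": cols, "Height": rows}

print(image_wh_from_shape((1080, 1920, 3)))  # a 1920x1080 RGB photograph
print(image_wh_from_shape((512, 512)))       # square microscopy image
```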
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `Width` | `video`, `audiovideo`, `image` | RECOMMENDED |
| `Height` | `video`, `audiovideo`, `image` | RECOMMENDED |

### Video stream properties
I suggest adding a BitDepth field. Bit depth determines how many intensity levels each pixel can represent, which directly affects whether quantitative analyses on pixel values (e.g., delta F/F in calcium imaging, sub-pixel tracking in behavioral videos) have enough precision to be meaningful. Scientific cameras commonly record at 10, 12, or 16-bit, and a researcher reusing the data needs to know this before deciding what analyses are appropriate.
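To make the precision argument concrete, the number of representable intensity levels grows as `2 ** BitDepth`; a rough sketch:

```python
# Distinct intensity levels per pixel is 2 ** BitDepth; the smallest
# resolvable step (as a fraction of full scale) shrinks accordingly, which
# bounds how small a relative change (e.g., delta F/F) can be represented.
for bit_depth in (8, 10, 12, 16):
    levels = 2 ** bit_depth
    smallest_step = 1 / (levels - 1)
    print(f"{bit_depth:2d}-bit: {levels:6d} levels, min relative step {smallest_step:.2e}")
```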
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `VideoCodec` | `video`, `audiovideo` | RECOMMENDED |
| `FrameRate` | `video`, `audiovideo` | RECOMMENDED |
| `VideoCodecRFC6381` | `video`, `audiovideo` | OPTIONAL |
I suggest adding a FrameCount field.
For constant frame rate video, frame count can be derived from FrameRate and RecordingDuration, but for variable frame rate video that derivation is undefined. An explicit frame count is also useful as a basic integrity check: a tool can verify that the number of frames it decodes matches the expected count in the sidecar, catching truncated or corrupted files without needing a full reference.
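A sketch of both uses of a hypothetical `FrameCount` field, the CFR derivation and the integrity check (field names follow the suggestion above, not the current proposal):

```python
# For constant frame rate, FrameCount is derivable; for VFR it is not,
# so the sidecar value becomes the source of truth. Either way it enables
# a cheap integrity check against the number of frames actually decoded.
def expected_frame_count(frame_rate: float, duration_s: float) -> int:
    return round(frame_rate * duration_s)

def check_frame_count(sidecar: dict, decoded_frames: int) -> bool:
    return decoded_frames == sidecar["FrameCount"]

sidecar = {"FrameRate": 30.0, "RecordingDuration": 10.0, "FrameCount": 300}
print(expected_frame_count(30.0, 10.0))                # → 300
print(check_frame_count(sidecar, decoded_frames=299))  # → False (truncated file)
```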
| Field | Suffix | Requirement Level |
| --- | --- | --- |
| `Height` | `video`, `audiovideo`, `image` | RECOMMENDED |

### Video stream properties
Whether data is grayscale, color, or has an alpha channel affects which analysis pipelines are applicable: pose estimation and segmentation tools often depend on color, while calcium imaging is inherently single-channel. I suggest separating how this is captured for images and video:
For video, I suggest adding a PixelFormat field using the ffprobe pix_fmt value (e.g., yuv420p, gray16le). This single string already encodes color model, channel count, chroma subsampling, and bit depth, so it avoids defining multiple separate fields that would partially duplicate each other. You could separate these into different fields, but since the proposal already relies on FFmpeg as the authoritative source of truth, this might be fine.
For images, I suggest adding ColorChannels (integer: 1, 3, 4) and ColorMode (e.g., grayscale, RGB, RGBA, LA). Image formats have no equivalent to pix_fmt, and these two fields capture what a researcher needs to know: how many channels the data has and what they represent. The Python Imaging Library (PIL) provides a comprehensive list of image modes that could serve as a reference for the controlled vocabulary.
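A hedged sketch of the suggested image fields, using PIL mode names as the controlled vocabulary (the mapping below is illustrative; PIL itself is not needed):

```python
# Illustrative mapping from PIL image-mode names to the suggested
# ColorChannels/ColorMode fields; "LA" (grayscale + alpha) is included for
# completeness even though it is rarer in practice.
MODE_TO_CHANNELS = {"L": 1, "LA": 2, "RGB": 3, "RGBA": 4}

def color_fields(mode: str) -> dict:
    return {"ColorMode": mode, "ColorChannels": MODE_TO_CHANNELS[mode]}

print(color_fields("RGBA"))  # → {'ColorMode': 'RGBA', 'ColorChannels': 4}
```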
| Format | Extension | Description |
| --- | --- | --- |
| WebP | `.webp` | Modern format supporting lossy and lossless |
| Tag Image File Format | `.tif`, `.tiff` | Lossless format common in scientific imaging |

When choosing a format, consider the trade-off between file size and data fidelity.
The proposal's guidance on format choice mentions the trade-off between file size and data fidelity, but could also mention openness as another axis.
Where it does not add friction to existing pipelines, researchers should prefer open, royalty-free formats: Ogg Vorbis for lossy audio, FLAC for lossless audio, AV1 or VP9 for lossy video, and FFV1 for lossless video.
However, the spec should be honest about the practical reality: MP4 with H.264 works out of the box on every operating system, and MP4 with AV1 is supported on recent systems (Windows 10/11, macOS Ventura+, all major browsers) so I think it stands on equal grounds. MKV and WebM require third-party software (e.g., VLC) on both Windows and macOS, as neither Windows Media Player nor QuickTime supports them natively.
For a standard that aims to make data accessible, acknowledging this trade-off between openness and practical compatibility is important.
See the recent discussion on the NWB side:
NeurodataWithoutBorders/nwbinspector#669 (comment)
Summary
Two BEPs independently define audio/video media file support with significant overlap: BEP044 (`/stimuli/`) and BEP047 (`beh/`). Both need overlapping suffixes (`audio`, `video`), file extensions (`.mp4`, `.wav`, …), and similar technical metadata. Rather than define these independently in each BEP, risking inconsistencies and merge conflicts, this PR extracts the common foundation that both can build upon.

For those who want "anecdotal" argumentation, here is a loose translation of one from Russian:
The French and British were planning the Channel Tunnel and looking for contractors. The Americans proposed digging from both sides, promising to meet in the middle with a maximum 15-meter margin of error. Time: two years. The Japanese agreed to the same plan, but guaranteed a 5-meter accuracy within one year. Then a Russian contractor walks in and says, “We’ll dig from both sides. Two weeks. No guarantees, but in the worst-case scenario, you’ll end up with two tunnels...”
kudos to @vmdocua for reminding me of this one
What this PR adds
- Suffixes: `audio`, `video`, `audiovideo`, `image`
- Extensions: `.wav`, `.mp3`, `.aac`, `.ogg`, `.mp4`, `.avi`, `.mkv`, `.webm`, `.svg`, `.webp`, `.tiff`
    - `.mjpeg` for snapshots from videos for training data for pose estimation? (@talmo? #2057)
- Metadata: `Duration`, `FrameRate`, `Width`, `Height`, `AudioChannelCount`, `AudioSampleRate`, `VideoCodec`, `AudioCodec`, `VideoCodecRFC6381`, `AudioCodecRFC6381`
- `rules/sidecars/media.yaml`: suffix-based rules that auto-apply to any datatype using these suffixes
- `appendices/media-files.md`: supported formats, codec identification (FFmpeg + RFC 6381), privacy considerations, example JSON

Design decisions
- Shared definitions serve both `stimuli` and `beh` datatypes without duplication
- `ffprobe` … (`h264` → multiple profile/level strings)

What each BEP would then add on top
- BEP044: `stimuli` datatype, provenance metadata (license, copyright, URL), stimulus-specific entities
- BEP047: `beh` datatype, device metadata, behavioral entities (task, recording, split)

although we could even shift some, like provenance, into this common one, WDYT?
And both would get the common `media.yaml` sidecar rules for free.

Relation to existing PRs
This branch is based on `master` and is intentionally independent of both BEP PRs.

I can furnish PRs for that after we agree on this being a reasonable (even if not final) common ground!
We can even refine this further until satisfied, and then the first BEP to be accepted would "drag" this PR in as well.
Alternatively, we could keep those PRs separate from this one until we really finalize it, to simplify review of both BEPs by separating "what are media files in BIDS" from "how does datatype X use them."
CC: @bids-standard/bep044 @ree-gupta @neuromechanist @Remi-Gau @effigies @talmo — feedback welcome from both BEP teams and maintainers.
Test plan
- `tools/schemacode` tests (`pytest`)
- `mkdocs serve` renders the appendix correctly

🤖 Generated with Claude Code and "manual touches"