Add common media file definitions for BEP044/BEP047 #2367
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| # Media Files | ||
|
|
||
| ## Introduction | ||
|
|
||
| Several BIDS datatypes make use of media files — audio recordings, video recordings, | ||
| combined audio-video recordings, and still images. | ||
| This appendix defines the common file formats, metadata conventions, | ||
| and codec identification schemes shared across all datatypes that use media files. | ||
|
|
||
| The following media suffixes are defined: | ||
|
|
||
| {{ MACROS___make_suffix_table(["audio", "video", "audiovideo", "image"]) }} | ||
|
|
||
| Datatypes that incorporate media files (for example, behavioral recordings or stimuli) | ||
| define their own file-naming rules, directory placement, and datatype-specific metadata. | ||
| The conventions described here apply uniformly to all such datatypes. | ||
|
|
||
| ### Relationship to the `photo` suffix | ||
|
|
||
| The media file definitions introduced here generalize how media content is handled in BIDS. | |
| The existing `photo` suffix (used for photographs of anatomical landmarks, | ||
| head localization coils, and tissue samples) predates this framework and covers | ||
| a narrower use case — still images in specific electrophysiology and microscopy datatypes. | ||
|
|
||
| The media suffixes (`audio`, `video`, `audiovideo`, `image`) are intended as the | ||
| general-purpose mechanism for all media content in BIDS. | ||
| In practice, a "photo" could equally be a video of an experimental setup with verbal | ||
| narration, an audio recording describing electrode placement, or a drawing rather than | ||
| a photograph. | ||
| The media file framework should be generally adopted for new datatypes, | ||
| and a future proposal may deprecate the `photo` suffix in favor of the broader `image` | ||
| suffix with appropriate migration tooling | ||
| (see [bids-utils](https://github.com/bids-standard/bids-utils)). | ||
|
|
||
| ## Supported Formats | ||
|
|
||
| ### Audio formats | ||
|
|
||
| {{ MACROS___make_extension_table(["wav", "mp3", "aac", "ogg"]) }} | ||
|
|
||
| ### Video container formats | ||
|
|
||
| {{ MACROS___make_extension_table(["mp4", "avi", "mkv", "webm"]) }} | ||
|
|
||
| ### Image formats | ||
|
|
||
| {{ MACROS___make_extension_table(["jpg", "png", "svg", "webp", "tif", "tiff"]) }} | ||
|
|
||
| When choosing a format, consider the trade-off between file size and data fidelity. | ||
| Uncompressed or losslessly compressed formats (for example, PCM audio in WAV, PNG, and TIFF without JPEG compression) preserve full quality | |
| but produce larger files. | ||
| Lossy formats (MP3, AAC, JPEG) significantly reduce file size | ||
| at the cost of some data loss. | ||
|
|
||
| ## Media Stream Metadata | ||
|
|
||
| Media files SHOULD be accompanied by a JSON sidecar file | ||
| containing technical metadata about the media streams. | ||
| The following metadata fields are defined for media files. | ||
|
|
||
| ### Duration | ||
|
|
||
| Applies to suffixes: `audio`, `video`, `audiovideo`. | ||
|
|
||
| {{ MACROS___make_sidecar_table("media.MediaDuration") }} | ||
|
|
||
| `RecordingDuration` reuses the existing BIDS metadata field already defined for | ||
| electrophysiology recordings (EEG, iEEG, MEG, and others). | ||
|
|
||
| ### Audio stream properties | ||
|
|
||
| Applies to suffixes: `audio`, `audiovideo`. | ||
|
|
||
| {{ MACROS___make_sidecar_table("media.MediaAudioProperties") }} | ||
|
|
||
| Note: `AudioSampleRate` is used instead of the existing `SamplingFrequency` field | ||
| because audio-video files require distinguishing the audio sampling rate from the | ||
| video frame rate. The `Audio` prefix makes this unambiguous in multi-stream containers. | ||
|
|
||
| ### Visual properties | ||
|
The proposal groups images and videos under shared "visual properties" with […] For images, the convention is less clear and can be field-dependent. A 1920x1080 photograph has an obvious width and height, but a 512x512 microscopy image of a tissue slice has no inherent horizontal or vertical axis. Different imaging domains and tools disagree on ordering: TIFF stores […] For videos, […] For images, there is no equivalent single authoritative tool, so the spec needs a conceptual definition instead. Something like: "[…]" I have argued on the NWB side that we should use (rows, columns) as the unambiguous convention for images: NeurodataWithoutBorders/nwb-schema#660 (comment) But I think that, because this proposal mixes both videos and images, it can use the video terminology with a clarification for images. |
||
|
|
||
| Applies to suffixes: `video`, `audiovideo`, `image`. | ||
|
|
||
| {{ MACROS___make_sidecar_table("media.MediaVisualProperties") }} | ||
|
|
||
| ### Video stream properties | ||
|
I suggest adding a […] |
||
|
|
||
|
Whether data is grayscale, color, or has an alpha channel affects which analysis pipelines are applicable: pose estimation and segmentation tools often depend on color, while calcium imaging is inherently single-channel. I suggest separating how this is captured for images and video: For video, I suggest adding a […] For images, I suggest adding […] |
||
| Applies to suffixes: `video`, `audiovideo`. | ||
|
|
||
| {{ MACROS___make_sidecar_table("media.MediaVideoProperties") }} | ||
|
|
||
|
I suggest adding a […] For constant frame rate video, frame count can be derived from […] |
||
| ## Codec Identification | ||
|
|
||
| Codec identification uses two complementary naming systems: | ||
|
|
||
| ### FFmpeg codec names (RECOMMENDED) | ||
|
|
||
| The `AudioCodec` and `VideoCodec` fields use | ||
| [FFmpeg codec names](https://www.ffmpeg.org/ffmpeg-codecs.html) as the RECOMMENDED | ||
| convention. These names are the de facto standard in scientific computing and can be | ||
| auto-extracted from media files using: | ||
|
|
||
| ```bash | ||
| ffprobe -v quiet -print_format json -show_streams <file> | ||
| ``` | ||
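As a rough illustration of the auto-extraction step, the sketch below maps the parsed JSON output of the `ffprobe` command above onto the sidecar fields used in this appendix. The function name `sidecar_from_ffprobe` is hypothetical, and the mapping itself (first video stream and first audio stream only, `avg_frame_rate` as the source for `FrameRate`) is an assumption rather than part of the proposal. It takes an already parsed dictionary, so it can be tried without FFmpeg installed, and it leaves `RecordingDuration` out because not every container reports a per-stream duration.

```python
import json


def sidecar_from_ffprobe(probe: dict) -> dict:
    """Map parsed `ffprobe -show_streams` JSON onto sidecar fields (assumed mapping)."""
    sidecar = {}
    for stream in probe.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "video" and "VideoCodec" not in sidecar:
            sidecar["VideoCodec"] = stream["codec_name"]
            sidecar["Width"] = stream["width"]
            sidecar["Height"] = stream["height"]
            # ffprobe reports frame rates as rational strings such as "30/1"
            num, den = stream["avg_frame_rate"].split("/")
            if int(den) != 0:
                sidecar["FrameRate"] = int(num) / int(den)
        elif kind == "audio" and "AudioCodec" not in sidecar:
            sidecar["AudioCodec"] = stream["codec_name"]
            # sample_rate is a string in ffprobe's JSON output
            sidecar["AudioSampleRate"] = int(stream["sample_rate"])
            sidecar["AudioChannelCount"] = stream["channels"]
    return sidecar


# Hand-written stand-in for ffprobe output, not taken from a real file
example_probe = {
    "streams": [
        {"codec_type": "video", "codec_name": "h264",
         "width": 1920, "height": 1080, "avg_frame_rate": "30/1"},
        {"codec_type": "audio", "codec_name": "aac",
         "sample_rate": "48000", "channels": 2},
    ]
}
print(json.dumps(sidecar_from_ffprobe(example_probe), indent=2))
```

With a real file, the input dictionary would come from `json.loads` applied to the command's standard output.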
|
|
||
| ### RFC 6381 codec strings (OPTIONAL) | ||
|
|
||
| The `AudioCodecRFC6381` and `VideoCodecRFC6381` fields use | ||
| [RFC 6381](https://datatracker.ietf.org/doc/html/rfc6381) codec strings. | ||
| These provide precise codec profile and level information useful for | ||
| web and broadcast interoperability. | ||
|
|
||
| ### Common codec reference | ||
|
|
||
| | Codec | FFmpeg Name | RFC 6381 String | Notes | | ||
| | -------------- | ----------- | ------------------ | ----------------------- | | ||
| | H.264 / AVC | `h264` | `avc1.640028` | Most widely supported | | ||
| | H.265 / HEVC | `hevc` | `hev1.1.6.L93.B0` | High efficiency | | ||
| | VP9 | `vp9` | `vp09.00.10.08` | Open, royalty-free | | ||
| | AV1 | `av1` | `av01.0.01M.08` | Next-gen open codec | | ||
| | AAC-LC | `aac` | `mp4a.40.2` | Default audio for MP4 | | ||
| | MP3 | `mp3` | `mp4a.6B` | Legacy lossy audio | | ||
| | Opus | `opus` | `Opus` | Open, low-latency audio | | ||
| | FLAC | `flac` | `fLaC` | Open lossless audio | | ||
| | PCM 16-bit LE | `pcm_s16le` | — | Uncompressed (WAV) | | ||
|
|
||
| The FFmpeg name column shows the value to use for `VideoCodec` or `AudioCodec`. | ||
| The RFC 6381 column shows the value for `VideoCodecRFC6381` or `AudioCodecRFC6381`. | ||
| RFC 6381 strings vary by profile and level; | ||
| the values shown are representative examples. | ||
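To illustrate why these strings vary, the H.264 form `avc1.PPCCLL` packs three bytes as hexadecimal: the profile indicator, the constraint flags, and the level indicator. A minimal sketch follows; the helper name `avc1_codec_string` is hypothetical, and the other codecs in the table use different layouts that it does not cover.

```python
def avc1_codec_string(profile_idc: int, constraint_flags: int, level_idc: int) -> str:
    """Format an RFC 6381 codec string for H.264 as avc1.PPCCLL (three hex bytes)."""
    for name, value in (
        ("profile_idc", profile_idc),
        ("constraint_flags", constraint_flags),
        ("level_idc", level_idc),
    ):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte, got {value}")
    return f"avc1.{profile_idc:02X}{constraint_flags:02X}{level_idc:02X}"


# High profile (profile_idc 100), no constraint flags, level 4.0 (level_idc 40)
print(avc1_codec_string(100, 0x00, 40))  # avc1.640028
```

The same stream encoded at a different profile or level yields a different string, which is why the FFmpeg name (`h264`) stays constant while the RFC 6381 value does not.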
|
|
||
| ## Privacy Considerations | ||
|
|
||
| Media files — particularly audio and video recordings — may contain | ||
| personally identifiable information (PII), including but not limited to: | ||
|
|
||
| - Voices and speech content | ||
| - Facial features and other physical characteristics | ||
| - Background environments that could identify locations | ||
| - Metadata embedded in file headers (for example, GPS coordinates, device identifiers) | ||
|
|
||
| Researchers MUST ensure that sharing of media files complies with the | ||
| informed consent obtained from participants and with applicable privacy regulations. | ||
| De-identification techniques (for example, voice distortion, face blurring, | ||
| metadata stripping) SHOULD be applied where appropriate before data sharing. | ||
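For the metadata-stripping step, one common approach is FFmpeg's `-map_metadata -1` option combined with stream copy, which avoids re-encoding. The sketch below only builds the command line; the helper name `strip_metadata_command` and the file names are hypothetical. This does nothing about voices, faces, or other content inside the streams, and some container or stream-level tags may survive, so the output should still be inspected (for example with `ffprobe`) before sharing.

```python
import shlex


def strip_metadata_command(src: str, dst: str) -> list:
    """Build an ffmpeg command that copies streams without the input's global metadata."""
    return [
        "ffmpeg",
        "-i", src,
        "-map_metadata", "-1",  # do not carry global metadata over from the input
        "-c", "copy",           # copy streams as-is, no re-encoding
        dst,
    ]


cmd = strip_metadata_command("input.mp4", "output.mp4")
print(shlex.join(cmd))
# To actually run it (requires FFmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```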
|
|
||
| ## Example | ||
|
|
||
| A complete sidecar JSON file for an audio-video recording: | ||
|
|
||
| ```json | ||
| { | ||
| "RecordingDuration": 312.5, | ||
| "VideoCodec": "h264", | ||
| "VideoCodecRFC6381": "avc1.640028", | ||
| "FrameRate": 30, | ||
| "Width": 1920, | ||
| "Height": 1080, | ||
| "AudioCodec": "aac", | ||
| "AudioCodecRFC6381": "mp4a.40.2", | ||
| "AudioSampleRate": 48000, | ||
| "AudioChannelCount": 2 | ||
| } | ||
| ``` | ||
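The "Applies to suffixes" lines in the metadata sections can be turned into a simple consistency check on a sidecar like the one above. The sketch below is only an illustration: the sidecar tables are generated by macros that are not expanded in this diff, so the assignment of each field to a group (for example `Width` and `Height` to the visual properties, `FrameRate` and the video codec fields to the video stream properties) is an assumption, and `inapplicable_fields` is a hypothetical helper name.

```python
AUDIO = {"audio", "audiovideo"}
VIDEO = {"video", "audiovideo"}
VISUAL = {"video", "audiovideo", "image"}
TIMED = {"audio", "video", "audiovideo"}

# Assumed grouping of fields; the authoritative lists live in the schema tables
FIELD_SUFFIXES = {
    "RecordingDuration": TIMED,
    "AudioCodec": AUDIO,
    "AudioCodecRFC6381": AUDIO,
    "AudioSampleRate": AUDIO,
    "AudioChannelCount": AUDIO,
    "Width": VISUAL,
    "Height": VISUAL,
    "VideoCodec": VIDEO,
    "VideoCodecRFC6381": VIDEO,
    "FrameRate": VIDEO,
}


def inapplicable_fields(suffix: str, sidecar: dict) -> list:
    """Return known media fields in `sidecar` that do not apply to `suffix`, sorted."""
    return sorted(
        key for key in sidecar
        if key in FIELD_SUFFIXES and suffix not in FIELD_SUFFIXES[key]
    )


example_sidecar = {
    "RecordingDuration": 312.5,
    "VideoCodec": "h264",
    "VideoCodecRFC6381": "avc1.640028",
    "FrameRate": 30,
    "Width": 1920,
    "Height": 1080,
    "AudioCodec": "aac",
    "AudioCodecRFC6381": "mp4a.40.2",
    "AudioSampleRate": 48000,
    "AudioChannelCount": 2,
}
print(inapplicable_fields("audiovideo", example_sidecar))  # []
print(inapplicable_fields("audio", example_sidecar))
```

Fields outside the table are ignored rather than flagged, since datatypes may define their own metadata.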
The proposal's guidance on format choice mentions the trade-off between file size and data fidelity, but could also mention openness as another axis.
Where it does not add friction to existing pipelines, researchers should prefer open, royalty-free formats: Ogg Vorbis for lossy audio, FLAC for lossless audio, AV1 or VP9 for lossy video, and FFV1 for lossless video.
However, the spec should be honest about the practical reality: MP4 with H.264 works out of the box on every operating system, and MP4 with AV1 is supported on recent systems (Windows 10/11, macOS Ventura+, all major browsers), so I think it stands on equal footing with H.264. MKV and WebM require third-party software (e.g., VLC) on both Windows and macOS, as neither Windows Media Player nor QuickTime supports them natively.
For a standard that aims to make data accessible, acknowledging this trade-off between openness and practical compatibility is important.
See the recent discussion on the NWB side:
NeurodataWithoutBorders/nwbinspector#669 (comment)