Add common media file definitions for BEP044/BEP047 #2367
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,162 @@ | ||
| # Media Files | ||
|
|
||
| ## Introduction | ||
|
|
||
| Several BIDS datatypes make use of media files — audio recordings, video recordings, | ||
| combined audio-video recordings, and still images. | ||
| This appendix defines the common file formats, metadata conventions, | ||
| and codec identification schemes shared across all datatypes that use media files. | ||
|
|
||
| Datatypes that incorporate media files (for example, behavioral recordings or stimuli) | ||
| define their own file-naming rules, directory placement, and datatype-specific metadata. | ||
| The conventions described here apply uniformly to all such datatypes. | ||
|
|
||
| ## Supported Formats | ||
|
|
||
| ### Audio formats | ||
|
|
||
| | Format | Extension | Description | | ||
| | ---------------------- | --------- | --------------------------------------------- | | ||
| | Waveform Audio (WAV) | `.wav` | Uncompressed PCM audio; lossless, large files | | ||
| | MP3 | `.mp3` | Lossy compressed audio; widely supported | | ||
| | Advanced Audio Coding | `.aac` | Lossy compressed audio; successor to MP3 | | ||
| | Ogg Vorbis | `.ogg` | Open lossy compressed audio format | | ||
|
|
||
| ### Video container formats | ||
|
|
||
| | Format | Extension | Description | | ||
| | ---------------------- | --------- | ---------------------------------------- | | ||
| | MPEG-4 Part 14 | `.mp4` | Widely supported multimedia container | | ||
| | Audio Video Interleave | `.avi` | Legacy multimedia container | | ||
| | Matroska | `.mkv` | Open, flexible multimedia container | | ||
| | WebM | `.webm` | Open format optimized for web delivery | | ||
|
|
||
| ### Image formats | ||
|
|
||
| | Format | Extension | Description | | ||
| | ------------------------- | --------- | -------------------------------------------- | | ||
| | JPEG | `.jpg` | Lossy compressed photographic images | | ||
| | Portable Network Graphics | `.png` | Lossless compressed images with transparency | | ||
| | Scalable Vector Graphics | `.svg` | XML-based vector image format | | ||
| | WebP | `.webp` | Modern format supporting lossy and lossless | | ||
| | Tagged Image File Format | `.tiff` | Lossless format common in scientific imaging | | ||
|
|
||
| When choosing a format, consider the trade-off between file size and data fidelity. | ||
|
The proposal's guidance on format choice mentions the trade-off between file size and data fidelity, but could also mention openness as another axis. Where it does not add friction to existing pipelines, researchers should prefer open, royalty-free formats: Ogg Vorbis for lossy audio, FLAC for lossless audio, AV1 or VP9 for lossy video, and FFV1 for lossless video. However, the spec should be honest about the practical reality: MP4 with H.264 works out of the box on every operating system, and MP4 with AV1 is supported on recent systems (Windows 10/11, macOS Ventura+, all major browsers), so I think it stands on equal footing. MKV and WebM require third-party software (e.g., VLC) on both Windows and macOS, as neither Windows Media Player nor QuickTime supports them natively. For a standard that aims to make data accessible, acknowledging this trade-off between openness and practical compatibility is important. See the recent discussion on the NWB side: |
||
| Uncompressed or lossless formats (WAV, PNG, TIFF) preserve full quality | ||
| but produce larger files. | ||
| Lossy formats (MP3, AAC, JPEG) significantly reduce file size | ||
| at the cost of some data loss. | ||
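To make the size trade-off concrete, the sketch below estimates file sizes for the audio parameters used in the sidecar example in this file (48 kHz, 2 channels, 312.5 s). The 16-bit sample depth and the 128 kbit/s lossy bitrate are illustrative assumptions, not values from the proposal, and the function names are hypothetical.

```python
def pcm_size_bytes(sample_rate_hz: int, channels: int,
                   bits_per_sample: int, duration_s: float) -> int:
    """Payload size of uncompressed PCM audio, ignoring the small WAV header."""
    bytes_per_sample = bits_per_sample // 8
    return round(sample_rate_hz * channels * bytes_per_sample * duration_s)


def constant_bitrate_size_bytes(bitrate_bps: int, duration_s: float) -> int:
    """Approximate size of a constant-bitrate lossy stream, ignoring container overhead."""
    return round(bitrate_bps / 8 * duration_s)


# 48 kHz stereo, 16-bit (assumed), 312.5 s
wav_bytes = pcm_size_bytes(48000, 2, 16, 312.5)
# 128 kbit/s is an assumed, illustrative lossy bitrate
lossy_bytes = constant_bitrate_size_bytes(128_000, 312.5)

print(wav_bytes)                # 60000000 (about 60 MB)
print(lossy_bytes)              # 5000000 (about 5 MB)
print(wav_bytes / lossy_bytes)  # 12.0
```

Under these assumptions the uncompressed recording is roughly twelve times larger than the lossy one; actual ratios depend on the codec and bitrate chosen.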
|
|
||
| ## Media Stream Metadata | ||
|
|
||
| Media files SHOULD be accompanied by a JSON sidecar file | ||
| containing technical metadata about the media streams. | ||
| The following metadata fields are defined for media files: | ||
|
|
||
| ### Duration | ||
|
|
||
| | Field | Suffix | Requirement Level | | ||
| | ---------- | ------------------------------- | ----------------- | | ||
| | `Duration` | `audio`, `video`, `audiovideo` | RECOMMENDED | | ||
|
|
||
| `Duration` is the total duration of the media file in seconds. | ||
| For audio-video files, this is the duration of the longest stream. | ||
|
|
||
| ### Audio stream properties | ||
|
|
||
| | Field | Suffix | Requirement Level | | ||
| | ------------------- | --------------------- | ----------------- | | ||
| | `AudioCodec` | `audio`, `audiovideo` | RECOMMENDED | | ||
| | `AudioSampleRate` | `audio`, `audiovideo` | RECOMMENDED | | ||
| | `AudioChannelCount` | `audio`, `audiovideo` | RECOMMENDED | | ||
| | `AudioCodecRFC6381` | `audio`, `audiovideo` | OPTIONAL | | ||
|
Member
Also, the requirement levels might benefit from being pulled from the schema.
Collaborator
Author
YES! Duplication is evil and I forgot about this aspect while reviewing this one although I typically remember when reviewing PRs of others! TODO -- should be auto rendered based on schema using macros!
Collaborator
The current macros don't support this central column. Some additional design would be needed. |
||
|
|
||
| ### Visual properties | ||
|
The proposal groups images and videos under shared "visual properties" with For images, the convention is less clear and can be field-dependent. A 1920x1080 photograph has an obvious width and height, but a 512x512 microscopy image of a tissue slice has no inherent horizontal or vertical axis. Different imaging domains and tools disagree on ordering: TIFF stores For videos, For images, there is no equivalent single authoritative tool, so the spec needs a conceptual definition instead. Something like: " I have argued on the NWB side that we should use (rows, columns) as unambiguous for images: NeurodataWithoutBorders/nwb-schema#660 (comment) But I think because this proposal mixes both videos and images, it can use the video terminology with a clarification for images. |
||
|
|
||
| | Field | Suffix | Requirement Level | | ||
| | -------- | ----------------------------------- | ----------------- | | ||
| | `Width` | `video`, `audiovideo`, `image` | RECOMMENDED | | ||
| | `Height` | `video`, `audiovideo`, `image` | RECOMMENDED | | ||
|
|
||
| ### Video stream properties | ||
|
I suggest adding a |
||
|
|
||
|
Whether data is grayscale, color, or has an alpha channel affects which analysis pipelines are applicable: pose estimation and segmentation tools often depend on color, while calcium imaging is inherently single-channel. I suggest separating how this is captured for images and video: For video, I suggest adding a For images, I suggest adding |
||
| | Field | Suffix | Requirement Level | | ||
| | ------------------- | --------------------- | ----------------- | | ||
| | `VideoCodec` | `video`, `audiovideo` | RECOMMENDED | | ||
| | `FrameRate` | `video`, `audiovideo` | RECOMMENDED | | ||
|
The proposal includes The spec should indicate whether |
||
| | `VideoCodecRFC6381` | `video`, `audiovideo` | OPTIONAL | | ||
|
|
||
|
I suggest adding a For constant frame rate video, frame count can be derived from |
||
| ## Codec Identification | ||
|
|
||
| Codec identification uses two complementary naming systems: | ||
|
|
||
| ### FFmpeg codec names (RECOMMENDED) | ||
|
|
||
| The `AudioCodec` and `VideoCodec` fields use | ||
| [FFmpeg codec names](https://www.ffmpeg.org/ffmpeg-codecs.html) as the RECOMMENDED | ||
| convention. These names are widely used in scientific computing tools and can be | ||
| extracted automatically from a media file's streams (the `codec_name` entries) using: | ||
|
|
||
| ```bash | ||
| ffprobe -v quiet -print_format json -show_streams <file> | ||
| ``` | ||
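As a rough illustration of how the `ffprobe` output above could be turned into the sidecar fields defined in this appendix, here is a minimal Python sketch. The function names are hypothetical. Taking `FrameRate` from `avg_frame_rate` (rather than `r_frame_rate`) and using only the first video and first audio stream are assumptions made for the example; the proposal does not settle either point.

```python
import json
import subprocess
from fractions import Fraction


def run_ffprobe(path: str) -> dict:
    """Run the ffprobe command quoted above and parse its JSON output."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def sidecar_from_probe(probe: dict) -> dict:
    """Map ffprobe stream entries onto the sidecar fields from this appendix."""
    sidecar = {}
    durations = []
    for stream in probe.get("streams", []):
        # Some containers omit per-stream durations; skip those streams.
        if "duration" in stream:
            durations.append(float(stream["duration"]))
        kind = stream.get("codec_type")
        if kind == "video" and "VideoCodec" not in sidecar:
            sidecar["VideoCodec"] = stream["codec_name"]
            sidecar["Width"] = stream["width"]
            sidecar["Height"] = stream["height"]
            try:
                # ffprobe reports rates as fractions such as "30000/1001".
                sidecar["FrameRate"] = float(Fraction(stream["avg_frame_rate"]))
            except (KeyError, ZeroDivisionError):
                pass  # rate unknown (for example "0/0"); leave the field out
        elif kind == "audio" and "AudioCodec" not in sidecar:
            sidecar["AudioCodec"] = stream["codec_name"]
            sidecar["AudioSampleRate"] = int(stream["sample_rate"])
            sidecar["AudioChannelCount"] = int(stream["channels"])
    if durations:
        # Duration is defined above as that of the longest stream.
        sidecar["Duration"] = max(durations)
    return sidecar
```

With FFmpeg installed, `sidecar_from_probe(run_ffprobe("recording.mp4"))` would return a dictionary ready for `json.dump`; the file name here is a placeholder.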
|
|
||
| ### RFC 6381 codec strings (OPTIONAL) | ||
|
|
||
| The `AudioCodecRFC6381` and `VideoCodecRFC6381` fields use | ||
| [RFC 6381](https://datatracker.ietf.org/doc/html/rfc6381) codec strings. | ||
| These provide precise codec profile and level information useful for | ||
| web and broadcast interoperability. | ||
|
|
||
| ### Common codec reference | ||
|
|
||
| | Codec | FFmpeg Name | RFC 6381 String | Notes | | ||
| | -------------- | ----------- | ------------------ | ----------------------- | | ||
| | H.264 / AVC | `h264` | `avc1.640028` | Most widely supported | | ||
| | H.265 / HEVC | `hevc` | `hev1.1.6.L93.B0` | High efficiency | | ||
| | VP9 | `vp9` | `vp09.00.10.08` | Open, royalty-free | | ||
| | AV1 | `av1` | `av01.0.01M.08` | Next-gen open codec | | ||
| | AAC-LC | `aac` | `mp4a.40.2` | Default audio for MP4 | | ||
| | MP3 | `mp3` | `mp4a.6B` | Legacy lossy audio | | ||
| | Opus | `opus` | `Opus` | Open, low-latency audio | | ||
| | FLAC | `flac` | `fLaC` | Open lossless audio | | ||
| | PCM 16-bit LE | `pcm_s16le` | — | Uncompressed (WAV) | | ||
|
|
||
| The FFmpeg name column shows the value to use for `VideoCodec` or `AudioCodec`. | ||
| The RFC 6381 column shows the value for `VideoCodecRFC6381` or `AudioCodecRFC6381`. | ||
| RFC 6381 strings vary by profile and level; | ||
| the values shown are representative examples. | ||
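To show why RFC 6381 strings vary by profile and level, the sketch below decodes the H.264 form `avc1.PPCCLL`, where the six hexadecimal digits carry the profile indicator, the constraint flags, and the level indicator. The function name and the small profile-name table are illustrative; only three common profiles are listed, and special cases such as level 1b are not handled.

```python
# A few common H.264 profile_idc values; not an exhaustive list.
H264_PROFILE_NAMES = {66: "Baseline", 77: "Main", 100: "High"}


def parse_avc1(codec_string: str) -> dict:
    """Split an RFC 6381 'avc1.PPCCLL' string into profile, flags, and level."""
    sample_entry, _, hex_part = codec_string.partition(".")
    if sample_entry != "avc1" or len(hex_part) != 6:
        raise ValueError(f"not an avc1.PPCCLL string: {codec_string!r}")
    profile_idc = int(hex_part[0:2], 16)       # PP
    constraint_flags = int(hex_part[2:4], 16)  # CC
    level_idc = int(hex_part[4:6], 16)         # LL, ten times the level number
    return {
        "profile_idc": profile_idc,
        "profile_name": H264_PROFILE_NAMES.get(profile_idc, "unknown"),
        "constraint_flags": constraint_flags,
        "level": level_idc / 10,
    }


print(parse_avc1("avc1.640028"))
# {'profile_idc': 100, 'profile_name': 'High', 'constraint_flags': 0, 'level': 4.0}
```

The table's `avc1.640028` therefore denotes High profile at level 4.0; a file encoded with a different profile or level would carry a different string even though its FFmpeg name is still `h264`.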
|
|
||
| ## Privacy Considerations | ||
|
|
||
| Media files — particularly audio and video recordings — may contain | ||
| personally identifiable information (PII), including but not limited to: | ||
|
|
||
| - Voices and speech content | ||
| - Facial features and other physical characteristics | ||
| - Background environments that could identify locations | ||
| - Metadata embedded in file headers (for example, GPS coordinates, device identifiers) | ||
|
|
||
| Researchers MUST ensure that sharing of media files complies with the | ||
| informed consent obtained from participants and with applicable privacy regulations. | ||
| De-identification techniques (for example, voice distortion, face blurring, | ||
| metadata stripping) SHOULD be applied where appropriate before data sharing. | ||
|
|
||
| ## Example | ||
|
|
||
| A complete sidecar JSON file for an audio-video recording: | ||
|
|
||
| ```json | ||
| { | ||
| "Duration": 312.5, | ||
| "VideoCodec": "h264", | ||
| "VideoCodecRFC6381": "avc1.640028", | ||
| "FrameRate": 30, | ||
| "Width": 1920, | ||
| "Height": 1080, | ||
| "AudioCodec": "aac", | ||
| "AudioCodecRFC6381": "mp4a.40.2", | ||
| "AudioSampleRate": 48000, | ||
| "AudioChannelCount": 2 | ||
| } | ||
| ``` | ||
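A sidecar like the one above could be checked against the field tables in this file with a few lines of Python. The sketch below hard-codes the draft's field-to-suffix tables purely for illustration (the review discussion suggests they should eventually be rendered from the schema), and the names `APPLICABLE_SUFFIXES`, `OPTIONAL_FIELDS`, and `check_sidecar` are hypothetical.

```python
# Field -> suffixes it applies to, transcribed from the draft tables above.
APPLICABLE_SUFFIXES = {
    "Duration": {"audio", "video", "audiovideo"},
    "AudioCodec": {"audio", "audiovideo"},
    "AudioSampleRate": {"audio", "audiovideo"},
    "AudioChannelCount": {"audio", "audiovideo"},
    "AudioCodecRFC6381": {"audio", "audiovideo"},
    "Width": {"video", "audiovideo", "image"},
    "Height": {"video", "audiovideo", "image"},
    "VideoCodec": {"video", "audiovideo"},
    "FrameRate": {"video", "audiovideo"},
    "VideoCodecRFC6381": {"video", "audiovideo"},
}
# Fields the draft marks OPTIONAL; all others are RECOMMENDED.
OPTIONAL_FIELDS = {"AudioCodecRFC6381", "VideoCodecRFC6381"}


def check_sidecar(sidecar: dict, suffix: str) -> dict:
    """Report fields that do not apply to `suffix` and RECOMMENDED fields that are absent."""
    inapplicable = sorted(
        field for field in sidecar
        if field in APPLICABLE_SUFFIXES and suffix not in APPLICABLE_SUFFIXES[field]
    )
    missing = sorted(
        field for field, suffixes in APPLICABLE_SUFFIXES.items()
        if suffix in suffixes and field not in OPTIONAL_FIELDS and field not in sidecar
    )
    return {"inapplicable": inapplicable, "missing_recommended": missing}


example = {
    "Duration": 312.5, "VideoCodec": "h264", "VideoCodecRFC6381": "avc1.640028",
    "FrameRate": 30, "Width": 1920, "Height": 1080, "AudioCodec": "aac",
    "AudioCodecRFC6381": "mp4a.40.2", "AudioSampleRate": 48000, "AudioChannelCount": 2,
}
print(check_sidecar(example, "audiovideo"))
# {'inapplicable': [], 'missing_recommended': []}
```

Because the fields are RECOMMENDED rather than REQUIRED, a real validator would presumably report missing ones as warnings, not errors.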
Should these be markdown tables, or schema-rendered macros?
Indeed... I looked at it, and I think we can produce these out of src/schema/objects/extensions.yaml, thus removing duplication and unifying the descriptions.
TODO -- should be auto rendered based on schema using macros, potentially adjusting descriptions in
src/schema/objects/extensions.yaml so they are most expressive and consistent.