recover from H264/H265 DTS extractor failures instead of disconnecting #5714

Open
JohanG-LAS wants to merge 1 commit into bluenviron:main from JohanG-LAS:srt-fix-dts-extractor-recovery

Conversation

@JohanG-LAS
Contributor

Cloud providers regularly perform VM live migrations, which freeze VMs for a couple of seconds. See Azure and Google.
TCP-based transport protocols often handle these "outages" well, but UDP-based protocols can struggle.

The H264/H265 DTS extractors in mediacommon return an error when they encounter conditions they cannot handle (e.g. "too many reordered frames" on a Picture Order Count discontinuity). Until now, MediaMTX propagated that error out of the per-format OnDataFunc callback, which terminated the stream.Reader / recorder instance and dropped the publisher. For live ingest (SRT, RTMP) this is particularly painful: a transient network-induced reorder kills an otherwise healthy session and forces the publisher to reconnect.

Fix

This change adopts a "reset and warn" recovery strategy at every DTS extractor call site:

  • log a Warn describing the failure and that the extractor is being reset
  • set the extractor back to nil so the next IDR re-primes it
  • return nil so the reader / recorder keeps running

The extractor is automatically re-created from the next random access frame, exactly like on the very first frame of a stream. Non-IDR units that arrive before the next IDR are skipped by the existing dtsExtractor == nil branch, so no malformed timestamps are forwarded downstream.

Updated call sites:

  • internal/protocols/mpegts/from_stream.go (H264, H265)
  • internal/protocols/rtmp/from_stream.go (H264, H265)
  • internal/recorder/format_mpegts.go (H264, H265)
  • internal/recorder/format_fmp4.go (H264, H265)

Test

  • internal/protocols/mpegts/from_stream_recovery_test.go: new focused tests for H264 and H265 that prime the extractor, inject a backwards-PTS unit to trigger the failure, verify the warn log, and confirm the next IDR successfully re-primes the extractor.
  • internal/protocols/rtmp/from_stream_recovery_test.go: same scenario for the RTMP H264 path.
  • internal/recorder/recorder_test.go: updated to reflect that a DTS extractor failure no longer tears down the recorder; the discontinuity is now absorbed and the segment closes cleanly on shutdown.

Made-with: Cursor

@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 62.50000% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.32%. Comparing base (cae9920) to head (7501f0a).
⚠️ Report is 29 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/protocols/rtmp/from_stream.go | 50.00% | 5 Missing ⚠️ |
| internal/recorder/format_fmp4.go | 50.00% | 5 Missing ⚠️ |
| internal/recorder/format_mpegts.go | 50.00% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5714      +/-   ##
==========================================
+ Coverage   62.08%   63.32%   +1.23%     
==========================================
  Files         214      217       +3     
  Lines       17602    18278     +676     
==========================================
+ Hits        10929    11574     +645     
+ Misses       5766     5764       -2     
- Partials      907      940      +33     
