Skip to content

Support DTS-HD with packets across PES boundary#3147

Merged
copybara-service[bot] merged 4 commits into
androidx:mainfrom
nift4:dtshdints
May 12, 2026
Merged

Support DTS-HD with packets across PES boundary#3147
copybara-service[bot] merged 4 commits into
androidx:mainfrom
nift4:dtshdints

Conversation

@nift4
Copy link
Copy Markdown
Contributor

@nift4 nift4 commented Mar 28, 2026

In DTS-HD streams, it can happen that core packet is in one PES and the
extension substream is in another PES. Ensure we can read ahead into next PES
in this case. We need to use the old timestamp as extss and core always belong
together.

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 28, 2026

Not enough. Please follow the ffmpeg implementation.

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 28, 2026

@FongMi please can you elaborate a bit?

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 28, 2026

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 28, 2026

@FongMi Well, I see that you did a lot of changes and most of them seem to address the same issue that we shouldn't emit core format but extension format instead (I solved it with the PatientTrackOutput wrapper and you added a new state and coreFormatPendingEmit instead). My fix is a bit subtle because it intercepts the format() call and waits until extension header is parsed before sending, allowing to overwrite the format which is cached. But it works and it's readable.

I also see another issue which I didn't solve but you solved it, that we should wait for core frame after seeking instead of sending extension substream frame first - it makes sense and I'll add it to my patch.

Is there any other problem I need to address?

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 28, 2026

That should be all. I'll test the ISO file again after you finish.

@nift4 nift4 changed the title Support DTS-HD in ts extractor Support differentiating between DTS-HD and DTS Express in TS and Matroska extractors Mar 28, 2026
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 28, 2026

@FongMi It should be good now, let me know if it works. Thanks :)

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 29, 2026

Users have reported that the playback quality of DTS-HD is not as good as my version, and that stuttering occurs. This may be because your Core+EXTSS is sent in segments, while mine is sent all at once.

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 29, 2026

FongMi@6755c3e
Please also implement DTS-MA and DTS-X recognition.
I've switched DtsUtil to your version to reduce merge conflicts.

@nift4 nift4 changed the title Support differentiating between DTS-HD and DTS Express in TS and Matroska extractors Support differentiating between DTS-HD and DTS Express in TS, MP4 and Matroska extractors Mar 29, 2026
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 29, 2026

@FongMi

Users have reported that the playback quality of DTS-HD is not as good as my version, and that stuttering occurs. This may be because your Core+EXTSS is sent in segments, while mine is sent all at once.

Should be fixed, please check again. Thanks for testing!

Please also implement DTS-MA and DTS-X recognition.

It's on my TODO list, but I will do it in a seperate PR when this is merged. This one is already quite big.

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 30, 2026

@nift4 Users say it's still very laggy.

Core Architectural Difference

nick — PatientTrackOutput double-buffer

TS input → PatientTrackOutput internal buffer → flush() → TrackOutput

Every frame goes through two copies:

// Copy 1: into internal ParsableByteArray
public void sampleData(ParsableByteArray data, int length) {
    this.data.ensureCapacity(this.data.limit() + length);
    ByteBuffer tmp = ByteBuffer.wrap(this.data.getData()); // new object every call
    tmp.put(data.getData(), data.getPosition(), length);
}

// Copy 2: out to the real TrackOutput
public void flush() {
    output.sampleData(data, data.limit());
}

fongmi — zero-copy passthrough

TS input → TrackOutput (direct)

Instead of buffering, it adds a STATE_CHECKING_FOR_EXTSS_AFTER_CORE state: after reading a core frame, it peeks 4 bytes to check whether an EXSS follows, then decides to combine or emit immediately — no intermediate buffer needed.


Performance Comparison

Criterion fongmi nick
Data copies per frame 1 (direct write) 2 (buffered)
Per-frame heap allocation None ByteBuffer.wrap() each call
ensureCapacity reallocation risk None Yes (large frames)
DTS:X (XLL-X) auto-detection Yes No
Post-seek resync skipExtssUntilCore (precise) waitingForResyncAfterSeek (basic)
GC pressure Low Higher

Verdict

fongmi is faster, for three reasons:

  1. Zero-copy: data is written directly to TrackOutput, eliminating the PatientTrackOutput intermediate buffer and second memcpy. For large DTS-HD MA frames (potentially hundreds of KB each), this is significant.

  2. No per-frame heap allocation: nick allocates a new ByteBuffer wrapper on every sampleData() call, increasing GC pressure.

  3. Richer functionality: fongmi adds XLL-X scanning to auto-detect DTS:X object audio and properly upgrades the track format to AUDIO_DTS_X, which nick lacks entirely.

nick's PatientTrackOutput is a cleaner abstraction conceptually, but the double-copy cost makes it a poor trade-off for a hot path that processes audio frames continuously.

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 30, 2026

Hi @FongMi, I ran my sample file through both versions of extractors and only difference I saw in result was average bitrate was set on mine. That's set from core header, I imagine it might cause buffer allocation to be too small, so I removed it. Can you re-test?

No per-frame heap allocation: nick allocates a new ByteBuffer wrapper on every sampleData() call, increasing GC pressure.

Also, I don't think that's the problem causing audible lags, but I changed it to only do that on the first frame (after seek).

@rohitjoins
Copy link
Copy Markdown
Contributor

Hi @nift4,

Thank you for all the work and iterations on this!

The PR in its current form, touches / fixes lots of different issues in extractor. Could we break this down further into smaller issues, focusing one thing at a time?

Suggestion:

  • Fix for mime type in TsExtractor
  • Format emission issue in TsExtractor
  • Fix for MatroskaExtractor
  • Fix for Mp4Extractor.

Once done, I'm happy to look at them individually and merge them.

@nift4 nift4 force-pushed the dtshdints branch 2 times, most recently from ea7a9c6 to 90611d7 Compare April 2, 2026 15:08
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 2, 2026

Hi @rohitjoins,

The PR in its current form, touches / fixes lots of different issues in extractor. Could we break this down further into smaller issues, focusing one thing at a time?

Because the other parts depend on mime type detection change in TsExtractor I had it as one branch, otherwise there would be conflicts. But we can instead merge this change first (in this PR, I removed everything except "Fix for mime type in TsExtractor") and I will add the other changes to new PRs once this one is merged.

Also please note that the format = null; change is required (and not left over by mistake) in order to have the DTS-HD MA sample ever work with DumpFileAsserts, because without this line, a seek back to 0 would yield different result and test would not work.

Looking forward to review, thanks!

@nift4 nift4 changed the title Support differentiating between DTS-HD and DTS Express in TS, MP4 and Matroska extractors Support DTS-HD mime type detection in ts extractor Apr 2, 2026
@rohitjoins
Copy link
Copy Markdown
Contributor

Also please note that the format = null; change is required (and not left over by mistake) in order to have the DTS-HD MA sample ever work with DumpFileAsserts, because without this line, a seek back to 0 would yield different result and test would not work.

Thanks for clarifying! While setting format = null; makes the test pass, we unfortunately can't use this in production.

Clearing the format on seek() forces the extractor to re-emit the format. This causes the player to reinitialize the audio decoder on every seek, which leads directly to audio dropouts and the stuttering reported IMO.

The correct way to make the DumpFileAsserts test pass is to avoid the two-step (Core -> HD) format emission entirely. If we wait to check for the extension substream before emitting the format (e.g., using a peek state), the extractor will only ever emit the final DTS-HD format. This ensures the initial playback and the seek() behavior match naturally, fixing the test without causing decoder resets.

@nift4 nift4 changed the title Support DTS-HD mime type detection in ts extractor Support DTS-HD in ts extractor Apr 2, 2026
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 2, 2026

Hi @rohitjoins,

The correct way to make the DumpFileAsserts test pass is to avoid the two-step (Core -> HD) format emission entirely. If we wait to check for the extension substream before emitting the format (e.g., using a peek state), the extractor will only ever emit the final DTS-HD format. This ensures the initial playback and the seek() behavior match naturally, fixing the test without causing decoder resets.

Yes, I did do this, as you can see on my final branch here it's no longer needed to reset the format:
https://github.com/nift4/media/blob/dtshdints-backup/libraries/extractor/src/main/java/androidx/media3/extractor/ts/DtsReader.java#L124

It's only in this PR because the DTS-HD MA sample (which is added to demonstrate the mime type change) is separated from the fix to emit format only once.

So, I guess it was the wrong order to merge changes. Instead, I will remove mime type change from the branch, and add format-only-once fix here. Then we can merge that first, then the mime type fix, and then the MKV/MP4 related changes; and format = null; won't be needed at any point in time.

I pushed this, so now the format-only-fix is in this PR. I opted for this wrapper in order to be minimally invasive, and most importantly, don't duplicate DTS related business logic. FongMi has an approach where there are a bunch of additional states added, which is a different set of tradeoffs for code flow with the same end result. Let me know if mine is OK or if you'd rather have the different state approach.

@rohitjoins
Copy link
Copy Markdown
Contributor

Hi @nift4,

Thanks for explaining your rationale. While I have not compared the performance differences for either of the approaches, I think it would make sense to avoid temporary copies in the memory before being flushed out to the actual output?

don't duplicate DTS related business logic

Can you please expand more on this? Can we avoid this by refactoring into reusable helper methods?

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 8, 2026

Hi @rohitjoins,

I think it would make sense to avoid temporary copies in the memory before being flushed out to the actual output?

Please note there is only one copy, for the very first core frame ,before the format is emitted. After format is emitted to ExtractorOutput, there is no copy anymore and buffer is released (the flush() calls are only used to trigger emitting sampleMetadata, and not sampleData, at this point), as I removed it since FongMi pointed it out. (edit: force pushed to fix a small bug causing useless memory copies)

Given the constraints that:

  • consume() in DtsReader cannot peek into future bytes, or go back
  • there is no guarantee we get all bytes of a frame or even just header on first invocation of consume()
  • we need to read extension substream's header before calling output.format() - and hence before calling output.sampleData()

...I think there is no way to get rid of this one temporary memory copy in any way.

Can you please expand more on this? Can we avoid this by refactoring into reusable helper methods?

We need to read extension substream's header before calling format(). That involves:

  1. first finding core syncword if any (there may be none for DTS Express or DTS-HD MA without backward compatible)
    a. parsing core header
    b. storing core sample data in temporary buffer
  2. then finding extension syncword if any (if there is another core one, emit the core sample data and format BEFORE parsing new one)
    a. then parsing its header
    b. call output.format() based on extension header data
    c. only now we can emit core sampleData()

This is the business logic I was talking about. Almost entirety of consume() method code is required for this job. But consume() method is optimized to directly pass data to ExtractorOutput, even though we cannot pass any data to it until we called format().

The difficulty in extracting helper method here is that consume() method may be called multiple times until we finish with any one specific aspect of this parsing. So there is a need to keep state in the class, as is currently done already.

But due to different needs of:

  • first frame (cannot call ExtractorOutput methods in any case and must cache data in the class with temporary buffer)
  • second frame (based on first frame type, we have to maybe parse header before emitting format, or maybe after, then emit old sample data if any, then proceed as normal)
  • subsequent frames (should instantly output data to ExtractorOutput without special logic)

I am not sure how code can be shared in elegant way. I'd be interested if you see any way to do it nicely.

Let me know if that's clear or if you have further questions, Thanks for the feedback!

@FongMi
Copy link
Copy Markdown

FongMi commented May 2, 2026

Sorry, I don't have the physical ISO file either, because the file is stored on Quark Cloud Drive in China, and you need a Chinese mobile phone number to register for the cloud drive.

@FongMi
Copy link
Copy Markdown

FongMi commented May 2, 2026

I can ask my users for the cloud drive URL, but you may need to find a way to download it yourself.😂

@FongMi
Copy link
Copy Markdown

FongMi commented May 2, 2026

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented May 2, 2026

Hi, I see, indeed I have no way to download from that site. Is it possible to use another site such as https://transfer.it/ ?

@FongMi
Copy link
Copy Markdown

FongMi commented May 2, 2026

I can't download it either, because I'm Taiwanese.😂

@nift4 nift4 force-pushed the dtshdints branch 3 times, most recently from 5bd1f1a to 4c17803 Compare May 7, 2026 16:53
@nift4 nift4 changed the title Support DTS-HD in ts extractor Support DTS-HD with packets across PES boundary May 7, 2026
nift4 and others added 2 commits May 11, 2026 16:18
In DTS-HD streams, it can happen that core packet is in one PES and the
extension substream is in another PES. Ensure we can read ahead into next PES
in this case. We need to use the old timestamp as extss and core always belong
together.
@rohitjoins
Copy link
Copy Markdown
Contributor

Hi @nift4,

I have added a test with single sample across PES boundary. When run on main branch, it would output two samples in the dump file as expected and the reason for stuttering.

It runs correctly to produce a single sample at this PR with the fixes.

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented May 11, 2026

Hi @rohitjoins,

I still don't have an affected sample myself (and I have to admit I'm a bit curious how you made this one), but it's good to know we can verify PR fixes it! Thanks again. It looks good to merge from my side.

@rohitjoins
Copy link
Copy Markdown
Contributor

I first tried using ffmpeg with small PES payload sizes, but it prefers to keep complete audio frames together. So I used a custom Python script to take a raw DTS frame and manually mux it into TS packets, forcing a new PES boundary right in the middle of the frame!

Based on internal review feedback to prevent stale timestamps overriding the clock after a sync loss.
@copybara-service copybara-service Bot merged commit 38f5c46 into androidx:main May 12, 2026
1 check passed
@nift4 nift4 deleted the dtshdints branch May 12, 2026 21:54
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented May 12, 2026

Thanks!

rohitjoins pushed a commit to nift4/media that referenced this pull request May 13, 2026
Read more of the DTS-HD header in order to find out extension
substream type, to get correct mime type which is relevant for
buffer size decision logic (as DTS Express has way lower maximum
bit rate than DTS-HD).

Issue: androidx#2487
Issue: androidx#3147
nift4 added a commit to nift4/media that referenced this pull request May 15, 2026
MP4 sadly doesn't store information required to understand which DTS format is
which either (similar to MKV), so use the helper to detect format in MP4 container.

Issue: androidx#3147
rohitjoins pushed a commit to nift4/media that referenced this pull request May 15, 2026
MP4 sadly doesn't store information required to understand which DTS format is
which either (similar to MKV), so use the helper to detect format in MP4 container.

Issue: androidx#3147
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants