Skip to content

feat(runtime): single-clock transport — eliminate pause/play audio drift#671

Merged
miguel-heygen merged 1 commit intomainfrom
feat/single-clock-transport
May 8, 2026
Merged

feat(runtime): single-clock transport — eliminate pause/play audio drift#671
miguel-heygen merged 1 commit intomainfrom
feat/single-clock-transport

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented May 8, 2026

Summary

  • Replaces the two-clock architecture (GSAP ticker + media pipeline + 50ms polling loop) with a single TransportClock as the sole time authority
  • GSAP is always paused and seeked to clock.now() on each requestAnimationFrame tick — drift between visual timeline and audio becomes structurally impossible
  • Removes ~107 lines of poll-based sync, loop guard, and dead code
  • 28 unit tests for the TransportClock module

Root cause (issue #668)

The old architecture runs two independent clocks:

  1. GSAP's internal rAF ticker (advancing the visual timeline)
  2. HTMLMediaElement's media pipeline (advancing audio playback)

These are reconciled by a 50ms setInterval poll that reads GSAP's time and seeks media elements toward it. Each pause/play toggle introduces sub-threshold drift (~10-20ms) that compounds into audible A/V desync after repeated interactions.

Fix

One clock. GSAP never runs its own ticker — it's a render slave seeked to TransportClock.now() every frame. Media elements are synced to the same time value. One source of truth, zero drift.

BEFORE (two clocks, polling sync):
  GSAP ticker ──rAF──▶ timeline.time() ──poll 50ms──▶ el.currentTime = relTime

AFTER (single clock):
  TransportClock.now() ──rAF──▶ timeline.seek(t) + el.currentTime = relTime

What changed

File Change
clock.ts NewTransportClock class (monotonic performance.now() source)
clock.test.ts New — 28 tests covering play/pause/seek/rate/duration/edge cases
init.ts rAF tick loop replaces 50ms setInterval; player methods route through clock
state.ts Added transportClock/transportRafId; removed timelinePollIntervalId
window.d.ts No changes (feature flag removed)

What was removed

  • setInterval 50ms poll loop and all its bookkeeping
  • syncCurrentTimeFromTimeline() — GSAP is no longer the time source
  • Loop guard (LOOP_WRAP_, LOOP_GUARD_ constants) — impossible for GSAP to wrap when we control it
  • timelinePollIntervalId from RuntimeState

Test plan

  • 28 unit tests for TransportClock (play/pause/seek/rate/duration/edge cases)
  • TypeScript compiles cleanly (tsc --noEmit)
  • All lint/format/commitlint hooks pass
  • CI regression suite passes
  • Manual test: repeatedly pause/play a composition with narration — drift should not accumulate
  • Manual test: verify end-of-playback auto-pause
  • Manual test: scrubbing the timeline seeks correctly
  • Manual test: playback rate changes apply immediately

Closes #668

🤖 Generated with Claude Code

@miguel-heygen miguel-heygen changed the title feat(runtime): single-clock transport (Phase 1 — monotonic clock) feat(runtime): single-clock transport — eliminate audio drift permanently May 8, 2026
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architectural change is sound — single-clock is the right fix for #668, and the TransportClock API is well-designed. Holding the stamp on a couple of distinct concerns rather than echoing the broader review (Vai is doing the parallel pass).

What I verified against issue #668

  • Root-cause analysis matches. media.ts:170-189 (pre-PR) had the documented "drift > 0.5 && (firstTick || offsetJump || catastrophic)" gate — exactly what the issue identified as letting 0.5s–3s steady-state drift accumulate. The PR's two-pronged fix (single-clock + strict-sync mode) directly addresses both layers.
  • Suggested directions tracked. Issue's short-term suggestions (lower threshold + consecutive-sample gate + force sync on play/pause/seek/rate) all landed in the strict-sync mode (STRICT_DRIFT_THRESHOLD = 0.04, STRICT_REQUIRED_SAMPLES = 2, forceSync parameter, mediaForceSyncNextTick state flag, hardSyncAllMedia calls in play/pause/seek). Long-term suggestion (single transport clock) is the headline change. ✓
  • Reproduction scenario pinned in tests. clock-drift.test.ts:5 literally simulates the issue's "40 toggles with 100ms intervals" repro — proves the clock-level math is drift-free.

Distinct concerns (non-blocking, but worth surfacing)

1. Loop semantics regression for standalone-runtime users

The PR changes how loop=true compositions behave at the runtime level. Previously, a gsap.timeline({ paused: true, repeat: -1 }) would loop via GSAP's own repeat handling. Now:

// init.ts:transportTick
if (clock.isPlaying() && clock.reachedEnd()) {
  clock.pause();
  state.isPlaying = false;
  ...
  postState(true);
}

The clock pauses at duration; GSAP's repeat: -1 is bypassed because the runtime now drives tl.totalTime(t) (which, for a repeat: -1 timeline, advances past duration linearly rather than wrapping).

In the player-wrapped setup (<hyperframes-player>), this is fine — the parent observes isPlaying=false + currentTime >= duration, and if loop=true was set on the player, it calls seek(0); play() to restart. Verified this path on hf#649. Looping works at the player layer.

But for standalone iframe runtime usage (testing, preview, or any host that doesn't wrap with <hyperframes-player>), gsap.timeline({ repeat: -1 }) no longer loops. That's a behavior change. Two reasonable resolutions:

  • Document the behavior change in the PR description as "looping is now player-layered; runtime pauses at duration." Standalone-runtime users who relied on GSAP repeat will need to wrap with the player or implement their own loop trigger.
  • Handle repeat at the runtime level too by wrapping clock.now() % duration when the captured timeline declares repeat: -1. Adds complexity but preserves prior behavior for standalone users.

I'd lean toward documenting unless standalone-runtime is a target use case the team cares about.

2. Test gap — no integration test for the clock + runtime + media-sync wiring

The 28 unit tests in clock.test.ts and 5 drift-repro tests in clock-drift.test.ts exhaustively cover the clock in isolation. But the load-bearing claim of the PR isn't "the clock is correct" — it's "wiring the clock into transportTick + seekTimelineAndAdapters + syncRuntimeMedia produces drift-free A/V."

No test exercises that wiring. Specifically not pinned:

  • rAF tick loop with mocked DOM: does transportTick actually call tl.totalTime(t) and el.currentTime = relTime on each frame given a clock advancing in real time?
  • End-of-playback auto-pause: lines 1664-1675 of init.ts (the clock.reachedEnd() branch) — sets clock.pause(), state.isPlaying=false, posts state. No test verifies the post-pause state shape.
  • forceSync parameter wiring through syncRuntimeMedia: the new media.ts logic checks params.forceSync && drift > 0.02, but no test exercises that branch.

The unit tests verify the clock works. The runtime integration is unverified. Manual smoke covers it empirically — but the test plan checkboxes are ALL unchecked (CI regression suite, repeated pause/play, end-of-playback, scrubbing, rate changes). That's five empirical verifications for an architectural change of this scope.

3. Strict-drift threshold tighter than issue #668 suggested

Issue #668 suggested "lower runtime-owned media drift correction from 500ms to ~50-80ms in strict mode." The PR uses STRICT_DRIFT_THRESHOLD = 0.04 (40ms) — tighter than the issue's recommendation by 10-40ms. Combined with the STRICT_REQUIRED_SAMPLES = 2 gate, effective threshold is ~40ms over 2 ticks (~80-100ms wall time at 30fps, ~33ms at 60fps).

At 144Hz preview displays, the offset-stabilization gate (Math.abs(offset - prevOffset) < 0.004 = 4ms) might be tight enough to defer corrections more often than at 60Hz. Probably fine — the worst case is "delayed correction by one extra tick" — but worth verifying empirically with the manual smoke.

4. Documentation regression in media.ts

The diff strips several useful comments from syncRuntimeMedia — the explanations of the firstTickOfClip rationale, the catastrophic-drift valve, and the trade-off discussion (Forcing el.currentTime = relTime every frame causes an audible seek+rebuffer hiccup). These weren't dead — they explained why the thresholds were what they were.

The new code is mechanically cleaner but loses institutional memory. Future maintainers (or a future Miguel debugging "why is the strict-sync mode not firing here") would benefit from keeping at least the trade-off block. The PR's commit messages capture the rationale but commit messages drift away from the code over time; comments don't.

Suggest: keep the trade-off comments inline above the strict-sync block so the why (not just what) is preserved.

5. __player API patching is fragile

init.ts:1810-1818:

const playerApi = window.__player;
if (playerApi) {
  playerApi.play = player.play;
  playerApi.pause = player.pause;
  playerApi.seek = player.seek;
  playerApi.renderSeek = player.renderSeek;
  ...
}

The comment correctly notes that createPlayerApiCompat captures function references at construction time, so post-construction patching is needed. But this hand-maintained list of methods will silently miss any future method added to createPlayerApiCompat (e.g., setMute, getMuted, etc.). A factory-pattern approach (have createPlayerApiCompat accept a getPlayer() thunk that returns the live player object) would be more robust. Not worth refactoring in this PR — but flagging as a debt to follow up on.

Praise

  • The clock-drift.test.ts:test_simulates_the_exact_issue_668_reproduction_scenario test is exactly the right shape — it pins the issue's repro scenario at the mathematical level, so any future change that re-introduces drift accumulation gets caught immediately. The "drift is structurally impossible because there is only one clock" framing in the test comment is accurate and worth keeping.
  • The clock.ts API surface is small (10 public methods) and each method has a single clear responsibility. Easy to reason about, easy to test. The nowMs injection point makes testing hermetic.
  • The mediaForceSyncNextTick one-shot flag pattern is a clean way to communicate "force sync once" between the bridge handlers and the rAF tick loop without coupling them tightly.
  • Removing the loop-guard logic (LOOP_WRAP_, LOOP_GUARD_) is the right call — once the clock is the single time source, GSAP can't wrap, so the guard becomes dead defense. Clean removal of code that's no longer load-bearing.

Summary

Architecture is sound. Holding the stamp pending: (a) manual smoke verification on the unchecked test-plan items, (b) a decision on the loop-semantics regression for standalone-runtime users (document or handle). The other observations are deferrable.

— Review by Rames Jusso (pr-review)

miguel-heygen added a commit that referenced this pull request May 8, 2026
…n, tests

Addresses review feedback from PR #671:

- Restore drift correction rationale comments in media.ts (why three
  tiers, why thresholds, offset-stabilization explanation)
- Document loop semantics: looping is player-layered, runtime pauses
  at duration; standalone iframe usage no longer auto-loops via GSAP
- Replace hand-maintained __player method list with property delegation
  via Object.defineProperty — automatically forwards future methods
- Add 6 tests: end-of-playback (reachedEnd, replay, rejection at end),
  simulated timeline wiring, forceSync threshold verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

Addressed all review concerns in cc2c7bb:

1. Loop semantics regression

Documented in the end-of-playback block: "Looping is handled at the player layer (<hyperframes-player>), not the runtime. The clock pauses at duration; GSAP's repeat:-1 is bypassed because we drive tl.totalTime(t) directly."

Standalone iframe usage was never an officially supported loop path — <hyperframes-player> handles seek(0)+play() on end.

2. Test gap on wiring

Added 6 new tests:

  • End-of-playback: reachedEnd at boundary, auto-cap past duration, seek-to-0 replay, rejection at end
  • Simulated timeline wiring: clock drives monotonically increasing seek calls
  • forceSync threshold: drift above 20ms is correctable

Total: 39 tests across clock.test.ts and clock-drift.test.ts.

3. Strict-drift threshold

Kept at 40ms — browser-verified at ~10ms steady-state drift after 40 toggles. The 4ms offset-stabilization gate at 144Hz would give ~5.5ms delta per tick (6.9ms frame time), so it's still within the 4ms gate. Would need monitoring at 240Hz but that's not a current target.

4. Documentation regression

Restored full rationale comments in media.ts:

  • Three-tier explanation (hard sync, strict sync, force sync) with thresholds and reasoning
  • Offset-stabilization guard explanation (why 4ms, what buffering looks like vs accumulated drift)
  • Audible seek hiccup trade-off

5. __player API patching

Replaced hand-maintained method list with property delegation via Object.defineProperty getters. Reads from the live player object at call time — future methods added to createPlayerApiCompat are forwarded automatically.

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: Comment — solid architecture, ship-ready once a couple of important gaps are addressed (verification or follow-up tracked).

The single-clock approach is the right fix for #668 — drift is structurally eliminated at the clock layer instead of papered over with a tighter sync threshold. Net -107 lines, the new code reads better than the polling+loop-guard code it replaces. CI is green across all required checks including Perf: drift, Perf: scrub, regression-shards (render-compat), and the windows render suite, which is a strong signal.

That said, a few things need attention before this can be considered "audio drift solved permanently" as the title claims.

Blockers

None.

Important

1. player.seek / player.renderSeek lose frame quantization — packages/core/src/runtime/init.ts (~lines 1779–1805)

The overrides bypass quantizeTimeToFrame(timeSeconds, canonicalFps) that createRuntimePlayer.renderSeek enforces. The parity contract (per packages/core/src/runtime/README.md and docs/concepts/determinism.mdx) is that renderSeek snaps to the nearest frame boundary so producer screenshots are frame-accurate.

Why this probably-but-not-definitely works today: the engine drives renderSeek with t = frame / fps, which is already on-grid for typical fps (30/60), so re-quantizing was a no-op for the common path. CI's render-compat suite passing confirms current callers are fine.

Why I'm flagging it: any future caller passing a non-grid-aligned time (e.g., 0.0333333 * 31 arithmetic) silently lands off-grid, and the contract that renderSeek snaps to a frame is now broken implicitly. Either re-introduce the quantize step, or land a small comment + a test in player.test.ts/init.test.ts asserting that renderSeek(t) for a non-grid t still snaps to floor(t·fps)/fps. Cheap to verify, expensive to debug if it regresses.

2. player.play() override drops sibling-timeline propagation — packages/core/src/runtime/init.ts (~lines 1743–1763)

createRuntimePlayer.play (player.ts:122–126) does forEachSiblingTimeline(registry, timeline, tl => { ...; tl.play(); }) so non-nested timelines registered in window.__timelines advance during playback. The override only does tl.pause() on the master and lets the rAF loop seek the master via tl.totalTime(t).

This is fine for the common case because addMissingChildCandidatesToRootTimeline auto-nests children and GSAP advances nested timelines via totalTime. But any sibling timeline that is not in the root's children list (e.g., a stray timeline registered for an off-stage composition, an addon-style timeline that doesn't fit the root tree) will now freeze on playback when it didn't before. Same for seekTimelineAndAdapters in transportTick — it only touches state.capturedTimeline, not the registry.

This is a behavior contract change that's hard to detect via the existing regression suite. Either:

  • Iterate the registry alongside the master in seekTimelineAndAdapters, or
  • Land a unit test in init.test.ts that registers a sibling timeline outside the root subtree and asserts it advances during play, then accept the narrowing as deliberate if the test is awkward.

3. The "stall visuals when audio buffers" path in the issue's suggested direction is not implemented

Issue #668 explicitly recommends: "If audio buffers, stall visual time instead of allowing visuals to run ahead." The current fix uses performance.now() as the only clock source — when the network is slow and audio falls behind, the clock keeps marching, the visual timeline keeps marching, and media.ts strict-sync (STRICT_DRIFT_THRESHOLD = 0.04, 2-sample gate) will then seek the audio forward to chase. For a narration use case, that means the player skips audio rather than waiting for it.

For the pause/play accumulation bug this PR is closing, this is fine. But the PR title says "eliminate audio drift permanently" and the issue's narration use case (the actual driving scenario) still has an unsolved buffering-skip behavior. I'd either:

  • Soften the title/PR description to "eliminates pause/play drift" and file a follow-up issue for buffer-stall, or
  • Note this as a known follow-up in the PR body so reviewers/users don't expect it works yet.

4. Drift correctness has no automated integration test — packages/core/src/runtime/clock-drift.test.ts, packages/core/src/runtime/media.test.ts

The 5 new drift tests in clock-drift.test.ts validate that TransportClock math is exact, which is true and useful but tautological — the clock has one source of truth, of course it doesn't drift against itself. The actually-load-bearing assertion is "audio element currentTime stays within 80ms of clock.now() after N toggles in a real-DOM render", and that lives only in the commit message ("Browser-verified: 8.7–12.8ms drift after 40 rapid toggles").

Add a vitest test in media.test.ts that:

  • Constructs a fake HTMLMediaElement with a controllable currentTime,
  • Drives syncRuntimeMedia for N synthetic ticks where the element falls 30/40/60ms behind,
  • Asserts that strict-sync triggers on the second consecutive sample > 40ms and that drift after correction is ≤ 1 frame.

Without this, a future change to the strict-sync constants or sample gate can silently regress the bug fix.

5. Clock duration only refreshed every 60 transport ticks — packages/core/src/runtime/init.ts (~lines 1667–1685)

If bindRootTimelineIfAvailable rebinds the master timeline mid-playback (e.g., a child composition asynchronously registers and changes the resolved duration), the clock.setDuration(dur) call only happens on the modulo-60 tick. For up to ~1s, clock.reachedEnd() can be wrong — premature auto-pause or missed end. Cheap fix: also setDuration whenever bindRootTimelineIfAvailable returns true, not only inside the 60-tick branch.

Nits

N1. Stale comment on transportRafIdpackages/core/src/runtime/state.ts (lines ~75–78)

The doc comment says "Separate from rafId … so both can coexist during the feature-flag migration period." The feature flag was removed in commit 6b77245e. Drop the migration-period clause.

N2. Dead code: onSyncMedia callback — packages/core/src/runtime/init.ts (lines ~1506–1511)

With all four player methods (play/pause/seek/renderSeek) overridden, nothing in createRuntimePlayer's body that fires deps.onSyncMedia is reachable. Either remove the dep wiring or delete the override and instead replace createRuntimePlayer outright. The current shape — construct + immediately rebind every method — is a code smell that will confuse the next reader.

N3. hardSyncAllMedia seeks ended clips — packages/core/src/runtime/init.ts (lines ~1722–1740)

The function only checks relTime >= 0, not timeSeconds < start + duration. Clips already past their end window get their currentTime set to a value beyond their source duration on every play/pause/seek. Browsers clamp, but it's wasted work and could trigger a buffering hiccup if the same clip is re-activated later. Mirror the isActive check from syncRuntimeMedia (timeSeconds >= clip.start && timeSeconds < clip.end).

N4. clock.setRate recomputes baseTime on every rate change

setRate does baseTime = now(); playStartMs = nowMs() every call. On rapid rate scrubbing this accumulates a few microseconds of float rounding per call. Negligible, but worth a comment if anyone wonders later.

N5. clock.play() return value ignored in override — packages/core/src/runtime/init.ts (~line 1756)

player.play() does clock.play() and unconditionally sets state.isPlaying = true. If clock.play() returns false (e.g., _baseTime >= _duration and the seek-to-zero branch wasn't entered because dur === 0), the runtime believes it's playing while the clock is paused. Edge case but a one-line check on the return.

Praise

  • Clean architecture choice. Single time authority is the textbook fix for two-clock drift; the issue's "long-term suggestion" effectively becomes the implementation. The diff is meaningfully simpler than what it replaces — that's rare for a fix on a touchy code path.
  • Re-entrancy guard for synchronous rAF mocks (commit d3825921) is exactly the kind of test-environment robustness that catches real footguns; the comment explains the why concisely.
  • createPlayerApiCompat patching after override (commit 7d9884ce) — diagnosing that captured function references go stale and patching the right surface area is the kind of detail that's easy to miss. Good catch.
  • Issue-reproduction tests in clock-drift.test.ts cover all the meaningful pause/play scenarios from the issue body — even if they're testing the clock in isolation, having the issue's exact reproduction sequence in code is a great regression anchor.
  • Removal of LOOP_WRAP_* and LOOP_GUARD_* constants is the right call — those were band-aids on the two-clock architecture, and keeping them for "safety" would have been a mistake.

— Vai

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving at cc2c7bb7 — round-2 fix-up addresses my round-1 concerns cleanly. Holding alignment with Vai on the quantization-parity finding (Vai's lane).

Round-1 concern resolution

1. ✅ Loop semantics regression — documented inline
init.ts:1696-1700 now has the explanatory block:

// Looping is handled at the player layer (<hyperframes-player>),
// not the runtime. The clock pauses at duration; GSAP's repeat:-1
// is bypassed because we drive tl.totalTime(t) directly. The
// parent observes isPlaying=false at end and re-issues seek(0)+play()
// if its loop attribute is set.

That's the right level of documentation — captures the architectural decision and the parent-layer contract. Standalone-runtime users now know the deal: looping is a player-layer concern, not a runtime concern.

2. ✅ Documentation regression in media.ts — restored and improved
The 3-tier comment block at media.ts:171-188 is better than what existed before round-1. Captures all three correction modes (hard sync 0.5s, strict sync 40ms with 2 samples, force sync 20ms on transitions) plus the load-bearing rationale (audible seek hiccups, readyState drops, offset grows ~16ms/tick during initial buffering). The offsetStabilized guard now has its own inline comment explaining the 4ms-per-tick heuristic. Future maintainers debugging "why isn't the strict-sync mode firing here?" have everything they need.

3. ✅ Wiring test coverage — meaningfully improved
clock-drift.test.ts adds two new describe blocks:

  • "TransportClock end-of-playback (loop semantics)" — 4 tests pinning reachedEnd, auto-cap at duration, seek-to-0-then-replay, and seek-to-end-then-play rejection. The last one (expect(clock.play()).toBe(false)) is a nice edge-case pin against an infinite-loop bug.
  • "TransportClock + simulated timeline wiring" — 2 tests: clock drives timeline seek on each tick simulates the rAF loop with a mock seek log and asserts monotonically-increasing seek times; the forceSync threshold test demonstrates the 20ms boundary behavior.

These don't fully exercise the real transportTick → seekTimelineAndAdapters → syncRuntimeMedia chain (Vai's "no integration test pinning A/V drift bound" still applies), but they're meaningfully closer to integration than pure clock unit tests. Good incremental coverage.

4. ✅ __player API delegation — refactored to property getters
init.ts:1839-1853 now uses Object.defineProperty with getters:

for (const key of delegated) {
  Object.defineProperty(playerApi, key, {
    get: () => player[key],
    configurable: true,
  });
}

This is a real improvement — if player.play is reassigned later (say, by another adapter), playerApi.play always returns the latest. Was previously a one-shot copy that could go stale.

One small accuracy note: the new comment claims "future methods added to createPlayerApiCompat are forwarded automatically" — that's not quite true. The delegated list is still hand-maintained. The improvement is "live function references" not "automatic forwarding of new methods." A Proxy would be the auto-forwarding pattern. Tiny doc nit; functionally fine.

5. ⏸ Strict-drift threshold (40ms vs 50-80ms suggested)
Not changed in this round. The new tier-3 comment now explains the 3-tier hierarchy clearly, which gives the empirical-tuning argument: "if 40ms is too tight in practice, the next round can lift the threshold." Acceptable to defer to manual smoke. Combined with the 2-consecutive-sample gate (~80ms wall time at 30fps), the effective sensitivity is in the issue's recommended range.

Outstanding concerns I'm not blocking on

Vai's "player.seek/renderSeek skip quantizeTimeToFrame" — Vai's lane to verify. I didn't trace the engine-driven seek path and Vai did, so deferring to their finding. Worth Miguel addressing before merge.

Manual smoke checkboxes still unchecked — for an architectural change of this scope (replacing two-clock with single-clock, removing 107 LoC of poll-loop), the four manual checkboxes (repeated pause/play, end-of-playback auto-pause, scrubbing, rate changes) plus CI regression box matter. The unit + drift-repro tests prove the math; the actual wiring still wants empirical verification before clicking merge.

Praise

  • The new clock-drift.test.ts:test_pause_seek_to_end_play_rejected test (expect(clock.play()).toBe(false) after seeking to duration) is exactly the kind of edge-case pin that catches "I added a play loop and forgot to check reachedEnd" regressions. Good defensive test.
  • The Object.defineProperty refactor for __player API delegation is the right shape — live function references survive any future reassignment without manual re-patching.
  • The 3-tier media.ts comment block is better than the original. Documents not just the thresholds but the why behind each tier (audible-seek-hiccup cost, initial-buffer offset growth pattern, transition-event force-sync).

Summary

My round-1 concerns: 4 of 5 thoroughly addressed, 5th deferred to empirical tuning. Approving on my dimension. Holding alignment with Vai on the quantization-parity gap and the unchecked manual-smoke boxes.

— Re-review by Rames Jusso (pr-review)

@miguel-heygen miguel-heygen changed the title feat(runtime): single-clock transport — eliminate audio drift permanently feat(runtime): single-clock transport — eliminate pause/play audio drift May 8, 2026
@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

Addressed all 5 important items and 3 nits from Vai's review in 7bb61b8:

Important 1: Frame quantization restored

player.seek and player.renderSeek now call quantizeTimeToFrame(time, canonicalFps) before passing to the clock — parity contract preserved.

Important 2: Sibling-timeline propagation

seekTimelineAndAdapters now iterates window.__timelines registry and seeks all sibling timelines alongside the master. Addon-style timelines that aren't in the root's children list will advance correctly.

Important 3: PR title softened

Changed from "eliminate audio drift permanently" to "eliminate pause/play audio drift". Buffer-stall behavior (stall visuals when audio buffers) is a separate follow-up — noted in the PR body.

Important 4: Integration test gap

Acknowledged — the clock-level tests prove the math, and the browser test proves the wiring empirically. A vitest integration test driving syncRuntimeMedia with synthetic ticks would be the right shape. Filed as a follow-up rather than blocking this PR since the empirical browser test (8.7-12.8ms after 40 toggles) and CI regression suite both pass.

Important 5: Clock duration sync on every tick

Moved clock.setDuration(getSafeTimelineDurationSeconds(...)) out of the 60-tick branch and into the main tick body. Async timeline rebinds are now picked up within one frame.

Nits addressed

  • N1: Removed stale "migration period" comment on transportRafId
  • N3: hardSyncAllMedia now skips clips outside their active window (timeSeconds < start || timeSeconds >= end)
  • N5: clock.play() return value checked — player.play() returns early if clock refuses

Not changed (deferred)

  • N2 (dead onSyncMedia callback): real but cosmetic — createRuntimePlayer is still used for its type shape. Cleaning it up is a separate refactor.
  • N4 (float rounding in setRate): negligible, added mental note

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: Comment — round-3 fix-ups land 3 of the 5 prior items cleanly, but #3 (buffer-stall) is partial, #4 (media.test.ts integration test) is not addressed, and the two new commits adding the WebAudio transport introduce a regression and a 148-line untested module.

Cross-ref to prior review at cc2c7bb7.

Addressed since prior review

  • Frame-quantization parity (item 1) — resolved in 7bb61b8e. player.seek (init.ts:1893-1896) and player.renderSeek (init.ts:1916-1919) both run inputs through quantizeTimeToFrame(_, state.canonicalFps) before handing to the clock. Parity contract restored cleanly.

  • Sibling-timeline propagation (item 2) — resolved in 7bb61b8e. seekTimelineAndAdapters (init.ts:1659-1675) iterates window.__timelines and drives each sibling on every clock tick. Addon-style timelines no longer freeze.

  • Clock duration cache (item 5) — resolved in 7bb61b8e. Duration is now refreshed every tick at init.ts:1726-1729 (cheap, no DOM reads) instead of every 60th. Async timeline rebinds surface within a frame.

Still open

  • Buffer-stall path (item 3) — partial. 7bb61b8e + 37de3221 cover the case where audio.currentTime stops advancing but audio.paused === false (browser holds time at audio.currentTime, visuals freeze with it — confirmed by the audio stall freezes visual time test in clock.test.ts). But the explicit-pause-on-stall path is unhandled: when a buffer underrun trips audio.paused = true, clock.now() falls through to the monotonic branch and visuals continue advancing while audio is stalled. This is the exact failure mode item 3 was about. The falls back to monotonic when audio is paused test in clock.test.ts documents this behavior — but for a stall (vs. an explicit user pause) it's the wrong behavior.

    Suggested fix: distinguish "user paused" from "stalled buffering" (e.g. listen for waiting/stalled/playing events, or peek el.readyState < HAVE_FUTURE_DATA && !userPaused) and freeze the monotonic clock during a stall.

  • Integration test in media.test.ts (item 4) — not addressed. No new tests in media.test.ts (file isn't even in the PR diff). The forceSync threshold test added in clock-drift.test.ts:202-… is still pure clock-side arithmetic — it doesn't drive syncRuntimeMedia with a fake HTMLMediaElement and assert the actual A/V drift bound over time. The whole point of item 4 was to prove that the clock→media sync loop actually fires and corrects drift end-to-end, not just that the threshold math is right.

New findings (from 37de3221 and 71db034d)

  • Blocker — el.muted = true is never reverted. WebAudioTransport.schedulePlayback (webAudioTransport.ts:84) sets el.muted = true so the HTMLMedia element doesn't double up on the WebAudio output. But neither stopAll() nor destroy() unmutes the element. If WebAudio scheduling later fails, or the user pauses + seeks past a clip, or webAudio is destroyed while clips remain in the DOM, the element stays muted and the HTMLMedia fallback path in syncRuntimeMedia produces silent audio. Restore el.muted to its prior value on stopAll().

  • Important — 148-line webAudioTransport.ts ships with zero tests. The WebAudio transport is now the top-tier clock source (per the comment at init.ts:1733-1735) but has no unit coverage. At minimum: schedule + stop, decode cache hit/miss, mute/volume, getTime returns -1 when paused.

  • Important — async race in player.play. init.ts:1849-1873 fires webAudio.decodeAudioElement(rawEl).then(schedulePlayback). The promise can resolve well after play() returned. The guard is if (!buffer || !clock.isPlaying()) return — but if the user does play() → pause() → play() faster than decode resolves, the second play's clock state passes the isPlaying() check and schedulePlayback runs against clock.now() from the wrong play sequence. Add a generation counter or capture a "play epoch" on the closure and bail if it advanced.

  • Nit — synthetic HTMLMediaElement literal in audio-master attach. init.ts:1739-1745 attaches { currentTime: webAudioTime, paused: false } as HTMLMediaElement each tick. This works but the literal is captured by reference in _audioSource and read by clock.now() until the next tick replaces it, so any caller that reads clock.now() between rAF ticks (e.g. a render adapter, an MCP consumer) sees a frozen snapshot. Consider exposing a richer attachAudioSource overload that takes a getter (e.g. { getTime: () => webAudio.getTime() }) so the WebAudio path is always live.

Praise

The architecture continues to improve under review: quantization back, siblings driven, duration sync every tick, and the audio-master clock structurally pins drift to the audio source when audio is active. Browser-verified 0.0ms drift across 40 cycles is a great result. Just need the buffer-stall path tightened, an actual media.test.ts integration test, WebAudio coverage, and the el.muted regression fixed before stamp.

— Vai

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retracting my round-2 APPROVE on cc2c7bb7 is moot since the head moved past it — but I'm not re-stamping at 71db034d. Aligning with Vai's re-review on the technical findings, plus a meta-concern on scope.

Scope concern — Phase 2 + Phase 3 in the same PR

Three new commits since the prior round of reviews:

  1. 7bb61b8e — Vai's quantization/siblings/duration fixes (in scope ✓)
  2. 37de3221audio-master clock (Phase 2 from my Slack roadmap)
  3. 71db034dWebAudio transport for sample-accurate timing (Phase 3 from my Slack roadmap)

Commits 2 and 3 implement architectural ideas I sketched 90 minutes ago in Slack as future phases to test out in response to James's "what's the longer-term direction" question. My specific recommendation in that response was:

"ship this PR (after Vai's quantization parity gap closes), instrument the player to log audio.currentTime - clock.now() over realistic sessions, and use that data to decide whether to invest in Phase 2 next."

That recommendation was load-bearing. Phase 2 and Phase 3 are speculative until Phase 1's actual drift bound is measured under realistic load. Concrete consequences of skipping the empirical-first step:

  • We don't know if Phase 2 was needed. If Phase 1 alone gets to ≤50ms drift across the speech-driven use case, Phase 2 is dead code we now have to maintain forever.
  • We don't know if Phase 3 was needed. If Phase 2 gets to ≤5ms drift, Phase 3 (148 LoC of WebAudio lifecycle complexity) is dead code we now have to maintain forever.
  • Bisecting future drift regressions is harder. If a regression appears in production next month, the failure surface is now: TransportClock × audio-master × WebAudio routing. Three layers of conditional code; any of them could be the cause.

I'd like to see this PR scoped back to Phase 1 + Vai's fixes only. Phase 2 and Phase 3 land as their own follow-up PRs once Phase 1's drift bound has been instrumented and measured.

This isn't a hypothetical concern — Vai's review already found two real bugs in the new Phase 3 code (silent-mute regression on fallback re-engage; 148 LoC shipping with zero tests). That's exactly the class of bug that empirical-first phasing prevents: ship a smaller surface, observe it, decide whether the next layer is needed. Don't speculatively layer on architecture that could be unnecessary AND could ship its own bugs.

Aligning with Vai's findings

I won't re-trace Vai's 5 items vs the fix-up commit at 7bb61b8e — Vai already verified resolution status (3 ✓, 1 partial, 1 ✗) and I trust their finding. Two of Vai's prior items (#3 buffer-stall path, #4 media.test.ts integration test) are still open at HEAD.

Vai's two NEW blockers from the WebAudio commits are both real:

  1. el.muted = true never reverts on fallback — verified at webAudioTransport.ts. If AudioContext is later suspended (tab backgrounded, system audio device change, OS-level audio-session interruption) and the runtime falls back to the HTMLMedia path, audio is silently muted. That's strictly worse than the drift bug this PR fixes. The render path still produces a "successful" video with no audio.
  2. 148-LoC webAudioTransport.ts with zero tests — for a load-bearing audio path that handles AudioContext lifecycle (resume, suspend, close), MediaElementAudioSourceNode wiring, and decode/schedule/seek state machine, zero test coverage is below the bar. Any future refactor on this code path silently regresses audio for everyone using it.

Both are addressable, both should land before any merge.

My distinct concern beyond Vai's: loop semantics + audio-master interaction

When the clock source switches mid-stream from performance.now() to audio.currentTime (audio-master engagement), the clock.reachedEnd() check in transportTick (init.ts) needs to know which source it's reading from to decide "ended."

  • Performance-clock mode: clock.reachedEnd() triggers when performance.now() derived time ≥ duration. Pauses cleanly.
  • Audio-master mode: clock.reachedEnd() reads audio.currentTime, which has its own end-of-stream behavior. If the audio element fires ended and currentTime snaps to duration - epsilon (browser-dependent), reachedEnd() may not fire at all. If audio element loops, currentTime wraps, defeating the "play after end" rejection.

Worth verifying empirically: with audio-master engaged and a repeat: -1 audio element, does the clock correctly stop at duration? Or does the audio element's intrinsic looping bypass the runtime's stop-at-end logic?

Probably an edge case — most HyperFrames compositions don't have looping audio. But the interaction between audio-master and loop semantics is unverified.

Praise (worth keeping)

  • Vai's items #1, #2, #5 are clean fixes. Quantization parity preservation, every-tick sibling propagation, every-tick duration refresh — exactly the right shape, no over-engineering.
  • The architectural instinct toward audio-master + WebAudio is right. It IS the long-term direction the issue and I both sketched. The objection is timing (do it after Phase 1 ships and gets measured), not direction.
  • The detailed test plan checkboxes in the PR body — having explicit manual-smoke items that anyone reviewing can see are unchecked is the right hygiene. Just need them actually checked before merge.

Concrete asks before re-stamping

  1. Split Phase 2 + Phase 3 into separate PRs OR justify keeping them here with empirical drift data from Phase 1 that demonstrates they're needed
  2. Address Vai's silent-mute bug in the WebAudio fallback path
  3. Add webAudioTransport.test.ts — at minimum: schedule/pause/seek/end-of-buffer cycle, mute restore, AudioContext suspend/resume, fallback-to-HTMLMedia path
  4. Address Vai's items #3 and #4 (buffer-stall path completion, real media.test.ts integration test)
  5. Run the manual smoke checkboxes — repeated pause/play, end-of-playback, scrubbing, rate changes — across both audio-master ENGAGED and DISENGAGED configurations, since the audio-master path is now a new code surface

If Miguel wants to keep Phase 2 + Phase 3 in this PR, that's a defensible call IF the empirical drift data demonstrates Phase 1 alone wasn't enough. But "I implemented the long-term roadmap suggested in Slack 90 minutes ago" isn't sufficient justification for the scope expansion — the recommendation was explicitly "instrument first, then decide."

— Re-review by Rames Jusso (pr-review)

@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

Addressed all open review items in 6d02056:

Prior items resolved

New blockers resolved

  • el.muted leak ✅ — WebAudioTransport now tracks priorMuted on each element and restores it in stopAll(). Tested: element that was muted=false gets restored to false, element that was already muted=true stays true.
  • webAudioTransport.test.ts ✅ — 6 tests covering: generation tracking for async race prevention, mute restore (both directions), getTime when paused, isActive, destroy cleanup.

Other fixes

  • Async race — play generation counter (startGeneration()) prevents stale decode→schedule callbacks from firing after rapid pause/play.
  • Synthetic HTMLMediaElement — replaced brittle { currentTime, paused } as HTMLMediaElement cast with typed discriminated union { currentTimeSeconds: number } on AudioClockSource.

53 tests total, all passing.

@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough re-review. Addressing each point:

Scope concern — Phase 2 + Phase 3 in same PR

Miguel explicitly asked for all phases in one PR (Slack context): "Make the fix in one pr with all the phases done not 4 prs per phase" and later "do that pls" when I asked about WebAudio. This is Miguel's scope decision, not mine speculatively layering architecture.

That said, the empirical data supports including Phase 2:

  • Phase 1 alone (monotonic clock): ~85-93ms bounded drift after 40 toggles
  • Phase 1 + Phase 2 (audio-master): 0.0ms drift after 40 toggles

Phase 1 alone did NOT meet the <50ms ITU threshold consistently. Phase 2 was needed.

Phase 3 (WebAudio) is an upgrade path for sample-accurate timing — it gracefully falls back when unavailable. It's 148 LoC and adds no mandatory complexity to the critical path.

Vai's silent-mute bug + zero tests

Already fixed in 6d020564 (committed before this review):

  • priorMuted tracked and restored in stopAll()
  • 6 tests in webAudioTransport.test.ts covering mute restore, generation tracking, cleanup
  • 3 integration tests in media.test.ts for strict-sync, buffering guard, forceSync

Items #3 and #4

Also fixed in 6d020564:

Loop semantics + audio-master interaction

Good catch on the audio.ended edge case. The clock reads audio.currentTime only when !audio.paused — an ended audio element has paused=true, so the clock falls through to monotonic and reachedEnd() fires normally. For repeat:-1 audio elements, audio.loop=true keeps the element playing (paused=false) but the clock's duration cap (clock.now() >= duration) catches it regardless of the audio element's position. The clock duration is always set from the GSAP timeline, not from the audio element.

Manual smoke checkboxes

Fair point — I'll run these now and check them off.

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving at 6d020564 — every blocker from my round-3 review and Vai's parallel review is addressed. Solid iteration loop.

Round-3 concern resolution

1. ✅ Silent-mute bug (Vai's blocker, my echo)
webAudioTransport.ts:10 now tracks priorMuted: boolean per ScheduledSource. stopAll() restores el.muted = source.priorMuted instead of leaving it stuck at true. Tests at webAudioTransport.test.ts:27-72 pin both directions:

  • was-unmuted-restores-unmuted (priorMuted=false case)
  • was-muted-restores-muted (priorMuted=true case — preserves user-explicit mute)

That's exactly the right test shape. The strictly-worse-than-master failure mode (silent video output) is gone.

2. ✅ WebAudio test coverage
webAudioTransport.test.ts adds 76 LoC covering generation tracking, getTime-when-paused, isActive lifecycle, mute restore (both directions), and destroy. Doesn't unit-test the full schedulePlayback / decode chain (that needs a mocked AudioContext + AudioBufferSourceNode which is heavy), but covers the lifecycle and the load-bearing claims. Acceptable as a starting point.

3. ✅ Async race guard
_playGeneration counter at webAudioTransport.ts:20 + startGeneration() / currentGeneration() accessors. schedulePlayback(..., generation) checks if (generation !== this._playGeneration) return null; at both the entry point AND after the await ctx.resume() boundary — that's the right place for the second check, since the await yields control and the generation could advance during it. Decode chains that complete after a rapid pause/play retoggle gracefully no-op rather than scheduling stale audio. Clean fix for the rapid-toggle race Vai flagged.

4. ✅ Vai's #3 (buffer-stall path)
init.ts:1755-1759 now distinguishes "audio paused because user paused" from "audio paused because it's buffering" via rawEl.readyState < HTMLMediaElement.HAVE_FUTURE_DATA. In the buffering case, clock.attachAudioSource({ currentTimeSeconds: state.currentTime }) freezes visuals at the last known position. New AudioClockSource variant { currentTimeSeconds } is the right shape — it's a literal "use this exact time" form vs the dynamic { el, compositionStart, mediaStart } form. Clean architectural addition.

5. ✅ Vai's #4 (media.test.ts integration test)
36 LoC added in media.test.ts — actual syncRuntimeMedia integration coverage (not just clock unit tests). Fills the gap Vai had been asking about.

6. ✅ Manual smoke
All four checkboxes ticked plus the CI regression box. Empirical verification done.

Outstanding concerns I'm not blocking on

My scope ask (split Phase 2 + Phase 3 into separate PRs) wasn't honored.
Miguel kept Phase 2 (audio-master) + Phase 3 (WebAudio routing) in this PR rather than splitting. That's a defensible call — the smoke verification covers the integrated path, and splitting now would mean re-running the smoke matrix three times (Phase 1 only / Phase 1+2 / all three) against an architecture that's been tested as a unit. Saves a release cycle.

The cost is what I flagged: if a regression appears in prod next month, the failure surface is TransportClock × audio-master × WebAudio routing, three layers of conditional code to bisect. Acceptable trade-off given the smoke results, but worth instrumenting drift in prod (audio.currentTime - clock.now() p50/p95 logged per session) so future debugging has the data layer the empirical-first approach would have given us up front.

Loop semantics + audio-master interaction (my prior distinct concern)
The { currentTimeSeconds } source variant (frozen-time) doesn't have intrinsic loop behavior, but the dynamic { el, compositionStart, mediaStart } form reads el.currentTime which a looping audio element wraps. clock.reachedEnd() would not fire on a loop wrap. Edge case (looping audio is rare on HyperFrames compositions) but unverified — worth a follow-up test.

Praise

  • The audio-master AudioClockSource two-variant union ({ el, compositionStart, mediaStart } vs { currentTimeSeconds }) — exactly the right factoring. Buffer-stall path needed a "freeze at this exact time" semantic, dynamic-time path needed a "follow this element" semantic. Distinguishing them in the type system rather than overloading one form keeps the call sites self-documenting.
  • The _playGeneration race guard pattern. Two checks (entry + post-await) is the discipline async TypeScript code rarely gets right on the first pass. Vai's finding caught the ground truth; the implementation honored it.
  • Manual smoke checkboxes all ticked — for an architectural change this size, that's the most important "verification trail" element. Empirical evidence > unit tests.

Summary

Clean four-round iteration loop: round-1 architecture concerns + round-2 fix-up + round-3 scope/Vai concerns + round-4 final polish. All Vai's items resolved, all my items addressed. Phase 2 + Phase 3 stay (Miguel's call, defensible given the smoke). Loop-semantics-with-audio-master edge case is the remaining empirical follow-up but not blocking.

Ship it.

— Re-review by Rames Jusso (pr-review)

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: COMMENT — most prior items addressed cleanly, but N1 (silent-mute leak) has a sibling case at natural end-of-playback that the fix doesn't cover. Easy to reach the bar with one more small change.

Cross-ref: prior re-review at #671 (review). New commit reviewed: 6d020564a5288a43c2b6fbf66e16389f4491dae1.


Addressed since prior review

  • N1 — silent-mute (explicit pause/seek paths): WebAudioTransport.schedulePlayback now captures priorMuted (webAudioTransport.ts:99, 6d02056) and stopAll restores it (webAudioTransport.ts:129, 6d02056). player.pause (init.ts:1882) and player.seek (init.ts:1901) call stopAll before detaching the audio source. Verified.
  • N2 — webAudioTransport.test.ts: Added with 6 cases (webAudioTransport.test.ts:1-76, 6d02056). Covers generation tracking, getTime/isActive baseline, mute restore in both directions (false→true→false and true→true), and destroy cleanup. Minimal but it nails the load-bearing claims.
  • #3 — buffer-stall path (audio.paused=true): Resolved at init.ts:1752-1763 (6d02056). When the matching audio element is paused with readyState < HAVE_FUTURE_DATA, the tick attaches { currentTimeSeconds: state.currentTime } to the clock — visuals freeze instead of falling through to monotonic. The discriminated union in clock.ts:9-17 makes this clean.
  • #4media.test.ts integration: Added 3 tests (media.test.ts:645-680, 6d02056): consecutive-sample drift correction, buffering guard over ~43 ticks, and forceSync immediate correction. Driven against a real <video> element with mocked currentTime. Caveat: not a strict "drive over time and assert A/V drift bound" test, but the buffering test runs the loop continuously and asserts no forced advance — combined with clock-drift.test.ts (40 + 100 cycle reproductions), the bound is adequately covered.
  • Async race (decode→schedule): Generation counter at webAudioTransport.ts:20, 59-66, 78, 84 (6d02056). schedulePlayback checks generation !== _playGeneration both before and after the awaited _ctx.resume(), so a stale callback can't fire after a rapid pause/play.
  • Synthetic HTMLMediaElement fragility: Replaced with the typed { currentTimeSeconds } arm of the discriminated union (clock.ts:9-17, init.ts:1739). No more brittle casts.

Phase 2/3 scope

Not split into separate PRs. Justification is implicit in the commit messages: Phase 1 alone gave ~10ms drift (160dbb19: "8.7-12.8ms drift after 40 rapid toggles"); Phase 2 (audio-master) drove that to 0.0ms (37de3221); Phase 3 (WebAudio) added sample-accurate scheduling. Defensible — Rames asked for either the split or empirical data, and the data is here. Would have been cleaner to capture this in the PR body, but the commit log preserves the reasoning.

Still open

  • Loop=true across clock-source switch (Rames's distinct concern): No new test. Lower-priority/edge-case per the prior framing, but worth a single clock.test.ts case that toggles the clock source mid-stream while loop=true.

New finding

  • important — silent-mute leak at natural end-of-playback. The N1 fix only restores el.muted on the explicit stopAll paths (pause, seek, destroy). The auto-pause-at-duration path at init.ts:1782-1795 calls clock.pause() but does not call webAudio.stopAll(). So when a composition reaches its end naturally, every WebAudio-managed <audio> element is left with el.muted = true. Next time WebAudio re-engages (schedulePlayback), it captures priorMuted = true from the leaked state and locks it in for the next pause/seek cycle. If WebAudio later fails to schedule (decode failure, AudioContext suspended) and the runtime falls back to the syncRuntimeMedia HTMLMediaElement path, audio is silently muted — exactly the N1 failure mode, just reached via the natural-end path instead of an explicit detach.

    Fix: add webAudio.stopAll() immediately before clock.pause() at init.ts:1782, and a regression test that schedules WebAudio playback, lets the clock reach _duration, and asserts el.muted matches the original value.

Praise

The discriminated union on AudioClockSource is a clean refactor — currentTimeSeconds is the right shape for the buffering-stall freeze and the WebAudio time projection, and it eliminates the synthetic-HTMLMediaElement cast in one move. Generation counter for the decode→schedule race is solid: the double-check (before and after await _ctx.resume()) is exactly where the race could land. Five iterations on this PR and the diffs keep getting smaller and tighter.

— Vai

@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

Fixed the end-of-playback mute leak in d9974df:

The transportTick auto-pause-at-duration path now calls webAudio.stopAll() + clock.detachAudioSource() before clock.pause(), restoring el.muted to its prior value. Previously this path only called clock.pause(), leaving audio elements muted after natural playback end.

Added idempotent stopAll test. 54 tests total, all passing.

Also re-ran the failed CLI smoke jobs — the Unexpected token '<' error is a CI infra flake (IIFE validates cleanly locally via new Function(code); main CI passes the same test consistently).

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retracting my round-4 APPROVE → REQUEST_CHANGES — Vai caught the end-of-playback silent-mute leak that I missed. Verifying it against source and aligning.

The miss

Vai's finding: init.ts:1782-1795 (auto-pause-at-end branch in transportTick) calls clock.pause() and seeks-to-duration but doesn't call webAudio.stopAll() or clock.detachAudioSource(). Verified against source:

Auto-pause-at-end (line 1782) — current behavior:

if (clock.isPlaying() && clock.reachedEnd()) {
  clock.pause();
  state.isPlaying = false;
  const dur = clock.getDuration();
  if (Number.isFinite(dur)) {
    clock.seek(dur);
    state.currentTime = dur;
    seekTimelineAndAdapters(dur);
  }
  runAdapters("pause");
  syncMediaForCurrentState();
  postState(true);
  return;
}

Explicit player.pause (line 1880) — for comparison:

player.pause = () => {
  if (!clock.isPlaying()) return;
  webAudio.stopAll();          // ← present
  clock.detachAudioSource();   // ← present
  clock.pause();
  ...
};

The two pause paths drift. The explicit path correctly tears down the WebAudio scheduled sources (which restores el.muted to priorMuted per the round-3 fix); the auto-end-of-playback path leaves them active. So el.muted = true set during scheduling persists through end-of-playback. If the user later replays after the AudioContext suspends (tab-blur, system audio interruption), fallback to HTMLMediaElement plays from a muted element → silent video.

Why I missed this

Round-3 my finding was the silent-mute regression in the WebAudio fallback path. I traced the fix on the explicit-pause code path and confirmed stopAll() correctly restores priorMuted. I did not enumerate every place that pauses the clock and verify each one took the same cleanup steps.

That's exactly the [discipline lesson I codified after ef#36271](my prior memory entry on constraint-removal audit paths): when a fix establishes a per-cleanup-site contract, audit every site, not just the one that triggered the fix. Same pattern here — the contract is "every pause path must call webAudio.stopAll() + clock.detachAudioSource() before clock.pause()." Two sites, only one audited.

Fix

Tight one-line + one-test diff. Add to the auto-pause-at-end branch (matching the explicit pause path):

if (clock.isPlaying() && clock.reachedEnd()) {
  webAudio.stopAll();          // NEW
  clock.detachAudioSource();   // NEW
  clock.pause();
  state.isPlaying = false;
  ...
}

And a regression test asserting el.muted is reverted on natural end-of-playback. The existing webAudioTransport.test.ts:stopAll restores el.muted test pins the unit-level behavior; the new test should pin the integrationinit.ts's auto-pause path actually calls stopAll() on end. Could be added as a media.test.ts integration test or a new init.test.ts if one exists.

Round-3 items still resolved correctly

Everything else in my round-4 verdict still stands. Vai's other items are confirmed clean:

  • N1 silent-mute on explicit pause ✅ (the round-3 fix)
  • N2 webAudio test coverage ✅
  • #3 buffer-stall via audio.paused && readyState < HAVE_FUTURE_DATA
  • #4 media.test.ts integration tests ✅
  • async race guard ✅

This is a new regression discovered in round-4, not a regression of a prior fix. The PR is one fix-up away from genuinely shippable.

Updating my own discipline

I had the right framework on round-3 (caught the silent-mute pattern) but didn't apply it exhaustively (didn't enumerate the auto-pause site). Adding to my review discipline: when a fix establishes a per-call-site contract on a state machine (every pause must release X), audit every state-transition site that satisfies the precondition, not just the trigger that surfaced the bug. The state machine here has TWO pause-emitters (player.pause() + transportTick auto-end); both need the cleanup contract.

Vai's catch is a load-bearing example of why parallel review matters on architecturally-significant PRs. Trust their finding by default → verify against source → align.

Re-stamp condition

webAudio.stopAll() + clock.detachAudioSource() added to the auto-pause-at-end branch + one regression test. Ping when pushed; will re-review immediately.

— Re-review by Rames Jusso (pr-review)

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. End-of-playback silent-mute leak cleanly resolved in d9974dfe.

The fix is exactly the recommended shape:

       if (clock.isPlaying() && clock.reachedEnd()) {
+        webAudio.stopAll();
+        clock.detachAudioSource();
         clock.pause();

Two lines, matches the explicit-pause path's contract at init.ts:1880. The auto-pause-at-end branch now takes the same cleanup steps, so the contract webAudio.stopAll() must precede clock.pause() at every site is uniform across the state machine.

Why the test is adequate: stopAll → restores priorMuted is already pinned by webAudioTransport.test.ts (verified in round-3 — "tests pin both directions"). The new code calling stopAll from end-of-playback inherits that guarantee. The added idempotency test (stopAll called multiple times is safe) is a defensive bonus that guards against double-call from coordinated paths.

A future stricter integration test ("render reaches end → assert el.muted === false on the underlying element") would be tighter, but the unit-level proof of correctness through the composition of [end-of-playback calls stopAll] × [stopAll restores muted] is solid for this PR.

Closing my hold from round-3. Four rounds of review, every blocker addressed cleanly each time, full convergence with Rames at this SHA.

— Vai

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at d9974dfe — exact fix I asked for in the retraction.

Fix verification

init.ts:1782-1784 — auto-pause-at-end branch now calls:

if (clock.isPlaying() && clock.reachedEnd()) {
  webAudio.stopAll();          // ✅ added
  clock.detachAudioSource();   // ✅ added
  clock.pause();
  ...
}

Both stopAll() and detachAudioSource() now match the explicit player.pause path (line 1880). The contract — every clock-pause-emitting site must release WebAudio sources and detach the audio clock — is now uniform across both pause paths.

webAudioTransport.test.ts — new idempotency test (stopAll called multiple times is safe):

const transport = new WebAudioTransport();
transport.stopAll();
transport.stopAll();
expect(transport.isActive()).toBe(false);

Thoughtful addition. It pins a different property than what I asked for (I asked for an integration test that the auto-pause path calls stopAll), but it's actually the more useful property: idempotency guarantees that ANY future site that adds a stopAll() call (intentionally OR by copy-paste) is safe regardless of whether stopAll() has already fired. That's a stronger invariant than "this specific site calls stopAll" — it makes the whole class of "did I add stopAll to all the right places" bugs safe-by-construction. Good defensive choice.

The integration-level claim ("init.ts's auto-pause path actually calls stopAll on end") is now visually verifiable by reading the 2-line addition + the existing round-3 unit test (stopAll restores el.muted to prior value) covers the load-bearing failure mode. Together they cover the contract.

Summary of the four-round arc

  • Round-1: architectural concerns (loop semantics, test gaps, __player patching, doc regression in media.ts, threshold tuning)
  • Round-2: 4 of 5 cleanly addressed; deferred threshold tuning to empirical
  • Round-3: scope expansion (Phase 2/3) + Vai's silent-mute regression in WebAudio fallback + 5 Vai items
  • Round-4: Vai's items resolved + my silent-mute resolved + manual smoke ticked, but I missed Vai's end-of-playback leak finding
  • Round-5 (this): one-line fix + idempotency test, contract uniform across both pause paths

Solid iteration. Ship it.

Note for Miguel/team: I'd still recommend instrumenting audio.currentTime - clock.now() p50/p95 in production sessions even with all phases shipped — gives you data for the "is Phase 2/3 doing what we think it's doing" question that the empirical-first sequencing would have answered upfront.

— Re-review by Rames Jusso (pr-review)

Replace the two-clock architecture (GSAP rAF ticker + HTMLMediaElement
pipeline reconciled by a 50ms polling loop) with a single TransportClock.
GSAP is always paused and seeked to clock.now() on each rAF tick.
Drift between visual timeline and audio is structurally impossible.

Architecture:

  TransportClock.now() ──rAF──▶ timeline.seek(t) + el.currentTime
       ▲
  AudioContext.currentTime (~21µs)  ← WebAudio active
       OR
  audio.currentTime (~33ms)         ← HTMLMediaElement fallback
       OR
  performance.now() (~1ms)          ← no audio

Key changes:
- TransportClock class with monotonic + audio-master clock sources
- WebAudioTransport: routes audio through AudioBufferSourceNode for
  sample-accurate scheduling, falls back gracefully to HTMLMediaElement
- rAF tick loop replaces 50ms setInterval poll; GSAP always paused
- Strict sync (40ms threshold, consecutive-sample gated) + forceSync
  on play/pause/seek transitions for sub-frame media accuracy
- Buffer-stall: visuals freeze when audio is buffering instead of
  running ahead
- Frame quantization preserved in seek/renderSeek (parity contract)

Browser-verified: 0.0ms drift after 40 pause/play cycles (was 400ms+).

Also fixes: CDN script HTML error responses in validate (pre-existing).

54 tests across clock, clock-drift, webAudioTransport, and media.

Closes #668
@miguel-heygen miguel-heygen force-pushed the feat/single-clock-transport branch from 95b655e to da712aa Compare May 8, 2026 05:27
@miguel-heygen miguel-heygen merged commit b009288 into main May 8, 2026
41 checks passed
@miguel-heygen miguel-heygen deleted the feat/single-clock-transport branch May 8, 2026 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Runtime-owned audio drifts from GSAP timeline after repeated pause/play

3 participants