
Don't drop retransmitted packets that were ACKed past but not yet delivered #133

Open
JohanG-LAS wants to merge 1 commit into datarhei:main from JohanG-LAS:fix/retransmit-drop-past-ack

@JohanG-LAS

Problem

receiver.Push could drop legitimate retransmissions even though the
packet had not yet been delivered to the application. This was observed
in production via MediaMTX as a near-1:1 ratio between
packetsReceivedRetrans and packetsReceivedDrop: virtually every
retransmission the sender issued in response to a NAK reached the
receiver and was then silently discarded. The same streams received with
libsrt (srt-live-transmit/TSDuck) recovered cleanly with zero drops,
which ruled out the network and the sender.

This appears to be the same symptom reported in #91.

Smoking gun (from a debug-instrumented build)

GOSRT_RX rx=0xc00014e4e0 retrans_in seq=691158851 lastACK=691158858 lastDelivered=691158759 maxSeen=691158865
GOSRT_RX rx=0xc00014e4e0 drop reason=lt_lastACK seq=691158851 lastACK=691158858 lastDelivered=691158759 maxSeen=691158865 retrans=true

The retransmission satisfied
lastDeliveredSequenceNumber < seq < lastACKSequenceNumber and was
dropped by the Lt(r.lastACKSequenceNumber) branch as "already
acknowledged" — even though the gap had not yet been delivered or
skipped past.
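The window the retransmission fell into can be written as a small predicate. This is an illustrative sketch only: it assumes SRT's 31-bit packet sequence numbers compared with half-range wrap-around arithmetic, and the helpers seqLt, seqLte and classify are hypothetical stand-ins, not gosrt's own sequence-number API. The check order mirrors the one described in this PR (Lte(lastDelivered) first, then Lt(lastACK)).

```go
package main

import "fmt"

// SRT packet sequence numbers are 31 bits wide and wrap around.
const (
	seqMask = 1<<31 - 1 // largest valid sequence number
	seqHalf = 1 << 30   // half of the sequence space
)

// seqLt reports whether a precedes b on the 31-bit circle:
// a < b when b lies less than half the space ahead of a.
func seqLt(a, b uint32) bool {
	d := (b - a) & seqMask
	return d != 0 && d < seqHalf
}

// seqLte reports whether a precedes or equals b.
func seqLte(a, b uint32) bool {
	return a == b || seqLt(a, b)
}

// classify places an arriving sequence number relative to the receiver's
// delivery and ACK positions (hypothetical helper for illustration).
func classify(seq, lastDelivered, lastACK uint32) string {
	switch {
	case seqLte(seq, lastDelivered):
		return "delivered or skipped"
	case seqLt(seq, lastACK):
		return "acked, not delivered"
	default:
		return "not yet acked"
	}
}

func main() {
	// Values from the debug log above.
	fmt.Println(classify(691158851, 691158759, 691158858)) // acked, not delivered
}
```

Before the fix, anything classified as "acked, not delivered" was dropped, including retransmissions for a gap that was still pending delivery.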

Root cause

periodicACK and the delivery loop in receiver.Tick are two separate
critical sections. periodicACK can advance lastACKSequenceNumber
past a gap when the packets after the gap have ripe PktTsbpdTime,
while the delivery loop breaks early on the first packet whose
PktTsbpdTime is still in the future. A retransmission arriving in
this window for a sequence in the gap landed in the
Lt(lastACKSequenceNumber) drop branch.
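The interaction between the two passes can be reproduced with a deliberately simplified, single-threaded model. The pkt type and the ackPosition and deliver functions are hypothetical: ackPosition only mirrors the behaviour described above (advance over contiguous packets, and past a gap when the next packet is already ripe), not gosrt's actual periodicACK code, and sequence numbers are plain ints without wrap-around.

```go
package main

import "fmt"

// pkt is a toy data packet: sequence number plus TSBPD delivery time.
type pkt struct {
	seq   int
	tsbpd int
}

// ackPosition walks the sorted buffer and advances the ACK position over
// contiguous packets, and also past a gap when the packet after the gap
// is already ripe (tsbpd <= now).
func ackPosition(list []pkt, lastACK, now int) int {
	ack := lastACK
	for _, p := range list {
		if p.seq <= ack {
			continue // already acknowledged
		}
		if p.seq == ack+1 || p.tsbpd <= now {
			ack = p.seq
			continue
		}
		break
	}
	return ack
}

// deliver hands packets to the application from the head of the buffer and
// stops at the first packet whose TSBPD time is still in the future.
func deliver(list []pkt, lastDelivered, now int) ([]pkt, int) {
	i := 0
	for i < len(list) && list[i].tsbpd <= now {
		lastDelivered = list[i].seq
		i++
	}
	return list[i:], lastDelivered
}

func main() {
	const none = -1 // "uninitialized" position in this toy model
	// 0..4 are not yet ripe, 6..8 are ripe, 5 is missing.
	list := []pkt{{0, 100}, {1, 100}, {2, 100}, {3, 100}, {4, 100}, {6, 10}, {7, 10}, {8, 10}}
	now := 50

	lastACK := ackPosition(list, none, now)
	_, lastDelivered := deliver(list, none, now)

	fmt.Println(lastDelivered, lastACK)            // -1 8
	fmt.Println(lastDelivered < 5 && 5 < lastACK) // true: seq 5 sits in the drop window
}
```

Because the ACK pass may skip the gap while the delivery pass stops at the unripe head, the two positions diverge, and a retransmission for seq 5 arrives strictly between them.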

Fix

Allow retransmissions through the Lt(lastACKSequenceNumber) branch.
They fall into the existing out-of-order handling below, where they
either fill the gap (via InsertBefore) or are detected as duplicates
by the existing linear-scan check. TLPKTDROP semantics are still
enforced by the earlier Lte(lastDeliveredSequenceNumber) check, which
remains untouched and continues to drop anything already delivered or
skipped past.

Non-retransmit packets below lastACK (which would be unusual but
possible under heavy reordering) are still dropped exactly as before.
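The admission rules before and after the change can be compared side by side in a toy model. The receiver type, its fields and the fixed toggle are hypothetical, sequence numbers are plain ints, and the sketch swaps the linked-list linear scan for sort.SearchInts on a sorted slice; only the order of the checks follows the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// receiver is a toy model of the state Push consults (hypothetical names).
type receiver struct {
	lastDelivered, lastACK int
	list                   []int // buffered sequence numbers, sorted ascending
	drops, dups, retrans   int
	fixed                  bool // true: retransmissions may pass below lastACK
}

// newReceiver builds the bug state: 0..4 and 6..8 buffered, lastACK = 8,
// nothing delivered yet.
func newReceiver(fixed bool) *receiver {
	return &receiver{lastDelivered: -1, lastACK: 8, list: []int{0, 1, 2, 3, 4, 6, 7, 8}, fixed: fixed}
}

func (r *receiver) push(seq int, retransmitted bool) {
	if retransmitted {
		r.retrans++
	}
	// TLPKTDROP: anything already delivered or skipped past is dropped.
	if seq <= r.lastDelivered {
		r.drops++
		return
	}
	// Below lastACK: dropped, unless the fix is on and this is a retransmission.
	if seq < r.lastACK && !(r.fixed && retransmitted) {
		r.drops++
		return
	}
	// Out-of-order handling: locate the slot, reject duplicates, insert.
	i := sort.SearchInts(r.list, seq)
	if i < len(r.list) && r.list[i] == seq {
		r.dups++
		return
	}
	r.list = append(r.list, 0)
	copy(r.list[i+1:], r.list[i:])
	r.list[i] = seq
}

func main() {
	before, after := newReceiver(false), newReceiver(true)
	before.push(5, true)
	after.push(5, true)
	fmt.Println(before.drops, before.list) // 1 [0 1 2 3 4 6 7 8]
	fmt.Println(after.drops, after.list)   // 0 [0 1 2 3 4 5 6 7 8]
}
```

With the fix, a second copy of seq 5 is caught by the duplicate check, and a non-retransmitted packet below lastACK is still dropped.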

Test

TestRecvRetransmitPastACK constructs the bug state deterministically:

  1. Push packets 0–4 with future PktTsbpdTime so the delivery loop
    breaks at the head of the list.
  2. Push packets 6, 7, 8 (gap at 5) with past PktTsbpdTime so
    periodicACK advances past the gap.
  3. Tick(50) — asserts lastACK == 8 and lastDelivered is still
    uninitialized.
  4. Push a retransmit for seq 5 with RetransmittedPacketFlag = true.
  5. Asserts the retransmit was not counted as a drop, that
    PktRetrans incremented, and that seq 5 is now in packetList.
  6. Tick(300) — asserts the full [0..8] is delivered in order.

Verified that the test fails on main (with PktDrop == 1 exactly as
the production log showed) and passes with the fix applied.

Full go test ./... suite remains green.
Made-with: Cursor

When periodicACK advances lastACKSequenceNumber past a gap (e.g. when
packets after the gap have ripe PktTsbpdTime), the delivery loop can
break before lastDeliveredSequenceNumber catches up. A retransmission
arriving in this window for a sequence in the gap was dropped by the
Lt(lastACKSequenceNumber) branch as "already acknowledged", which
manifests in user reports as PktRecvRetrans approximately equal to
PktRecvDrop.

Allow retransmissions through this branch; they fall into the
out-of-order handling below where they either fill the gap or are
detected as duplicates. TLPKTDROP semantics remain enforced by the
earlier Lte(lastDeliveredSequenceNumber) check, which still drops
anything already delivered or skipped past.

Add a focused test that constructs the exact bug state deterministically
and verifies the retransmission is reinserted into packetList rather
than counted as a drop.

Made-with: Cursor