
Dlp/fix ws reconnect write timeout #113

Open
Dominicpham03 wants to merge 2 commits into master from dlp/fix-ws-reconnect-write-timeout



@Dominicpham03 (Collaborator)

No description provided.

Dominic Pham and others added 2 commits on June 24, 2026, 12:56
WebSocket writes (speed-test updates, device updates, etc.) are AnyCable
`perform` calls on the RxgChannel subscription. The gateway drops them as "unknown
subscription" unless the client re-sends `subscribe` on each new socket.
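For readers unfamiliar with the protocol, a minimal TypeScript sketch of the Action Cable wire format that AnyCable speaks may help: a `subscribe` command registers a channel identifier on the current socket, and each perform is a `message` command that carries the same identifier. The identifier contents (just the channel name `RxgChannel`) and the payload fields are assumptions for illustration; the app's real identifier may carry extra params.

```typescript
// Action Cable wire frames as spoken by AnyCable. Assumption: the identifier
// is just {"channel":"RxgChannel"}; the real app may add params to it.
type CableCommand =
  | { command: "subscribe"; identifier: string }
  | { command: "message"; identifier: string; data: string };

// The identifier is itself a JSON-encoded string and must be sent
// identically in the subscribe and in every later message.
const identifier = JSON.stringify({ channel: "RxgChannel" });

function subscribeFrame(): string {
  const cmd: CableCommand = { command: "subscribe", identifier };
  return JSON.stringify(cmd);
}

// A "perform" is a message command whose data names the server action.
// The payload fields are hypothetical.
function performFrame(action: string, payload: Record<string, unknown>): string {
  const cmd: CableCommand = {
    command: "message",
    identifier,
    data: JSON.stringify({ action, ...payload }),
  };
  return JSON.stringify(cmd);
}

// The server confirms a subscription with a frame on the same identifier.
function isConfirmation(raw: string): boolean {
  const msg = JSON.parse(raw) as { type?: string; identifier?: string };
  return msg.type === "confirm_subscription" && msg.identifier === identifier;
}

// Usage: on any given socket, the subscribe must precede every perform.
const frames = [subscribeFrame(), performFrame("update_resource", { id: 42 })];
console.log(frames.join("\n"));
```

Action Cable subscriptions are tracked per connection, so a new socket has no record of `RxgChannel` until a fresh `subscribe` frame arrives. That is why the gateway reports an unknown subscription.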

_handleConnection reset _channelConfirmed only in the `disconnected` branch.
However, a normal reconnect in WebSocketService goes connected -> reconnecting ->
connected and never emits `disconnected`: with autoReconnect enabled, the service
calls _scheduleReconnect and moves to `reconnecting`, and `disconnected` fires only
when reconnect is disabled. So after any reconnect, _channelConfirmed stayed
stale-true, _subscribeToChannel skipped re-sending the channel subscribe, and every
perform/write was dropped before it reached Rails. Each write then failed with a 15s
TimeoutException, and this persisted until the app was restarted.
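The failure mode can be reproduced with a small, hypothetical TypeScript model. The state names follow the commit message; the class, its counter, and `statesAfterDrop` are illustrative stand-ins, not the app's WebSocketService.

```typescript
// Hypothetical model of the pre-fix behaviour; not the real WebSocketService.
type ConnState = "connected" | "reconnecting" | "disconnected";

// With autoReconnect on, a drop goes reconnecting -> connected and never
// emits "disconnected"; that state only appears when reconnect is disabled.
function statesAfterDrop(autoReconnect: boolean): ConnState[] {
  return autoReconnect ? ["reconnecting", "connected"] : ["disconnected"];
}

class BuggyChannelState {
  channelConfirmed = false;
  subscribesSent = 0; // counts subscribe frames "sent"

  handle(state: ConnState): void {
    if (state === "disconnected") {
      // The only place the flag was cleared before the fix.
      this.channelConfirmed = false;
    } else if (state === "connected" && !this.channelConfirmed) {
      // _subscribeToChannel skips the subscribe while the flag reads true.
      this.subscribesSent++;
    }
  }

  // Simulates the server's confirm_subscription arriving.
  confirm(): void {
    this.channelConfirmed = true;
  }
}

const s = new BuggyChannelState();
s.handle("connected");
s.confirm();
statesAfterDrop(true).forEach((st) => s.handle(st));
console.log(s.subscribesSent, s.channelConfirmed); // 1 true
```

After an automatic reconnect the model still reports one subscribe and a confirmed channel, even though the new socket never received a subscribe.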

Reset _channelConfirmed, _channelSubscribeSent, _resourcesSubscribed, and
_confirmedResources at the top of the `connected` branch so that each (re)connect
re-runs the full subscribe -> confirm handshake on the fresh socket.
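A hedged sketch of the fix, again as a hypothetical TypeScript model rather than the actual Dart code: every per-socket flag is cleared on each `connected` event before the subscribe is attempted, so a reconnect always sends a new `subscribe`. Field names mirror the private fields listed above (minus the underscore); `subscribesSent` is an illustrative counter.

```typescript
// Hypothetical model of the fixed handler; field names mirror the Dart
// privates named in the commit, minus the leading underscore.
type ConnState = "connected" | "reconnecting" | "disconnected";

class ChannelState {
  channelConfirmed = false;
  channelSubscribeSent = false;
  resourcesSubscribed = false;
  confirmedResources = new Set<string>();
  subscribesSent = 0; // illustrative counter of subscribe frames

  handle(state: ConnState): void {
    if (state === "connected") {
      // A (re)connect means a fresh socket: drop every per-socket flag
      // first, then run the subscribe -> confirm handshake from scratch.
      this.resetSocketState();
      this.subscribeToChannel();
    } else if (state === "disconnected") {
      this.resetSocketState();
    }
  }

  confirm(): void {
    this.channelConfirmed = true;
  }

  private resetSocketState(): void {
    this.channelConfirmed = false;
    this.channelSubscribeSent = false;
    this.resourcesSubscribed = false;
    this.confirmedResources.clear();
  }

  private subscribeToChannel(): void {
    if (this.channelConfirmed || this.channelSubscribeSent) return;
    this.channelSubscribeSent = true;
    this.subscribesSent++; // stands in for sending the subscribe frame
  }
}

const c = new ChannelState();
c.handle("connected");
c.confirm();
(["reconnecting", "connected"] as ConnState[]).forEach((st) => c.handle(st));
console.log(c.subscribesSent); // 2
```

Clearing state on `connected` rather than only on `disconnected` ties the reset to the one event that every path emits before writes resume.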

Also make SpeedTestRepositoryImpl._mapExceptionToFailure case-insensitive and have
it match "timed out", so genuine timeouts map to TimeoutFailure instead of the
generic ServerFailure banner.
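A rough TypeScript analogue of the mapping change, assuming the failure is chosen by inspecting the exception message. The two failure names come from the PR; the exact substrings checked (`timeout` and `timed out`) and the `Failure` shape are assumptions.

```typescript
// Hypothetical analogue of SpeedTestRepositoryImpl._mapExceptionToFailure.
type Failure =
  | { kind: "TimeoutFailure"; message: string }
  | { kind: "ServerFailure"; message: string };

function mapExceptionToFailure(error: unknown): Failure {
  const message = error instanceof Error ? error.message : String(error);
  // Compare in lower case so "Timeout", "TIMEOUT" and "Timed out" all match.
  const lower = message.toLowerCase();
  if (lower.includes("timeout") || lower.includes("timed out")) {
    return { kind: "TimeoutFailure", message };
  }
  return { kind: "ServerFailure", message };
}

console.log(mapExceptionToFailure(new Error("Request Timed Out")).kind); // TimeoutFailure
```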

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Follow-up to the reconnect fix. Even with the channel re-subscribing on every
(re)connect, a write issued in the window between the `connected` event and the
channel-subscription confirmation is still a perform on a not-yet-confirmed
AnyCable subscription, so the gateway drops it as "unknown subscription" and the
client times out. Users saw this as a speed-test update that failed once and then
succeeded when they manually re-ran it.

Changes:
- WebSocketCacheIntegration: add `ensureChannelReady({timeout})` and an
  `isChannelConfirmed` getter. It kicks off the subscribe handshake if it hasn't
  been sent yet, then waits up to 8s for confirmation, bailing out early if the
  socket drops.
- SpeedTestWebSocketDataSource.updateSpeedTestResult: await ensureChannelReady()
  before sending; on a TimeoutException, re-ensure readiness and retry the update
  once. This is safe because update_resource is idempotent, so a retry can't
  duplicate data. If the channel still isn't ready, throw a clear "realtime channel
  not ready" error instead of timing out blindly after 15s.
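The readiness gate and the single retry could look roughly like the following TypeScript sketch. It assumes confirmation and socket-drop events reach the gate through `onConfirmed` and `onSocketDropped`. Those hooks, together with `ChannelGate`, `updateWithRetry`, and `TimeoutError`, are hypothetical names, while `ensureChannelReady`, `isChannelConfirmed`, the 8s default, and the "realtime channel not ready" message come from the PR. Unlike the description above, this sketch also refuses to send if the first readiness check fails.

```typescript
// Hypothetical gate around the channel handshake; not the app's real API.
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

function isTimeout(e: unknown): boolean {
  return e instanceof Error && e.name === "TimeoutError";
}

class ChannelGate {
  isChannelConfirmed = false;
  private subscribeSent = false;
  private waiters: Array<(ready: boolean) => void> = [];

  constructor(private readonly sendSubscribe: () => void) {}

  // Called when the server's confirm_subscription arrives.
  onConfirmed(): void {
    this.isChannelConfirmed = true;
    this.flush(true);
  }

  // Called when the socket drops: pending waiters give up immediately.
  onSocketDropped(): void {
    this.isChannelConfirmed = false;
    this.subscribeSent = false;
    this.flush(false);
  }

  // Resolves true once confirmed, false on timeout or socket drop.
  ensureChannelReady(timeoutMs = 8000): Promise<boolean> {
    if (this.isChannelConfirmed) return Promise.resolve(true);
    if (!this.subscribeSent) {
      this.subscribeSent = true;
      this.sendSubscribe(); // kick the handshake if it hasn't been sent
    }
    return new Promise<boolean>((resolve) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const done = (ready: boolean): void => {
        if (timer !== undefined) clearTimeout(timer);
        resolve(ready);
      };
      timer = setTimeout(() => {
        // Stop tracking this waiter, then report "not ready".
        this.waiters = this.waiters.filter((w) => w !== done);
        resolve(false);
      }, timeoutMs);
      this.waiters.push(done);
    });
  }

  private flush(ready: boolean): void {
    const pending = this.waiters;
    this.waiters = [];
    pending.forEach((w) => w(ready));
  }
}

// Wait for readiness, send, and retry once if the send times out.
async function updateWithRetry(
  gate: ChannelGate,
  send: () => Promise<void>,
  readyTimeoutMs = 8000,
): Promise<void> {
  for (let attempt = 0; attempt < 2; attempt++) {
    if (!(await gate.ensureChannelReady(readyTimeoutMs))) {
      throw new Error("realtime channel not ready");
    }
    try {
      await send();
      return;
    } catch (e) {
      // Only a timeout on the first attempt earns a retry.
      if (!isTimeout(e) || attempt === 1) throw e;
    }
  }
}
```

Retrying only on timeouts, and only once, keeps a genuinely broken channel from looping. The retry is safe here only because the PR states that update_resource is idempotent.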

Net effect: the first attempt waits for the channel to be write-ready, so the
manual re-run is no longer needed, and a missed window auto-recovers on retry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
