
security(daemon): cap browser connections per session in wsproxy #289

Open
mgabor3141 wants to merge 1 commit into main from review/T25-connection-caps

@mgabor3141

Contributor

Implements review ticket T25.

The wsproxy held an unbounded number of browser WebSocket connections per session (p.sessions[id]). An authed-but-buggy client stuck in a reconnect loop (the #242 storm) could pile up connections without limit, each carrying proxy goroutines plus a 4 MiB backend read buffer: a self-DoS / runaway-client footgun with no shedding.

This adds a generous-but-bounded per-session cap (maxClientsPerSession = 8). Connections beyond the cap are refused with a clear WebSocket close reason (StatusTryAgainLater, "too many connections for this session") rather than evicting healthy viewers, so a storming client can't knock an existing tab offline. The check and registration happen under the same lock, so concurrent connects can't race past the cap. The cap is per-session, so legitimate multi-tab or phone+desktop use of other sessions is unaffected, and a slot freed by a viewer disconnect is immediately reusable by a reconnecting client.
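For readers without the diff at hand, the cap described above can be sketched roughly as follows. This is a minimal model, not the code in wsproxy.go: the `proxy` and `conn` types, `newProxy`, and the body of `removeConn` are assumptions made for illustration; only the names `addConn`, `removeConn`, `p.sessions`, and `maxClientsPerSession = 8` come from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// maxClientsPerSession mirrors the cap named in the PR description.
const maxClientsPerSession = 8

// conn is a hypothetical stand-in for a browser WebSocket connection.
type conn struct{ id int }

// proxy is a simplified stand-in for the wsproxy type; only the
// per-session bookkeeping is modelled.
type proxy struct {
	mu       sync.Mutex
	sessions map[string][]*conn
}

func newProxy() *proxy {
	return &proxy{sessions: make(map[string][]*conn)}
}

// addConn registers c for sessionID and reports whether a slot was free.
// The length check and the append share one critical section, so two
// concurrent callers cannot both take the last slot.
func (p *proxy) addConn(sessionID string, c *conn) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.sessions[sessionID]) >= maxClientsPerSession {
		return false
	}
	p.sessions[sessionID] = append(p.sessions[sessionID], c)
	return true
}

// removeConn frees the slot held by c, if it is registered.
func (p *proxy) removeConn(sessionID string, c *conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	conns := p.sessions[sessionID]
	for i, existing := range conns {
		if existing == c {
			p.sessions[sessionID] = append(conns[:i], conns[i+1:]...)
			break
		}
	}
	if len(p.sessions[sessionID]) == 0 {
		delete(p.sessions, sessionID)
	}
}

func main() {
	p := newProxy()
	first := &conn{id: 0}
	p.addConn("a", first)
	for i := 1; i < maxClientsPerSession; i++ {
		p.addConn("a", &conn{id: i})
	}
	fmt.Println(p.addConn("a", &conn{id: 99}))  // false: session "a" is full
	fmt.Println(p.addConn("b", &conn{id: 100})) // true: the cap is per-session
	p.removeConn("a", first)
	fmt.Println(p.addConn("a", &conn{id: 101})) // true: the freed slot is reusable
}
```

Refusing the newcomer rather than evicting the oldest entry is what keeps an existing tab online while another client storms.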

Scope note: the ticket also mentions an optional per-remote-addr cap on /v1/events SSE, but that part overlaps T14 and is left out of this focused change. The runner-side ptyserver.s.clients map is auth-gated behind an owner-only Unix socket; the hub-facing wsproxy is the meaningful blast-radius boundary, so the cap lives there.

Verification

  • cd services/gmuxd && go build ./... — passes
  • go test ./internal/wsproxy/ — new tests cover: refusal beyond the cap, slot not stored on refusal, cap is per-session, and a freed slot is reusable
  • go test ./... — full daemon suite passes

Source finding: T25 (P2) connection caps.

@github-actions

github-actions Bot commented Jun 7, 2026

Contributor

Try this PR

curl -sSfL https://gmux.app/install-pr.sh | sh -s -- 289

Built from e1e2319 — security(daemon): cap browser connections per session in wsproxy
Requires GitHub CLI with auth. Artifacts expire after 7 days.

@mgabor3141 force-pushed the review/T25-connection-caps branch from c5cc784 to e1e2319 on June 7, 2026 10:36

@mgabor3141

Contributor Author

@greptile review

@greptile-apps

greptile-apps Bot commented Jun 11, 2026


Greptile Summary

This PR adds a per-session WebSocket connection cap (maxClientsPerSession = 8) to the wsproxy, preventing an authed-but-buggy reconnect-looping client from exhausting goroutines and backend read buffers. Connections beyond the cap are refused with a StatusTryAgainLater WebSocket close code rather than evicting healthy viewers, and the check + slot registration happen under the same mutex so concurrent dials cannot race past the limit.

  • addConn now returns bool and enforces the cap atomically; a reserved slot is released via removeConn if the subsequent backend dial fails, keeping the count accurate.
  • Four new tests cover: refusal at and beyond the cap, per-session isolation (other sessions unaffected), freed-slot reuse, and an end-to-end handler test that validates the StatusTryAgainLater close code and reason.
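The atomicity claim in the summary can be illustrated with a small race: many goroutines contend for one session's slots, and the number accepted never exceeds the cap. This is a sketch under assumptions, not one of the PR's tests; `capTable`, `tryReserve`, and `raceForSlots` are hypothetical names, and the registry is reduced to a counter.

```go
package main

import (
	"fmt"
	"sync"
)

const maxClientsPerSession = 8

// capTable is a hypothetical, count-only model of the per-session registry.
type capTable struct {
	mu     sync.Mutex
	counts map[string]int
}

// tryReserve takes a slot for sessionID if one is free. Check and
// increment happen under the same lock.
func (t *capTable) tryReserve(sessionID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.counts[sessionID] >= maxClientsPerSession {
		return false
	}
	t.counts[sessionID]++
	return true
}

// raceForSlots launches n concurrent reservations against one session
// and returns how many were accepted.
func raceForSlots(n int) int {
	t := &capTable{counts: make(map[string]int)}
	var wg sync.WaitGroup
	var mu sync.Mutex
	accepted := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if t.tryReserve("session-a") {
				mu.Lock()
				accepted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return accepted
}

func main() {
	fmt.Println(raceForSlots(100)) // prints 8
}
```

If the check and the increment were split across two lock acquisitions, several goroutines could observe a count of 7 and all proceed, which is the race the single critical section rules out.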

Confidence Score: 4/5

Safe to merge; the implementation logic is correct and the cap is enforced atomically. The one rough edge is a test path that can exit before verifying the close code and reason, which is non-blocking but worth tightening.

The core wsproxy.go change is well-structured: the lock prevents concurrent races past the cap, the slot is properly released on backend-dial failure, and refused connections receive an informative close frame without disturbing existing viewers. The test suite is thorough, but TestHandlerRefusesBeyondCapWithCloseReason has an early-return branch that silently skips its main assertions (close code, close reason, viewer count) whenever the dial itself errors — meaning the advertised StatusTryAgainLater guarantee is not always verified by the test.

services/gmuxd/internal/wsproxy/wsproxy_test.go — the early-return path in the integration test should be looked at; wsproxy.go itself needs no further attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| services/gmuxd/internal/wsproxy/wsproxy.go | Adds per-session connection cap (maxClientsPerSession=8): addConn now returns bool and enforces the limit under the same lock; slot is reserved before backend dial and released on failure; refused connections receive StatusTryAgainLater with a clear reason. Logic is correct and atomic. |
| services/gmuxd/internal/wsproxy/wsproxy_test.go | New test file covering cap enforcement, per-session isolation, and freed-slot reuse; integration test drives the full HTTP handler. Minor: the early-return path in TestHandlerRefusesBeyondCapWithCloseReason skips the close-code, close-reason, and viewer-count assertions when dial errors out. |

Sequence Diagram

sequenceDiagram
    participant B as Browser
    participant P as wsproxy Handler
    participant S as p.sessions (locked)
    participant R as gmux-run Unix socket

    B->>P: HTTP WebSocket Upgrade
    P->>P: websocket.Accept()
    P->>S: addConn(sessionID, conn)
    alt "at cap (len >= maxClientsPerSession)"
        S-->>P: return false
        P->>B: Close(StatusTryAgainLater, "too many connections")
        P-->>P: return
    else slot available
        S-->>P: "append & return true"
        P->>R: websocket.Dial(backend)
        alt backend dial fails
            R-->>P: error
            P->>S: removeConn(sessionID, conn)
            P->>B: Close(StatusInternalError, "backend unavailable")
            P-->>P: return
        else dial succeeds
            R-->>P: backendConn
            P->>P: spawn 3 goroutines (B to R, R to B, keepalive)
            P->>P: wg.Wait()
            P->>B: Close(StatusNormalClosure)
            P->>R: Close(StatusNormalClosure)
            P->>S: removeConn(sessionID, conn)
        end
    end
Reviews (1): Last reviewed commit: "security(daemon): cap browser connection..."

Comment on lines +168 to +176
	if n == maxClientsPerSession {
		break
	}
	if time.Now().After(deadline) {
		t.Fatalf("only %d/%d connections registered", n, maxClientsPerSession)
	}
	time.Sleep(5 * time.Millisecond)
}

P2 Early-return silently skips the test's key assertions

If dial() returns a non-nil error for the cap+1 connection (which the comment acknowledges as possible), the test exits without ever checking the StatusTryAgainLater close code, the "too many connections" close reason, or the post-refusal viewer count. The PR description specifically claims the connection is refused "with a clear WebSocket close reason (StatusTryAgainLater)" — but if this branch fires, none of that is verified and the test still passes green. Consider moving the three assertions (close code, reason, viewer count) to run unconditionally before this early return, or at minimum assert them in the nil-error branch without allowing a silent pass for the non-nil case.
