
feat: authoritative session attribution via agent-shim (replaces scrollback guessing)#315

Open
mgabor3141 wants to merge 8 commits into main from feat/agent-shim-attribution


@mgabor3141

Contributor

Problem

The gmux sidebar is conversation-keyed: a slot means "this is conversation X" so we can monitor X's session file and resume the right conversation if the runner dies. But gmux doesn't own the conversation identity — pi/claude/codex pick their session file after launch (only once there's an event), so gmux can't know it at spawn time.

Today gmux guesses by matching terminal scrollback against candidate session files. That guess is post-hoc, ambiguous for small/overlapping sessions, blind to /resume rebinds, and assumes one runner per conversation. It's the root cause behind several sidebar bugs (sessions disappearing on relaunch, dismissed sessions reappearing, renamed sessions sorting to the bottom), all of them attribution bugs disguised as storage bugs (see ADR 0010).

Solution: authoritative attribution via an agent-shim

A tiny, readable JS preload (hook.mjs) is injected into node/bun agent processes by the runner:

  • node: NODE_OPTIONS="… --import file:///abs/hook.mjs"
  • bun: BUN_OPTIONS="… --preload /abs/hook.mjs" (both set, append-safe; each runtime honours its own)
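A minimal Go sketch of the append-safe injection, assuming the runner assembles the child environment as KEY=VALUE strings. preloadEnv and appendFlag are hypothetical names, not the real agentshim.PreloadEnv signature, and the socket path in the example is a placeholder. Whatever the user already has in NODE_OPTIONS or BUN_OPTIONS is kept, and the shim flag is appended after it. A hook path containing spaces would need quoting, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"strings"
)

// appendFlag adds flag to an existing space-separated options value,
// keeping whatever was already set (append-safe).
func appendFlag(existing, flag string) string {
	existing = strings.TrimSpace(existing)
	if existing == "" {
		return flag
	}
	return existing + " " + flag
}

// preloadEnv (hypothetical) returns a copy of env with the shim injected for
// both runtimes: node reads NODE_OPTIONS, bun reads BUN_OPTIONS, and each
// ignores the other's variable. sock is the runner's unix socket path.
func preloadEnv(env []string, hookPath, sock string) []string {
	adds := map[string]string{
		"NODE_OPTIONS": "--import file://" + hookPath, // absolute path -> file:///abs/...
		"BUN_OPTIONS":  "--preload " + hookPath,
	}
	out := make([]string, 0, len(env)+3)
	for _, kv := range env {
		key, val, _ := strings.Cut(kv, "=")
		if flag, ok := adds[key]; ok {
			out = append(out, key+"="+appendFlag(val, flag))
			delete(adds, key) // merged into the user's existing value
			continue
		}
		if key == "GMUX_RUNNER_SOCK" {
			continue // replaced with the runner's own socket below
		}
		out = append(out, kv)
	}
	// Runtimes whose variable was not already present get a fresh entry.
	for _, key := range []string{"NODE_OPTIONS", "BUN_OPTIONS"} {
		if flag, ok := adds[key]; ok {
			out = append(out, key+"="+flag)
		}
	}
	return append(out, "GMUX_RUNNER_SOCK="+sock)
}

func main() {
	env := preloadEnv(
		[]string{"PATH=/usr/bin", "NODE_OPTIONS=--max-old-space-size=4096"},
		"/abs/hook.mjs", "/tmp/gmux-runner.sock")
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```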

The shim wraps the agent's fs write surface and, whenever a *.jsonl session file is written, POSTs {op,path,pid} to the runner's unix socket. The runner records the path in its session state and emits a session_file event; the daemon turns that into an authoritative attribution that overrides scrollback. The agent itself tells us which file it holds, so there is no guessing.
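On the runner side, the POST /shim/event endpoint could look roughly like the sketch below. It assumes a JSON body with op, path, and pid fields (the exact wire format is an assumption), treats op "hello" as the shim-active signal, and records any other op only when the path ends in .jsonl. The sessionSink interface and recorder type are hypothetical stand-ins for the session state. The handler is transport-agnostic: in gmux it would be served on the runner's unix socket, while here it is exercised through httptest.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// shimEvent mirrors the {op,path,pid} payload; JSON tags are assumptions.
type shimEvent struct {
	Op   string `json:"op"`
	Path string `json:"path,omitempty"`
	PID  int    `json:"pid"`
}

// sessionSink is the slice of runner state the handler needs (hypothetical).
type sessionSink interface {
	MarkShimActive()
	SetSessionFile(path string)
}

// shimEventHandler decodes one event. "hello" marks the session shim-covered;
// any other op is a write and is recorded only for *.jsonl paths. Reads never
// arrive here because the shim does not report them.
func shimEventHandler(sink sessionSink) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var ev shimEvent
		if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
			http.Error(w, "bad event", http.StatusBadRequest)
			return
		}
		switch {
		case ev.Op == "hello":
			sink.MarkShimActive()
		case strings.HasSuffix(ev.Path, ".jsonl"):
			sink.SetSessionFile(ev.Path)
		}
		w.WriteHeader(http.StatusNoContent)
	}
}

// recorder is a toy sink that remembers what it was told.
type recorder struct {
	active bool
	file   string
}

func (r *recorder) MarkShimActive()         { r.active = true }
func (r *recorder) SetSessionFile(p string) { r.file = p }

// post sends one JSON body to h and returns the HTTP status code.
func post(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/shim/event", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	sink := &recorder{}
	h := shimEventHandler(sink)
	post(h, `{"op":"hello","pid":42}`)
	post(h, `{"op":"append","path":"/tmp/notes.txt","pid":42}`)
	post(h, `{"op":"append","path":"/sessions/019ed2c9.jsonl","pid":42}`)
	fmt.Println(sink.active, sink.file)
}
```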

Key design points:

  • Reads are never reported. Typing /resume makes the agent readdir + bulk-read every session file for its picker — pure noise. A real resume/rebind always ends in a write to the chosen file, so writes alone catch attribution and rebind.
  • Gated + self-disarming. The shim arms only when GMUX_RUNNER_SOCK is present, then strips it (and the injected *_OPTIONS) from process.env so child processes the agent spawns inherit a clean env. Injection is gated on a new adapter.SessionShimmer capability (pi opts in); shells never get NODE_OPTIONS.
  • Hello-gating. The shim's hello (fired at startup, before any write) tells the daemon to suppress scrollback for that session until the real file is reported — closing the pre-write window where scrollback would otherwise mis-attribute a fresh session to a stale old file.
  • Re-announce on reconnect. The runner replays its current shim state to every new /events subscriber, so a restarted daemon re-learns attribution instantly on resubscribe — which let us retire attributions.json entirely (no persisted attribution state).
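The runner-side half of emit-on-change and re-announce can be sketched as follows, loosely modeled on session.State. SetSessionFile is the name used in this PR; MarkShimActive, Snapshot, Subscribe, the Event type, and channel-based subscribers are assumptions for illustration. A new subscriber (for example, a restarted daemon) first receives the current shim and session_file state, then live changes. The sketch uses blocking sends on buffered channels and ignores slow-subscriber backpressure.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified /events message; the shape is an assumption.
type Event struct {
	Type string // "shim" or "session_file"
	Path string // set for session_file
}

// State holds one runner's shim-reported state. Fields are guarded by mu;
// live events are sent after mu is released.
type State struct {
	mu          sync.RWMutex
	shimActive  bool
	sessionFile string
	subs        []chan Event
}

// emitTo delivers ev to each subscriber (blocking sends).
func emitTo(subs []chan Event, ev Event) {
	for _, ch := range subs {
		ch <- ev
	}
}

// MarkShimActive records the shim's hello (hypothetical name).
func (s *State) MarkShimActive() {
	s.mu.Lock()
	changed := !s.shimActive
	s.shimActive = true
	subs := append([]chan Event(nil), s.subs...)
	s.mu.Unlock()
	if changed {
		emitTo(subs, Event{Type: "shim"})
	}
}

// SetSessionFile records the held file and emits only when the path changes
// (first attribution or a /resume rebind), not on every append.
func (s *State) SetSessionFile(path string) {
	s.mu.Lock()
	changed := path != s.sessionFile
	s.sessionFile = path
	subs := append([]chan Event(nil), s.subs...)
	s.mu.Unlock()
	if changed {
		emitTo(subs, Event{Type: "session_file", Path: path})
	}
}

// Snapshot reads the current state under the read lock.
func (s *State) Snapshot() (active bool, file string) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.shimActive, s.sessionFile
}

// Subscribe registers a listener and replays the current state to it first,
// so a restarted daemon re-learns attribution without any persisted file.
func (s *State) Subscribe() <-chan Event {
	ch := make(chan Event, 16)
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.shimActive {
		ch <- Event{Type: "shim"}
	}
	if s.sessionFile != "" {
		ch <- Event{Type: "session_file", Path: s.sessionFile}
	}
	s.subs = append(s.subs, ch)
	return ch
}

func main() {
	st := &State{}
	st.MarkShimActive()
	st.SetSessionFile("/sessions/019ed2c9.jsonl")
	ch := st.Subscribe() // e.g. a daemon that restarted and resubscribed
	fmt.Println(<-ch, <-ch)
}
```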

Benefits

  • Instant attribution the moment the agent writes its file — no 3s throttle, no scrollback fetch, no similarity threshold.
  • Clean /resume rebind: a new file for the same runner drops the old attribution authoritatively.
  • N:1 readiness: each runner reports independently; two runners on one conversation are handled without conflict (currently last-binder-wins, since the attributions map stays 1:1 — representing true N:1 is future work, but the signal is now authoritative per-runner).
  • Reduced coupling: retires attributions.json persistence; the same shim works for any JSONL agent (pi/claude/codex), so the per-adapter attribution surface collapses toward a thin path-recognizer.
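The daemon-side rebind and last-binder-wins rules reduce to a small map operation. This is a hypothetical model, not FileMonitor.AttributeFromShim itself: monitor keeps a 1:1 map from session file path to gmux session ID, and binding a session to a new path drops every other file attributed to that session.

```go
package main

import "fmt"

// monitor is a toy stand-in for the daemon's attribution map:
// session file path -> gmux session ID, 1:1 per file.
type monitor struct {
	attributions map[string]string
}

func newMonitor() *monitor { return &monitor{attributions: map[string]string{}} }

// attributeFromShim records an authoritative binding. Any other file
// attributed to the same session (a stale scrollback guess, or the previous
// file before /resume) is dropped, since the agent holds exactly one file.
// If another session already held path, it is overwritten: last binder wins.
func (m *monitor) attributeFromShim(sessionID, path string) {
	for file, sid := range m.attributions {
		if sid == sessionID && file != path {
			delete(m.attributions, file) // deleting during range is safe in Go
		}
	}
	m.attributions[path] = sessionID
}

func main() {
	m := newMonitor()
	m.attributions["/s/old.jsonl"] = "sess-1"          // stale scrollback guess
	m.attributeFromShim("sess-1", "/s/019ed2c9.jsonl") // first shim report
	m.attributeFromShim("sess-1", "/s/019ed018.jsonl") // /resume rebind
	m.attributeFromShim("sess-2", "/s/019ed018.jsonl") // second runner, same file
	fmt.Println(m.attributions)
}
```

Because the map stays keyed by file, a second runner on the same conversation simply replaces the first; representing both would need a file -> set-of-sessions map, which is the N:1 future work mentioned above.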

Fallback

Scrollback attribution is kept but demoted. It now serves only sessions with no shim signal, meaning agents the shim can't cover (for example, builds where the preload couldn't be injected); shells have no shim signal either but never used scrollback attribution. Because GMUX_RUNNER_SOCK is set by the runner rather than the daemon, shim coverage does not depend on when the daemon connects. The path is annotated Deprecated/FALLBACK (FileAttributor, the shared helpers, tryAttributeUnmatched, fetchScrollbackText), and successful fallback attributions now log a (FALLBACK) marker so we can monitor how often we actually rely on it in production.
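The demotion amounts to filtering the scrollback candidate pool. A minimal sketch, assuming the daemon tracks shim coverage (hello seen) and shim attribution as per-session sets; fallbackCandidates and logFallback are hypothetical names, and the log message format is illustrative apart from the (FALLBACK) marker described above.

```go
package main

import (
	"fmt"
	"log"
)

// fallbackCandidates keeps only sessions scrollback matching may still try:
// those the shim has neither covered (hello seen) nor attributed.
func fallbackCandidates(sessions []string, shimCovered, shimAttributed map[string]bool) []string {
	var out []string
	for _, id := range sessions {
		if shimCovered[id] || shimAttributed[id] {
			continue // the shim is authoritative; scrollback must not second-guess it
		}
		out = append(out, id)
	}
	return out
}

// logFallback tags a successful scrollback attribution so production logs
// show how often the fallback fires.
func logFallback(sessionID, path string) {
	log.Printf("attributed %s to %s (FALLBACK)", sessionID, path)
}

func main() {
	sessions := []string{"pi-fresh", "pi-held", "agent-unshimmed"}
	covered := map[string]bool{"pi-fresh": true, "pi-held": true}
	attributed := map[string]bool{"pi-held": true}
	for _, id := range fallbackCandidates(sessions, covered, attributed) {
		fmt.Println("scrollback may try:", id)
		// Suppose scrollback matching then found a file for this session:
		logFallback(id, "/sessions/matched.jsonl")
	}
}
```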

Testing

  • Unit/integration: shim authoritative override, /resume rebind, session-death cleanup, scrollback exclusion of shim-covered sessions, the session_file/shim SSE dispatch, append-safe env injection, descendant disarm, and an end-to-end ptyserver test driving a real node agent (TestShimReportsSessionFile, TestShimReannounceOnReconnect).
  • Live end-to-end against real pi under a dev daemon:
    • fresh session shows only shim-attributed (zero scrollback lines) — hello-gating working;
    • /resume produced a clean rebind (019ed2c9 → 019ed018) with the title following;
    • N:1 (two runners on one file) degraded to last-binder-wins, no conflict/crash;
    • after a daemon restart with no attributions.json on disk, the held file was re-attributed immediately on resubscribe.
  • Full go test ./... green across gmuxd, cli/gmux, packages/adapter; go vet clean.

Notes / follow-ups

  • The shim ships as a third readable artifact (embedded via go:embed, materialized to a content-addressed cache path) — intentionally inspectable by users.
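  • Coverage note: GMUX_RUNNER_SOCK is set by the runner, not the daemon, so a runner that started before any daemon still arms its shim and sends its hello; the runner's re-announce then gives a later-connecting daemon authoritative attribution. Scrollback only fires for agents the shim can't cover, and the (FALLBACK) log lets us measure how often that happens.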
  • Representing true N:1 (multiple runners → one conversation) in the data model remains future work, now unblocked by authoritative per-runner reporting.

@github-actions

github-actions Bot commented Jun 16, 2026

Contributor

Try this PR

curl -sSfL https://gmux.app/install-pr.sh | sh -s -- 315

Built from ef557ee — refactor: tighten attribution comments + fix shim-coverage wording
Requires GitHub CLI with auth. Artifacts expire after 7 days.

@greptile-apps

greptile-apps Bot commented Jun 16, 2026


Confidence Score: 5/5

Safe to merge; the authoritative attribution path is well-guarded and the scrollback fallback is preserved for unshimmed sessions.

The new shim infrastructure is narrow and well-tested end-to-end. The two inline findings are both non-functional: one is dead cleanup code in NotifySessionDied (redundant loop that is always a no-op), and the other is an unused struct field that causes unnecessary allocations on the write hot-path but does not affect correctness. All concurrency-sensitive state transitions (shimActive, SessionFile) are properly guarded by the existing RWMutex. The fallback scrollback path remains intact for unshimmed sessions.

cli/gmux/internal/ptyserver/shim_integration_test.go — the polling loops read st.SessionFile directly without holding the mutex, which the race detector will flag under go test -race (previously noted; correct accessor is st.ShimSnapshot()).
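A sketch of the suggested fix: poll through an accessor that takes the read lock instead of reading the field directly. State here is a simplified stand-in, the two-value return of ShimSnapshot is an assumed signature, and waitForSessionFile is a hypothetical test helper.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// State is a simplified stand-in for session.State.
type State struct {
	mu          sync.RWMutex
	shimActive  bool
	sessionFile string
}

func (s *State) SetSessionFile(p string) {
	s.mu.Lock()
	s.sessionFile = p
	s.mu.Unlock()
}

// ShimSnapshot reads both fields under the read lock, so polling it from a
// test goroutine does not race with the writer.
func (s *State) ShimSnapshot() (bool, string) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.shimActive, s.sessionFile
}

// waitForSessionFile polls the snapshot until it reports want or the timeout
// elapses.
func waitForSessionFile(st *State, want string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if _, file := st.ShimSnapshot(); file == want {
			return nil
		}
		time.Sleep(5 * time.Millisecond)
	}
	return errors.New("timed out waiting for session file " + want)
}

func main() {
	st := &State{}
	go func() {
		time.Sleep(20 * time.Millisecond)
		st.SetSessionFile("/sessions/a.jsonl")
	}()
	fmt.Println(waitForSessionFile(st, "/sessions/a.jsonl", time.Second))
}
```

Run under go test -race, this version stays clean because every read of sessionFile goes through the RWMutex.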

Important Files Changed

| Filename | Overview |
| --- | --- |
| cli/gmux/internal/agentshim/hook.mjs | JS preload shim: arms on GMUX_RUNNER_SOCK, wraps the fs write surface, and posts session-file events to the runner; strips itself via import.meta.url (regression-tested). Logic is sound and intentionally best-effort. |
| cli/gmux/internal/agentshim/agentshim.go | Materializes the embedded shim to a content-addressed cache path; PreloadEnv appends the shim flags append-safely. Logic is correct; the sync.Once caching is safe. |
| cli/gmux/internal/ptyserver/ptyserver.go | Adds the POST /shim/event handler and /events replay; shim injection is correctly gated on the SessionShimmer capability. ev.Data is decoded but never used, adding an unnecessary allocation on every write event. |
| cli/gmux/internal/session/state.go | Adds shimActive and SessionFile fields with proper RWMutex discipline; ShimSnapshot replays both for reconnects; SetSessionFile emits only on a path change. |
| services/gmuxd/internal/discovery/filemon.go | Adds the shimFiles/shimCovered maps, AttributeFromShim, and MarkShimCovered; the scrollback fallback correctly excludes shim-covered sessions. NotifySessionDied has a redundant second shimFiles cleanup loop (always a no-op) and an unused changed variable suppressed with _ = changed. |
| services/gmuxd/internal/discovery/subscribe.go | Adds session_file and shim SSE event handlers that call the OnSessionFile/OnShimActive callbacks; subscribe replacement semantics are unchanged. |
| cli/gmux/internal/ptyserver/shim_integration_test.go | End-to-end tests drive a real node agent through the full shim path. The polling loops read st.SessionFile directly without holding mu; the race detector will flag this unsynchronized access, and the correct accessor is st.ShimSnapshot(). |
| services/gmuxd/internal/discovery/shim_test.go | Unit tests for AttributeFromShim, rebind, session death, and scrollback exclusion. All access is single-threaded and calls the methods directly without taking fm.mu, which is fine for unit tests. |
| packages/adapter/capabilities.go | Adds the SessionShimmer interface; FileAttributor is deprecated with a clear doc comment; clean interface design. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent as node/bun Agent
    participant Shim as hook.mjs (shim)
    participant Runner as ptyserver (Runner)
    participant Daemon as gmuxd (Daemon)
    participant FileMon as FileMonitor

    Runner->>Agent: spawn with NODE_OPTIONS/BUN_OPTIONS + GMUX_RUNNER_SOCK
    Agent->>Shim: loaded as preload
    Shim->>Runner: "POST /shim/event {op:hello, pid}"
    Runner->>Runner: state.EmitShimActive()
    Runner-->>Daemon: "SSE event: shim {active:true}"
    Daemon->>FileMon: MarkShimCovered(sessionID)
    Note over FileMon: scrollback attribution suppressed

    Agent->>Agent: "fs.appendFileSync(*.jsonl, data)"
    Shim->>Runner: "POST /shim/event {op:append, path, pid}"
    Runner->>Runner: state.SetSessionFile(path)
    Runner-->>Daemon: "SSE event: session_file {path}"
    Daemon->>FileMon: AttributeFromShim(sessionID, path)
    Note over FileMon: authoritative attribution, no scrollback needed

    Note over Agent: /resume - agent writes new file
    Shim->>Runner: "POST /shim/event {op:append, path:new}"
    Runner->>Runner: state.SetSessionFile(newPath)
    Runner-->>Daemon: "SSE event: session_file {path:new}"
    Daemon->>FileMon: AttributeFromShim(sessionID, newPath)
    Note over FileMon: rebind: old file dropped, new file attributed

    Note over Daemon: Daemon restarts
    Daemon->>Runner: GET /events (reconnect)
    Runner-->>Daemon: "replay: shim {active:true} + session_file {path}"
    Daemon->>FileMon: re-learns attribution, no disk state needed

Reviews (4): Last reviewed commit: "refactor: tighten attribution comments +..."

Comment thread cli/gmux/internal/agentshim/hook.mjs Outdated
…shim

Inject a tiny JS preload (hook.mjs) into node/bun agents so gmux learns
the held conversation file from the agent's own fs writes, replacing
post-hoc scrollback matching (ADR 0009).

- cli/gmux/internal/agentshim: embeds hook.mjs, materializes it to a
  readable content-addressed cache path, builds append-safe
  NODE_OPTIONS=--import / BUN_OPTIONS=--preload env + GMUX_RUNNER_SOCK.
- adapter.SessionShimmer capability gates injection to jsonl agents
  (pi opts in); shells never get NODE_OPTIONS.
- ptyserver injects the preload in New() and serves POST /shim/event,
  recording the path via session.State.SetSessionFile, which emits a
  session_file event on /events (first-attribution + rebind).
- Shim ignores reads, keys on writes, dedupes node's append->write
  re-entry, disarms descendants by stripping its env.
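Content-addressed materialization can be sketched like this, assuming the embedded bytes are passed in (in gmux they would come from a go:embed variable) and assuming a cache layout of <dir>/<hash prefix>/hook.mjs, which is an illustration rather than the real path. cachePath and materialize are hypothetical names. A new shim version hashes to a new directory, an identical existing file is reused, and the write goes through a temp file plus rename so an agent starting concurrently never imports a half-written file.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// cachePath derives the target path from the content hash.
func cachePath(dir, name string, content []byte) string {
	sum := sha256.Sum256(content)
	return filepath.Join(dir, hex.EncodeToString(sum[:8]), name)
}

// materialize writes content to its content-addressed path (if not already
// there) and returns that path.
func materialize(dir, name string, content []byte) (string, error) {
	target := cachePath(dir, name, content)
	if existing, err := os.ReadFile(target); err == nil && bytes.Equal(existing, content) {
		return target, nil // already materialized
	}
	if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
		return "", err
	}
	tmp, err := os.CreateTemp(filepath.Dir(target), name+".tmp-*")
	if err != nil {
		return "", err
	}
	_, werr := tmp.Write(content)
	cerr := tmp.Close()
	if werr != nil || cerr != nil {
		os.Remove(tmp.Name())
		if werr != nil {
			return "", werr
		}
		return "", cerr
	}
	// Rename is atomic within a directory, so readers see old or new, never partial.
	if err := os.Rename(tmp.Name(), target); err != nil {
		os.Remove(tmp.Name())
		return "", err
	}
	return target, nil
}

func main() {
	dir, err := os.MkdirTemp("", "gmux-shim-cache-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	p1, err := materialize(dir, "hook.mjs", []byte("// shim v1\n"))
	if err != nil {
		panic(err)
	}
	p2, _ := materialize(dir, "hook.mjs", []byte("// shim v1\n")) // reused
	fmt.Println(p1 == p2, p1)
}
```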

Wire the runner's session_file event (from the agent-shim preload) into
the daemon as authoritative attribution that overrides scrollback:

- Subscriptions.OnSessionFile dispatches the session_file SSE event with
  its sessionID; main.go wires it to FileMonitor.AttributeFromShim.
- AttributeFromShim records filePath->sessionID in attributions + a
  parallel shimFiles map, derives title/slug/status immediately, and
  handles /resume rebind by dropping the session's prior file.
- Scrollback demotion: shim-attributed files are excluded as candidates
  (already in attributions); shim-attributed sessions are excluded from
  the scrollback candidate pool so it can't second-guess them.
- Cleanup: session death and stale-clear drop shimFiles entries;
  NotifyNewSession processes any attribution that landed before
  registration (session_file only fires on change).
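The registration-ordering point in the last bullet can be modeled as below. Because the runner emits session_file only when the path changes, an event that arrives before the daemon registers the session will not be repeated, so the latest report has to be kept and applied on registration. registry, onSessionFile, and notifyNewSession are hypothetical names; the real FileMonitor may record the attribution immediately and only defer the derived title/slug/status work.

```go
package main

import "fmt"

// registry models the ordering problem: a session_file event can arrive
// before the daemon has registered the session.
type registry struct {
	known   map[string]bool
	pending map[string]string // sessionID -> latest path reported before registration
	applied map[string]string // sessionID -> attributed path
}

func newRegistry() *registry {
	return &registry{
		known:   map[string]bool{},
		pending: map[string]string{},
		applied: map[string]string{},
	}
}

// onSessionFile applies a report immediately for known sessions and parks it
// otherwise, keeping only the most recent path.
func (r *registry) onSessionFile(sessionID, path string) {
	if !r.known[sessionID] {
		r.pending[sessionID] = path
		return
	}
	r.applied[sessionID] = path
}

// notifyNewSession registers the session and drains any parked report.
func (r *registry) notifyNewSession(sessionID string) {
	r.known[sessionID] = true
	if path, ok := r.pending[sessionID]; ok {
		delete(r.pending, sessionID)
		r.applied[sessionID] = path
	}
}

func main() {
	r := newRegistry()
	r.onSessionFile("s1", "/sessions/a.jsonl") // arrives before registration
	r.notifyNewSession("s1")
	fmt.Println(r.applied["s1"])
}
```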

Tests: authoritative override, rebind, death cleanup, scrollback
exclusion, SSE dispatch.

Live testing surfaced a pre-write mis-attribution window: a freshly
launched shimmed pi (no conversation yet) was attributed by scrollback
to a stale old session file in the ~30s before its first write, and the
stale attribution then lingered alongside the correct shim one.

- agent-shim 'hello' now emits a 'shim' event (ptyserver -> session_file
  SSE); the daemon marks the session shim-covered (FileMonitor.
  MarkShimCovered) and suppresses scrollback for it. A covered session
  with no conversation correctly stays unattributed until it writes.
- AttributeFromShim now clears ALL other files attributed to the
  session (stale scrollback guess or prior shim file on /resume), not
  just prior shim files — the shim is authoritative for the one file
  the agent holds.
- Coverage cleared on session death.

Verified live against real pi: fresh session shows only shim-attributed
(no scrollback line); /resume produces a clean rebind; N:1 (two runners
on one file) degrades to last-binder-wins with no conflict.

Make attribution survive a daemon restart without persisted state: the
runner replays its current shim coverage and held session file to every
new /events subscriber, so a reconnecting/restarted daemon re-learns
attribution authoritatively the instant it resubscribes.

- session.State retains shimActive (set on hello) and exposes
  ShimSnapshot(); emits are now lock-safe.
- ptyserver handleEvents replays a 'shim' and 'session_file' event to
  each newly-connected subscriber before streaming live events.
- attributions.json is retired: FileMonitor no longer loads or writes
  it (persistAttributionsLocked is now a no-op; NewFileMonitor starts
  empty). Shimmed sessions restore via re-announce; unshimmed sessions
  re-derive via the scrollback fallback. persist.go is kept (and still
  tested) but marked deprecated.

Verified live: after a daemon restart with no attributions.json on
disk, the held file is re-attributed immediately on resubscribe.
Unit test TestShimReannounceOnReconnect covers the replay.

…persistence

Tidy-up after retiring attributions.json and demoting scrollback:

- Annotate the scrollback path as the deprecated FALLBACK: FileAttributor
  interface, the shared attribution helpers, tryAttributeUnmatched, and
  fetchScrollbackText. It now serves only sessions with no shim signal
  (unshimmed builds, injection failures, pre-hello runners).
- Log successful scrollback attributions with a '(FALLBACK)' marker so we
  can monitor how often it actually fires in production.
- Delete the now-dead attributions.json persistence: persist.go +
  persist_test.go removed; persistAttributionsLocked (no-op) and its call
  sites removed; ApplyPersistedAttributions and its tests removed; Watch's
  onFirstScan is passed nil. NewFileMonitorWithAttributions stays as a
  test-only seed seam.

Full suite green across gmuxd, cli/gmux, and packages/adapter; go vet clean.

mgabor3141 force-pushed the feat/agent-shim-attribution branch from f0766e9 to b6dee62 on June 17, 2026 00:00

Document the shim-based attribution decision as ADR 0010 (the precursor
'attribution is the foundation' draft was never merged; 0009 is now the
verb-first CLI ADR). Update UBIQUITOUS_LANGUAGE.md: the Attribution entry
now reflects shim-primary / scrollback-fallback, and add an Agent-shim
component term.

(CHANGELOG.md only points to the external gmux.app/changelog; nothing to
update locally.)

mgabor3141 force-pushed the feat/agent-shim-attribution branch from b6dee62 to ea41874 on June 17, 2026 07:38

Slim the wordy doc comments added with the agent-shim work so the point
isn't buried (filemon AttributeFromShim/MarkShimCovered/tryAttributeUnmatched,
the FileAttributor fallback note, ptyserver replay, agentshim PreloadEnv).
Also correct the shim-coverage description: GMUX_RUNNER_SOCK is set by the
runner (not the daemon), so re-announce works regardless of daemon timing;
the fallback is only for agents the shim can't cover.

mgabor3141 force-pushed the feat/agent-shim-attribution branch from ea41874 to ef557ee on June 17, 2026 07:59