
scripts: team trajectory viewer generator#64

Closed
ProKil wants to merge 1 commit into main from team-trajectory-viewer

ProKil (Member) commented May 23, 2026

What

Adds scripts/gen_team_viewer.py — a self-contained HTML viewer for team-mode runs, in the same spirit as the existing gen_*_report.py scripts (reads logs/ at generation time, inline CSS/JS, no external assets).

Unlike the existing aggregate reports, this is an interactive, coordination-first browser: a sidebar lists every team run, and you can drill into any pair to read its coordination story.
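
As a rough illustration of the "inline CSS/JS, no external assets" convention, the sketch below embeds a JSON payload directly in a single HTML string. The function name `render_page`, the `DATA` variable, and the `runs` key are hypothetical and not taken from the script; the only real point is that `</` inside the payload has to be escaped so it cannot terminate the inline script early.

```python
import json


def render_page(data):
    """Return one self-contained HTML document with `data` embedded as JSON.

    Hypothetical helper: the real generator's structure is not shown in this PR.
    """
    # Escape "</" so a string such as "</script>" in the data cannot end
    # the inline <script> element. "<\/" is still valid JSON and valid JS.
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<style>body{font-family:sans-serif}</style></head><body>"
        "<div id='app'></div>"
        f"<script>const DATA = {payload};"
        "document.getElementById('app').textContent = DATA.runs.length + ' runs';"
        "</script></body></html>"
    )


if __name__ == "__main__":
    print(len(render_page({"runs": ["run-a", "run-b"]})))
```

Because everything is in one string, the output can be written to a single `.html` file and opened without a server.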

Per-pair detail

  • Header: pass/fail badge, repo · task · features · framework/model · duration · lead
  • Agents: lead/member cards (status, steps, cost, tokens, patch lines)
  • Coordination: tasks done, time-to-first-claim, claims/updates per agent, team_features chips
  • Coordination timeline: task_log events (create/claim/update/done) + inter-agent messages, color-coded by actor, relative timestamps — the centerpiece
  • Final task board, eval (per-feature pass/fail + test output)
  • Collapsible per-agent trajectories (rich *_full_traj.json when present, else summary) + diff-highlighted patches
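
The coordination metrics listed above could be derived from the task_log events roughly as follows. This is a minimal sketch under assumptions: the event shape (`ts`, `action`, `agent` keys) is hypothetical, time-to-first-claim is taken here as the gap between the first create and the first claim, and "tasks done" simply counts done events.

```python
from collections import Counter


def coordination_summary(events):
    """Summarize coordination from task-log events.

    Assumed (hypothetical) event shape:
    {"ts": <seconds as float>, "action": "create|claim|update|done", "agent": <str>}
    """
    ordered = sorted(events, key=lambda e: e["ts"])
    claims = [e for e in ordered if e["action"] == "claim"]
    updates = [e for e in ordered if e["action"] == "update"]
    first_create = next((e["ts"] for e in ordered if e["action"] == "create"), None)

    # Undefined when nothing was created or nothing was ever claimed.
    if claims and first_create is not None:
        time_to_first_claim = claims[0]["ts"] - first_create
    else:
        time_to_first_claim = None

    return {
        "tasks_done": sum(1 for e in ordered if e["action"] == "done"),
        "time_to_first_claim": time_to_first_claim,
        "claims_per_agent": dict(Counter(e["agent"] for e in claims)),
        "updates_per_agent": dict(Counter(e["agent"] for e in updates)),
    }
```

A run in which no agent ever claims a task yields `time_to_first_claim = None`, which a viewer can render as a missing value rather than zero.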

Usage

uv run python scripts/gen_team_viewer.py            # all team runs
uv run python scripts/gen_team_viewer.py --study    # curated set: cmp-full-*, ablate-*, msa_team_core*
uv run python scripts/gen_team_viewer.py --runs msa_ ablate-flash   # substring filter(s)
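
One plausible way the run selection behind these flags might work is sketched below. The glob patterns come from the `--study` comment above; the function name `select_runs` and the rule that `--study` and `--runs` combine with AND are assumptions, since the PR does not say how the two interact.

```python
from fnmatch import fnmatchcase

# Patterns quoted from the --study usage line.
STUDY_PATTERNS = ("cmp-full-*", "ablate-*", "msa_team_core*")


def select_runs(run_names, study=False, substrings=()):
    """Filter run directory names (hypothetical helper, not the script's API)."""
    selected = []
    for name in run_names:
        # --study: keep only runs matching one of the curated glob patterns.
        if study and not any(fnmatchcase(name, p) for p in STUDY_PATTERNS):
            continue
        # --runs: keep runs containing at least one of the given substrings.
        if substrings and not any(s in name for s in substrings):
            continue
        selected.append(name)
    return selected
```

With no flags the function returns every run unchanged, matching the first usage line.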

Secondary content (trajectory messages, notes, test output, patches) is truncated to keep the embedded JSON bounded; coordination data is kept in full.
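
The truncation policy described here can be pictured with the sketch below. The key names in `SECONDARY_KEYS`, the 2000-character default, and the marker text are all hypothetical; the PR does not state the actual limits or field names.

```python
# Hypothetical field names for the "secondary" content mentioned above.
SECONDARY_KEYS = ("trajectory", "notes", "test_output", "patch")


def truncate(text, limit=2000):
    """Cut `text` to `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"


def bound_pair(pair, limit=2000):
    """Return a copy of `pair` with secondary string fields truncated.

    Any other field (for example a coordination timeline) is left untouched.
    """
    bounded = dict(pair)
    for key in SECONDARY_KEYS:
        if isinstance(bounded.get(key), str):
            bounded[key] = truncate(bounded[key], limit)
    return bounded
```

Truncating per field rather than per page keeps the size of the embedded JSON roughly proportional to the number of pairs.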

Published artifact

A --study-scoped build (1714 pairs / 19 runs) is live on the org reports site:
https://cooperbench-reports.pages.dev/cooperbench/2026-05-23-team-viewer.html

Verification

  • ruff check / ruff format --check / mypy clean on the new script
  • ruff/format/mypy on src/cooperbench/ + pytest (389 passed, 63 skipped) all green
  • Rendered headlessly in Chromium: drill-down, filters, and search work with no console errors

🤖 Generated with Claude Code

ProKil force-pushed the team-trajectory-viewer branch 2 times, most recently from d118a8e to ed59e39 on May 23, 2026 23:02
Self-contained, interactive HTML viewer for team-mode runs (same gen_*_report.py
convention: reads logs/ at gen time, inline CSS/JS, no external assets).

Indexes every team pair; drill into any pair for the coordination story —
lead/member split, the per-pair widgets below, eval, and collapsible per-agent
trajectories.

Widgets:
- Feature usage (replay): play/scrub timeline, five swim-lanes (task_list,
  scratchpad, mcp, auto_refresh, protocol) lighting up as each feature is used.
  task_list + conversation-protocol derive from the task-log timeline; scratchpad
  / mcp / protocol-sends / auto_refresh are scanned from each agent's trajectory
  (exact timestamps for mini-swe-agent, interpolated-by-step for codex multi-msg
  trajs, and from the raw codex *_stream.log exec blocks for full-dataset runs).
- Task list: per-task lifecycle (create/claim/update/done with notes & times).
- Scratchpad: /workspace/shared/ file browser — recovered PLAN.md + agent patches.
- Protocol: inter-agent message thread (*_sent.jsonl) + typed request/respond.
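
A minimal sketch of what "interpolated-by-step" timestamps could look like, assuming only the start and end times of a trajectory are known and steps are spread evenly between them. The function name and the even-spacing rule are assumptions; the script may weight steps differently.

```python
def interpolate_step_times(start, end, n_steps):
    """Assign evenly spaced timestamps to n_steps steps between start and end.

    Hypothetical helper: the first step lands on `start`, the last on `end`.
    """
    if n_steps <= 0:
        return []
    if n_steps == 1:
        return [start]
    span = end - start
    return [start + span * i / (n_steps - 1) for i in range(n_steps)]
```

For example, three steps between t=0 and t=10 are placed at 0, 5, and 10, so a feature detected at the middle step appears halfway along the replay timeline.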

--study emits the curated study set (cmp-full-*, ablate-*, msa_team_core*);
--runs filters by substring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ProKil force-pushed the team-trajectory-viewer branch from ed59e39 to aa8070e on May 23, 2026 23:22
ProKil (Member, Author) commented May 23, 2026

Migrating the team trajectory viewer to cooperbench/reports, where it now lives next to the published HTML. The generator was added there as scripts/gen_team_viewer.py in commit a710ae5; the viewer is live at https://cooperbench-reports.pages.dev/cooperbench/2026-05-23-team-viewer.html. Closing this PR; the script will not land in CooperBench.

ProKil closed this May 23, 2026
ProKil deleted the team-trajectory-viewer branch May 23, 2026 23:33