scripts: team trajectory viewer generator #64
Closed
ProKil wants to merge 1 commit into
d118a8e to ed59e39
Self-contained, interactive HTML viewer for team-mode runs (same gen_*_report.py convention: reads logs/ at gen time, inline CSS/JS, no external assets). Indexes every team pair; drill into any pair for the coordination story: lead/member split, the per-pair widgets below, eval, and collapsible per-agent trajectories.

Widgets:
- Feature usage (replay): play/scrub timeline, five swim-lanes (task_list, scratchpad, mcp, auto_refresh, protocol) lighting up as each feature is used. task_list + conversation-protocol derive from the task-log timeline; scratchpad / mcp / protocol-sends / auto_refresh are scanned from each agent's trajectory (exact timestamps for mini-swe-agent, interpolated-by-step for codex multi-msg trajs, and from the raw codex *_stream.log exec blocks for full-dataset runs).
- Task list: per-task lifecycle (create/claim/update/done with notes & times).
- Scratchpad: /workspace/shared/ file browser, with recovered PLAN.md + agent patches.
- Protocol: inter-agent message thread (*_sent.jsonl) + typed request/respond.

--study emits the curated study set (cmp-full-*, ablate-*, msa_team_core*); --runs filters by substring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
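The "interpolated-by-step" timestamps mentioned in the commit message could work roughly as follows. This is only a sketch under assumptions: the helper name `interpolate_step_times` and the idea of spreading steps evenly between a known start and end time are illustrative, not taken from gen_team_viewer.py.

```python
from datetime import datetime, timedelta


def interpolate_step_times(start: datetime, end: datetime, n_steps: int) -> list[datetime]:
    """Assign each of n_steps an evenly spaced timestamp in [start, end].

    Hypothetical helper: assumes only the trajectory's start and end
    times are known, and that steps are roughly evenly spaced.
    """
    if n_steps <= 0:
        return []
    if n_steps == 1:
        return [start]
    # Divide the total span into (n_steps - 1) equal gaps.
    gap = (end - start) / (n_steps - 1)
    return [start + gap * i for i in range(n_steps)]


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1, 12, 0, 0)
    for t in interpolate_step_times(t0, t0 + timedelta(minutes=4), 5):
        print(t.isoformat())
```

Even spacing is the simplest possible choice; a real implementation might weight steps by message length or tool-call duration instead.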
ed59e39 to aa8070e
Migrating the team trajectory viewer to cooperbench/reports (it now lives next to the published HTML). Generator added there as scripts/gen_team_viewer.py in commit a710ae5; viewer is live at https://cooperbench-reports.pages.dev/cooperbench/2026-05-23-team-viewer.html. Closing this PR; the script will not land in CooperBench.
What
Adds scripts/gen_team_viewer.py, a self-contained HTML viewer for team-mode runs, in the same spirit as the existing gen_*_report.py scripts (reads logs/ at generation time, inline CSS/JS, no external assets).

Unlike the existing aggregate reports, this is an interactive, coordination-first browser: a sidebar of every team run, drill into any pair to read its coordination story.
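To illustrate what "self-contained, inline CSS/JS, no external assets" can look like in practice, here is a minimal sketch of embedding run data directly into a single HTML string. The function name `render_viewer`, the `DATA` variable, and the page layout are assumptions for illustration, not the script's actual structure.

```python
import html
import json


def render_viewer(data: dict, title: str = "Team viewer") -> str:
    """Return one HTML document with CSS, JS, and data all inline.

    Hypothetical sketch: the real generator's layout is not shown in the PR.
    """
    # Escape "</" so embedded text like "</script>" cannot end the
    # script element early; "<\/" is still valid inside JS/JSON strings.
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<!doctype html>\n"
        f"<html><head><meta charset='utf-8'><title>{html.escape(title)}</title>\n"
        "<style>body{font-family:sans-serif}</style></head>\n"
        "<body><div id='app'></div>\n"
        f"<script>const DATA = {payload};</script>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    page = render_viewer({"pairs": [{"id": "demo", "note": "</script>"}]})
    print(len(page), "bytes")
```

Because everything is inlined at generation time, the resulting file can be opened from disk or hosted as a static page without any further requests.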
Per-pair detail
- team_features chips
- task_log events (create/claim/update/done) + inter-agent messages, color-coded by actor, relative timestamps (the centerpiece)
- Per-agent trajectories (*_full_traj.json when present, else summary) + diff-highlighted patches

Usage
Secondary content (trajectory messages, notes, test output, patches) is truncated to keep the embedded JSON bounded; coordination data is kept in full.
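The truncation policy described above might be sketched like this. The field names in `SECONDARY_FIELDS`, the 2000-character default, and both helper names are hypothetical; the PR does not state the actual limits or keys.

```python
# Hypothetical key names for "secondary" content; coordination data
# (task log, inter-agent messages) is deliberately not listed here.
SECONDARY_FIELDS = ("trajectory", "notes", "test_output", "patch")


def truncate(text: str, limit: int = 2000) -> str:
    """Cut text to `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return text[:limit] + f"\n... [truncated {dropped} chars]"


def bound_pair(pair: dict, limit: int = 2000) -> dict:
    """Return a copy of `pair` with only secondary string fields truncated."""
    out = dict(pair)
    for key in SECONDARY_FIELDS:
        if isinstance(out.get(key), str):
            out[key] = truncate(out[key], limit)
    return out


if __name__ == "__main__":
    pair = {"task_log": "x" * 5000, "patch": "y" * 5000}
    bounded = bound_pair(pair, limit=100)
    print(len(bounded["task_log"]), len(bounded["patch"]))
```

Keeping the truncation marker inside the text makes it visible in the viewer that content was cut, rather than silently shortening it.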
Published artifact
A --study-scoped build (1714 pairs / 19 runs) is live on the org reports site: https://cooperbench-reports.pages.dev/cooperbench/2026-05-23-team-viewer.html
Verification
- ruff check / ruff format --check / mypy clean on the new script
- ruff / format / mypy on src/cooperbench/ + pytest (389 passed, 63 skipped) all green

🤖 Generated with Claude Code