release: v0.0.19 by akhatua2 · Pull Request #68 · cooperbench/CooperBench

akhatua2 · 2026-05-26T03:15:09Z

Summary

Fixes claude_code adapter dropping every assistant turn from agent*_traj.json.

The bug

parse_session_jsonl (src/cooperbench/agents/claude_code/parsers.py) treated message.role as authoritative and rejected any event whose role wasn't in {user, assistant, system}. Recent claude-code session writers emit assistant turns with message.role: None — the role lives only in the top-level event.type — so every LLM turn (text, thinking, tool_use) got filtered out, leaving traj files that contained only user tool_result entries.

Affected every claude_code run since the session-format shift, including all 0.0.17 / 0.0.18 trajectories on disk.

The fix

One line — fall back to event.type when message.role is missing. Purely additive: existing message.role values still win.

role = message.get("role") or event.get("type")

Validation

Re-parsed the broken anyhow_task/390/f1_f4/agent2_session.jsonl (86 raw assistant events, 43 user events) with the fix:

Before: 43 messages, all user
After: 129 messages (43 user + 86 assistant)

Cross-checked 17 session files across recent CooperBench + CooperData runs:

0 drops, 0 skew, 0 empty-content issues
tool_use block counts match raw event counts exactly (28/28, 125/125, 44/44, 42/42 on spot-checked agents)
Patch contents align with Edit/Write tool_use calls

Other parsers untouched (parse_stream_json populates result.json fields, unaffected). Audited src/cooperbench/ for similar message.role-strict checks — none found in claude_code or runner/; swe_agent uses a different schema.

Test plan

uv run ruff check src/cooperbench/
uv run ruff format --check src/cooperbench/
uv run python -m mypy src/cooperbench/
uv run python -m pytest tests/ --tb=short -q (385 passed, 63 skipped)
Re-parsed broken session: 43 → 129 messages, roles correct, content blocks render as [tool_use ...] {...} / text / [tool_result] ...

Migration

Old agent*_traj.json files on disk are still wrong — they were written with the buggy parser. But the underlying *_session.jsonl and *_stream.jsonl are complete; the trajectory can be re-derived from *_session.jsonl by calling the fixed parse_session_jsonl on it.

🤖 Generated with Claude Code

claude_code adapter — fix parse_session_jsonl dropping all assistant turns from agent*_traj.json. The parser treated message.role as authoritative and rejected any event whose role wasn't in {user, assistant, system}. Recent claude-code session writers emit assistant turns with message.role: None — the role lives only in event.type — so every LLM turn (text / thinking / tool_use) got filtered out, leaving traj files that contained only user tool_result entries. Affected every claude_code run since the session-format shift, including all 0.0.17 / 0.0.18 trajectories on disk. Fix: fall back to event.type when message.role is missing. On a representative session (anyhow_task/390/f1_f4/agent2_session.jsonl) this recovers all 86 assistant events, taking the parsed trajectory from 43 messages (all user) to 129 (43 user + 86 assistant). The underlying *_session.jsonl and *_stream.jsonl files were always complete — only the derived *_traj.json was wrong, so historical runs can be re-parsed by calling parse_session_jsonl on the on-disk session file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

akhatua2 merged commit d46d9e7 into main May 26, 2026
3 checks passed

akhatua2 deleted the release/v0.0.19 branch May 26, 2026 03:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.0.19#68

release: v0.0.19#68
akhatua2 merged 1 commit into
mainfrom
release/v0.0.19

akhatua2 commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

akhatua2 commented May 26, 2026

Summary

The bug

The fix

Validation

Test plan

Migration

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant