Skip to content

release: v0.0.19#68

Merged
akhatua2 merged 1 commit into
mainfrom
release/v0.0.19
May 26, 2026
Merged

release: v0.0.19#68
akhatua2 merged 1 commit into
mainfrom
release/v0.0.19

Conversation

@akhatua2

Copy link
Copy Markdown
Collaborator

Summary

Fixes claude_code adapter dropping every assistant turn from agent*_traj.json.

The bug

parse_session_jsonl (src/cooperbench/agents/claude_code/parsers.py) treated message.role as authoritative and rejected any event whose role wasn't in {user, assistant, system}. Recent claude-code session writers emit assistant turns with message.role: None — the role lives only in the top-level event.type — so every LLM turn (text, thinking, tool_use) got filtered out, leaving traj files that contained only user tool_result entries.

Affected every claude_code run since the session-format shift, including all 0.0.17 / 0.0.18 trajectories on disk.

The fix

One line — fall back to event.type when message.role is missing. Purely additive: existing message.role values still win.

role = message.get("role") or event.get("type")

Validation

Re-parsed the broken anyhow_task/390/f1_f4/agent2_session.jsonl (86 raw assistant events, 43 user events) with the fix:

  • Before: 43 messages, all user
  • After: 129 messages (43 user + 86 assistant)

Cross-checked 17 session files across recent CooperBench + CooperData runs:

  • 0 drops, 0 skew, 0 empty-content issues
  • tool_use block counts match raw event counts exactly (28/28, 125/125, 44/44, 42/42 on spot-checked agents)
  • Patch contents align with Edit/Write tool_use calls

Other parsers untouched (parse_stream_json populates result.json fields, unaffected). Audited src/cooperbench/ for similar message.role-strict checks — none found in claude_code or runner/; swe_agent uses a different schema.

Test plan

  • uv run ruff check src/cooperbench/
  • uv run ruff format --check src/cooperbench/
  • uv run python -m mypy src/cooperbench/
  • uv run python -m pytest tests/ --tb=short -q (385 passed, 63 skipped)
  • Re-parsed broken session: 43 → 129 messages, roles correct, content blocks render as [tool_use ...] {...} / text / [tool_result] ...

Migration

Old agent*_traj.json files on disk are still wrong — they were written with the buggy parser. But the underlying *_session.jsonl and *_stream.jsonl are complete; the trajectory can be re-derived from *_session.jsonl by calling the fixed parse_session_jsonl on it.

🤖 Generated with Claude Code

claude_code adapter — fix parse_session_jsonl dropping all assistant
turns from agent*_traj.json.

The parser treated message.role as authoritative and rejected any event
whose role wasn't in {user, assistant, system}.  Recent claude-code
session writers emit assistant turns with message.role: None — the role
lives only in event.type — so every LLM turn (text / thinking /
tool_use) got filtered out, leaving traj files that contained only user
tool_result entries.  Affected every claude_code run since the
session-format shift, including all 0.0.17 / 0.0.18 trajectories on
disk.

Fix: fall back to event.type when message.role is missing.

On a representative session (anyhow_task/390/f1_f4/agent2_session.jsonl)
this recovers all 86 assistant events, taking the parsed trajectory
from 43 messages (all user) to 129 (43 user + 86 assistant).

The underlying *_session.jsonl and *_stream.jsonl files were always
complete — only the derived *_traj.json was wrong, so historical runs
can be re-parsed by calling parse_session_jsonl on the on-disk session
file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@akhatua2 akhatua2 merged commit d46d9e7 into main May 26, 2026
3 checks passed
@akhatua2 akhatua2 deleted the release/v0.0.19 branch May 26, 2026 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant