fix: filter orphaned tool messages in _sanitize_assistant_messages#8350
fix: filter orphaned tool messages in _sanitize_assistant_messages#8350EmilyCheoh wants to merge 2 commits into
Conversation
Implement filtering of orphaned tool messages from cleaned messages to prevent API errors.
There was a problem hiding this comment.
Code Review
This pull request introduces logic to filter out orphaned tool messages from the OpenAI payload to prevent 400 API errors caused by context truncation. The review feedback suggests making the message and tool call parsing more robust by supporting both dictionary and object attribute access, as the messages might not always be dictionaries.
| valid_tc_ids = set() | ||
| final: list = [] | ||
| _orphan_count = 0 | ||
| for msg in cleaned: | ||
| if not isinstance(msg, dict): | ||
| final.append(msg) | ||
| continue | ||
| role = msg.get("role") | ||
| if role == "assistant" and msg.get("tool_calls"): | ||
| valid_tc_ids = {tc["id"] for tc in msg["tool_calls"] if isinstance(tc, dict) and "id" in tc} | ||
| final.append(msg) | ||
| elif role == "tool": | ||
| if msg.get("tool_call_id") in valid_tc_ids: | ||
| final.append(msg) | ||
| valid_tc_ids.discard(msg.get("tool_call_id")) | ||
| else: | ||
| _orphan_count += 1 | ||
| else: | ||
| valid_tc_ids = set() | ||
| final.append(msg) |
There was a problem hiding this comment.
The current implementation assumes that all messages in cleaned are dictionaries and that tool_calls contains only dictionary elements. If cleaned contains Message objects (or if tool_calls contains ToolCall objects), the code will either skip processing them or raise a TypeError / KeyError.
To make this sanitization robust and adhere to defensive programming practices, we should support both dictionary and object attribute access for role, tool_calls, and tool_call_id.
valid_tc_ids = set()
final: list = []
_orphan_count = 0
for msg in cleaned:
role = msg.get("role") if isinstance(msg, dict) else getattr(msg, "role", None)
tool_calls = msg.get("tool_calls") if isinstance(msg, dict) else getattr(msg, "tool_calls", None)
if role == "assistant" and isinstance(tool_calls, list) and tool_calls:
valid_tc_ids = {
tc["id"] if isinstance(tc, dict) else getattr(tc, "id", None)
for tc in tool_calls
}
valid_tc_ids.discard(None)
final.append(msg)
elif role == "tool":
tool_call_id = msg.get("tool_call_id") if isinstance(msg, dict) else getattr(msg, "tool_call_id", None)
if tool_call_id in valid_tc_ids:
final.append(msg)
valid_tc_ids.discard(tool_call_id)
else:
_orphan_count += 1
else:
valid_tc_ids = set()
final.append(msg)There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The logic around
valid_tc_idswould be clearer and less error-prone if you mutated a single set (valid_tc_ids.clear()/.update(...)) instead of reassigning it in different branches, which also makes the intended lifetime of the tracked IDs more obvious. - Consider explicitly typing
finalto reflect the expected message structure (e.g.,list[dict[str, Any]]) to make the intent and constraints of the sanitization pass clearer to future maintainers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The logic around `valid_tc_ids` would be clearer and less error-prone if you mutated a single set (`valid_tc_ids.clear()` / `.update(...)`) instead of reassigning it in different branches, which also makes the intended lifetime of the tracked IDs more obvious.
- Consider explicitly typing `final` to reflect the expected message structure (e.g., `list[dict[str, Any]]`) to make the intent and constraints of the sanitization pass clearer to future maintainers.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
a4c4a7d to
9bd38ca
Compare
Fixes #8349
After context truncation or compression removes an
assistantmessage containingtool_calls, the correspondingrole: "tool"response messages may remain in the conversation history. The API then rejects the request with:Modifications / 改动点
Added a second pass in
_sanitize_assistant_messages()(openai_source.py) that removes anyrole: "tool"message whosetool_call_iddoes not match atool_callsentry in a precedingassistantmessageActs as a last-line-of-defense before API dispatch, complementing the existing
fix_messages()inContextTruncatorThis is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Error before fix:
[sources.openai_source]: Chat Model request error: Error code: 400 - {'error': {'message': 'unexpected tool_use_id found in tool_result blocks: toolu_01AGPDyN5PStuEuoumrdgC9o. Each tool_result block must have a corresponding tool_use block in the previous message.'}}
After fix, orphaned messages are silently filtered and the request succeeds:
[sources.openai_source]: Filtered 4 orphaned tool message(s)
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Bug Fixes: