fix: Preserve reasoning_content for DeepSeek edge-case assistant messages #895

nickmesen wants to merge 1 commit into Gitlawb:main
Conversation
[Bug / Potential Fix] DeepSeek V4 Flash/Pro: reasoning_content must be passed back to the API (400) on assistant messages with tool_calls #878

> After applying this patch I get once in a while this error:
@fulalas Thanks a lot for trying the patch and for reporting this. So far I haven’t been able to reproduce that error in my own tests, but it’s definitely possible that this introduced a secondary side effect, or that there’s still another edge case not covered yet. For context, I applied the patch on top of the latest revision.

If you can reproduce it consistently, that would be extremely helpful. If possible, could you run it with:

```
openclaude --model deepseek-v4-pro --effort high --debug-file /tmp/oc-debug.log
```

At the moment I haven’t been able to do much heavier testing, mainly because of the token cost involved. It would also help a lot to get a maintainer/reviewer’s eyes on this, in case there’s still something missing in the current fix.
**Overview**

**Head-to-Head Comparison**
**Key Differences**
**What Was Tested**

Both sessions ran heavy concurrent agent flows, not single long conversations:
This validates all 4 edge cases covered by the fix:
**Errors Observed (Unrelated to This Fix)**

The following errors appeared during the sessions but are pre-existing issues, not caused by this patch:
None of these are 400 errors from DeepSeek.

**Validation Scope**

This validation specifically covers the DeepSeek `reasoning_content` failure fixed by this PR. Tested provider/model paths:
Tested scenarios:
**Out of Scope / Not Fully Validated**

The following areas were not fully validated in this test pass:
Force-pushed `29184bd` to `a1854ab`
This will be impacted by #910.
DeepSeek’s thinking mode requires `reasoning_content` to be present on every assistant message in the conversation history. In the DeepSeek scenarios I reproduced, omitting the property entirely causes a `400` error: "The `reasoning_content` in the thinking mode must be passed back to the API."

The original DeepSeek support introduced in commit `ff2a380` seems to handle the happy path correctly for assistant messages that contain a thinking block, but from the testing I did, it looks like a few edge cases may have been missed:

1. Assistant messages with array-based `content` but no thinking block (for example, when the model calls tools without emitting visible thinking).
2. Assistant messages where `content` is a plain string rather than an array, which appears to bypass the existing `convertMessages()` path.
3. The synthetic `"[Tool execution interrupted by user]"` message injected during coalescing, which is created outside `convertMessages()`.

Additionally, `conversationRecovery.ts` was stripping all thinking blocks during deserialization for third-party providers, which seems to prevent the shim from converting them back into `reasoning_content`.

Proposed changes in this PR:

- `openaiShim.ts`: attach `reasoning_content` to assistant messages when `preserveReasoningContent` is `true`, using `""` when no thinking block is present.
- `openaiShim.ts`: add `reasoning_content` to the synthetic interrupt message and to the string-content `else` branch.
- `conversationRecovery.ts`: remove `stripThinkingBlocks()` and let the shim handle provider-specific filtering.

These changes are gated behind `preserveReasoningContent`, which is currently enabled only for DeepSeek and Moonshot, so the expected impact on other providers should be limited. That said, my validation here was focused on DeepSeek, so I’m not fully sure whether there could be secondary effects in other provider paths.

Tests:

- Updated `conversationRecovery` test
- 65 passing, 0 failing
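The branch handling described above can be sketched roughly as follows. This is a minimal illustration, not the actual `openaiShim.ts` code: the message shapes and the helper name `attachReasoningContent` are assumptions; only `preserveReasoningContent` and the empty-string fallback come from the PR text.

```typescript
// Simplified stand-ins for the shim's internal message types (assumed shapes).
type ContentBlock = { type: string; thinking?: string; text?: string };

interface AssistantMessage {
  role: "assistant";
  content: string | ContentBlock[];
  reasoning_content?: string;
}

// Hypothetical helper showing the fix's logic: when the provider is gated
// behind preserveReasoningContent, every assistant message gets the property.
function attachReasoningContent(
  msg: AssistantMessage,
  preserveReasoningContent: boolean
): AssistantMessage {
  if (!preserveReasoningContent) return msg;

  // Array-based content: lift the thinking block into reasoning_content,
  // falling back to "" when the model emitted no visible thinking (edge case 1).
  if (Array.isArray(msg.content)) {
    const thinking = msg.content.find((b) => b.type === "thinking");
    return { ...msg, reasoning_content: thinking?.thinking ?? "" };
  }

  // Plain-string content bypasses convertMessages(), so it needs the same
  // fallback (edge case 2).
  return { ...msg, reasoning_content: "" };
}
```

The synthetic interrupt message (edge case 3) would get the same `""` fallback at its construction site, since it never passes through this conversion path.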
Force-pushed `a1854ab` to `cdd3d4f`
**gnanam1990** left a comment:
Thanks for digging into the DeepSeek edge cases. A few things to sort out before this lands:

- **Overlap with #914** — both PRs touch `openaiShim.ts` in the same `reasoning_content` territory. Could you sync with that author and agree on a merge order? Whichever lands second will need a rebase.
- **`stripThinkingBlocks` removal** — the deleted call referenced "issue #248 finding 5". Could you link to that finding (or add a test) showing the new flow doesn't regress it? The shim-side filtering should cover it, but it's worth pinning down.
- **Missing regression tests** for the actual edge cases the PR claims to fix:
  - synthetic interrupt assistant message
  - string-content branch
  - array content with no thinking block
- **Empty-string `reasoning_content`** — DeepSeek docs ask for the original `reasoning_content`; an empty string may itself be rejected on some setups. Worth verifying against more than the local repro.

Happy to re-review once tests are in and the #248 question is addressed.
@gnanam1990 Thanks for the detailed review.
My fix is complementary to that work. Once #914 lands, I can rebase on top of it and adapt this PR to use the cleaner provider detection path introduced there, such as providerSupportsReasoning(), instead of relying only on the DeepSeek base URL check. So the merge order makes sense to me: #914 first, then this PR rebased on top of it. The reasoning_content fixes should remain localized to the shim layer and should be straightforward to adapt.
What I can add are unit/regression tests for the shim behavior in the reported branches:
Those tests would help protect the OpenClaude-side behavior. For the provider-side validation, I propose running a real OpenClaude + DeepSeek session using this bug as the working initiative in my AI-driven development workflow. In other words, instead of only testing isolated mocked cases, I would use OpenClaude itself to work through this issue against the real DeepSeek provider. That would exercise the integration through a realistic end-to-end flow and generate a full debug log from an actual development session.

Since the session would be focused on this public OpenClaude bug, I can share the resulting logs without exposing private project data. I can also attach a short Markdown guide explaining how to locate each relevant case in the log.
That way, the tests would protect the local shim behavior, while the real session would validate that the same payloads work correctly against api.deepseek.com.
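One of the proposed regression tests could look roughly like this. It uses a stand-in for the shim's synthetic interrupt message: the real builder lives in `openaiShim.ts`, and the `buildInterruptMessage` name and message shape here are assumptions for illustration.

```typescript
// Simplified assistant-message shape (assumed, not the shim's real type).
interface ShimAssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string;
}

// Hypothetical equivalent of the coalescing path that injects the synthetic
// interrupt message outside convertMessages().
function buildInterruptMessage(preserveReasoningContent: boolean): ShimAssistantMessage {
  const msg: ShimAssistantMessage = {
    role: "assistant",
    content: "[Tool execution interrupted by user]",
  };
  // The behavior under test: providers gated by preserveReasoningContent need
  // the property present even when there is no thinking to echo back.
  if (preserveReasoningContent) {
    msg.reasoning_content = "";
  }
  return msg;
}

// Regression inputs for the two provider modes.
const deepseekMsg = buildInterruptMessage(true);
const defaultMsg = buildInterruptMessage(false);
```

The same pattern extends to the string-content and no-thinking-block branches: build the message through each path, then assert the presence (or absence) of `reasoning_content` per provider mode.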
Omitting it causes the 400 `invalid_request_error` this PR is addressing. Setting it to `""` is accepted by DeepSeek for these edge cases. The attached debug log supports this: across the full session, there are zero `invalid_request_error` responses, including turns where `reasoning_content` was `""`. If DeepSeek rejected empty strings, those message types would fail and the log would show 400s. It does not.

That said, I also saw the recent note mentioning that this DeepSeek V4 `reasoning_content` issue is already tracked in #878, with fixes in flight in #918 and #925. If you think either of those PRs already solves the problem, or if one of them is the safer path forward, I’m happy to close this PR and align with that direction. My main goal is to help make sure the DeepSeek compatibility issue is resolved in the most reliable way, since I’m currently exploring adding OpenClaude + DeepSeek to my workflow.

Could you please confirm whether you would like me to continue with this PR and apply the changes above, adjust the approach, or close it in favor of #918 or #925?
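The rejected-versus-accepted payloads described above can be illustrated with a small sketch. The request shapes follow the OpenAI-compatible chat format, but the exact fields DeepSeek validates are an assumption here; only the omitted-vs-empty `reasoning_content` distinction comes from the observed behavior.

```typescript
// Assistant turn with tool_calls but no reasoning_content: the shape that
// triggered "The reasoning_content in the thinking mode must be passed back
// to the API" (400) in the reproduced sessions.
const rejected = {
  model: "deepseek-v4-pro",
  messages: [
    { role: "user", content: "list the repo files" },
    {
      role: "assistant",
      content: "",
      tool_calls: [
        { id: "call_1", type: "function", function: { name: "ls", arguments: "{}" } },
      ],
      // reasoning_content omitted entirely -> 400 in the logged sessions
    },
  ],
};

// Same conversation with an explicit empty string on the assistant turn,
// mirroring what the patched shim sends; this was accepted in the debug log.
const accepted = {
  ...rejected,
  messages: rejected.messages.map((m) =>
    m.role === "assistant" ? { ...m, reasoning_content: "" } : m
  ),
};
```

The difference is purely whether the property exists on the wire, which is why the fix attaches `""` rather than dropping the field.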
Thanks, I checked #918 and it appears to cover the empty-string reasoning_content fallback for some of the flows I was concerned about. I hope that covers all the necessary cases. Given that, I’m happy to close this PR and align with #918/#925 as the preferred path. I’ll keep my local patch only until those changes land in a release and I can validate them in my workflow. Feel free to reuse any validation notes, logs, or edge-case analysis from this PR if useful.
Testing:

- `bun run build`
- `bun run smoke`