
fix: Preserve reasoning_content for DeepSeek edge-case assistant messages #895

Closed

nickmesen wants to merge 1 commit into Gitlawb:main from
nickmesen:fix-878-deepSeek-V4-Flash/Pro-reasoning_content

Conversation

@nickmesen
Contributor

DeepSeek's thinking mode requires `reasoning_content` to be present on every assistant message in the conversation history. In the DeepSeek scenarios I reproduced, omitting the property entirely causes a `400` error:

"The reasoning_content in the thinking mode must be passed back
to the API."

The original DeepSeek support introduced in commit ff2a380 seems to handle the happy path correctly for assistant messages that contain a thinking block, but from the testing I did, it looks like a few edge cases may have been missed:

  1. Assistant messages with array-based `content` but no thinking block (for example, when the model calls tools without emitting visible thinking).
  2. Assistant messages where `content` is a plain string rather than an array, which appears to bypass the existing `convertMessages()` path.
  3. The synthetic `"[Tool execution interrupted by user]"` message injected during coalescing, which is created outside `convertMessages()`.

Additionally, `conversationRecovery.ts` was stripping all thinking blocks during deserialization for third-party providers, which seems to prevent the shim from converting them back into `reasoning_content`.
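
To make the failure mode concrete, here is a minimal sketch of an assistant turn being passed back in the request history. It assumes the OpenAI-compatible chat shape; the exact payload DeepSeek expects is inferred from the quoted `400` error and my local repro, not from official docs.

```typescript
// Illustrative request history only — field shapes are assumptions based on
// the OpenAI-compatible chat format, not the actual OpenClaude source.
const history = [
  { role: "user", content: "List the files in /tmp" },
  {
    // Edge case 1: a tool-call turn with no visible thinking. Omitting
    // reasoning_content on this turn is what triggered the 400 in my repro.
    role: "assistant",
    content: "",
    reasoning_content: "", // must be present, even when empty
    tool_calls: [
      { id: "call_1", type: "function", function: { name: "ls", arguments: "{}" } },
    ],
  },
  { role: "tool", tool_call_id: "call_1", content: "a.txt  b.txt" },
];
```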

Proposed changes in this PR:

  • `openaiShim.ts`: attach `reasoning_content` to assistant messages when `preserveReasoningContent` is `true`, using `""` when no thinking block is present (sketched below).
  • `openaiShim.ts`: add `reasoning_content` to the synthetic interrupt message and to the string-content `else` branch.
  • `conversationRecovery.ts`: remove `stripThinkingBlocks()` and let the shim handle provider-specific filtering.
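
A rough sketch of the intended shim-side behavior follows; the helper name, block shapes, and `preserveReasoningContent` plumbing are simplified assumptions, not the real OpenClaude code.

```typescript
// Simplified sketch — not the actual openaiShim.ts implementation.
interface AnthropicBlock {
  type: "text" | "thinking" | "tool_use";
  text?: string;
  thinking?: string;
}

interface OpenAIAssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string;
}

function toOpenAIAssistant(
  content: AnthropicBlock[] | string,
  preserveReasoningContent: boolean
): OpenAIAssistantMessage {
  // String-content branch (edge case 2): no thinking block can exist here,
  // so fall back to an empty reasoning_content.
  if (typeof content === "string") {
    const msg: OpenAIAssistantMessage = { role: "assistant", content };
    if (preserveReasoningContent) msg.reasoning_content = "";
    return msg;
  }

  const thinking = content.find((b) => b.type === "thinking");
  const text = content
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("");

  const msg: OpenAIAssistantMessage = { role: "assistant", content: text };
  if (preserveReasoningContent) {
    // Edge case 1: array content with no thinking block still gets the
    // property, as "" — omitting it entirely is what causes the 400.
    msg.reasoning_content = thinking?.thinking ?? "";
  }
  return msg;
}
```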

These changes are gated behind `preserveReasoningContent`, which is currently enabled only for DeepSeek and Moonshot, so the expected impact on other providers should be limited. That said, my validation here was focused on DeepSeek, so I’m not fully sure whether there could be secondary effects in other provider paths.

Tests:

  • Updated `conversationRecovery` test
  • 65 passing, 0 failing

Testing

  • `bun run build`
  • `bun run smoke`

@nickmesen
Contributor Author

nickmesen commented Apr 25, 2026

[Bug / Potential Fix] DeepSeek V4 Flash/Pro: reasoning_content must be passed back to the API (400) on assistant messages with tool_calls #878

@fulalas

fulalas commented Apr 25, 2026

After applying this patch, I occasionally get this error: `API Error: preserveReasoningContent is not defined`

@nickmesen
Contributor Author

nickmesen commented Apr 25, 2026

> After applying this patch, I occasionally get this error: `API Error: preserveReasoningContent is not defined`

@fulalas Thanks a lot for trying the patch and for reporting this.

So far I haven’t been able to reproduce that error in my own tests, but it’s definitely possible that this introduced a secondary side effect, or that there’s still another edge case not covered yet.

For context, I applied the patch on top of the latest main at the time: 9e23c2bec43697187762601db5b1585c9b0fb1a3.

If you can reproduce it consistently, that would be extremely helpful. The `preserveReasoningContent is not defined` stack trace is probably the most important clue here, since it should show exactly which file and line are failing.

If possible, could you run it with --debug-file enabled and share the relevant stack trace/logs?

```
openclaude --model deepseek-v4-pro --effort high --debug-file /tmp/oc-debug.log
```

At the moment I haven’t been able to do much heavier testing, mainly because of the token cost of deepseek-v4-pro. In my normal flow I’m mostly using deepseek-v4-flash, so if I can reproduce it there as well, I’ll investigate further.

It would also help a lot to get a maintainer/reviewer’s eyes on this, in case there’s still something missing in the current fix.

@nickmesen
Contributor Author

nickmesen commented Apr 26, 2026


Overview

Both deepseek-v4-flash and deepseek-v4-pro were stress-tested after applying the reasoning_content fix on top of commit 9e23c2b. Zero 400 errors related to reasoning_content were observed across 304 API calls and ~2 hours of heavy concurrent usage.


Head-to-Head Comparison

| Metric | deepseek-v4-flash | deepseek-v4-pro | Combined |
| --- | --- | --- | --- |
| API calls | 192 | 112 | 304 |
| `agent_summary` subagents | 150 | 131 | 281 |
| `extract_memories` subagents | 149 | 161 | 310 |
| Duration | ~47 min | ~75 min | ~2 hours |
| 400 `reasoning_content` errors | 0 | 0 | 0 |
| Success rate | 100% | 100% | 100% |

Key Differences

  • deepseek-v4-flash: More calls (192) in less time (~47 min). Higher throughput, more intense session per minute.
  • deepseek-v4-pro: Fewer calls (112) but longer sessions (~75 min). More extract_memories subagents (161 vs 149).
  • Both: Zero 400 errors. The fix is stable on both models.

What Was Tested

Both sessions ran heavy concurrent agent flows, not single long conversations:

  • Main agent issuing requests
  • Hundreds of forked subagents (agent_summary, extract_memories) running in parallel
  • Growing message histories, accumulated tool calls, and edge cases across turns

This validates all 4 edge cases covered by the fix:

  1. Assistant messages without a thinking block
  2. Assistant messages where content is a plain string
  3. Synthetic "[Tool execution interrupted by user]" messages
  4. Thinking blocks preserved through conversationRecovery.ts

Errors Observed (Unrelated to This Fix)

The following errors appeared during the sessions but are pre-existing issues, not caused by the reasoning_content changes:

| Error | flash | pro | Source |
| --- | --- | --- | --- |
| `TypeError: anthropic.beta.messages.countTokens is not a function` | | | Pre-existing OpenClaude bug |
| `Error: File does not exist` | | | Project-specific file path issue (App) |
| `Error streaming, falling back to non-streaming mode: terminated` | | | Network timeout / streaming interruption |

None of these are 400 errors from DeepSeek.

Validation Scope

This validation specifically covers the DeepSeek reasoning_content failure fixed by this PR.

Tested provider/model paths:

  • deepseek-v4-flash
  • deepseek-v4-pro

Tested scenarios:

  • Long-running conversations with growing history
  • Heavy subagent usage
  • Assistant messages with tool calls
  • Assistant messages without visible thinking blocks
  • Plain string assistant content
  • Synthetic interrupted tool execution messages
  • Conversation recovery preserving thinking blocks before shim conversion

Out of Scope / Not Fully Validated

The following areas were not fully validated in this test pass:

  • Moonshot provider behavior, even though the flag is also enabled for Moonshot
  • Other OpenAI-compatible providers
  • The intermittent `preserveReasoningContent is not defined` report, because I was not able to reproduce it
  • Existing unrelated OpenClaude issues such as:
    • anthropic.beta.messages.countTokens is not a function
    • project-specific missing file errors
    • network or streaming termination errors

nickmesen force-pushed the fix-878-deepSeek-V4-Flash/Pro-reasoning_content branch from 29184bd to a1854ab on April 26, 2026 at 15:46
jatmn mentioned this pull request on Apr 27, 2026
@jatmn
Collaborator

jatmn commented Apr 27, 2026

This will be impacted by #910.

nickmesen force-pushed the fix-878-deepSeek-V4-Flash/Pro-reasoning_content branch from a1854ab to cdd3d4f on April 27, 2026 at 07:12
@gnanam1990
Collaborator

gnanam1990 left a comment

Thanks for digging into the DeepSeek edge cases. A few things to sort out before this lands:

  1. Overlap with #914 — both PRs touch openaiShim.ts in the same reasoning_content territory. Could you sync with that author and agree on a merge order? Whichever lands second will need a rebase.

  2. stripThinkingBlocks removal — the deleted call referenced 'issue #248 finding 5'. Could you link to that finding (or add a test) showing the new flow doesn't regress it? The shim-side filtering should cover it, but it's worth pinning down.

  3. Missing regression tests for the actual edge cases the PR claims to fix:
    • synthetic interrupt assistant message
    • string-content branch
    • array content with no thinking block

  4. Empty-string reasoning_content — DeepSeek docs ask for the original reasoning_content; an empty string may itself be rejected on some setups. Worth verifying against more than the local repro.

Happy to re-review once tests are in and the #248 question is addressed.

@nickmesen
Contributor Author

nickmesen commented Apr 28, 2026


Thanks for the detailed review, @gnanam1990.

  1. Re: overlap with #914 (fix: add NVIDIA API host to reasoning_content allowlist for DeepSeek V4 models) — Agreed. I understand that #914 improves provider detection, but it does not fix the 400 invalid_request_error caused by missing or incorrectly propagated reasoning_content.

My fix is complementary to that work. Once #914 lands, I can rebase on top of it and adapt this PR to use the cleaner provider detection path introduced there, such as providerSupportsReasoning(), instead of relying only on the DeepSeek base URL check.

So the merge order makes sense to me: #914 first, then this PR rebased on top of it. The reasoning_content fixes should remain localized to the shim layer and should be straightforward to adapt.

  2. Re: stripThinkingBlocks / #248 (bug: REPL session management sends Anthropic-only parameters to 3P providers) finding 5 — You're right to flag this. To avoid any risk of regression on session resume, I'm reverting the stripThinkingBlocks removal. The reasoning_content fix in the shim layer is independent and remains intact.

  3. Re: regression tests — I understand the concern. The difficulty with mocked tests for this specific issue is that they can verify our local serialization logic, but they cannot prove what DeepSeek actually accepts or rejects at the API boundary.

What I can add are unit/regression tests for the shim behavior in the reported branches:

  • synthetic interrupt assistant message
  • string-content branch
  • array content with no thinking block

Those tests would help protect the OpenClaude-side behavior.
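
As a starting point, a hypothetical shape for those tests using bun:test; `convertMessages` is the function named above, but its exact signature and import path here are assumptions.

```typescript
// Hypothetical regression tests — the convertMessages signature and the
// import path are assumptions, not the actual OpenClaude test suite.
import { describe, expect, it } from "bun:test";
import { convertMessages } from "../src/openaiShim";

describe("reasoning_content preservation", () => {
  it("adds empty reasoning_content to the synthetic interrupt message", () => {
    const out = convertMessages(
      [{ role: "assistant", content: "[Tool execution interrupted by user]" }],
      { preserveReasoningContent: true },
    );
    expect(out[0].reasoning_content).toBe("");
  });

  it("keeps reasoning_content present for array content with no thinking block", () => {
    const out = convertMessages(
      [
        {
          role: "assistant",
          content: [{ type: "tool_use", id: "t1", name: "ls", input: {} }],
        },
      ],
      { preserveReasoningContent: true },
    );
    expect(out[0].reasoning_content).toBe("");
  });
});
```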

For the provider-side validation, I propose running a real OpenClaude + DeepSeek session using this bug as the working initiative in my AI-driven development workflow. In other words, instead of only testing isolated mocked cases, I would use OpenClaude itself to work through this issue against the real DeepSeek provider.

That would exercise the integration through a realistic end-to-end flow and generate a full debug log from an actual development session. Since the session would be focused on this public OpenClaude bug, I can share the resulting logs without exposing private project data.

I can also attach a short Markdown guide explaining how to locate each relevant case in the log, including:

  • synthetic interrupt assistant message
  • string-content branch
  • array content with no thinking block
  • turns where reasoning_content is set to ""

That way, the tests would protect the local shim behavior, while the real session would validate that the same payloads work correctly against api.deepseek.com.

  4. Re: empty-string reasoning_content — When an assistant message has no previous thinking block, such as pure tool_use, synthetic interruption, or the string-content branch, there are only two options: omit reasoning_content or set it to "".

Omitting it causes the 400 invalid_request_error this PR is addressing. Setting it to "" is accepted by DeepSeek for these edge cases.

The attached debug log supports this: across the full session, there are zero invalid_request_error responses, including turns where reasoning_content was "". If DeepSeek rejected empty strings, those message types would fail and the log would show 400s. It does not.

That said, I also saw the recent note mentioning that this DeepSeek V4 reasoning_content issue is already tracked in #878, with fixes in flight in #918 and #925.

If you think either of those PRs already solves the problem, or if one of them is the safer path forward, I’m happy to close this PR and align with that direction.

My main goal is to help make sure the DeepSeek compatibility issue is resolved in the most reliable way, since I’m currently exploring adding OpenClaude + DeepSeek to my workflow.

Could you please confirm whether you would like me to continue with this PR and apply the changes above, adjust the approach, or close it in favor of #918 or #925?

@nickmesen
Contributor Author

Thanks, I checked #918 and it appears to cover the empty-string reasoning_content fallback for some of the flows I was concerned about. I hope that covers all the necessary cases.

Given that, I’m happy to close this PR and align with #918/#925 as the preferred path. I’ll keep my local patch only until those changes land in a release and I can validate them in my workflow.

Feel free to reuse any validation notes, logs, or edge-case analysis from this PR if useful.
