
fix: Preserve reasoning_content for DeepSeek edge-case assistant messages #895

Closed

nickmesen wants to merge 1 commit into Gitlawb:main from
nickmesen:fix-878-deepSeek-V4-Flash/Pro-reasoning_content

Conversation

@nickmesen
Contributor

DeepSeek's thinking mode requires `reasoning_content` to be present on every assistant message in the conversation history. In the DeepSeek scenarios I reproduced, omitting the property entirely causes a `400` error:

"The reasoning_content in the thinking mode must be passed back
to the API."

The original DeepSeek support introduced in commit ff2a380 seems to handle the happy path correctly for assistant messages that contain a thinking block, but from the testing I did, it looks like a few edge cases may have been missed:

  1. Assistant messages with array-based `content` but no thinking block (for example, when the model calls tools without emitting visible thinking).
  2. Assistant messages where `content` is a plain string rather than an array, which appears to bypass the existing `convertMessages()` path.
  3. The synthetic `"[Tool execution interrupted by user]"` message injected during coalescing, which is created outside `convertMessages()`.

Additionally, `conversationRecovery.ts` was stripping all thinking blocks during deserialization for third-party providers, which seems to prevent the shim from converting them back into `reasoning_content`.
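
To make the failure mode concrete, here is a minimal sketch of an assistant turn being passed back in the request history. It assumes the OpenAI-compatible chat shape; the exact payload DeepSeek expects is inferred from the quoted `400` error and my local repro, not from official docs.

```typescript
// Illustrative request history only — field shapes are assumptions based on
// the OpenAI-compatible chat format, not the actual OpenClaude source.
const history = [
  { role: "user", content: "List the files in /tmp" },
  {
    // Edge case 1: a tool-call turn with no visible thinking. Omitting
    // reasoning_content on this turn is what triggered the 400 in my repro.
    role: "assistant",
    content: "",
    reasoning_content: "", // must be present, even when empty
    tool_calls: [
      { id: "call_1", type: "function", function: { name: "ls", arguments: "{}" } },
    ],
  },
  { role: "tool", tool_call_id: "call_1", content: "a.txt  b.txt" },
];
```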

Proposed changes in this PR:

  • `openaiShim.ts`: attach `reasoning_content` to assistant messages when `preserveReasoningContent` is `true`, using `""` when no thinking block is present (sketched below).
  • `openaiShim.ts`: add `reasoning_content` to the synthetic interrupt message and to the string-content `else` branch.
  • `conversationRecovery.ts`: remove `stripThinkingBlocks()` and let the shim handle provider-specific filtering.
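
A rough sketch of the intended shim-side behavior follows; the helper name, block shapes, and `preserveReasoningContent` plumbing are simplified assumptions, not the real OpenClaude code.

```typescript
// Simplified sketch — not the actual openaiShim.ts implementation.
interface AnthropicBlock {
  type: "text" | "thinking" | "tool_use";
  text?: string;
  thinking?: string;
}

interface OpenAIAssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string;
}

function toOpenAIAssistant(
  content: AnthropicBlock[] | string,
  preserveReasoningContent: boolean
): OpenAIAssistantMessage {
  // String-content branch (edge case 2): no thinking block can exist here,
  // so fall back to an empty reasoning_content.
  if (typeof content === "string") {
    const msg: OpenAIAssistantMessage = { role: "assistant", content };
    if (preserveReasoningContent) msg.reasoning_content = "";
    return msg;
  }

  const thinking = content.find((b) => b.type === "thinking");
  const text = content
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("");

  const msg: OpenAIAssistantMessage = { role: "assistant", content: text };
  if (preserveReasoningContent) {
    // Edge case 1: array content with no thinking block still gets the
    // property, as "" — omitting it entirely is what causes the 400.
    msg.reasoning_content = thinking?.thinking ?? "";
  }
  return msg;
}
```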

These changes are gated behind `preserveReasoningContent`, which is currently enabled only for DeepSeek and Moonshot, so the expected impact on other providers should be limited. That said, my validation here was focused on DeepSeek, so I’m not fully sure whether there could be secondary effects in other provider paths.

Tests:

  • Updated `conversationRecovery` test
  • 65 passing, 0 failing

Testing

  • `bun run build`
  • `bun run smoke`

@nickmesen
Contributor Author

nickmesen commented Apr 25, 2026

[Bug / Potential Fix] DeepSeek V4 Flash/Pro: reasoning_content must be passed back to the API (400) on assistant messages with tool_calls #878

@fulalas

fulalas commented Apr 25, 2026

After applying this patch, I occasionally get this error: `API Error: preserveReasoningContent is not defined`

@nickmesen
Contributor Author

nickmesen commented Apr 25, 2026

> After applying this patch, I occasionally get this error: `API Error: preserveReasoningContent is not defined`

@fulalas Thanks a lot for trying the patch and for reporting this.

So far I haven’t been able to reproduce that error in my own tests, but it’s definitely possible that this introduced a secondary side effect, or that there’s still another edge case not covered yet.

For context, I applied the patch on top of the latest main at the time: 9e23c2bec43697187762601db5b1585c9b0fb1a3.

If you can reproduce it consistently, that would be extremely helpful. The `preserveReasoningContent is not defined` stack trace is probably the most important clue here, since it should show exactly which file and line are failing.

If possible, could you run it with --debug-file enabled and share the relevant stack trace/logs?

```
openclaude --model deepseek-v4-pro --effort high --debug-file /tmp/oc-debug.log
```

At the moment I haven’t been able to do much heavier testing, mainly because of the token cost of deepseek-v4-pro. In my normal flow I’m mostly using deepseek-v4-flash, so if I can reproduce it there as well, I’ll investigate further.

It would also help a lot to get a maintainer/reviewer’s eyes on this, in case there’s still something missing in the current fix.

@nickmesen
Contributor Author

nickmesen commented Apr 26, 2026


Overview

Both deepseek-v4-flash and deepseek-v4-pro were stress-tested after applying the reasoning_content fix on top of commit 9e23c2b. Zero 400 errors related to reasoning_content were observed across 304 API calls and ~2 hours of heavy concurrent usage.


Head-to-Head Comparison

| Metric | deepseek-v4-flash | deepseek-v4-pro | Combined |
| --- | --- | --- | --- |
| API calls | 192 | 112 | 304 |
| `agent_summary` subagents | 150 | 131 | 281 |
| `extract_memories` subagents | 149 | 161 | 310 |
| Duration | ~47 min | ~75 min | ~2 hours |
| 400 `reasoning_content` errors | 0 | 0 | 0 |
| Success rate | 100% | 100% | 100% |

Key Differences

  • deepseek-v4-flash: More calls (192) in less time (~47 min). Higher throughput, more intense session per minute.
  • deepseek-v4-pro: Fewer calls (112) but longer sessions (~75 min). More extract_memories subagents (161 vs 149).
  • Both: Zero 400 errors. The fix is stable on both models.

What Was Tested

Both sessions ran heavy concurrent agent flows, not single long conversations:

  • Main agent issuing requests
  • Hundreds of forked subagents (agent_summary, extract_memories) running in parallel
  • Growing message histories, accumulated tool calls, and edge cases across turns

This validates all 4 edge cases covered by the fix:

  1. Assistant messages without a thinking block
  2. Assistant messages where content is a plain string
  3. Synthetic "[Tool execution interrupted by user]" messages
  4. Thinking blocks preserved through conversationRecovery.ts

Errors Observed (Unrelated to This Fix)

The following errors appeared during the sessions but are pre-existing issues, not caused by the reasoning_content changes:

| Error | flash | pro | Source |
| --- | --- | --- | --- |
| `TypeError: anthropic.beta.messages.countTokens is not a function` | | | Pre-existing OpenClaude bug |
| `Error: File does not exist` | | | Project-specific file path issue (App) |
| `Error streaming, falling back to non-streaming mode: terminated` | | | Network timeout / streaming interruption |

None of these are 400 errors from DeepSeek.

Validation Scope

This validation specifically covers the DeepSeek reasoning_content failure fixed by this PR.

Tested provider/model paths:

  • deepseek-v4-flash
  • deepseek-v4-pro

Tested scenarios:

  • Long-running conversations with growing history
  • Heavy subagent usage
  • Assistant messages with tool calls
  • Assistant messages without visible thinking blocks
  • Plain string assistant content
  • Synthetic interrupted tool execution messages
  • Conversation recovery preserving thinking blocks before shim conversion

Out of Scope / Not Fully Validated

The following areas were not fully validated in this test pass:

  • Moonshot provider behavior, even though the flag is also enabled for Moonshot
  • Other OpenAI-compatible providers
  • The intermittent `preserveReasoningContent is not defined` report, because I was not able to reproduce it
  • Existing unrelated OpenClaude issues such as:
    • anthropic.beta.messages.countTokens is not a function
    • project-specific missing file errors
    • network or streaming termination errors

nickmesen force-pushed the fix-878-deepSeek-V4-Flash/Pro-reasoning_content branch from 29184bd to a1854ab on April 26, 2026 at 15:46
jatmn mentioned this pull request on Apr 27, 2026
@jatmn
Collaborator

jatmn commented Apr 27, 2026

This will be impacted by #910.

nickmesen force-pushed the fix-878-deepSeek-V4-Flash/Pro-reasoning_content branch from a1854ab to cdd3d4f on April 27, 2026 at 07:12
@gnanam1990
Collaborator

gnanam1990 left a comment

Thanks for digging into the DeepSeek edge cases. A few things to sort out before this lands:

  1. Overlap with #914 — both PRs touch openaiShim.ts in the same reasoning_content territory. Could you sync with that author and agree on a merge order? Whichever lands second will need a rebase.

  2. stripThinkingBlocks removal — the deleted call referenced 'issue #248 finding 5'. Could you link to that finding (or add a test) showing the new flow doesn't regress it? The shim-side filtering should cover it, but it's worth pinning down.

  3. Missing regression tests for the actual edge cases the PR claims to fix:
    • synthetic interrupt assistant message
    • string-content branch
    • array content with no thinking block

  4. Empty-string reasoning_content — DeepSeek docs ask for the original reasoning_content; an empty string may itself be rejected on some setups. Worth verifying against more than the local repro.

Happy to re-review once tests are in and the #248 question is addressed.

@nickmesen
Contributor Author

nickmesen commented Apr 28, 2026


Thanks for the detailed review, @gnanam1990.

  1. Re: overlap with #914 (fix: add NVIDIA API host to reasoning_content allowlist for DeepSeek V4 models) — Agreed. I understand that #914 improves provider detection, but it does not fix the 400 invalid_request_error caused by missing or incorrectly propagated reasoning_content.

My fix is complementary to that work. Once #914 lands, I can rebase on top of it and adapt this PR to use the cleaner provider detection path introduced there, such as providerSupportsReasoning(), instead of relying only on the DeepSeek base URL check.

So the merge order makes sense to me: #914 first, then this PR rebased on top of it. The reasoning_content fixes should remain localized to the shim layer and should be straightforward to adapt.

  2. Re: stripThinkingBlocks / #248 (bug: REPL session management sends Anthropic-only parameters to 3P providers) finding 5 — You're right to flag this. To avoid any risk of regression on session resume, I'm reverting the stripThinkingBlocks removal. The reasoning_content fix in the shim layer is independent and remains intact.

  3. Re: regression tests — I understand the concern. The difficulty with mocked tests for this specific issue is that they can verify our local serialization logic, but they cannot prove what DeepSeek actually accepts or rejects at the API boundary.

What I can add are unit/regression tests for the shim behavior in the reported branches:

  • synthetic interrupt assistant message
  • string-content branch
  • array content with no thinking block

Those tests would help protect the OpenClaude-side behavior.
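
As a starting point, a hypothetical shape for those tests using bun:test; `convertMessages` is the function named above, but its exact signature and import path here are assumptions.

```typescript
// Hypothetical regression tests — the convertMessages signature and the
// import path are assumptions, not the actual OpenClaude test suite.
import { describe, expect, it } from "bun:test";
import { convertMessages } from "../src/openaiShim";

describe("reasoning_content preservation", () => {
  it("adds empty reasoning_content to the synthetic interrupt message", () => {
    const out = convertMessages(
      [{ role: "assistant", content: "[Tool execution interrupted by user]" }],
      { preserveReasoningContent: true },
    );
    expect(out[0].reasoning_content).toBe("");
  });

  it("keeps reasoning_content present for array content with no thinking block", () => {
    const out = convertMessages(
      [
        {
          role: "assistant",
          content: [{ type: "tool_use", id: "t1", name: "ls", input: {} }],
        },
      ],
      { preserveReasoningContent: true },
    );
    expect(out[0].reasoning_content).toBe("");
  });
});
```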

For the provider-side validation, I propose running a real OpenClaude + DeepSeek session using this bug as the working initiative in my AI-driven development workflow. In other words, instead of only testing isolated mocked cases, I would use OpenClaude itself to work through this issue against the real DeepSeek provider.

That would exercise the integration through a realistic end-to-end flow and generate a full debug log from an actual development session. Since the session would be focused on this public OpenClaude bug, I can share the resulting logs without exposing private project data.

I can also attach a short Markdown guide explaining how to locate each relevant case in the log, including:

  • synthetic interrupt assistant message
  • string-content branch
  • array content with no thinking block
  • turns where reasoning_content is set to ""

That way, the tests would protect the local shim behavior, while the real session would validate that the same payloads work correctly against api.deepseek.com.

  4. Re: empty-string reasoning_content — When an assistant message has no previous thinking block, such as pure tool_use, synthetic interruption, or the string-content branch, there are only two options: omit reasoning_content or set it to "".

Omitting it causes the 400 invalid_request_error this PR is addressing. Setting it to "" is accepted by DeepSeek for these edge cases.

The attached debug log supports this: across the full session, there are zero invalid_request_error responses, including turns where reasoning_content was "". If DeepSeek rejected empty strings, those message types would fail and the log would show 400s. It does not.

That said, I also saw the recent note mentioning that this DeepSeek V4 reasoning_content issue is already tracked in #878, with fixes in flight in #918 and #925.

If you think either of those PRs already solves the problem, or if one of them is the safer path forward, I’m happy to close this PR and align with that direction.

My main goal is to help make sure the DeepSeek compatibility issue is resolved in the most reliable way, since I’m currently exploring adding OpenClaude + DeepSeek to my workflow.

Could you please confirm whether you would like me to continue with this PR and apply the changes above, adjust the approach, or close it in favor of #918 or #925?

@nickmesen
Contributor Author

Thanks, I checked #918 and it appears to cover the empty-string reasoning_content fallback for some of the flows I was concerned about. I hope that covers all the necessary cases.

Given that, I’m happy to close this PR and align with #918/#925 as the preferred path. I’ll keep my local patch only until those changes land in a release and I can validate them in my workflow.

Feel free to reuse any validation notes, logs, or edge-case analysis from this PR if useful.
