Fix Structured Output for GPT-OSS Models#4386
windreamer wants to merge 3 commits into InternLM:main
Conversation
Pull request overview
This PR fixes structured output for GPT-OSS models by avoiding Guided Decoding (which conflicts with Harmony response parsing) and instead injecting the requested response schema into the prompt using Harmony’s native # Response Formats section.
Changes:
- Detect GPT-OSS (`arch == 'GptOssForCausalLM'`) requests with a non-text `response_format`.
- Inject the serialized `response_format` schema into the system message under `# Response Formats` (creating a system message if missing).
- Disable guided decoding for GPT-OSS by clearing the local `response_format` passed into `GenerationConfig`.
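The injection step above can be sketched as follows. This is a minimal, hypothetical illustration (the function name `inject_response_format` and the plain-dict message shape are assumptions, not the PR's actual API), showing the schema appended to an existing system message, or a new system message inserted when none exists, with `response_format` cleared afterwards:

```python
import json

def inject_response_format(messages, response_format):
    """Sketch: move a structured response_format into the prompt.

    Appends the serialized schema under a '# Response Formats' section of
    the system message (creating one if missing) and returns None as the
    new response_format so guided decoding is not engaged.
    """
    if not response_format or response_format.get('type') == 'text':
        return messages, response_format  # plain text: nothing to do
    format_body = '# Response Formats\n\n' + json.dumps(response_format)
    messages = list(messages)
    for i, msg in enumerate(messages):
        if msg['role'] == 'system':
            # prefix with \n\n only when appending to an existing message
            messages[i] = {**msg, 'content': msg['content'] + '\n\n' + format_body}
            break
    else:
        # no system message present: insert one carrying only the section
        messages.insert(0, {'role': 'system', 'content': format_body})
    return messages, None  # cleared: disables guided decoding downstream
```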
Force-pushed f51f924 to d3f847a
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
It's been a while since I compiled from source, does
No need to recompile it; you can just patch the Python part.
Force-pushed d3f847a to d4366e3
Force-pushed d4366e3 to 8cb9ef2
… Harmony/JSON mode conflict for GPT-OSS

Move the GPT-OSS guided decoding logic from `api_server.py` inline code into `GptOssResponseParser._convert_response_format_to_harmony()`, following the established `ResponseParser` pattern for model-specific request handling.

When the model architecture is `GptOssForCausalLM` and a structured `response_format` is requested, the schema is now injected into the system prompt as a `# Response Formats` section and `response_format` is cleared on the request, avoiding the conflict between Harmony-native mode and the engine's built-in JSON/response-format mode.

In `api_server.py`, `response_format` extraction is moved after parser instantiation so that the parser can modify the request first.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
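For illustration, after conversion the system message would carry a section along these lines (the schema contents here are hypothetical, not taken from the PR):

```
# Response Formats

{"type": "json_schema", "json_schema": {"name": "answer", "schema": {"type": "object", "properties": {"answer": {"type": "string"}}}}}
```

The model then sees the schema as ordinary prompt text in Harmony's native format, rather than having it enforced by token-level guided decoding.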
Force-pushed 8cb9ef2 to e991917
…ormat conversion tests

- Build `format_body` without leading newlines; only prefix with `\n\n` when appending to an existing system message. This prevents a newly inserted system message from starting with blank lines that could interact poorly with downstream chat-template rendering.
- Add `TestGptOssResponseFormatHarmonyConversion` test class with 5 tests:
  1. `response_format` is cleared after conversion
  2. schema appended to existing system message
  3. schema inserted as new system message (no leading blank lines)
  4. text-type `response_format` is not converted
  5. no `response_format` leaves request unchanged
Force-pushed 49633d0 to 91e9ca5
1. Guard `model_copy()` with a `hasattr` check: extract a `_clear_response_format()`
   helper that falls back to in-place mutation for non-Pydantic request
   objects (e.g. test sentinels). Prevents a double raise in the except path.
2. Use `logger.exception()` instead of `logger.error(f'...{e}')` so that
   stack traces are preserved in the log output.
3. Mark the `_patch_streamable_parser` fixture as `autouse=True` and remove
   redundant `monkeypatch.setattr` calls from individual test methods.
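Point 1 can be sketched like this. The helper name follows the commit, but the body is an illustrative reconstruction, not the PR's actual code: `model_copy()` (Pydantic v2) is attempted only when present, with in-place mutation as the fallback so non-Pydantic objects never raise inside an except handler:

```python
import logging

logger = logging.getLogger(__name__)

def clear_response_format(request):
    """Sketch: return `request` with response_format cleared.

    Pydantic models are copied via model_copy(); plain objects (e.g. test
    sentinels) fall back to in-place mutation.
    """
    try:
        if hasattr(request, 'model_copy'):
            # Pydantic v2: copy-and-update without mutating the caller's object
            return request.model_copy(update={'response_format': None})
    except Exception:
        # logger.exception keeps the stack trace (point 2 above)
        logger.exception('clearing response_format via model_copy failed')
    request.response_format = None  # fallback: mutate in place
    return request
```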
Force-pushed 91e9ca5 to cb76d43
Motivation
GPT-OSS models use the Harmony response format, which conflicts with Guided Decoding (a token-level JSON constraint) when `response_format` is specified. This causes:

- broken `message.parsed` results
- breaking existing OpenAI SDK clients using `client.beta.chat.completions.parse()`

Modification
Approach: replace Guided Decoding with Harmony-native structured output.

- Inject the requested `response_format` schema into the system prompt's `# Response Formats` section.
- Clear `response_format` so the engine's guided decoding is not triggered.

closes: #4347
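The resulting dispatch can be summarized in a small sketch. The function and constant names here are illustrative (only the architecture string `GptOssForCausalLM` comes from the PR): GPT-OSS requests with a structured format hand `None` to the engine, while other models keep guided decoding as before:

```python
GPT_OSS_ARCH = 'GptOssForCausalLM'  # architecture name checked by the PR

def resolve_response_format(arch, request):
    """Sketch: pick the response_format passed to GenerationConfig.

    For GPT-OSS the schema travels in the prompt ('# Response Formats'),
    so guided decoding must be disabled by returning None.
    """
    fmt = request.get('response_format')
    if arch == GPT_OSS_ARCH and fmt and fmt.get('type') != 'text':
        return None  # Harmony prompt injection replaces guided decoding
    return fmt  # all other models: unchanged behavior
```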