fix(openai-shim): strip store for local providers (vLLM, custom) #1048
Merged
kevincodex1 merged 1 commit into Gitlawb:main on May 8, 2026
Conversation
Local OpenAI-compatible servers (vLLM, llama.cpp, custom self-hosted gateways) often validate request bodies against a strict JSON schema and reject unknown fields with `400 Bad Request`. The shim sends `store: false` (an OpenAI-only field for cloud conversation persistence) and already strips it for known cloud hosts that share the same intolerance (Gemini, Cerebras). Local servers have no notion of remote conversation storage and fall into the same bucket.

This PR adds `isLocal` to `shouldStripResponsesStore` so any baseUrl resolved by `isLocalProviderUrl` (localhost / 127.0.0.1 / ::1 / 0.0.0.0) gets the field removed. Lenient locals (Ollama) already ignored it; this unblocks strict ones (vLLM Qwen) without any behavior change for the former.

Closes Gitlawb#672 (the `store: false` symptom; the separate `max_tokens` default vs. vLLM `max_model_len` collision is a different concern).
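For reference, a minimal sketch of the shape of the change. The helper names `isLocalProviderUrl` and `shouldStripResponsesStore` come from this PR; the signatures, the cloud-host stub, and its host suffixes are assumptions, not the repo's code.

```ts
function hostnameOf(baseUrl: string): string {
  try {
    // URL.hostname keeps the brackets on IPv6 literals, so strip them for comparison
    return new URL(baseUrl).hostname.replace(/^\[|\]$/g, "");
  } catch {
    return ""; // unparseable baseUrl: treat as non-local
  }
}

// Loopback / unspecified hosts: localhost, 127.0.0.1, ::1, 0.0.0.0
export function isLocalProviderUrl(baseUrl: string): boolean {
  return ["localhost", "127.0.0.1", "::1", "0.0.0.0"].includes(hostnameOf(baseUrl));
}

// Stand-in for the pre-existing strict cloud checks (Gemini #959, Cerebras #1040)
function isStrictCloudHost(baseUrl: string): boolean {
  const host = hostnameOf(baseUrl);
  return host.endsWith("googleapis.com") || host.endsWith("cerebras.ai");
}

// The change in this PR: local base URLs join the existing strip path
export function shouldStripResponsesStore(baseUrl: string): boolean {
  const isLocal = isLocalProviderUrl(baseUrl);
  return isStrictCloudHost(baseUrl) || isLocal;
}
```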
kevincodex1
approved these changes
May 7, 2026
techbrewboss
approved these changes
May 7, 2026
Collaborator
techbrewboss
left a comment
Reviewed the shim change and targeted test. The implementation is narrowly scoped: local OpenAI-compatible URLs now use the same store stripping path as the existing strict providers, and the added regression test covers the localhost chat-completions request body.
Verification: `bun test src/services/api/openaiShim.test.ts` passes, 92 tests / 0 failures.
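A hedged sketch of what that regression test might look like; the test title is taken from the PR's Testing notes, while the import path and assertions are assumptions (the real test inspects the outgoing chat-completions request body).

```ts
import { describe, expect, it } from "bun:test";
import { shouldStripResponsesStore } from "./openaiShim"; // assumed export and path

describe("openaiShim store stripping", () => {
  it("Local provider (vLLM/Ollama/etc.): strips unsupported store on chat_completions (#672)", () => {
    // The real test builds a chat-completions request against this base URL and
    // asserts that the serialized body contains no `store` field.
    expect(shouldStripResponsesStore("http://localhost:8000/v1")).toBe(true);
    expect(shouldStripResponsesStore("https://api.openai.com/v1")).toBe(false);
  });
});
```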
Summary
Local OpenAI-compatible servers (vLLM, llama.cpp, custom self-hosted gateways) frequently validate request bodies against a strict JSON schema and reject unknown fields with 400. The shim sends `store: false` (an OpenAI-only flag for cloud conversation persistence) and already strips it for cloud hosts that share the same intolerance (Gemini #959, Cerebras #1040). Local servers have no remote-storage concept and belong in the same bucket.

This PR adds `isLocal` to `shouldStripResponsesStore`, so any baseUrl resolved by `isLocalProviderUrl()` (localhost / 127.0.0.1 / ::1 / 0.0.0.0) gets the field removed.

Impact

Lenient locals (Ollama) already ignored the field; this unblocks strict ones (vLLM) without any behavior change for the former.
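For concreteness, the resulting wire shape might look like this; the body builder and import path are hypothetical, and only `shouldStripResponsesStore` is named in the PR.

```ts
import { shouldStripResponsesStore } from "./openaiShim"; // assumed export and path

type ChatCompletionsBody = { model: string; messages: unknown[]; store?: boolean };

function buildChatCompletionsBody(
  baseUrl: string,
  model: string,
  messages: unknown[],
): ChatCompletionsBody {
  const body: ChatCompletionsBody = { model, messages, store: false };
  if (shouldStripResponsesStore(baseUrl)) {
    delete body.store; // strict local servers (vLLM) reject unknown fields with 400
  }
  return body;
}

// http://localhost:8000/v1   -> { model, messages }                  (vLLM accepts it)
// https://api.openai.com/v1  -> { model, messages, store: false }    (unchanged)
```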
Testing
- The new regression test `Local provider (vLLM/Ollama/etc.): strips unsupported store on chat_completions (#672)` covers `http://localhost:8000/v1`.
- `bun test src/services/api/openaiShim.test.ts`: 92 pass / 0 fail.
- `bun run build`: bundle clean.
- `bun run smoke`: 0.9.2 OK.

Notes
This PR is `store`-only. Issue "API Error: 400 : OpenClaude + vLLM em Setup Multi-GPU (RTX 3090)" #672 also lists a separate `max_tokens=32000` default colliding with vLLM's `max_model_len=32768`; that is a model-context concern, not a wire-shape one, and would need to land on the catalog/context side. Out of scope here.