
fix(openai-shim): strip store for local providers (vLLM, custom)#1048

Merged
kevincodex1 merged 1 commit into Gitlawb:main from 0xfandom:fix/672-strip-store-local-provider
May 8, 2026

Conversation

Contributor

@0xfandom commented May 7, 2026

Summary

Local OpenAI-compatible servers (vLLM, llama.cpp, custom self-hosted gateways) frequently validate request bodies against a strict JSON schema and reject unknown fields with 400. The shim sends store: false — an OpenAI-only flag for cloud conversation persistence — and already strips it for cloud hosts that share the same intolerance (Gemini #959, Cerebras #1040). Local servers have no remote-storage concept and belong in the same bucket.

This PR adds isLocal to shouldStripResponsesStore, so any baseUrl resolved by isLocalProviderUrl() (localhost / 127.0.0.1 / ::1 / 0.0.0.0) gets the field removed.
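
In sketch form, the predicate change looks roughly like the following. This is illustrative only: the identifiers `shouldStripResponsesStore` and `isLocalProviderUrl` come from this PR, but the helper internals and the cloud-host checks are assumptions based on the public Gemini and Cerebras endpoints, not the repository's exact code.

```ts
// Minimal sketch, not the shim's actual implementation.
function isLocalProviderUrl(baseUrl: string): boolean {
  try {
    // URL.hostname wraps IPv6 literals in brackets, so strip them before comparing.
    const host = new URL(baseUrl).hostname.replace(/^\[|\]$/g, "");
    return ["localhost", "127.0.0.1", "::1", "0.0.0.0"].includes(host);
  } catch {
    return false; // unparseable baseUrl: leave the request body untouched
  }
}

function shouldStripResponsesStore(baseUrl: string): boolean {
  const isGemini = baseUrl.includes("generativelanguage.googleapis.com"); // #959
  const isCerebras = baseUrl.includes("cerebras.ai");                     // #1040
  const isLocal = isLocalProviderUrl(baseUrl);                            // this PR
  return isGemini || isCerebras || isLocal;
}
```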

Impact

  • Users: strict local backends (vLLM + Qwen, llama.cpp custom builds) no longer 400 on first request. Lenient ones (Ollama) already ignored the field — no behavior change for them.
  • Devs: single-line OR added to existing strip predicate. Pattern matches the merged Cerebras/Gemini host predicates.

Testing

  • New unit test "Local provider (vLLM/Ollama/etc.): strips unsupported store on chat_completions (#672)" covers http://localhost:8000/v1; a sketch of it follows this list.
  • bun test src/services/api/openaiShim.test.ts — 92 pass / 0 fail.
  • bun run build — bundle clean.
  • bun run smoke — 0.9.2 OK.
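
A regression test along these lines exercises the localhost case. This is a sketch under assumptions: the entry-point name `prepareChatCompletionsBody` and the request/options shapes are hypothetical, and only the test title and URL come from the PR; the suite in `src/services/api/openaiShim.test.ts` is the authoritative version.

```ts
import { describe, expect, test } from "bun:test";
// Hypothetical import: the real shim's request-preparation function may
// have a different name and signature.
import { prepareChatCompletionsBody } from "./openaiShim";

describe("Local provider (vLLM/Ollama/etc.)", () => {
  test("strips unsupported store on chat_completions (#672)", () => {
    const body = prepareChatCompletionsBody(
      {
        model: "qwen2.5-coder",
        messages: [{ role: "user", content: "hi" }],
        store: false,
      },
      { baseUrl: "http://localhost:8000/v1" },
    );
    // Strict local backends (vLLM, llama.cpp) reject unknown fields with 400,
    // so the OpenAI-only `store` flag must not reach them.
    expect("store" in body).toBe(false);
  });
});
```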

Notes

Local OpenAI-compatible servers (vLLM, llama.cpp, custom self-hosted
gateways) often validate request bodies against a strict JSON schema
and reject unknown fields with `400 Bad Request`. The shim already
sends `store: false` (an OpenAI-only field for cloud conversation
persistence) and strips it for known cloud hosts that share the same
intolerance (Gemini, Cerebras). Local servers have no notion of
remote conversation storage and fall in the same bucket.

Add `isLocal` to `shouldStripResponsesStore` so any baseUrl resolved
by `isLocalProviderUrl` (localhost / 127.0.0.1 / ::1 / 0.0.0.0) gets
the field removed. Lenient locals (Ollama) already ignored it; this
unblocks strict ones (vLLM Qwen) without behavior change for the
former.

Closes Gitlawb#672 (the `store: false` symptom; the separate `max_tokens`
default vs. vLLM `max_model_len` collision is a different concern).
Collaborator

@techbrewboss left a comment


Reviewed the shim change and targeted test. The implementation is narrowly scoped: local OpenAI-compatible URLs now use the same store stripping path as the existing strict providers, and the added regression test covers the localhost chat-completions request body.

Verification: bun test src/services/api/openaiShim.test.ts passes, 92 tests / 0 failures.

@kevincodex1 merged commit 4830d6f into Gitlawb:main on May 8, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

API Error: 400: OpenClaude + vLLM in a Multi-GPU Setup (RTX 3090)
