Add Ollama Cloud provider #743
Open
dalton-cole wants to merge 2 commits into
Conversation
Ollama's hosted service (https://ollama.com) exposes OpenAI-compatible endpoints at /v1 with Bearer-token auth. Add :ollama_cloud as a dedicated provider inheriting from the existing Ollama provider so chat, streaming, media, and dynamic model listing all work unchanged, while correctly reporting as remote, requiring an API key, and defaulting api_base to https://ollama.com/v1.

Two class-level overrides are load-bearing:

- `slug` returns "ollama_cloud" — the default "ollamacloud" would mismatch the :ollama_cloud registration symbol and break Model::Info#provider lookups.
- `assume_models_exist?` returns true — cloud models are dynamic and not in the static registry; the existing Ollama provider gets the same shortcut via its `local?` flag, which OllamaCloud correctly returns false for.

Models.dev already catalogs Ollama Cloud under the key "ollama-cloud", so the MODELS_DEV_PROVIDER_MAP entry wires its 37 models into the shared registry with full metadata (context_window, max_output_tokens, capabilities). Adding ollama_cloud_api_key to models.rake's configure_from_env lets the maintainer's next `rake models:update` populate the shipped models.json.

Verified live against the hosted API: /v1/models returns the expected OpenAI list shape, sync and streaming chat both work on gpt-oss:120b, ConfigurationError is raised when the key is missing, and `RubyLLM.models.refresh!` populates 38 entries (37 from models.dev + 1 additional from the live provider listing). 8 VCR cassettes recorded (4 basic chat, 1 streaming, 3 thinking). Two reasoning-model quirks match the existing ollama/qwen3 skip pattern: system-prompt replacement and streaming-vs-sync token count drift.

Resolves crmne#740.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 0f09155 to d1ae15a
What this does
Ollama's hosted service (https://ollama.com) exposes OpenAI-compatible endpoints at `/v1` with Bearer-token auth. This PR adds `:ollama_cloud` as a dedicated provider inheriting from the existing Ollama provider so chat, streaming, media, and dynamic model listing all work unchanged, while correctly reporting as remote, requiring an API key, and defaulting `api_base` to https://ollama.com/v1.

Two class-level overrides are load-bearing:

- `slug` returns `"ollama_cloud"` — the default `"ollamacloud"` would mismatch the `:ollama_cloud` registration symbol and break `Model::Info#provider` lookups.
- `assume_models_exist?` returns `true` — cloud models are dynamic and not in the static registry; the existing Ollama provider gets the same shortcut via its `local?` flag, which `OllamaCloud` correctly returns `false` for.
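For orientation, here is a rough sketch of the shape those overrides take. It is illustrative only: the class and method names follow the description above, but the actual file in the PR diff may be organized differently.

```ruby
# Illustrative sketch only -- not the exact code from this PR.
require 'ruby_llm'

module RubyLLM
  module Providers
    class OllamaCloud < Ollama
      class << self
        # The default slug derivation would yield "ollamacloud", which would
        # not match the :ollama_cloud registration symbol.
        def slug
          'ollama_cloud'
        end

        # Hosted models are listed dynamically rather than shipped in the
        # static registry, so unknown model ids must not be rejected.
        def assume_models_exist?
          true
        end
      end

      # Hosted service: remote, Bearer-token authenticated.
      def local?
        false
      end

      def api_base
        @config.ollama_cloud_api_base || 'https://ollama.com/v1'
      end
    end
  end
end
```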
"ollama-cloud", so a singleMODELS_DEV_PROVIDER_MAPentry wires its 37 models into the shared registry with full metadata (context_window,max_output_tokens, capabilities). Addingollama_cloud_api_keytomodels.rake'sconfigure_from_envlets the maintainer's nextrake models:updatepopulate the shippedmodels.json.Verified live against the hosted API:
Verified live against the hosted API: `/v1/models` returns the expected OpenAI list shape, sync and streaming chat both work on `gpt-oss:120b`, `ConfigurationError` is raised when the key is missing, and `RubyLLM.models.refresh!` populates 38 entries (37 from models.dev + 1 additional from the live provider listing). 8 VCR cassettes recorded (4 basic chat, 1 streaming, 3 thinking). Two reasoning-model quirks match the existing ollama/qwen3 skip pattern: system-prompt replacement and streaming-vs-sync token count drift.

Type of change
Scope check
Required for new features
Quality check
- `overcommit --install` and all hooks pass: `SKIP=RSpec overcommit --run pre-commit` reports "All pre-commit hooks passed" — gitleaks, rubocop, flay, appraisal-update, and trailing-whitespace all clean on the staged files. The RSpec hook invokes the full live suite, which needs credentials for all 13 providers and can't run from a single-provider local environment.
- `bundle exec rake vcr:record[provider_name]`: used a scoped `rspec -e ollama_cloud` invocation rather than the `rake vcr:record[ollama_cloud]` task, because the task runs the entire test queue and its `FileUtils.rm_f(cassette_path) if example.exception` hook would delete bedrock/vertexai/azure cassettes on any unrelated credential-missing failure. The 8 recorded cassettes are structurally identical to what the rake task produces (URI templated via the `<OLLAMA_CLOUD_API_BASE>` filter, `Authorization` redacted to `<AUTH_TOKEN>`, no key leakage); see the VCR filtering sketch after this checklist.
- `bundle exec rspec`: targeted subset verified (38 unit spec examples + 14 scoped live tests incl. 2 documented skips). The full live suite is a maintainer-environment check.
- Documentation updated: `docs/_getting_started/configuration.md` (API keys block, reference block, and a new "Ollama Cloud" subsection with subscription-tier pricing note), `README.md` and `docs/index.md` provider lists, `.env.example`, and `spec/support/rubyllm_configuration.rb`.
- Registry files (`models.json`, `aliases.json`): neither file was touched. The `MODELS_DEV_PROVIDER_MAP` + models.rake wiring lets the next `rake models:update` regenerate them correctly.
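Since the cassette filtering is only described in prose above, here is a small sketch of what that redaction typically looks like with VCR. The placeholder names come from this PR's description; the actual spec-helper wiring in the repository may differ.

```ruby
# Illustrative VCR filtering along the lines described above; not the
# repository's actual spec helper.
require 'vcr'

VCR.configure do |config|
  # Template the host so cassettes don't depend on OLLAMA_CLOUD_API_BASE.
  config.filter_sensitive_data('<OLLAMA_CLOUD_API_BASE>') do
    ENV['OLLAMA_CLOUD_API_BASE'] || 'https://ollama.com/v1'
  end

  # Redact the Bearer token so no key leaks into committed cassettes.
  config.filter_sensitive_data('<AUTH_TOKEN>') do |interaction|
    interaction.request.headers['Authorization']&.first
  end
end
```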
AI-generated code

API changes
Adds `RubyLLM::Providers::OllamaCloud` and the `ollama_cloud_api_key` / `ollama_cloud_api_base` configuration options. No existing API changed.
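To make the new options concrete, a short usage sketch follows. The configuration option names and the `gpt-oss:120b` model id are taken from this PR; the surrounding calls use RubyLLM's existing public API.

```ruby
require 'ruby_llm'

RubyLLM.configure do |config|
  # New option added by this PR; ollama_cloud_api_base defaults to
  # https://ollama.com/v1 when left unset.
  config.ollama_cloud_api_key = ENV['OLLAMA_CLOUD_API_KEY']
end

chat = RubyLLM.chat(model: 'gpt-oss:120b', provider: :ollama_cloud)

chat.ask('Summarize the Ollama Cloud provider in one sentence') # sync chat

chat.ask('Now stream the same summary') do |chunk|              # streaming chat
  print chunk.content
end

RubyLLM.models.refresh! # pulls the live /v1/models listing into the registry
```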