
Add Ollama Cloud provider#743

Open
dalton-cole wants to merge 2 commits into crmne:main from dalton-cole:dc/ollama-cloud

Conversation

@dalton-cole

What this does

Ollama's hosted service (https://ollama.com) exposes OpenAI-compatible endpoints at /v1 with Bearer-token auth. This PR adds :ollama_cloud as a dedicated provider inheriting from the existing Ollama provider so chat, streaming, media, and dynamic model listing all work unchanged, while correctly reporting as remote, requiring an API key, and defaulting api_base to https://ollama.com/v1.

Two class-level overrides are load-bearing:

  • slug returns "ollama_cloud" — the default "ollamacloud" would mismatch the :ollama_cloud registration symbol and break Model::Info#provider lookups.
  • assume_models_exist? returns true — cloud models are dynamic and not in the static registry; the existing Ollama provider gets the same shortcut via its local? flag, which OllamaCloud overrides to return false.
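The two load-bearing overrides can be sketched with self-contained stand-ins (the real RubyLLM::Providers::Ollama base class has far more surface; class bodies here are assumptions illustrating only the behavior described above):

```ruby
# Stand-in for the existing Ollama provider's relevant class methods.
class Ollama
  def self.slug
    name.downcase # OllamaCloud would inherit "ollamacloud" from here
  end

  def self.local?
    true
  end

  def self.assume_models_exist?
    local? # local providers skip the static model-registry check
  end
end

# The new cloud provider: same plumbing, different identity.
class OllamaCloud < Ollama
  def self.slug
    'ollama_cloud' # must match the :ollama_cloud registration symbol
  end

  def self.local?
    false # hosted at https://ollama.com, requires an API key
  end

  def self.assume_models_exist?
    true # cloud model list is dynamic, not in the shipped registry
  end
end
```

Without the slug override, the inherited default would yield "ollamacloud" and break the symbol-keyed lookups; without the assume_models_exist? override, the local?-based shortcut would vanish once local? returns false.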

Models.dev already catalogs Ollama Cloud under the key "ollama-cloud", so a single MODELS_DEV_PROVIDER_MAP entry wires its 37 models into the shared registry with full metadata (context_window, max_output_tokens, capabilities). Adding ollama_cloud_api_key to models.rake's configure_from_env lets the maintainer's next rake models:update populate the shipped models.json.
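The registry wiring amounts to one mapping entry. This is a hypothetical sketch of its shape; the real constant lives in the gem and may be structured differently:

```ruby
# Maps a models.dev catalog key to the RubyLLM provider symbol, so the
# rake models:update task knows where to file the 37 Ollama Cloud models.
MODELS_DEV_PROVIDER_MAP = {
  'ollama-cloud' => :ollama_cloud
}.freeze

# Lookup as performed when merging models.dev metadata into the registry:
provider_sym = MODELS_DEV_PROVIDER_MAP.fetch('ollama-cloud')
```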

Verified live against the hosted API: /v1/models returns the expected OpenAI list shape, sync and streaming chat both work on gpt-oss:120b, ConfigurationError is raised when the key is missing, and RubyLLM.models.refresh! populates 38 entries (37 from models.dev + 1 additional from the live provider listing).
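End-to-end, usage would look like the following sketch (RubyLLM.configure, RubyLLM.chat, and ask are the gem's existing public API; the ollama_cloud_* options are the ones this PR introduces):

```ruby
require 'ruby_llm' # assumes a build with this PR applied

RubyLLM.configure do |config|
  config.ollama_cloud_api_key = ENV['OLLAMA_CLOUD_API_KEY']
  # ollama_cloud_api_base defaults to https://ollama.com/v1
end

# ConfigurationError is raised here if the key above is missing.
chat = RubyLLM.chat(model: 'gpt-oss:120b', provider: :ollama_cloud)
puts chat.ask('Say hello').content
```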

8 VCR cassettes recorded (4 basic chat, 1 streaming, 3 thinking). Two reasoning-model quirks match the existing ollama/qwen3 skip pattern: system-prompt replacement and streaming-vs-sync token count drift.

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Required for new features

Quality check

  • I ran overcommit --install and all hooks pass

    SKIP=RSpec overcommit --run pre-commit reports "All pre-commit hooks passed" — gitleaks, rubocop, flay, appraisal-update, and trailing-whitespace all clean on the staged files. The RSpec hook invokes the full live suite, which needs credentials for all 13 providers and can't run from a single-provider local environment.
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]

      Used a scoped rspec -e ollama_cloud invocation rather than the rake vcr:record[ollama_cloud] task because the task runs the entire test queue and its FileUtils.rm_f(cassette_path) if example.exception hook would delete bedrock/vertexai/azure cassettes on any unrelated credential-missing failure. The 8 recorded cassettes are structurally identical to what the rake task produces (URI templated via <OLLAMA_CLOUD_API_BASE> filter, Authorization redacted to <AUTH_TOKEN>, no key leakage).
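The redaction described above corresponds to VCR's filter_sensitive_data hook, roughly as follows (a sketch; the placeholder strings come from the cassettes, the env var name is an assumption):

```ruby
require 'vcr'

VCR.configure do |c|
  # Template the host so cassettes don't hard-code the endpoint.
  c.filter_sensitive_data('<OLLAMA_CLOUD_API_BASE>') do
    ENV['OLLAMA_CLOUD_API_BASE']
  end

  # Redact the Bearer token from every recorded request.
  c.filter_sensitive_data('<AUTH_TOKEN>') do |interaction|
    interaction.request.headers['Authorization']&.first
  end
end
```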
    • All tests pass: bundle exec rspec

      Targeted subset verified (38 unit spec examples + 14 scoped live tests, including 2 documented skips). The full live suite is a maintainer-environment check.
  • I updated documentation if needed

    docs/_getting_started/configuration.md (API keys block, reference block, and a new "Ollama Cloud" subsection with subscription-tier pricing note), README.md and docs/index.md provider lists, .env.example, and spec/support/rubyllm_configuration.rb.
  • I didn't modify auto-generated files manually (models.json, aliases.json)

    Neither file was touched. The MODELS_DEV_PROVIDER_MAP + models.rake wiring lets the next rake models:update regenerate them correctly.

AI-generated code

  • I used AI tools to help write this code
  • I have reviewed and understand all generated code (required if above is checked)

API changes

  • Breaking change
  • New public methods/classes

    Adds RubyLLM::Providers::OllamaCloud and the ollama_cloud_api_key / ollama_cloud_api_base configuration options. No existing API changed.
  • Changed method signatures
  • No API changes

Resolves crmne#740.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>