Support Gemini Image models in RubyLLM.paint #750

Open: danieldenis01 wants to merge 4 commits into crmne:main from academia-ruby:claude/inspiring-saha-0db974

@danieldenis01 danieldenis01 commented Apr 26, 2026

Closes #473.

What this does

Fixes RubyLLM.paint for the Gemini Image model family (gemini-2.5-flash-image, gemini-3.1-flash-image-preview, gemini-3-pro-image-preview, nano-banana-pro-preview). The Gemini provider's image generation was hardcoded to the Imagen :predict protocol, which left these models unreachable.

The Gemini provider now branches on imagen?(model):

  • Imagen keeps :predict with instances/parameters (byte-for-byte unchanged).
  • Gemini Image routes to :generateContent with contents/parts and parses candidates[].content.parts[].inlineData, the same protocol Gemini chat uses.
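For readers unfamiliar with the :generateContent response shape, here is a minimal sketch of pulling the first image out of candidates[].content.parts[].inlineData. The helper name first_inline_image and the choice of ArgumentError are assumptions made for illustration; the PR's actual parsing code is not shown in this description.

```ruby
# Hypothetical helper: returns [mime_type, raw_bytes] for the first inline
# image in a parsed :generateContent response body, or raises if none exists.
def first_inline_image(body)
  parts = Array(body['candidates']).flat_map do |candidate|
    Array(candidate.dig('content', 'parts'))
  end
  inline = parts.filter_map { |part| part['inlineData'] }.first
  raise ArgumentError, 'response contains no inlineData part' unless inline

  # unpack1('m') decodes Base64 without needing the base64 library.
  [inline['mimeType'], inline['data'].unpack1('m')]
end

# A hand-built body that mimics the shape described above.
body = {
  'candidates' => [
    { 'content' => { 'parts' => [
      { 'text' => 'Here is your image.' },
      { 'inlineData' => { 'mimeType' => 'image/png',
                          'data' => ['fake-png-bytes'].pack('m0') } }
    ] } }
  ]
}

mime_type, bytes = first_inline_image(body)
puts "#{mime_type}, #{bytes.bytesize} bytes"
```

Text parts are skipped rather than treated as errors, since a response may mix text and image parts.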

On #473 you raised the concern that branching would require "a manual list of which model uses which interface, which is infeasible to maintain." This PR doesn't need one. The discriminator is a single prefix check — model.to_s.start_with?('imagen') — and the Gemini Image branch is the default fallthrough. Any new Gemini image model lands on the right path automatically (including nano-banana-pro-preview, which doesn't share the gemini- prefix). The check mirrors the existing /imagen/ pattern already in Gemini::Capabilities#pricing_family.
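The prefix check and default fallthrough described above can be sketched as follows. The module name ImageRouting, the endpoint_for helper, and the "models/<id>:<action>" path layout are assumptions for illustration only; the prefix test itself is the one quoted in this description.

```ruby
# Hypothetical stand-in for the provider's routing logic.
module ImageRouting
  module_function

  # Only models whose ID starts with "imagen" keep the :predict protocol.
  def imagen?(model)
    model.to_s.start_with?('imagen')
  end

  # Everything that is not Imagen falls through to :generateContent,
  # so no per-model list has to be maintained.
  def endpoint_for(model)
    action = imagen?(model) ? 'predict' : 'generateContent'
    "models/#{model}:#{action}"
  end
end

# 'imagen-3.0-generate-002' is used here only as an example Imagen ID.
%w[imagen-3.0-generate-002 gemini-2.5-flash-image nano-banana-pro-preview].each do |id|
  puts ImageRouting.endpoint_for(id)
end
```

Because the Gemini Image branch is the default, a model such as nano-banana-pro-preview needs no special handling even though it lacks the gemini- prefix.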

Improvements over the previous Gemini image generation

  • Image-to-image editing (with:) — new capability. Before this PR the Gemini provider had no support for image references (with: was an unused method arg, rejected by the base validate_paint_inputs!). The Gemini Image branch now accepts one or more local files / URLs / Attachment instances via with:, reusing Gemini::Media#format_attachment to build inline_data parts. Imagen still rejects with:.
  • size: is meaningful again on the Gemini Image branch. A small map translates the common DALL-E sizes (1024x1024, 1792x1024, 1024x1792, 1408x1024, 1024x1408) to Gemini aspectRatio. Unknown sizes default to 1:1 with a debug log. Imagen continues to ignore size:.
  • params: deep-merges into the payload so users can override any nested generationConfig / imageConfig field without clobbering the rest.
  • usageMetadata from Gemini Image responses is passed through to Image#usage.
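A minimal sketch of the size: translation, assuming a plain lookup table with a 1:1 fallback. The description lists the supported sizes but not the ratios they map to, so the right-hand values below (other than 1:1 for 1024x1024) are illustrative guesses, and aspect_ratio_for is a hypothetical helper name.

```ruby
# Right-hand values are assumptions; only the keys and the 1:1 default
# come from the PR description.
SIZE_TO_ASPECT_RATIO = {
  '1024x1024' => '1:1',
  '1792x1024' => '16:9',
  '1024x1792' => '9:16',
  '1408x1024' => '4:3',
  '1024x1408' => '3:4'
}.freeze

DEFAULT_ASPECT_RATIO = '1:1'

# Looks up the aspect ratio for a size string; unknown sizes fall back to
# the default and emit a debug message if a logger is supplied.
def aspect_ratio_for(size, logger: nil)
  SIZE_TO_ASPECT_RATIO.fetch(size.to_s) do
    logger&.debug("Unmapped size #{size.inspect}, defaulting to #{DEFAULT_ASPECT_RATIO}")
    DEFAULT_ASPECT_RATIO
  end
end

puts aspect_ratio_for('1024x1024')
puts aspect_ratio_for('640x480')
```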

The public signature RubyLLM.paint(prompt, model:, with:, size:, params:) is unchanged.

Reproduction (before this PR):

RubyLLM.paint('a red panda coding in Ruby', model: 'gemini-3.1-flash-image-preview')
# => RubyLLM::Error: ... is not supported for predict.

Type of change

  • Bug fix
  • New feature (image-to-image editing on the Gemini provider)
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • Re-recorded VCR cassettes with bundle exec rake vcr:record[gemini]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

Tests added

Integration (VCR-backed, via IMAGE_GENERATION_MODELS):

  • gemini-2.5-flash-image paint + image edit with with:
  • gemini-3.1-flash-image-preview paint (the exact model from the bug report)

Unit-level (spec/ruby_llm/providers/gemini/images_spec.rb, no network):

  • Imagen + with: raises UnsupportedAttachmentError
  • Imagen response missing bytesBase64Encoded raises RubyLLM::Error
  • Gemini Image + :unknown attachment type raises UnsupportedAttachmentError
  • Gemini Image + unmapped size: defaults aspectRatio to 1:1

Coverage: lib/ruby_llm/providers/gemini/images.rb at 100% line / 79% branch.

AI-generated code

  • I used AI tools to help write this code
  • I have reviewed and understand all generated code

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Out of scope

  • VertexAI has the same bug shape but no images.rb at all and needs OAuth — separate PR.
  • Streaming image generation (:streamGenerateContent).
  • Per-image pricing for the new Gemini Image entries (registry currently lacks pricing.images; Image#total_cost falls back to output_price_per_million).

danieldenis01 and others added 2 commits April 26, 2026 02:00
The Gemini provider's image generation was hardcoded to the Imagen
:predict endpoint, leaving the Gemini Image family (Nano Banana et al.)
unreachable: RubyLLM.paint with gemini-2.5-flash-image,
gemini-3.1-flash-image-preview, gemini-3-pro-image-preview, or
nano-banana-pro-preview raised "is not supported for predict" even
though those models are listed in the registry with image output.

Branch internally on imagen?(model). Imagen keeps its existing
:predict/instances payload and predictions[].bytesBase64Encoded parsing
unchanged. Everything else routes through :generateContent with
contents/parts and parses candidates[].content.parts[].inlineData,
matching the protocol Gemini chat already speaks. The fallthrough also
covers nano-banana-pro-preview, which doesn't share the gemini- prefix.

Image-to-image editing via with: is supported on the Gemini Image
branch by reusing Gemini::Media#format_attachment to build inline_data
parts. validate_paint_inputs! is overridden as a no-op so the base
class's blanket attachment rejection doesn't fire; the model-aware
checks (mask: rejection, Imagen-with-with: rejection) live in
render_image_payload after @model is assigned.

size: is translated through SIZE_TO_ASPECT_RATIO for the common DALL-E
sizes; unknown sizes default to 1:1 with a debug log. Users override
via params:, which deep-merges into the payload so nested
generationConfig blocks aren't clobbered.
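
To illustrate why a deep merge matters for params:, here is a small sketch in plain Ruby. The deep_merge helper is a hypothetical stand-in for whatever the provider uses, and the payload fields other than generationConfig, imageConfig, and aspectRatio are assumptions chosen for the example.

```ruby
# Recursively merges override into base: nested hashes are merged key by
# key, any other value in override replaces the value in base.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

payload = {
  generationConfig: {
    responseModalities: ['IMAGE'], # assumed field, for illustration
    imageConfig: { aspectRatio: '1:1' }
  }
}
params = { generationConfig: { imageConfig: { aspectRatio: '16:9' } } }

deep = deep_merge(payload, params)
shallow = payload.merge(params)

puts deep[:generationConfig].key?(:responseModalities)
puts shallow[:generationConfig].key?(:responseModalities)
```

A shallow Hash#merge replaces the whole generationConfig block with the user's hash, whereas the deep merge changes only aspectRatio and keeps the sibling keys.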

Tests cover both the original Nano Banana (gemini-2.5-flash-image,
paint + edit) and Nano Banana 2 (gemini-3.1-flash-image-preview, the
exact model from the bug report). Imagen, OpenAI, and OpenRouter image
tests pass unchanged.

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.28%. Comparing base (5bdda1a) to head (ccbbdee).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   87.21%   87.28%   +0.06%     
==========================================
  Files         121      121              
  Lines        5703     5739      +36     
  Branches     1442     1454      +12     
==========================================
+ Hits         4974     5009      +35     
- Misses        729      730       +1     

@danieldenis01 (Author) commented

I'll implement the missing test cases.

Codecov reported four uncovered lines in the Gemini Images branching:
the Imagen `with:` rejection, the Imagen response-shape guard, the
unknown-attachment-type rejection on the Gemini Image branch, and the
unmapped-size default. None of these paths are reachable from the
existing VCR-backed integration specs (Imagen rejects `with:` before
hitting the wire; Gemini Image cassettes only exercise PNG inputs and
the supported `1024x1024` size).

Add a focused unit spec at spec/ruby_llm/providers/gemini/images_spec.rb
that extends a bare object with Gemini::Media + Gemini::Images (same
pattern used in chat_spec.rb) and exercises each branch directly with
stubbed attachments and Faraday::Response doubles. No new cassettes
needed.
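
The "bare object plus mixin" pattern described above can be sketched without RSpec or a network connection. FakeImagenParsing, parse_imagen_response, and FakeResponse are hypothetical names, and ArgumentError stands in for RubyLLM::Error so the example stays self-contained; the real spec uses the actual Gemini modules and Faraday::Response doubles.

```ruby
# Hypothetical mixin standing in for the Imagen response-shape guard.
module FakeImagenParsing
  def parse_imagen_response(response)
    encoded = response.body.dig('predictions', 0, 'bytesBase64Encoded')
    raise ArgumentError, 'Imagen response missing bytesBase64Encoded' unless encoded

    encoded
  end
end

# A Struct with a body attribute is enough to act as a response double.
FakeResponse = Struct.new(:body)

# Extending a bare object lets the mixin be exercised without a provider.
parser = Object.new.extend(FakeImagenParsing)

good = FakeResponse.new({ 'predictions' => [{ 'bytesBase64Encoded' => 'aGk=' }] })
bad = FakeResponse.new({ 'predictions' => [{}] })

puts parser.parse_imagen_response(good)

begin
  parser.parse_imagen_response(bad)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```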

Brings lib/ruby_llm/providers/gemini/images.rb to 100% line coverage
and lifts branch coverage from 66.67% to 79.17%.
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add support for gemini 2.5 image (Nano banana) on the .paint action