feat: SeFi-Image support #1707

Open: fszontagh wants to merge 21 commits into leejet:master from fszontagh:feat/sefi-image-prototype

fszontagh (Contributor) commented Jun 24, 2026

Summary

Adds inference support for SeFi-Image, a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: arXiv:2606.22568. See docs/sefi_image.md.

What's in:

  • VERSION_SEFI_IMAGE + version detection
  • Dual-time embedding block (semantic_embedder + texture_embedder, concat)
  • Per-stream Euler sampler with alpha-shift + delta_t
  • SeFi-aware Qwen3-VL conditioning (chat template, layers 9/18/27)
  • VAE BN normalization on packed texture latents
  • script/convert_sefi.py for converting a diffusers checkpoint into a single sd.cpp safetensors file
  • --extra-sample-args sefi_alpha=0.3 / sefi_delta_t=0.1 overrides
  • Filename heuristic: turbo in path => alpha=1.0, else alpha=0.3

Related Issue / Discussion

Closes #1702.

Additional Information

Example

./build/bin/sd-cli \
  --model /path/to/sefi_1b_turbo.safetensors \
  --llm   /path/to/qwen3_vl_2b.safetensors \
  -p "a photograph of an orange tabby cat sitting on a couch" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --diffusion-fa --offload-to-cpu \
  -o out.png

Tested variants (all 7 from huggingface.co/SeFi-Image)

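| Variant  | Encoder     | Baseline (12GB VRAM) | --max-vram 8 --stream-layers |
|----------|-------------|----------------------|------------------------------|
| 1B-Base  | qwen3_vl_2b | ok, 109s             | ok, 172s                     |
| 1B-turbo | qwen3_vl_2b | ok, 14s              | ok, 17s                      |
| 2B-Base  | qwen3_vl_2b | ok, 229s             | ok, 296s                     |
| 2B-turbo | qwen3_vl_2b | ok, 29s              | ok, 25s                      |
| 5B-Base  | qwen3_vl_4b | OOM                  | ok, 563s                     |
| 5B-turbo | qwen3_vl_4b | OOM                  | ok, 170s                     |
| 5B-RL    | qwen3_vl_4b | OOM                  | ok, 587s                     |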

The 5B variants use Qwen3-VL-4B-Instruct as the text encoder; the 1B and 2B variants use the 2B encoder. On 12GB-class GPUs, the 5B variants need layer streaming.

fszontagh added 18 commits June 23, 2026 19:53
@GreenShadows

The quality seems surprisingly good for such a small model.

bool double_z = true;
} dd_config;

void init_params(ggml_context* ctx, const String2TensorStorage& tensor_storage_map = {}, std::string prefix = "") override {

Owner

bn.running_mean and bn.running_var should be extracted as constants and provided through get_latents_mean_std for calls to vae_to_diffusion_latents and diffusion_to_vae_latents.

It looks like SeFi-Image uses the standard FLUX.2 VAE. If so, no special handling is needed here.
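
To illustrate the suggestion above, here is a rough sketch of turning batch-norm running statistics into per-channel mean/std constants and applying them in both directions. The names `LatentStats`, `stats_from_bn`, `normalize_latents`, and `denormalize_latents` are hypothetical, as is the channel-major layout; the real `get_latents_mean_std`, `vae_to_diffusion_latents`, and `diffusion_to_vae_latents` signatures are not shown in this thread. The default `eps` is also an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for per-channel constants.
struct LatentStats {
    std::vector<float> mean;
    std::vector<float> stddev;
};

// Batch norm in inference mode computes (x - running_mean) / sqrt(running_var + eps),
// so the equivalent per-channel std is sqrt(running_var + eps).
// Assumption: eps = 1e-5; the checkpoint may use a different value.
LatentStats stats_from_bn(const std::vector<float>& running_mean,
                          const std::vector<float>& running_var,
                          float eps = 1e-5f) {
    LatentStats s;
    s.mean = running_mean;
    s.stddev.reserve(running_var.size());
    for (float v : running_var) {
        s.stddev.push_back(std::sqrt(v + eps));
    }
    return s;
}

// Assumption: x is laid out channel-major, i.e. [channels][elements per channel].
void normalize_latents(std::vector<float>& x, std::size_t channels, const LatentStats& s) {
    const std::size_t per_channel = x.size() / channels;
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < per_channel; ++i) {
            float& v = x[c * per_channel + i];
            v = (v - s.mean[c]) / s.stddev[c];
        }
    }
}

// Inverse of normalize_latents.
void denormalize_latents(std::vector<float>& x, std::size_t channels, const LatentStats& s) {
    const std::size_t per_channel = x.size() / channels;
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < per_channel; ++i) {
            float& v = x[c * per_channel + i];
            v = v * s.stddev[c] + s.mean[c];
        }
    }
}
```

With the statistics held as plain constants, the model would not need a dedicated BN block in the VAE path.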

Comment thread src/stable-diffusion.cpp
case FLUX2_FLOW_PRED: {
LOG_INFO("running in Flux2 FLOW mode");
denoiser = std::make_shared<Flux2FlowDenoiser>();
if (sd_version_is_sefi_image(version)) {

Owner

Specifying FLUX2_FLOW_PRED should mean that Flux2FlowDenoiser is used; the denoiser should not be switched inside this case based on the model version.

Comment thread src/stable-diffusion.cpp
sefi_path = SAFE_STR(sd_ctx_params->model_path);
}
bool is_turbo = sefi_path.find("turbo") != std::string::npos;
sefi_denoiser->timestep_shift_alpha = is_turbo ? SefiFlowDenoiser::kAlphaTurbo

Owner

It should support specifying alpha directly, rather than adding a parameter like turbo.
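
One way to read this request, sketched below under assumptions: an explicitly supplied alpha always wins, and the filename heuristic (if kept at all) is only a fallback. `resolve_sefi_alpha` and `kAlphaBase` are hypothetical names; the values 1.0 and 0.3 come from the PR summary. The sketch needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>

constexpr float kAlphaTurbo = 1.0f;  // value stated in the PR summary for turbo checkpoints
constexpr float kAlphaBase  = 0.3f;  // hypothetical constant name; value from the PR summary

// Hypothetical resolver: a user-supplied alpha takes precedence over the
// path-based guess, so no "turbo" parameter is needed.
float resolve_sefi_alpha(const std::string& model_path, std::optional<float> user_alpha) {
    if (user_alpha.has_value()) {
        return *user_alpha;
    }
    const bool looks_turbo = model_path.find("turbo") != std::string::npos;
    return looks_turbo ? kAlphaTurbo : kAlphaBase;
}
```

The reviewer may prefer dropping the heuristic entirely, in which case the fallback branch would reduce to a single default.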

Comment thread src/stable-diffusion.cpp
: timesteps_vec;
adjust_sample_step_scalings(shifted_timestep, scaling_timesteps_vec, c_in, &c_skip, &c_out);

if (auto sefi_denoiser = std::dynamic_pointer_cast<SefiFlowDenoiser>(denoiser)) {

Owner

This should be placed inside process_timesteps.

Comment thread README.md

## 🔥Important News

* **2026/06/26** 🚀 stable-diffusion.cpp now supports **SeFi-Image**

Owner

Important News should only list models that have drawn substantial community discussion. Based on the current level of interest, SeFi-Image does not meet that requirement.

Comment thread src/name_conversion.cpp
prefix_map["te1."] = "text_encoders.clip_l.transformer.";
}

if (sd_version_is_sefi_image(version)) {

Owner

The --diffusion-model parameter should be used instead of hardcoding the prefix here.
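
The idea can be sketched as follows, assuming that loading a file through `--diffusion-model` amounts to the loader prepending a caller-supplied prefix to each tensor name, which would make a version-specific entry in the prefix map unnecessary. `with_prefix` is a hypothetical helper, and the prefix and tensor name in the usage note are illustrative values only.

```cpp
#include <string>

// Hypothetical helper: prepend `prefix` unless the tensor name already carries it.
std::string with_prefix(const std::string& name, const std::string& prefix) {
    if (name.compare(0, prefix.size(), prefix) == 0) {
        return name;  // already prefixed, leave untouched
    }
    return prefix + name;
}
```

For example, `with_prefix("blocks.0.weight", "model.diffusion_model.")` yields `model.diffusion_model.blocks.0.weight`, and applying it a second time leaves the name unchanged.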

Development

Successfully merging this pull request may close these issues.

[Feature] SeFi-Image-5B-turbo

3 participants