feat: SeFi-Image support #1707
…_t-shifted semantic timestep
…+ VAE semantic slice)
…lta_t via --extra-sample-args
…rs into SefiFlowDenoiser constants
The quality seems surprisingly good for such a small model.
…-diffusion-model + --vae like krea2/flux2)
        bool double_z = true;
    } dd_config;

    void init_params(ggml_context* ctx, const String2TensorStorage& tensor_storage_map = {}, std::string prefix = "") override {
bn.running_mean and bn.running_var should be extracted as constants and provided through get_latents_mean_std for calls to vae_to_diffusion_latents and diffusion_to_vae_latents.
It looks like SeFi-Image uses the standard FLUX.2 VAE. If so, no special handling is needed here.
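To make the suggestion above concrete, here is a minimal sketch of per-channel latent normalization driven by precomputed BatchNorm statistics. `LatentStats`, `make_latent_stats`, `vae_to_diffusion`, and `diffusion_to_vae` are hypothetical names for illustration and are not sd.cpp's actual signatures; the `eps` value and the channel-major layout are assumptions as well.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for per-channel latent statistics (not sd.cpp's real type).
struct LatentStats {
    std::vector<float> mean;    // one entry per latent channel (bn.running_mean)
    std::vector<float> stddev;  // sqrt(bn.running_var + eps), computed once
};

// Turn BatchNorm running buffers into constants. The eps default is an assumption.
inline LatentStats make_latent_stats(const std::vector<float>& running_mean,
                                     const std::vector<float>& running_var,
                                     float eps = 1e-4f) {
    assert(running_mean.size() == running_var.size());
    LatentStats stats;
    stats.mean = running_mean;
    stats.stddev.reserve(running_var.size());
    for (float v : running_var) {
        stats.stddev.push_back(std::sqrt(v + eps));
    }
    return stats;
}

// Latents are assumed channel-major: [channels][values_per_channel].
inline void vae_to_diffusion(std::vector<float>& latents, std::size_t channels,
                             const LatentStats& stats) {
    assert(channels > 0 && channels == stats.mean.size());
    assert(latents.size() % channels == 0);
    const std::size_t n = latents.size() / channels;
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < n; ++i) {
            float& x = latents[c * n + i];
            x = (x - stats.mean[c]) / stats.stddev[c];
        }
    }
}

// Inverse mapping: undo the normalization before decoding with the VAE.
inline void diffusion_to_vae(std::vector<float>& latents, std::size_t channels,
                             const LatentStats& stats) {
    assert(channels > 0 && channels == stats.mean.size());
    assert(latents.size() % channels == 0);
    const std::size_t n = latents.size() / channels;
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < n; ++i) {
            float& x = latents[c * n + i];
            x = x * stats.stddev[c] + stats.mean[c];
        }
    }
}
```

With the statistics held as constants, both conversion directions share one source of truth and the VAE class itself needs no model-specific branch.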
    case FLUX2_FLOW_PRED: {
        LOG_INFO("running in Flux2 FLOW mode");
        denoiser = std::make_shared<Flux2FlowDenoiser>();
        if (sd_version_is_sefi_image(version)) {
Specifying FLUX2_FLOW_PRED means using Flux2FlowDenoiser, rather than switching the pred based on the model version.
        sefi_path = SAFE_STR(sd_ctx_params->model_path);
    }
    bool is_turbo = sefi_path.find("turbo") != std::string::npos;
    sefi_denoiser->timestep_shift_alpha = is_turbo ? SefiFlowDenoiser::kAlphaTurbo
It should support specifying alpha directly, rather than adding a parameter like turbo.
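One way to read this request is that alpha comes only from explicit arguments, never from the file name. The sketch below parses the `sefi_alpha` and `sefi_delta_t` keys mentioned in the PR description out of a key=value string. `SefiSampleParams` and `parse_sefi_args` are hypothetical names, the comma separator is an assumption about the argument format, and the default values are placeholders.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical holder for the two SeFi sampling knobs named in this PR.
struct SefiSampleParams {
    float alpha   = 0.3f;  // placeholder default
    float delta_t = 0.1f;  // placeholder default
};

// Parse a string such as "sefi_alpha=1.0,sefi_delta_t=0.05".
// Assumption: entries are comma-separated key=value pairs.
// Unknown keys and malformed numbers are ignored, keeping the previous value.
inline SefiSampleParams parse_sefi_args(const std::string& args,
                                        SefiSampleParams params = {}) {
    std::istringstream stream(args);
    std::string item;
    while (std::getline(stream, item, ',')) {
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) {
            continue;  // not a key=value entry
        }
        const std::string key   = item.substr(0, eq);
        const std::string value = item.substr(eq + 1);
        try {
            if (key == "sefi_alpha") {
                params.alpha = std::stof(value);
            } else if (key == "sefi_delta_t") {
                params.delta_t = std::stof(value);
            }
        } catch (const std::exception&) {
            // std::stof threw (invalid or out-of-range number): keep the old value.
        }
    }
    return params;
}
```

With this shape, a turbo checkpoint would be run by passing the alpha it needs explicitly, and renaming the model file cannot silently change sampling behavior.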
        : timesteps_vec;
    adjust_sample_step_scalings(shifted_timestep, scaling_timesteps_vec, c_in, &c_skip, &c_out);

    if (auto sefi_denoiser = std::dynamic_pointer_cast<SefiFlowDenoiser>(denoiser)) {
This should be placed inside process_timesteps.
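A rough illustration of the design being asked for: the model-specific timestep adjustment lives behind a virtual method, so the sampling code never needs a `dynamic_pointer_cast`. The class names, the `process_timesteps` signature, and `prepare_timesteps` are hypothetical stand-ins rather than sd.cpp's real interfaces, and the offset-and-clamp adjustment is only a placeholder, not the formula from the SeFi report.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical base interface; sd.cpp's real denoiser classes differ.
struct Denoiser {
    virtual ~Denoiser() = default;
    // Default behavior: timesteps pass through unchanged.
    virtual std::vector<float> process_timesteps(const std::vector<float>& timesteps) const {
        return timesteps;
    }
};

// Hypothetical model-specific denoiser that owns its timestep adjustment.
struct SefiLikeDenoiser : Denoiser {
    float delta_t = 0.1f;  // placeholder value

    // Placeholder adjustment: offset by delta_t and clamp to [0, 1].
    // This is NOT the actual SeFi-Image formula.
    std::vector<float> process_timesteps(const std::vector<float>& timesteps) const override {
        std::vector<float> out(timesteps.size());
        std::transform(timesteps.begin(), timesteps.end(), out.begin(),
                       [this](float t) { return std::min(1.0f, std::max(0.0f, t + delta_t)); });
        return out;
    }
};

// The caller only sees the base interface, so no type check is needed here.
inline std::vector<float> prepare_timesteps(const std::shared_ptr<Denoiser>& denoiser,
                                            const std::vector<float>& timesteps) {
    return denoiser->process_timesteps(timesteps);
}
```

Keeping the branch inside the denoiser means a new model family adds one override instead of another cast at every call site.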
    ## 🔥Important News

    * **2026/06/26** 🚀 stable-diffusion.cpp now supports **SeFi-Image**
The Important News section should only list models with high community discussion. Based on the current level of interest, SeFi-Image does not meet that requirement.
        prefix_map["te1."] = "text_encoders.clip_l.transformer.";
    }

    if (sd_version_is_sefi_image(version)) {
The --diffusion-model parameter should be used instead of hardcoding the prefix here.
Summary
Adds inference support for SeFi-Image, a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: arXiv:2606.22568. See docs/sefi_image.md.
What's in:
- `VERSION_SEFI_IMAGE` + version detection
- …`semantic_embedder` + `texture_embedder`, concat)
- `script/convert_sefi.py` for converting diffusers checkpoint to single sd.cpp safetensors
- `--extra-sample-args sefi_alpha=0.3` / `sefi_delta_t=0.1` overrides
- `turbo` in path => alpha=1.0, else alpha=0.3

Related Issue / Discussion
Closes #1702.
Additional Information
Example
    ./build/bin/sd-cli \
      --model /path/to/sefi_1b_turbo.safetensors \
      --llm /path/to/qwen3_vl_2b.safetensors \
      -p "a photograph of an orange tabby cat sitting on a couch" \
      --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
      --diffusion-fa --offload-to-cpu \
      -o out.png

Tested variants (all 7 from huggingface.co/SeFi-Image)
`--max-vram 8 --stream-layers`

5B variants use `Qwen3-VL-4B-Instruct` as the text encoder (1B/2B use 2B). 5B needs streaming on 12GB-class GPUs.

Checklist