Add SeedVR2 support (CORE-6)#14424
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request integrates SeedVR2 as a first-class model family in ComfyUI, including model detection/configuration, native latent/conditioning handling, SeedVR2-specific VAE wrapper + tiling, and extra workflow nodes (preprocess/conditioning/progressive sampling/post-processing), with accompanying unit/regression coverage.
Changes:
- Adds SeedVR2 model detection, config, latent format, dtype policy, and model base wiring.
- Introduces SeedVR2 native VAE implementation/wrapper (including tiling + temporal/memory-state behavior) and SeedVR2 variable-length attention backend.
- Adds SeedVR2 extra nodes (preprocess/conditioning/progressive sampler/post-processing) and a broad unit/regression test suite.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
comfy/model_detection.py |
Adds SeedVR2 UNet config detection heuristics (3B/7B layouts). |
comfy/supported_models.py |
Registers SeedVR2 supported model + dtype policy hook. |
comfy/supported_models_base.py |
Extends set_inference_dtype API to optionally accept a device. |
comfy/model_base.py |
Adds SeedVR2 BaseModel wrapper and passes SeedVR2 conditioning through extra_conds. |
comfy/latent_formats.py |
Introduces SeedVR2 latent format (16-channel latent). |
comfy/sd.py |
Adds SeedVR2 VAE detection/wiring and “owned tiling” hooks for VAEs that handle their own tiling. |
nodes.py |
Registers nodes_seedvr.py in built-in extra nodes loading. |
comfy/ldm/seedvr/constants.py |
SeedVR2/ByteDance/published-standards constants used across the integration. |
comfy/ldm/seedvr/model.py |
Implements SeedVR2 NaDiT model and related utilities (windowing, RoPE, attention plumbing). |
comfy/ldm/seedvr/attention.py |
Adds SeedVR2 split-loop var-attention backend. |
comfy/ldm/seedvr/vae.py |
Implements SeedVR2 VAE + wrapper, causal memory-state handling, and tiled encode/decode. |
comfy/ldm/seedvr/color_fix.py |
Adds LAB/wavelet/AdaIN post-processing color transfer utilities. |
comfy_extras/nodes_seedvr.py |
Adds SeedVR2 preprocess, conditioning, progressive sampler, and post-processing nodes. |
tests-unit/comfy_test/model_detection_test.py |
Adds SeedVR2 model detection coverage. |
tests-unit/comfy_test/test_seedvr2_model.py |
Consolidated regression tests for model/rope/latent format and VAE graph boundaries. |
tests-unit/comfy_test/test_seedvr2_dtype.py |
Tests SeedVR2 dtype policy + conditioning behavior. |
tests-unit/comfy_test/test_seedvr2_internals.py |
Tests GroupNorm limit gating and SeedVR2 optimized var-attention contract. |
tests-unit/comfy_test/test_seedvr2_vae_decode.py |
Tests SeedVR2 VAE wrapper decode shape/rank contracts. |
tests-unit/comfy_test/test_seedvr2_vae_tiled.py |
Tests SeedVR2 VAE tiling/dispatch behavior (encode/decode fallback routing). |
tests-unit/comfy_test/test_seedvr_progressive_sampler.py |
Tests progressive sampler schema + chunking/validation behavior. |
tests-unit/comfy_test/seedvr_vae_forward_test.py |
Regression test for SeedVR VAE forward contract (no diffusers-style attributes). |
tests-unit/comfy_extras_test/test_seedvr2_nodes.py |
Verifies node schema/execute signature alignment for SeedVR2 nodes. |
tests-unit/comfy_extras_test/test_seedvr2_conditioning.py |
Tests SeedVR2 conditioning node output determinism + fail-loud behavior. |
tests-unit/comfy_extras_test/test_seedvr2_post_processing.py |
Tests post-processing schema and error messaging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if encode: | ||
| out = model.encode(t_chunk)[0] | ||
| else: | ||
| out = model.decode_(t_chunk) |
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThis PR introduces SeedVR2, a complete video diffusion model architecture for ComfyUI. It adds a multimodal transformer (NaDiT) supporting video and text conditioning with windowed attention and rotary embeddings, a 3D causal VAE for video encoding/decoding with temporal memory management and overlap-aware tiling, color correction utilities using wavelet decomposition and CIELAB histogram matching, checkpoint detection for multiple model variants (7B separate/shared, 3B), Comfy workflow nodes for preprocessing, conditioning resolution, postprocessing with color correction modes, and progressive temporal chunk sampling with VRAM-aware auto-sizing, plus comprehensive regression tests validating the model pipeline and individual components. 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_extras/nodes_seedvr.py`:
- Around line 284-286: The code currently crops odd-sized tensors by computing
h2/w2 and slicing output (variables h2, w2, output) which discards the last
row/column; instead preserve the original height/width (use h =
output.shape[-3], w = output.shape[-2]) and stop slicing away pixels. If
downstream code requires even dimensions, change the behavior to pad to the next
even size (compute pad_h = 0 if h%2==0 else 1, pad_w = 0 if w%2==0 else 1) and
apply symmetric zero/transparent padding to output (and the alpha plane) rather
than trimming; track the original h/w so you can remove padding only if and when
needed, ensuring alignment logic and alpha restore operate on the original
resized canvas rather than a cropped one.
In `@comfy/ldm/modules/diffusionmodules/model.py`:
- Around line 35-37: Guard the new downscale_freq_shift before it's used to
avoid division by zero/negative scale: check that downscale_freq_shift is an
integer >= 0 and strictly less than half_dim (where half_dim = embedding_dim //
2) so that (half_dim - downscale_freq_shift) > 0; if the check fails, raise a
clear ValueError explaining that downscale_freq_shift must be in [0,
half_dim-1]. Apply this validation immediately before the existing emb
computation that uses emb = math.log(10000) / (half_dim - downscale_freq_shift)
so the function (timestep embedding helper) fails fast instead of producing
invalid tensors.
In `@comfy/ldm/seedvr/color_fix.py`:
- Around line 36-45: The function wavelet_decomposition can return an unbound
low_freq when levels==0; to fix, initialize low_freq before the loop (e.g., set
low_freq = image or low_freq = image.clone()) so low_freq is always defined;
keep high_freq as torch.zeros_like(image) and then run the existing loop that
uses wavelet_blur, high_freq.add_(...), and image = low_freq—this ensures the
final return (high_freq, low_freq) is safe even for levels=0.
In `@comfy/ldm/seedvr/model.py`:
- Around line 1274-1275: The window_method construction uses num_layers // 2 *
["720pwin_by_size_bysize","720pswin_by_size_bysize"] which yields one element
short for odd num_layers and causes indexing errors; update window_method so it
produces exactly num_layers entries (e.g., repeat the two-item pattern and
append the first item when num_layers is odd) and replace the same pattern
wherever window_method is built in this file (the other occurrence noted in the
review). Ensure the rest of the constructor variables (txt_dim, vid_dim) remain
unchanged.
In `@comfy/ldm/seedvr/vae.py`:
- Around line 255-264: The process-global _NORM_LIMIT and setter
set_norm_limit() cause cross-request interference; change the design so memory
limits are instance-scoped: remove reliance on the global _NORM_LIMIT in
GroupNorm-related logic and add a per-wrapper/per-model attribute (e.g., an
instance field on VideoAutoencoderKLWrapper and any GroupNorm-wrapping classes)
that set_memory_limit() updates, then flow that instance-specific limit into the
chunking/forward logic instead of calling set_norm_limit(); ensure
GroupNorm/chunking functions accept the limit as an argument or read it from the
instance so concurrent VAEs don’t mutate a shared global.
In `@comfy/sd.py`:
- Around line 1235-1237: The generic post-format hook here (formatter =
getattr(self.first_stage_model, "comfy_format_encoded", None); samples =
formatter(samples)) must not turn SeedVR2 single-frame 4-D latents into 5-D
tensors; update the hook call or its result handling so that if
comfy_format_encoded (or any formatter) returns a 5-D tensor with a temporal
axis of length 1 (ndim == 5 and shape[2] == 1) and the model/instance expects
latent_dimensions == 2 (SeedVR2), you squeeze the temporal axis back to a 4-D
layout (B, C, H, W). Reference comfy_format_encoded /
VideoAutoencoderKLWrapper.comfy_format_encoded and VAE.encode()/encode_tiled()
to implement this check-and-squeeze immediately after samples =
formatter(samples) (or alternatively restrict formatters to value-only
transforms), so downstream code continues to receive 2-D/4-D SeedVR2 latents.
In `@tests-unit/comfy_test/seedvr_vae_forward_test.py`:
- Around line 23-28: The test module mutates the global CLI flag by setting
cli_args.cpu based on torch.cuda.is_available(), causing cross-module state
leakage; update the module to save the original value of cli_args.cpu before
changing it and restore it after the test module runs (use a module-level
teardown or an autouse pytest fixture) so other tests aren't
affected—specifically wrap the change to cli_args.cpu (the symbol referenced in
the import) with save/restore logic or perform the toggle inside a fixture that
yields and resets the original value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1bf2b866-576d-4b0b-b629-992a88010ac3
📒 Files selected for processing (25)
comfy/latent_formats.pycomfy/ldm/modules/diffusionmodules/model.pycomfy/ldm/seedvr/attention.pycomfy/ldm/seedvr/color_fix.pycomfy/ldm/seedvr/constants.pycomfy/ldm/seedvr/model.pycomfy/ldm/seedvr/vae.pycomfy/model_base.pycomfy/model_detection.pycomfy/sd.pycomfy/supported_models.pycomfy/supported_models_base.pycomfy_extras/nodes_seedvr.pynodes.pytests-unit/comfy_extras_test/test_seedvr2_conditioning.pytests-unit/comfy_extras_test/test_seedvr2_nodes.pytests-unit/comfy_extras_test/test_seedvr2_post_processing.pytests-unit/comfy_test/model_detection_test.pytests-unit/comfy_test/seedvr_vae_forward_test.pytests-unit/comfy_test/test_seedvr2_dtype.pytests-unit/comfy_test/test_seedvr2_internals.pytests-unit/comfy_test/test_seedvr2_model.pytests-unit/comfy_test/test_seedvr2_vae_decode.pytests-unit/comfy_test/test_seedvr2_vae_tiled.pytests-unit/comfy_test/test_seedvr_progressive_sampler.py
| h2 = output.shape[-3] - (output.shape[-3] % 2) | ||
| w2 = output.shape[-2] - (output.shape[-2] % 2) | ||
| output = output[:, :, :h2, :w2, :] |
There was a problem hiding this comment.
Don't trim odd-sized outputs here.
Lines 284-286 unconditionally drop the last row/column from odd-sized results, so post-processing no longer preserves the original resized dimensions. That breaks the node’s stated alignment behavior and also shifts/restores alpha onto a smaller canvas for odd-width or odd-height inputs.
Proposed fix
- h2 = output.shape[-3] - (output.shape[-3] % 2)
- w2 = output.shape[-2] - (output.shape[-2] % 2)
- output = output[:, :, :h2, :w2, :]
if decoded_was_4d:
output = output.reshape(-1, output.shape[-3], output.shape[-2], output.shape[-1])
return io.NodeOutput(output)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| h2 = output.shape[-3] - (output.shape[-3] % 2) | |
| w2 = output.shape[-2] - (output.shape[-2] % 2) | |
| output = output[:, :, :h2, :w2, :] | |
| if decoded_was_4d: | |
| output = output.reshape(-1, output.shape[-3], output.shape[-2], output.shape[-1]) | |
| return io.NodeOutput(output) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy_extras/nodes_seedvr.py` around lines 284 - 286, The code currently
crops odd-sized tensors by computing h2/w2 and slicing output (variables h2, w2,
output) which discards the last row/column; instead preserve the original
height/width (use h = output.shape[-3], w = output.shape[-2]) and stop slicing
away pixels. If downstream code requires even dimensions, change the behavior to
pad to the next even size (compute pad_h = 0 if h%2==0 else 1, pad_w = 0 if
w%2==0 else 1) and apply symmetric zero/transparent padding to output (and the
alpha plane) rather than trimming; track the original h/w so you can remove
padding only if and when needed, ensuring alignment logic and alpha restore
operate on the original resized canvas rather than a cropped one.
| half_dim = embedding_dim // 2 | ||
| emb = math.log(10000) / (half_dim - 1) | ||
| emb = math.log(10000) / (half_dim - downscale_freq_shift) | ||
| emb = torch.exp(torch.arange(half_dim, dtype=torch.float32) * -emb) |
There was a problem hiding this comment.
Guard the new frequency-shift parameter.
downscale_freq_shift now feeds the divisor directly. Values >= embedding_dim // 2 make the frequency scale undefined or inverted, so this helper can return broken timestep embeddings instead of failing fast.
Suggested guard
def get_timestep_embedding(timesteps, embedding_dim, flip_sin_to_cos=False, downscale_freq_shift=1):
assert len(timesteps.shape) == 1
half_dim = embedding_dim // 2
+ if half_dim > 0 and downscale_freq_shift >= half_dim:
+ raise ValueError("downscale_freq_shift must be smaller than embedding_dim // 2")
emb = math.log(10000) / (half_dim - downscale_freq_shift)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| half_dim = embedding_dim // 2 | |
| emb = math.log(10000) / (half_dim - 1) | |
| emb = math.log(10000) / (half_dim - downscale_freq_shift) | |
| emb = torch.exp(torch.arange(half_dim, dtype=torch.float32) * -emb) | |
| def get_timestep_embedding(timesteps, embedding_dim, flip_sin_to_cos=False, downscale_freq_shift=1): | |
| assert len(timesteps.shape) == 1 | |
| half_dim = embedding_dim // 2 | |
| if half_dim > 0 and downscale_freq_shift >= half_dim: | |
| raise ValueError("downscale_freq_shift must be smaller than embedding_dim // 2") | |
| emb = math.log(10000) / (half_dim - downscale_freq_shift) | |
| emb = torch.exp(torch.arange(half_dim, dtype=torch.float32) * -emb) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/ldm/modules/diffusionmodules/model.py` around lines 35 - 37, Guard the
new downscale_freq_shift before it's used to avoid division by zero/negative
scale: check that downscale_freq_shift is an integer >= 0 and strictly less than
half_dim (where half_dim = embedding_dim // 2) so that (half_dim -
downscale_freq_shift) > 0; if the check fails, raise a clear ValueError
explaining that downscale_freq_shift must be in [0, half_dim-1]. Apply this
validation immediately before the existing emb computation that uses emb =
math.log(10000) / (half_dim - downscale_freq_shift) so the function (timestep
embedding helper) fails fast instead of producing invalid tensors.
| def wavelet_decomposition(image: Tensor, levels: int = WAVELET_DECOMP_LEVELS): | ||
| high_freq = torch.zeros_like(image) | ||
|
|
||
| for i in range(levels): | ||
| radius = 2 ** i | ||
| low_freq = wavelet_blur(image, radius) | ||
| high_freq.add_(image).sub_(low_freq) | ||
| image = low_freq | ||
|
|
||
| return high_freq, low_freq |
There was a problem hiding this comment.
low_freq undefined when levels=0.
If levels=0 is passed, the loop body never executes and low_freq remains unbound, causing UnboundLocalError at the return statement. While the default is 5 and this is an internal function, a defensive initialization ensures robustness.
Proposed fix
def wavelet_decomposition(image: Tensor, levels: int = WAVELET_DECOMP_LEVELS):
high_freq = torch.zeros_like(image)
+ low_freq = image # fallback when levels=0
for i in range(levels):
radius = 2 ** i
low_freq = wavelet_blur(image, radius)
high_freq.add_(image).sub_(low_freq)
image = low_freq
return high_freq, low_freq🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/ldm/seedvr/color_fix.py` around lines 36 - 45, The function
wavelet_decomposition can return an unbound low_freq when levels==0; to fix,
initialize low_freq before the loop (e.g., set low_freq = image or low_freq =
image.clone()) so low_freq is always defined; keep high_freq as
torch.zeros_like(image) and then run the existing loop that uses wavelet_blur,
high_freq.add_(...), and image = low_freq—this ensures the final return
(high_freq, low_freq) is safe even for levels=0.
| from comfy.cli_args import args as cli_args | ||
|
|
||
| if not torch.cuda.is_available(): | ||
| cli_args.cpu = True | ||
|
|
||
| from comfy.ldm.seedvr.vae import VideoAutoencoderKL # noqa: E402 |
There was a problem hiding this comment.
Avoid leaking args.cpu across test modules.
tests-unit/comfy_test/seedvr_vae_forward_test.py, tests-unit/comfy_test/test_seedvr2_internals.py, tests-unit/comfy_test/test_seedvr2_model.py, tests-unit/comfy_test/test_seedvr2_vae_decode.py, and tests-unit/comfy_test/test_seedvr2_vae_tiled.py all set the same process-global CLI flag during import and never restore it. The shared root cause is test-state leakage: once one module forces CPU mode, later modules in the same pytest worker can run under the wrong backend configuration. Save the previous value and restore it after each module (for example with module teardown or an autouse fixture).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests-unit/comfy_test/seedvr_vae_forward_test.py` around lines 23 - 28, The
test module mutates the global CLI flag by setting cli_args.cpu based on
torch.cuda.is_available(), causing cross-module state leakage; update the module
to save the original value of cli_args.cpu before changing it and restore it
after the test module runs (use a module-level teardown or an autouse pytest
fixture) so other tests aren't affected—specifically wrap the change to
cli_args.cpu (the symbol referenced in the import) with save/restore logic or
perform the toggle inside a fixture that yields and resets the original value.
019f24f to
cfb9c31
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/sd.py`:
- Around line 1283-1297: The code is incorrectly truncating temporal frames by
slicing pixel_samples to maximum after converting through upscale/downscale;
instead compute any latent-aligned length needed (e.g. latent_max =
self.upscale_ratio[0](self.downscale_ratio[0](pixel_samples.shape[2]))) but do
NOT drop frames — pass the full pixel_samples to encode_tiled_3d and either pad
pixel_samples to latent_max or let encode_tiled_3d handle padding/alignment;
update the block around maximum/pixel_samples/encode_tiled_3d (symbols: maximum,
pixel_samples, encode_tiled_3d, encode_tiled) so temporal length is preserved
and tiling math uses padding/alignment rather than slicing.
In `@tests-unit/comfy_test/model_detection_test.py`:
- Around line 174-184: Add an assertion in
test_seedvr2_3b_shared_mm_detection_config to lock the 3B-only vid_out_norm
flag: after calling detect_unet_config(sd, "") and validating unet_config
fields, assert that unet_config["vid_out_norm"] is True so the 3B-specific
runtime knob (vid_out_norm) is covered; locate the test function
test_seedvr2_3b_shared_mm_detection_config and the detect_unet_config result to
add this check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1d80d0a2-2f48-4281-a04a-3b6e7289df8a
📒 Files selected for processing (25)
comfy/latent_formats.pycomfy/ldm/modules/diffusionmodules/model.pycomfy/ldm/seedvr/attention.pycomfy/ldm/seedvr/color_fix.pycomfy/ldm/seedvr/constants.pycomfy/ldm/seedvr/model.pycomfy/ldm/seedvr/vae.pycomfy/model_base.pycomfy/model_detection.pycomfy/sd.pycomfy/supported_models.pycomfy/supported_models_base.pycomfy_extras/nodes_seedvr.pynodes.pytests-unit/comfy_extras_test/test_seedvr2_conditioning.pytests-unit/comfy_extras_test/test_seedvr2_nodes.pytests-unit/comfy_extras_test/test_seedvr2_post_processing.pytests-unit/comfy_test/model_detection_test.pytests-unit/comfy_test/seedvr_vae_forward_test.pytests-unit/comfy_test/test_seedvr2_dtype.pytests-unit/comfy_test/test_seedvr2_internals.pytests-unit/comfy_test/test_seedvr2_model.pytests-unit/comfy_test/test_seedvr2_vae_decode.pytests-unit/comfy_test/test_seedvr2_vae_tiled.pytests-unit/comfy_test/test_seedvr_progressive_sampler.py
✅ Files skipped from review due to trivial changes (1)
- comfy/latent_formats.py
🚧 Files skipped from review as they are similar to previous changes (22)
- comfy/supported_models_base.py
- nodes.py
- comfy/ldm/modules/diffusionmodules/model.py
- comfy/model_base.py
- comfy/supported_models.py
- tests-unit/comfy_test/test_seedvr2_internals.py
- comfy/model_detection.py
- tests-unit/comfy_test/test_seedvr2_model.py
- comfy/ldm/seedvr/attention.py
- tests-unit/comfy_extras_test/test_seedvr2_nodes.py
- tests-unit/comfy_extras_test/test_seedvr2_conditioning.py
- comfy/ldm/seedvr/color_fix.py
- comfy/ldm/seedvr/constants.py
- tests-unit/comfy_test/test_seedvr2_dtype.py
- tests-unit/comfy_test/test_seedvr2_vae_decode.py
- tests-unit/comfy_test/test_seedvr2_vae_tiled.py
- tests-unit/comfy_extras_test/test_seedvr2_post_processing.py
- comfy/ldm/seedvr/vae.py
- tests-unit/comfy_test/seedvr_vae_forward_test.py
- comfy_extras/nodes_seedvr.py
- comfy/ldm/seedvr/model.py
- tests-unit/comfy_test/test_seedvr_progressive_sampler.py
| def test_seedvr2_3b_shared_mm_detection_config(self): | ||
| sd = _make_seedvr2_3b_shared_mm_sd() | ||
| unet_config = detect_unet_config(sd, "") | ||
|
|
||
| assert unet_config is not None | ||
| assert unet_config["image_model"] == "seedvr2" | ||
| assert unet_config["vid_dim"] == 2560 | ||
| assert unet_config["heads"] == 20 | ||
| assert unet_config["num_layers"] == 32 | ||
| assert unet_config["mlp_type"] == "swiglu" | ||
|
|
There was a problem hiding this comment.
Lock down the 3B-only vid_out_norm flag here.
The 3B SeedVR2 branch in comfy/model_detection.py also sets vid_out_norm = True, but this test never asserts it. That leaves the one 3B-specific runtime knob free to regress while the detection test still passes.
Suggested assertion
def test_seedvr2_3b_shared_mm_detection_config(self):
sd = _make_seedvr2_3b_shared_mm_sd()
unet_config = detect_unet_config(sd, "")
assert unet_config is not None
assert unet_config["image_model"] == "seedvr2"
assert unet_config["vid_dim"] == 2560
assert unet_config["heads"] == 20
assert unet_config["num_layers"] == 32
assert unet_config["mlp_type"] == "swiglu"
+ assert unet_config["vid_out_norm"] is True🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests-unit/comfy_test/model_detection_test.py` around lines 174 - 184, Add an
assertion in test_seedvr2_3b_shared_mm_detection_config to lock the 3B-only
vid_out_norm flag: after calling detect_unet_config(sd, "") and validating
unet_config fields, assert that unet_config["vid_out_norm"] is True so the
3B-specific runtime knob (vid_out_norm) is covered; locate the test function
test_seedvr2_3b_shared_mm_detection_config and the detect_unet_config result to
add this check.
Unable to load SeedVR2 3B in 16-bit, fp8 only (log)[ERROR] !!! Exception during processing !!! expected mat1 and mat2 to have the same dtype, but got: float != c10::Half tl:dr: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half |
|
@10902940129041 thanks for the report.
I tried to reproduce this on current PR head using the SeedVR2 3B fp16 checkpoint with the PR image workflow, but I could not reproduce the dtype error. I also tested the progressive sampler path, and that completed successfully as well. Your traceback shows native Could you share:
That should be enough for me to compare your run against the reproduction here and narrow down the dtype mismatch. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/model_base.py`:
- Around line 1760-1769: The code does not guard against empty reference_latents
lists before attempting to concatenate. When reference_latents is an empty list
(which is not None), the ordered list becomes empty, the loop does not execute,
and torch.cat(stacked, dim=2) will throw an error. Add a check to verify that
reference_latents has at least one element after the initial None check, and
only proceed with processing the latents and building the stacked concatenation
if the list is non-empty. This guard should be placed immediately after the
reference_latents is not None check and before the ordered list is constructed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 4893c8f1-80c3-4f76-ada4-c53f2a4b9d94
📒 Files selected for processing (5)
comfy/model_base.pycomfy/model_detection.pycomfy/sd.pycomfy/supported_models.pynodes.py
🚧 Files skipped from review as they are similar to previous changes (4)
- nodes.py
- comfy/model_detection.py
- comfy/supported_models.py
- comfy/sd.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/model_base.py`:
- Around line 1760-1769: The code does not guard against empty reference_latents
lists before attempting to concatenate. When reference_latents is an empty list
(which is not None), the ordered list becomes empty, the loop does not execute,
and torch.cat(stacked, dim=2) will throw an error. Add a check to verify that
reference_latents has at least one element after the initial None check, and
only proceed with processing the latents and building the stacked concatenation
if the list is non-empty. This guard should be placed immediately after the
reference_latents is not None check and before the ordered list is constructed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 4893c8f1-80c3-4f76-ada4-c53f2a4b9d94
📒 Files selected for processing (5)
comfy/model_base.pycomfy/model_detection.pycomfy/sd.pycomfy/supported_models.pynodes.py
🚧 Files skipped from review as they are similar to previous changes (4)
- nodes.py
- comfy/model_detection.py
- comfy/supported_models.py
- comfy/sd.py
🛑 Comments failed to post (1)
comfy/model_base.py (1)
1760-1769:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard empty
reference_latentsbefore concatenation.If
reference_latentsis an empty list,stackedstays empty andtorch.cat(stacked, dim=2)throws at runtime. Please add a non-empty check before buildingreference_latent.Suggested fix
reference_latents = kwargs.get("reference_latents", None) - if reference_latents is not None: + if reference_latents is not None and len(reference_latents) > 0: # SCAIL-2 multi-reference: reference_latents[0] is the primary ref, [1:] are additional # references. Stack as [additional..., primary] so the primary stays adjacent to the video. ordered = list(reference_latents[1:]) + list(reference_latents[:1]) stacked = [] for lat in ordered: lat = self.process_latent_in(lat) stacked.append(torch.cat([lat, torch.ones_like(lat[:, :4])], dim=1)) out['reference_latent'] = comfy.conds.CONDRegular(torch.cat(stacked, dim=2))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.reference_latents = kwargs.get("reference_latents", None) if reference_latents is not None and len(reference_latents) > 0: # SCAIL-2 multi-reference: reference_latents[0] is the primary ref, [1:] are additional # references. Stack as [additional..., primary] so the primary stays adjacent to the video. ordered = list(reference_latents[1:]) + list(reference_latents[:1]) stacked = [] for lat in ordered: lat = self.process_latent_in(lat) stacked.append(torch.cat([lat, torch.ones_like(lat[:, :4])], dim=1)) out['reference_latent'] = comfy.conds.CONDRegular(torch.cat(stacked, dim=2))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/model_base.py` around lines 1760 - 1769, The code does not guard against empty reference_latents lists before attempting to concatenate. When reference_latents is an empty list (which is not None), the ordered list becomes empty, the loop does not execute, and torch.cat(stacked, dim=2) will throw an error. Add a check to verify that reference_latents has at least one element after the initial None check, and only proceed with processing the latents and building the stacked concatenation if the list is non-empty. This guard should be placed immediately after the reference_latents is not None check and before the ordered list is constructed.Source: Coding guidelines
Summary
Resubmission of #14110, reverted in #14359 pending cleanup.
Changes since #14110
VAEDecodeTiled/VAEEncodeTiledpublic controls back to master semanticscomfy/ldm/modules/attention.pyintocomfy/ldm/seedvr/attention.pycomfy/sd.pyinto the VAE wrapper behind generic hooks; no model-specific branches remain in corecomfy/ldm/seedvr/andcomfy_extras/nodes_seedvr.pythe diff is registration and detection glue, andnodes.pycarries one registration lineExamples
4x image upscale, 7B fp16. The input below is a 1024x1024 original taken down to 256x256 with the same second-order Real-ESRGAN degradation pipeline the SeedVR2 paper evaluates against: two rounds of random blur, rescaling, gaussian/poisson noise, and JPEG, plus H.264/MPEG-4 compression, at a fixed seed. The exact script is in the gist for degrading your own test inputs the same way.
7B fp16 model, 4x upscale
Output file, full ComfyUI workflow embedded
7B sharp fp8 model, 4x upscale
Output file, full ComfyUI workflow embedded
In each input cell the red box holds the actual input pixels at native size; the rest of the cell is that same image zoomed 4x nearest to output scale, which is what the model starts from.
4x video upscale. Two clips, 256x256 and 97 frames each, degraded from 1024x1024 originals through the same pipeline, video-compression stages included. Same red-box convention on the input panels; panels are animated WebP display transcodes, full-fidelity videos linked under each set.
3B fp8 model, 4x upscale, 3-chunk temporal blend (clip 1)
Full videos: reference | degraded input | SeedVR2 output, full ComfyUI workflow embedded
7B sharp fp16 model, 4x upscale, auto chunking (clip 2)
Full videos: reference | degraded input | SeedVR2 output, full ComfyUI workflow embedded
Degradation provenance
The test image/video inputs are not simple bicubic downscales. Each input was produced from its reference with a published second-order degradation pipeline (Real-ESRGAN / RealBasicVSR) consisting of unsharp masking, two stages of random blur (iso, aniso, generalized, plateau, sinc kernels), random rescaling (bilinear, area, bicubic), gaussian or poisson noise, and JPEG compression, with video codec compression (h264, mpeg4) supplying the temporal degradation for video sources. This follows the SeedVR2 paper's evaluation protocol (https://arxiv.org/abs/2506.05301). The exact script is included in the gist (
degrade_uav.py) so reviewers can degrade any image or video the same way for independent testing. The sources shown are roughly 20% of the overall images and videos used for evaluation purposes, with another 50% taken directly from SeedVR2's original evaluation sources.Workflows
Example workflows are available here:
https://gist.github.com/pollockjj/8d40c875e9eacb9b560709faef4ea31f
Both workflows are generated against this PR's node schemas; the image workflow was executed end-to-end at this PR's head.
Models
https://huggingface.co/Comfy-Org/SeedVR2
Repackaged model files with conditioning embedded so only one model file is needed.
Prior art
Tracking
Linear: https://linear.app/comfyorg/issue/CORE-6/support-seedvr2
Validation
tests-unit/comfy_test+tests-unit/comfy_extras_test: 153 passed.