Add SeedVR2 support (CORE-6)#14424

Open
pollockjj wants to merge 8 commits into Comfy-Org:master from pollockjj:seedvr2-native-support-v5

@pollockjj pollockjj commented Jun 12, 2026

Contributor

Summary

  • Add native SeedVR2 model detection, latent format, dtype policy, and VAE support.
  • Add SeedVR2 conditioning, progressive sampler, post-processing, and workflow-node support.
  • Add SeedVR2 attention, VAE tiling, temporal chunking, and regression coverage.

Resubmission of #14110, reverted in #14359 pending cleanup.

Changes since #14110

  • Removed 826 lines of the 7,383 the reverted PR added:
    • VAE cleanup:
      • training and gradient-checkpointing paths
      • diffusers-reference machinery with no inference path: distribution sampling, weight inflation, conv-transpose variants, unused resnet options
      • constructor-dead generality: write-only attention options, quant convs, unreachable resampling and pooling branches, and the slicing, padding, and pre-norm plumbing
      • stock tiled VAE behavior restored: VAEDecodeTiled/VAEEncodeTiled public controls back to master semantics
    • the legacy ByteDance sampling-schedule path in the nodes (timestep shift transform and its constants), replaced by Comfy's native samplers
    • dead DiT and rotary-embedding options no released checkpoint exercises (xpos, custom frequencies, shared qkv/mlp, temporal windowing, block-type plumbing), plus dead initialization and padding fallbacks
    • the static-audit tests that covered the removed options; the suite now carries contract coverage only (model detection, dtype policy, tiling dispatch, node boundaries, sampler chunking)
  • Relocated about 180 lines to minimize SeedVR2's footprint in shared ComfyUI code:
    • the variable-length attention path moved out of comfy/ldm/modules/attention.py into comfy/ldm/seedvr/attention.py
    • SeedVR2 tiling and temporal handling moved out of comfy/sd.py into the VAE wrapper behind generic hooks; no model-specific branches remain in core
    • net effect: outside comfy/ldm/seedvr/ and comfy_extras/nodes_seedvr.py the diff is registration and detection glue, and nodes.py carries one registration line
  • Validated output against the previously released #14110 baseline: 36.9–47.6 dB PSNR across 3B fp8, 7B fp16, 7B sharp fp8, and 7B sharp fp16, on image, single-chunk video, and multi-chunk video. Outputs at this head are byte-identical across two independent runs each.
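
For readers unfamiliar with the metric, PSNR is 10·log10(peak² / MSE). The sketch below is an illustrative pure-Python version over flat pixel sequences. The function name `psnr_db` and the 8-bit peak of 255 are assumptions made for this example; it is not the script used to produce the numbers above.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[float], candidate: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration; `peak` assumes 8-bit data.
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs have unbounded PSNR
    return 10.0 * math.log10(peak * peak / mse)
```

Under this definition and an 8-bit peak, an RMS error of one level corresponds to about 48.1 dB.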

Examples

4x image upscale, 7B fp16. The input below is a 1024x1024 original taken down to 256x256 with the same second-order Real-ESRGAN degradation pipeline the SeedVR2 paper evaluates against: two rounds of random blur, rescaling, gaussian/poisson noise, and JPEG, plus H.264/MPEG-4 compression, at a fixed seed. The exact script is in the gist for degrading your own test inputs the same way.

7B fp16 model, 4x upscale

reference, degraded input, SeedVR2 7B fp16 restored

Output file, full ComfyUI workflow embedded

7B sharp fp8 model, 4x upscale

reference, degraded input, SeedVR2 7B sharp fp8 restored

Output file, full ComfyUI workflow embedded

In each input cell the red box holds the actual input pixels at native size; the rest of the cell is that same image zoomed 4x nearest to output scale, which is what the model starts from.

4x video upscale. Two clips, 256x256 and 97 frames each, degraded from 1024x1024 originals through the same pipeline, video-compression stages included. Same red-box convention on the input panels; panels are animated WebP display transcodes, full-fidelity videos linked under each set.

3B fp8 model, 4x upscale, 3-chunk temporal blend (clip 1)

Full videos: reference | degraded input | SeedVR2 output, full ComfyUI workflow embedded

7B sharp fp16 model, 4x upscale, auto chunking (clip 2)

Full videos: reference | degraded input | SeedVR2 output, full ComfyUI workflow embedded

Degradation provenance

The test image/video inputs are not simple bicubic downscales. Each input was produced from its reference with a published second-order degradation pipeline (Real-ESRGAN / RealBasicVSR) consisting of unsharp masking, two stages of random blur (iso, aniso, generalized, plateau, sinc kernels), random rescaling (bilinear, area, bicubic), gaussian or poisson noise, and JPEG compression, with video codec compression (h264, mpeg4) supplying the temporal degradation for video sources. This follows the SeedVR2 paper's evaluation protocol (https://arxiv.org/abs/2506.05301). The exact script is included in the gist (degrade_uav.py) so reviewers can degrade any image or video the same way for independent testing. The sources shown are roughly 20% of the overall images and videos used for evaluation purposes, with another 50% taken directly from SeedVR2's original evaluation sources.

Workflows

Example workflows are available here:
https://gist.github.com/pollockjj/8d40c875e9eacb9b560709faef4ea31f

Both workflows are generated against this PR's node schemas; the image workflow was executed end-to-end at this PR's head.

Models

https://huggingface.co/Comfy-Org/SeedVR2

Repackaged model files with conditioning embedded so only one model file is needed.

Prior art

Tracking

Linear: https://linear.app/comfyorg/issue/CORE-6/support-seedvr2

Validation

  • SeedVR2 unit suite: 45 passed. Full tests-unit/comfy_test + tests-unit/comfy_extras_test: 153 passed.
  • Local SeedVR2 image and video workflows completed successfully on 3B fp16, 7B fp16, and 7B sharp fp8.
  • Image and video workflows at this PR's head are pixel-identical across two independent runs (two image, two video, four model variants).

Copilot AI review requested due to automatic review settings June 12, 2026 00:56

Copilot AI left a comment

Contributor

Pull request overview

This pull request integrates SeedVR2 as a first-class model family in ComfyUI, including model detection/configuration, native latent/conditioning handling, SeedVR2-specific VAE wrapper + tiling, and extra workflow nodes (preprocess/conditioning/progressive sampling/post-processing), with accompanying unit/regression coverage.

Changes:

  • Adds SeedVR2 model detection, config, latent format, dtype policy, and model base wiring.
  • Introduces SeedVR2 native VAE implementation/wrapper (including tiling + temporal/memory-state behavior) and SeedVR2 variable-length attention backend.
  • Adds SeedVR2 extra nodes (preprocess/conditioning/progressive sampler/post-processing) and a broad unit/regression test suite.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| comfy/model_detection.py | Adds SeedVR2 UNet config detection heuristics (3B/7B layouts). |
| comfy/supported_models.py | Registers SeedVR2 supported model + dtype policy hook. |
| comfy/supported_models_base.py | Extends set_inference_dtype API to optionally accept a device. |
| comfy/model_base.py | Adds SeedVR2 BaseModel wrapper and passes SeedVR2 conditioning through extra_conds. |
| comfy/latent_formats.py | Introduces SeedVR2 latent format (16-channel latent). |
| comfy/sd.py | Adds SeedVR2 VAE detection/wiring and “owned tiling” hooks for VAEs that handle their own tiling. |
| nodes.py | Registers nodes_seedvr.py in built-in extra nodes loading. |
| comfy/ldm/seedvr/constants.py | SeedVR2/ByteDance/published-standards constants used across the integration. |
| comfy/ldm/seedvr/model.py | Implements SeedVR2 NaDiT model and related utilities (windowing, RoPE, attention plumbing). |
| comfy/ldm/seedvr/attention.py | Adds SeedVR2 split-loop var-attention backend. |
| comfy/ldm/seedvr/vae.py | Implements SeedVR2 VAE + wrapper, causal memory-state handling, and tiled encode/decode. |
| comfy/ldm/seedvr/color_fix.py | Adds LAB/wavelet/AdaIN post-processing color transfer utilities. |
| comfy_extras/nodes_seedvr.py | Adds SeedVR2 preprocess, conditioning, progressive sampler, and post-processing nodes. |
| tests-unit/comfy_test/model_detection_test.py | Adds SeedVR2 model detection coverage. |
| tests-unit/comfy_test/test_seedvr2_model.py | Consolidated regression tests for model/rope/latent format and VAE graph boundaries. |
| tests-unit/comfy_test/test_seedvr2_dtype.py | Tests SeedVR2 dtype policy + conditioning behavior. |
| tests-unit/comfy_test/test_seedvr2_internals.py | Tests GroupNorm limit gating and SeedVR2 optimized var-attention contract. |
| tests-unit/comfy_test/test_seedvr2_vae_decode.py | Tests SeedVR2 VAE wrapper decode shape/rank contracts. |
| tests-unit/comfy_test/test_seedvr2_vae_tiled.py | Tests SeedVR2 VAE tiling/dispatch behavior (encode/decode fallback routing). |
| tests-unit/comfy_test/test_seedvr_progressive_sampler.py | Tests progressive sampler schema + chunking/validation behavior. |
| tests-unit/comfy_test/seedvr_vae_forward_test.py | Regression test for SeedVR VAE forward contract (no diffusers-style attributes). |
| tests-unit/comfy_extras_test/test_seedvr2_nodes.py | Verifies node schema/execute signature alignment for SeedVR2 nodes. |
| tests-unit/comfy_extras_test/test_seedvr2_conditioning.py | Tests SeedVR2 conditioning node output determinism + fail-loud behavior. |
| tests-unit/comfy_extras_test/test_seedvr2_post_processing.py | Tests post-processing schema and error messaging. |

Comment thread comfy/ldm/seedvr/vae.py
Comment on lines +136 to +139
if encode:
    out = model.encode(t_chunk)[0]
else:
    out = model.decode_(t_chunk)

coderabbitai Bot commented Jun 12, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4893c8f1-80c3-4f76-ada4-c53f2a4b9d94

📥 Commits

Reviewing files that changed from the base of the PR and between cfb9c31 and ad04a61.

📒 Files selected for processing (5)
  • comfy/model_base.py
  • comfy/model_detection.py
  • comfy/sd.py
  • comfy/supported_models.py
  • nodes.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • nodes.py
  • comfy/model_detection.py
  • comfy/supported_models.py
  • comfy/sd.py

Walkthrough

This PR introduces SeedVR2, a complete video diffusion model architecture for ComfyUI. It adds:

  • a multimodal transformer (NaDiT) supporting video and text conditioning, with windowed attention and rotary embeddings
  • a 3D causal VAE for video encoding/decoding, with temporal memory management and overlap-aware tiling
  • color correction utilities using wavelet decomposition and CIELAB histogram matching
  • checkpoint detection for multiple model variants (7B separate/shared, 3B)
  • Comfy workflow nodes for preprocessing, conditioning resolution, postprocessing with color correction modes, and progressive temporal chunk sampling with VRAM-aware auto-sizing
  • comprehensive regression tests validating the model pipeline and individual components

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: adding SeedVR2 support, with the linear ticket reference providing proper context. |
| Description check | ✅ Passed | The description comprehensively documents the changes, validation results, examples, and references, directly relating to the SeedVR2 support implementation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy_extras/nodes_seedvr.py`:
- Around line 284-286: The code currently crops odd-sized tensors by computing
h2/w2 and slicing output (variables h2, w2, output) which discards the last
row/column; instead preserve the original height/width (use h =
output.shape[-3], w = output.shape[-2]) and stop slicing away pixels. If
downstream code requires even dimensions, change the behavior to pad to the next
even size (compute pad_h = 0 if h%2==0 else 1, pad_w = 0 if w%2==0 else 1) and
apply symmetric zero/transparent padding to output (and the alpha plane) rather
than trimming; track the original h/w so you can remove padding only if and when
needed, ensuring alignment logic and alpha restore operate on the original
resized canvas rather than a cropped one.

In `@comfy/ldm/modules/diffusionmodules/model.py`:
- Around line 35-37: Guard the new downscale_freq_shift before it's used to
avoid division by zero/negative scale: check that downscale_freq_shift is an
integer >= 0 and strictly less than half_dim (where half_dim = embedding_dim //
2) so that (half_dim - downscale_freq_shift) > 0; if the check fails, raise a
clear ValueError explaining that downscale_freq_shift must be in [0,
half_dim-1]. Apply this validation immediately before the existing emb
computation that uses emb = math.log(10000) / (half_dim - downscale_freq_shift)
so the function (timestep embedding helper) fails fast instead of producing
invalid tensors.

In `@comfy/ldm/seedvr/color_fix.py`:
- Around line 36-45: The function wavelet_decomposition can return an unbound
low_freq when levels==0; to fix, initialize low_freq before the loop (e.g., set
low_freq = image or low_freq = image.clone()) so low_freq is always defined;
keep high_freq as torch.zeros_like(image) and then run the existing loop that
uses wavelet_blur, high_freq.add_(...), and image = low_freq—this ensures the
final return (high_freq, low_freq) is safe even for levels=0.

In `@comfy/ldm/seedvr/model.py`:
- Around line 1274-1275: The window_method construction uses num_layers // 2 *
["720pwin_by_size_bysize","720pswin_by_size_bysize"] which yields one element
short for odd num_layers and causes indexing errors; update window_method so it
produces exactly num_layers entries (e.g., repeat the two-item pattern and
append the first item when num_layers is odd) and replace the same pattern
wherever window_method is built in this file (the other occurrence noted in the
review). Ensure the rest of the constructor variables (txt_dim, vid_dim) remain
unchanged.

In `@comfy/ldm/seedvr/vae.py`:
- Around line 255-264: The process-global _NORM_LIMIT and setter
set_norm_limit() cause cross-request interference; change the design so memory
limits are instance-scoped: remove reliance on the global _NORM_LIMIT in
GroupNorm-related logic and add a per-wrapper/per-model attribute (e.g., an
instance field on VideoAutoencoderKLWrapper and any GroupNorm-wrapping classes)
that set_memory_limit() updates, then flow that instance-specific limit into the
chunking/forward logic instead of calling set_norm_limit(); ensure
GroupNorm/chunking functions accept the limit as an argument or read it from the
instance so concurrent VAEs don’t mutate a shared global.

In `@comfy/sd.py`:
- Around line 1235-1237: The generic post-format hook here (formatter =
getattr(self.first_stage_model, "comfy_format_encoded", None); samples =
formatter(samples)) must not turn SeedVR2 single-frame 4-D latents into 5-D
tensors; update the hook call or its result handling so that if
comfy_format_encoded (or any formatter) returns a 5-D tensor with a temporal
axis of length 1 (ndim == 5 and shape[2] == 1) and the model/instance expects
latent_dimensions == 2 (SeedVR2), you squeeze the temporal axis back to a 4-D
layout (B, C, H, W). Reference comfy_format_encoded /
VideoAutoencoderKLWrapper.comfy_format_encoded and VAE.encode()/encode_tiled()
to implement this check-and-squeeze immediately after samples =
formatter(samples) (or alternatively restrict formatters to value-only
transforms), so downstream code continues to receive 2-D/4-D SeedVR2 latents.

In `@tests-unit/comfy_test/seedvr_vae_forward_test.py`:
- Around line 23-28: The test module mutates the global CLI flag by setting
cli_args.cpu based on torch.cuda.is_available(), causing cross-module state
leakage; update the module to save the original value of cli_args.cpu before
changing it and restore it after the test module runs (use a module-level
teardown or an autouse pytest fixture) so other tests aren't
affected—specifically wrap the change to cli_args.cpu (the symbol referenced in
the import) with save/restore logic or perform the toggle inside a fixture that
yields and resets the original value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1bf2b866-576d-4b0b-b629-992a88010ac3

📥 Commits

Reviewing files that changed from the base of the PR and between 10d466b and 019f24f.

📒 Files selected for processing (25)
  • comfy/latent_formats.py
  • comfy/ldm/modules/diffusionmodules/model.py
  • comfy/ldm/seedvr/attention.py
  • comfy/ldm/seedvr/color_fix.py
  • comfy/ldm/seedvr/constants.py
  • comfy/ldm/seedvr/model.py
  • comfy/ldm/seedvr/vae.py
  • comfy/model_base.py
  • comfy/model_detection.py
  • comfy/sd.py
  • comfy/supported_models.py
  • comfy/supported_models_base.py
  • comfy_extras/nodes_seedvr.py
  • nodes.py
  • tests-unit/comfy_extras_test/test_seedvr2_conditioning.py
  • tests-unit/comfy_extras_test/test_seedvr2_nodes.py
  • tests-unit/comfy_extras_test/test_seedvr2_post_processing.py
  • tests-unit/comfy_test/model_detection_test.py
  • tests-unit/comfy_test/seedvr_vae_forward_test.py
  • tests-unit/comfy_test/test_seedvr2_dtype.py
  • tests-unit/comfy_test/test_seedvr2_internals.py
  • tests-unit/comfy_test/test_seedvr2_model.py
  • tests-unit/comfy_test/test_seedvr2_vae_decode.py
  • tests-unit/comfy_test/test_seedvr2_vae_tiled.py
  • tests-unit/comfy_test/test_seedvr_progressive_sampler.py

Comment on lines +284 to +286
h2 = output.shape[-3] - (output.shape[-3] % 2)
w2 = output.shape[-2] - (output.shape[-2] % 2)
output = output[:, :, :h2, :w2, :]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't trim odd-sized outputs here.

Lines 284-286 unconditionally drop the last row/column from odd-sized results, so post-processing no longer preserves the original resized dimensions. That breaks the node’s stated alignment behavior and also shifts/restores alpha onto a smaller canvas for odd-width or odd-height inputs.

Proposed fix
-        h2 = output.shape[-3] - (output.shape[-3] % 2)
-        w2 = output.shape[-2] - (output.shape[-2] % 2)
-        output = output[:, :, :h2, :w2, :]
         if decoded_was_4d:
             output = output.reshape(-1, output.shape[-3], output.shape[-2], output.shape[-1])
         return io.NodeOutput(output)
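
If a downstream consumer does require even dimensions, padding up is the alternative to trimming. The sketch below only compares the resulting sizes; the helper names are hypothetical and the actual padding of the tensor and its alpha plane is left out.

```python
from typing import Tuple


def cropped_size(height: int, width: int) -> Tuple[int, int]:
    """Size produced by the trimming lines under review (drops a row/column when odd)."""
    return height - height % 2, width - width % 2


def even_pad_amounts(height: int, width: int) -> Tuple[int, int]:
    """Rows/columns to append so both dimensions become even (0 or 1 each)."""
    return height % 2, width % 2


def padded_size(height: int, width: int) -> Tuple[int, int]:
    """Size after padding to the next even value; no input pixel is discarded."""
    pad_h, pad_w = even_pad_amounts(height, width)
    return height + pad_h, width + pad_w


# Hypothetical odd-sized output: cropping loses a row and a column, padding keeps both.
print(cropped_size(1081, 1919), padded_size(1081, 1919))
```

Keeping the original height and width alongside the padded tensor would let a later step remove the padding again.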

Comment on lines 35 to 37
     half_dim = embedding_dim // 2
-    emb = math.log(10000) / (half_dim - 1)
+    emb = math.log(10000) / (half_dim - downscale_freq_shift)
     emb = torch.exp(torch.arange(half_dim, dtype=torch.float32) * -emb)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard the new frequency-shift parameter.

downscale_freq_shift now feeds the divisor directly. Values >= embedding_dim // 2 make the frequency scale undefined or inverted, so this helper can return broken timestep embeddings instead of failing fast.

Suggested guard
 def get_timestep_embedding(timesteps, embedding_dim, flip_sin_to_cos=False, downscale_freq_shift=1):
     assert len(timesteps.shape) == 1

     half_dim = embedding_dim // 2
+    if half_dim > 0 and downscale_freq_shift >= half_dim:
+        raise ValueError("downscale_freq_shift must be smaller than embedding_dim // 2")
     emb = math.log(10000) / (half_dim - downscale_freq_shift)
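
To make the failure mode concrete, here is a minimal stdlib sketch of the frequency computation with the guard applied. It is not the ComfyUI helper: `timestep_frequencies` is a hypothetical name, it returns a plain list instead of a tensor, and it uses the stricter range check from the review prompt (0 <= shift < half_dim) rather than the one-sided check in the diff.

```python
import math
from typing import List


def timestep_frequencies(embedding_dim: int, downscale_freq_shift: int = 1) -> List[float]:
    """Sinusoidal frequency table, failing fast on an invalid shift (illustrative only)."""
    half_dim = embedding_dim // 2
    # Assumed validation range: the divisor (half_dim - shift) must stay positive.
    if not 0 <= downscale_freq_shift < half_dim:
        raise ValueError(
            f"downscale_freq_shift must be in [0, {half_dim - 1}], got {downscale_freq_shift}"
        )
    scale = math.log(10000) / (half_dim - downscale_freq_shift)
    return [math.exp(-scale * i) for i in range(half_dim)]
```

With the default shift of 1 the last frequency is 1/10000; with a shift equal to `half_dim` the unguarded version would divide by zero.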

Comment on lines +36 to +45
def wavelet_decomposition(image: Tensor, levels: int = WAVELET_DECOMP_LEVELS):
    high_freq = torch.zeros_like(image)

    for i in range(levels):
        radius = 2 ** i
        low_freq = wavelet_blur(image, radius)
        high_freq.add_(image).sub_(low_freq)
        image = low_freq

    return high_freq, low_freq

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

low_freq undefined when levels=0.

If levels=0 is passed, the loop body never executes and low_freq remains unbound, causing UnboundLocalError at the return statement. While the default is 5 and this is an internal function, a defensive initialization ensures robustness.

Proposed fix
 def wavelet_decomposition(image: Tensor, levels: int = WAVELET_DECOMP_LEVELS):
     high_freq = torch.zeros_like(image)
+    low_freq = image  # fallback when levels=0
 
     for i in range(levels):
         radius = 2 ** i
         low_freq = wavelet_blur(image, radius)
         high_freq.add_(image).sub_(low_freq)
         image = low_freq
 
     return high_freq, low_freq
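
The loop has a useful invariant that also explains why `low_freq = image` is the natural fallback: each level adds `image - low_freq` to `high_freq`, so the sum telescopes and `high_freq + low_freq` always equals the input. The 1-D stdlib sketch below illustrates this; `dilated_blur` is a hypothetical stand-in for `wavelet_blur`, not the kernel used in the PR.

```python
from typing import List, Tuple


def dilated_blur(signal: List[float], radius: int) -> List[float]:
    """Hypothetical stand-in for wavelet_blur: 3-tap mean, taps `radius` apart, edges clamped."""
    n = len(signal)
    return [
        (signal[max(i - radius, 0)] + signal[i] + signal[min(i + radius, n - 1)]) / 3.0
        for i in range(n)
    ]


def wavelet_decomposition(signal: List[float], levels: int) -> Tuple[List[float], List[float]]:
    high = [0.0] * len(signal)
    low = list(signal)  # defined even when levels == 0
    current = list(signal)
    for i in range(levels):
        low = dilated_blur(current, 2 ** i)
        # Accumulate the detail removed at this level.
        high = [h + c - l for h, c, l in zip(high, current, low)]
        current = low
    return high, low
```

With `levels=0` the sketch returns an all-zero high band and the unchanged input as the low band, which still satisfies the invariant.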

Comment thread comfy/ldm/seedvr/model.py
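
The `window_method` finding for this file in the review prompt above comes down to list arithmetic: `num_layers // 2 * [a, b]` yields one entry too few when `num_layers` is odd. A minimal sketch of one way to build exactly `num_layers` entries follows; `build_window_methods` is a hypothetical helper name, and only the two pattern strings are taken from the review text.

```python
from typing import List

# The two alternating window modes quoted in the review finding.
WINDOW_PATTERN = ("720pwin_by_size_bysize", "720pswin_by_size_bysize")


def build_window_methods(num_layers: int) -> List[str]:
    """Alternate the two-item pattern so the list has exactly num_layers entries."""
    return [WINDOW_PATTERN[i % 2] for i in range(num_layers)]


# The original construction, for comparison: one entry short for odd layer counts.
short = 5 // 2 * list(WINDOW_PATTERN)
print(len(short), len(build_window_methods(5)))
```

For even layer counts the two constructions agree, so released checkpoints with an even `num_layers` would be unaffected.
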
Comment thread comfy/ldm/seedvr/vae.py
Comment thread comfy/ldm/seedvr/vae.py
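
One of the vae.py findings in the review prompt above asks for the GroupNorm memory limit to be instance-scoped rather than a process-global. The sketch below shows the general shape of that design with stdlib classes only. `ChunkedNorm`, `VAEWrapper`, and `plan_chunks` are hypothetical names, and the element-count chunking is an assumption, not the PR's chunking logic.

```python
from typing import Callable, List, Optional


class ChunkedNorm:
    """Hypothetical stand-in for a norm layer that splits work above a limit."""

    def __init__(self, limit_provider: Callable[[], Optional[int]]) -> None:
        # The limit is read from the owning wrapper on each call, not from a global.
        self._limit_provider = limit_provider

    def plan_chunks(self, num_elements: int) -> List[int]:
        limit = self._limit_provider()
        if limit is None or num_elements <= limit:
            return [num_elements]
        full, rest = divmod(num_elements, limit)
        return [limit] * full + ([rest] if rest else [])


class VAEWrapper:
    """Hypothetical wrapper that owns its own limit."""

    def __init__(self) -> None:
        self.norm_limit: Optional[int] = None
        self.norm = ChunkedNorm(lambda: self.norm_limit)

    def set_memory_limit(self, limit: Optional[int]) -> None:
        if limit is not None and limit <= 0:
            raise ValueError("limit must be positive or None")
        self.norm_limit = limit
```

Two wrappers built this way can hold different limits at the same time, which is the property a shared module-level variable cannot provide.
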
Comment thread comfy/sd.py
Comment on lines +23 to +28
from comfy.cli_args import args as cli_args

if not torch.cuda.is_available():
    cli_args.cpu = True

from comfy.ldm.seedvr.vae import VideoAutoencoderKL # noqa: E402

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid leaking args.cpu across test modules.

tests-unit/comfy_test/seedvr_vae_forward_test.py, tests-unit/comfy_test/test_seedvr2_internals.py, tests-unit/comfy_test/test_seedvr2_model.py, tests-unit/comfy_test/test_seedvr2_vae_decode.py, and tests-unit/comfy_test/test_seedvr2_vae_tiled.py all set the same process-global CLI flag during import and never restore it. The shared root cause is test-state leakage: once one module forces CPU mode, later modules in the same pytest worker can run under the wrong backend configuration. Save the previous value and restore it after each module (for example with module teardown or an autouse fixture).
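
The save-and-restore pattern can be sketched with the standard library alone. Here `override_flag` is a hypothetical helper and the `SimpleNamespace` object is a stand-in for `comfy.cli_args.args`; in a pytest suite, the built-in `monkeypatch.setattr` fixture provides the same restore-on-exit behavior.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Iterator


@contextmanager
def override_flag(namespace: Any, name: str, value: Any) -> Iterator[None]:
    """Temporarily set an attribute and restore the previous value afterwards."""
    previous = getattr(namespace, name)
    setattr(namespace, name, value)
    try:
        yield
    finally:
        # Runs even if the body raises, so later tests see the original value.
        setattr(namespace, name, previous)


cli_args = SimpleNamespace(cpu=False)  # stand-in for the real args object (assumption)
with override_flag(cli_args, "cpu", True):
    inside = cli_args.cpu
after = cli_args.cpu
```

Because the modules under review set the flag before an import, a real fix may also need to account for import-time effects, which this sketch does not address.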

@pollockjj pollockjj force-pushed the seedvr2-native-support-v5 branch from 019f24f to cfb9c31 Compare June 12, 2026 16:20

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy/sd.py`:
- Around line 1283-1297: The code is incorrectly truncating temporal frames by
slicing pixel_samples to maximum after converting through upscale/downscale;
instead compute any latent-aligned length needed (e.g. latent_max =
self.upscale_ratio[0](self.downscale_ratio[0](pixel_samples.shape[2]))) but do
NOT drop frames — pass the full pixel_samples to encode_tiled_3d and either pad
pixel_samples to latent_max or let encode_tiled_3d handle padding/alignment;
update the block around maximum/pixel_samples/encode_tiled_3d (symbols: maximum,
pixel_samples, encode_tiled_3d, encode_tiled) so temporal length is preserved
and tiling math uses padding/alignment rather than slicing.

In `@tests-unit/comfy_test/model_detection_test.py`:
- Around line 174-184: Add an assertion in
test_seedvr2_3b_shared_mm_detection_config to lock the 3B-only vid_out_norm
flag: after calling detect_unet_config(sd, "") and validating unet_config
fields, assert that unet_config["vid_out_norm"] is True so the 3B-specific
runtime knob (vid_out_norm) is covered; locate the test function
test_seedvr2_3b_shared_mm_detection_config and the detect_unet_config result to
add this check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d80d0a2-2f48-4281-a04a-3b6e7289df8a

📥 Commits

Reviewing files that changed from the base of the PR and between 019f24f and cfb9c31.

📒 Files selected for processing (25)
  • comfy/latent_formats.py
  • comfy/ldm/modules/diffusionmodules/model.py
  • comfy/ldm/seedvr/attention.py
  • comfy/ldm/seedvr/color_fix.py
  • comfy/ldm/seedvr/constants.py
  • comfy/ldm/seedvr/model.py
  • comfy/ldm/seedvr/vae.py
  • comfy/model_base.py
  • comfy/model_detection.py
  • comfy/sd.py
  • comfy/supported_models.py
  • comfy/supported_models_base.py
  • comfy_extras/nodes_seedvr.py
  • nodes.py
  • tests-unit/comfy_extras_test/test_seedvr2_conditioning.py
  • tests-unit/comfy_extras_test/test_seedvr2_nodes.py
  • tests-unit/comfy_extras_test/test_seedvr2_post_processing.py
  • tests-unit/comfy_test/model_detection_test.py
  • tests-unit/comfy_test/seedvr_vae_forward_test.py
  • tests-unit/comfy_test/test_seedvr2_dtype.py
  • tests-unit/comfy_test/test_seedvr2_internals.py
  • tests-unit/comfy_test/test_seedvr2_model.py
  • tests-unit/comfy_test/test_seedvr2_vae_decode.py
  • tests-unit/comfy_test/test_seedvr2_vae_tiled.py
  • tests-unit/comfy_test/test_seedvr_progressive_sampler.py
✅ Files skipped from review due to trivial changes (1)
  • comfy/latent_formats.py
🚧 Files skipped from review as they are similar to previous changes (22)
  • comfy/supported_models_base.py
  • nodes.py
  • comfy/ldm/modules/diffusionmodules/model.py
  • comfy/model_base.py
  • comfy/supported_models.py
  • tests-unit/comfy_test/test_seedvr2_internals.py
  • comfy/model_detection.py
  • tests-unit/comfy_test/test_seedvr2_model.py
  • comfy/ldm/seedvr/attention.py
  • tests-unit/comfy_extras_test/test_seedvr2_nodes.py
  • tests-unit/comfy_extras_test/test_seedvr2_conditioning.py
  • comfy/ldm/seedvr/color_fix.py
  • comfy/ldm/seedvr/constants.py
  • tests-unit/comfy_test/test_seedvr2_dtype.py
  • tests-unit/comfy_test/test_seedvr2_vae_decode.py
  • tests-unit/comfy_test/test_seedvr2_vae_tiled.py
  • tests-unit/comfy_extras_test/test_seedvr2_post_processing.py
  • comfy/ldm/seedvr/vae.py
  • tests-unit/comfy_test/seedvr_vae_forward_test.py
  • comfy_extras/nodes_seedvr.py
  • comfy/ldm/seedvr/model.py
  • tests-unit/comfy_test/test_seedvr_progressive_sampler.py

Comment thread comfy/sd.py
Comment on lines +174 to +184
    def test_seedvr2_3b_shared_mm_detection_config(self):
        sd = _make_seedvr2_3b_shared_mm_sd()
        unet_config = detect_unet_config(sd, "")

        assert unet_config is not None
        assert unet_config["image_model"] == "seedvr2"
        assert unet_config["vid_dim"] == 2560
        assert unet_config["heads"] == 20
        assert unet_config["num_layers"] == 32
        assert unet_config["mlp_type"] == "swiglu"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Lock down the 3B-only vid_out_norm flag here.

The 3B SeedVR2 branch in comfy/model_detection.py also sets vid_out_norm = True, but this test never asserts it. That leaves the one 3B-specific runtime knob free to regress while the detection test still passes.

Suggested assertion
     def test_seedvr2_3b_shared_mm_detection_config(self):
         sd = _make_seedvr2_3b_shared_mm_sd()
         unet_config = detect_unet_config(sd, "")

         assert unet_config is not None
         assert unet_config["image_model"] == "seedvr2"
         assert unet_config["vid_dim"] == 2560
         assert unet_config["heads"] == 20
         assert unet_config["num_layers"] == 32
         assert unet_config["mlp_type"] == "swiglu"
+        assert unet_config["vid_out_norm"] is True
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests-unit/comfy_test/model_detection_test.py` around lines 174 - 184, Add an
assertion in test_seedvr2_3b_shared_mm_detection_config to lock the 3B-only
vid_out_norm flag: after calling detect_unet_config(sd, "") and validating
unet_config fields, assert that unet_config["vid_out_norm"] is True so the
3B-specific runtime knob (vid_out_norm) is covered; locate the test function
test_seedvr2_3b_shared_mm_detection_config and the detect_unet_config result to
add this check.
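
To illustrate the gap described in this comment, here is a minimal standalone sketch. It does not call the real detect_unet_config; the dictionaries and the helper mismatched_keys are hypothetical stand-ins, with the expected values copied from the test above. It shows that a config which has lost vid_out_norm still satisfies the original assertions and is only caught once the extra key is checked.

```python
# Hypothetical stand-ins for the detection result; values mirror the test above.
BASE_EXPECTED = {
    "image_model": "seedvr2",
    "vid_dim": 2560,
    "heads": 20,
    "num_layers": 32,
    "mlp_type": "swiglu",
}
# The same expectations plus the 3B-only flag the review asks to lock down.
FULL_EXPECTED = {**BASE_EXPECTED, "vid_out_norm": True}


def mismatched_keys(config, expected):
    """Return the sorted keys whose value in config differs from expected."""
    return sorted(key for key, value in expected.items() if config.get(key) != value)


# Simulate a regression in which detection no longer sets vid_out_norm.
regressed = dict(BASE_EXPECTED)

print(mismatched_keys(regressed, BASE_EXPECTED))  # original assertions: nothing flagged
print(mismatched_keys(regressed, FULL_EXPECTED))  # with the extra assertion: flagged
```

The first call reports no mismatches, which is why the current test would keep passing; the second reports vid_out_norm.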

@10902940129041

Unable to load SeedVR2 3B in 16-bit, fp8 only (log)

[ERROR] !!! Exception during processing !!! expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
[ERROR] Traceback (most recent call last):
File "/ComfyUI/execution.py", line 536, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 336, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 310, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "/ComfyUI/execution.py", line 298, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "/ComfyUI/nodes.py", line 1586, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/nodes.py", line 1550, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/sample.py", line 74, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1444, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1334, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1316, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/patcher_extension.py", line 113, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1254, in outer_sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1229, in inner_sample
samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/patcher_extension.py", line 113, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 999, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/k_diffusion/sampling.py", line 205, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 639, in __call__
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1202, in __call__
return self.outer_predict_noise(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1209, in outer_predict_noise
).execute(x, timestep, model_options, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/patcher_extension.py", line 113, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 1212, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 619, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 210, in calc_cond_batch
return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 218, in _calc_cond_batch_outer
return executor.execute(model, conds, x_in, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/patcher_extension.py", line 113, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/samplers.py", line 334, in calc_cond_batch
output = model.apply_model(input_x, timestep, **c).chunk(batch_chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/model_base.py", line 189, in apply_model
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/patcher_extension.py", line 113, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/model_base.py", line 233, in _apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/ldm/seedvr/model.py", line 1462, in forward
vid, txt, vid_shape, txt_shape = block(
^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/ldm/seedvr/model.py", line 975, in forward
vid_attn, txt_attn = self.attn(vid_attn, txt_attn, vid_shape, txt_shape, cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/ldm/seedvr/model.py", line 735, in forward
vid_qkv, txt_qkv = self.proj_qkv(vid, txt)
^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/ldm/seedvr/model.py", line 620, in forward
vid = vid_module(vid, *get_args("vid", args), **get_kwargs("vid", kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy/ops.py", line 503, in forward
return super().forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/dist-packages/torch/nn/modules/linear.py", line 134, in forward
return F.linear(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half


tl;dr: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half

@pollockjj

Contributor Author

@10902940129041 thanks for the report.

Unable to load SeedVR2 3B in 16-bit, fp8 only (log)

tl;dr: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half

I tried to reproduce this on current PR head using the SeedVR2 3B fp16 checkpoint with the PR image workflow, but I could not reproduce the dtype error. I also tested the progressive sampler path, and that completed successfully as well.

Your traceback shows native KSampler with Euler reaching a direct fp16 Linear, where the input is still float32. In my local run, that tensor is cast before SeedVR2 forward, so there is likely a difference in the workflow or runtime path that I am not covering yet.

Could you share:

  • the ComfyUI commit SHA shown by your install;
  • the workflow JSON that reproduces the crash;
  • the startup log header with the Python/Torch/CUDA/GPU lines.

That should be enough for me to compare your run against the reproduction here and narrow down the dtype mismatch.
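
For anyone gathering the details requested above, the following is a rough sketch of a helper script, not part of this PR. The function names (git_commit, torch_info, collect_report) are hypothetical, and it assumes it is run from the ComfyUI checkout with git on the PATH. It only reads version information and does not load any model; the workflow JSON still has to be exported separately.

```python
import platform
import subprocess
import sys


def git_commit(repo_dir="."):
    """Return the checked-out commit SHA, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def torch_info():
    """Return Torch, CUDA, and GPU details; all None if torch cannot be imported."""
    try:
        import torch
    except Exception:  # any import failure is treated as "torch unavailable"
        return {"torch": None, "cuda": None, "gpu": None}
    has_gpu = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "gpu": torch.cuda.get_device_name(0) if has_gpu else None,
    }


def collect_report(repo_dir="."):
    """Combine the commit SHA with the Python/Torch/CUDA/GPU details."""
    report = {
        "commit": git_commit(repo_dir),
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    report.update(torch_info())
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}", file=sys.stdout)
```

Running it from the ComfyUI directory prints one line per field, which can be pasted into a reply together with the workflow JSON.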

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy/model_base.py`:
- Around line 1760-1769: The code does not guard against empty reference_latents
lists before attempting to concatenate. When reference_latents is an empty list
(which is not None), the ordered list becomes empty, the loop does not execute,
and torch.cat(stacked, dim=2) will throw an error. Add a check to verify that
reference_latents has at least one element after the initial None check, and
only proceed with processing the latents and building the stacked concatenation
if the list is non-empty. This guard should be placed immediately after the
reference_latents is not None check and before the ordered list is constructed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4893c8f1-80c3-4f76-ada4-c53f2a4b9d94

📥 Commits

Reviewing files that changed from the base of the PR and between cfb9c31 and ad04a61.

📒 Files selected for processing (5)
  • comfy/model_base.py
  • comfy/model_detection.py
  • comfy/sd.py
  • comfy/supported_models.py
  • nodes.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • nodes.py
  • comfy/model_detection.py
  • comfy/supported_models.py
  • comfy/sd.py

@coderabbitai coderabbitai Bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🛑 Comments failed to post (1)
comfy/model_base.py (1)

1760-1769: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard empty reference_latents before concatenation.

If reference_latents is an empty list, stacked stays empty and torch.cat(stacked, dim=2) throws at runtime. Please add a non-empty check before building reference_latent.

Suggested fix
         reference_latents = kwargs.get("reference_latents", None)
-        if reference_latents is not None:
+        if reference_latents is not None and len(reference_latents) > 0:
             # SCAIL-2 multi-reference: reference_latents[0] is the primary ref, [1:] are additional
             # references. Stack as [additional..., primary] so the primary stays adjacent to the video.
             ordered = list(reference_latents[1:]) + list(reference_latents[:1])
             stacked = []
             for lat in ordered:
                 lat = self.process_latent_in(lat)
                 stacked.append(torch.cat([lat, torch.ones_like(lat[:, :4])], dim=1))
             out['reference_latent'] = comfy.conds.CONDRegular(torch.cat(stacked, dim=2))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        reference_latents = kwargs.get("reference_latents", None)
        if reference_latents is not None and len(reference_latents) > 0:
            # SCAIL-2 multi-reference: reference_latents[0] is the primary ref, [1:] are additional
            # references. Stack as [additional..., primary] so the primary stays adjacent to the video.
            ordered = list(reference_latents[1:]) + list(reference_latents[:1])
            stacked = []
            for lat in ordered:
                lat = self.process_latent_in(lat)
                stacked.append(torch.cat([lat, torch.ones_like(lat[:, :4])], dim=1))
            out['reference_latent'] = comfy.conds.CONDRegular(torch.cat(stacked, dim=2))
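
The ordering and the empty-list case can be checked without tensors. The sketch below is illustrative only: order_references is a hypothetical helper, and plain strings stand in for the latents. It mirrors the [additional..., primary] ordering from the snippet above and shows that an empty list, which passes an "is not None" check, leaves nothing to concatenate.

```python
def order_references(reference_latents):
    """Return [additional..., primary], or None when there is nothing to stack.

    Mirrors the guarded condition from the suggested fix: both None and an
    empty list are treated as "no reference latents".
    """
    if reference_latents is None or len(reference_latents) == 0:
        return None
    # Primary reference (index 0) goes last so it stays adjacent to the video.
    return list(reference_latents[1:]) + list(reference_latents[:1])


print(order_references(["primary", "extra_a", "extra_b"]))
print(order_references(["primary"]))
print(order_references([]))

# Without the length check, an empty list would pass "is not None" and the
# ordered list would be empty, leaving torch.cat with no tensors to join.
unguarded = list([][1:]) + list([][:1])
print(unguarded)
```

With the guard in place, the empty case returns early instead of reaching the concatenation.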

Source: Coding guidelines
