Gate static pin_memory on the host-RAM budget (fixes AMD/ROCm large-model load stall, #13730) by liminfei-amd · Pull Request #14525 · Comfy-Org/ComfyUI

liminfei-amd · 2026-06-17T14:09:33Z

Overview

Fixes #13730. On AMD/ROCm a clean launch stalls at "Requested to load LTXAV" while system RAM fills and
spills to swap, even though VRAM sits at ~65%. The cause is host-side pinned-memory exhaustion, not VRAM pressure.

Root cause

partially_load() pins every offloaded weight via the static path pin_memory, which:

- ignores ensure_pin_registerable()'s result and unconditionally calls cudaHostRegister, up to
  MAX_PINNED_MEMORY = 0.90 * RAM (27 GiB on a 30 GiB box);
- produces pins that free_registrations can only reclaim from is_dynamic() models; dynamic VRAM
  is off by default on AMD, so they're never reclaimable.

Page-locked RAM isn't swappable, so the loader exhausts RAM and thrashes. The dynamic path
(pinned_memory.py) already guards this with ensure_pin_budget / ensure_pin_registerable; only the
static path was missing it. (--high-ram bypasses the budget, so it doesn't help here.)
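To make the numbers above concrete, here is a small arithmetic sketch of what a 0.90 cap leaves on a 30 GiB host. The helper names are hypothetical and are not part of ComfyUI; only the 0.90 factor and the 30 GiB figure come from this PR.

```python
GIB = 1024 ** 3

def pinned_ceiling_bytes(total_ram_bytes, fraction=0.90):
    """Upper bound on page-locked bytes for a given cap fraction."""
    return round(total_ram_bytes * fraction)

def unpinned_headroom_bytes(total_ram_bytes, fraction=0.90):
    """RAM left for everything that must stay pageable once the cap is reached."""
    return total_ram_bytes - pinned_ceiling_bytes(total_ram_bytes, fraction)

total = 30 * GIB
print(pinned_ceiling_bytes(total) / GIB)     # 27.0
print(unpinned_headroom_bytes(total) / GIB)  # 3.0
```

With the cap reached, roughly 3 GiB remains for the OS, the Python process and any pageable weights, which is consistent with the swap thrash described above.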

Change (+8 / −1, `comfy/model_management.py`)

Gate the static path on the host-RAM budget, like the dynamic path already does; skip pinning when it
can't be met (the weight stays in pageable RAM — correct, just not page-locked).

-    ensure_pin_registerable(size)
+    if not ensure_pin_budget(size) or not ensure_pin_registerable(size):
+        return False

Unchanged when RAM is ample or under --high-ram.
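For reviewers who want the control flow in isolation, the following is a toy model of the gate, not the ComfyUI code. HostPinState, within_budget and registerable are hypothetical stand-ins, and the conditions they test are assumptions; this PR does not show what ensure_pin_budget and ensure_pin_registerable actually check.

```python
from dataclasses import dataclass

@dataclass
class HostPinState:
    """Toy stand-in for host-memory bookkeeping (hypothetical)."""
    available_bytes: int        # RAM currently available on the host
    max_pinned_bytes: int       # overall cap on page-locked bytes
    reserve_bytes: int = 0      # RAM to keep free for the OS and other processes
    pinned_bytes: int = 0       # bytes already page-locked

def within_budget(state, size):
    """Assumed budget check: pinning must leave the reserve untouched."""
    return state.available_bytes - size >= state.reserve_bytes

def registerable(state, size):
    """Assumed cap check: total pinned bytes must stay under the cap."""
    return state.pinned_bytes + size <= state.max_pinned_bytes

def pin_memory(state, size):
    """Pin only when both checks pass; otherwise the weight stays pageable."""
    if not within_budget(state, size) or not registerable(state, size):
        return False
    state.pinned_bytes += size
    state.available_bytes -= size   # toy accounting, not how the kernel reports it
    return True
```

The shape mirrors the diff: a declined pin is not an error, it only means the weight is served from pageable RAM.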

Validation — real server + `/prompt`, single-variable A/B

RX 7900 GRE 16 GiB (gfx1100), 30 GiB RAM, ROCm 7.2. Real ltx-2.3-22b-dev-fp8 + gemma_3_12B via a
real workflow POSTed to a clean-launch server's /prompt. Watchdog kills the server if RAM drops below
~1 GiB. Only the 8-line gate differs.

Without — min MemAvailable 0.7 GiB, swap 1.8 GiB, killed at 102s; never finished (the stall).

With — load completes past the stall point:

Requested to load LTXAV
loaded partially; 14685.35 MB loaded, 9151.30 MB offloaded
Prompt executed in 91.94 seconds
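The "loaded partially" line is the quickest way to see how much weight data ends up host-side and is therefore a candidate for pinning. A minimal sketch for pulling the split out of server.log follows; the regular expression is written against the single log line quoted above and is an assumption about the format in general.

```python
import re

# Matches the log line quoted in this PR; other variants are not handled.
LOAD_RE = re.compile(r"loaded partially; ([\d.]+) MB loaded, ([\d.]+) MB offloaded")

def offload_split(line):
    """Return (loaded_mb, offloaded_mb, offloaded_fraction), or None if no match."""
    m = LOAD_RE.search(line)
    if m is None:
        return None
    loaded, offloaded = float(m.group(1)), float(m.group(2))
    return loaded, offloaded, offloaded / (loaded + offloaded)

print(offload_split("loaded partially; 14685.35 MB loaded, 9151.30 MB offloaded"))
```

For the run above this gives an offloaded share of about 38% of the model.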

Reproduce it yourself (real workflow + driver)

Put ltx-2.3-22b-dev-fp8.safetensors in models/diffusion_models/ and
gemma_3_12B_it_fp4_mixed.safetensors in models/text_encoders/.

ltx_workflow.json (the real graph POSTed to /prompt):

{
  "1": {"class_type": "CLIPLoader", "inputs": {"clip_name": "gemma_3_12B_it_fp4_mixed.safetensors", "type": "ltxv"}},
  "2": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["1", 0], "text": "a cat walking in a garden, cinematic"}},
  "3": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["1", 0], "text": "blurry, low quality"}},
  "4": {"class_type": "UNETLoader", "inputs": {"unet_name": "ltx-2.3-22b-dev-fp8.safetensors", "weight_dtype": "default"}},
  "5": {"class_type": "EmptyLTXVLatentVideo", "inputs": {"width": 768, "height": 512, "length": 97, "batch_size": 1}},
  "6": {"class_type": "KSampler", "inputs": {"model": ["4", 0], "seed": 0, "steps": 1, "cfg": 3.0, "sampler_name": "euler", "scheduler": "normal", "positive": ["2", 0], "negative": ["3", 0], "latent_image": ["5", 0], "denoise": 1.0}},
  "7": {"class_type": "SaveLatent", "inputs": {"samples": ["6", 0], "filename_prefix": "ltx13730"}}
}

Driver — boots a clean-launch server, POSTs the workflow, and SIGKILLs if RAM drops below ~1 GiB so the
box can't freeze:

import subprocess, time, json, os, signal, sys, urllib.request, psutil
FLOOR, BOOT_TIMEOUT, RUN_TIMEOUT, PORT = 1.0, 180, 300, 8188
env = dict(os.environ, HIP_VISIBLE_DEVICES="0")
log = open("server.log", "w")
# clean launch = AMD default pinned ON / dynamic OFF
srv = subprocess.Popen([sys.executable, "main.py", "--listen", "127.0.0.1",
                        "--port", str(PORT), "--disable-auto-launch"],
                       env=env, stdout=log, stderr=subprocess.STDOUT)
mem = lambda: psutil.virtual_memory().available / 1024**3
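# wait for the server to answer /system_stats
t0 = time.time()
while time.time() - t0 < BOOT_TIMEOUT:
    try:
        urllib.request.urlopen("http://127.0.0.1:%d/system_stats" % PORT, timeout=2); break
    except Exception:
        if srv.poll() is not None: sys.exit("server died at boot")
        time.sleep(1)
else:
    # loop ended without break: the server never came up, so don't POST to it
    srv.kill(); sys.exit("server did not come up within BOOT_TIMEOUT")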
# POST the real workflow
wf = json.load(open("ltx_workflow.json"))
req = urllib.request.Request("http://127.0.0.1:%d/prompt" % PORT,
                             data=json.dumps({"prompt": wf}).encode(),
                             headers={"Content-Type": "application/json"})
pid = json.load(urllib.request.urlopen(req, timeout=15))["prompt_id"]
# watchdog RAM floor + poll /history
t1, mn, outcome = time.time(), 99, "?"
while True:
    a = mem(); mn = min(mn, a)
    if a < FLOOR:
        srv.send_signal(signal.SIGKILL); outcome = "RAM_FLOOR_KILL"; break
    if time.time() - t1 > RUN_TIMEOUT:
        srv.send_signal(signal.SIGKILL); outcome = "STALL_TIMEOUT"; break
    try:
        h = json.load(urllib.request.urlopen("http://127.0.0.1:%d/history/%s" % (PORT, pid), timeout=3))
        if pid in h and h[pid]["status"].get("completed"):
            outcome = "PROMPT_COMPLETED"; break
    except Exception:
        pass
    time.sleep(0.5)
print("OUTCOME=%s min_memavail=%.1fG elapsed=%.0fs" % (outcome, mn, time.time() - t1))
srv.send_signal(signal.SIGKILL)

Run python driver.py on baseline → RAM_FLOOR_KILL; apply the patch and rerun → PROMPT_COMPLETED
with Prompt executed in ... in server.log. (The minimal latent triggers an unrelated
sampling-stage shape error after the load — the point is the load now passes.)
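If psutil is not installed, the watchdog's RAM reading can be taken from /proc/meminfo instead. This is a Linux-only sketch with hypothetical helper names; it reads the MemAvailable field, which is the figure quoted in the results above.

```python
def mem_available_gib(meminfo_text):
    """Return MemAvailable in GiB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])   # the field is reported in kB (KiB)
            return kib / 1024 ** 2
    return None

def read_mem_available_gib(path="/proc/meminfo"):
    """Read the live value from the kernel (Linux only)."""
    with open(path) as f:
        return mem_available_gib(f.read())
```

In the driver, `mem = read_mem_available_gib` would stand in for the psutil lambda, assuming the field is present on the test kernel.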

A question

This is a minimal budget gate. A deeper option is to make the static pins reclaimable under
pressure (as the dynamic path's free_registrations / _steal_pin are) so the loader releases pins on
demand instead of declining new ones. Happy to go that way instead/in addition if you prefer.
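To make the alternative concrete, here is a toy sketch of pins that are released under pressure rather than declined. It is not how free_registrations or _steal_pin work; the class, the oldest-first eviction order and the byte accounting are all assumptions for discussion.

```python
from collections import OrderedDict

class ReclaimablePins:
    """Toy registry: releases the oldest pins to make room for a new one (hypothetical)."""

    def __init__(self, max_pinned_bytes):
        self.max_pinned_bytes = max_pinned_bytes
        self.pinned_bytes = 0
        self._pins = OrderedDict()   # key -> size, oldest first

    def release_oldest(self):
        """Unpin the oldest entry and return its key."""
        key, size = self._pins.popitem(last=False)
        self.pinned_bytes -= size
        return key

    def pin(self, key, size):
        """Pin `key`, evicting older pins if needed; False if it can never fit."""
        if key in self._pins:
            self._pins.move_to_end(key)   # already pinned, just mark as recent
            return True
        if size > self.max_pinned_bytes:
            return False
        while self.pinned_bytes + size > self.max_pinned_bytes:
            self.release_oldest()
        self._pins[key] = size
        self.pinned_bytes += size
        return True

    def is_pinned(self, key):
        return key in self._pins
```

The trade-off against the budget gate is that eviction keeps the most recent weights pinned at the cost of extra register/unregister traffic, whereas the gate simply stops pinning once the budget is spent.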

Caveat: the test board is a GRE 16 GiB versus the reporter's XTX 24 GiB. Both are gfx1100/RDNA3 on
ROCm 7.2 with about the same amount of RAM; the smaller VRAM just triggers offload sooner.

AI usage disclosure: this change was prepared with AI assistance; a human reviewed and verified it and
can explain every line.

On AMD/ROCm a clean launch stalls at "Requested to load LTXAV" while system RAM fills and spills to swap, even though VRAM sits at ~65%. It is host-side pinned-memory exhaustion, not VRAM pressure. partially_load() pins every offloaded weight via the static path pin_memory, which ignores ensure_pin_registerable()'s result and unconditionally cudaHostRegisters up to MAX_PINNED_MEMORY (0.90*RAM on Linux). Those pins are only reclaimable from is_dynamic() models by free_registrations, and dynamic VRAM is off by default on AMD, so they are never reclaimable. Page-locked RAM is not swappable, so the loader exhausts RAM and thrashes. The dynamic-VRAM pin path (comfy/pinned_memory.py) already guards this with ensure_pin_budget/ensure_pin_registerable; only the static path was missing it. Gate pin_memory the same way and skip pinning when the budget cannot be met (the weight stays in pageable RAM, still correct, just not page-locked). Behavior is unchanged when RAM is ample and under --high-ram.

coderabbitai · 2026-06-17T14:13:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between a590d60 and 8d8e0c6.

📒 Files selected for processing (1)

comfy/model_management.py

📝 Walkthrough

In comfy/model_management.py, the pin_memory() function gains two early-exit checks before it attempts CUDA host registration. Calls to ensure_pin_budget and ensure_pin_registerable are inserted at the top of the function; if either returns a falsy value, pin_memory() immediately returns False and leaves the tensor in pageable RAM without ever reaching cudaHostRegister.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: gating pin_memory on host-RAM budget to fix AMD/ROCm model load stalls, with the specific issue reference. |
| Description check | ✅ Passed | The description thoroughly documents the root cause, the specific code change, validation methodology, and results on real hardware, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue `#13730` by implementing budget gates on the static pin_memory path to prevent host-RAM exhaustion during large model loading on AMD/ROCm systems. |
| Out of Scope Changes check | ✅ Passed | The changeset is narrowly scoped to the identified root cause: adding budget/registerable checks to the static pin_memory path in comfy/model_management.py, with no extraneous modifications. |


maxevilmind · 2026-06-21T13:44:28Z

hey, this actually makes quite a lot of sense, I will try to test on my setup when I have some free time

liminfei-amd requested review from Kosinkadink, alexisrolland, comfyanonymous, guill, kijai and rattus128 as code owners June 17, 2026 14:09

liminfei-amd mentioned this pull request Jun 18, 2026

LTX 2.3 FP8/Q4KM stalls during Requested to load LTXAV on RX 7900 XTX + ROCm unless dynamic VRAM / pinned memory / async offload are disabled #13730


liminfei-amd wants to merge 1 commit into Comfy-Org:master from liminfei-amd:amd-rocm/13730-pin-budget-gate
