
LTX Video 2.3 on RX 7900 XTX: 22B Audio-Video on ROCm via GGUF + the --disable-pinned-memory Load-Stall Fix

video · advanced · 24GB+ VRAM · Jun 17, 2026

This advanced recipe sets up LTX Video 2.3 on the RX 7900 XTX, needing about 24 GB of VRAM.

Prerequisites
  • AMD Radeon RX 7900 XTX (24 GB VRAM, RDNA3 / Navi 31 / gfx1100) or equivalent ROCm-supported card
  • Linux (Ubuntu 24.04 / 22.04 or RHEL) with the AMD ROCm stack installed (ROCm 7.2.x)
  • 64 GB system RAM strongly recommended (the Gemma 3 12B text encoder is offloaded to RAM, and ROCm's aggressive RAM offload needs the headroom)
  • Python 3.12+ and PyTorch built for ROCm (not CUDA)
  • ComfyUI installed (latest version) + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes
  • ~30 GB free disk for the Q6_K distilled transformer + connectors + Gemma encoder + dual VAEs
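
The RAM prerequisite can be checked before installing anything. The following is a minimal sketch, assuming a Linux host that exposes /proc/meminfo; the helper name mem_total_gib and the constant RECOMMENDED_RAM_GB are illustrative and are not part of ComfyUI or ROCm.

```python
import re
from pathlib import Path

RECOMMENDED_RAM_GB = 64  # taken from the prerequisites list above


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports kB (1024 bytes); two divisions by 1024 give GiB.
    return int(match.group(1)) / (1024 * 1024)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        total = mem_total_gib(meminfo.read_text())
        print(f"System RAM: {total:.1f} GiB (recommended: {RECOMMENDED_RAM_GB} GB)")
    else:
        print("No /proc/meminfo here; check RAM with your OS tools instead.")
```

A machine sold with 64 GB usually reports slightly less than 64 GiB here, because the kernel and firmware reserve part of it, so treat the number as a rough guide rather than a pass/fail gate.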

What You'll Build

Generate short, synchronized audio + video clips locally with LTX Video 2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 24 GB Radeon RX 7900 XTX (RDNA3, Navi 31, gfx1100) through the ROCm stack. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." This recipe runs a GGUF transformer with the heavy Gemma 3 12B text encoder offloaded to system RAM, and — critically — works around a documented ROCm load-stall that hangs LTX-2.3 on the 7900 XTX at the "Requested to load LTXAV" step.

Hardware data: RX 7900 XTX (24GB VRAM) · Q6_K distilled GGUF + CPU-offloaded Gemma 3 12B · ComfyUI on ROCm 7.2

⚠️ This is a ROCm recipe, not CUDA. The RX 7900 XTX runs on AMD's ROCm/HIP stack — there is no cu124/cu128 wheel, no xformers install, and no FP8/FP4 path here. RDNA3 has no FP8/FP4 hardware (its WMMA units accept FP16, BF16, INT8, INT4 only), so an FP8 LTX checkpoint would just upcast to FP16/BF16 with no memory saving and no compute acceleration. The attention path is PyTorch SDPA (ComfyUI's default), not FlashAttention-2 and not xformers. If a guide tells you to pip install xformers, build a flash-attn wheel, or pick a cu12x wheel for this card, it's written for the wrong vendor.

⚠️ At 24 GB, VRAM is not the binding constraint — ROCm memory management is. Unlike the 16 GB NVIDIA recipes for this model (which fight to fit), the 7900 XTX has plenty of room for a high-quality GGUF transformer. The real failure mode on this card is ROCm's pinned-memory / smart-memory / async-offload machinery stalling or OOM-ing during the large LTX weight load. An RX 7900 XTX owner running ROCm 7.2 reports LTX 2.3 "stalls during Requested to load LTXAV" and "always crashes with OOMs" on a clean launch, fixed only by disabling those features (ComfyUI Issue #13730). See Running for the exact launch flags.

Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT the older LTX-2 19B line (Lightricks/LTX-2) nor the LTX-Video 0.9.x family. The 7900 XTX load-stall is documented across both LTX-2 19B (Issue #11949) and LTX-2.3 (Issue #13730); the fix flags are identical because the root cause is ROCm's memory manager, not the model version.

ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement, not Apache-2.0 (the model card sets license: other with license_name: ltx-2-community-license-agreement). Read the license before any commercial use.

Requirements

| Component | Minimum | Tested |
|---|---|---|
| GPU | 24 GB VRAM, ROCm-supported AMD card | RX 7900 XTX (24 GB, RDNA3 / gfx1100) |
| RAM | 32 GB | 64 GB recommended (Gemma encoder + ROCm RAM offload) |
| Storage | ~30 GB | Q6_K transformer (17.77 GB) + connectors + Gemma encoder + dual VAEs |
| Driver | AMD ROCm 7.2.x on Linux | ROCm 7.2 (Issue #13730 reporter) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, PyTorch (ROCm build) |

The LTX-2.3 codebase "was tested with Python >=3.12 ... and supports PyTorch ~= 2.7" per the Lightricks/LTX-2.3 model card. The card's CUDA reference ("CUDA version >12.7") is for NVIDIA; on the 7900 XTX you install the ROCm PyTorch wheel instead (Step 2), and PyTorch's HIP runtime presents as the cuda device namespace.

Why not the full BF16 transformer on a 24 GB card? The distilled BF16 transformer is 42.04 GB on disk (unsloth/LTX-2.3-GGUF tree, ltx-2.3-22b-distilled-BF16.gguf = 42,035,412,000 bytes) — it does not fit resident in 24 GB. A 22B transformer in BF16 needs ~42 GB regardless of vendor, so even on AMD you run a high-bit GGUF quant. The difference from the 16 GB NVIDIA recipe is that at 24 GB you drop the tight Q4 squeeze and step up to a near-lossless Q6_K / Q8_0 quant for better quality.

Installation

1. Install ComfyUI

Per the ComfyUI README:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate

2. Install PyTorch for ROCm

The RX 7900 XTX (gfx1100) is an officially ROCm-supported GPU, so it uses the stable ROCm PyTorch wheel — not a CUDA wheel. Per the ComfyUI README "AMD GPUs (Linux)" section:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

ℹ️ Verify the ROCm tag before you copy it. As of this writing the ComfyUI README pins rocm7.2 as the stable wheel — but the rocmX.Y tag moves over time (6.3 → 6.4 → 7.x). Read the current line in the live ComfyUI README before running. Confirm you got the ROCm build: python -c "import torch; print(torch.__version__)" should print a +rocm7.2-style suffix, and torch.cuda.is_available() returns True (ROCm masquerades as the cuda device namespace under HIP).
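
The verification in the note above can also be run as a short script. This is a sketch under stated assumptions: is_rocm_build is a hypothetical helper, and it relies on the "+rocm" version suffix that the official wheels carry (a locally built PyTorch may not have it). torch.version.hip is a version string on ROCm builds and None on CUDA or CPU builds.

```python
def is_rocm_build(torch_version: str, hip_version) -> bool:
    """True when the version strings look like a ROCm (HIP) PyTorch wheel."""
    return "+rocm" in torch_version and hip_version is not None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        ok = is_rocm_build(torch.__version__, torch.version.hip)
        print("torch:", torch.__version__, "| hip:", torch.version.hip)
        # On a working ROCm install the GPU shows up through the cuda namespace.
        print("ROCm build:", ok, "| GPU visible:", torch.cuda.is_available())
```

If "ROCm build" prints False, the wheel most likely came from the default PyPI index instead of the ROCm index used in Step 2.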

3. Install ComfyUI dependencies and the LTX-Video, GGUF, and KJNodes custom nodes

# core deps
pip install -r requirements.txt

cd custom_nodes

# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

# city96's GGUF loader — required for the quantized transformer
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

# Kijai's KJNodes — used by the recommended offload workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
cd ..

The unsloth/LTX-2.3-GGUF model card lists city96/ComfyUI-GGUF and kijai/ComfyUI-KJNodes for the GGUF workflow; the Lightricks LTXVideo nodes provide the LTX-2.3 sampler and example workflows.

4. Download a GGUF transformer (lead: Q6_K distilled, 17.77 GB)

The distilled checkpoint runs at 8 steps with CFG 1.0. With 24 GB you are not memory-bound the way a 16 GB card is — pick a high-bit quant for near-lossless quality. Q6_K (17.77 GB) is the recommended lead: it leaves clear headroom for the connectors and VAE-decode activations. Q5_K_S (15.25 GB) is a lighter option, and Q8_0 (22.76 GB) is the maximum-quality quant if you keep everything else off the GPU (it fills most of the card).

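# The download paths in Steps 4-6 are relative to the directory that contains ComfyUI/.
# If you are still inside ComfyUI/ after Step 3, run `cd ..` first.
# Q6_K distilled (17.77 GB on disk): recommended lead for 24 GB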
huggingface-cli download unsloth/LTX-2.3-GGUF \
  distilled/ltx-2.3-22b-distilled-Q6_K.gguf \
  --local-dir ComfyUI/models/unet/

# OR Q8_0 distilled (22.76 GB) — max quality, fills most of the 24 GB card
# huggingface-cli download unsloth/LTX-2.3-GGUF \
#   distilled/ltx-2.3-22b-distilled-Q8_0.gguf \
#   --local-dir ComfyUI/models/unet/

Distilled-transformer GGUF file sizes (verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):

| Quant | unsloth distilled file size | Fit on 24 GB |
|---|---|---|
| Q4_K_S | 13.12 GB | comfortable (but lower quality than needed here) |
| Q5_K_S | 15.25 GB | comfortable |
| Q6_K | 17.77 GB | recommended lead; room for connectors + VAE |
| Q8_0 | 22.76 GB | tight; fills most of the card |
| BF16 | 42.04 GB | does NOT fit (out of scope at 24 GB) |
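
The "Fit on 24 GB" column is simple subtraction, sketched below. The sketch assumes the resident weight footprint roughly equals the on-disk file size and treats 24 GB as the whole budget, as this recipe does; it ignores connectors, VAE-decode activations, and allocator overhead, so the real headroom is smaller. The names headroom_gb and QUANT_SIZES_GB are illustrative.

```python
CARD_VRAM_GB = 24.0

# On-disk sizes in GB, copied from the distilled-quant table above.
QUANT_SIZES_GB = {
    "Q4_K_S": 13.12,
    "Q5_K_S": 15.25,
    "Q6_K": 17.77,
    "Q8_0": 22.76,
    "BF16": 42.04,
}


def headroom_gb(size_gb: float, vram_gb: float = CARD_VRAM_GB) -> float:
    """VRAM left after the weights, assuming resident size equals file size."""
    return round(vram_gb - size_gb, 2)


for name, size in QUANT_SIZES_GB.items():
    left = headroom_gb(size)
    verdict = "fits" if left > 0 else "does not fit"
    print(f"{name:7s} {size:6.2f} GB -> headroom {left:6.2f} GB ({verdict})")
```

Q6_K leaves about 6 GB under these assumptions, Q8_0 a little over 1 GB, and BF16 is roughly 18 GB over budget, which matches the recommendations in the table.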

5. Download the embeddings connectors and the audio + video VAE

The GGUF transformer needs its matching text-projection connectors and the audio + video VAE:

huggingface-cli download unsloth/LTX-2.3-GGUF \
  text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
  vae/ltx-2.3-22b-distilled_video_vae.safetensors \
  vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
  --local-dir ComfyUI/models/

(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — verified live via the unsloth/LTX-2.3-GGUF tree API.)

6. Download a quantized Gemma 3 12B text encoder

LTX-2.3 uses Gemma 3 12B as its text encoder. Even though the 7900 XTX has 24 GB, the cleanest setup keeps the encoder off the GPU and lets the transformer own VRAM, and the RAM-offload path (see Running) handles the encoder on CPU. Download a quantized encoder:

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  mmproj-BF16.gguf \
  --local-dir ComfyUI/models/text_encoders/

The encoder file is 7.43 GB and the mmproj 0.85 GB (verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API).
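
Before launching, it can help to confirm that every file from Steps 4-6 landed where ComfyUI will look for it. The following is a minimal sketch, not part of any of the tools above: check_downloads and MANIFEST are hypothetical names, the sizes are the decimal-GB figures quoted in this recipe, and the paths assume huggingface-cli keeps the repository sub-folders under --local-dir (so the transformer ends up under unet/distilled/).

```python
from pathlib import Path

# (path relative to ComfyUI/models, expected size in GB) from Steps 4-6.
MANIFEST = [
    ("unet/distilled/ltx-2.3-22b-distilled-Q6_K.gguf", 17.77),
    ("text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors", 2.31),
    ("vae/ltx-2.3-22b-distilled_video_vae.safetensors", 1.45),
    ("vae/ltx-2.3-22b-distilled_audio_vae.safetensors", 0.36),
    ("text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf", 7.43),
    ("text_encoders/mmproj-BF16.gguf", 0.85),
]


def check_downloads(models_dir, manifest=MANIFEST, tolerance=0.05, unit=1e9):
    """Return a list of problems: missing files, or sizes off by more than tolerance."""
    problems = []
    for rel_path, expected in manifest:
        path = Path(models_dir) / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual = path.stat().st_size / unit
        if abs(actual - expected) > tolerance * expected:
            problems.append(
                f"size mismatch: {rel_path} ({actual:.2f} vs {expected:.2f})"
            )
    return problems


if __name__ == "__main__":
    total = round(sum(size for _, size in MANIFEST), 2)
    print(f"Expected total download: {total} GB")
    for line in check_downloads("ComfyUI/models") or ["all files present"]:
        print(line)
```

The expected total comes to about 30 GB, in line with the storage requirement. A size mismatch usually points to an interrupted download; re-running the same huggingface-cli command should fetch the file again.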

Running

This is the part that breaks on the 7900 XTX if you launch ComfyUI clean. LTX-2.3 stalls at "Requested to load LTXAV" and then OOMs because ROCm's pinned-memory, smart-memory, and async-offload paths interact badly with the large LTX weight load. An RX 7900 XTX owner on ROCm 7.2 reports their only working configuration is to disable them (Issue #13730):

# Working launch on RX 7900 XTX / ROCm 7.2 (run from inside ComfyUI/): the load-stall fix
python main.py --disable-pinned-memory --disable-async-offload --disable-dynamic-vram

That reporter states plainly: "Without this config and a clean comfy launch it always crashes with OOMs." (Issue #13730).

The single most important flag is --disable-pinned-memory (ComfyUI cli_args.py: "Disable pinned memory use."). A second RX 7900 XTX owner running LTX-2 i2v confirms the same flag is load-bearing — without it the second generation crashes; with it a 1024×1024 run completes (they measured 114.22 s on the second run) (Issue #11949). AMD's own ComfyUI-on-Radeon install guide likewise documents: "If running on low-memory configs, try adding the --lowvram and --disable-pinned-memory parameters to the run command."

If the load-stall persists, add --disable-smart-memory (cli_args.py: "Force ComfyUI to agressively offload to regular ram instead of keeping models in vram when it can."), which forces the Gemma encoder and idle weights out to system RAM — this is why 64 GB RAM is recommended:

# If the stall persists, also force RAM offload
python main.py --disable-pinned-memory --disable-async-offload --disable-dynamic-vram --disable-smart-memory
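
The two launch variants above can be wrapped in a small helper so the fallback flag is only added when needed. This is a sketch, assuming it is run from inside the ComfyUI directory; build_launch_command is a hypothetical name, and the flags are exactly the ones quoted above.

```python
import shlex
import sys

# Flags from the working configuration reported in Issue #13730.
BASE_FLAGS = [
    "--disable-pinned-memory",
    "--disable-async-offload",
    "--disable-dynamic-vram",
]
# Added only if the load still stalls with the base flags.
FALLBACK_FLAG = "--disable-smart-memory"


def build_launch_command(force_ram_offload: bool = False) -> list[str]:
    """Build the ComfyUI argv; append the RAM-offload flag only as a fallback."""
    command = [sys.executable, "main.py", *BASE_FLAGS]
    if force_ram_offload:
        command.append(FALLBACK_FLAG)
    return command


if __name__ == "__main__":
    print(shlex.join(build_launch_command()))
    print(shlex.join(build_launch_command(force_ram_offload=True)))
    # To actually start ComfyUI instead of printing the command:
    #   import subprocess
    #   subprocess.run(build_launch_command(), check=True)
```

Using sys.executable keeps the launch inside the active virtual environment, so the ROCm build of PyTorch from Step 2 is the one that gets loaded.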

Once running, open the browser UI (default http://127.0.0.1:8188) and load one of the example workflows shipped by the Lightricks node:

ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/

Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 4. Wire the Gemma 3 12B encoder through the GGUF text-encoder loader. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.

Recommended distilled settings

| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width & height divisible by 32 | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | a multiple of 8, plus 1 (e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |

Start small (e.g. 512×512, 65 frames) to confirm the load clears the ROCm stall before scaling resolution.
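
The resolution and frame-count rules are easy to get wrong when scaling up, so a small validator helps. The sketch below reads "divisible by 8 + 1" as frames = 8k + 1, which matches the examples 65, 97, and 121; the function names is_valid_shape and snap_down are illustrative and do not come from ComfyUI or the Lightricks nodes.

```python
def is_valid_shape(width: int, height: int, frames: int) -> bool:
    """Check the model-card rules: sides divisible by 32, frames = 8k + 1."""
    return (
        width > 0
        and height > 0
        and frames > 0
        and width % 32 == 0
        and height % 32 == 0
        and (frames - 1) % 8 == 0
    )


def snap_down(width: int, height: int, frames: int) -> tuple[int, int, int]:
    """Round each value down to the nearest valid one (at least 32x32, 1 frame)."""
    snapped_width = max(32, (width // 32) * 32)
    snapped_height = max(32, (height // 32) * 32)
    snapped_frames = max(1, ((frames - 1) // 8) * 8 + 1)
    return snapped_width, snapped_height, snapped_frames


if __name__ == "__main__":
    print(is_valid_shape(512, 512, 65))
    print(snap_down(500, 720, 100))
```

Rounding down rather than to the nearest value keeps the request at or below what you asked for, which is the safer direction while you are still confirming the load clears the stall.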

Results

  • Speed: Omitted. No RX-7900-XTX LTX-2.3 22B benchmark at a fixed configuration has been verified, and /check/ltx-video-2-3/rx-7900-xtx currently has no benchmark data. The single 7900-XTX timing in the wild (1024×1024 in 114.22 s, Issue #11949) is for the older LTX-2 19B model, not LTX-2.3 22B, and is a one-off second-run number — not a stable benchmark, so it is not quoted as this recipe's speed. If you've measured LTX-2.3 timings on a 7900 XTX, please contribute them so they land on /check/ltx-video-2-3/rx-7900-xtx.
  • VRAM usage: At 24 GB the Q6_K transformer (17.77 GB) sits resident with headroom for the connectors and VAE-decode activations, while the Gemma encoder runs from system RAM. VRAM is not the binding constraint on this card; the binding factor is ROCm's memory-management stall during the weight load (see Running / Troubleshooting). The 24 GB minimum listed for this recipe reflects the tested card and the high-bit-quant lead path; lighter quants (Q5_K_S, Q4_K_S) reduce the resident footprint further. See /check/ltx-video-2-3/rx-7900-xtx for any community-submitted measurement.
  • Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. At 24 GB you run Q6_K or Q8_0 rather than the 16 GB cards' Q4 — near-lossless versus the BF16 transformer. There is no FP8/FP4 path to consider on RDNA3; the GGUF integer quant is the memory-saving route, not FP8.

For the full benchmark data and other-GPU comparisons, see /check/ltx-video-2-3/rx-7900-xtx.

Troubleshooting

LTX-2.3 stalls at "Requested to load LTXAV" and then OOMs (the headline 7900 XTX trap)

This is the dominant 7900 XTX failure mode and it is not about running out of 24 GB — it is ROCm's memory manager mishandling the large LTX weight load. An RX 7900 XTX / ROCm 7.2 owner reports LTX 2.3 "stalls during Requested to load LTXAV" and that a clean launch "always crashes with OOMs" (Issue #13730). Fixes, in order:

  1. Launch with --disable-pinned-memory --disable-async-offload --disable-dynamic-vram, the exact working config from the Issue #13730 reporter. Try this first.
  2. Add --disable-smart-memory to force idle weights and the Gemma encoder out to system RAM (cli_args.py: "Force ComfyUI to agressively offload to regular ram..."). Needs ample RAM (64 GB recommended).
  3. Confirm --disable-pinned-memory is present. It is the single most-cited flag for this class of failure on the 7900 XTX — a second owner confirms LTX crashes without it and runs with it (Issue #11949), and AMD's ComfyUI-Radeon guide recommends it for low-memory configs.

The same reporter notes LTX "works flawlessly on NVIDIA 3090" but needs these workarounds on AMD — this is a ROCm smart-memory/pinned-memory interaction, documented as the go-to fix for large video-model loads on RDNA3.

"Torch not compiled with CUDA enabled"

A CUDA build of PyTorch got installed instead of the ROCm build. Per the ComfyUI README troubleshooting note, uninstall and reinstall against the ROCm wheel index:

pip uninstall torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

torch.cuda.is_available() should return True even on AMD — ROCm presents under the cuda device namespace via HIP.

Do not install xformers or FlashAttention on this card

HF and ComfyUI guides written for NVIDIA frequently suggest pip install xformers or a FlashAttention wheel. On RDNA3 these are the wrong path: the ROCm xformers fork is limited (no FP32, head-dim ≤ 256), and consumer-card CK FlashAttention builds routinely fail on gfx1100. ComfyUI already routes attention through PyTorch SDPA on this stack — stick with the default. There is likewise no FP8 path here: RDNA3 has no FP8 hardware, so an FP8 LTX checkpoint upcasts to BF16/FP16 with no memory win. Use the GGUF integer quants for the memory savings instead.

Encoder is slow on CPU

Forcing the Gemma 3 12B encoder to RAM (via --disable-smart-memory / low-VRAM modes) makes the text-encode pass slow. The Lightricks node ships an "LTXV Audio Text Encoder Loader" that a community user reports loads Gemma "8x times faster then normal loader" (sic) (ComfyUI-LTXVideo Issue #303 comment). Swap the default Gemma loader for it where available.

Audio-video output not synchronized

LTX-2.3 is designed to produce synchronized video and audio within a single model, per the Lightricks/LTX-2.3 card. If your output is silent, you are probably running one of the non-audio workflows; load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.

Common Questions
How much VRAM does LTX Video 2.3 need?

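This recipe targets a 24 GB card. The recommended Q6_K transformer is 17.77 GB on disk, and lighter quants (Q5_K_S, Q4_K_S) reduce the resident footprint further.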

Which GPUs is LTX Video 2.3 tested on?

RX 7900 XTX (24 GB).

How hard is this setup?

Advanced — follow the steps above.