LTX Video 2.3 on RX 7800 XT: 22B Audio-Video at the 16 GB ROCm Floor via GGUF + CPU-Offloaded Gemma

video · advanced · 16GB+ VRAM · Jun 19, 2026

This advanced recipe sets up LTX Video 2.3 on the RX 7800 XT, needing about 16 GB of VRAM.

prerequisites
  • AMD Radeon RX 7800 XT (16 GB VRAM, RDNA3 / Navi 32 / gfx1101) or equivalent ROCm-supported card
  • Linux (Ubuntu 24.04 / 22.04 or RHEL) with the AMD ROCm stack installed (ROCm 7.2.x)
  • 64 GB system RAM strongly recommended (the Gemma 3 12B text encoder is offloaded to RAM, and ROCm's pinned-memory path needs the headroom)
  • Python 3.12+ and PyTorch built for ROCm (not CUDA)
  • ComfyUI installed (latest version) + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes
  • ~26 GB free disk for the Q4_K_S distilled transformer, connectors, Gemma encoder + mmproj, and dual VAEs (the file sizes listed in Steps 4-6 sum to about 25.5 GB)

What You'll Build

Generate short, synchronized audio + video clips locally with LTX Video 2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 16 GB Radeon RX 7800 XT (RDNA3, Navi 32, gfx1101) through the ROCm stack. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." This recipe runs a distilled GGUF transformer with the heavy Gemma 3 12B text encoder offloaded to system RAM, and works around a documented ROCm load-stall that hangs LTX-2.3 on RDNA3 at the "Requested to load LTXAV" step.

Hardware data: RX 7800 XT (16 GB VRAM) · Q4_K_S distilled GGUF + CPU-offloaded Gemma 3 12B · ComfyUI on ROCm 7.2 · See benchmark data

⚠️ This is a ROCm recipe, not CUDA. The RX 7800 XT runs on AMD's ROCm/HIP stack — there is no cu124/cu128 wheel, no xformers install, and no FP8/FP4 path here. RDNA3 has no FP8/FP4 hardware (its WMMA units accept FP16, BF16, INT8, INT4 only), so an FP8 LTX checkpoint — or an FP8 Gemma encoder — would just upcast to FP16/BF16 with no memory saving and no compute acceleration. The attention path is PyTorch SDPA (ComfyUI's default), not FlashAttention-2 and not xformers. If a guide tells you to pip install xformers, build a flash-attn wheel, or pick a cu12x wheel for this card, it's written for the wrong vendor.

⚠️ At 16 GB, two constraints stack: the Gemma encoder AND ROCm memory management. Unlike the 24 GB RX 7900 XTX (which has room to step up to a Q6_K transformer), the 7800 XT must run a tight Q4-class GGUF transformer AND force the Gemma 3 12B text encoder onto the CPU. The encoder is the binding constraint: Gemma 3 12B "needs ~24-27GB VRAM to operate" and on a 16 GB card "makes LTX-2 unusable" even with FP8 models and all optimizations applied — leaving it on the GPU OOMs at "Peak Usage: 29068 MiB" (ComfyUI-LTXVideo Issue #303, filed on a 16 GB RTX 5080). On top of that, RDNA3 hits a ROCm load-stall during the LTX weight load. See Running for both fixes.

Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT the older LTX-2 19B line (Lightricks/LTX-2) nor the LTX-Video 0.9.x family. The Gemma-encoder OOM (Issue #303) and the RDNA3 load-stall (ComfyUI Issue #13730) are cited across both LTX-2 19B and LTX-2.3 because they share the same Gemma 3 12B encoder and the same ROCm memory manager — the model version doesn't change either failure.

ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement, not Apache-2.0 (the model card metadata sets license: other with license_name: ltx-2-community-license-agreement and links the license text). Read the license before any commercial use.

Requirements

| Component | Minimum | Tested |
|---|---|---|
| GPU | 16 GB VRAM, ROCm-supported AMD card | RX 7800 XT (16 GB, RDNA3 / gfx1101) |
| RAM | 32 GB | 64 GB recommended (Gemma encoder + ROCm RAM offload) |
| Storage | ~26 GB | Q4_K_S transformer (13.12 GB) + connectors + Gemma encoder + dual VAEs |
| Driver | AMD ROCm 7.2.x on Linux | ROCm 7.2 (Issue #13730 reporter, on RDNA3) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, PyTorch (ROCm build) |

The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7" per the Lightricks/LTX-2.3 model card. The card's CUDA reference is for NVIDIA; on the 7800 XT you install the ROCm PyTorch wheel instead (Step 2), and PyTorch's HIP runtime presents as the cuda device namespace.

The RX 7800 XT is an officially ROCm-supported card: the system-requirements matrix in AMD's ROCm install-on-Linux documentation explicitly lists the AMD Radeon RX 7800 XT (RDNA3, gfx1101) as supported, distinct from the RX 7900 series, which is gfx1100. Because the card is officially supported, no HSA_OVERRIDE_GFX_VERSION masquerade is needed; that environment variable is a legacy fallback for unsupported cards, not a step here.

Why a tight Q4 quant and not the full BF16 transformer? The distilled BF16 transformer is 42.04 GB on disk (unsloth/LTX-2.3-GGUF tree, ltx-2.3-22b-distilled-BF16.gguf = 42,035,412,000 bytes) — nowhere near 16 GB. A 22B transformer in BF16 needs ~42 GB regardless of vendor, so on AMD you run a GGUF integer quant. There is no FP8 escape hatch on RDNA3, so the memory savings come entirely from the GGUF integer tiers below.
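The size arithmetic above is easy to check yourself. The following is a minimal sketch, not a sizing tool: the helper names are hypothetical, the 22e9 parameter count is an assumption read off the "22B" label, and file size is treated as if it were all weights.

```python
# Rough size arithmetic for a model file.
# Assumption: the file is dominated by weights; GGUF metadata is ignored.

BF16_FILE_BYTES = 42_035_412_000  # ltx-2.3-22b-distilled-BF16.gguf, as quoted above


def bytes_to_gb(n_bytes: int) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes), the unit used in this recipe."""
    return n_bytes / 1e9


def bytes_to_gib(n_bytes: int) -> float:
    """Binary gibibytes (1 GiB = 2**30 bytes), the unit many VRAM tools report."""
    return n_bytes / 2**30


def effective_bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter for a file of file_gb decimal GB."""
    return file_gb * 8 / params_billion


if __name__ == "__main__":
    print(f"BF16 file: {bytes_to_gb(BF16_FILE_BYTES):.2f} GB, "
          f"{bytes_to_gib(BF16_FILE_BYTES):.2f} GiB")
    # 22 (billion parameters) is the nominal figure, not a measured count.
    print(f"Q4_K_S: ~{effective_bits_per_weight(13.12, 22):.2f} bits per weight")
```

Keeping decimal GB and GiB apart matters when you compare these file sizes against VRAM readouts, which are usually reported in MiB or GiB.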

Installation

1. Install ComfyUI

Per the ComfyUI README:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate

2. Install PyTorch for ROCm

The RX 7800 XT (gfx1101) is an officially ROCm-supported GPU, so it uses the stable ROCm PyTorch wheel — not a CUDA wheel. Per the ComfyUI README "AMD GPUs (Linux)" section:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

ℹ️ Verify the ROCm tag before you copy it. As of this writing the ComfyUI README pins rocm7.2 as the stable wheel — but the rocmX.Y tag moves over time (6.3 → 6.4 → 7.x). Read the current line in the live ComfyUI README before running. Confirm you got the ROCm build: python -c "import torch; print(torch.__version__)" should print a +rocm7.2-style suffix, and torch.cuda.is_available() returns True (ROCm masquerades as the cuda device namespace under HIP).
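The check in the note above can be scripted. This is a minimal sketch with hypothetical function names; it relies on torch.version.hip being a version string on ROCm builds and None otherwise, and it reports any build without a recognizable suffix as "cpu", which is a simplification.

```python
from typing import Optional


def classify_torch_build(version: str, hip_version: Optional[str]) -> str:
    """Return 'rocm', 'cuda', or 'cpu' from PyTorch build metadata."""
    if hip_version is not None or "+rocm" in version:
        return "rocm"
    if "+cu" in version:
        return "cuda"
    # Simplification: anything without a +rocm / +cu suffix is treated as CPU-only.
    return "cpu"


def describe_installed_torch() -> str:
    """Summarize the PyTorch build in the active environment, if there is one."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    build = classify_torch_build(torch.__version__, torch.version.hip)
    return (f"torch {torch.__version__}: {build} build, "
            f"GPU visible: {torch.cuda.is_available()}")


if __name__ == "__main__":
    print(describe_installed_torch())
```

Run it inside the ComfyUI virtual environment. Anything other than a rocm build means the wheel came from the wrong index.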

3. Install ComfyUI dependencies and the LTX-Video, GGUF, and KJNodes custom nodes

# core deps
pip install -r requirements.txt

cd custom_nodes

# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

# city96's GGUF loader — required for the quantized transformer
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

# Kijai's KJNodes — used by the recommended offload workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
cd ..

The unsloth/LTX-2.3-GGUF model card provides the quantized transformer for the GGUF workflow; the Lightricks LTXVideo nodes provide the LTX-2.3 sampler and example workflows, and city96/ComfyUI-GGUF + kijai/ComfyUI-KJNodes provide the GGUF loader and offload nodes.

4. Download a distilled GGUF transformer that fits 16 GB

The distilled checkpoint runs at 8 steps with CFG 1.0 and is the right choice for a tight card. On 16 GB you want a transformer small enough to leave room for the connectors and VAE-decode activations once the Gemma encoder is on the CPU — Q4_K_S (13.12 GB) or, for more comfortable headroom, Q3_K_S (9.95 GB):

# Run from the ComfyUI checkout (where Step 3 left you); --local-dir is relative to it.
# The repo's distilled/ subfolder is preserved, so the file lands in models/unet/distilled/.

# Q4_K_S distilled (13.12 GB on disk): the smallest standard Q4 tier
huggingface-cli download unsloth/LTX-2.3-GGUF \
  distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf \
  --local-dir models/unet/

# OR Q3_K_S distilled (9.95 GB): more headroom, small quality trade
# huggingface-cli download unsloth/LTX-2.3-GGUF \
#   distilled/ltx-2.3-22b-distilled-Q3_K_S.gguf \
#   --local-dir models/unet/

Distilled-transformer GGUF file sizes (decimal GB, verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):

| Quant | unsloth distilled file size | Fit on 16 GB (Gemma on CPU) |
|---|---|---|
| Q2_K | 8.28 GB | very comfortable (lower quality) |
| Q3_K_S | 9.95 GB | comfortable; room for connectors + VAE |
| Q3_K_M | 10.77 GB | comfortable |
| Q4_K_S | 13.12 GB | recommended lead: full Q4 quality, tight but fits |
| Q4_K_M | 14.33 GB | tighter |
| Q5_K_S | 15.25 GB | very tight; leaves little for VAE-decode |
| Q6_K | 17.77 GB | does NOT fit on 16 GB (out of scope here) |
| BF16 | 42.04 GB | does NOT fit |
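To see how much nominal headroom each tier in the table leaves, here is a small sketch. It rests on two loud assumptions: that the resident size of the transformer roughly equals its file size, and that the card exposes a full 16 decimal GB. Real runs also need memory for the connectors, VAE-decode activations, and the desktop, so read the numbers as an upper bound. The helper names are hypothetical.

```python
# Headroom estimate: nominal VRAM minus on-disk transformer size.
# Assumptions: resident size ~= file size; the card exposes 16 decimal GB.

DISTILLED_GGUF_GB = {
    "Q2_K": 8.28,
    "Q3_K_S": 9.95,
    "Q3_K_M": 10.77,
    "Q4_K_S": 13.12,
    "Q4_K_M": 14.33,
    "Q5_K_S": 15.25,
    "Q6_K": 17.77,
    "BF16": 42.04,
}


def headroom_gb(quant: str, vram_gb: float = 16.0) -> float:
    """Nominal VRAM left after the transformer weights (negative: does not fit)."""
    return round(vram_gb - DISTILLED_GGUF_GB[quant], 2)


def quants_that_fit(vram_gb: float = 16.0, reserve_gb: float = 0.0) -> list:
    """Quants whose file size leaves at least reserve_gb of nominal VRAM free."""
    return [q for q, size in DISTILLED_GGUF_GB.items()
            if vram_gb - size >= reserve_gb]


if __name__ == "__main__":
    for quant in DISTILLED_GGUF_GB:
        print(f"{quant:8s} {headroom_gb(quant):+7.2f} GB")
```

Raising reserve_gb is a simple way to express "I want this much left for VAE decode"; the right reserve for a given resolution is something you would have to measure.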

If you prefer the Unsloth-Dynamic tiers, the matching files are distilled/ltx-2.3-22b-distilled-UD-Q4_K_S.gguf (14.09 GB) and distilled/ltx-2.3-22b-distilled-UD-Q3_K_S.gguf (11.20 GB) in the same folder — also 16 GB-safe. There is also a newer distilled-1.1 revision in the repo ("A different aesthetic experience and improved audio compared to v1.0" per the Lightricks/LTX-2.3 model card); either revision works and this recipe pins the original distilled/.

5. Download the embeddings connectors and the audio + video VAE

The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE:

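# Run from the ComfyUI checkout; the repo's text_encoders/ and vae/ subfolders
# map straight onto models/text_encoders/ and models/vae/.
huggingface-cli download unsloth/LTX-2.3-GGUF \
  text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
  vae/ltx-2.3-22b-distilled_video_vae.safetensors \
  vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
  --local-dir models/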

(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — decimal GB, verified live via the unsloth/LTX-2.3-GGUF tree API.)

6. Download a quantized Gemma 3 12B text encoder

LTX-2.3 uses Gemma 3 12B as its text encoder, and on a 16 GB card it MUST run on the CPU (see Running). Download the GGUF QAT-Q4 encoder, which is the genuine memory-saver on RDNA3 (an FP8 Gemma file would just upcast to BF16 on this card with no benefit):

# Run from the ComfyUI checkout so the files land in its models/text_encoders/ folder.
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  mmproj-BF16.gguf \
  --local-dir models/text_encoders/

The encoder file is 7.43 GB and the mmproj 0.85 GB (decimal GB, verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API).
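Before launching, it can save time to confirm that all six downloads from Steps 4-6 are present and roughly the right size. The sketch below is one way to do that. It assumes huggingface-cli kept the repo subfolders (distilled/, vae/, text_encoders/) under each --local-dir, takes the path of ComfyUI's models/ directory as an argument, and uses the decimal-GB sizes quoted in this recipe. The function name and the tolerance are hypothetical choices.

```python
from pathlib import Path

# Paths are relative to ComfyUI's models/ directory; sizes are the decimal-GB
# figures quoted in Steps 4-6 (rounded, hence the tolerance below).
EXPECTED_FILES_GB = {
    "unet/distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf": 13.12,
    "text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors": 2.31,
    "vae/ltx-2.3-22b-distilled_video_vae.safetensors": 1.45,
    "vae/ltx-2.3-22b-distilled_audio_vae.safetensors": 0.36,
    "text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf": 7.43,
    "text_encoders/mmproj-BF16.gguf": 0.85,
}


def check_downloads(models_dir: Path, tolerance_gb: float = 0.02) -> list:
    """Return a list of problems; an empty list means every file looks complete."""
    problems = []
    for rel_path, expected_gb in EXPECTED_FILES_GB.items():
        path = models_dir / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual_gb = path.stat().st_size / 1e9
        if abs(actual_gb - expected_gb) > tolerance_gb:
            problems.append(f"size mismatch: {rel_path} is {actual_gb:.2f} GB, "
                            f"expected ~{expected_gb:.2f} GB")
    return problems


if __name__ == "__main__":
    # Adjust the path if your models/ directory lives elsewhere.
    for line in check_downloads(Path("models")) or ["all expected files present"]:
        print(line)
```

A size mismatch usually points at an interrupted download; rerunning the same huggingface-cli command is the simplest remedy. If you chose Q3_K_S instead, edit the first entry accordingly.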

Running

Two things break on the RX 7800 XT if you launch ComfyUI clean: the Gemma encoder OOMs the 16 GB card if it lands on the GPU, and ROCm stalls during the LTX weight load. Both are fixed at launch.

1. Force the Gemma encoder onto the CPU. On 16 GB the encoder cannot stay on the GPU — use --lowvram (which, per ComfyUI's cli_args.py, "makes the text encoders run on the CPU") or --novram (stream all weights from RAM). An RTX 5080 owner — same 16 GB tier — reports --novram drops the GPU footprint dramatically: "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment). Treat that specific 3 GB figure as the 5080 reporter's measurement, not a 7800 XT benchmark.

2. Fix the RDNA3 ROCm load-stall. Without workaround flags, LTX-2.3 on an RDNA3 / ROCm 7.2 box stalls at "Requested to load LTXAV" and can hang or fill RAM/swap until the system becomes unresponsive (Issue #13730). Per a community comment on that thread, the root cause is host-side pinned-memory exhaustion rather than VRAM — the static pin-memory path fills unreclaimable host pins on AMD (Issue #13730 comment). The load-bearing flags are therefore --disable-pinned-memory (cli_args.py: "Disable pinned memory use.") and --disable-smart-memory (cli_args.py: "Force ComfyUI to agressively offload to regular ram instead of keeping models in vram when it can."). Note that --disable-dynamic-vram does nothing on AMD — a ComfyUI contributor notes dynamic VRAM is disabled by default there, so the flag is a no-op (Issue #13730 contributor comment) — so it is omitted below.

Combine both fixes:

# Working launch on RX 7800 XT / ROCm 7.2: Gemma on CPU + the ROCm pinned-memory stall fix
python main.py --lowvram --disable-pinned-memory --disable-smart-memory
# Alternative: stream ALL weights from RAM (the 3-GB-footprint path from Issue #303)
python main.py --novram --disable-pinned-memory --disable-smart-memory

A separate RX 7900 XTX owner running LTX-2 i2v independently reports --disable-pinned-memory is load-bearing on RDNA3 — "second run crashes when not using --disable-pinned-memory" (Issue #11949). AMD's own ComfyUI-on-Radeon install guide likewise documents adding --lowvram and --disable-pinned-memory for low-memory configs.
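If you start ComfyUI from a script rather than by hand, the two launch variants can be assembled like this. It is a sketch only: the flags are the ones quoted in this section, while build_launch_command and its parameters are hypothetical names, and the script prints the command instead of running it.

```python
import shlex
import sys

# The two flags this section identifies as the ROCm pinned-memory stall fix.
STALL_FIX_FLAGS = ["--disable-pinned-memory", "--disable-smart-memory"]


def build_launch_command(stream_all_weights: bool = False,
                         python: str = sys.executable) -> list:
    """Build the ComfyUI command line: encoder offload flag plus stall-fix flags."""
    # --lowvram keeps the text encoders on the CPU; --novram streams all weights from RAM.
    offload_flag = "--novram" if stream_all_weights else "--lowvram"
    return [python, "main.py", offload_flag, *STALL_FIX_FLAGS]


if __name__ == "__main__":
    print(shlex.join(build_launch_command()))
    # To actually start ComfyUI, run this from the ComfyUI checkout directory:
    #   import subprocess
    #   subprocess.run(build_launch_command(), check=True)
```

Passing an argument list to subprocess.run, rather than a shell string, avoids quoting problems if the interpreter path contains spaces.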

Once running, open the browser UI (default http://127.0.0.1:8188) and load one of the example workflows shipped by the Lightricks node:

ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/

Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 4. Wire the Gemma 3 12B encoder through the GGUF text-encoder loader. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.

Recommended distilled settings

| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width & height divisible by 32 | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | divisible by 8 + 1 (e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |

Start small (e.g. 512×512, 65 frames) to confirm the load clears the ROCm stall and the encoder offload holds before scaling resolution.
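The two size rules are easy to get wrong when scaling up. The sketch below validates a setting and snaps a near miss down to the closest valid value. It reads "divisible by 8 + 1" as frame counts of the form 8k + 1, which matches the examples 65, 97, and 121. The function names and the lower bounds (32 pixels, 9 frames) are assumptions for illustration, not limits taken from the model card.

```python
def is_valid_resolution(width: int, height: int) -> bool:
    """True when both dimensions are positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(frames: int) -> bool:
    """True when frames has the form 8k + 1 (for example 65 = 8 * 8 + 1)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_dimension(value: int) -> int:
    """Round down to a multiple of 32 (the floor of 32 is an assumed minimum)."""
    return max(32, (value // 32) * 32)


def snap_frame_count(frames: int) -> int:
    """Round down to the nearest 8k + 1 (the floor of 9 is an assumed minimum)."""
    return max(9, ((frames - 1) // 8) * 8 + 1)


if __name__ == "__main__":
    width, height, frames = 500, 512, 100
    if not (is_valid_resolution(width, height) and is_valid_frame_count(frames)):
        width, height = snap_dimension(width), snap_dimension(height)
        frames = snap_frame_count(frames)
    print(width, height, frames)
```

Rounding down rather than up keeps a snapped setting from exceeding the memory budget you had in mind.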

Results

  • Speed: Omitted. No RX-7800-XT LTX-2.3 22B benchmark at a fixed configuration has been published, and /check/ltx-video-2-3/rx-7800-xt currently returns verdict: unknown with no benchmark data. There is no close-sibling AMD measurement to cite even as a lower bound. If you've measured LTX-2.3 timings on a 7800 XT, please contribute them so they land on /check/ltx-video-2-3/rx-7800-xt.
  • VRAM usage: With the Gemma encoder forced to the CPU, the distilled Q4_K_S GGUF transformer (13.12 GB on disk) stays resident on the 16 GB card with room for the connectors and VAE-decode activations. Leaving the encoder on the GPU OOMs a 16 GB card at "Peak Usage: 29068 MiB" (Issue #303 body, filed on a 16 GB RTX 5080). The 16 GB minimum (min_vram_gb: 16) reflects the Q4_K_S lead path with the encoder on CPU; lighter quants (Q3_K_S, 9.95 GB) free further headroom. Once the encoder is offloaded, the failure you are most likely to hit on AMD is ROCm's pinned-memory stall during the weight load, not raw VRAM exhaustion; see Running and Troubleshooting. See /check/ltx-video-2-3/rx-7800-xt for any community-submitted measurement.
  • Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. Q4_K_S keeps full Q4 quality; Q3_K_S frees ~3 GB of headroom (9.95 GB vs 13.12 GB on disk) at a small quality cost. There is no FP8/FP4 path to consider on RDNA3 — the GGUF integer quant is the memory-saving route, for both the transformer and the Gemma encoder.

For the full benchmark data and other-GPU comparisons, see /check/ltx-video-2-3/rx-7800-xt.

Troubleshooting

LTX-2.3 stalls at "Requested to load LTXAV" and then hangs or fills RAM (the headline RDNA3 trap)

This is the dominant RDNA3 failure mode, and it is not about running out of 16 GB of VRAM: it is ROCm's host-side pinned-memory path filling up. An RDNA3 / ROCm 7.2 owner reports LTX-2.3 stalls at "Requested to load LTXAV" and that, without workarounds, the system can fill RAM, VRAM, and swap until Linux becomes unresponsive (Issue #13730). A community commenter traces it to the static pin_memory path ignoring the host-RAM pin budget on AMD (Issue #13730 comment). Fixes, in order:

  1. Launch with --disable-pinned-memory --disable-smart-memory — the pinned-memory disable targets the root cause; smart-memory disable forces idle weights and the Gemma encoder out to system RAM. Needs ample RAM (64 GB recommended).
  2. Add --lowvram (or --novram) to keep the Gemma 3 12B encoder off the GPU — on 16 GB it cannot share the card with the transformer.
  3. Do not bother with --disable-dynamic-vram — a ComfyUI contributor notes dynamic VRAM is off by default on AMD, so the flag is a no-op there (Issue #13730 contributor comment).

A proper fix is tracked upstream (PR #14525, gating the static pin path on the same budget check the dynamic path uses); until it lands, the launch flags above are the workaround.
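Because this failure shows up as host RAM and swap filling up, it helps to watch host memory during the first load so you can stop ComfyUI before Linux becomes unresponsive. The following is a diagnostic sketch, not a fix: it parses the standard Linux /proc/meminfo file, and the function names are hypothetical. What counts as "too high" is left to you.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field name: integer value (kB where given)}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def host_memory_pressure(meminfo: dict) -> float:
    """Fraction of host RAM that is no longer available (0.0 idle, 1.0 exhausted)."""
    return 1.0 - meminfo["MemAvailable"] / meminfo["MemTotal"]


if __name__ == "__main__":
    proc = Path("/proc/meminfo")
    if proc.exists():  # Linux only
        info = parse_meminfo(proc.read_text())
        print(f"host RAM in use: {host_memory_pressure(info):.0%}")
        print(f"swap free: {info.get('SwapFree', 0) / 1024**2:.1f} GiB")
```

Run it in a second terminal, for instance in a loop every few seconds, while the "Requested to load LTXAV" step is in progress. A figure that keeps climbing toward 100% with swap shrinking matches the stall described above.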

Out of memory loading the Gemma 3 12B text encoder

The unquantized Gemma 3 12B encoder "needs ~24-27GB VRAM to operate" and "makes LTX-2 unusable on consumer GPUs with 16GB VRAM (RTX 5080, RTX 4080, etc.), even with FP8 models and all optimizations applied" (Issue #303). On the 7800 XT, keep it on the CPU:

  1. Launch with --lowvram so the text encoders run on CPU, or --novram to stream all weights from RAM (one 16 GB-card owner reports the GPU footprint drops to ~3 GB with --novram; Issue #303 comment).
  2. Use the GGUF QAT-Q4 Gemma from Step 6 (7.43 GB). On RDNA3, do not substitute an FP8 single-file Gemma encoder as a memory-saver — RDNA3 has no FP8 hardware, so it upcasts to BF16/FP16 with no memory win. The GGUF integer quant is the genuine reduction.

Encoder is slow on CPU

Forcing the Gemma 3 12B encoder to RAM makes the text-encode pass slow. A community user reports that the Lightricks node's LTXV Audio Text Encoder Loader loads Gemma far faster than the default loader: "'LTXV Audio Text Encoder Loader' for Gemma works 8x times faster then normal loader" (sic; Issue #303 comment). Swap the default Gemma loader for it where available.

"Torch not compiled with CUDA enabled"

A CUDA build of PyTorch got installed instead of the ROCm build. Per the ComfyUI README troubleshooting note, uninstall and reinstall against the ROCm wheel index:

pip uninstall torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2

torch.cuda.is_available() should return True even on AMD — ROCm presents under the cuda device namespace via HIP.

Do not install xformers or FlashAttention on this card

HF and ComfyUI guides written for NVIDIA frequently suggest pip install xformers or a FlashAttention wheel. On RDNA3 these are the wrong path: the ROCm xformers fork is limited (no FP32, head-dim ≤ 256), and consumer-card CK FlashAttention builds routinely fail on gfx110x. ComfyUI already routes attention through PyTorch SDPA on this stack — stick with the default. There is likewise no FP8 path here: RDNA3 has no FP8 hardware, so an FP8 LTX checkpoint upcasts to BF16/FP16 with no memory win. Use the GGUF integer quants for the memory savings instead.

Audio-video output not synchronized

LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.

common questions
How much VRAM does LTX Video 2.3 need?

About 16 GB — the minimum this recipe targets.

Which GPUs is LTX Video 2.3 tested on?

RX 7800 XT (16 GB).

How hard is this setup?

Advanced — follow the steps above.