What You'll Build
Generate synchronized audio-video clips locally with LTX Video 2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on an RTX 5090, the first consumer GPU that meets the 32GB+ VRAM minimum stated in the official ComfyUI-LTXVideo README. Where the 16 GB RTX 5060 Ti sibling had to lean on Q4_K_S GGUF + a quantized Gemma encoder to squeeze under the envelope, the 5090 runs the FP8 transformer + unquantized Gemma 3 12B side-by-side without spillover, and adds a Blackwell-native NVFP4 path (Lightricks/LTX-2.3-nvfp4) that older Ada / Ampere cards cannot accelerate.
Hardware data: RTX 5090 (32 GB VRAM) · FP8 distilled at 8 steps · See benchmark data
Variant pin. This recipe is for LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3, released 2026-04-13). It is NOT for LTX-2 (19B sibling, repo Lightricks/LTX-2) and NOT for the older LTX-Video 0.9.x line (2B / 13B, repo Lightricks/LTX-Video). All three live under the same Lightricks umbrella; the version suffix is load-bearing.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 32 GB VRAM per the official ComfyUI-LTXVideo README | RTX 5090 (32 GB) |
| RAM | 32 GB | 64 GB (HF discussion #16 bmgjet single-card 5090 setup) |
| Storage | ~25 GB (FP8) / ~21 GB (NVFP4) / ~50 GB (full BF16) | — |
| Software | ComfyUI + ComfyUI-LTXVideo, Python 3.10+, CUDA 12.7+, PyTorch ~2.7 | — |
The official ComfyUI-LTXVideo README lists 32 GB+ VRAM as the prerequisite for the canonical install, and the 5090 meets that minimum exactly. The 16 GB sibling recipe at /check/ltx-video-2-3/rtx-5060-ti is the GGUF-squeeze path for cards below this floor.
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
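On Blackwell (sm_120), PyTorch needs a cu128-class wheel. Whether the default pip install torch resolves to one depends on the PyTorch release and platform, so check torch.version.cuda after installing; if it reports a CUDA build older than 12.8, reinstall with pip install torch --index-url https://download.pytorch.org/whl/cu128. Per the Lightricks/LTX-2.3 model card, the codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7."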
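The cu128 requirement above can be checked from inside the venv. The following is a minimal sketch, assuming PyTorch is already installed; the helper names are illustrative and not part of ComfyUI or PyTorch. It relies on torch.version.cuda (the CUDA build string of the wheel) and torch.cuda.get_device_capability(), which reports (12, 0) on an sm_120 card.

```python
def parse_cuda_version(version):
    """Turn a CUDA build string such as '12.8' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def wheel_supports_sm120(cuda_build, minimum=(12, 8)):
    """True when the wheel's CUDA build is at least 12.8 (the cu128 class)."""
    if cuda_build is None:  # CPU-only wheels report no CUDA build
        return False
    return parse_cuda_version(cuda_build) >= minimum


def is_sm120(capability):
    """True when a (major, minor) compute capability is exactly sm_120."""
    return tuple(capability) == (12, 0)


def describe_local_setup():
    """Print what the installed PyTorch wheel and the first visible GPU report."""
    import torch  # imported lazily so the helpers above work without PyTorch

    print("CUDA build:", torch.version.cuda,
          "| cu128-class:", wheel_supports_sm120(torch.version.cuda))
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print("GPU:", torch.cuda.get_device_name(0), "| sm_120:", is_sm120(cap))
    else:
        print("No CUDA device visible to PyTorch")


if __name__ == "__main__":
    try:
        describe_local_setup()
    except ImportError:
        print("PyTorch is not installed in this environment")
```

If the script reports a CUDA build below 12.8 on a 5090, the installed wheel is the likely culprit rather than the driver.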
2. Install the LTX-Video, GGUF, and KJNodes custom nodes
# Step 1 left the shell inside ComfyUI/; from its parent directory use ComfyUI/custom_nodes instead
cd custom_nodes
# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
# city96's GGUF loader — required for GGUF and Gemma-GGUF text encoder
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt
# Kijai's KJNodes — used by recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
This trio of custom nodes is the same one listed by the unsloth/LTX-2.3-GGUF model card for the GGUF workflow and is the canonical setup across the LTX-2.3 sibling recipes.
3. Download the FP8 distilled checkpoint (primary path)
Kijai's ComfyUI-ready mirror at Kijai/LTX2.3_comfy ships scaled FP8 transformers that target Ada / Hopper / Blackwell FP8 matmul. Per the Kijai/LTX2.3_comfy README: "the models marked input_scaled additionally have activation scaling, and are set to run with fp8 matmuls on supported hardware (roughly 40xx and later Nvidia GPUs)." The 5090's Blackwell sm_120 hits this fast path.
# Run the download commands from the directory that contains ComfyUI/.
# huggingface-cli keeps each file's repo-relative path under --local-dir,
# so pointing at ComfyUI/models/ places these in ComfyUI/models/diffusion_models/.
# FP8 distilled transformer (~21.86 GB on disk)
huggingface-cli download Kijai/LTX2.3_comfy \
diffusion_models/ltx-2.3-22b-distilled_transformer_only_fp8_scaled.safetensors \
--local-dir ComfyUI/models/
# Or the input-scaled FP8 matmul variant (~21.57 GB)
huggingface-cli download Kijai/LTX2.3_comfy \
diffusion_models/ltx-2.3-22b-distilled_transformer_only_fp8_input_scaled_v2.safetensors \
--local-dir ComfyUI/models/
# VAE + text projection (Kijai mirror exposes these standalone)
huggingface-cli download Kijai/LTX2.3_comfy \
vae/LTX23_video_vae_bf16.safetensors \
vae/LTX23_audio_vae_bf16.safetensors \
text_encoders/ltx-2.3_text_projection_bf16.safetensors \
--local-dir ComfyUI/models/
File sizes verified via Kijai/LTX2.3_comfy/tree/main/diffusion_models.
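A truncated download is easy to miss with files this large, so it can help to compare what landed on disk against the sizes quoted above. The sketch below is illustrative only: the function name, the 2% tolerance, and the assumption that the quoted sizes are decimal gigabytes (10^9 bytes) are mine, not something the model card states. It searches recursively, so it works regardless of which subfolder the file ended up in.

```python
from pathlib import Path

# Sizes quoted in this recipe for the two FP8 transformer files.
EXPECTED_GB = {
    "ltx-2.3-22b-distilled_transformer_only_fp8_scaled.safetensors": 21.86,
    "ltx-2.3-22b-distilled_transformer_only_fp8_input_scaled_v2.safetensors": 21.57,
}


def check_downloads(models_root, expected_gb=EXPECTED_GB, tolerance=0.02,
                    bytes_per_gb=1e9):
    """Return {filename: 'ok' | 'missing' | 'size mismatch: ...'} for each expected file."""
    root = Path(models_root)
    report = {}
    for name, gb in expected_gb.items():
        # Search recursively because the file may sit in a nested subfolder.
        matches = sorted(root.rglob(name)) if root.is_dir() else []
        if not matches:
            report[name] = "missing"
            continue
        actual_gb = matches[0].stat().st_size / bytes_per_gb
        if abs(actual_gb - gb) <= tolerance * gb:
            report[name] = "ok"
        else:
            report[name] = f"size mismatch: {actual_gb:.2f} GB on disk"
    return report


if __name__ == "__main__":
    for filename, status in check_downloads("ComfyUI/models").items():
        print(f"{status:>10}  {filename}")
```

Only one of the two FP8 variants is needed, so a single "missing" entry is expected if you downloaded just one.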
4. Download the Gemma 3 12B text encoder
LTX-2.3 uses Gemma 3 12B as its text encoder. At 32 GB VRAM the QAT Q4 GGUF build (quantization-aware-trained, 4-bit) runs alongside the FP8 transformer comfortably:
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
mmproj-BF16.gguf \
--local-dir ComfyUI/models/text_encoders/
Cards below the 32 GB floor must use the smaller-encoder workarounds documented in ComfyUI-LTXVideo Issue #303. That report concerns the sibling LTX-2 19B, but the encoder OOM mechanism is model-class-independent and applies equally to LTX-2.3, since both use the same Gemma 3 12B encoder. The 5090 sidesteps that entire failure class.
5. (Optional, Blackwell-only) Download the NVFP4 transformer
Lightricks ships an official NVFP4 quant trained by Quantization-Aware Distillation at Lightricks/LTX-2.3-nvfp4. The NVFP4 file is a single 20.21 GB safetensors that fits inside 32 GB with comfortable headroom for the encoder + VAE + activations:
huggingface-cli download Lightricks/LTX-2.3-nvfp4 \
ltx-2.3-22b-dev-nvfp4.safetensors \
--local-dir ComfyUI/models/diffusion_models/
The NVFP4 format requires Blackwell sm_120 microscaling hardware — Ada (RTX 4090) and Ampere (RTX 3090) cards cannot accelerate this format. On the 5090 it is the fastest single-stream path documented; see Troubleshooting for the known LTX-2 19B sibling-variant performance regression at ComfyUI-LTXVideo Issue #335.
6. (Alternative) GGUF distilled path
If you want to share VRAM with another model on the same card, the GGUF distilled files at QuantStack/LTX-2.3-GGUF and unsloth/LTX-2.3-GGUF ship the same per-tier ladder as the 16 GB sibling recipe — useful if you need < 20 GB resident:
| Quant | QuantStack distilled file size | unsloth dev file size |
|---|---|---|
| Q3_K_S | 13.00 GB | 9.26 GB |
| Q4_K_S | 15.56 GB | 12.22 GB |
| Q4_K_M | 16.54 GB | 13.34 GB |
| Q5_K_M | 18.06 GB | 14.97 GB |
| Q6_K | 19.56 GB | 16.55 GB |
| Q8_0 | 23.75 GB | 21.19 GB |
| BF16 | — | 39.15 GB |
Sizes verified via QuantStack/LTX-2.3-GGUF/tree/main/LTX-2.3-distilled and unsloth/LTX-2.3-GGUF tree API.
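Picking a tier from the table comes down to finding the largest file that fits the VRAM you are willing to give the transformer. The helper below is a small sketch using the QuantStack distilled sizes from the table; the function name is hypothetical, and file size is only a proxy for resident weights (it ignores activations, the text encoder, and the VAE).

```python
# QuantStack distilled file sizes in GB, copied from the table above.
QUANTSTACK_DISTILLED_GB = {
    "Q3_K_S": 13.00,
    "Q4_K_S": 15.56,
    "Q4_K_M": 16.54,
    "Q5_K_M": 18.06,
    "Q6_K": 19.56,
    "Q8_0": 23.75,
}


def largest_quant_within(budget_gb, sizes_gb=QUANTSTACK_DISTILLED_GB):
    """Return the name of the largest quant whose file fits the budget, or None."""
    fitting = {quant: size for quant, size in sizes_gb.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (16, 20, 24):
        print(f"{budget} GB budget -> {largest_quant_within(budget)}")
```

With the "< 20 GB resident" target mentioned above, this selects Q6_K at 19.56 GB; leave extra room in the budget if another model shares the card.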
Running
Launch ComfyUI:
python main.py --listen
Load one of the example workflows shipped by the Lightricks node:
ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/
Wire the FP8 transformer (or NVFP4 transformer) into the diffusion-model loader and point the text encoder at the Gemma 3 12B GGUF you downloaded in Step 4.
Recommended distilled settings
| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width and height each divisible by 32 | Lightricks/LTX-2.3 model card "General tips" |
| Frame count | multiple of 8 + 1 (e.g. 65, 97, 121, 241, 481) | Same |
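Start at 1280×720 / 65 frames to confirm the install before scaling to 1920×1080 multi-second clips. Note that 720 and 1080 are not themselves multiples of 32; if a workflow rejects them or does not pad internally, the nearest conforming heights are 704 or 736 (for 720) and 1056 or 1088 (for 1080).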
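The two shape constraints in the table (dimensions divisible by 32, frame counts of the form 8k + 1) are easy to check before queueing a long render. The following sketch uses hypothetical helper names and only encodes the arithmetic; it makes no claim about how the ComfyUI nodes handle non-conforming sizes internally.

```python
def snap_dimension(pixels, multiple=32, round_up=False):
    """Snap a width or height to the nearest multiple of 32, below by default."""
    if pixels < multiple:
        raise ValueError(f"dimension must be at least {multiple}")
    if round_up:
        return -(-pixels // multiple) * multiple  # ceiling division
    return (pixels // multiple) * multiple


def is_valid_frame_count(frames):
    """True when frames has the form 8k + 1 (65, 97, 121, ...)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_frame_count(frames):
    """Snap down to the nearest frame count of the form 8k + 1."""
    if frames < 1:
        raise ValueError("frame count must be at least 1")
    return ((frames - 1) // 8) * 8 + 1


if __name__ == "__main__":
    print(snap_dimension(1280), snap_dimension(720))        # 1280 704
    print(snap_dimension(1080, round_up=True))              # 1088
    print(is_valid_frame_count(481), snap_frame_count(100)) # True 97
```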
Results
- Speed: A community user (bmgjet, HF discussion #16, comment dated 2026-03-08) on a single RTX 5090 with 64 GB RAM, running LTX-2.3-22b Dev FP8 at 8 steps, reported "82 sec for 481 frames" at 1280×720 and "547sec for 481 frames at 1920X1080". Both measurements come from a community member (isHfStaff: false, no Lightricks team badge), not from Lightricks itself; direct first-party 5090 timings have not been published. Empirical / corroborating data will appear at /check/ltx-video-2-3/rtx-5090 as community benchmarks land via /contribute.
- VRAM usage: The official ComfyUI-LTXVideo README states "32GB+ VRAM" as the prerequisite for the canonical install, which the 5090 meets exactly. The on-disk FP8 distilled transformer is 21.86 GB (Kijai/LTX2.3_comfy/tree/main/diffusion_models) and the NVFP4 alternative is 20.21 GB (Lightricks/LTX-2.3-nvfp4/tree/main); the Gemma 3 12B text encoder loads on top of that. ComfyUI-LTXVideo Issue #303 reports a "Peak Usage: 29068 MiB" on a 16 GB RTX 5080 running LTX-2 19B (sibling variant) with FP8 + unquantized Gemma. That peak is just under the 5090's 32 GB ceiling, leaving ~3 GB of operating margin on the equivalent LTX-2.3 22B path.
- Quality notes: The distilled checkpoint (8 steps, CFG=1) is the recommended balance for the 5090. The full ltx-2.3-22b-dev BF16 file is 42.98 GB on disk per Lightricks/LTX-2.3/tree/main and will exceed the 32 GB envelope without offload. NVFP4 trades a small accuracy delta for a Blackwell-native acceleration path; the Lightricks NVFP4 card explicitly notes it was "trained by Quantization Aware Distillation for improved accuracy."
For the full benchmark data, see /check/ltx-video-2-3/rtx-5090.
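The community timings and the Issue #303 peak can be turned into more comparable figures with a little arithmetic. The sketch below only restates numbers already quoted in this section; treating the 5090's 32 GB as 32 × 1024 MiB is an assumption (the driver-reported usable total is typically somewhat lower), so read the margin as an upper bound.

```python
def frames_per_second(frames, seconds):
    """Generation throughput in frames per wall-clock second."""
    return frames / seconds


def headroom_mib(peak_mib, card_gib=32):
    """Nominal VRAM left over after the reported peak, in MiB."""
    return card_gib * 1024 - peak_mib


fps_720p = frames_per_second(481, 82)    # bmgjet, 1280x720
fps_1080p = frames_per_second(481, 547)  # bmgjet, 1920x1080
pixel_ratio = (1920 * 1080) / (1280 * 720)
time_ratio = 547 / 82
margin = headroom_mib(29068)             # Issue #303 peak

print(f"720p:  {fps_720p:.2f} frames/s")
print(f"1080p: {fps_1080p:.2f} frames/s")
print(f"{pixel_ratio:.2f}x the pixels took {time_ratio:.2f}x the time")
print(f"nominal headroom: {margin} MiB ({margin / 1024:.1f} GiB)")
```

The 1080p run costs far more than the pixel count alone would suggest (about 6.7× the time for 2.25× the pixels), which is one reason to validate at 720p first.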
Troubleshooting
Gemma 3 12B text encoder OOM on the unquantized loader
The default gemma_3_12B_it.safetensors encoder is the dominant peak-VRAM contributor at first inference: ComfyUI-LTXVideo Issue #303 documents "Peak Usage: 29068 MiB" even with an FP8 LTX-2 19B transformer (sibling variant). On the 5090's 32 GB envelope this fits with ~3 GB of margin if the Gemma encoder is also quantized (FP8 or QAT-Q4); with the unquantized BF16 Gemma weights, the combined peak crosses the ceiling. Use the gemma-3-12b-it-qat-UD-Q4_K_XL.gguf file from Step 4 and confirm peak VRAM stays below 30 GB before scaling resolution.
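One way to confirm the peak is to poll nvidia-smi from a second terminal while a generation runs. This is a rough sketch under stated assumptions: the function names are hypothetical, the 30 GB threshold is interpreted as 30 GiB (30 × 1024 MiB), and one-second polling can miss short spikes, so treat the result as a lower bound on the true peak.

```python
import subprocess
import time

# memory.used with nounits prints one integer (MiB) per GPU line.
QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_memory_used(output):
    """Parse nvidia-smi output into a list of MiB values, one per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def peak_within_budget(samples_mib, budget_gib=30.0):
    """Return (peak_mib, fits) for a series of memory samples."""
    peak = max(samples_mib)
    return peak, peak <= budget_gib * 1024


def sample_first_gpu(seconds=60, interval=1.0):
    """Poll GPU 0 for the given duration and return the MiB samples."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        samples.append(parse_memory_used(result.stdout)[0])
        time.sleep(interval)
    return samples
```

Start a generation in ComfyUI, then call peak_within_budget(sample_first_gpu(seconds=120)) from a separate Python session to see whether the run stayed under the threshold.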
Quality artifacts ("screen-door pattern") on the Dev base model
ComfyUI-LTXVideo Issue #379 is an open thread reporting that the ltx-2-19b-dev base model (sibling variant: LTX-2 19B, not LTX-2.3 22B) produces a "gritty screen-door pattern" on RTX 5090 / Blackwell, while the distilled checkpoint does not show the artifact. A community user (monnky) reproduces the same artifact on an RTX 3060 with the Dev model, so the failure is not Blackwell-specific; it follows the Dev-vs-distilled choice. The recipe above defaults to the distilled checkpoint for that reason. If you need the full Dev model's flexibility, expect the artifact: extra sampling steps are not a confirmed fix (monnky tried 30 steps and the artifact remained), so the reliable workaround is to stay on distilled.
NVFP4 performance gap versus the NVIDIA-published benchmark
ComfyUI-LTXVideo Issue #335 is a closed thread where a community user reports LTX-2 19B NVFP4 (sibling variant, not LTX-2.3) running ~30-40% slower than the published expectation on RTX 5090 (~66 sec for 143 frames at 720p, versus ~40-45 sec expected, i.e. roughly 0.6-0.7× the expected throughput). The thread surfaces PCIe-lane sensitivity and inconsistency between ComfyUI versions, both of which are model-class-independent and likely to affect LTX-2.3 NVFP4 inference too. If you observe slower-than-expected NVFP4 throughput, confirm a PCIe Gen5 x16 connection and update ComfyUI / ComfyUI-LTXVideo to the latest main.
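The PCIe link can be read from nvidia-smi rather than inferred from the motherboard manual. The sketch below parses the pcie.link.gen.current and pcie.link.width.current query fields; the helper names are illustrative. Many GPUs drop to a lower link generation while idle, so a reading taken during an active generation is the one that matters.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=pcie.link.gen.current,pcie.link.width.current",
    "--format=csv,noheader",
]


def parse_pcie_links(output):
    """Parse lines such as '5, 16' into (generation, lane_width) tuples, one per GPU."""
    links = []
    for line in output.splitlines():
        if not line.strip():
            continue
        gen, width = (field.strip() for field in line.split(","))
        links.append((int(gen), int(width)))
    return links


def is_gen5_x16(link):
    """True when a (generation, width) tuple is PCIe Gen5 x16."""
    return link == (5, 16)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True)
        if result.returncode != 0:
            print("nvidia-smi failed:", result.stderr.strip())
        else:
            try:
                for index, link in enumerate(parse_pcie_links(result.stdout)):
                    status = "ok" if is_gen5_x16(link) else "below Gen5 x16"
                    print(f"GPU {index}: Gen{link[0]} x{link[1]} ({status})")
            except ValueError:
                print("Unexpected nvidia-smi output:", result.stdout.strip())
```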
FlashAttention-2 on Blackwell (sm_120)
LTX-2.3's PyTorch codebase defaults to PyTorch SDPA / xformers (not raw FlashAttention-2), so the FA2 sm_120 wheel gap is rarely the blocking issue. If you do try FA2 via --use-flash-attention or a custom wrapper, sm_120 kernel coverage is tracked at Dao-AILab/flash-attention#2168 — fall back to SDPA on Blackwell until that lands.
"32GB+ VRAM" minimum context
The 5090 is the only consumer NVIDIA card that meets the official 32GB+ minimum from the ComfyUI-LTXVideo README. Below this floor (RTX 4090 24 GB, RTX 3090 24 GB, RTX 5080 16 GB, etc.) the documented path is GGUF + quantized Gemma; see the Impact section of Issue #303, which explicitly lists "RTX 5090 / RTX PRO 6000" as the supported consumer envelope. The 16 GB sibling recipe at /check/ltx-video-2-3/rtx-5060-ti covers the GGUF-squeeze path.