
LTX Video 2.3 on RTX 5090: First Card That Hits the 32 GB Floor

video · intermediate · 28GB+ VRAM · May 25, 2026
prerequisites
  • NVIDIA RTX 5090 (32 GB VRAM) — the only consumer card that meets the official 32GB+ minimum
  • Python 3.10+ and CUDA 12.7+ (PyTorch 2.7-class; cu128 build recommended for sm_120)
  • ComfyUI installed (latest version) + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes
  • 32 GB+ system RAM (64 GB recommended for the unquantized Gemma 3 12B text encoder)
  • ~50 GB free disk space (full BF16) or ~25 GB (FP8) or ~21 GB (NVFP4)

What You'll Build

Generate synchronized audio-video clips locally with LTX Video 2.3, Lightricks' 22B-parameter DiT audio-video foundation model, on an RTX 5090, the first consumer GPU that meets the 32GB+ VRAM minimum stated in the official ComfyUI-LTXVideo README. Where the 16 GB RTX 5060 Ti sibling recipe has to lean on a Q4_K_S GGUF plus a quantized Gemma encoder to squeeze under its envelope, the 5090 runs the FP8 transformer alongside the Gemma 3 12B text encoder without spillover (this recipe uses the QAT Q4 GGUF from Step 4; see Troubleshooting for why the unquantized BF16 encoder does not fit beside it). The 5090 also adds a Blackwell-native NVFP4 path (Lightricks/LTX-2.3-nvfp4) that older Ada and Ampere cards cannot accelerate.

Hardware data: RTX 5090 (32 GB VRAM) · FP8 distilled at 8 steps · See benchmark data

Variant pin. This recipe is for LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3, released 2026-04-13). It is NOT for LTX-2 (19B sibling, repo Lightricks/LTX-2) and NOT for the older LTX-Video 0.9.x line (2B / 13B, repo Lightricks/LTX-Video). All three live under the same Lightricks umbrella; the version suffix is load-bearing.

Requirements

| Component | Minimum | Tested |
|---|---|---|
| GPU | 32 GB VRAM per the official ComfyUI-LTXVideo README | RTX 5090 (32 GB) |
| RAM | 32 GB | 64 GB (HF discussion #16 bmgjet single-card 5090 setup) |
| Storage | ~25 GB (FP8) / ~21 GB (NVFP4) / ~50 GB (full BF16) | |
| Software | ComfyUI + ComfyUI-LTXVideo, Python 3.10+, CUDA 12.7+, PyTorch ~2.7 | |

The official ComfyUI-LTXVideo README states a 32 GB+ VRAM minimum for LTX-2.3, which the 5090 meets exactly. The 16 GB sibling recipe at /check/ltx-video-2-3/rtx-5060-ti is the GGUF-squeeze path for cards below this floor.
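The storage figures above can be turned into a quick pre-flight check before any download starts. This is a minimal sketch: the `REQUIRED_GB` mapping and the `has_room` helper are hypothetical names introduced here, the numbers are the approximate figures from this recipe, and the check assumes decimal gigabytes.

```python
import shutil

# Approximate free-space guidance from this recipe, in decimal GB (assumption).
REQUIRED_GB = {"fp8": 25, "nvfp4": 21, "bf16": 50}


def has_room(free_bytes: int, variant: str) -> bool:
    """Return True if free_bytes covers the recipe's stated footprint for variant."""
    return free_bytes >= REQUIRED_GB[variant] * 10**9


if __name__ == "__main__":
    # Free space on the volume that holds the current directory.
    free = shutil.disk_usage(".").free
    for variant in REQUIRED_GB:
        status = "ok" if has_room(free, variant) else "not enough space"
        print(f"{variant}: {status}")
```

Run it from the directory where ComfyUI will live so that `shutil.disk_usage` inspects the right volume.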

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

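On Blackwell (sm_120) install a cu128-class PyTorch wheel. Depending on the PyTorch release and platform, the default pip install torch may or may not resolve to a cu128 build; if the installed wheel does not target sm_120, reinstall with pip install torch --index-url https://download.pytorch.org/whl/cu128. Per the Lightricks/LTX-2.3 model card, the codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7." That tested Python floor is higher than the 3.10+ listed in the prerequisites, so prefer Python 3.12 if you hit dependency problems.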
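One way to confirm that the installed wheel actually covers the 5090 is to compare the GPU's compute capability with the architectures the wheel was built for. The sketch below is illustrative: `wheel_targets_gpu` and `main` are hypothetical helpers, and the exact-match test is a simplification (a wheel may also carry PTX entries such as `compute_120`, which this check ignores).

```python
def wheel_targets_gpu(capability, arch_list):
    """Return True if arch_list contains native kernels for this capability.

    capability: (major, minor) tuple, e.g. (12, 0) for sm_120.
    arch_list:  strings such as "sm_90" or "sm_120".
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("CUDA runtime bundled with the wheel:", torch.version.cuda)
    print("GPU capability:", cap, "| wheel targets:", archs)
    print("native sm_120 kernels present:", wheel_targets_gpu(cap, archs))


if __name__ == "__main__":
    main()
```

If the last line prints `False` on a 5090, the wheel is the likely culprit rather than ComfyUI or the LTX nodes.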

2. Install the LTX-Video, GGUF, and KJNodes custom nodes

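# Run from the directory that contains ComfyUI/ (use `cd ..` first if you are still inside it after Step 1)
cd ComfyUI/custom_nodes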

# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

# city96's GGUF loader — required for GGUF and Gemma-GGUF text encoder
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

# Kijai's KJNodes — used by recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt

This trio of node packs is the same one listed by the unsloth/LTX-2.3-GGUF model card for the GGUF workflow and is the canonical setup across the LTX-2.3 sibling recipes.

3. Download the FP8 distilled checkpoint (primary path)

Kijai's official ComfyUI-shaped mirror at Kijai/LTX2.3_comfy ships scaled FP8 transformers that target Ada / Hopper / Blackwell FP8 matmul. Per the Kijai/LTX2.3_comfy README: "the models marked input_scaled additionally have activation scaling, and are set to run with fp8 matmuls on supported hardware (roughly 40xx and later Nvidia GPUs)." The 5090's Blackwell sm_120 hits this fast path.

# Run these from the directory that contains ComfyUI/.
# huggingface-cli keeps each file's repo-relative path under --local-dir,
# so point --local-dir at ComfyUI/models/ and the diffusion_models/ prefix lands in the right folder.

# FP8 distilled transformer (~21.86 GB on disk)
huggingface-cli download Kijai/LTX2.3_comfy \
  diffusion_models/ltx-2.3-22b-distilled_transformer_only_fp8_scaled.safetensors \
  --local-dir ComfyUI/models/

# Or the input-scaled FP8 matmul variant (~21.57 GB)
huggingface-cli download Kijai/LTX2.3_comfy \
  diffusion_models/ltx-2.3-22b-distilled_transformer_only_fp8_input_scaled_v2.safetensors \
  --local-dir ComfyUI/models/

# VAE + text projection (Kijai mirror exposes these standalone)
huggingface-cli download Kijai/LTX2.3_comfy \
  vae/LTX23_video_vae_bf16.safetensors \
  vae/LTX23_audio_vae_bf16.safetensors \
  text_encoders/ltx-2.3_text_projection_bf16.safetensors \
  --local-dir ComfyUI/models/

File sizes verified via Kijai/LTX2.3_comfy/tree/main/diffusion_models.
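Because huggingface-cli download keeps each file's repository-relative path beneath --local-dir, the choice of --local-dir decides whether a checkpoint lands where the ComfyUI loaders look for it. The sketch below only models that path join; `download_destination` is a hypothetical helper, not part of huggingface_hub.

```python
from pathlib import PurePosixPath

REPO_FILE = (
    "diffusion_models/"
    "ltx-2.3-22b-distilled_transformer_only_fp8_scaled.safetensors"
)


def download_destination(local_dir: str, repo_file: str) -> str:
    """Path a repo file ends up at when its repo-relative path is kept under local_dir."""
    return str(PurePosixPath(local_dir) / repo_file)


# --local-dir already ends in diffusion_models/: the folder name is doubled.
nested = download_destination("ComfyUI/models/diffusion_models/", REPO_FILE)
# --local-dir is the models root: the file lands where the loader expects it.
flat = download_destination("ComfyUI/models/", REPO_FILE)

print(nested)
print(flat)
```

If a checkpoint does not show up in the diffusion-model loader's dropdown, a doubled `diffusion_models/diffusion_models/` folder is worth checking for first.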

4. Download the Gemma 3 12B text encoder

LTX-2.3 uses Gemma 3 12B as its text encoder. At 32 GB VRAM the QAT Q4 GGUF (a 4-bit quantized build, not the unquantized weights) runs alongside the FP8 transformer comfortably:

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  mmproj-BF16.gguf \
  --local-dir ComfyUI/models/text_encoders/

Cards below the 32 GB floor must use the smaller-encoder workarounds documented in ComfyUI-LTXVideo Issue #303. That issue was filed against the sibling LTX-2 19B, but the encoder OOM mechanism does not depend on the transformer variant and applies equally to LTX-2.3, since both use the same Gemma 3 12B encoder. The 5090 sidesteps that entire failure class.

5. (Optional, Blackwell-only) Download the NVFP4 transformer

Lightricks ships an official NVFP4 quant trained by Quantization-Aware Distillation at Lightricks/LTX-2.3-nvfp4. The NVFP4 file is a single 20.21 GB safetensors that fits inside 32 GB with comfortable headroom for the encoder + VAE + activations:

huggingface-cli download Lightricks/LTX-2.3-nvfp4 \
  ltx-2.3-22b-dev-nvfp4.safetensors \
  --local-dir ComfyUI/models/diffusion_models/

The NVFP4 format requires Blackwell sm_120 microscaling hardware — Ada (RTX 4090) and Ampere (RTX 3090) cards cannot accelerate this format. On the 5090 it is the fastest single-stream path documented; see Troubleshooting for the known LTX-2 19B sibling-variant performance regression at ComfyUI-LTXVideo Issue #335.

6. (Alternative) GGUF distilled path

If you want to share VRAM with another model on the same card, the GGUF distilled files at QuantStack/LTX-2.3-GGUF and unsloth/LTX-2.3-GGUF ship the same per-tier ladder as the 16 GB sibling recipe — useful if you need < 20 GB resident:

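| Quant | QuantStack distilled file size | unsloth dev file size |
|---|---|---|
| Q3_K_S | 13.00 GB | 9.26 GB |
| Q4_K_S | 15.56 GB | 12.22 GB |
| Q4_K_M | 16.54 GB | 13.34 GB |
| Q5_K_M | 18.06 GB | 14.97 GB |
| Q6_K | 19.56 GB | 16.55 GB |
| Q8_0 | 23.75 GB | 21.19 GB |

BF16: 39.15 GB (a single size is listed for this tier)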

Sizes verified via QuantStack/LTX-2.3-GGUF/tree/main/LTX-2.3-distilled and unsloth/LTX-2.3-GGUF tree API.
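Picking a tier for a given VRAM budget is a lookup over the sizes listed above. A minimal sketch, assuming the QuantStack distilled sizes as transcribed in this recipe; `largest_quant_under` is a hypothetical helper, and on-disk size is only a lower bound on resident VRAM because activations, the VAE and the text encoder come on top.

```python
# On-disk sizes in GB for the QuantStack distilled files, as listed in this recipe.
QUANTSTACK_DISTILLED = {
    "Q3_K_S": 13.00,
    "Q4_K_S": 15.56,
    "Q4_K_M": 16.54,
    "Q5_K_M": 18.06,
    "Q6_K": 19.56,
    "Q8_0": 23.75,
}


def largest_quant_under(budget_gb, sizes=QUANTSTACK_DISTILLED):
    """Return the name of the largest quant whose file fits budget_gb, or None."""
    fitting = [(size, name) for name, size in sizes.items() if size <= budget_gb]
    return max(fitting)[1] if fitting else None


print(largest_quant_under(20.0))  # Q6_K
print(largest_quant_under(17.0))  # Q4_K_M
print(largest_quant_under(10.0))  # None
```

Leave a few GB of slack below the real budget, since the file size excludes everything the sampler allocates at run time.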

Running

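Launch ComfyUI from inside the ComfyUI directory with the virtual environment active. The --listen flag makes the server reachable from other machines on the network; omit it for local-only use: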

python main.py --listen

Load one of the example workflows shipped by the Lightricks node:

ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/

Wire the FP8 transformer (or NVFP4 transformer) into the diffusion-model loader and point the text encoder at the Gemma 3 12B GGUF you downloaded in Step 4.

Recommended distilled settings

| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width and height each divisible by 32 | Lightricks/LTX-2.3 model card "General tips" |
| Frame count | multiple of 8, plus 1 (e.g. 65, 97, 121, 241, 481) | Same |

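Start at 1280×720 / 65 frames to confirm the install before scaling to 1920×1080 multi-second clips. Note that 720 and 1080 are not themselves multiples of 32 (the nearest multiples are 704 or 736, and 1056 or 1088), so if a workflow rejects or silently resizes those heights, use the nearest multiple instead.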
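The divisible-by-32 and 8k + 1 rules are easy to get wrong by hand, since common heights such as 720 and 1080 are not multiples of 32. The helpers below are a sketch for snapping requested values to the nearest valid ones; `snap_dimension` and `snap_frames` are hypothetical names, and rounding ties upward is an arbitrary choice made here.

```python
def snap_dimension(pixels: int, multiple: int = 32) -> int:
    """Round pixels to the nearest multiple (ties round up), at least one multiple."""
    return max(multiple, (pixels + multiple // 2) // multiple * multiple)


def snap_frames(frames: int) -> int:
    """Round a frame count to the nearest value of the form 8*k + 1 (minimum 9)."""
    k = max(1, (frames - 1 + 4) // 8)  # integer rounding, ties round up
    return 8 * k + 1


print(snap_dimension(1280), snap_dimension(720))   # 1280 736
print(snap_dimension(1920), snap_dimension(1080))  # 1920 1088
print(snap_frames(65), snap_frames(100))           # 65 97
```

Widths of 1280 and 1920 already satisfy the rule; only the heights move.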

Results

  • Speed: A community user (bmgjet, HF discussion #16, comment dated 2026-03-08) with 64 GB RAM and a single RTX 5090, running LTX-2.3-22b Dev FP8 at 8 steps, reported "82 sec for 481 frames" at 1280×720 and "547sec for 481 frames at 1920X1080". Both measurements come from a community member (not Hugging Face staff, no Lightricks team badge) rather than from Lightricks; first-party 5090 timings have not been published. Corroborating data will appear at /check/ltx-video-2-3/rtx-5090 as community benchmarks land via /contribute.
  • VRAM usage: The official ComfyUI-LTXVideo README states "32GB+ VRAM" as the prerequisite for the canonical install, which the 5090 meets exactly. The on-disk FP8 distilled transformer is 21.86 GB (Kijai/LTX2.3_comfy/tree/main/diffusion_models) and the NVFP4 alternative is 20.21 GB (Lightricks/LTX-2.3-nvfp4/tree/main); the Gemma 3 12B text encoder loads on top of that. ComfyUI-LTXVideo Issue #303 reports a "Peak Usage: 29068 MiB" on a 16 GB RTX 5080 running LTX-2 19B (sibling variant) with FP8 + unquantized Gemma. That peak sits roughly 3.6 GiB (32768 − 29068 = 3700 MiB) under the 5090's nominal 32 GiB; because it was measured on the smaller 19B sibling, treat it as an optimistic estimate of the margin on the LTX-2.3 22B path rather than a measurement.
  • Quality notes: The distilled checkpoint (8 steps, CFG=1) is the recommended balance for the 5090 — the full ltx-2.3-22b-dev BF16 file is 42.98 GB on disk per Lightricks/LTX-2.3/tree/main and will exceed the 32 GB envelope without offload. NVFP4 trades a small accuracy delta for a Blackwell-native acceleration path; the Lightricks NVFP4 card explicitly notes it was "trained by Quantization Aware Distillation for improved accuracy."

For the full benchmark data, see /check/ltx-video-2-3/rtx-5090.
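The two community timings quoted above can be put on a common footing by converting them to generated frames per second and comparing the slowdown with the increase in pixel count. This is plain arithmetic on the reported numbers, not an additional benchmark; the variable names are illustrative.

```python
def frames_per_second(frames: int, seconds: float) -> float:
    """Generated frames per wall-clock second."""
    return frames / seconds


fps_720p = frames_per_second(481, 82)     # "82 sec for 481 frames" at 1280x720
fps_1080p = frames_per_second(481, 547)   # "547sec for 481 frames at 1920X1080"
slowdown = 547 / 82                        # wall-clock ratio, 1080p vs 720p
pixel_ratio = (1920 * 1080) / (1280 * 720)

print(f"720p:  {fps_720p:.2f} frames/s")   # 5.87
print(f"1080p: {fps_1080p:.2f} frames/s")  # 0.88
print(f"slowdown {slowdown:.2f}x for {pixel_ratio:.2f}x the pixels")  # 6.67x, 2.25x
```

The 1080p run takes about 6.67 times longer for 2.25 times the pixels, so the cost grows faster than linearly in resolution for this pair of reports.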

Troubleshooting

Gemma 3 12B text encoder OOM on the unquantized loader

The default gemma_3_12B_it.safetensors encoder (12B parameters) is the dominant peak-VRAM contributor at first inference: ComfyUI-LTXVideo Issue #303 documents "Peak Usage: 29068 MiB" even with the FP8 LTX-2 19B transformer (sibling variant). On the 5090's 32 GB envelope this fits with roughly 3.6 GiB of margin (32768 − 29068 = 3700 MiB) if the Gemma encoder is also quantized (FP8 or QAT Q4). With the unquantized BF16 Gemma weights, the combined peak crosses the ceiling. Use the gemma-3-12b-it-qat-UD-Q4_K_XL.gguf file from Step 4 and confirm peak VRAM stays below 30 GB before scaling resolution.
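The margin quoted from Issue #303 is a one-line subtraction once both numbers are in MiB. The sketch below assumes the card's nominal 32 GiB (32768 MiB); the driver typically reports slightly less usable memory than the nominal figure, so the real margin is a little smaller. `margin_mib` is a hypothetical helper.

```python
MIB_PER_GIB = 1024


def margin_mib(capacity_gib: int, peak_mib: int) -> int:
    """Headroom in MiB between a nominal capacity and a reported peak."""
    return capacity_gib * MIB_PER_GIB - peak_mib


margin = margin_mib(32, 29068)  # peak taken from ComfyUI-LTXVideo Issue #303
print(margin, "MiB")                          # 3700 MiB
print(round(margin / MIB_PER_GIB, 2), "GiB")  # 3.61 GiB
```

A negative result from the same helper (for example with a 16 GiB card) is the OOM case that issue describes.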

Quality artifacts ("screen-door pattern") on the Dev base model

ComfyUI-LTXVideo Issue #379 is an open thread reporting that the ltx-2-19b-dev base model (sibling variant: LTX-2 19B, not LTX-2.3 22B) produces a "gritty screen-door pattern" on RTX 5090 / Blackwell, while the distilled checkpoint does not show the artifact. A community user (monnky) reproduces the same artifact on an RTX 3060 with the Dev model, so the failure is not Blackwell-specific; it follows the Dev-vs-distilled choice. The recipe above defaults to the distilled checkpoint for that reason. If you need the full Dev model's flexibility, expect the artifact: extra sampling steps are worth trying but are not a confirmed fix (monnky reports that 30 steps did not remove it), so the distilled checkpoint remains the reliable workaround.

NVFP4 performance gap versus the NVIDIA-published benchmark

ComfyUI-LTXVideo Issue #335 is a closed thread where a community user reports LTX-2 19B NVFP4 (sibling variant, not LTX-2.3) running ~30-40% below the published throughput on RTX 5090: ~66 sec for 143 frames at 720p versus ~40-45 sec expected, which is roughly 47-65% more wall-clock time. The thread surfaces PCIe-lane sensitivity and inconsistent results between ComfyUI versions, neither of which depends on the model variant, so both are likely to affect LTX-2.3 NVFP4 inference too. If you observe slower-than-expected NVFP4 throughput, confirm that the card is running on a PCIe Gen5 x16 link and update ComfyUI and ComfyUI-LTXVideo to the latest main.
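The PCIe link can be read from nvidia-smi rather than inferred from the motherboard manual. The sketch below queries the current link generation and width and compares them with Gen5 x16; `parse_link`, `is_full_link` and `main` are hypothetical helpers. GPUs commonly drop to a lower link generation when idle, so run the check while a generation is in progress.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=pcie.link.gen.current,pcie.link.width.current",
    "--format=csv,noheader",
]


def parse_link(csv_line: str) -> tuple[int, int]:
    """Parse a line such as '5, 16' into (generation, width)."""
    gen, width = (int(field.strip()) for field in csv_line.split(","))
    return gen, width


def is_full_link(csv_line: str, want_gen: int = 5, want_width: int = 16) -> bool:
    """True if the reported link matches the wanted generation and lane count."""
    return parse_link(csv_line) == (want_gen, want_width)


def main():
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine")
        return
    for line in out.strip().splitlines():
        try:
            verdict = "ok" if is_full_link(line) else "below Gen5 x16"
        except ValueError:
            verdict = "could not parse"
        print(f"{line} -> {verdict}")


if __name__ == "__main__":
    main()
```

A reading such as `5, 8` under load usually points at slot wiring or lane sharing with an M.2 device rather than at ComfyUI.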

FlashAttention-2 on Blackwell (sm_120)

LTX-2.3's PyTorch codebase defaults to PyTorch SDPA / xformers (not raw FlashAttention-2), so the FA2 sm_120 wheel gap is rarely the blocking issue. If you do try FA2 via --use-flash-attention or a custom wrapper, sm_120 kernel coverage is tracked at Dao-AILab/flash-attention#2168 — fall back to SDPA on Blackwell until that lands.

"32GB+ VRAM" minimum context

The 5090 is the only consumer NVIDIA card that meets the official 32GB+ minimum from the ComfyUI-LTXVideo README. Below this floor (RTX 4090 24 GB, RTX 3090 24 GB, RTX 5080 16 GB, etc.) the documented path is GGUF + quantized Gemma; see the Impact section of Issue #303, which explicitly lists "RTX 5090 / RTX PRO 6000" as the supported envelope (the RTX PRO 6000 is a workstation card rather than a consumer one). The 16 GB sibling recipe at /check/ltx-video-2-3/rtx-5060-ti covers the GGUF-squeeze path.