What You'll Build
Generate short, synchronized audio + video clips locally with LTX-2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 24 GB RTX 4090. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README) — and the RTX 4090's 24 GB falls below that 32 GB native floor. So this recipe runs the distilled GGUF transformer, exactly like the 16 GB cards — but the 4090's extra 8 GB buys you a roomier, higher-fidelity quant (Q5_K_S / Q6_K instead of the 16 GB floor's Q4_K_S) and room to keep a quantized Gemma 3 12B encoder alongside it.
Hardware data: RTX 4090 (24GB VRAM, Ada sm_89) · distilled Q6_K GGUF + quantized Gemma 3 12B · See benchmark data
⚠️ 24 GB is "in the gap" — better than the 16 GB floor, short of the 32 GB native fit. Two facts define the 4090's tier. (1) The unquantized Gemma 3 12B encoder "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per ComfyUI-LTXVideo Issue #303 — so it does not fit on a 24 GB card either. A 4090 owner confirms it directly: "So I tried this on my 24 GB VRAM 4090. No shot at running Gemma. But I tried the --novram option..." (Issue #303 comment). (2) The Blackwell-native paths the RTX 5090 sibling leans on are off the table here: the official NVFP4 quant requires Blackwell sm_120 microscaling hardware (Ada and Ampere cannot accelerate it), and Kijai's scaled FP8 transformers are ~23–25 GB on disk (Kijai/LTX2.3_comfy/tree/main/diffusion_models) — they alone nearly fill the 24 GB card with no room for the encoder + VAE + activations. GGUF, not native, on a 4090.
Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT for the older LTX-2 19B line (repo Lightricks/LTX-2) and NOT for the LTX-Video 0.9.x 2B/13B family (repo Lightricks/LTX-Video). Several Issue #303 reports cited below were filed against LTX-2 19B; they are cited only for the shared Gemma 3 12B encoder failure mode, which is identical across both lines because they use the same 12B encoder.
ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement (the model card's license: is other with license_name: ltx-2-community-license-agreement, license link), not Apache-2.0. Read the license before any commercial use.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM (Q4_K_S floor) | RTX 4090 (24GB, Ada sm_89) — roomier Q5/Q6 quant |
| RAM | 32GB | 64GB recommended (Issue #303 — 16/24 GB-card reporters used 32–64GB) |
| Storage | ~26GB (Q4_K_S transformer + support files) | ~30GB (Q6_K transformer 17.77 + connectors 2.31 + quantized Gemma encoder 7.43 + mmproj 0.85 + VAEs 1.81) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, CUDA > 12.7 (the versions the model card reports testing) |
The full unquantized LTX-2.3 requires a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. The 24 GB 4090 is below that floor, so you still run a distilled GGUF quant — but with 8 GB more headroom than the 16 GB cards, you can step the transformer up from Q4_K_S to Q5_K_S or Q6_K for better fidelity, and you can keep the quantized Gemma encoder resident rather than streaming it (see Running).
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
On the RTX 4090 (Ada Lovelace, sm_89) the default pip install torch wheel (stable, cu124-class) already runs on sm_89; unlike Blackwell (sm_120), no special --index-url / cu128 wheel selection is needed. The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7" per the Lightricks/LTX-2.3 model card.
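A quick way to confirm that the installed wheel actually sees the card is to ask PyTorch for the device's compute capability. The sketch below is illustrative and not part of the Lightricks tooling: `describe_capability` and `report_gpu` are hypothetical helper names, and the architecture table covers only the three GPU families this recipe mentions. The `torch.cuda` calls are standard PyTorch functions.

```python
# Hypothetical helpers for a post-install sanity check.
# Only the torch.cuda.* calls are real PyTorch APIs.

# Compute capabilities for the GPU families mentioned in this recipe.
KNOWN_ARCHS = {
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
    (12, 0): "Blackwell",
}


def describe_capability(major: int, minor: int) -> str:
    """Format a compute capability as e.g. 'sm_89 (Ada Lovelace)'."""
    family = KNOWN_ARCHS.get((major, minor), "unlisted architecture")
    return f"sm_{major}{minor} ({family})"


def report_gpu() -> str:
    """Summarize what the installed PyTorch wheel sees on device 0."""
    try:
        import torch  # lazy import so describe_capability works without torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if not torch.cuda.is_available():
        return "CUDA not available: check the driver and the installed torch wheel"

    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    built_for = ", ".join(torch.cuda.get_arch_list())
    return f"{name}: {describe_capability(major, minor)}; wheel built for: {built_for}"


if __name__ == "__main__":
    print(report_gpu())
```

Run it inside the ComfyUI virtual environment. On an RTX 4090 the capability part should read sm_89 (Ada Lovelace).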
2. Install the LTX-Video, GGUF, and KJNodes custom nodes
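# Run from the directory that contains your ComfyUI checkout
# (if you are still inside ComfyUI/ from step 1, use: cd custom_nodes)
cd ComfyUI/custom_nodes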
# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
# city96's GGUF loader — required for the quantized transformer + Gemma-GGUF encoder
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt
# Kijai's KJNodes — used by the recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
The unsloth/LTX-2.3-GGUF model card lists city96/ComfyUI-GGUF and kijai/ComfyUI-KJNodes for the GGUF workflow; the Lightricks LTXVideo nodes provide the LTX-2.3 sampler and example workflows.
3. Download a roomier distilled GGUF transformer (the 24 GB advantage)
The distilled checkpoint runs at 8 steps with CFG 1.0 and is the right choice for a consumer card. Where the 16 GB floor recipe tops out at Q4_K_S (13.12 GB) to leave room for the encoder + VAE activations, the 4090's 24 GB lets you go higher for better fidelity — Q5_K_S (15.25 GB) or Q6_K (17.77 GB):
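# Run from the directory that contains ComfyUI/; huggingface-cli keeps the repo's
# distilled/ subfolder, so the file lands in ComfyUI/models/unet/distilled/
# Q6_K distilled (17.77 GB on disk): highest practical quant for 24 GB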
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q6_K.gguf \
--local-dir ComfyUI/models/unet/
# OR Q5_K_S distilled (15.25 GB) — more headroom for a resident encoder
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q5_K_S.gguf \
--local-dir ComfyUI/models/unet/
Distilled-transformer GGUF file sizes (verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):
| Quant | unsloth distilled file size | 4090 fit notes |
|---|---|---|
| Q3_K_S | 9.95 GB | 16 GB floor's headroom pick |
| Q4_K_S | 13.12 GB | 16 GB floor's standard pick |
| Q5_K_S | 15.25 GB | comfortable on 24 GB + resident quantized encoder |
| Q6_K | 17.77 GB | highest practical 24 GB quant |
| Q8_0 | 22.76 GB | nearly fills the card — encoder must offload |
| BF16 | 42.04 GB | out of scope for 24 GB |
There is also a newer distilled-1.1 revision in the same repo (a "different aesthetic experience and improved audio compared to v1.0" per the Lightricks/LTX-2.3 model card); either revision works and this recipe pins the original distilled/. The full ltx-2.3-22b-dev BF16 transformer is 42.04 GB on disk (unsloth/LTX-2.3-GGUF tree) and is out of scope for 24 GB.
4. Download the embeddings connectors and VAE
The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE (LTX-2.3 ships them separately for the GGUF flow):
huggingface-cli download unsloth/LTX-2.3-GGUF \
text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
vae/ltx-2.3-22b-distilled_video_vae.safetensors \
vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
--local-dir ComfyUI/models/
(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — verified live via the unsloth/LTX-2.3-GGUF tree API.)
5. Download a quantized Gemma 3 12B text encoder
LTX-2.3 uses Gemma 3 12B as its text encoder. The unquantized Gemma "needs ~24-27GB VRAM to operate" per Issue #303 — which, as the 4090 owner above found, does not fit even on 24 GB. So you download a quantized encoder. The GGUF QAT-Q4 encoder is the smallest broadly-supported option:
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
mmproj-BF16.gguf \
--local-dir ComfyUI/models/text_encoders/
The encoder file is 7.43 GB and the mmproj 0.85 GB (verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API). An FP8 single-file alternative (gemma_3_12B_it_fp8_e4m3fn.safetensors, 13.21 GB at GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn) is documented in Issue #303's workaround comment — see Troubleshooting for when to prefer it.
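Before starting the downloads it can help to total the file sizes quoted in Steps 3 to 5 and compare them with free disk space. The following is a minimal sketch under stated assumptions: the sizes are copied from this recipe and treated as decimal GB, and `total_download_gb` and `free_disk_gb` are hypothetical helper names. Only `shutil.disk_usage` is a standard-library call.

```python
import shutil

# On-disk sizes in GB, copied from the figures quoted in this recipe.
TRANSFORMER_GB = {"Q4_K_S": 13.12, "Q5_K_S": 15.25, "Q6_K": 17.77}
SUPPORT_FILES_GB = {
    "embeddings connectors": 2.31,
    "video VAE": 1.45,
    "audio VAE": 0.36,
    "Gemma QAT-Q4 encoder": 7.43,
    "mmproj": 0.85,
}


def total_download_gb(quant: str) -> float:
    """Sum one transformer quant plus every support file."""
    return round(TRANSFORMER_GB[quant] + sum(SUPPORT_FILES_GB.values()), 2)


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem that holds `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    free = free_disk_gb()
    for quant in TRANSFORMER_GB:
        need = total_download_gb(quant)
        verdict = "ok" if free > need else "not enough free space"
        print(f"{quant}: {need:.2f} GB needed, {free:.1f} GB free -> {verdict}")
```

With the sizes above, the Q6_K set comes to roughly 30 GB and the Q5_K_S set to roughly 27.7 GB, before any pip or ComfyUI overhead.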
Running
The 24 GB card gives you two viable strategies — pick based on whether you want the encoder resident (faster) or want maximum transformer headroom.
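Strategy A: quantized encoder resident (the 24 GB advantage). A Q5_K_S transformer (15.25 GB) plus the QAT-Q4 Gemma (7.43 GB) is about 22.7 GB of weights, which is tight but workable on 24 GB. Q6_K (17.77 GB) plus the encoder is about 25.2 GB, so with Q6_K some weights will always be offloaded. Either way, reserve a little VRAM so ComfyUI spills only the overflow: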
# Reserve headroom so the encoder co-resides cleanly with the transformer
python main.py --listen --reserve-vram 4
--reserve-vram is the official low-VRAM knob from the ComfyUI-LTXVideo README ("Use --reserve-vram ComfyUI parameter"). Keeping the quantized encoder on-GPU lets the GPU process the text-encode pass instead of the CPU — the speed win the 16 GB cards can't get.
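The Strategy A budget can be sketched as simple arithmetic. This is an illustration under loud assumptions, not a measurement: on-disk file sizes stand in for resident VRAM, the reserve is simply subtracted from the card's nominal 24 GB, and activations, the VAE, and the CUDA context are not modeled. `weights_gb` and `overflow_gb` are hypothetical helper names.

```python
# Illustrative arithmetic only. Assumptions: file size == resident size,
# decimal GB and the card's nominal 24 GB are treated as the same unit,
# and activations / VAE / CUDA context are NOT included.
CARD_GB = 24.0
ENCODER_GB = 7.43  # Gemma 3 12B QAT-Q4 GGUF, size quoted in Step 5
TRANSFORMER_GB = {"Q5_K_S": 15.25, "Q6_K": 17.77}  # sizes quoted in Step 3


def weights_gb(quant: str) -> float:
    """Transformer plus quantized encoder, in GB."""
    return round(TRANSFORMER_GB[quant] + ENCODER_GB, 2)


def overflow_gb(quant: str, reserve_gb: float = 4.0) -> float:
    """GB of weights that cannot stay on the GPU (0.0 when everything fits)."""
    budget = CARD_GB - reserve_gb
    return round(max(0.0, weights_gb(quant) - budget), 2)


if __name__ == "__main__":
    for quant in TRANSFORMER_GB:
        print(
            f"{quant}: {weights_gb(quant):.2f} GB of weights, "
            f"{overflow_gb(quant):.2f} GB over a {CARD_GB - 4.0:.0f} GB budget"
        )
```

Under these assumptions, Q5_K_S plus the encoder exceeds a 20 GB budget by about 2.7 GB and Q6_K by about 5.2 GB. That difference is the portion ComfyUI would have to offload, which is why Q5_K_S is the more comfortable choice when the goal is keeping the encoder on the GPU.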
Strategy B — stream from RAM (always fits, slower text-encode). This is the exact path the 4090 owner used after finding the unquantized Gemma wouldn't fit (comment):
# Stream all weights from RAM — GPU footprint collapses, text-encode runs on CPU
python main.py --listen --novram
With --novram an RTX 5080 16GB owner reports the GPU footprint drops dramatically: "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter). Treat that 3 GB figure as the 5080 reporter's measurement on the streaming path, not a 4090 benchmark.
Open the browser UI and load one of the example workflows shipped by the Lightricks node:
ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/
Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 3. Wire the Gemma 3 12B encoder through the GGUF text-encoder loader. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.
Recommended distilled settings
| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width & height divisible by 32 | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | divisible by 8 + 1 (e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |
Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution.
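The two shape constraints in the table are easy to get wrong when scaling up, so a small validator can help. The sketch below encodes only the rules quoted from the model card (sides divisible by 32, frame count of the form 8k + 1). The function names are hypothetical, and rounding down to the nearest valid value is a choice made here, not something the model card prescribes.

```python
def is_valid_size(width: int, height: int) -> bool:
    """Both sides must be positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(frames: int) -> bool:
    """Frame count must have the form 8*k + 1 (e.g. 65, 97, 121)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_side(pixels: int) -> int:
    """Round a side length down to a multiple of 32 (never below 32)."""
    return max(32, (pixels // 32) * 32)


def snap_frames(frames: int) -> int:
    """Round a frame count down to the nearest 8*k + 1 (never below 1)."""
    k = max(0, (frames - 1) // 8)
    return 8 * k + 1


if __name__ == "__main__":
    # The recipe's suggested starting point passes both checks.
    print(is_valid_size(512, 512), is_valid_frame_count(65))
    # A 720p-style request is snapped to legal values.
    print(snap_side(1280), snap_side(720), snap_frames(100))
```

For example, a requested height of 720 is not divisible by 32 and snaps down to 704, and a requested 100 frames snaps down to 97.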
Results
- Speed: Omitted. No RTX 4090 benchmark for LTX-2.3 22B at a fixed configuration has been published, and /check/ltx-video-2-3/rtx-4090 currently has no benchmark data. The only same-model community timings are on a different card (an RTX 5090, 32 GB, faster memory bandwidth — see the 5090 recipe) and cannot transfer down to the 4090 cleanly. Empirical 4090 timings will appear at /check/ltx-video-2-3/rtx-4090 once a community benchmark lands via /contribute.
- VRAM usage: The min_vram_gb: 20 reflects this recipe's lead path: a roomier resident GGUF transformer (Q6_K, 17.77 GB on disk) plus the VAE-decode activations, with the quantized Gemma encoder either co-resident (Strategy A) or streamed (Strategy B). This is a derived working envelope for the 24 GB card, not a measured peak; a 24 GB card has room to step up the transformer quant beyond the 16 GB floor's Q4_K_S (13.12 GB). What 24 GB does not buy you is the unquantized Gemma encoder on-GPU: it OOMs at "Peak Usage: 29068 MiB" (Issue #303 body, filed on a 16 GB card) and a 4090 owner confirms "No shot at running Gemma" (comment). See /check/ltx-video-2-3/rtx-4090.
- Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. On 24 GB you can run Q6_K (17.77 GB) for noticeably less quantization loss than the 16 GB floor's Q4_K_S, which is the main quality lever this card class unlocks. Note the 4090's Ada (sm_89, 40-series) architecture does support FP8 matmul on Kijai's input-scaled FP8 transformers (per the Kijai/LTX2.3_comfy README: set to "run with fp8 matmuls on supported hardware (roughly 40xx and later Nvidia GPUs)"), unlike Ampere, but at 23–25 GB on disk those FP8 files leave no room for the encoder on 24 GB, so GGUF remains the practical lead here.
For the full benchmark data, see /check/ltx-video-2-3/rtx-4090.
Troubleshooting
"No shot at running Gemma" — the unquantized encoder OOMs even on 24 GB
This is the defining 4090 surprise: 24 GB looks like it should fit a 12B encoder, but it does not. The unquantized Gemma 3 12B "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per Issue #303, and a 4090 owner reports "No shot at running Gemma" on their 24 GB card (comment). Fixes, in order of preference:
- Use the quantized Gemma from Step 5 (QAT-Q4 GGUF, 7.43 GB) — small enough to co-reside with a Q5/Q6 transformer on 24 GB.
- Launch with --reserve-vram 4 (Strategy A) to keep the quantized encoder resident while spilling overflow, or --novram (Strategy B) to stream everything from RAM.
- The FP8 single-file gemma_3_12B_it_fp8_e4m3fn.safetensors (13.21 GB) from a workaround comment is an alternative, but at 13.21 GB it leaves little room next to a large transformer on 24 GB; prefer the QAT-Q4 GGUF unless you specifically need the FP8 encoder.
mat1 and mat2 shapes cannot be multiplied after enabling --novram
The same 4090 owner hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a --preview-method latent2rgb flag (Issue #303 comment). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
Encoder is slow on CPU under --novram
The --novram path moves the Gemma encoder to CPU, which is slow for the text-encode pass — this is the main reason to prefer Strategy A on a 24 GB card. If you do stay on --novram, a community user reports the "LTXV Audio Text Encoder Loader" node loads Gemma "8x times faster then normal loader" (sic) (Issue #303 comment). Replace the default Gemma loader with it and load the single safetensors encoder file.
NVFP4 will not accelerate on the 4090
The official Lightricks/LTX-2.3-nvfp4 quant requires Blackwell sm_120 microscaling hardware. The RTX 4090 is Ada (sm_89), so NVFP4 cannot be accelerated on it — that path belongs to the RTX 5090 recipe. Stick to GGUF (or the FP8-matmul-capable Kijai transformers, VRAM permitting) on the 4090.
FlashAttention on Ada (sm_89)
Unlike Blackwell (sm_120), the RTX 4090's Ada sm_89 architecture is fully covered by prebuilt FlashAttention wheels, so there is no kernel-availability gap to work around — and LTX-2.3's ComfyUI path defaults to PyTorch SDPA regardless. If you compile the optional LTX-Video FP8-matmul kernels, make sure your CUDA toolkit is 12.8+; a mismatched toolkit produces an sm89 assertion failure at the FP8 matmul (Issue #182). The GGUF path in this recipe does not use those kernels.
Audio-video output not synchronized
LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.