LTX-2.3 on RTX 4090: 22B Audio-Video in the 24 GB Gap — Roomier GGUF, Still Below the 32 GB Native Floor

video · advanced · 20GB+ VRAM · Jun 26, 2026

This advanced recipe sets up LTX-2.3 on the RTX 4090, needing about 20 GB of VRAM.

Prerequisites
  • NVIDIA RTX 4090 (24GB VRAM, Ada sm_89) — 24 GB sits between the 16 GB GGUF floor and the 32 GB native floor
  • 64GB system RAM recommended (the Gemma 3 12B text encoder is quantized and offloaded/streamed, not run unquantized on-GPU)
  • Python 3.12+ and CUDA 12.7+ — the LTX-2.3 codebase was tested with Python >=3.12; on the RTX 4090 (Ada, sm_89) the default `pip install torch` already ships sm_89 kernels in the stable cu124-class wheel, so no cu128 / special index-url is needed
  • ComfyUI installed (latest version) + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes
  • ~30GB free disk space for the roomier GGUF transformer + connectors + encoder + VAE (the files in Steps 3–5 total about 30.2 GB with Q6_K, or about 27.7 GB with Q5_K_S)

What You'll Build

Generate short, synchronized audio + video clips locally with LTX-2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 24 GB RTX 4090. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README) — and the RTX 4090's 24 GB falls below that 32 GB native floor. So this recipe runs the distilled GGUF transformer, exactly like the 16 GB cards — but the 4090's extra 8 GB buys you a roomier, higher-fidelity quant (Q5_K_S / Q6_K instead of the 16 GB floor's Q4_K_S) and room to keep a quantized Gemma 3 12B encoder alongside it.

Hardware data: RTX 4090 (24GB VRAM, Ada sm_89) · distilled Q6_K GGUF + quantized Gemma 3 12B · See benchmark data

⚠️ 24 GB is "in the gap" — better than the 16 GB floor, short of the 32 GB native fit. Two facts define the 4090's tier. (1) The unquantized Gemma 3 12B encoder "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per ComfyUI-LTXVideo Issue #303 — so it does not fit on a 24 GB card either. A 4090 owner confirms it directly: "So I tried this on my 24 GB VRAM 4090. No shot at running Gemma. But I tried the --novram option..." (Issue #303 comment). (2) The Blackwell-native paths the RTX 5090 sibling leans on are off the table here: the official NVFP4 quant requires Blackwell sm_120 microscaling hardware (Ada and Ampere cannot accelerate it), and Kijai's scaled FP8 transformers are ~23–25 GB on disk (Kijai/LTX2.3_comfy/tree/main/diffusion_models) — they alone nearly fill the 24 GB card with no room for the encoder + VAE + activations. GGUF, not native, on a 4090.

Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT for the older LTX-2 19B line (repo Lightricks/LTX-2) and NOT for the LTX-Video 0.9.x 2B/13B family (repo Lightricks/LTX-Video). Several Issue #303 reports cited below were filed against LTX-2 19B; they are cited only for the shared Gemma 3 12B encoder failure mode, which is identical across both lines because they use the same 12B encoder.

ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement, not Apache-2.0: the model card sets license: other with license_name: ltx-2-community-license-agreement. Read the license before any commercial use.

Requirements

Component | Minimum | Tested
--- | --- | ---
GPU | 16GB VRAM (Q4_K_S floor) | RTX 4090 (24GB, Ada sm_89), roomier Q5/Q6 quant
RAM | 32GB | 64GB recommended (Issue #303: 16/24 GB-card reporters used 32–64GB)
Storage | ~30GB | ~30GB (Q6_K transformer + connectors + quantized Gemma encoder + VAE)
Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, CUDA 12.7+

The full unquantized LTX-2.3 requires a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. The 24 GB 4090 is below that floor, so you still run a distilled GGUF quant — but with 8 GB more headroom than the 16 GB cards, you can step the transformer up from Q4_K_S to Q5_K_S or Q6_K for better fidelity, and you can keep the quantized Gemma encoder resident rather than streaming it (see Running).

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

On the RTX 4090 (Ada Lovelace, sm_89) the default pip install torch already includes sm_89 kernels in the stable cu124-class wheel — unlike Blackwell (sm_120), no special --index-url / cu128 wheel selection is needed. The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7" per the Lightricks/LTX-2.3 model card.
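To confirm which architecture PyTorch actually sees before downloading any weights, a small check along these lines can help. It is a sketch, not part of the Lightricks or ComfyUI tooling: `describe_gpu` is a hypothetical helper, and its thresholds are simplifications (FP8 matmul from sm_89 upward; NVFP4 assumed for any Blackwell-class part, although this recipe only names sm_120). The `torch` calls run only if PyTorch is already installed.

```python
def describe_gpu(capability):
    """Summarize a CUDA compute capability tuple such as (8, 9)."""
    major, minor = capability
    return {
        "sm": f"sm_{major}{minor}",
        # Ada (8, 9) and newer expose FP8 matmul; Ampere (8, 0) / (8, 6) does not.
        "fp8_matmul": (major, minor) >= (8, 9),
        # Assumption: Blackwell-class parts report major >= 10 (the recipe names sm_120).
        "nvfp4": major >= 10,
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment yet")
    else:
        if torch.cuda.is_available():
            print(describe_gpu(torch.cuda.get_device_capability(0)))
        else:
            print("PyTorch cannot see a CUDA device")
```

On an RTX 4090 the capability tuple should be (8, 9), which this helper reports as sm_89 with FP8 matmul available and NVFP4 unavailable.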

2. Install the LTX-Video, GGUF, and KJNodes custom nodes

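# Run from the directory that contains ComfyUI/ (cd .. first if you are still inside it from Step 1), with the venv active
cd ComfyUI/custom_nodes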

# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

# city96's GGUF loader — required for the quantized transformer + Gemma-GGUF encoder
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

# Kijai's KJNodes — used by the recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt

The unsloth/LTX-2.3-GGUF model card lists city96/ComfyUI-GGUF and kijai/ComfyUI-KJNodes for the GGUF workflow; the Lightricks LTXVideo nodes provide the LTX-2.3 sampler and example workflows.

3. Download a roomier distilled GGUF transformer (the 24 GB advantage)

The distilled checkpoint runs at 8 steps with CFG 1.0 and is the right choice for a consumer card. Where the 16 GB floor recipe tops out at Q4_K_S (13.12 GB) to leave room for the encoder + VAE activations, the 4090's 24 GB lets you go higher for better fidelity — Q5_K_S (15.25 GB) or Q6_K (17.77 GB):

# Q6_K distilled (17.77 GB on disk) — highest practical quant for 24 GB
huggingface-cli download unsloth/LTX-2.3-GGUF \
  distilled/ltx-2.3-22b-distilled-Q6_K.gguf \
  --local-dir ComfyUI/models/unet/

# OR Q5_K_S distilled (15.25 GB) — more headroom for a resident encoder
huggingface-cli download unsloth/LTX-2.3-GGUF \
  distilled/ltx-2.3-22b-distilled-Q5_K_S.gguf \
  --local-dir ComfyUI/models/unet/

Distilled-transformer GGUF file sizes (verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):

Quant | unsloth distilled file size | 4090 fit notes
--- | --- | ---
Q3_K_S | 9.95 GB | 16 GB floor's headroom pick
Q4_K_S | 13.12 GB | 16 GB floor's standard pick
Q5_K_S | 15.25 GB | comfortable on 24 GB + resident quantized encoder
Q6_K | 17.77 GB | highest practical 24 GB quant
Q8_0 | 22.76 GB | nearly fills the card; encoder must offload
BF16 | 42.04 GB | out of scope for 24 GB

There is also a newer distilled-1.1 revision in the same repo (a "different aesthetic experience and improved audio compared to v1.0" per the Lightricks/LTX-2.3 model card); either revision works and this recipe pins the original distilled/. The full ltx-2.3-22b-dev BF16 transformer is 42.04 GB on disk (unsloth/LTX-2.3-GGUF tree) and is out of scope for 24 GB.
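The quant choice above can be sketched as a lookup over the listed file sizes. This is a rough illustration only: it assumes the on-disk size approximates the VRAM the transformer occupies, and the `resident_extra_gb` and `headroom_gb` parameters (and the values passed to them) are hypothetical knobs, not measured figures.

```python
# Distilled GGUF file sizes in GB, copied from the size list above.
DISTILLED_GGUF_GB = {
    "Q3_K_S": 9.95,
    "Q4_K_S": 13.12,
    "Q5_K_S": 15.25,
    "Q6_K": 17.77,
    "Q8_0": 22.76,
    "BF16": 42.04,
}


def largest_quant_that_fits(vram_gb, resident_extra_gb=0.0, headroom_gb=0.0):
    """Return the largest listed quant whose file fits the remaining budget, or None."""
    # Assumption: file size on disk stands in for the transformer's VRAM footprint.
    budget = vram_gb - resident_extra_gb - headroom_gb
    fitting = [(size, name) for name, size in DISTILLED_GGUF_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    # 24 GB card, encoder streamed, 4 GB kept free for VAE decode and activations.
    print(largest_quant_that_fits(24, headroom_gb=4))
    # 24 GB card with the 7.43 GB QAT-Q4 encoder resident and 1 GB of headroom.
    print(largest_quant_that_fits(24, resident_extra_gb=7.43, headroom_gb=1))
```

Under these assumed headroom values the helper picks Q6_K when the encoder is streamed and Q5_K_S when it stays resident, which matches the fit notes above.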

4. Download the embeddings connectors and VAE

The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE (LTX-2.3 ships them separately for the GGUF flow):

huggingface-cli download unsloth/LTX-2.3-GGUF \
  text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
  vae/ltx-2.3-22b-distilled_video_vae.safetensors \
  vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
  --local-dir ComfyUI/models/

(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — verified live via the unsloth/LTX-2.3-GGUF tree API.)

5. Download a quantized Gemma 3 12B text encoder

LTX-2.3 uses Gemma 3 12B as its text encoder. The unquantized Gemma "needs ~24-27GB VRAM to operate" per Issue #303 — which, as the 4090 owner above found, does not fit even on 24 GB. So you download a quantized encoder. The GGUF QAT-Q4 encoder is the smallest broadly-supported option:

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  mmproj-BF16.gguf \
  --local-dir ComfyUI/models/text_encoders/

The encoder file is 7.43 GB and the mmproj 0.85 GB (verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API). An FP8 single-file alternative (gemma_3_12B_it_fp8_e4m3fn.safetensors, 13.21 GB at GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn) is documented in Issue #303's workaround comment — see Troubleshooting for when to prefer it.
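Before starting the downloads it can be worth adding up the file sizes quoted in Steps 3 to 5 and comparing the total with the free space on the target drive. The sketch below does that with the standard library; the helper names are hypothetical, and the sizes are the ones stated in this recipe rather than values fetched from Hugging Face.

```python
import shutil

# File sizes in GB as quoted in Steps 3-5 of this recipe.
TRANSFORMER_GB = {"Q5_K_S": 15.25, "Q6_K": 17.77}
SHARED_GB = {
    "embeddings connectors": 2.31,
    "video VAE": 1.45,
    "audio VAE": 0.36,
    "Gemma QAT-Q4 encoder": 7.43,
    "Gemma mmproj": 0.85,
}


def total_download_gb(quant):
    """Sum the chosen transformer with every shared component."""
    return round(TRANSFORMER_GB[quant] + sum(SHARED_GB.values()), 2)


def has_room(path, needed_gb):
    """True if the filesystem holding `path` has at least `needed_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


if __name__ == "__main__":
    for quant in TRANSFORMER_GB:
        need = total_download_gb(quant)
        print(f"{quant}: {need} GB needed, fits here: {has_room('.', need)}")
```

With the quoted sizes, the Q6_K path comes to 30.17 GB and the Q5_K_S path to 27.65 GB, before any Hugging Face cache overhead.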

Running

The 24 GB card gives you two viable strategies — pick based on whether you want the encoder resident (faster) or want maximum transformer headroom.

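Strategy A: quantized encoder resident (the 24 GB advantage). A Q5_K_S or Q6_K transformer (15.25 or 17.77 GB) plus the QAT-Q4 Gemma (7.43 GB) comes to roughly 22.7 or 25.2 GB of weights, so Q5_K_S fits with little to spare and Q6_K slightly exceeds 24 GB before activations are counted. The budget is tight but workable; reserve a little VRAM so ComfyUI spills only the overflow: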

# Reserve headroom so the encoder co-resides cleanly with the transformer
python main.py --listen --reserve-vram 4

--reserve-vram is the official low-VRAM knob from the ComfyUI-LTXVideo README ("Use --reserve-vram ComfyUI parameter"). Keeping the quantized encoder on-GPU lets the GPU process the text-encode pass instead of the CPU — the speed win the 16 GB cards can't get.
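The Strategy A budget can be worked through numerically. The sketch below estimates how much of the transformer-plus-encoder weight set would have to spill to system RAM for a given `--reserve-vram` value. It assumes file size on disk approximates the resident footprint and ignores activations, so treat the output as a rough envelope; `resident_plan` is a hypothetical helper, not a ComfyUI API.

```python
def resident_plan(vram_gb, reserve_gb, transformer_gb, encoder_gb):
    """Estimate the usable budget, the wanted weights, and the overflow in GB."""
    usable = vram_gb - reserve_gb          # what --reserve-vram leaves for models
    wanted = transformer_gb + encoder_gb   # assumption: disk size ~ VRAM footprint
    spill = max(0.0, wanted - usable)      # portion that cannot stay on the GPU
    return {
        "usable_gb": round(usable, 2),
        "wanted_gb": round(wanted, 2),
        "spill_gb": round(spill, 2),
    }


if __name__ == "__main__":
    # RTX 4090 with --reserve-vram 4 and the QAT-Q4 Gemma encoder (7.43 GB).
    print("Q6_K  :", resident_plan(24, 4, 17.77, 7.43))
    print("Q5_K_S:", resident_plan(24, 4, 15.25, 7.43))
```

Under these assumptions Q6_K would spill about 5.2 GB and Q5_K_S about 2.7 GB with 4 GB reserved, which is one reason to prefer Q5_K_S when keeping the encoder on the GPU matters most.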

Strategy B — stream from RAM (always fits, slower text-encode). This is the exact path the 4090 owner used after finding the unquantized Gemma wouldn't fit (comment):

# Stream all weights from RAM — GPU footprint collapses, text-encode runs on CPU
python main.py --listen --novram

With --novram an RTX 5080 16GB owner reports the GPU footprint drops dramatically: "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter). Treat that 3 GB figure as the 5080 reporter's measurement on the streaming path, not a 4090 benchmark.

Open the browser UI and load one of the example workflows shipped by the Lightricks node:

ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/

Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 3. Wire the Gemma 3 12B encoder through the GGUF text-encoder loader. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.

Recommended distilled settings

Parameter | Value | Source
--- | --- | ---
Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card
CFG | 1.0 | Same
Resolution | width & height divisible by 32 | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32."
Frame count | divisible by 8 + 1 (e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1."

Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution.
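The two shape rules are easy to get wrong when scaling up, so a quick validator can save a failed run. The sketch below reads "divisible by 8 + 1" as "a multiple of 8, plus 1", which matches the 65, 97, 121 examples. The function names are hypothetical, and the snapping helper rounds down so it never exceeds the requested length.

```python
def is_valid_resolution(width, height):
    """Both dimensions must be positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(frames):
    """Frame count must be a multiple of 8, plus 1 (9, 17, ..., 65, 97, 121)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_frame_count_down(frames):
    """Largest valid frame count that does not exceed `frames` (minimum 1)."""
    return max(0, (frames - 1) // 8) * 8 + 1


if __name__ == "__main__":
    print(is_valid_resolution(512, 512), is_valid_frame_count(65))
    print(snap_frame_count_down(100))
```

For example, a requested 100 frames snaps down to 97, and 512×512 at 65 frames passes both checks.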

Results

  • Speed: Omitted. No RTX 4090 benchmark for LTX-2.3 22B at a fixed configuration has been published, and /check/ltx-video-2-3/rtx-4090 currently has no benchmark data. The only same-model community timings are on a different card (an RTX 5090, 32 GB, faster memory bandwidth — see the 5090 recipe) and cannot transfer down to the 4090 cleanly. Empirical 4090 timings will appear at /check/ltx-video-2-3/rtx-4090 once a community benchmark lands via /contribute.
  • VRAM usage: This recipe's 20 GB minimum reflects its lead path: a roomier resident GGUF transformer (Q6_K, 17.77 GB on disk) plus the VAE-decode activations, with the quantized Gemma encoder either co-resident (Strategy A) or streamed (Strategy B). This is a derived working envelope for the 24 GB card, not a measured peak; a 24 GB card has room to step up the transformer quant beyond the 16 GB floor's Q4_K_S (13.12 GB). What 24 GB does not buy you is the unquantized Gemma encoder on-GPU: it OOMs at "Peak Usage: 29068 MiB" (Issue #303 body, filed on a 16 GB card) and a 4090 owner confirms "No shot at running Gemma" (comment). See /check/ltx-video-2-3/rtx-4090.
  • Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. On 24 GB you can run Q6_K (17.77 GB) for noticeably less quantization loss than the 16 GB floor's Q4_K_S — the main quality lever this card class unlocks. Note the 4090's Ada (sm_89, 40-series) architecture does support FP8 matmul on Kijai's input-scaled FP8 transformers (per the Kijai/LTX2.3_comfy README: set to "run with fp8 matmuls on supported hardware (roughly 40xx and later Nvidia GPUs)"), unlike Ampere — but at 23–25 GB on disk those FP8 files leave no room for the encoder on 24 GB, so GGUF remains the practical lead here.

For the full benchmark data, see /check/ltx-video-2-3/rtx-4090.

Troubleshooting

"No shot at running Gemma" — the unquantized encoder OOMs even on 24 GB

This is the defining 4090 surprise: 24 GB looks like it should fit a 12B encoder, but it does not. The unquantized Gemma 3 12B "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per Issue #303, and a 4090 owner reports "No shot at running Gemma" on their 24 GB card (comment). Fixes, in order of preference:

  1. Use the quantized Gemma from Step 5 (QAT-Q4 GGUF, 7.43 GB) — small enough to co-reside with a Q5/Q6 transformer on 24 GB.
  2. Launch with --reserve-vram 4 (Strategy A) to keep the quantized encoder resident while spilling overflow, or --novram (Strategy B) to stream everything from RAM.
  3. The FP8 single-file gemma_3_12B_it_fp8_e4m3fn.safetensors (13.21 GB) from a workaround comment is an alternative, but at 13.21 GB it leaves little room next to a large transformer on 24 GB — prefer the QAT-Q4 GGUF unless you specifically need the FP8 encoder.

mat1 and mat2 shapes cannot be multiplied after enabling --novram

The same 4090 owner hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a --preview-method latent2rgb flag (Issue #303 comment). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
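If the launch command lives in a script, the fix above can be applied mechanically. The sketch below drops the sage-attention and preview-method flags whenever `--novram` is present and leaves everything else, including `--fast fp16_accumulation`, untouched. `strip_conflicting_flags` is a hypothetical helper, and the flag list reflects only the combination reported in Issue #303.

```python
# Flags to drop under --novram, mapped to how many value tokens follow each one.
CONFLICTING = {"--use-sage-attention": 0, "--preview-method": 1}


def strip_conflicting_flags(argv):
    """Return a copy of argv without the flags tied to the mat1/mat2 error."""
    if "--novram" not in argv:
        return list(argv)
    cleaned, skip = [], 0
    for token in argv:
        if skip:
            skip -= 1  # swallow the value that belonged to a removed flag
            continue
        name = token.split("=", 1)[0]
        if name in CONFLICTING:
            # The --flag=value form carries its value inline, so nothing follows.
            skip = 0 if "=" in token else CONFLICTING[name]
            continue
        cleaned.append(token)
    return cleaned


if __name__ == "__main__":
    cmd = ["python", "main.py", "--listen", "--novram", "--use-sage-attention",
           "--fast", "fp16_accumulation", "--preview-method", "latent2rgb"]
    print(" ".join(strip_conflicting_flags(cmd)))
```

The command list is left unchanged when `--novram` is absent, so the same helper can wrap both strategies.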

Encoder is slow on CPU under --novram

The --novram path moves the Gemma encoder to CPU, which is slow for the text-encode pass — this is the main reason to prefer Strategy A on a 24 GB card. If you do stay on --novram, a community user reports the "LTXV Audio Text Encoder Loader" node loads Gemma "8x times faster then normal loader" (sic) (Issue #303 comment). Replace the default Gemma loader with it and load the single safetensors encoder file.

NVFP4 will not accelerate on the 4090

The official Lightricks/LTX-2.3-nvfp4 quant requires Blackwell sm_120 microscaling hardware. The RTX 4090 is Ada (sm_89), so NVFP4 cannot be accelerated on it — that path belongs to the RTX 5090 recipe. Stick to GGUF (or the FP8-matmul-capable Kijai transformers, VRAM permitting) on the 4090.

FlashAttention on Ada (sm_89)

Unlike Blackwell (sm_120), the RTX 4090's Ada sm_89 architecture is fully covered by prebuilt FlashAttention wheels, so there is no kernel-availability gap to work around — and LTX-2.3's ComfyUI path defaults to PyTorch SDPA regardless. If you compile the optional LTX-Video FP8-matmul kernels, make sure your CUDA toolkit is 12.8+; a mismatched toolkit produces an sm89 assertion failure at the FP8 matmul (Issue #182). The GGUF path in this recipe does not use those kernels.

Audio-video output not synchronized

LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.

Common questions
How much VRAM does LTX-2.3 need?

About 20 GB — the minimum this recipe targets.

Which GPUs is LTX-2.3 tested on?

RTX 4090 (24 GB).

How hard is this setup?

Advanced — follow the steps above.