How much VRAM does LTX-2 need?

About 24 GB — the minimum this recipe targets.

How hard is this setup?

Advanced. Follow the installation steps below.

LTX-2 (19B) on RTX 4090: Audio-Video in the 24 GB Gap — FP8 + Quantized Gemma, Below the 32 GB Native Floor

What You'll Build

Generate short, synchronized audio + video clips locally with LTX-2 — Lightricks' 19B-parameter DiT audio-video foundation model — on a 24 GB RTX 4090. Per the Lightricks/LTX-2 model card, LTX-2 is "a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README) — and the RTX 4090's 24 GB falls below that 32 GB native floor. So this recipe runs the FP8 transformer with weight offloading plus a quantized Gemma 3 12B encoder, the only combination that fits a 24 GB card.

Hardware data: RTX 4090 (24GB VRAM, Ada sm_89) · ltx-2-19b-dev-fp8 + FP8 Gemma 3 12B encoder · See benchmark data

⚠️ 24 GB is "in the gap": below the 32 GB native fit, above the 16 GB streaming minimum. Two facts define the 4090's tier.

(1) LTX-2's text encoder is Gemma 3 12B, and unquantized it "needs ~24-27GB VRAM to operate". ComfyUI-LTXVideo Issue #303, filed against ltx-2-19b-dev-fp8.safetensors, reports an out-of-memory failure at "Peak Usage: 29068 MiB", so the full encoder does not fit on a 24 GB card. A 4090 owner confirms it directly: "So I tried this on my 24 GB VRAM 4090. No shot at running Gemma. But I tried the --novram option, seems like it actually passes the enhanced prompt node." (Issue #303 comment, community reporter).

(2) The native FP8 transformer is 25.22 GB on disk (Lightricks/LTX-2 tree). Alone it exceeds 24 GB, so it must be offloaded rather than held fully resident.

The 4090 therefore runs an offloaded FP8 transformer plus a quantized encoder, not a fully resident native fit.
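The gap argument is plain arithmetic, and a few lines of Python make it easy to re-check. This is a minimal sketch: every figure is one quoted in this recipe, the helper name `overflow_mib` is hypothetical, and treating the "24 GB" card as 24 GiB is an assumption.

```python
# Figures quoted in this recipe; the script measures nothing itself.
CARD_VRAM_MIB = 24 * 1024        # assumption: the "24 GB" card exposes 24 GiB
GEMMA_PEAK_MIB = 29068           # unquantized encoder "Peak Usage" (Issue #303)
FP8_TRANSFORMER_GB = 25.22       # on-disk size of the FP8 transformer
FP8_GEMMA_GB = 12.30             # on-disk size of the quantized encoder


def overflow_mib(peak_mib: int, capacity_mib: int = CARD_VRAM_MIB) -> int:
    """Hypothetical helper: MiB by which a peak exceeds the card (0 if it fits)."""
    return max(0, peak_mib - capacity_mib)


# Both weight files together, ignoring activations and framework overhead.
combined_weights_gb = FP8_TRANSFORMER_GB + FP8_GEMMA_GB

print(f"Unquantized Gemma overshoots the card by {overflow_mib(GEMMA_PEAK_MIB)} MiB")
print(f"FP8 transformer + FP8 encoder on disk: {combined_weights_gb:.2f} GB")
```

The unquantized encoder overshoots by roughly 4.4 GiB, and the two quantized files together come to 37.52 GB on disk, which is why this recipe relies on offloading instead of holding everything resident.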

Variant pin. This recipe targets LTX-2 19B (canonical repo Lightricks/LTX-2, 19B audio-video DiT). It is NOT for the newer LTX-2.3 22B line (repo Lightricks/LTX-2.3; see the LTX-2.3 RTX 4090 recipe) and NOT for the older LTX-Video 0.9.x 2B/13B family (repo Lightricks/LTX-Video). All three are Lightricks lines with different checkpoints — pick the files from Lightricks/LTX-2.

ℹ️ License. The LTX-2 weights are released under the LTX-2 Community License (license_name: ltx-2-community-license-agreement, license link) — not Apache-2.0. Read the license before any commercial use.

Requirements

Component	Minimum	Tested
GPU	24GB VRAM with offloading	RTX 4090 (24GB, Ada sm_89)
RAM	32GB	64GB recommended (Issue #303 reporters used 32–64GB)
Storage	~43GB	~43GB (FP8 transformer 25.22 GB + FP8 encoder 12.30 GB + connectors, VAEs, and vocoder ≈ 5.15 GB)
Software	ComfyUI (+ optional ComfyUI-LTXVideo)	Python 3.12+, CUDA 12.7+

The full unquantized LTX-2 wants a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. The 24 GB 4090 is below that floor, so you run the FP8 transformer with offloading and a quantized Gemma encoder — see Running for the offload flags.

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

On the RTX 4090 (Ada Lovelace, sm_89) the default pip install torch already includes sm_89 kernels in the stable cu124-class wheel — unlike Blackwell (sm_120), no special --index-url / cu128 wheel selection is needed. The LTX-2 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7" per the Lightricks/LTX-2 model card. LTX-2 is built into ComfyUI core, so the base install already recognizes the model.

2. (Optional) Install the LTX-Video custom nodes

The official Lightricks node pack adds the LTX-2 sampler and the example workflows referenced below:

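# Run from the directory that contains your ComfyUI checkout; later steps use paths relative to it
cd ComfyUI/custom_nodes
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
# Return to the parent directory so the download paths in Steps 3-5 resolve
cd ../..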

You can also install it from the ComfyUI Manager (search "LTXVideo"). The node pack ships LTX-2 19B example workflows under example_workflows/2.0/ (e.g. LTX-2_T2V_Distilled_wLora.json, LTX-2_I2V_Distilled_wLora.json).

3. Download the FP8 transformer (the 24 GB lead path)

The native BF16 transformer is 40.31 GB and the NVFP4 quant requires Blackwell hardware (see Troubleshooting), so the FP8 transformer is the practical 4090 choice. Download either the dev or the distilled FP8 checkpoint from the canonical repo:

# Distilled FP8 (8 steps, CFG 1.0) — fastest practical path on a consumer card
huggingface-cli download Lightricks/LTX-2 \
  ltx-2-19b-distilled-fp8.safetensors \
  --local-dir ComfyUI/models/diffusion_models/

# OR dev FP8 (full model, more steps, higher fidelity)
huggingface-cli download Lightricks/LTX-2 \
  ltx-2-19b-dev-fp8.safetensors \
  --local-dir ComfyUI/models/diffusion_models/
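If you would rather script the download than use the CLI, the same fetch can be sketched in Python. This assumes the `huggingface_hub` package is installed; `hf_hub_download` is its real download function, while `CHECKPOINTS`, `download_plan`, and `download` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

REPO_ID = "Lightricks/LTX-2"

# File names as given in this recipe.
CHECKPOINTS = {
    "distilled": "ltx-2-19b-distilled-fp8.safetensors",
    "dev": "ltx-2-19b-dev-fp8.safetensors",
}


def download_plan(variant: str, models_root: str = "ComfyUI/models") -> dict:
    """Hypothetical helper: build the keyword arguments for hf_hub_download."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}")
    return {
        "repo_id": REPO_ID,
        "filename": CHECKPOINTS[variant],
        "local_dir": (Path(models_root) / "diffusion_models").as_posix(),
    }


def download(variant: str) -> str:
    """Fetch the checkpoint and return its local path (needs network access)."""
    # Imported lazily so the plan can be inspected without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**download_plan(variant))
```

Calling `download("distilled")` from the directory that contains the ComfyUI checkout should place the file under `ComfyUI/models/diffusion_models/`, matching the CLI commands.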

LTX-2 19B transformer file sizes (verified live via the Lightricks/LTX-2 tree API):

Checkpoint	File size	4090 fit notes
`ltx-2-19b-dev-fp4`	18.62 GB	NVFP4 — needs Blackwell sm_120 to accelerate (see Troubleshooting)
`ltx-2-19b-distilled-fp8`	25.22 GB	lead path — offload required (exceeds 24 GB alone)
`ltx-2-19b-dev-fp8`	25.22 GB	full model, FP8 — offload required
`ltx-2-19b-distilled`	40.31 GB	BF16 distilled — out of scope for 24 GB
`ltx-2-19b-dev`	40.31 GB	BF16 full — out of scope for 24 GB

Per the Lightricks/LTX-2 card, ltx-2-19b-distilled is the "distilled version of the full model, 8 steps, CFG=1". On a 24 GB card, use its FP8 build (ltx-2-19b-distilled-fp8); the BF16 ltx-2-19b-distilled file is 40.31 GB and out of scope.

4. Download the connectors and VAE

The transformer needs its text-projection connectors and the audio + video VAE (the canonical repo ships these as subfolders):

huggingface-cli download Lightricks/LTX-2 \
  --include "connectors/*" "vae/*" "audio_vae/*" "vocoder/*" \
  --local-dir ComfyUI/models/LTX-2/

(Connectors 2.67 GB, video VAE 2.28 GB, audio VAE 0.10 GB, vocoder 0.10 GB — verified live via the Lightricks/LTX-2 tree API.)
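After the download, a quick check that all four subfolders exist and are non-empty can save a confusing failure later (for example, silent video when the audio VAE or vocoder is missing). This is a minimal sketch: the folder names come from the `--include` patterns in this step, and `missing_components` is a hypothetical helper name.

```python
from pathlib import Path

# Subfolders requested by the --include patterns in this step.
REQUIRED_SUBFOLDERS = ("connectors", "vae", "audio_vae", "vocoder")


def missing_components(root: str) -> list:
    """Return the required subfolders that are absent or empty under root."""
    base = Path(root)
    missing = []
    for name in REQUIRED_SUBFOLDERS:
        folder = base / name
        # A folder counts as present only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_components("ComfyUI/models/LTX-2")
    print("All components present" if not gaps else f"Missing: {', '.join(gaps)}")
```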

5. Download a quantized Gemma 3 12B text encoder

LTX-2 uses Gemma 3 12B as its text encoder (model_index.json declares Gemma3ForConditionalGeneration). Unquantized it is 22.71 GB on disk (Comfy-Org/ltx-2 tree) and "needs ~24-27GB VRAM to operate" per Issue #303 — so you download a quantized encoder. Two well-supported options:

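# Option A: ComfyUI core team's FP8-scaled Gemma (12.30 GB)
huggingface-cli download Comfy-Org/ltx-2 \
  split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors \
  --local-dir ComfyUI/models/
# The repo path is preserved under --local-dir, so move the file into text_encoders/
mv ComfyUI/models/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors \
  ComfyUI/models/text_encoders/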

# Option B: GitMylo's fp8_e4m3fn Gemma (12.30 GB), purpose-built for LTX-2
huggingface-cli download GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn \
  gemma_3_12B_it_fp8_e4m3fn.safetensors \
  --local-dir ComfyUI/models/text_encoders/

The Comfy-Org/ltx-2 repackage publishes the encoder at three sizes: gemma_3_12B_it_fp8_scaled.safetensors (12.30 GB), gemma_3_12B_it_fp4_mixed.safetensors (8.80 GB), and the full gemma_3_12B_it.safetensors (22.71 GB). The GitMylo FP8 file is an "fp8_e4m3fn conversion of the text encoder ... which is used for [LTX-2]" and its README says "simply select the text encoder from here instead" of the fp16 Gemma (GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn). All three encoder download URLs returned HTTP 200 or 302 when checked.
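To budget disk space, the file sizes quoted in Steps 3 to 5 can simply be added up. The sketch below tallies the lead path (one FP8 transformer plus one FP8 encoder); the dictionary and function names are hypothetical, and the sizes are the ones stated in this recipe.

```python
# On-disk sizes in GB, as quoted in this recipe.
LEAD_PATH_FILES_GB = {
    "FP8 transformer": 25.22,
    "FP8 Gemma 3 12B encoder": 12.30,
    "connectors": 2.67,
    "video VAE": 2.28,
    "audio VAE": 0.10,
    "vocoder": 0.10,
}


def total_gb(files: dict) -> float:
    """Sum the file sizes and round to two decimals."""
    return round(sum(files.values()), 2)


for name, size in LEAD_PATH_FILES_GB.items():
    print(f"{name:<26}{size:>6.2f} GB")
print(f"{'Total':<26}{total_gb(LEAD_PATH_FILES_GB):>6.2f} GB")
```

The lead path comes to about 42.67 GB, before counting ComfyUI itself, the Python environment, or a second transformer if you download both the dev and distilled FP8 files.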

Running

The 24 GB card cannot hold the FP8 transformer (25.22 GB) plus the quantized encoder fully resident, so weight offloading is the rule, not the exception. Two viable launch strategies:

Strategy A — offload with --reserve-vram (recommended). ComfyUI streams the overflow to system RAM while keeping the active layers on-GPU. --reserve-vram is the official low-VRAM knob from the ComfyUI-LTXVideo README ("Use --reserve-vram ComfyUI parameter"):

python main.py --listen --reserve-vram 5

The README also documents dedicated low-VRAM loader nodes (low_vram_loaders.py) that "perform the model offloading such that generation fits in 32 GB VRAM" (ComfyUI-LTXVideo README). On the 24 GB 4090, combine those loaders with --reserve-vram to get under the line. A community 4090/5080 thread reports the same offload approach working: "main.py --windows-standalone-build --lowvram --disable-smart-memory --reserve-vram 1.0" (Issue #308 comment, community reporter). The body of that issue states the error plainly: "This error means you ran out of memory on your GPU."

Strategy B — stream everything from RAM with --novram (always fits, slower text-encode). This is the path the 4090 owner fell back to (Issue #303 comment):

python main.py --listen --novram

With --novram a community RTX 5080 16GB owner reports the GPU footprint collapses: "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter). Treat that 3 GB figure as the 5080 reporter's measurement on the streaming path, not a 4090 benchmark — the 4090's advantage is keeping more on-GPU via Strategy A for faster text-encode.
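If you switch between the two strategies often, a small launcher keeps the flag sets in one place. This is a sketch under stated assumptions: `launch_args` is a hypothetical helper, the flags are the ones shown in this section, and it expects to be run with ComfyUI's virtual environment active.

```python
import sys


def launch_args(strategy: str, reserve_gb: float = 5) -> list:
    """Build the ComfyUI command line for Strategy A or B from this recipe."""
    args = [sys.executable, "main.py", "--listen"]
    if strategy == "A":
        # Offload the overflow to system RAM, keeping reserve_gb of VRAM free.
        args += ["--reserve-vram", f"{reserve_gb:g}"]
    elif strategy == "B":
        # Stream everything from RAM.
        args.append("--novram")
    else:
        raise ValueError("strategy must be 'A' or 'B'")
    return args


if __name__ == "__main__":
    # Print the command; pass it to subprocess.run(..., cwd="ComfyUI") to launch.
    print(" ".join(launch_args("A")))
```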

Open the browser UI, load an LTX-2 19B example workflow (ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.0/), point the diffusion-model loader at the FP8 transformer from Step 3, and select the quantized Gemma encoder from Step 5 in the text-encoder loader. The output (synchronized audio + video) lands in ComfyUI/output/.

Recommended distilled settings

Parameter	Value	Source
Sampler steps	8	Distilled checkpoint default per Lightricks/LTX-2 card ("8 steps, CFG=1")
CFG	1.0	Same
Resolution	width & height divisible by 32	Lightricks/LTX-2 card: "Width & height settings must be divisible by 32."
Frame count	a multiple of 8, plus 1 (e.g. 65, 97, 121)	Lightricks/LTX-2 card: "Frame count must be divisible by 8 + 1."

Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution.
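The two divisibility rules are easy to get wrong when scaling up, so it can help to validate settings before queuing a job. The sketch below reads the frame rule as "frames = 8k + 1", which is the interpretation the examples 65, 97, and 121 support; all function names are hypothetical.

```python
def is_valid_size(width: int, height: int) -> bool:
    """Width and height must both be positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(frames: int) -> bool:
    """Frame count must be of the form 8k + 1 (1, 9, ..., 65, 97, 121)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_dimension(value: int) -> int:
    """Round a dimension down to the nearest multiple of 32 (minimum 32)."""
    return max(32, (value // 32) * 32)


def snap_frame_count(frames: int) -> int:
    """Round a frame count down to the nearest 8k + 1 value (minimum 1)."""
    return max(1, ((frames - 1) // 8) * 8 + 1)


print(is_valid_size(512, 512), is_valid_frame_count(65))
print(snap_dimension(720), snap_frame_count(100))
```

Note that 720 is not a multiple of 32, so a 1280×720 request does not pass as written; `snap_dimension(720)` rounds it down to 704.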

Results

Speed: Omitted. No RTX 4090 benchmark for LTX-2 19B at a fixed configuration has been published, and /check/ltx-2/rtx-4090 currently has no benchmark data. The only same-model community timing is a "3 GB VRAM to make a 720p video" footprint on a different card (an RTX 5080 16GB under --novram, Issue #303 comment) — a VRAM observation, not a 4090 throughput number, and it cannot transfer to the 4090 cleanly. Empirical 4090 timings will appear at /check/ltx-2/rtx-4090 once a community benchmark lands via /contribute.
VRAM usage: The min_vram_gb: 24 value reflects this recipe's lead path: the FP8 transformer (25.22 GB on disk) offloaded with --reserve-vram, plus a quantized Gemma encoder (12.30 GB), targeting the 24 GB 4090's "in the gap" position below the 32 GB native floor. It is a derived working envelope for the 24 GB card, not a measured peak. What 24 GB does not buy you is the unquantized Gemma encoder on-GPU: it runs out of memory at "Peak Usage: 29068 MiB" (Issue #303 body), and a 4090 owner confirms "No shot at running Gemma" (comment). See /check/ltx-2/rtx-4090.
Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. The FP8 quant is the main practical lever on 24 GB; the BF16 transformer (40.31 GB) is out of scope. The 4090's Ada (sm_89) architecture supports FP8 matmul, so the FP8 path runs natively — but NVFP4 (the smaller ltx-2-19b-dev-fp4, 18.62 GB) cannot be accelerated on Ada (see Troubleshooting).

For the full benchmark data, see /check/ltx-2/rtx-4090.

Troubleshooting

"No shot at running Gemma" — the unquantized encoder OOMs even on 24 GB

This is the defining 4090 surprise: 24 GB looks like it should fit a 12B encoder, but it does not. The unquantized Gemma 3 12B "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per Issue #303, and a 4090 owner reports "No shot at running Gemma" on their 24 GB card (comment). Fixes, in order of preference:

1. Use a quantized Gemma from Step 5 (FP8-scaled or fp8_e4m3fn, 12.30 GB), which is small enough to co-load with the offloaded transformer.
2. Launch with --reserve-vram 5 (Strategy A) to offload the overflow, or --novram (Strategy B) to stream everything from RAM.

Out of memory / "Allocation on device" on the 4090

Issue #308 ("LTX-2 ERROR WITH A RTX 4090") documents this with the bare message "This error means you ran out of memory on your GPU." (Issue #308 body). The community workaround in that thread is the offload flag combination "main.py --windows-standalone-build --lowvram --disable-smart-memory --reserve-vram 1.0" (comment). Also lower the resolution/frame count and confirm batch size is 1.

`mat1 and mat2 shapes cannot be multiplied` after enabling `--novram`

The same 4090 owner hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a --preview-method latent2rgb flag (Issue #303 comment). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
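A launch wrapper can flag this combination before ComfyUI starts. The check below is only a sketch of the advice in this section: `flags_to_remove` is a hypothetical helper, and the flag list covers just the sage-attention and preview flags this section says to remove.

```python
# Flags this section recommends removing when --novram is in use.
REMOVE_WITH_NOVRAM = ("--use-sage-attention", "--preview-method")


def flags_to_remove(argv: list) -> list:
    """Return the flags from argv that this recipe suggests dropping under --novram."""
    if "--novram" not in argv:
        return []
    return [arg for arg in argv if arg in REMOVE_WITH_NOVRAM]


reported = [
    "--novram", "--use-sage-attention",
    "--fast", "fp16_accumulation",
    "--preview-method", "latent2rgb",
]
print(flags_to_remove(reported))
```

When dropping `--preview-method`, remember to drop its value (`latent2rgb` in the reported command) along with it.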

Encoder is slow on CPU under `--novram`

The --novram path moves the Gemma encoder to CPU, which is slow for the text-encode pass — this is the main reason to prefer Strategy A on a 24 GB card. If you stay on --novram, a community user reports the "LTXV Audio Text Encoder Loader" node loads Gemma "8x times faster then normal loader" (sic) (Issue #303 comment). Replace the default Gemma loader with it and load the single safetensors encoder file.

NVFP4 will not accelerate on the 4090

The ltx-2-19b-dev-fp4 checkpoint (18.62 GB, the "full model in nvfp4 quantization" per the Lightricks/LTX-2 card) is the smallest transformer and looks attractive for 24 GB — but NVFP4 requires Blackwell sm_120 microscaling hardware, which the RTX 4090 (Ada, sm_89) lacks, so it cannot be accelerated on this card. Stick to the FP8 transformer on the 4090.
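You can confirm which tier your card falls in from its CUDA compute capability. This is a sketch under stated assumptions: the thresholds mirror this recipe's claims about consumer cards (FP8 on Ada sm_89, NVFP4 on Blackwell sm_120) and do not model datacenter parts; `precision_support` and `detect_capability` are hypothetical helper names, while `torch.cuda.get_device_capability` is the real PyTorch call.

```python
# Thresholds taken from this recipe's statements, not from a vendor table.
FP8_MIN_CAPABILITY = (8, 9)      # Ada (RTX 4090 is sm_89)
NVFP4_MIN_CAPABILITY = (12, 0)   # Blackwell sm_120


def precision_support(capability: tuple) -> dict:
    """Map a (major, minor) compute capability to the formats this recipe discusses."""
    return {
        "fp8": capability >= FP8_MIN_CAPABILITY,
        "nvfp4": capability >= NVFP4_MIN_CAPABILITY,
    }


def detect_capability() -> tuple:
    """Read the capability of the current CUDA device (needs PyTorch and a GPU)."""
    import torch  # imported lazily so the table above works without PyTorch

    return torch.cuda.get_device_capability()


print(precision_support((8, 9)))   # RTX 4090
```

On a machine with the card installed, `precision_support(detect_capability())` runs the same check against the real device.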

Audio-video output not synchronized

LTX-2 produces synchronized video + audio in a single model per the Lightricks/LTX-2 card. If you only get silent video, confirm you loaded the audio VAE + vocoder from Step 4 and an audio-enabled example workflow. Report problems via the submission form.