How much VRAM does LTX-2.3 need?

About 16 GB — the minimum this recipe targets.

How hard is this setup?

Advanced. Follow the installation steps below.

LTX-2.3 on RTX 5060 Ti: GGUF Video Generation in ComfyUI

What You'll Build

Generate short text-to-video and image-to-video clips locally with LTX-2.3 — a 22B-parameter DiT video model from Lightricks — on a 16GB consumer GPU. The full-precision model demands 32GB+ VRAM, so this recipe runs the Q4_K_S GGUF quantization of the distilled checkpoint together with a 4-bit QAT Gemma 3 text encoder.

Hardware data: RTX 5060 Ti (16GB VRAM) · Q4_K_S GGUF distilled + Gemma 3 12B QAT-Q4 · See benchmark data

⚠️ Known issue: The full LTX-2.3-22B with the unquantized Gemma 3 12B text encoder does NOT fit in 16GB VRAM. A user with an RTX 5080 16GB reported OOM at "Peak Usage: 29068 MiB" even with FP8 quantization (Lightricks/ComfyUI-LTXVideo#303). Stick to GGUF + quantized Gemma.

Requirements

Component	Minimum	Tested
GPU	16GB VRAM (Ampere or newer)	RTX 5060 Ti (16GB)
RAM	32GB	32GB
Storage	~30GB	~30GB (model + encoder + VAE)
Software	ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes	Python 3.10+, CUDA 12.8+ (first toolkit release with RTX 50-series support)

The full unquantized LTX-2.3 needs 32GB+ VRAM per the official ComfyUI-LTXVideo README — running on 16GB requires the GGUF path below.

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Install the LTX-Video, GGUF, and KJNodes custom nodes

# Step 1 left you inside ComfyUI/ with the venv active, so custom_nodes is a direct child
cd custom_nodes

# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

# city96's GGUF loader, required for the quantized UNet
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

# Kijai's KJNodes, used by recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt

# Return to the directory that contains ComfyUI/; the download commands in steps 3-5 use paths relative to it
cd ../..

Source: unsloth/LTX-2.3-GGUF model card lists this exact custom-node trio for the GGUF workflow.

3. Download the Q4_K_S distilled GGUF weights

# UNet: QuantStack Q4_K_S distilled (listed at 16.7 GB in the table below)
huggingface-cli download QuantStack/LTX-2.3-GGUF \
  LTX-2.3-distilled/LTX-2.3-distilled-Q4_K_S.gguf \
  --local-dir ComfyUI/models/unet/

# Or the Unsloth dynamic-quant variant (~13 GB; the better fit for 16GB VRAM)
huggingface-cli download unsloth/LTX-2.3-GGUF \
  LTX-2.3-distilled-UD-Q4_K_S.gguf \
  --local-dir ComfyUI/models/unet/

Quant-tier file-size reference (from QuantStack/LTX-2.3-GGUF):

Quant	File size
Q3_K_S	14 GB
Q4_K_S	16.7 GB
Q4_K_M	17.8 GB
Q5_K_S	18.5 GB
Q8_0	25.5 GB

The Unsloth repo (unsloth/LTX-2.3-GGUF) ships a more aggressively packed Q4_K_S at 13 GB. QuantStack's Q4_K_S file (16.7 GB) is already larger than the card's 16 GB of VRAM, so try the Unsloth file first on 16GB.
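As a rough first filter, the file sizes quoted above can be compared against a VRAM budget. The sketch below does only that. The sizes are copied from the table and the Unsloth figure in this section; the function name and the `headroom_gb` parameter are illustrative assumptions, and a file that passes this check still needs room for activations, the VAE, and the text encoder.

```python
# File sizes in GB, copied from the table above plus the Unsloth figure.
QUANT_SIZES_GB = {
    "Q3_K_S": 14.0,
    "Q4_K_S": 16.7,
    "Q4_K_M": 17.8,
    "Q5_K_S": 18.5,
    "Q8_0": 25.5,
    "UD-Q4_K_S (Unsloth)": 13.0,
}


def tiers_that_fit(vram_gb: float, headroom_gb: float = 0.0) -> list[str]:
    """Return tiers whose file size is at most vram_gb - headroom_gb, smallest first.

    headroom_gb is a hypothetical allowance for activations and other models;
    the sources cited here do not publish a figure for it.
    """
    budget = vram_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    return [name for size, name in sorted(fitting)]


if __name__ == "__main__":
    print(tiers_that_fit(16))       # tiers whose weights alone are under 16 GB
    print(tiers_that_fit(16, 2.5))  # same, reserving an assumed 2.5 GB
```

File size is only a lower bound on what the weights occupy, so treat the result as a shortlist to test rather than a guarantee.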

4. Download the quantized Gemma 3 12B text encoder

The standard gemma_3_12B_it.safetensors encoder (12B parameters) is too large to coexist with the 22B LTX-2.3 weights on a 16GB card: Lightricks/ComfyUI-LTXVideo#303 documents an OOM with a 29068 MiB peak on an RTX 5080 16GB when this encoder is left unquantized. Use the QAT-Q4 GGUF instead:

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  --local-dir ComfyUI/models/text_encoders/

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  mmproj-BF16.gguf \
  --local-dir ComfyUI/models/text_encoders/

Both files are loaded by ComfyUI-GGUF's Gemma encoder node. QAT-Q4 quantization substantially reduces the encoder's footprint compared with the unquantized 12B file. The exact peak depends on the workflow; the 14926 MiB total-peak datapoint cited in Results comes from a 16GB card running LTX-2 distilled, the closest comparable setup with a published number.

5. Download the VAE and the spatial upscaler

The Lightricks/LTX-2.3 repo bundles the VAE inside the 22B .safetensors checkpoints; the HF Files tab lists no standalone *vae*.safetensors file. For the GGUF-only flow used here, pull the VAE from Kijai's community mirror, which exposes it as a standalone bf16 file (the ltxv VAE architecture is shared across the LTX family):

# VAE — community mirror (Lightricks does not ship a standalone VAE file)
huggingface-cli download Kijai/LTXV2_comfy \
  VAE/LTX2_video_vae_bf16.safetensors \
  --local-dir ComfyUI/models/vae/

# Latent spatial upscaler (x2 file shown); match the factor to the upscaler node in the workflow you load
huggingface-cli download Lightricks/LTX-2.3 \
  ltx-2.3-spatial-upscaler-x2-1.1.safetensors \
  --local-dir ComfyUI/models/latent_upscale_models/

File listing confirmed at Lightricks/LTX-2.3 Files and Kijai/LTXV2_comfy/VAE.
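Before launching, it can help to confirm that every download landed where ComfyUI will look for it. The following is a minimal sketch, not part of any of the tools above: `find_missing` and `EXPECTED` are hypothetical names, the filenames are taken from the download commands in steps 3-5 (Unsloth UNet variant), and the search is recursive because `--local-dir` keeps the repository's own folder layout (for example `vae/VAE/...`).

```python
from pathlib import Path

# Filenames taken from the download commands in steps 3-5 (Unsloth UNet variant).
EXPECTED = {
    "unet": ["LTX-2.3-distilled-UD-Q4_K_S.gguf"],
    "text_encoders": ["gemma-3-12b-it-qat-UD-Q4_K_XL.gguf", "mmproj-BF16.gguf"],
    "vae": ["LTX2_video_vae_bf16.safetensors"],
    "latent_upscale_models": ["ltx-2.3-spatial-upscaler-x2-1.1.safetensors"],
}


def find_missing(models_dir, expected=EXPECTED):
    """Return 'subdir/filename' entries that are not found under models_dir.

    Each subdirectory is searched recursively, so files nested in a
    repository subfolder still count as present.
    """
    missing = []
    for subdir, names in expected.items():
        base = Path(models_dir) / subdir
        for name in names:
            found = base.is_dir() and any(base.rglob(name))
            if not found:
                missing.append(f"{subdir}/{name}")
    return missing


if __name__ == "__main__":
    # Run from the directory that contains ComfyUI/.
    for entry in find_missing("ComfyUI/models"):
        print("missing:", entry)
```

If you downloaded the QuantStack UNet instead, swap the `unet` entry for that filename.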

Running

Launch ComfyUI from inside the ComfyUI/ directory, with the virtual environment from step 1 active:

python main.py --listen

Open the browser UI, then load one of the example workflows shipped by the Lightricks node:

ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/

Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF, and point the text encoder at the GGUF Gemma 3 loader.

Recommended distilled settings

Parameter	Value	Source
Sampler steps	8	Distilled checkpoint default per Lightricks/LTX-2.3 card
CFG	1.0	Same
Resolution	Width and height must each be divisible by 32	Lightricks/LTX-2.3 card
Frame count	(multiple of 8) + 1 (e.g. 65, 97, 121)	Same

Start small (e.g. 480×832, 65 frames) and scale up only if peak VRAM stays well below 16GB.
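The two size constraints in the table are easy to get wrong when scaling up, so a small pre-flight check can save a failed run. This is a sketch under the rules as stated above; the function names are made up for illustration, and whether the model accepts very small frame counts such as 1 or 9 is not stated in the sources cited here.

```python
def is_valid_resolution(width: int, height: int) -> bool:
    """True when both dimensions are positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(frames: int) -> bool:
    """True when frames has the form 8*k + 1 (for example 65, 97, 121)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_frame_count(frames: int) -> int:
    """Round down to the nearest count of the form 8*k + 1, with a floor of 1."""
    return max(1, ((frames - 1) // 8) * 8 + 1)


if __name__ == "__main__":
    print(is_valid_resolution(480, 832))   # both sides divide by 32
    print(is_valid_resolution(1280, 720))  # 720 is not a multiple of 32
    print(snap_frame_count(100))           # nearest valid count at or below 100
```

Note that 1280×720, the resolution in the community report quoted under Results, does not pass this check, so that run presumably used different constraints or padding.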

Results

Speed: Omitted — no published benchmark on RTX 5060 Ti 16GB at time of writing. For order-of-magnitude context, a community user reported "485 sec for 5 sec 1280×720" on RTX 3050 6GB with Q4 GGUF and CPU offload (Kijai/LTXV2_comfy discussion #7). Empirical 5060 Ti data will appear at /check/ltx-video-2-3/rtx-5060-ti once a benchmark report lands.
VRAM usage: A 16GB ComfyUI user running LTX-2 distilled reported peak 14926 MiB during sampling (Comfy-Org/ComfyUI#11726). Q4_K_S of LTX-2.3 distilled has a similar weight footprint; the 14926 MiB datapoint is the closest cited consumer-GPU peak.
Quality notes: The distilled checkpoint runs at 8 steps with CFG=1 and trades fine motion detail for speed. The full LTX-2.3-22b-dev checkpoint produces higher quality, but Q3_K_S (14 GB) is the only tier in the file-size table above whose file is smaller than 16 GB, and quality regressions are noticeable at Q3 and below.
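The MiB figures quoted in this recipe are easier to compare against a "16GB" card once converted. The sketch below works through that arithmetic for the two cited peaks. It assumes the card exposes exactly 16 × 1024 MiB, which real drivers slightly undercut, and the helper names are illustrative.

```python
MIB_PER_GIB = 1024


def mib_to_gib(mib: float) -> float:
    """Convert mebibytes to gibibytes."""
    return mib / MIB_PER_GIB


def headroom_mib(peak_mib: float, card_gib: float = 16) -> float:
    """Nominal free memory at peak; negative means the run cannot fit.

    Assumes the card exposes card_gib * 1024 MiB (an upper bound in practice).
    """
    return card_gib * MIB_PER_GIB - peak_mib


if __name__ == "__main__":
    for label, peak in [("LTX-2 distilled peak", 14926), ("issue #303 peak", 29068)]:
        print(f"{label}: {mib_to_gib(peak):.2f} GiB, headroom {headroom_mib(peak):.0f} MiB")
```

Under that assumption the 14926 MiB peak leaves 1458 MiB nominally free, which is why the advice above is to scale resolution and frame count cautiously.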

For up-to-date benchmark data on this pair, see /check/ltx-video-2-3/rtx-5060-ti.

Troubleshooting

OOM when loading the text encoder

The default gemma_3_12B_it.safetensors encoder (12B parameters) will OOM on 16GB cards when loaded alongside the 22B LTX-2.3 weights: Lightricks/ComfyUI-LTXVideo#303 reports a 29068 MiB peak with the unquantized encoder. This is the most common 16GB failure mode. Replace it with gemma-3-12b-it-qat-UD-Q4_K_XL.gguf from Unsloth (Step 4 above).

OOM during sampling on the full `LTX-2.3-22b-dev` checkpoint

The full (non-distilled) 22B model + 12B encoder hits a 29068 MiB peak (about 28.4 GiB) per the same issue. On 16GB:

Use the distilled checkpoint (LTX-2.3-distilled-Q4_K_S.gguf)
Drop resolution to 480×832 or 512×768
Limit frame count to 65 or 97
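When working through the fallbacks above, it helps to try the lightest combination first. The sketch below orders the suggested resolutions and frame counts by width × height × frames. Treating that product as a proxy for memory pressure is an assumption for illustration only; actual VRAM use depends on the workflow and is not a simple function of it.

```python
from itertools import product

# Values suggested in the list above.
RESOLUTIONS = [(480, 832), (512, 768)]
FRAME_COUNTS = [65, 97]


def clip_volume(width: int, height: int, frames: int) -> int:
    """Total pixel count across all frames; a rough proxy only."""
    return width * height * frames


def candidates_smallest_first():
    """All resolution/frame-count combinations, lightest first."""
    combos = [(w, h, f) for (w, h), f in product(RESOLUTIONS, FRAME_COUNTS)]
    return sorted(combos, key=lambda c: clip_volume(*c))


if __name__ == "__main__":
    for w, h, f in candidates_smallest_first():
        print(f"{w}x{h}, {f} frames: {clip_volume(w, h, f):,} pixels")
```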

Gemma GGUF loader fails or outputs gibberish

Gemma 3 GGUF loading in ComfyUI depends on PRs #399 and #402 in ComfyUI-GGUF (Kijai/LTXV2_comfy discussion #7). Both PRs are now merged, so pull the latest city96/ComfyUI-GGUF main if your checkout predates them.

Slow generation

If the encoder is loaded fully onto the GPU for every generation, VRAM thrashing hurts wall time; keep Gemma offloaded with the KJNodes model-offload nodes. The only cited consumer-GPU datapoint at time of writing is the RTX 3050 6GB run cited under Results (485 s for a 5-second 1280×720 clip with Q4 GGUF and CPU offload). Wall time on the 5060 Ti will land at /check/ when benchmarks arrive.
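To compare your own runs with that report on equal footing, wall time can be normalized per second of generated video. This is plain arithmetic on the quoted figures; the function name is illustrative.

```python
def seconds_per_output_second(wall_seconds: float, clip_seconds: float) -> float:
    """Wall-clock seconds spent per second of generated video."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return wall_seconds / clip_seconds


if __name__ == "__main__":
    # RTX 3050 6GB report: 485 s of wall time for a 5 s clip.
    print(seconds_per_output_second(485, 5))
```

The RTX 3050 report works out to 97 seconds of compute per second of video, which gives a baseline to measure a 5060 Ti run against.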

Audio-video output not synchronized

LTX-2.3 generates synchronized video + audio in a single model per the Lightricks/LTX-2.3 card ("a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model"). The dedicated audio workflow lives at example_workflows/2.3/ in ComfyUI-LTXVideo. The non-audio workflows produce silent video — make sure you loaded the audio-enabled workflow file if audio is needed.