What You'll Build
Generate short, synchronized audio and video clips locally with LTX Video 2.3, Lightricks' 22B-parameter DiT audio-video foundation model, on a 16 GB RTX 5080. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README), double the 5080's 16 GB. This recipe therefore runs the distilled GGUF transformer and offloads the heavy Gemma 3 12B text encoder to system RAM, the approach RTX 5080 owners report working in ComfyUI-LTXVideo Issue #303.
Hardware data: RTX 5080 (16GB VRAM) · Q4_K_S distilled GGUF + CPU-offloaded Gemma 3 12B · See benchmark data
⚠️ The 16 GB envelope is tight and the text encoder is the binding constraint, not the transformer. Gemma 3 12B (LTX-2.3's text encoder) needs roughly 24-27 GB resident on its own per the Issue #303 report; on a 16 GB card it MUST run on the CPU (or be streamed from RAM). The transformer fits as a GGUF quant; the encoder is what OOMs you if you leave it on the GPU.
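To make the constraint concrete, here is a minimal Python sketch of the budget arithmetic. The sizes are the figures quoted in this recipe (12.22 GB for the Q4_K_S transformer from Step 3, and the low end of the 24-27 GB encoder estimate). The function name `fits_on_gpu` is hypothetical, and the check counts weights only, so it is a lower bound on real usage rather than a measurement.

```python
# Rough VRAM budget check; all sizes are figures quoted in this recipe.
GPU_VRAM_GB = 16.0             # RTX 5080
TRANSFORMER_Q4_K_S_GB = 12.22  # distilled GGUF from Step 3
GEMMA_ENCODER_GB = 24.0        # low end of the 24-27 GB estimate (Issue #303)


def fits_on_gpu(components_gb, vram_gb=GPU_VRAM_GB):
    """Return True if the summed weights alone fit in VRAM (ignores activations)."""
    return sum(components_gb) <= vram_gb


transformer_only = fits_on_gpu([TRANSFORMER_Q4_K_S_GB])
with_encoder = fits_on_gpu([TRANSFORMER_Q4_K_S_GB, GEMMA_ENCODER_GB])
print(f"transformer only fits: {transformer_only}")
print(f"transformer + encoder fits: {with_encoder}")
```

The transformer alone passes the check; adding the encoder does not, which is why the encoder is the component to move off the GPU.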
Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT for the older LTX-2 19B line (repo Lightricks/LTX-2) nor the LTX-Video 0.9.x family. Several Issue #303 reports below were filed against LTX-2 19B; they are cited here only for the Gemma 3 12B encoder failure mode, which carries over because both model lines use the same encoder.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM (Blackwell, Ada, or Ampere) | RTX 5080 (16GB) |
| RAM | 32GB | 64GB (Issue #303 RTX 5080 reports) |
| Storage | ~24GB | ~24GB (Q4_K_S transformer + connectors + VAEs + Gemma encoder + mmproj) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, CUDA 12.7+ |
The full unquantized LTX-2.3 requires a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. On 16 GB you replace the BF16 transformer with a distilled GGUF quant and keep the Gemma encoder off the GPU — the steps below.
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
On Blackwell (sm_120) the default pip install torch ships sm_120 kernels via the cu128 wheel on PyTorch 2.7+. The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7" per the Lightricks/LTX-2.3 model card.
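A small preflight check can catch version mismatches before any model is downloaded. The sketch below compares a Python version and a PyTorch version string against the thresholds quoted above (Python 3.12, PyTorch 2.7). The helper names `parse_version` and `preflight` are hypothetical, and the script only reads version numbers; it does not confirm that sm_120 kernels are actually present in the installed wheel.

```python
import sys


def parse_version(text):
    """Turn '2.7.1+cu128' into (2, 7); ignores patch and local build tags."""
    core = text.split("+", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def preflight(python_version, torch_version):
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    if tuple(python_version[:2]) < (3, 12):
        problems.append("Python older than 3.12 (model card tested >=3.12)")
    if parse_version(torch_version) < (2, 7):
        problems.append("PyTorch older than 2.7 (needed for the cu128 sm_120 wheel)")
    return problems


if __name__ == "__main__":
    try:
        import torch  # optional: only used to read the installed version
        installed = torch.__version__
    except ImportError:
        installed = "0.0"  # treated as "too old" so the warning is printed
    for problem in preflight(sys.version_info, installed):
        print("WARNING:", problem)
```

Run it inside the activated .venv so that it inspects the same interpreter and PyTorch build that ComfyUI will use.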
2. Install the LTX-Video, GGUF, and KJNodes custom nodes
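# Run from the directory that contains ComfyUI/ (cd .. first if you are still inside it after Step 1)
cd ComfyUI/custom_nodes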
# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
# city96's GGUF loader — required for the quantized transformer
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt
# Kijai's KJNodes — used by the recommended offload workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
These three node packs are the ones listed by the unsloth/LTX-2.3-GGUF model card for the GGUF workflow.
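Before moving on, it can help to confirm that all three clones landed where ComfyUI will look for them. The following sketch checks only that the folders exist; the function name `missing_node_packs` is hypothetical, and the folder names are assumed to be the defaults that git clone derives from the repository URLs above.

```python
from pathlib import Path

# Folder names created by the three git clone commands above.
REQUIRED_NODE_PACKS = ("ComfyUI-LTXVideo", "ComfyUI-GGUF", "ComfyUI-KJNodes")


def missing_node_packs(custom_nodes_dir):
    """Return the required node-pack folders that are absent from custom_nodes_dir."""
    root = Path(custom_nodes_dir)
    return [name for name in REQUIRED_NODE_PACKS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_node_packs("ComfyUI/custom_nodes")
    print("missing:", missing or "none")
```

An empty result means the folders are present; it does not verify that each pack's requirements installed cleanly.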
3. Download a distilled GGUF transformer that fits 16 GB
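The distilled checkpoint runs at 8 steps with CFG 1.0 and is the right choice for a tight card. On 16 GB you want a transformer small enough to leave room for VAE-decode activations: Q4_K_S (12.22 GB) or, for more comfortable headroom, Q3_K_S (9.26 GB). The --local-dir paths in Steps 3 to 5 are relative to the directory that contains ComfyUI/, so run the download commands from there: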
# Q4_K_S distilled (12.22 GB on disk) — tightest tier with full Q4 quality
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf \
--local-dir ComfyUI/models/unet/
# OR Q3_K_S distilled (9.26 GB) — more headroom, small quality trade
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q3_K_S.gguf \
--local-dir ComfyUI/models/unet/
Distilled-transformer GGUF file sizes (verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):
| Quant | unsloth distilled file size |
|---|---|
| Q2_K | 7.71 GB |
| Q3_K_S | 9.26 GB |
| Q3_K_M | 10.03 GB |
| Q4_K_S | 12.22 GB |
| Q4_K_M | 13.34 GB |
| Q5_K_S | 14.20 GB |
| Q6_K | 16.55 GB |
| Q8_0 | 21.19 GB |
QuantStack/LTX-2.3-GGUF ships an equivalent ladder but its Q4_K_S is larger (15.56 GB) — too tight once the VAE and activations load, so prefer the unsloth distilled/ files on a 16 GB card. There is also a newer distilled-1.1 revision in both repos (its Q4_K_S is 12.07 GB) — either revision works; this recipe pins the original distilled/.
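The choice of quant can be framed as picking the largest file that still leaves headroom on the card. The sketch below uses the unsloth distilled sizes from the table above. The function name `pick_quant` and the headroom values are assumptions for illustration, not measured numbers, and file size on disk is only a proxy for resident VRAM.

```python
# Distilled GGUF sizes in GB, copied from the table above (unsloth/LTX-2.3-GGUF).
DISTILLED_SIZES_GB = {
    "Q2_K": 7.71, "Q3_K_S": 9.26, "Q3_K_M": 10.03, "Q4_K_S": 12.22,
    "Q4_K_M": 13.34, "Q5_K_S": 14.20, "Q6_K": 16.55, "Q8_0": 21.19,
}


def pick_quant(vram_gb, headroom_gb):
    """Return the largest quant whose file fits in vram_gb minus headroom_gb, or None."""
    budget = vram_gb - headroom_gb
    fitting = {q: s for q, s in DISTILLED_SIZES_GB.items() if s <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# headroom_gb is an assumed allowance for the VAE and activations, not a measurement.
print(pick_quant(16, headroom_gb=3.5))  # budget of 12.5 GB
print(pick_quant(16, headroom_gb=6.0))  # budget of 10.0 GB
```

With the smaller allowance the function lands on Q4_K_S, and with the larger one on Q3_K_S, which matches the two files this step recommends.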
4. Download the embeddings connectors and VAE
The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE (LTX-2.3 ships them separately for the GGUF flow):
huggingface-cli download unsloth/LTX-2.3-GGUF \
text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
vae/ltx-2.3-22b-distilled_video_vae.safetensors \
vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
--local-dir ComfyUI/models/
(Connectors 2.15 GB, video VAE 1.35 GB, audio VAE 0.34 GB — verified live via the unsloth/LTX-2.3-GGUF tree API.)
5. Download a quantized Gemma 3 12B text encoder
LTX-2.3 uses Gemma 3 12B as its text encoder. The default gemma-3-12b-it-qat-q4_0-unquantized weights need ~24-27 GB to operate per Issue #303, far beyond the card's 16 GB, so download a quantized encoder and keep it off the GPU (Step 6). The GGUF QAT-Q4 encoder is the smallest broadly supported option:
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
mmproj-BF16.gguf \
--local-dir ComfyUI/models/text_encoders/
The encoder file is 6.92 GB and the mmproj 0.80 GB (verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API). An FP8 single-file alternative (gemma_3_12B_it_fp8_e4m3fn.safetensors, 12.30 GB) is documented in Issue #303's workaround comment — see Troubleshooting for when to prefer it.
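Summing the file sizes quoted in Steps 3 to 5 gives a quick storage estimate for the Q4_K_S path. The sketch below adds them up and compares the total with the free space on the current filesystem. The helper name `has_room` is hypothetical, and the sizes are treated as decimal gigabytes, which is an assumption about how the repository listings report them.

```python
import shutil

# File sizes in GB as quoted in Steps 3 to 5 (Q4_K_S transformer variant).
DOWNLOADS_GB = {
    "distilled Q4_K_S transformer": 12.22,
    "embeddings connectors": 2.15,
    "video VAE": 1.35,
    "audio VAE": 0.34,
    "Gemma 3 12B UD-Q4_K_XL encoder": 6.92,
    "mmproj-BF16": 0.80,
}

total_gb = round(sum(DOWNLOADS_GB.values()), 2)


def has_room(path, needed_gb):
    """True if the filesystem holding `path` has at least needed_gb (decimal GB) free."""
    return shutil.disk_usage(path).free >= needed_gb * 1e9


print(f"total download: {total_gb} GB")
print("enough free space here:", has_room(".", total_gb))
```

The six files come to 23.78 GB; the transformer, encoder, and two VAEs on their own account for about 20.8 GB of that.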
Running
Launch ComfyUI with a VRAM mode that forces the heavy encoder off the GPU. The two RTX-5080-proven options from Issue #303 are --novram (stream all weights from RAM) and --reserve-vram 10 (reserve headroom so the encoder spills to CPU):
# Option A — stream weights from RAM (RTX 5080 owner: "only costs my gpu 3 GB VRAM to make a 720p video")
python main.py --listen --novram
# Option B — reserve VRAM so the encoder offloads (RTX 5080 owner: "these settings work for RTX 5080")
python main.py --listen --reserve-vram 10
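If you launch ComfyUI from a script, the two options map onto a small argument builder. This is a sketch only: the function name `build_launch_args` and the mode labels "novram" and "reserve" are hypothetical, while the flags themselves are the ones shown above.

```python
import sys


def build_launch_args(mode, reserve_gb=10):
    """Return the argv list for one of the two launch options described above."""
    args = [sys.executable, "main.py", "--listen"]
    if mode == "novram":
        args.append("--novram")
    elif mode == "reserve":
        args += ["--reserve-vram", str(reserve_gb)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return args


print(build_launch_args("novram")[1:])
print(build_launch_args("reserve")[1:])
```

The returned list can be passed to subprocess.run with cwd set to the ComfyUI folder, using the interpreter from the activated .venv.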
Open the browser UI and load one of the example workflows shipped by the Lightricks node:
ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/
Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 3. Wire the Gemma 3 12B encoder through the GGUF text-encoder loader. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.
Recommended distilled settings
| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width × height divisible by 32 | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | (multiple of 8) + 1 (e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |
Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution.
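The two size rules in the table are easy to check in code before queueing a job. The sketch below validates a width, height, and frame count against them and rounds an arbitrary request down to the nearest valid values. The function names `is_valid_size` and `snap_down` are hypothetical, and rounding down rather than up is a design choice made here to stay within a tight VRAM budget.

```python
def is_valid_size(width, height, frames):
    """Check the model-card rules: sides divisible by 32, frames = 8*k + 1."""
    return (
        width % 32 == 0
        and height % 32 == 0
        and frames >= 1
        and (frames - 1) % 8 == 0
    )


def snap_down(width, height, frames):
    """Round each value down to the nearest one that satisfies the rules."""
    return (
        max(32, width - width % 32),
        max(32, height - height % 32),
        max(1, frames - (frames - 1) % 8),
    )


print(is_valid_size(512, 512, 65))  # the starting point suggested above
print(snap_down(720, 500, 100))
```

For example, a request for 720×500 at 100 frames snaps to 704×480 at 97 frames, which passes `is_valid_size`.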
Results
- Speed: Omitted. No clean RTX 5080 benchmark for LTX-2.3 22B at a fixed configuration has been published — the RTX 5080 owners in Issue #303 describe the CPU-offloaded encoder qualitatively ("slow when processing on CPU but the inference part works incredibly fast") without a comparable timing. Re-anchoring from the 32 GB RTX 5090 sibling is not valid either: the 5080's memory bandwidth (~960 GB/s) and compute differ enough that forward-extrapolating a 5090 number would mislead. Empirical 5080 timings will appear at /check/ltx-video-2-3/rtx-5080 once a community benchmark lands via /contribute.
- VRAM usage: With the Gemma encoder forced to CPU via --novram, an RTX 5080 16GB owner reports the GPU footprint drops to roughly 3 GB during 720p generation (Issue #303 comment); the distilled GGUF transformer streams in as needed. Leave the encoder on the GPU and the same hardware OOMs at "Peak Usage: 29068 MiB" (Issue #303 body, filed on RTX 5080 16GB). The recipe's min_vram_gb: 16 reflects the standard ComfyUI offload path (transformer resident + encoder on CPU), which fills the 16 GB card; see /check/ltx-video-2-3/rtx-5080.
- Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. Q4_K_S keeps full Q4 quality; Q3_K_S frees ~3 GB of headroom at a small quality cost. The full ltx-2.3-22b-dev BF16 transformer is 42 GB on disk (unsloth/LTX-2.3-GGUF tree) and is out of scope for 16 GB.
For the full benchmark data, see /check/ltx-video-2-3/rtx-5080.
Troubleshooting
Out of memory loading the Gemma 3 12B text encoder
This is the dominant 16 GB failure mode. The unquantized Gemma 3 12B encoder needs ~24-27 GB and OOMs at "Peak Usage: 29068 MiB" on RTX 5080 16GB per Issue #303 (filed on LTX-2 19B, but the encoder is the same Gemma 3 12B used by LTX-2.3, so the failure transfers). Fixes, in order of preference:
- Launch with --novram so the encoder runs on CPU. An RTX 5080 owner reports this drops GPU use to ~3 GB (Issue #303 comment).
- Or launch with --reserve-vram 10, which another RTX 5080 owner confirms (Issue #303 comment).
- Use the GGUF QAT-Q4 Gemma from Step 5, or the FP8 single-file gemma_3_12B_it_fp8_e4m3fn.safetensors documented in Jackson3195's workaround comment.
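To see how far over budget a failed run was, you can pull the peak figure out of the error text and convert it. The sketch below is illustrative: only the "Peak Usage: 29068 MiB" fragment comes from Issue #303, the rest of the sample line is invented, the function name `peak_over_capacity` is hypothetical, and the card is assumed to expose a full 16 GiB.

```python
import re

PEAK_RE = re.compile(r"Peak Usage:\s*(\d+)\s*MiB")


def peak_over_capacity(log_text, vram_gib=16.0):
    """Return (peak_gib, overshoot_gib) for the first 'Peak Usage' match, or None."""
    match = PEAK_RE.search(log_text)
    if match is None:
        return None
    peak_gib = int(match.group(1)) / 1024
    return round(peak_gib, 2), round(peak_gib - vram_gib, 2)


# Only the "Peak Usage: 29068 MiB" fragment comes from Issue #303;
# the rest of this sample line is made up for illustration.
sample = "OOM while loading text encoder. Peak Usage: 29068 MiB"
print(peak_over_capacity(sample))
```

For the reported figure this works out to roughly 28.4 GiB, about 12.4 GiB more than the card holds, so no transformer quant choice can close the gap while the encoder stays on the GPU.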
Encoder is painfully slow on CPU
The --novram path moves the Gemma encoder to CPU, which is slow for the text-encode pass. A community user reports the "LTXV Audio Text Encoder Loader" node loads Gemma "8x times faster then normal loader" (sic) (Issue #303 comment). Replace the default Gemma loader with it and load the single safetensors encoder file.
mat1 and mat2 shapes cannot be multiplied after enabling --novram
A user hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a --preview-method latent2rgb flag (Issue #303 comment). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
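A launch wrapper can warn about this combination before ComfyUI starts. The sketch below flags the two options named in the fix when they appear together with --novram. The function name `risky_flags` is hypothetical, and the check encodes one user's report rather than a documented incompatibility.

```python
# Flags the fix above says to remove when running with --novram (Issue #303 comment).
SUSPECT_WITH_NOVRAM = ("--use-sage-attention", "--preview-method")


def risky_flags(argv):
    """Return the suspect flags present in argv, but only when --novram is also set."""
    if "--novram" not in argv:
        return []
    return [flag for flag in SUSPECT_WITH_NOVRAM if flag in argv]


args = ["main.py", "--novram", "--use-sage-attention",
        "--fast", "fp16_accumulation", "--preview-method", "latent2rgb"]
print(risky_flags(args))
```

An empty list means none of the reported flags are combined with --novram in that command line.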
FlashAttention-2 on Blackwell (sm_120)
LTX-2.3's codebase defaults to PyTorch SDPA rather than raw FlashAttention-2, so the FA2 sm_120 wheel gap rarely blocks here. If you do force FA2, sm_120 kernel coverage is tracked at Dao-AILab/flash-attention#2168 — fall back to SDPA on Blackwell until it lands.
Audio-video output not synchronized
LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.