What You'll Build
Generate short, synchronized audio + video clips locally with LTX Video 2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 16 GB RTX 4080 SUPER. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README) — twice the 4080 SUPER's 16 GB — so this recipe runs the distilled GGUF transformer with the heavy Gemma 3 12B text encoder offloaded to system RAM. The RTX 4080 SUPER is called out by name as a target of this exact 16 GB constraint in ComfyUI-LTXVideo Issue #303 — a community feature request whose author lists "RTX 4080, 4080 Super, 5080" among the 16 GB GPUs that the smaller-encoder change would unlock.
Hardware data: RTX 4080 SUPER (16GB VRAM) · Q4_K_S distilled GGUF + CPU-offloaded Gemma 3 12B · See benchmark data
⚠️ The 16 GB envelope is tight and the text encoder is the binding constraint, not the transformer. Gemma 3 12B (LTX-2.3's text encoder) "needs ~24-27GB VRAM to operate" on its own per the community Issue #303 report; on a 16 GB card it MUST run on the CPU (or be streamed from RAM). The transformer fits as a GGUF quant; the encoder is what OOMs you if you leave it on the GPU.
Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It does NOT cover the older LTX-2 19B line (repo Lightricks/LTX-2) or the LTX-Video 0.9.x family. The Issue #303 reports below were filed against LTX-2 19B; they are cited here only for the Gemma 3 12B encoder failure mode, which carries over because both lines use the same 12B encoder.
ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement (the model card declares license: other with license_name: ltx-2-community-license-agreement; see the license link), not Apache-2.0. Read the license before any commercial use.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM (Ada, Blackwell, or Ampere) | RTX 4080 SUPER (16GB, Ada sm_89, AD103) |
| RAM | 32GB | 64GB recommended (Issue #303 — 16GB-card reporters used 32–64GB) |
| Storage | ~26GB | ~25.5GB of model files (Q4_K_S transformer + connectors + Gemma encoder and mmproj + video and audio VAEs) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.10+, CUDA 12.7+ |
The full unquantized LTX-2.3 requires a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. On 16 GB you replace the BF16 transformer with a distilled GGUF quant and keep the Gemma encoder off the GPU — the steps below.
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
On the RTX 4080 SUPER (Ada Lovelace, AD103, sm_89), the stable cu124-class PyTorch wheel that a default pip install torch pulls in already supports sm_89, so unlike Blackwell (sm_120) no special --index-url / cu128 wheel selection is needed. The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7" per the Lightricks/LTX-2.3 model card.
2. Install the LTX-Video, GGUF, and KJNodes custom nodes
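# Run from the directory that contains your ComfyUI checkout
# (cd .. first if you are still inside ComfyUI from Step 1)
cd ComfyUI/custom_nodes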
# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
# city96's GGUF loader — required for the quantized transformer
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt
# Kijai's KJNodes — used by the recommended offload workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
The unsloth/LTX-2.3-GGUF repo provides the quantized transformer for the GGUF workflow. The Lightricks LTXVideo nodes provide the LTX-2.3 sampler and example workflows, while city96/ComfyUI-GGUF and kijai/ComfyUI-KJNodes provide the GGUF loader and the offload nodes, respectively.
3. Download a distilled GGUF transformer that fits 16 GB
The distilled checkpoint runs at 8 steps with CFG 1.0 and is the right choice for a tight card. On 16 GB you want a transformer small enough to leave room for the connectors and VAE-decode activations — Q4_K_S (13.12 GB) or, for more comfortable headroom, Q3_K_S (9.95 GB):
# Paths below assume you run from the directory that contains ComfyUI/
# Q4_K_S distilled (13.12 GB on disk): the smallest standard Q4 tier
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf \
--local-dir ComfyUI/models/unet/
# OR Q3_K_S distilled (9.95 GB) — more headroom, small quality trade
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q3_K_S.gguf \
--local-dir ComfyUI/models/unet/
Distilled-transformer GGUF file sizes (decimal GB, as shown on the unsloth/LTX-2.3-GGUF tree distilled/ listing — standard quants; the repo also ships an Unsloth-Dynamic UD- set at slightly larger sizes):
| Quant | unsloth distilled file size |
|---|---|
| Q2_K | 8.28 GB |
| Q3_K_S | 9.95 GB |
| Q3_K_M | 10.77 GB |
| Q4_K_S | 13.12 GB |
| Q4_K_M | 14.33 GB |
| Q5_K_S | 15.25 GB |
| Q6_K | 17.77 GB |
| Q8_0 | 22.76 GB |
If you prefer the Unsloth-Dynamic tiers, the matching files are distilled/ltx-2.3-22b-distilled-UD-Q4_K_S.gguf (14.09 GB) and distilled/ltx-2.3-22b-distilled-UD-Q3_K_S.gguf (11.20 GB) in the same folder — also 16 GB-safe. There is also a newer distilled-1.1 revision in the repo ("A different aesthetic experience and improved audio compared to v1.0" per the Lightricks/LTX-2.3 model card); either revision works and this recipe pins the original distilled/. The full ltx-2.3-22b-dev BF16 transformer is 42.04 GB on disk (unsloth/LTX-2.3-GGUF tree) and is out of scope for 16 GB.
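The size table can be turned into a quick fit check. The sketch below is only an illustration under stated assumptions: it treats on-disk size as a rough proxy for the resident transformer footprint, and `headroom_gb` is a hypothetical allowance for connectors and VAE-decode activations, not a measured figure.

```python
# File sizes (decimal GB) copied from the standard distilled/ quants table above.
DISTILLED_QUANTS_GB = {
    "Q2_K": 8.28,
    "Q3_K_S": 9.95,
    "Q3_K_M": 10.77,
    "Q4_K_S": 13.12,
    "Q4_K_M": 14.33,
    "Q5_K_S": 15.25,
    "Q6_K": 17.77,
    "Q8_0": 22.76,
}


def quants_that_fit(vram_gb: float, headroom_gb: float) -> list[str]:
    """Return quant names that leave `headroom_gb` free, largest file first.

    Assumption: on-disk size approximates the VRAM the transformer occupies.
    `headroom_gb` is a hypothetical allowance chosen by the caller.
    """
    budget = vram_gb - headroom_gb
    fitting = [
        (size, name)
        for name, size in DISTILLED_QUANTS_GB.items()
        if size <= budget
    ]
    # Sort by size descending so the highest-quality fitting tier comes first.
    return [name for size, name in sorted(fitting, reverse=True)]


if __name__ == "__main__":
    # Example: keep 2.5 GB free on a nominal 16 GB card.
    print(quants_that_fit(vram_gb=16.0, headroom_gb=2.5))
```

With a 2.5 GB allowance the first entry is Q4_K_S; raising the allowance to around 6 GB leaves Q3_K_S as the largest tier, which matches the two choices this step recommends.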
4. Download the embeddings connectors and VAE
The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE (LTX-2.3 ships them separately for the GGUF flow):
huggingface-cli download unsloth/LTX-2.3-GGUF \
text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
vae/ltx-2.3-22b-distilled_video_vae.safetensors \
vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
--local-dir ComfyUI/models/
(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — decimal GB per the unsloth/LTX-2.3-GGUF tree.)
5. Download a quantized Gemma 3 12B text encoder
LTX-2.3 uses Gemma 3 12B as its text encoder. The unquantized Gemma encoder "needs ~24-27GB VRAM to operate" per Issue #303 — far over the card — so download a quantized encoder and keep it off the GPU (Step 6). The GGUF QAT-Q4 encoder is the smallest broadly-supported option:
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
mmproj-BF16.gguf \
--local-dir ComfyUI/models/text_encoders/
The encoder file is 7.43 GB and the mmproj 0.85 GB (decimal GB per the unsloth/gemma-3-12b-it-qat-GGUF tree). An FP8 single-file alternative (gemma_3_12B_it_fp8_e4m3fn.safetensors) is documented in Issue #303's workaround comment — see Troubleshooting for when to prefer it.
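As a sanity check on disk space, the file sizes quoted in Steps 3 to 5 can simply be added up. This is a minimal sketch using only the decimal-GB figures stated in this recipe; it does not account for the ComfyUI checkout, the virtual environment, or any download cache.

```python
# Sizes in decimal GB, as quoted in Steps 3 to 5 of this recipe.
DOWNLOADS_GB = {
    "ltx-2.3-22b-distilled-Q4_K_S.gguf": 13.12,
    "ltx-2.3-22b-distilled_embeddings_connectors.safetensors": 2.31,
    "ltx-2.3-22b-distilled_video_vae.safetensors": 1.45,
    "ltx-2.3-22b-distilled_audio_vae.safetensors": 0.36,
    "gemma-3-12b-it-qat-UD-Q4_K_XL.gguf": 7.43,
    "mmproj-BF16.gguf": 0.85,
}


def total_download_gb(sizes: dict[str, float]) -> float:
    """Sum the file sizes, rounded to two decimals to hide float noise."""
    return round(sum(sizes.values()), 2)


if __name__ == "__main__":
    for name, size in DOWNLOADS_GB.items():
        print(f"{size:6.2f} GB  {name}")
    print(f"{total_download_gb(DOWNLOADS_GB):6.2f} GB  total")
```

Swapping the Q4_K_S entry for the 9.95 GB Q3_K_S file lowers the total by 3.17 GB.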
Running
Launch ComfyUI with a VRAM mode that forces the heavy encoder off the GPU. The two 16 GB-card-proven options from Issue #303 are --novram (stream all weights from RAM) and --reserve-vram 10 (reserve headroom so the encoder spills to CPU):
# Option A — stream weights from RAM
python main.py --listen --novram
# Option B — reserve VRAM so the encoder offloads
python main.py --listen --reserve-vram 10
With --novram an RTX 5080 16GB owner — same 16 GB tier as the 4080 SUPER — reports the GPU footprint drops dramatically: "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter on RTX 5080 16GB). For --reserve-vram 10, another RTX 5080 owner confirms "these settings work for RTX 5080" (Issue #303 comment). The 4080 SUPER sits in the same 16 GB envelope, so the same offload discipline applies; treat the specific 3 GB figure as the 5080 reporter's measurement, not a 4080 SUPER benchmark.
Open the browser UI and load one of the example workflows shipped by the Lightricks node:
ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/
Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 3. Wire the Gemma 3 12B encoder through the GGUF text-encoder loader. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.
Recommended distilled settings
| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width & height divisible by 32 | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | a multiple of 8, plus 1 (8k + 1; e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |
Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution.
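The two divisibility rules in the table are easy to check before queueing a job. The helpers below are a hypothetical sketch (the function names are illustrative and not part of ComfyUI or the LTX nodes); rounding down to the nearest valid value is a design choice made here so that a snapped request never exceeds the original.

```python
def is_valid_size(width: int, height: int) -> bool:
    """Width and height must both be positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(frames: int) -> bool:
    """Frame count must be a multiple of 8, plus 1 (for example 65, 97, 121)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_dimension(pixels: int) -> int:
    """Round down to the nearest multiple of 32 (never below 32)."""
    return max(32, (pixels // 32) * 32)


def snap_frame_count(frames: int) -> int:
    """Round down to the nearest 8k + 1 frame count (never below 1)."""
    return max(1, ((frames - 1) // 8) * 8 + 1)


if __name__ == "__main__":
    print(is_valid_size(512, 512), is_valid_frame_count(65))
    print(snap_dimension(720), snap_frame_count(100))
```

Note that 720 is not a multiple of 32, so a 1280×720 request fails the check and snaps down to 1280×704 with these helpers.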
Results
- Speed: Omitted. No RTX 4080 SUPER benchmark for LTX-2.3 22B at a fixed configuration has been published, and /check/ltx-video-2-3/rtx-4080-super currently returns verdict: unknown with no benchmark data. The closely matched RTX 4080 (same Ada AD103, same 16 GB GDDR6X tier) has no published LTX-2.3 figure either, so there is no close-sibling number to cite even as a lower bound. Empirical 4080 SUPER timings will appear at /check/ltx-video-2-3/rtx-4080-super once a community benchmark lands via /contribute.
- VRAM usage: With the Gemma encoder forced to CPU, a 16 GB card runs the distilled GGUF transformer resident while streaming the encoder from RAM. Leave the encoder on the GPU and a 16 GB card OOMs at "Peak Usage: 29068 MiB" (Issue #303 body, filed on an RTX 5080 16GB). The recipe's min_vram_gb: 16 reflects the standard ComfyUI offload path (Q4_K_S transformer 13.12 GB on disk + connectors + VAE-decode activations + encoder on CPU), which fills the 16 GB card; see /check/ltx-video-2-3/rtx-4080-super.
- Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. Q4_K_S keeps full Q4 quality; Q3_K_S frees ~3 GB of headroom (9.95 GB vs 13.12 GB on disk) at a small quality cost.
For the full benchmark data, see /check/ltx-video-2-3/rtx-4080-super.
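To put the quoted "Peak Usage: 29068 MiB" in context, the unit conversion can be worked through explicitly. This sketch assumes a 16 GB card exposes roughly 16 GiB (16384 MiB) to applications; the usable figure on a real system is somewhat lower, so the overshoot shown is a conservative estimate.

```python
MIB = 1024 ** 2  # bytes in one mebibyte

PEAK_MIB = 29068      # "Peak Usage" quoted from the Issue #303 body
CARD_MIB = 16 * 1024  # assumption: a 16 GB card exposes about 16 GiB


def mib_to_gb(mib: float) -> float:
    """Convert MiB to decimal gigabytes (10**9 bytes)."""
    return mib * MIB / 1e9


def mib_to_gib(mib: float) -> float:
    """Convert MiB to binary gibibytes (2**30 bytes)."""
    return mib / 1024


def overshoot_mib(peak_mib: int, capacity_mib: int) -> int:
    """How far the peak exceeds the card's capacity, in MiB."""
    return peak_mib - capacity_mib


if __name__ == "__main__":
    print(f"peak: {mib_to_gib(PEAK_MIB):.2f} GiB ({mib_to_gb(PEAK_MIB):.2f} GB)")
    print(f"over capacity by: {overshoot_mib(PEAK_MIB, CARD_MIB)} MiB")
```

Under that assumption the reported peak exceeds the card by more than 12 GiB, a gap far too large for a smaller transformer quant alone to close, which is why the recipe moves the encoder off the GPU.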
Troubleshooting
Out of memory loading the Gemma 3 12B text encoder
This is the dominant 16 GB failure mode, and the RTX 4080 SUPER is named explicitly among the 16 GB cards in Issue #303: the unquantized Gemma 3 12B encoder "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" on a 16 GB card (the report was filed on LTX-2 19B, but the encoder is the same Gemma 3 12B used by LTX-2.3, so the failure transfers). Fixes, in order of preference:
- Launch with --novram so the encoder runs on CPU. A 16 GB-card owner reports this drops GPU use to ~3 GB (comment).
- Or --reserve-vram 10, which an RTX 5080 owner confirms (comment).
- Use the GGUF QAT-Q4 Gemma from Step 5, or the FP8 single-file gemma_3_12B_it_fp8_e4m3fn.safetensors documented in a workaround comment.
Encoder is painfully slow on CPU
The --novram path moves the Gemma encoder to CPU, which is slow for the text-encode pass. A community user reports the "LTXV Audio Text Encoder Loader" node loads Gemma "8x times faster then normal loader" (sic) (Issue #303 comment). Replace the default Gemma loader with it and load the single safetensors encoder file.
mat1 and mat2 shapes cannot be multiplied after enabling --novram
A user hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a --preview-method latent2rgb flag (Issue #303 comment). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
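If you keep launch flags in a script, a small pre-flight check can catch this combination before ComfyUI starts. The helper below is a hypothetical sketch, not part of ComfyUI: it only looks for the two flags this section says to remove, and only when --novram is also present.

```python
# Flags this section recommends removing when --novram is in use.
RISKY_WITH_NOVRAM = ("--use-sage-attention", "--preview-method")


def risky_flags(argv: list[str]) -> list[str]:
    """Return the flags from RISKY_WITH_NOVRAM that appear alongside --novram."""
    if "--novram" not in argv:
        return []
    return [flag for flag in RISKY_WITH_NOVRAM if flag in argv]


if __name__ == "__main__":
    launch = [
        "--listen", "--novram",
        "--use-sage-attention",
        "--fast", "fp16_accumulation",
        "--preview-method", "latent2rgb",
    ]
    found = risky_flags(launch)
    if found:
        print("Consider removing:", " ".join(found))
```

The check is deliberately narrow: it reports flag names only and leaves the decision to the user, since the source ties the error to preview generation rather than to every flag in the reported command line.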
FlashAttention on Ada (sm_89)
Unlike Blackwell (sm_120), the RTX 4080 SUPER's Ada sm_89 architecture is fully covered by prebuilt FlashAttention wheels, so there is no kernel-availability gap to work around — and LTX-2.3's ComfyUI path defaults to PyTorch SDPA regardless. If you instead compile the optional LTX-Video Q8 FP8-matmul kernels, make sure your CUDA toolkit is 12.8+; a mismatched toolkit produces an sm89 assertion failure at the FP8 matmul (Issue #182, a community report filed on a 4080). The GGUF path in this recipe does not use those kernels.
Audio-video output not synchronized
LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.