What You'll Build
Generate short, synchronized audio + video clips locally with LTX-2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 16 GB RTX 4060 Ti. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README) — and even the README's own low-VRAM loaders only promise that "generation fits in 32 GB VRAM". The 4060 Ti's 16 GB is half that floor, so this recipe runs the distilled GGUF transformer at a 16 GB-fitting quant and streams the quantized Gemma 3 12B encoder from system RAM rather than holding it on the GPU.
Hardware data: RTX 4060 Ti 16GB (Ada sm_89) · distilled Q4_K_S/Q3_K_S GGUF + RAM-streamed quantized Gemma · works verdict · See benchmark data
⚠️ 16 GB cannot hold the Gemma encoder on-GPU; this is the binding constraint, not the transformer. The unquantized Gemma 3 12B text encoder "needs ~24-27GB VRAM to operate" and makes LTX-2 "unusable on consumer GPUs with 16GB VRAM (RTX 5080, RTX 4080, etc.), even with FP8 models and all optimizations applied"; it OOMs at "Peak Usage: 29068 MiB" per ComfyUI-LTXVideo Issue #303, which was filed on a 16 GB RTX 5080 (same VRAM tier as the 4060 Ti). The working answer on 16 GB is to stream weights from RAM with --novram: a 16 GB-card owner reports "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter on an RTX 5080 16GB).
Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT for the older LTX-2 19B line (repo Lightricks/LTX-2) and NOT for the LTX-Video 0.9.x 2B/13B family (repo Lightricks/LTX-Video). Several Issue #303 reports cited below were filed against the LTX-2 19B FP8 path; they are cited only for the shared Gemma 3 12B encoder failure mode, which is identical across both lines because they use the same 12B encoder.
ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement (the model card's license: is other, with license_name: ltx-2-community-license-agreement), not Apache-2.0. Read the license before any commercial use.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM (distilled GGUF + streamed encoder) | RTX 4060 Ti (16GB, Ada sm_89) |
| RAM | 32GB | 64GB recommended (Issue #303 — the 16 GB-card --novram reporter ran 64GB RAM) |
| Storage | ~25GB | ~25GB (Q4_K_S transformer + connectors + quantized Gemma encoder + VAE) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, CUDA 12.7+ |
The full unquantized LTX-2.3 requires a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. At 16 GB the 4060 Ti runs a distilled GGUF quant of the transformer (Q4_K_S, 13.12 GB on disk, or the smaller Q3_K_S at 9.95 GB for more headroom) and the Gemma encoder is never resident on the GPU — it streams from RAM under --novram. Plan on 64 GB of system RAM so the streamed encoder + transformer weights have somewhere to live.
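Before installing, it can help to confirm that the machine clears the minimums in the table above. The following is a minimal sketch of such a preflight check, assuming a Linux host that exposes /proc/meminfo and an NVIDIA driver that provides nvidia-smi. The function name meets_minimum and the exact thresholds are illustrative choices, not part of ComfyUI or any Lightricks tooling.

```shell
#!/bin/sh
# Hypothetical preflight helper: compare a measured value against a minimum.
# Usage: meets_minimum <label> <measured> <minimum>   (integers, same unit)
meets_minimum() {
    if [ "$2" -ge "$3" ]; then
        echo "ok: $1 $2 >= $3"
    else
        echo "low: $1 $2 < $3"
        return 1
    fi
}

# System RAM in GiB from /proc/meminfo (Linux only; MemTotal is reported in kB).
if [ -r /proc/meminfo ]; then
    ram_gib=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
    # The table's minimum is 32 GB; 30 leaves slack because the kernel
    # reports slightly less than the installed amount (assumed threshold).
    meets_minimum "RAM (GiB)" "$ram_gib" 30 || true
fi

# Total GPU memory in MiB as reported by the driver; skipped without nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
    vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -n 1)
    case "$vram_mib" in
        ''|*[!0-9]*) echo "could not read VRAM from nvidia-smi" ;;
        *)           meets_minimum "VRAM (MiB)" "$vram_mib" 16000 || true ;;
    esac
fi
```

A "low" line for RAM is a warning rather than a hard stop: the recipe lists 32 GB as the minimum and 64 GB as what the --novram reporter actually ran.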
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
On the RTX 4060 Ti (Ada Lovelace, sm_89) the default pip install torch already supports sm_89 with the stable cu124-class wheel; unlike Blackwell (sm_120), no special --index-url / cu128 wheel selection is needed. The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7." per the Lightricks/LTX-2.3 model card.
2. Install the LTX-Video, GGUF, and KJNodes custom nodes
cd ComfyUI/custom_nodes
# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
# city96's GGUF loader — required for the quantized transformer + Gemma-GGUF encoder
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt
# Kijai's KJNodes — used by the recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
The unsloth/LTX-2.3-GGUF model card lists city96/ComfyUI-GGUF and kijai/ComfyUI-KJNodes for the GGUF workflow; the Lightricks LTXVideo nodes provide the LTX-2.3 sampler, the low-VRAM loaders, and example workflows.
3. Download a 16 GB-fitting distilled GGUF transformer
The distilled checkpoint runs at 8 steps with CFG 1.0 and is the only practical choice for a 16 GB card. Pick Q4_K_S (13.12 GB) for fidelity, or step down to Q3_K_S (9.95 GB) if you want more spare VRAM for activations:
# Q4_K_S distilled (13.12 GB on disk) — standard 16 GB pick
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf \
--local-dir ComfyUI/models/unet/
# OR Q3_K_S distilled (9.95 GB) — more headroom for VAE-decode activations
huggingface-cli download unsloth/LTX-2.3-GGUF \
distilled/ltx-2.3-22b-distilled-Q3_K_S.gguf \
--local-dir ComfyUI/models/unet/
Distilled-transformer GGUF file sizes (verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):
| Quant | unsloth distilled file size | 4060 Ti 16 GB fit notes |
|---|---|---|
| Q2_K | 8.28 GB | smallest; lowest fidelity |
| Q3_K_S | 9.95 GB | headroom pick — leaves room for VAE-decode |
| Q4_K_S | 13.12 GB | standard 16 GB pick (transformer only; encoder streams) |
| Q4_K_M | 14.33 GB | tight on 16 GB once VAE activations load |
| Q5_K_S | 15.25 GB | too tight on 16 GB — belongs to the 24 GB recipe |
| Q6_K | 17.77 GB | exceeds 16 GB — out of scope here |
| BF16 | 42.04 GB | out of scope |
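The fit notes in the table can be turned into a rough selection rule. The sketch below picks the largest distilled quant whose file size fits in the VRAM left after reserving headroom for activations. The helper name pick_quant is hypothetical, the sizes are the table's figures expressed in MB, and treating on-disk size as a proxy for the resident footprint is an assumption, as is the headroom value.

```shell
#!/bin/sh
# Hypothetical helper: choose the largest distilled quant that fits.
# Usage: pick_quant <vram_mb> <headroom_mb>
pick_quant() {
    budget=$(( $1 - $2 ))
    choice="none"
    # Pairs are ordered smallest to largest, so the last one that fits wins.
    for pair in Q2_K:8280 Q3_K_S:9950 Q4_K_S:13120 Q4_K_M:14330 Q5_K_S:15250 Q6_K:17770; do
        name=${pair%%:*}
        size=${pair##*:}
        if [ "$size" -le "$budget" ]; then
            choice=$name
        fi
    done
    echo "$choice"
}

# 16 GB card with an assumed 2.5 GB reserved for activations.
pick_quant 16000 2500   # prints Q4_K_S
```

Raising the headroom to 5000 drops the pick to Q3_K_S, which matches the table's "headroom pick" note.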
There is also a newer distilled-1.1 revision in the same repo (a "different aesthetic experience and improved audio compared to v1.0" per the Lightricks/LTX-2.3 model card); either revision works and this recipe pins the original distilled/.
4. Download the embeddings connectors and VAE
The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE (LTX-2.3 ships them separately for the GGUF flow):
huggingface-cli download unsloth/LTX-2.3-GGUF \
text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
vae/ltx-2.3-22b-distilled_video_vae.safetensors \
vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
--local-dir ComfyUI/models/
(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — verified live via the unsloth/LTX-2.3-GGUF tree API.)
5. Download a quantized Gemma 3 12B text encoder (it will stream from RAM)
LTX-2.3 uses Gemma 3 12B as its text encoder. On 16 GB it cannot be GPU-resident — it streams from RAM (see Running). Download the QAT-Q4 GGUF, the smallest broadly-supported option:
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
--local-dir ComfyUI/models/text_encoders/
The encoder file is 7.43 GB (verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API). An FP8 single-file alternative (gemma_3_12B_it_fp8_e4m3fn.safetensors, 13.21 GB at GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn) is documented in Issue #303's workaround comment — but at 13.21 GB it is too large to keep alongside a 13 GB transformer on a 16 GB card, so the QAT-Q4 GGUF + streaming is the right path here (see Troubleshooting).
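The ~25 GB storage figure in Requirements can be reproduced from the file sizes quoted in Steps 3 to 5. The sketch below simply adds them up; the function name total_download_gb is illustrative, and the total covers model files only, not ComfyUI itself or the Python environment.

```shell
#!/bin/sh
# Sum the download sizes quoted in Steps 3-5 (in GB) for a given transformer.
# Usage: total_download_gb <transformer_gb>
total_download_gb() {
    awk -v t="$1" 'BEGIN {
        connectors = 2.31   # embeddings connectors
        video_vae  = 1.45
        audio_vae  = 0.36
        gemma      = 7.43   # QAT-Q4 GGUF encoder
        printf "%.2f\n", t + connectors + video_vae + audio_vae + gemma
    }'
}

total_download_gb 13.12   # Q4_K_S transformer: prints 24.67
total_download_gb 9.95    # Q3_K_S transformer: prints 21.50
```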
Running
On 16 GB the binding constraint is the Gemma encoder, not the transformer. The path that actually fits is --novram — stream all weights from RAM so the GPU only holds the active inference tensors:
# Stream all weights from RAM — GPU footprint collapses, text-encode runs on CPU
python main.py --listen --novram
This is the exact path a 16 GB-card owner used: with --novram, "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter on an RTX 5080 16GB). The tradeoff is that the text-encode pass runs on the CPU and is slower; the inference (sampling) pass stays fast.
If you prefer to keep the transformer resident and only spill the overflow, the README's other low-VRAM knob is --reserve-vram — "Use --reserve-vram ComfyUI parameter" per the ComfyUI-LTXVideo README. A community reporter on the same 16 GB tier found "these settings work for RTX 5080" with --reserve-vram 10 (Issue #303 comment). On the 4060 Ti, with only 16 GB total, --novram is the more reliable starting point; reach for --reserve-vram only if your encoder loader keeps the encoder off-GPU.
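If you switch between the two launch paths while testing, a small wrapper keeps the flag sets straight. This is only a sketch: comfy_launch_args is a hypothetical helper name, and it prints the command line rather than starting ComfyUI. The flags themselves (--listen, --novram, --reserve-vram) are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print ComfyUI launch flags for the two low-VRAM paths.
# Usage: comfy_launch_args stream
#        comfy_launch_args reserve <gb>
comfy_launch_args() {
    case "${1:-}" in
        stream)
            echo "--listen --novram"
            ;;
        reserve)
            if [ -z "${2:-}" ]; then
                echo "reserve mode needs an amount in GB" >&2
                return 1
            fi
            echo "--listen --reserve-vram $2"
            ;;
        *)
            echo "unknown mode: ${1:-}" >&2
            return 1
            ;;
    esac
}

# Print the commands instead of running them, so the sketch has no side effects.
echo "python main.py $(comfy_launch_args stream)"
echo "python main.py $(comfy_launch_args reserve 10)"
```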
Open the browser UI and load one of the example workflows shipped by the Lightricks node:
ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/
Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 3. Wire the quantized Gemma 3 12B encoder through the GGUF text-encoder loader. Keep resolution and frame count modest on this card; short TikTok-length clips at 480p–720p are the realistic envelope. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.
Recommended distilled settings
| Parameter | Value | Source |
|---|---|---|
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width & height divisible by 32 (start 512×512) | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | a multiple of 8, plus 1 (8k + 1; e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |
Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution toward 720p.
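The two shape constraints in the table are easy to check before queueing a job. The sketch below assumes the frame rule means a count of the form 8k + 1, which is consistent with the 65, 97, and 121 examples; both function names are hypothetical. One consequence worth noting: 720 is not a multiple of 32, so a 720p-class clip needs a height such as 704 or 736.

```shell
#!/bin/sh
# Check the constraints quoted from the model card:
#   width and height divisible by 32; frame count of the form 8k + 1.
# Usage: valid_ltx_dims <width> <height> <frames>
valid_ltx_dims() {
    [ $(( $1 % 32 )) -eq 0 ] && [ $(( $2 % 32 )) -eq 0 ] && [ $(( ($3 - 1) % 8 )) -eq 0 ]
}

# Hypothetical helper: round a frame count down to the nearest 8k + 1.
# Usage: snap_frames <frames>
snap_frames() {
    echo $(( ($1 - 1) / 8 * 8 + 1 ))
}

valid_ltx_dims 512 512 65   && echo "512x512, 65 frames: ok"
valid_ltx_dims 1280 720 121 || echo "1280x720: height 720 is not divisible by 32"
valid_ltx_dims 1280 704 121 && echo "1280x704, 121 frames: ok"
snap_frames 100   # prints 97
```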
Results
- Speed: Omitted. No fixed-configuration RTX 4060 Ti benchmark for LTX-2.3 22B has been published, and the single community datapoint at /check/ltx-video-2-3/rtx-4060-ti-16gb (a Reddit user running the distilled checkpoint) reports a "works" verdict but no comparable speed number; they note only that it is decent for short TikTok-length clips and that speed improved after upgrading their torch version. Empirical 4060 Ti timings will appear at /check/ltx-video-2-3/rtx-4060-ti-16gb once a community benchmark lands via /contribute.
- VRAM usage: The benchmark at /check/ltx-video-2-3/rtx-4060-ti-16gb records a "works" verdict with a 16.0 GB peak on this card, i.e. the recipe sits right at the card's ceiling on the resident-transformer path. The min_vram_gb: 14 reflects this recipe's lead path: a distilled Q4_K_S GGUF transformer (13.12 GB on disk) plus VAE-decode activations, with the Gemma encoder streamed from RAM (--novram) so it never adds to the GPU footprint. On the streaming path one 16 GB-card owner measured the GPU footprint collapse to "3 GB VRAM to make a 720p video" (Issue #303 comment). Treat 14 GB as the working envelope and the 16.0 GB /check figure as the worst-case resident peak.
- Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. On 16 GB you are limited to Q4_K_S (or Q3_K_S for more headroom); the roomier Q5/Q6 quants belong to the 24 GB RTX 4090 recipe. Keep clips short and resolution modest (480p–720p); this is a tight fit, not a comfortable one.
For the full benchmark data, see /check/ltx-video-2-3/rtx-4060-ti-16gb.
Troubleshooting
"Out of memory" loading the Gemma encoder on 16 GB
This is the defining 16 GB failure: the unquantized Gemma 3 12B "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per Issue #303, which was filed on a 16 GB RTX 5080. Fixes, in order of preference:
- Launch with --novram to stream all weights from RAM; the encoder runs on CPU and the GPU footprint drops dramatically (the 16 GB reporter measured "3 GB VRAM to make a 720p video", Issue #303 comment).
- Use the quantized Gemma from Step 5 (QAT-Q4 GGUF, 7.43 GB) rather than the unquantized encoder.
- Avoid the FP8 single-file gemma_3_12B_it_fp8_e4m3fn.safetensors (13.21 GB) on this card; at 13.21 GB it cannot co-reside with a ~13 GB transformer on 16 GB. It is documented in a workaround comment but suits larger cards.
mat1 and mat2 shapes cannot be multiplied after enabling --novram
A reporter hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a custom preview flag (Issue #303 comment, community reporter on a 24 GB card — the same flag interaction applies on 16 GB). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
Encoder is slow on CPU under --novram
The --novram path moves the Gemma encoder to CPU, which makes the text-encode pass slow. That is the unavoidable tradeoff of the only path that fits on 16 GB. A community user reports that the "LTXV Audio Text Encoder Loader" node "for Gemma works 8x times faster then normal loader" (sic) (Issue #303 comment). To try it, replace the default Gemma loader with that node and load the single safetensors encoder file (not the GGUF from Step 5).
No CUDA wheel gymnastics needed on Ada (sm_89)
Unlike Blackwell GPUs (sm_120), the RTX 4060 Ti's Ada sm_89 architecture is fully covered by prebuilt PyTorch + FlashAttention wheels, so no cu128 / special index-url selection is required; the default pip install torch runs on sm_89 as-is. LTX-2.3's ComfyUI path defaults to PyTorch SDPA regardless.
Audio-video output not synchronized
LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.