LTX-2.3 on RTX 4060 Ti 16GB: 22B Audio-Video at the 16 GB Floor via Distilled GGUF + Streamed Encoder

video · advanced · 14GB+ VRAM · Jun 27, 2026

This advanced recipe sets up LTX-2.3 on the RTX 4060 Ti 16GB, needing about 14 GB of VRAM.

Prerequisites
  • NVIDIA RTX 4060 Ti 16GB (Ada sm_89) — 16 GB is exactly the floor LTX-2.3 fits on, and only via a distilled GGUF transformer plus a quantized, RAM-streamed Gemma encoder
  • 64GB system RAM strongly recommended (the Gemma 3 12B text encoder does NOT fit on-GPU at 16 GB — it streams from RAM)
  • Python 3.12+ and CUDA 12.7+ — the LTX-2.3 codebase was tested with Python >=3.12; on the RTX 4060 Ti (Ada, sm_89) the default `pip install torch` already ships sm_89 kernels in the stable cu124-class wheel, so no cu128 / special index-url is needed
  • ComfyUI installed (latest version) + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes
  • ~25GB free disk space for the distilled GGUF transformer + connectors + quantized encoder + VAE

What You'll Build

Generate short, synchronized audio + video clips locally with LTX-2.3 — Lightricks' 22B-parameter DiT audio-video foundation model — on a 16 GB RTX 4060 Ti. Per the Lightricks/LTX-2.3 model card, "LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model." The canonical ComfyUI install wants a "CUDA-compatible GPU with 32GB+ VRAM" (ComfyUI-LTXVideo README) — and even the README's own low-VRAM loaders only promise that "generation fits in 32 GB VRAM". The 4060 Ti's 16 GB is half that floor, so this recipe runs the distilled GGUF transformer at a 16 GB-fitting quant and streams the quantized Gemma 3 12B encoder from system RAM rather than holding it on the GPU.

Hardware data: RTX 4060 Ti 16GB (Ada sm_89) · distilled Q4_K_S/Q3_K_S GGUF + RAM-streamed quantized Gemma · works verdict · benchmark data: /check/ltx-video-2-3/rtx-4060-ti-16gb

⚠️ 16 GB cannot hold the Gemma encoder on-GPU — this is the binding constraint, not the transformer. The unquantized Gemma 3 12B text encoder "needs ~24-27GB VRAM to operate" and makes LTX-2 "unusable on consumer GPUs with 16GB VRAM (RTX 5080, RTX 4080, etc.), even with FP8 models and all optimizations applied" — it OOMs at "Peak Usage: 29068 MiB" per ComfyUI-LTXVideo Issue #303, which was filed on a 16 GB RTX 5080 (same VRAM tier as the 4060 Ti). The working answer on 16 GB is to stream weights from RAM with --novram: a 16 GB-card owner reports "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter on an RTX 5080 16GB).

Variant pin. This recipe targets LTX-2.3 (22B, canonical repo Lightricks/LTX-2.3). It is NOT for the older LTX-2 19B line (repo Lightricks/LTX-2) and NOT for the LTX-Video 0.9.x 2B/13B family (repo Lightricks/LTX-Video). Several Issue #303 reports cited below were filed against the LTX-2 19B FP8 path; they are cited only for the shared Gemma 3 12B encoder failure mode, which is identical across both lines because they use the same 12B encoder.

ℹ️ License. The LTX-2.3 weights are released under the ltx-2-community-license-agreement (the model card's license: is other with license_name: ltx-2-community-license-agreement) — not Apache-2.0. Read the license before any commercial use.

Requirements

| Component | Minimum | Tested |
| --- | --- | --- |
| GPU | 16GB VRAM (distilled GGUF + streamed encoder) | RTX 4060 Ti (16GB, Ada sm_89) |
| RAM | 32GB | 64GB recommended (Issue #303: the 16 GB-card --novram reporter ran 64GB RAM) |
| Storage | ~25GB | ~25GB (Q4_K_S transformer + connectors + quantized Gemma encoder + VAE) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.12+, CUDA 12.7+ |

The full unquantized LTX-2.3 requires a "CUDA-compatible GPU with 32GB+ VRAM" per the ComfyUI-LTXVideo README. At 16 GB the 4060 Ti runs a distilled GGUF quant of the transformer (Q4_K_S, 13.12 GB on disk, or the smaller Q3_K_S at 9.95 GB for more headroom) and the Gemma encoder is never resident on the GPU — it streams from RAM under --novram. Plan on 64 GB of system RAM so the streamed encoder + transformer weights have somewhere to live.
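A quick pre-flight check can compare the host's RAM against the 32 GB minimum and 64 GB recommendation above. This is a minimal sketch, not part of the recipe's tooling: the function names are hypothetical, the 7% slack is an assumption (the kernel usually reports slightly less than the nominal module size), and the `os.sysconf` probe assumes a Linux host.

```python
import os

MIN_RAM_GB = 32          # recipe minimum
RECOMMENDED_RAM_GB = 64  # recipe recommendation for the RAM-streamed encoder
SLACK = 0.07             # assumption: tolerate the gap between nominal and reported RAM


def ram_verdict(total_gb: float) -> str:
    """Classify installed RAM against this recipe's two thresholds."""
    if total_gb >= RECOMMENDED_RAM_GB * (1 - SLACK):
        return "recommended"
    if total_gb >= MIN_RAM_GB * (1 - SLACK):
        return "minimum"
    return "insufficient"


def installed_ram_gb() -> float:
    """Read physical RAM in GiB (assumes Linux exposes these sysconf keys)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
```

Calling `ram_verdict(installed_ram_gb())` on the target machine gives a one-word answer before any large downloads start.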

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

On the RTX 4060 Ti (Ada Lovelace, sm_89) the default pip install torch already includes sm_89 kernels in the stable cu124-class wheel — unlike Blackwell (sm_120), no special --index-url / cu128 wheel selection is needed. The LTX-2.3 codebase "was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7." per the Lightricks/LTX-2.3 model card.
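To confirm what the installed PyTorch build actually sees, a short check along these lines may help. It is a sketch under assumptions: `sm_tag` and `describe_gpu` are hypothetical helper names, and only standard `torch.cuda` calls are used. On the 4060 Ti the reported capability should be (8, 9), which the helper formats as sm_89.

```python
def sm_tag(capability: tuple) -> str:
    """Format a CUDA compute capability such as (8, 9) as 'sm_89'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def describe_gpu(index: int = 0) -> dict:
    """Report the GPU, VRAM, and CUDA runtime visible to the installed PyTorch."""
    import torch  # imported lazily so sm_tag stays usable without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("PyTorch cannot see a CUDA device")
    props = torch.cuda.get_device_properties(index)
    return {
        "name": props.name,
        "arch": sm_tag(torch.cuda.get_device_capability(index)),
        "vram_gib": round(props.total_memory / 1024**3, 1),
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,
    }
```

Run `print(describe_gpu())` inside the activated venv; a missing CUDA device raises immediately rather than failing later inside ComfyUI.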

2. Install the LTX-Video, GGUF, and KJNodes custom nodes

# Run from the directory that contains ComfyUI/ (Step 1 ends inside ComfyUI, so `cd ..` first)
cd ComfyUI/custom_nodes

# Official Lightricks ComfyUI nodes for LTX-2.3
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

# city96's GGUF loader — required for the quantized transformer + Gemma-GGUF encoder
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

# Kijai's KJNodes — used by the recommended workflows
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt

The unsloth/LTX-2.3-GGUF model card lists city96/ComfyUI-GGUF and kijai/ComfyUI-KJNodes for the GGUF workflow; the Lightricks LTXVideo nodes provide the LTX-2.3 sampler, the low-VRAM loaders, and example workflows.

3. Download a 16 GB-fitting distilled GGUF transformer

The distilled checkpoint runs at 8 steps with CFG 1.0 and is the only practical choice for a 16 GB card. Pick Q4_K_S (13.12 GB) for fidelity, or step down to Q3_K_S (9.95 GB) if you want more spare VRAM for activations:

# Run from the directory that contains ComfyUI/ (the --local-dir paths are relative to it)
# Q4_K_S distilled (13.12 GB on disk): standard 16 GB pick
huggingface-cli download unsloth/LTX-2.3-GGUF \
  distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf \
  --local-dir ComfyUI/models/unet/

# OR Q3_K_S distilled (9.95 GB) — more headroom for VAE-decode activations
huggingface-cli download unsloth/LTX-2.3-GGUF \
  distilled/ltx-2.3-22b-distilled-Q3_K_S.gguf \
  --local-dir ComfyUI/models/unet/

Distilled-transformer GGUF file sizes (verified live via the unsloth/LTX-2.3-GGUF tree API, distilled/ folder):

| Quant | unsloth distilled file size | 4060 Ti 16 GB fit notes |
| --- | --- | --- |
| Q2_K | 8.28 GB | smallest; lowest fidelity |
| Q3_K_S | 9.95 GB | headroom pick: leaves room for VAE-decode |
| Q4_K_S | 13.12 GB | standard 16 GB pick (transformer only; encoder streams) |
| Q4_K_M | 14.33 GB | tight on 16 GB once VAE activations load |
| Q5_K_S | 15.25 GB | too tight on 16 GB; belongs to the 24 GB recipe |
| Q6_K | 17.77 GB | exceeds 16 GB; out of scope here |
| BF16 | 42.04 GB | out of scope |

There is also a newer distilled-1.1 revision in the same repo (a "different aesthetic experience and improved audio compared to v1.0" per the Lightricks/LTX-2.3 model card); either revision works and this recipe pins the original distilled/.
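The fit notes in the size table can be expressed as a small selection rule: take the largest quant whose file fits in the card's VRAM minus some headroom for activations. The sketch below uses the file sizes listed in this step; the function name and the headroom values are assumptions for illustration, and on-disk size is only a proxy for the resident footprint.

```python
from typing import Optional

# Distilled GGUF file sizes in GB, as listed in this step.
DISTILLED_GGUF_GB = {
    "Q2_K": 8.28,
    "Q3_K_S": 9.95,
    "Q4_K_S": 13.12,
    "Q4_K_M": 14.33,
    "Q5_K_S": 15.25,
    "Q6_K": 17.77,
    "BF16": 42.04,
}


def pick_quant(vram_gb: float, headroom_gb: float) -> Optional[str]:
    """Return the largest quant whose file fits in vram_gb minus headroom_gb."""
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in DISTILLED_GGUF_GB.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None
```

With an assumed 2.5 GB of headroom, a 16 GB card lands on Q4_K_S; raising the headroom to 5 GB selects Q3_K_S instead, which matches the two picks recommended in this step.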

4. Download the embeddings connectors and VAE

The distilled GGUF transformer needs its matching text-projection connectors and the audio + video VAE (LTX-2.3 ships them separately for the GGUF flow):

huggingface-cli download unsloth/LTX-2.3-GGUF \
  text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors \
  vae/ltx-2.3-22b-distilled_video_vae.safetensors \
  vae/ltx-2.3-22b-distilled_audio_vae.safetensors \
  --local-dir ComfyUI/models/

(Connectors 2.31 GB, video VAE 1.45 GB, audio VAE 0.36 GB — verified live via the unsloth/LTX-2.3-GGUF tree API.)

5. Download a quantized Gemma 3 12B text encoder (it will stream from RAM)

LTX-2.3 uses Gemma 3 12B as its text encoder. On 16 GB it cannot be GPU-resident — it streams from RAM (see Running). Download the QAT-Q4 GGUF, the smallest broadly-supported option:

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  --local-dir ComfyUI/models/text_encoders/

The encoder file is 7.43 GB (verified live via the unsloth/gemma-3-12b-it-qat-GGUF tree API). An FP8 single-file alternative (gemma_3_12B_it_fp8_e4m3fn.safetensors, 13.21 GB at GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn) is documented in Issue #303's workaround comment — but at 13.21 GB it is too large to keep alongside a 13 GB transformer on a 16 GB card, so the QAT-Q4 GGUF + streaming is the right path here (see Troubleshooting).
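The downloads in Steps 3 to 5 can be tallied to check the "~25GB" storage figure and to see where each file ends up. This sketch assumes that `huggingface-cli download` keeps the repo-relative path underneath `--local-dir` (so the transformer lands in a `distilled/` subfolder of `models/unet/`); the helper names are hypothetical, and the sizes are the ones quoted in this recipe.

```python
from pathlib import PurePosixPath

# (repo, path inside the repo, --local-dir used in this recipe, size in GB)
DOWNLOADS = [
    ("unsloth/LTX-2.3-GGUF", "distilled/ltx-2.3-22b-distilled-Q4_K_S.gguf",
     "ComfyUI/models/unet", 13.12),
    ("unsloth/LTX-2.3-GGUF",
     "text_encoders/ltx-2.3-22b-distilled_embeddings_connectors.safetensors",
     "ComfyUI/models", 2.31),
    ("unsloth/LTX-2.3-GGUF", "vae/ltx-2.3-22b-distilled_video_vae.safetensors",
     "ComfyUI/models", 1.45),
    ("unsloth/LTX-2.3-GGUF", "vae/ltx-2.3-22b-distilled_audio_vae.safetensors",
     "ComfyUI/models", 0.36),
    ("unsloth/gemma-3-12b-it-qat-GGUF", "gemma-3-12b-it-qat-UD-Q4_K_XL.gguf",
     "ComfyUI/models/text_encoders", 7.43),
]


def destination(local_dir: str, repo_path: str) -> str:
    """Expected on-disk path, assuming the repo-relative path is preserved."""
    return str(PurePosixPath(local_dir) / repo_path)


def total_gb(downloads) -> float:
    """Sum the listed file sizes in GB."""
    return round(sum(size for *_, size in downloads), 2)
```

The Q4_K_S set totals 24.67 GB, in line with the ~25 GB storage requirement; swapping in the 9.95 GB Q3_K_S transformer brings it down to about 21.5 GB.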

Running

On 16 GB the binding constraint is the Gemma encoder, not the transformer. The path that actually fits is --novram — stream all weights from RAM so the GPU only holds the active inference tensors:

# Stream all weights from RAM — GPU footprint collapses, text-encode runs on CPU
python main.py --listen --novram

This is the exact path a 16 GB-card owner used: with --novram, "the inference part works incredibly fast and it only costs my gpu 3 GB VRAM to make a 720p video" (Issue #303 comment, community reporter on an RTX 5080 16GB). The tradeoff is that the text-encode pass runs on the CPU and is slower; the inference (sampling) pass stays fast.

If you prefer to keep the transformer resident and only spill the overflow, the README's other low-VRAM knob is --reserve-vram: "Use --reserve-vram ComfyUI parameter" (ComfyUI-LTXVideo README). A community reporter on the same 16 GB tier found "these settings work for RTX 5080" with --reserve-vram 10 (Issue #303 comment). On the 4060 Ti, with only 16 GB total, --novram is the more reliable starting point; reach for --reserve-vram only if your encoder loader keeps the encoder off-GPU.
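The two launch modes described in this section differ only in one flag, which a small wrapper can make explicit. This is an illustrative sketch: `launch_args` is a hypothetical helper, the mode names are made up for this example, and only the flags quoted in this recipe (`--listen`, `--novram`, `--reserve-vram`) are used.

```python
import shlex


def launch_args(mode: str = "novram", reserve_gb: int = 10) -> list:
    """Build the ComfyUI command line for one of the two low-VRAM modes."""
    args = ["python", "main.py", "--listen"]
    if mode == "novram":
        # Stream all weights from RAM (the lead path on 16 GB).
        args.append("--novram")
    elif mode == "reserve":
        # Alternative knob; 10 is the value the community reporter used.
        args += ["--reserve-vram", str(reserve_gb)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return args


def launch_command(mode: str = "novram", reserve_gb: int = 10) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(launch_args(mode, reserve_gb))
```

Passing the list from `launch_args()` to `subprocess.run` from inside the ComfyUI directory (with the venv active) would start the server; `launch_command()` is there for logging or copy-paste.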

Open the browser UI and load one of the example workflows shipped by the Lightricks node:

ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/

Swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF and point it at the distilled GGUF from Step 3. Wire the quantized Gemma 3 12B encoder through the GGUF text-encoder loader. Keep resolution and frame count modest on this card: short TikTok-length clips at 480p–720p are the realistic envelope. The output (silent video, or synchronized audio+video if you load the audio-enabled workflow) lands in ComfyUI/output/.

Recommended distilled settings

| Parameter | Value | Source |
| --- | --- | --- |
| Sampler steps | 8 | Distilled checkpoint default per Lightricks/LTX-2.3 model card |
| CFG | 1.0 | Same |
| Resolution | width & height divisible by 32 (start 512×512) | Lightricks/LTX-2.3 card: "Width & height settings must be divisible by 32." |
| Frame count | divisible by 8 + 1 (e.g. 65, 97, 121) | Lightricks/LTX-2.3 card: "Frame count must be divisible by 8 + 1." |

Start small (e.g. 512×512, 65 frames) to confirm the install fits before scaling resolution toward 720p.
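The two shape rules can be checked before queueing a job. The sketch below reads "divisible by 8 + 1" as a frame count of the form 8k + 1, which is consistent with the examples 65, 97, and 121; the helper names are hypothetical, and rounding down (rather than to nearest) is a deliberate choice to avoid increasing memory use on a tight card.

```python
def is_valid_shape(width: int, height: int, frames: int) -> bool:
    """True when width and height are multiples of 32 and frames is 8k + 1."""
    return width % 32 == 0 and height % 32 == 0 and frames % 8 == 1


def snap_down(width: int, height: int, frames: int) -> tuple:
    """Round each value down to the nearest setting that satisfies the rules."""
    w = max(32, width // 32 * 32)
    h = max(32, height // 32 * 32)
    f = max(1, (frames - 1) // 8 * 8 + 1)
    return w, h, f
```

For example, a requested 1280×720 clip of 100 frames snaps to 1280×704 at 97 frames, while the suggested 512×512 at 65 frames passes unchanged.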

Results

  • Speed: Omitted. No fixed-configuration RTX 4060 Ti benchmark for LTX-2.3 22B has been published, and the single community datapoint at /check/ltx-video-2-3/rtx-4060-ti-16gb (a Reddit user running the distilled checkpoint) reports a works verdict but no comparable speed number. They note only that it is decent for short TikTok-length clips and that speed improved after upgrading their torch version. Empirical 4060 Ti timings will appear at /check/ltx-video-2-3/rtx-4060-ti-16gb once a community benchmark lands via /contribute.
  • VRAM usage: The benchmark at /check/ltx-video-2-3/rtx-4060-ti-16gb records a works verdict with a 16.0 GB peak on this card — i.e. the recipe sits right at the card's ceiling on the resident-transformer path. The min_vram_gb: 14 reflects this recipe's lead path: a distilled Q4_K_S GGUF transformer (13.12 GB on disk) plus VAE-decode activations, with the Gemma encoder streamed from RAM (--novram) so it never adds to the GPU footprint. On the streaming path one 16 GB-card owner measured the GPU footprint collapse to "3 GB VRAM to make a 720p video" (Issue #303 comment). Treat 14 GB as the working envelope and the 16.0 GB /check figure as the worst-case resident peak.
  • Quality notes: The distilled checkpoint (8 steps, CFG 1.0) trades fine motion detail for speed. On 16 GB you are limited to Q4_K_S (or Q3_K_S for more headroom); the roomier Q5/Q6 quants belong to the 24 GB RTX 4090 recipe. Keep clips short and resolution modest (480p–720p) — this is a tight fit, not a comfortable one.

For the full benchmark data, see /check/ltx-video-2-3/rtx-4060-ti-16gb.

Troubleshooting

"Out of memory" loading the Gemma encoder on 16 GB

This is the defining 16 GB failure: the unquantized Gemma 3 12B "needs ~24-27GB VRAM to operate" and OOMs at "Peak Usage: 29068 MiB" per Issue #303, which was filed on a 16 GB RTX 5080. Fixes, in order of preference:

  1. Launch with --novram to stream all weights from RAM: the encoder runs on CPU and the GPU footprint drops dramatically (the 16 GB reporter measured "3 GB VRAM to make a 720p video" in an Issue #303 comment).
  2. Use the quantized Gemma from Step 5 (QAT-Q4 GGUF, 7.43 GB) rather than the unquantized encoder.
  3. Avoid the FP8 single-file gemma_3_12B_it_fp8_e4m3fn.safetensors (13.21 GB) on this card — at 13.21 GB it cannot co-reside with a ~13 GB transformer on 16 GB. It is documented in a workaround comment but suits larger cards.
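The arithmetic behind these fixes is simple enough to check directly. The sketch below treats on-disk file sizes as a rough proxy for resident size and ignores activations, so it is optimistic; the helper names are hypothetical, and the figures are the ones quoted in this recipe.

```python
CARD_GB = 16.0
TRANSFORMER_Q4_K_S_GB = 13.12
GEMMA_QAT_Q4_GB = 7.43
GEMMA_FP8_GB = 13.21


def mib_to_gib(mib: float) -> float:
    """Convert MiB (as printed in the OOM message) to GiB."""
    return mib / 1024


def fits_together(card_gb: float, *component_gb: float) -> bool:
    """True when the listed components could be GPU-resident at the same time."""
    return sum(component_gb) <= card_gb


# Reported OOM peak from Issue #303: 29068 MiB is about 28.39 GiB.
peak_gib = round(mib_to_gib(29068), 2)
```

Neither encoder fits next to the Q4_K_S transformer (13.12 + 7.43 = 20.55 GB, and 13.12 + 13.21 = 26.33 GB), which is why even the quantized encoder has to stay off the GPU on this card.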

mat1 and mat2 shapes cannot be multiplied after enabling --novram

A reporter hit this when running --novram together with --use-sage-attention --fast fp16_accumulation and a custom preview flag (Issue #303 comment, community reporter on a 24 GB card — the same flag interaction applies on 16 GB). Remove the sage-attention and custom preview flags; the error traces to preview generation during sampling, not the model itself.
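If the launch command is assembled by a script, the reported clash can be filtered out before starting ComfyUI. This is a minimal sketch: `strip_conflicts` is a hypothetical helper, only `--use-sage-attention` is listed because the report does not name the custom preview flag, and the filter handles value-less flags only (a flag that takes a value would leave its value behind).

```python
# Flag reported to clash with --novram in the Issue #303 comment.
REPORTED_CONFLICTS = {"--use-sage-attention"}


def strip_conflicts(args, extra=()):
    """Drop flags reported to clash with --novram; leave everything else as-is.

    `extra` lets the caller add further value-less flags, such as a custom
    preview flag, whose exact name is not given in the report.
    """
    if "--novram" not in args:
        return list(args)
    blocked = REPORTED_CONFLICTS | set(extra)
    return [arg for arg in args if arg not in blocked]
```

Command lines without `--novram` pass through untouched, since the report only covers the combination with that flag.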

Encoder is slow on CPU under --novram

The --novram path moves the Gemma encoder to CPU, which is slow for the text-encode pass. That is the unavoidable tradeoff of the only path that fits on 16 GB. A community user reports that the "LTXV Audio Text Encoder Loader" node "for Gemma works 8x times faster then normal loader" (sic) (Issue #303 comment). Replace the default Gemma loader with it and load the single safetensors encoder file.

No CUDA wheel gymnastics needed on Ada (sm_89)

Unlike Blackwell GPUs (sm_120), the RTX 4060 Ti's Ada sm_89 architecture is fully covered by prebuilt PyTorch + FlashAttention wheels, so no cu128 / special index-url selection is required — the default pip install torch already ships sm_89 kernels. LTX-2.3's ComfyUI path defaults to PyTorch SDPA regardless.

Audio-video output not synchronized

LTX-2.3 produces synchronized video + audio in a single model per the Lightricks/LTX-2.3 card. The non-audio workflows produce silent video — load the audio-enabled workflow from example_workflows/2.3/ in ComfyUI-LTXVideo if you need sound.

Common Questions
How much VRAM does LTX-2.3 need?

About 14 GB with the distilled Q4_K_S GGUF transformer and the Gemma encoder streamed from RAM; that is the minimum this recipe targets.

Which GPUs is LTX-2.3 tested on?

RTX 4060 Ti 16GB.

How hard is this setup?

Advanced — follow the steps above.