How much VRAM does LTX-Video 2B need?

About 10 GB — the minimum this recipe targets.

How hard is this setup?

Intermediate. Follow the steps below.

LTX-Video 2B on RTX 4060 Ti 16GB: Fast Local Text-to-Video

What You'll Build

A local text-to-video setup running the Lightricks LTX-Video 2B distilled DiT model on a single RTX 4060 Ti 16GB, producing short clips (e.g. 768×512, 161 frames) from a text prompt. You install the official diffusers LTXPipeline with fp8 layerwise weight-casting plus group offloading so the model fits comfortably inside 16GB.

Hardware data: RTX 4060 Ti 16GB (16GB VRAM) · official diffusers memory-optimized path needs ~10GB VRAM · See benchmark data

ℹ️ This is the 2B variant, not the 13B. The Lightricks/LTX-Video repository hosts several model lines under one roof: the lightweight 2B transformer (this recipe — checkpoint ltxv-2b-0.9.8-distilled.safetensors, 6.34 GB on disk) and a much heavier 13B transformer (ltxv-13b-0.9.8-dev.safetensors, ~28.6 GB on disk) that the model card flags as "requires more VRAM". The newer LTX-2 (audio+video) is a separate, larger model line in a different repository and its numbers do not transfer here. This recipe targets the 2B only.

Requirements

Component	Minimum	Tested
GPU	10GB VRAM (diffusers fp8 + offloading path)	RTX 4060 Ti (16GB)
RAM	16GB system RAM	—
Storage	~7GB for the 2B distilled checkpoint, plus text encoder + VAE	—
Software	Python 3.10+, PyTorch >= 2.1.2, CUDA 12.x	—
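
A quick preflight check can confirm the table's minimums before you download any weights. The sketch below is an illustration rather than part of the upstream docs: the helper names (check_python, check_vram, detect_vram_bytes) are hypothetical, the thresholds are copied from the table above, the 0.5 GiB tolerance is an assumption to allow for cards reporting slightly less than their marketed size, and the VRAM probe only works if PyTorch with CUDA is already installed.

```python
import sys

MIN_PYTHON = (3, 10)   # from the Requirements table
MIN_VRAM_GB = 10       # diffusers fp8 + offloading path


def check_python(version=sys.version_info):
    """Return True if the interpreter meets the table's Python minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def check_vram(total_bytes, minimum_gb=MIN_VRAM_GB, tolerance_gb=0.5):
    """Return True if total VRAM (bytes) meets the minimum.

    Drivers usually report a little less than the marketed size,
    so a small tolerance (an assumption, tune as needed) is allowed.
    """
    return total_bytes / 1024**3 >= minimum_gb - tolerance_gb


def detect_vram_bytes():
    """Total memory of GPU 0 in bytes, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    print("Python OK:", check_python())
    vram = detect_vram_bytes()
    if vram is None:
        print("VRAM: could not detect (PyTorch with CUDA not available)")
    else:
        print(f"VRAM: {vram / 1024**3:.1f} GiB, OK: {check_vram(vram)}")
```

Run it once before the Installation steps for a Python check, and again after installing PyTorch for the VRAM check.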

Installation

You have two supported runtimes. The diffusers path is the one this recipe's VRAM figure is measured against (~10GB); the official repo / ComfyUI path is the upstream-recommended route for best quality.

1. Install diffusers and dependencies (recommended for the 16GB fit)

python -m venv env
source env/bin/activate
pip install -U torch torchvision
pip install -U "diffusers>=0.33" transformers accelerate sentencepiece imageio imageio-ffmpeg

2. (Alternative) Install the official LTX-Video repo

The canonical repository documents a pip install -e .[inference] flow plus an inference.py script. From the official README:

git clone https://github.com/Lightricks/LTX-Video.git
cd LTX-Video

# create env
python -m venv env
source env/bin/activate
python -m pip install -e .\[inference\]

The 2B distilled checkpoint is selected via a config file — configs/ltxv-2b-0.9.8-distilled.yaml, which points at ltxv-2b-0.9.8-distilled.safetensors.

Running

diffusers (fits 16GB with ~10GB peak)

The official diffusers LTX-Video docs publish a memory-optimized example that combines fp8 layerwise weight-casting with group offloading. The docs state this path requires ~10GB of VRAM — well within the 4060 Ti's 16GB:

import torch
from diffusers import LTXPipeline, AutoModel
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# fp8 layerwise weight-casting
transformer = AutoModel.from_pretrained(
    "Lightricks/LTX-Video",
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

pipeline = LTXPipeline.from_pretrained("Lightricks/LTX-Video", transformer=transformer, torch_dtype=torch.bfloat16)

# group-offloading
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipeline.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True)
apply_group_offloading(pipeline.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
apply_group_offloading(pipeline.vae, onload_device=onload_device, offload_type="leaf_level")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=161,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)

Output lands in output.mp4 in your working directory.
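
The clip length follows directly from two numbers in the script above: num_frames=161 frames written at fps=24. As a worked check (the helper name clip_seconds is only for illustration):

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip in seconds."""
    return num_frames / fps


# Values from the example above: 161 frames exported at 24 fps.
print(f"{clip_seconds(161, 24):.2f} s")  # 161 / 24 ≈ 6.71 s
```

Changing fps in export_to_video alters playback speed and duration only; the number of generated frames stays at whatever num_frames was set to.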

Official repo (inference.py)

If you installed the canonical repo instead, generate from the command line, pointing the --pipeline_config at the 2B config:

python inference.py --prompt "PROMPT" --height 512 --width 768 --num_frames 121 --seed 0 --pipeline_config configs/ltxv-2b-0.9.8-distilled.yaml

Per the parameter guide, the model works on resolutions divisible by 32 and on frame counts of the form 8n + 1 (e.g. 121, 257). It works best at resolutions under 720×1280 and at fewer than 257 frames. For the guidance-distilled 2B, set guidance_scale to 1.0.

Results

Speed: No first-party benchmark exists yet for the 2B on the RTX 4060 Ti 16GB specifically; see /check/ltx-video-2b/rtx-4060-ti-16gb and please contribute yours via /contribute. For reference on adjacent hardware, the community LTX-VideoQ8 8-bit fork reports generating "720x480x121 videos in under a minute on RTX 4060 Laptop GPU with 8GB VRAM" with "up to 3X speed up in NVIDIA ADA GPUs" (the 4060 Ti is also Ada), and maintainer benibraz notes in discussion #6 that "On an RTX 4090 users have generated 121 frames in 11 seconds". Neither figure was measured on the 4060 Ti, so treat both as directional only.
VRAM usage: The official diffusers memory-optimized path is documented at ~10GB VRAM (diffusers docs). On the canonical repo, maintainer benibraz reports the model has been run "with 6GB of VRAM and 16GB of RAM with some tricks (quantized clip encoder, etc) and generating 512x512 resolution with 50 frames" (discussion #6). The 4060 Ti's 16GB gives comfortable headroom over all of these.
Quality notes: This is the distilled 2B — the model card describes it as a "Smaller model, slight quality reduction compared to 13b distilled. Ideal for fast generation with light VRAM usage" (README models table). For higher fidelity at the cost of VRAM and speed, the 13B line exists but is out of scope here.

For the full benchmark data, see /check/ltx-video-2b/rtx-4060-ti-16gb.
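
If you want to contribute numbers, wall-clock time and peak VRAM are the two figures discussed above. The following is a minimal sketch, not an official benchmarking harness: measure is a hypothetical helper that uses PyTorch's torch.cuda.reset_peak_memory_stats, torch.cuda.synchronize and torch.cuda.max_memory_allocated when CUDA is available, and falls back to timing only otherwise. Keep in mind that max_memory_allocated counts memory held by PyTorch tensors, so it typically reads lower than the total a tool like nvidia-smi shows.

```python
import time


def measure(fn):
    """Run fn() once; return (result, seconds, peak_vram_gib or None)."""
    try:
        import torch
        cuda = torch.cuda.is_available()
    except ImportError:
        torch, cuda = None, False

    if cuda:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()

    start = time.perf_counter()
    result = fn()
    if cuda:
        # Wait for queued GPU work before stopping the clock.
        torch.cuda.synchronize()
    seconds = time.perf_counter() - start

    peak_gib = torch.cuda.max_memory_allocated() / 1024**3 if cuda else None
    return result, seconds, peak_gib


if __name__ == "__main__":
    # Trivial stand-in workload; replace the lambda with your pipeline call.
    _, seconds, peak = measure(lambda: sum(range(1_000_000)))
    print(f"{seconds:.3f} s, peak VRAM (GiB): {peak}")
```

To time a real generation, wrap the pipeline call from the Running section in the lambda and report the resolution, frame count, and step count alongside the result.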

Troubleshooting

Out of memory at default settings

Even on a 16GB card, the default full-precision text encoder (PixArt/T5-XXL) plus the VAE decode stage can spike VRAM. Keep the fp8 layerwise casting and group offloading from the Running section enabled. Smaller cards hit this limit more often: a 4060 Ti 8GB owner notes in discussion #18 that "Sometimes it works, but sometimes I run out of memory" at default settings. The 16GB SKU plus offloading avoids that edge.

FP8 kernels for an extra speed boost (Ada)

The RTX 4060 Ti is Ada architecture (sm_89). The canonical repo offers optional FP8 kernels that "provide performance boost on supported graphics cards (Ada architecture and later)" (README). These are optional — the diffusers path above already runs without them.
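
To confirm that a card falls in the "Ada architecture and later" group, you can compare its CUDA compute capability against (8, 9), the value for Ada. This is a minimal sketch: is_ada_or_newer and current_gpu_capability are hypothetical helpers, and the check says nothing about whether the optional kernels will actually build and load on your system.

```python
ADA_CAPABILITY = (8, 9)  # sm_89


def is_ada_or_newer(capability):
    """True if a (major, minor) compute capability is sm_89 or higher."""
    return tuple(capability) >= ADA_CAPABILITY


def current_gpu_capability():
    """(major, minor) of GPU 0, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = current_gpu_capability()
    supported = cap is not None and is_ada_or_newer(cap)
    print("capability:", cap, "Ada or newer:", supported)
```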

Frame-count / resolution errors

The model requires the number of frames to be of the form 8n + 1 (e.g. 121, 161, 257) and the resolution to be divisible by 32. Other values are padded and then cropped, which can degrade output, so pick conformant values up front per the parameter guide.
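
One way to pick conformant values up front is to snap a requested size down to the nearest valid one. This sketch assumes the rule as stated above (width and height divisible by 32, frame count one more than a multiple of 8). The helper names are illustrative, and rounding down rather than to nearest is a design choice made here so the result never exceeds what you asked for.

```python
def snap_resolution(pixels: int) -> int:
    """Largest multiple of 32 that is <= pixels (minimum 32)."""
    return max(32, (pixels // 32) * 32)


def snap_frames(frames: int) -> int:
    """Largest count of the form 8n + 1 that is <= frames (minimum 1)."""
    return max(1, ((frames - 1) // 8) * 8 + 1)


def is_conformant(width: int, height: int, frames: int) -> bool:
    """True if the values already satisfy both constraints."""
    return width % 32 == 0 and height % 32 == 0 and frames % 8 == 1


print(snap_resolution(720), snap_frames(120))  # 704 113
print(is_conformant(768, 512, 161))            # True
```

Values that already conform, such as 768×512 at 161 frames, pass through unchanged.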