How much VRAM does LightX2V need?

About 8 GB on the quantized, offloaded path, which is the documented minimum. This recipe targets a 16GB RTX 5060 Ti.

How hard is this setup?

Intermediate. Follow the steps below.

LightX2V on RTX 5060 Ti: 4-Step Text-to-Video with Distilled Wan2.1-14B

What You'll Build

Generate short text-to-video clips locally using LightX2V — an inference framework that ships 4-step, CFG-free distilled checkpoints of Wan2.1-T2V-14B — on a consumer 16GB GPU. The distilled checkpoint cuts inference from 40–50 steps down to 4 with no classifier-free guidance, and the official framework documents running 14B Wan models for 480P / 720P video on as little as 8GB VRAM with 16GB system RAM (ModelTC/LightX2V README).

Hardware data: RTX 5060 Ti (16GB VRAM) · 4-step distilled Wan2.1-T2V-14B · See benchmark data

⚠️ Heads-up: "Minimum 8GB VRAM" in the LightX2V docs assumes you stay on the quantized / offloaded path. One user reported OOM with the unquantized 14B checkpoint even on a 48GB A6000 when running without SageAttention, fp8 quant, VAE tiling, or CPU offload (see HF discussion #9). On a 16GB card, stick to the fp8 / int8 distilled weights and enable offload.
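A rough back-of-envelope calculation shows why the unquantized path is out of reach on a 16GB card. The sketch below assumes exactly 14e9 parameters and counts only the transformer weights. The text encoder, VAE, and activations come on top, so treat the numbers as a lower bound rather than a measurement.

```python
# Back-of-envelope weight footprint for a 14B-parameter transformer.
# Assumption: exactly 14e9 parameters; real checkpoints differ slightly.
PARAMS = 14e9
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1, "int8": 1}


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gb(PARAMS, nbytes):.0f} GB")
```

Under these assumptions the bf16 weights alone come to about 28 GB, well past 16 GB, while fp8 / int8 weights come to about 14 GB. Even the quantized weights leave little room for everything else, which is why offload stays enabled on this card.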

Requirements

Component	Minimum	Tested / notes
GPU	8GB VRAM (CUDA)	RTX 5060 Ti (16GB)
RAM	16GB (per official docs)	16GB+
Storage	~50GB	weights + framework + cache
Software	Python 3.10+, PyTorch 2.6+, CUDA 12.4 or 12.8	per Quickstart; use CUDA 12.8 on the RTX 5060 Ti (Blackwell)

LightX2V is the inference framework; the lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v HF repo is the canonical distilled T2V checkpoint it accelerates.

Installation

You have two supported paths. Path A (diffusers, simplest) is the one-liner from the model card. Path B (LightX2V framework) gives you fp8/int8 quant, SageAttention, offload and the full Wan / LTX / HunyuanVideo zoo.

Path A — Diffusers one-liner

pip install -U diffusers transformers accelerate

Then save and run this script (adapted from the HF model card — see note below):

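import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the distilled checkpoint in bfloat16. device_map="cuda" already places
# the pipeline on the GPU, so no separate pipe.to("cuda") call is needed.
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "A man with short gray hair plays a red electric guitar."

# The distilled checkpoint expects 4 steps and guidance_scale=1.0 (no CFG).
# Without these arguments the pipeline falls back to its own defaults.
output = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1.0,
).frames[0]
export_to_video(output, "output.mp4")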

ℹ️ Model-card note: as of writing, the HF model card's Python example also loads an input image and passes image=image to the pipeline — that's the Image-to-Video signature. This repo is the T2V-14B distilled checkpoint (per the repo name and the LightX2V quantization docs), so call the pipeline with prompt= only. If you want true I2V, use the matching …-I2V-14B-… distilled checkpoint instead.

The recommended scheduler settings (per the model card) are LCM scheduler with shift=5.0 and guidance_scale=1.0.

Path B — Official LightX2V framework (recommended for 16GB cards)

Use the framework when you want fp8/int8 distilled weights, SageAttention, and explicit CPU offload — all of which are documented as the path to running 14B Wan on 8GB VRAM.

# 1. Clone and install
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
conda create -n lightx2v python=3.11 -y
conda activate lightx2v
pip install -v -e .

Verbatim from the LightX2V Quickstart docs.

# 2. (Recommended) build SageAttention 2 for ~2x attention speedup
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention && CUDA_ARCHITECTURES="8.0,8.6,8.9,9.0,12.0" \
  EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 \
  pip install -v -e .
cd ..  # back to the LightX2V root so the relative paths in later steps resolve

The CUDA_ARCHITECTURES="...,12.0" string covers the RTX 5060 Ti's Blackwell sm_120 target — leave it in.

# 3. Pull the 4-step distilled T2V-14B checkpoint
huggingface-cli download lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v \
  --local-dir ./weights/Wan2.1-T2V-14B-StepDistill

Docker is also offered as the "simplest and fastest" alternative per the same Quickstart:

docker pull lightx2v/lightx2v:26011201-cu128

Running

Path A (diffusers)

Just run the Python script from the install section — output.mp4 lands in your working directory after the 4-step inference completes.

Path B (LightX2V framework)

The repo ships ready-to-run shell scripts (Quickstart):

bash scripts/wan/run_wan_t2v.sh

Or call the Python API directly:

from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="./weights/Wan2.1-T2V-14B-StepDistill",
    model_cls="wan2.1",
    task="t2v",
)
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,            # the whole point of the distilled checkpoint
    height=480, width=832,
    num_frames=81,
)
pipe.generate(
    seed=42,
    prompt="A man with short gray hair plays a red electric guitar.",
    save_result_path="./output.mp4",
)

Start at 480×832, 81 frames, 4 steps — that's the lowest-risk configuration on a 16GB card. Push resolution and frame count up only if peak VRAM stays comfortably below 16GB.
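One way to check that margin is to read PyTorch's peak-allocation counter after a run. The sketch below is illustrative only: `headroom_ok` and `report_peak_vram` are hypothetical helper names, the 10% margin is an arbitrary assumption, and the counter covers only memory allocated through PyTorch in the current process, so `nvidia-smi` will typically show a higher figure.

```python
def headroom_ok(peak_bytes: int, total_bytes: int, margin: float = 0.10) -> bool:
    """True if peak usage leaves at least `margin` of total VRAM free."""
    return peak_bytes <= total_bytes * (1.0 - margin)


def report_peak_vram(margin: float = 0.10):
    """Print peak PyTorch-allocated VRAM on device 0.

    Returns None when PyTorch or CUDA is unavailable, otherwise the
    result of headroom_ok() for the measured peak.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    peak = torch.cuda.max_memory_allocated()
    total = torch.cuda.get_device_properties(0).total_memory
    gib = 2**30
    print(f"peak {peak / gib:.2f} GiB of {total / gib:.2f} GiB")
    return headroom_ok(peak, total, margin)
```

To use it, call `torch.cuda.reset_peak_memory_stats()` before `pipe.generate(...)` and `report_peak_vram()` afterwards, then raise resolution or frame count only while the check keeps passing.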

Results

Speed: No published RTX 5060 Ti benchmark exists at the time of writing. For an order-of-magnitude reference, the official ModelTC/LightX2V README reports 20.26 s/it on a single RTX 4090D (24GB) and 5.18 s/it on an H100 with the framework's optimizations enabled. Both are higher-end GPUs than the 5060 Ti, so expect slower per-step times on the 16GB target. Empirical 5060 Ti numbers will land at /check/lightx2v/rtx-5060-ti once a benchmark report is submitted.
VRAM usage: The framework's official Quickstart sets the floor at "minimum 8GB VRAM" for 14B Wan video at 480P/720P with offload + quant (LightX2V Quickstart). The HF model card additionally states that the fp8 and int8 distilled weights were tested on an RTX 4060, so a 16GB 5060 Ti should have headroom on the quantized path.
Quality notes: The distilled checkpoint trades fine motion detail and prompt fidelity for the 4-step / no-CFG speedup. Use the recommended LCM scheduler, shift=5.0, guidance_scale=1.0 (HF model card) and stay close to the model's training resolutions (480×832, 720×1280) for best results.
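To put the README's per-step figures in context, the sketch below multiplies them by the 4 denoising steps. It assumes those s/it numbers carry over to the same resolution and frame count used here, which the source does not confirm, and it covers the denoising loop only (no text encoding, VAE decode, or model loading).

```python
# Per-step times quoted from the ModelTC/LightX2V README (seconds per iteration).
README_SECONDS_PER_STEP = {"RTX 4090D": 20.26, "H100": 5.18}
STEPS = 4  # the distilled checkpoint's step count


def denoise_seconds(seconds_per_step: float, steps: int = STEPS) -> float:
    """Estimated time spent in the denoising loop only."""
    return seconds_per_step * steps


for gpu, s_per_it in README_SECONDS_PER_STEP.items():
    print(f"{gpu}: ~{denoise_seconds(s_per_it):.0f} s for {STEPS} steps")
```

A 5060 Ti run should land above the 4090D estimate, but by how much is unknown until a benchmark is submitted.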

For the full benchmark data, see /check/lightx2v/rtx-5060-ti.

Troubleshooting

Out of memory even on a high-VRAM card

A user reported OOM with the unquantized distilled T2V-14B on a 48GB A6000 (HF discussion #9). On a 16GB RTX 5060 Ti you must combine several optimizations:

Switch to the fp8 or int8 distilled weights from the same HF org
Enable SageAttention 2 (attn_mode="sage_attn2" in the pipeline)
Turn on CPU offload and VAE tiling (see LightX2V quantization docs)
Try torch.compile for an additional ~20% VRAM saving (cited in the same HF discussion)

Slow inference despite the 4-step distillation

The 4-step distilled checkpoint only helps if the LCM scheduler is actually loaded and guidance_scale=1.0. If you forgot to swap the scheduler or left CFG > 1, you're still running the original Wan inference path — verify both settings against the model card.

SageAttention build fails on the RTX 5060 Ti

The 5060 Ti is Blackwell (sm_120). If the SageAttention build skips it, make sure CUDA_ARCHITECTURES in the install command includes 12.0 (as in Step 2 above) and that your PyTorch build targets CUDA 12.8 or newer, since earlier CUDA toolkits predate Blackwell.
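Before rebuilding, it can help to confirm which compute capability PyTorch actually sees and whether it appears in the architecture list. A minimal sketch, where `arch_covers` and `detect_capability` are hypothetical helper names and the architecture string is copied from the install step:

```python
def arch_covers(capability, arch_list: str) -> bool:
    """True if a (major, minor) compute capability is in a list like '8.9,12.0'."""
    wanted = f"{capability[0]}.{capability[1]}"
    return wanted in [arch.strip() for arch in arch_list.split(",")]


def detect_capability():
    """Return the (major, minor) capability of GPU 0, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    archs = "8.0,8.6,8.9,9.0,12.0"  # same string as the SageAttention build step
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(f"capability {cap[0]}.{cap[1]}, covered by build list: {arch_covers(cap, archs)}")
```

On a 5060 Ti the detected capability should be (12, 0). Printing `torch.version.cuda` alongside it shows which CUDA version the installed PyTorch wheel was built against.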

Resolution / frame-count crashes

The Wan2.1 base requires heights and widths divisible by 16 and a frame count of the form 4n+1 (for example, 81 = 4×20 + 1). Stick to the example configs (480×832 / 81 frames; 720×1280 / 81 frames) until you've measured a comfortable VRAM margin.
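A small pre-flight check can catch bad shapes before a long model load. The sketch below assumes the frame-count rule is 4n+1, which matches the 81-frame examples; `validate_wan_shape` and `round_down_frames` are hypothetical helpers, not part of LightX2V or diffusers.

```python
def validate_wan_shape(height: int, width: int, num_frames: int) -> list:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    if height % 16 != 0:
        problems.append(f"height {height} is not divisible by 16")
    if width % 16 != 0:
        problems.append(f"width {width} is not divisible by 16")
    if num_frames < 1 or (num_frames - 1) % 4 != 0:
        problems.append(f"num_frames {num_frames} is not of the form 4n+1")
    return problems


def round_down_frames(num_frames: int) -> int:
    """Largest frame count of the form 4n+1 that does not exceed num_frames."""
    return max(1, (num_frames - 1) // 4 * 4 + 1)


for shape in [(480, 832, 81), (720, 1280, 81), (480, 830, 80)]:
    issues = validate_wan_shape(*shape)
    print(shape, "OK" if not issues else issues)
```

Both example configs pass, while the third shape fails on width and frame count; `round_down_frames(80)` suggests 77 as the nearest valid count below it.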

Report new issues via submission form — community 5060 Ti benchmarks would directly improve the /check/lightx2v/rtx-5060-ti data.