What You'll Build
Generate short text-to-video clips locally using LightX2V — an inference framework that ships 4-step, CFG-free distilled checkpoints of Wan2.1-T2V-14B — on a consumer 16GB GPU. The distilled checkpoint cuts inference from 40–50 steps down to 4 with no classifier-free guidance, and the official framework documents running 14B Wan models for 480P / 720P video on as little as 8GB VRAM with 16GB system RAM (ModelTC/LightX2V README).
Hardware data: RTX 5060 Ti (16GB VRAM) · 4-step distilled Wan2.1-T2V-14B · See benchmark data
⚠️ Heads-up: "Minimum 8GB VRAM" in the LightX2V docs assumes you stay on the quantized / offloaded path. The unquantized 14B base will OOM even on a 48GB A6000 without SageAttention, fp8 quant, VAE tiling, or CPU offload — see HF discussion #9. On a 16GB card stick to the fp8 / int8 distilled weights and enable offload.
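To see why the quantized path matters on a 16GB card, here is a back-of-the-envelope sketch of the weight footprint alone. It assumes a round 14 billion parameters and counts only the diffusion transformer weights; the text encoder, VAE, and activations come on top, so real peak usage will be higher.

```python
# Rough weight-only memory estimate for a 14B-parameter model.
# Assumption: exactly 14e9 parameters; text encoder, VAE and activations are ignored.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int8": 1.0}


def weight_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB for the given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_gib(14e9, dtype):.1f} GiB")
```

Under these assumptions the bf16 weights alone (about 26 GiB) cannot fit in 16GB, while fp8 / int8 weights (about 13 GiB) fit but leave little room for activations, which is why offload is still recommended.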
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8GB VRAM (CUDA) | RTX 5060 Ti (16GB) |
| RAM | 16GB (per official docs) | 16GB+ |
| Storage | ~50GB | weights + framework + cache |
| Software | Python 3.10+, PyTorch 2.6+, CUDA 12.4 or 12.8 | per Quickstart |
LightX2V is the inference framework; the lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v HF repo is the canonical distilled T2V checkpoint it accelerates.
Installation
You have two supported paths. Path A (diffusers, simplest) is the one-liner from the model card. Path B (LightX2V framework) gives you fp8/int8 quant, SageAttention, offload and the full Wan / LTX / HunyuanVideo zoo.
Path A — Diffusers one-liner
pip install -U diffusers transformers accelerate
Then save and run this script (adapted from the HF model card — see note below):
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
pipe = DiffusionPipeline.from_pretrained(
"lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v",
dtype=torch.bfloat16,
device_map="cuda",
)
# device_map="cuda" above already places the pipeline on the GPU, so no extra pipe.to("cuda") call is needed
prompt = "A man with short gray hair plays a red electric guitar."
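# 4 steps and guidance_scale=1.0 (no CFG) per the model card; assumes the loaded pipeline accepts the standard diffusers call arguments
output = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=1.0).frames[0]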
export_to_video(output, "output.mp4")
ℹ️ Model-card note: as of writing, the HF model card's Python example also loads an input image and passes image=image to the pipeline, which is the Image-to-Video signature. This repo is the T2V-14B distilled checkpoint (per the repo name and the LightX2V quantization docs), so call the pipeline with prompt= only. If you want true I2V, use the matching …-I2V-14B-… distilled checkpoint instead.
The recommended scheduler settings (per the model card) are LCM scheduler with shift=5.0 and guidance_scale=1.0.
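To give a feel for what shift=5.0 does, the sketch below applies the time-shift mapping commonly used by flow-matching schedulers to a uniform 4-step schedule. Whether the model card's LCM scheduler uses exactly this mapping is an assumption, and the function names are illustrative, not part of diffusers or LightX2V.

```python
def shift_sigma(sigma: float, shift: float = 5.0) -> float:
    """Common flow-matching time shift: pushes noise levels toward 1.0."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int = 4, shift: float = 5.0) -> list[float]:
    """Uniform noise levels from 1.0 down toward 0.0, then shifted."""
    base = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, shift) for s in base]


if __name__ == "__main__":
    # Hypothetical 4-step schedule with the model card's shift value.
    print(shifted_schedule(num_steps=4, shift=5.0))
```

With a larger shift, more of the few available steps are spent at high noise levels, which is the usual motivation for raising it on high-resolution or video models.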
Path B — Official LightX2V framework (recommended for 16GB cards)
Use the framework when you want fp8/int8 distilled weights, SageAttention, and explicit CPU offload — all of which are documented as the path to running 14B Wan on 8GB VRAM.
# 1. Clone and install
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
conda create -n lightx2v python=3.11 -y
conda activate lightx2v
pip install -v -e .
Verbatim from the LightX2V Quickstart docs.
# 2. (Recommended) build SageAttention 2 for ~2x attention speedup
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention && CUDA_ARCHITECTURES="8.0,8.6,8.9,9.0,12.0" \
EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 \
pip install -v -e .
The CUDA_ARCHITECTURES="...,12.0" string covers the RTX 5060 Ti's Blackwell sm_120 target — leave it in.
# 3. Pull the 4-step distilled T2V-14B checkpoint
# (run from the LightX2V repo root; `cd ..` first if you are still inside SageAttention/)
huggingface-cli download lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v \
--local-dir ./weights/Wan2.1-T2V-14B-StepDistill
Docker is also offered as the "simplest and fastest" alternative per the same Quickstart:
docker pull lightx2v/lightx2v:26011201-cu128
Running
Path A (diffusers)
Just run the Python script from the install section — output.mp4 lands in your working directory after the 4-step inference completes.
Path B (LightX2V framework)
The repo ships ready-to-run shell scripts (Quickstart):
bash scripts/wan/run_wan_t2v.sh
Or call the Python API directly:
from lightx2v import LightX2VPipeline
pipe = LightX2VPipeline(
model_path="./weights/Wan2.1-T2V-14B-StepDistill",
model_cls="wan2.1",
task="t2v",
)
pipe.create_generator(
attn_mode="sage_attn2",
infer_steps=4, # the whole point of the distilled checkpoint
height=480, width=832,
num_frames=81,
)
pipe.generate(
seed=42,
prompt="A man with short gray hair plays a red electric guitar.",
save_result_path="./output.mp4",
)
Start at 480×832, 81 frames, 4 steps — that's the lowest-risk configuration on a 16GB card. Push resolution and frame count up only if peak VRAM stays comfortably below 16GB.
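A rough way to reason about how much harder a bigger clip is: count the tokens the transformer has to attend over. The sketch below assumes the commonly cited Wan2.1 layout (VAE compression of 4× in time and 8× in each spatial dimension, then 1×2×2 patches); treat those constants as assumptions and check them against the model config before relying on the numbers.

```python
def wan_token_count(
    height: int,
    width: int,
    num_frames: int,
    vae_stride: tuple[int, int, int] = (4, 8, 8),  # assumed (time, height, width) compression
    patch: tuple[int, int, int] = (1, 2, 2),       # assumed transformer patch size
) -> int:
    """Approximate number of latent tokens for one clip."""
    t = (num_frames - 1) // vae_stride[0] + 1
    h = height // vae_stride[1]
    w = width // vae_stride[2]
    return (t // patch[0]) * (h // patch[1]) * (w // patch[2])


if __name__ == "__main__":
    low = wan_token_count(480, 832, 81)
    high = wan_token_count(720, 1280, 81)
    print(low, high, f"{high / low:.2f}x")
```

Under these assumptions, moving from 480×832 to 720×1280 at 81 frames more than doubles the token count, and attention cost grows faster than linearly with it, so raise one setting at a time.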
Results
- Speed: No published benchmark on the RTX 5060 Ti at time of writing. For order-of-magnitude reference, the official ModelTC/LightX2V README reports 20.26 s/it on a single RTX 4090D (24GB) and 5.18 s/it on an H100 with the framework's optimizations enabled. Both are larger GPUs than the 5060 Ti, so expect proportionally slower wall time on the 16GB target. Empirical 5060 Ti numbers will land at /check/lightx2v/rtx-5060-ti once a benchmark report is submitted.
- VRAM usage: The framework's official Quickstart sets the floor at "minimum 8GB VRAM" for 14B Wan video at 480P/720P with offload + quant (LightX2V Quickstart). The HF model card additionally confirms fp8 and int8 distillation weights were tested on an RTX 4060 (HF model card), so a 5060 Ti at 16GB has comfortable headroom on the quantized path.
- Quality notes: The distilled checkpoint trades fine motion detail and prompt fidelity for the 4-step / no-CFG speedup. Use the recommended LCM scheduler, shift=5.0, guidance_scale=1.0 (HF model card) and stay close to the model's training resolutions (480×832, 720×1280) for best results.
For the full benchmark data, see /check/lightx2v/rtx-5060-ti.
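The s/it figures quoted under Speed can be turned into a rough lower bound on denoising time. This is only arithmetic on the README numbers, assuming each "it" is one denoising step at a comparable resolution; text encoding, VAE decode, and offload transfers are not included, and it says nothing measured about the 5060 Ti.

```python
def denoise_seconds(sec_per_it: float, steps: int = 4) -> float:
    """Denoising-loop time only: seconds per step times number of steps."""
    return sec_per_it * steps


if __name__ == "__main__":
    # README reference figures quoted above (not 5060 Ti measurements).
    for name, s_it in {"RTX 4090D": 20.26, "H100": 5.18}.items():
        print(f"{name}: ~{denoise_seconds(s_it):.1f} s for 4 steps")
```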
Troubleshooting
Out of memory even on a high-VRAM card
A user reported OOM with the unquantized distilled T2V-14B on a 48GB A6000 (HF discussion #9). On a 16GB RTX 5060 Ti you must combine several optimizations:
- Switch to the fp8 or int8 distilled weights from the same HF org
- Enable SageAttention 2 (attn_mode="sage_attn2" in the pipeline)
- Turn on CPU offload and VAE tiling (see LightX2V quantization docs)
- Try torch.compile for an additional ~20% VRAM saving (cited in the same HF discussion)
Slow inference despite the 4-step distillation
The 4-step distilled checkpoint only helps if the LCM scheduler is actually loaded and guidance_scale=1.0. If you forgot to swap the scheduler or left CFG > 1, you're still running the original Wan inference path — verify both settings against the model card.
SageAttention build fails on the RTX 5060 Ti
The 5060 Ti is Blackwell (sm_120). If the SageAttention build skips it, ensure CUDA_ARCHITECTURES in the install command includes 12.0 (as in Step 2 above) and that PyTorch ships with CUDA 12.8 — earlier CUDA toolchains predate Blackwell.
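Before rebuilding, it can help to confirm the two conditions above programmatically. The helpers below are a minimal sketch with hypothetical names; the optional part at the bottom uses torch.version.cuda and torch.cuda.get_device_capability() to print what the installed PyTorch actually reports, and is skipped if PyTorch or a GPU is unavailable.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Turn a version string such as '12.8' into (12, 8)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_supports_sm120(cuda_version: str) -> bool:
    """Per the note above, Blackwell needs a CUDA 12.8 or newer toolchain."""
    return parse_version(cuda_version) >= (12, 8)


def arch_list_has_sm120(arch_list: str) -> bool:
    """Check that a CUDA_ARCHITECTURES string includes the 12.0 target."""
    return "12.0" in [arch.strip() for arch in arch_list.split(",")]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.version.cuda and torch.cuda.is_available():
        print("PyTorch CUDA build:", torch.version.cuda,
              "| supports sm_120:", cuda_supports_sm120(torch.version.cuda))
        print("Device capability:", torch.cuda.get_device_capability(0))
```

On a correctly set up 5060 Ti you would expect the reported device capability to correspond to sm_120; if the CUDA build is older than 12.8, reinstall PyTorch before retrying the SageAttention build.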
Resolution / frame-count crashes
The Wan2.1 base requires resolutions divisible by 16 and a frame count that follows the model's temporal grouping, typically of the form 4n+1 (for example 81 = 4×20 + 1). Stick to the example configs (480×832 / 81 frames; 720×1280 / 81 frames) until you've measured a comfortable VRAM margin.
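A small pre-flight check can catch these shape problems before a long model load. The sketch below assumes the divisible-by-16 rule stated above and a 4n+1 frame count (an assumption based on the usual Wan VAE temporal compression); the function name is illustrative and not part of LightX2V.

```python
def check_wan_config(height: int, width: int, num_frames: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config looks fine."""
    problems = []
    if height % 16 != 0:
        problems.append(f"height {height} is not divisible by 16")
    if width % 16 != 0:
        problems.append(f"width {width} is not divisible by 16")
    if (num_frames - 1) % 4 != 0:  # assumed 4n+1 frame grouping
        problems.append(f"num_frames {num_frames} is not of the form 4n+1")
    return problems


if __name__ == "__main__":
    for cfg in [(480, 832, 81), (720, 1280, 81), (480, 830, 80)]:
        issues = check_wan_config(*cfg)
        print(cfg, "OK" if not issues else issues)
```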
Report new issues via submission form — community 5060 Ti benchmarks would directly improve the /check/lightx2v/rtx-5060-ti data.