What You'll Build
Generate 5-second text-to-video clips locally with Wan 2.1 T2V, one of the most capable open-source video generation models that can run on home GPUs. No cloud services required.
Benchmark: 4 minutes per 5-second 480P video · Peak VRAM: 8.19GB
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | RTX 4070 (12GB) | RTX 4090 (24GB) |
| VRAM | 8GB | 24GB |
| RAM | 32GB | 64GB |
| Storage | 30GB | 30GB |
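Note: The 8.19GB peak VRAM figure matches what the Wan 2.1 authors report for the smaller T2V-1.3B model. The 14B model downloaded in this guide needs considerably more memory at full precision, so on GPUs with 8-12GB plan to use a quantized version (see Speed Optimization) or ComfyUI's --lowvram offloading.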
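To compare a machine against the table above, a small script can read each GPU's name and total memory from nvidia-smi. This is a minimal sketch, assuming an NVIDIA driver with nvidia-smi on the PATH; the helper names parse_gpu_memory, query_gpus, and meets_minimum are illustrative, and the 8GB threshold is simply the Minimum column from the table.

```python
import subprocess

MIN_VRAM_GB = 8  # "Minimum" VRAM value from the Requirements table


def parse_gpu_memory(csv_text):
    """Parse `name, memory.total` CSV rows (MiB) into (name, nominal GiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = (part.strip() for part in line.rsplit(",", 1))
        # Drivers report slightly less than the nominal size, so round to
        # the nearest whole GiB (e.g. 24564 MiB -> 24).
        gpus.append((name, round(int(mem_mib) / 1024)))
    return gpus


def query_gpus():
    """Ask nvidia-smi for each GPU's name and total memory."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def meets_minimum(vram_gb, minimum_gb=MIN_VRAM_GB):
    """True if the card has at least the guide's minimum VRAM."""
    return vram_gb >= minimum_gb
```

Calling query_gpus() on the target machine and passing each memory value to meets_minimum gives a quick pass/fail per card. It says nothing about system RAM or free disk space, which still need checking separately.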
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
2. Install Wan Video ComfyUI Nodes
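cd custom_nodes  # run from the ComfyUI directory entered in step 1
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper
cd ComfyUI-WanVideoWrapper
pip install -r requirements.txt
cd ../..  # return to the ComfyUI directory for the next steps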
3. Download Wan 2.1 T2V Model
The model is gated: accept the license on Hugging Face first, then log in and download:
huggingface-cli login
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
--local-dir ./models/wan2.1-t2v-14b
Place downloaded files in ComfyUI/models/wan/
4. Download VAE and Text Encoder
# CLIP text encoder
huggingface-cli download openai/clip-vit-large-patch14 \
--local-dir ./models/clip/
# Wan VAE
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
wan_2.1_vae.pth --local-dir ./models/vae/
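Before launching ComfyUI, it can help to confirm the downloads landed where the commands above put them. A minimal sketch using only the paths that appear in this guide; missing_paths is an illustrative helper name, and the check tests only for existence, not for file integrity or completeness.

```python
from pathlib import Path

# Paths taken from the download commands in this guide, relative to the
# directory the commands were run from.
EXPECTED = [
    "models/wan2.1-t2v-14b",
    "models/clip",
    "models/vae/wan_2.1_vae.pth",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Running missing_paths(".") from the directory where the download commands were issued returns an empty list when everything is in place, or the list of paths that still need attention.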
Running
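Start ComfyUI from the ComfyUI directory. The --listen flag makes the server reachable from other machines on your network; omit it to bind to localhost only: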
python main.py --listen
Then open the web UI (http://127.0.0.1:8188 by default) and load the included workflow from custom_nodes/ComfyUI-WanVideoWrapper/example_workflows/wan2.1_t2v.json
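Once the server is running, clips can also be queued without the browser through ComfyUI's HTTP API, which accepts a workflow at POST /prompt. A minimal sketch, assuming the default address 127.0.0.1:8188 and a workflow that has been exported in API format (the workflow JSON used by the UI has a different layout and may need re-exporting first); build_payload and queue_workflow are illustrative names.

```python
import json
import urllib.request
import uuid

COMFY_URL = "http://127.0.0.1:8188"  # assumed default ComfyUI address


def build_payload(workflow, client_id=None):
    """Wrap an API-format workflow dict in the body that /prompt expects."""
    return {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}


def queue_workflow(workflow_path, base_url=COMFY_URL):
    """Send a workflow file to a running ComfyUI server and return its reply."""
    with open(workflow_path, encoding="utf-8") as f:
        workflow = json.load(f)
    body = json.dumps(build_payload(workflow)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling queue_workflow("my_workflow_api.json") (a hypothetical file name) returns the server's JSON reply, which includes a prompt_id for the queued job. Nothing is sent until the function is called.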
Recommended Settings
| Parameter | Value | Notes |
|---|---|---|
| Steps | 50 | Wan needs more steps than image models |
| Resolution | 480×832 (portrait) or 832×480 (landscape) | Start here; 720P is possible with 24GB VRAM |
| Duration | 5 seconds | Default; up to 10s is possible |
| CFG | 6.0 | Standard guidance scale |
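Duration is usually entered in the workflow as a frame count rather than in seconds. As a worked example, assuming Wan 2.1's default output rate of 16 frames per second and a frame count of the form 4n + 1 (both are assumptions to verify against the node you use), a 5-second clip comes to 81 frames:

```python
def frame_count(seconds, fps=16):
    """Convert a duration to frames, snapped to the form 4n + 1.

    The raw frame total is rounded down to a multiple of 4, then 1 is added.
    The 16 fps default and the 4n + 1 rule are assumptions about Wan 2.1.
    """
    raw = round(seconds * fps)
    return (raw // 4) * 4 + 1


print(frame_count(5))   # 81
print(frame_count(10))  # 161
```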
Performance
| Config | Time | VRAM | GPU |
|---|---|---|---|
| 480P, 5s, no quantization | 4 min | 8.19GB | RTX 4090 |
| 480P, 5s, FP8 quant | ~2-3 min | ~6GB | RTX 4090 |
| 720P, 5s | ~8-10 min | ~16GB | RTX 4090 |
Source: approved community benchmarks.
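The VRAM column can be read as a rough selection rule: pick the most demanding configuration whose peak usage fits on the card. A minimal sketch using only the figures from the table above, with the approximate values taken at face value; pick_config and its headroom_gb parameter are illustrative and not part of ComfyUI.

```python
# (label, peak VRAM in GB) copied from the Performance table, lightest first.
CONFIGS = [
    ("480P, 5s, FP8 quant", 6.0),
    ("480P, 5s, no quantization", 8.19),
    ("720P, 5s", 16.0),
]


def pick_config(vram_gb, headroom_gb=1.0):
    """Return the heaviest config that fits in vram_gb minus headroom, or None."""
    usable = vram_gb - headroom_gb
    fitting = [label for label, peak in CONFIGS if peak <= usable]
    return fitting[-1] if fitting else None


print(pick_config(24))  # 720P, 5s
print(pick_config(8))   # 480P, 5s, FP8 quant
```

The 1GB headroom is an arbitrary allowance for the desktop and other processes sharing the GPU; adjust it to match your setup.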
Speed Optimization
Enable FP8 Quantization
For faster generation with minimal quality loss, use FP8 weights:
# Download FP8 quantized version
huggingface-cli download Wan-AI/Wan2.1-T2V-14B-FP8 \
--local-dir ./models/wan2.1-t2v-14b-fp8
Switch to the FP8 model in the ComfyUI workflow. Based on the Performance table, expect generation time to drop from about 4 minutes to 2-3 minutes (roughly 1.3× to 2× faster).
Flash Attention
On RTX 40/50 series GPUs, install Flash Attention to reduce VRAM usage and improve speed:
pip install flash-attn --no-build-isolation
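After installing, a quick way to confirm the package is visible to the Python environment that runs ComfyUI is to look it up without importing it. A minimal sketch using the standard library; has_module is an illustrative helper, and finding the package does not by itself prove that the workflow is using it.

```python
import importlib.util


def has_module(name):
    """True if `name` can be found by the import system, without importing it."""
    return importlib.util.find_spec(name) is not None


# The flash-attn package installs under the import name flash_attn.
print("flash_attn installed:", has_module("flash_attn"))
```

Run this with the same Python interpreter (or virtual environment) that launches main.py, since a package installed elsewhere will not be found.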
Troubleshooting
Video is blurry or low quality: Raise steps into the 50-80 range and confirm CFG is set to 6.0.
OOM on 8GB GPU: Switch to the FP8 model, reduce resolution to 320×480, or start ComfyUI with --lowvram.
Black video output: Check that the VAE is loaded correctly in the workflow.
Slow generation: Check whether Flash Attention is installed; if not, add it with pip install flash-attn --no-build-isolation.
Compare with Other Video Models
- LTX Video 2.3: Faster (10-45s per clip), lower quality
- CogVideoX 1.5: Similar quality, slower