Wan 2.1 T2V on RTX 4090: Local Video Generation Guide

What You'll Build

Generate 5-second text-to-video clips locally using Wan 2.1 T2V, one of the most capable open-source video generation models that runs on consumer GPUs. No cloud services required.

Benchmark: 4 minutes per 5-second 480P video · Peak VRAM: 8.19GB

Requirements

Component	Minimum	Tested
GPU	RTX 4070 (12GB)	RTX 4090 (24GB)
VRAM	8GB	24GB
RAM	32GB	64GB
Storage	30GB	30GB

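Note: Treat 8GB as the floor for the 14B-parameter model only with quantization and offloading enabled. The 8.19GB peak figure matches what the Wan 2.1 project reports for its smaller 1.3B model; the unquantized 14B weights alone are larger than that. For lower-end GPUs, use quantized versions.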

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

2. Install Wan Video ComfyUI Nodes

# Run from the ComfyUI directory you entered in step 1
cd custom_nodes
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper
cd ComfyUI-WanVideoWrapper
pip install -r requirements.txt

3. Download Wan 2.1 T2V Model

The model is gated, so accept the license on Hugging Face first, then log in and download:

huggingface-cli login
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
  --local-dir ./models/wan2.1-t2v-14b

After the download finishes, move the files into ComfyUI/models/wan/

4. Download VAE and Text Encoder

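# Text encoder. Wan 2.1 T2V conditions on a umT5-XXL text encoder, which is
# bundled in the Wan-AI/Wan2.1-T2V-14B repository, so check which encoder file
# the workflow's loader node expects before relying on this CLIP download.
huggingface-cli download openai/clip-vit-large-patch14 \
  --local-dir ./models/clip/

# Wan VAE (confirm the exact filename on the model page)
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
  wan_2.1_vae.pth --local-dir ./models/vae/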
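
Before starting ComfyUI, it can help to confirm that each download actually landed where the steps above put it. The following is a minimal sketch, assuming the three directories named in this guide (ComfyUI/models/wan, models/clip, models/vae) relative to the current directory. The labels and the helper name are illustrative, not part of ComfyUI.

```python
from pathlib import Path

# Directories named in the installation steps; adjust if your layout differs.
EXPECTED_DIRS = {
    "diffusion model": Path("ComfyUI/models/wan"),
    "text encoder": Path("models/clip"),
    "VAE": Path("models/vae"),
}


def missing_model_dirs(dirs, root="."):
    """Return the labels whose directory is absent or contains no files."""
    missing = []
    for label, rel in dirs.items():
        folder = Path(root) / rel
        has_files = folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))
        if not has_files:
            missing.append(label)
    return missing


if __name__ == "__main__":
    for label in missing_model_dirs(EXPECTED_DIRS):
        print(f"missing or empty: {label}")
```

The check only looks for the presence of files, not for specific filenames or checksums, so an incomplete download can still pass it.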

Running

Start ComfyUI:

python main.py --listen

Open the web UI in a browser (ComfyUI serves it on port 8188 by default), then load the included workflow from ComfyUI-WanVideoWrapper/example_workflows/wan2.1_t2v.json
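
Once a workflow runs in the browser, it can also be queued from a script through ComfyUI's HTTP endpoint at /prompt. The sketch below assumes the server is on the default 127.0.0.1:8188 and that the workflow has been exported from the UI in API format. The example workflow file shipped with the node pack is in UI format and would need that export first. The function names and the file name in the usage note are hypothetical.

```python
import json
import urllib.request


def build_prompt_request(workflow, host="127.0.0.1", port=8188):
    """Build (but do not send) a POST request for ComfyUI's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path, **kwargs):
    """Load an API-format workflow JSON from disk and queue it on the server."""
    with open(path, "r", encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_prompt_request(workflow, **kwargs)) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, calling queue_workflow("wan_t2v_api.json") would submit the job, where wan_t2v_api.json stands in for whatever name you gave the exported file.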

Recommended Settings

Parameter	Value	Notes
Steps	50	Wan needs more steps than image models
Resolution	480×832 (portrait) or 832×480 (landscape)	Start here; 720P is possible with 24GB
Duration	5 seconds	Default, up to 10s possible
CFG	6.0	Standard guidance scale
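
Workflows usually ask for a frame count rather than a duration in seconds. As a rough sketch of the conversion, the helper below assumes Wan 2.1's reference defaults of 16 frames per second and frame counts of the form 4n + 1 (81 frames for the default clip). The function name is illustrative, and the node you use may expose the setting differently.

```python
def wan_frame_count(seconds, fps=16):
    """Nearest frame count of the form 4n + 1 for a clip of the given length."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    n = max(1, round(seconds * fps / 4))
    return 4 * n + 1


print(wan_frame_count(5))   # 81
print(wan_frame_count(10))  # 161
```

Under these assumptions a 5-second clip is 81 frames and a 10-second clip is 161, so doubling the duration roughly doubles the work per step.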

Performance

Config	Time	VRAM	GPU
480P, 5s, no quantization	4 min	8.19GB	RTX 4090
480P, 5s, FP8 quant	~2-3 min	~6GB	RTX 4090
720P, 5s	~8-10 min	~16GB	RTX 4090

Source: approved community benchmarks.
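
To turn the table above into a quick decision aid, the sketch below lists which configurations fit a given amount of free VRAM. The VRAM figures are copied from the table (approximate values taken at face value), while the 1GB headroom default and the function name are assumptions made for illustration.

```python
# (label, approximate peak VRAM in GB) copied from the Performance table.
CONFIGS = [
    ("720P, 5s", 16.0),
    ("480P, 5s, no quantization", 8.19),
    ("480P, 5s, FP8 quant", 6.0),
]


def configs_that_fit(free_vram_gb, headroom_gb=1.0):
    """Return labels of configurations whose listed VRAM fits with headroom."""
    return [label for label, need in CONFIGS if need + headroom_gb <= free_vram_gb]


print(configs_that_fit(24))  # all three configurations
print(configs_that_fit(8))   # only the FP8 configuration
```

The headroom leaves room for the desktop and other processes; raise it if the GPU also drives a display.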

Speed Optimization

Enable FP8 Quantization

For faster generation with minimal quality loss, use FP8 weights:

# Download FP8 quantized version
huggingface-cli download Wan-AI/Wan2.1-T2V-14B-FP8 \
  --local-dir ./models/wan2.1-t2v-14b-fp8

Switch to the FP8 model in the ComfyUI workflow for approximately 2× speedup.
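
A back-of-the-envelope calculation shows why FP8 matters for a model of this size. The sketch below estimates the memory taken by the weights alone at different precisions, assuming exactly 14 billion parameters. It ignores activations, the text encoder, and the VAE, and ComfyUI can offload weights to system RAM, so measured peak VRAM can be lower than these numbers.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("bf16", "fp8"):
    print(dtype, weight_memory_gb(14_000_000_000, dtype), "GB")
```

Halving the bytes per weight halves the weight footprint from about 28GB to about 14GB, which is what lets the model sit entirely inside a 24GB card.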

Flash Attention

Install Flash Attention for RTX 40/50 series to reduce VRAM usage and improve speed:

pip install flash-attn --no-build-isolation
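
The build can take a while and sometimes fails quietly, so it is worth confirming that the package is visible to the same Python environment that runs ComfyUI. A minimal check, assuming the package installs under the import name flash_attn:

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


print("flash-attn installed:", has_module("flash_attn"))
```

This only confirms that the package is present. It does not verify that the compiled kernels match your GPU or PyTorch build.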

Troubleshooting

Video is blurry/low quality: Increase steps to 50-80 and check CFG is set to 6.0

OOM on 8GB GPU: Enable FP8 quantization, reduce resolution to 320×480, or start ComfyUI with the --lowvram flag

Black video output: Check that VAE is correctly loaded in the workflow

Slow generation: Check whether Flash Attention is installed; if it is not, add it with pip install flash-attn --no-build-isolation

Compare with Other Video Models

LTX Video 2.3: Faster (10-45s per clip), lower quality
CogVideoX 1.5: Similar quality, slower