Wan 2.1 T2V on RTX 4090: Local Video Generation Guide

video · beginner · 8GB+ VRAM · May 13, 2026
Prerequisites
  • NVIDIA GPU with ≥ 8GB VRAM (RTX 4090 tested)
  • ComfyUI installed
  • Python 3.10+
  • ~30GB free storage for model weights
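
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of ComfyUI: the thresholds are copied from the list above, the helper names are made up for illustration, and the VRAM check assumes a CUDA-enabled PyTorch is already installed (it returns None otherwise).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # from the prerequisites list
MIN_FREE_GB = 30       # approximate storage figure from the guide
MIN_VRAM_GB = 8        # VRAM floor stated in the guide


def python_ok(version=sys.version_info):
    """True if the interpreter meets the guide's minimum version."""
    return tuple(version[:2]) >= MIN_PYTHON


def free_disk_gb(path="."):
    """Free space at `path` in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def gpu_vram_gb():
    """Total VRAM of GPU 0 in GB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print(f"Free disk: {free_disk_gb():.1f} GB (guide suggests ~{MIN_FREE_GB})")
    vram = gpu_vram_gb()
    if vram is None:
        print("VRAM: unknown (no CUDA-enabled PyTorch found)")
    else:
        print(f"VRAM: {vram:.1f} GB (guide minimum {MIN_VRAM_GB})")
```

Run it from the drive where the model weights will live, since free space is measured at the current directory.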

What You'll Build

Generate 5-second text-to-video clips locally with Wan 2.1 T2V, an open-weight video generation model that runs on consumer GPUs. No cloud services required.

Benchmark (RTX 4090): 4 minutes per 5-second 480P video · Peak VRAM: 8.19GB

Requirements

Component  Minimum          Tested
GPU        RTX 4070 (12GB)  RTX 4090 (24GB)
VRAM       8GB              24GB
RAM        32GB             64GB
Storage    30GB             30GB

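Note: The 8GB minimum comes from the 8.19GB peak in the benchmark above. The Wan 2.1 project reports that figure for its 1.3B T2V model; the 14B weights are much larger, so on GPUs with less than 24GB plan on a quantized version or CPU offloading.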
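
Why quantization matters comes down to bytes per parameter. The sketch below is a back-of-the-envelope estimate of weight size only; it ignores activations, the text encoder, and the VAE, and it says nothing about peak VRAM when weights are offloaded to system RAM. The function name and the dtype table are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_gb(params_billion, dtype):
    """Approximate size of the model weights in decimal GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for size in (1.3, 14):
        for dtype in ("bf16", "fp8"):
            print(f"{size}B @ {dtype}: ~{weight_gb(size, dtype):.1f} GB of weights")
```

Halving the bytes per parameter halves the weight footprint, which is the main reason FP8 variants fit on smaller cards.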

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

2. Install Wan Video ComfyUI Nodes

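# Run from the ComfyUI root (you are already there after step 1)
cd custom_nodes
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper
cd ComfyUI-WanVideoWrapper
pip install -r requirements.txt
cd ../..  # back to the ComfyUI root for the next steps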

3. Download Wan 2.1 T2V Model

If the model page asks you to accept a license, do that on Hugging Face first, then log in. Run the download from the ComfyUI root directory:

huggingface-cli login
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
  --local-dir ./models/wan

This places the downloaded files in ComfyUI/models/wan/.

4. Download VAE and Text Encoder

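# Text encoder: Wan 2.1 T2V is conditioned on a umT5-XXL encoder (not OpenAI CLIP),
# which ships in the model repo. Check the repo's file list if the name has changed.
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
  models_t5_umt5-xxl-enc-bf16.pth --local-dir ./models/clip/

# Wan VAE (file name as listed in the model repo)
huggingface-cli download Wan-AI/Wan2.1-T2V-14B \
  Wan2.1_VAE.pth --local-dir ./models/vae/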
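
After the downloads finish, it can help to confirm that each target folder actually received files before starting ComfyUI. This is a small sketch under the assumption that the folders are the ones named in this guide (models/wan, models/vae, models/clip, relative to the ComfyUI root); the helper name is hypothetical.

```python
from pathlib import Path

# Directories this guide downloads into, relative to the ComfyUI root.
EXPECTED_DIRS = ("models/wan", "models/vae", "models/clip")


def empty_or_missing(root, rel_dirs=EXPECTED_DIRS):
    """Return the relative dirs that do not exist or contain no files."""
    problems = []
    for rel in rel_dirs:
        d = Path(root) / rel
        has_file = d.is_dir() and any(p.is_file() for p in d.rglob("*"))
        if not has_file:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    missing = empty_or_missing(".")
    print("All model folders populated" if not missing else f"Check: {missing}")
```

It only checks that files exist, not that they are complete or the right ones, so a partial download would still pass.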

Running

Start ComfyUI:

python main.py --listen

Then open the web UI (http://127.0.0.1:8188 by default) and load the included workflow from custom_nodes/ComfyUI-WanVideoWrapper/example_workflows/wan2.1_t2v.json
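
If a workflow fails to load, listing the node types it references is a quick way to spot a missing custom node. The sketch below assumes the two JSON layouts ComfyUI commonly exports: the UI format with a top-level "nodes" list, and the API format keyed by node id with a "class_type" field. The function names are illustrative.

```python
import json
from collections import Counter


def node_types(workflow):
    """Count node types in a ComfyUI workflow dict.

    Handles two layouts: the UI export ({"nodes": [{"type": ...}, ...]})
    and the API export ({"<id>": {"class_type": ...}, ...}).
    """
    if isinstance(workflow.get("nodes"), list):
        return Counter(n.get("type", "?") for n in workflow["nodes"])
    return Counter(
        n["class_type"]
        for n in workflow.values()
        if isinstance(n, dict) and "class_type" in n
    )


def load_workflow(path):
    """Read a workflow JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `node_types(load_workflow("path/to/workflow.json")).most_common()` prints each node type with its count; any type ComfyUI reports as missing should appear in that list.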

Recommended Settings

Parameter   Value                                      Notes
Steps       50                                         Wan needs more steps than image models
Resolution  480×832 (portrait) or 832×480 (landscape)  Start here; 720P is possible with 24GB
Duration    5 seconds                                  Default; up to 10s is possible
CFG         6.0                                        Standard guidance scale
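
Workflows usually ask for a frame count rather than a duration. The sketch below converts between the two, assuming Wan 2.1's default of 16 frames per second and a frame count of the form 4k + 1; both are assumptions about the model's defaults rather than values stated in this guide, and the dictionary keys are illustrative, not the wrapper's actual parameter names.

```python
def frame_count(seconds, fps=16):
    """Nearest frame count of the form 4k + 1 for a clip of `seconds`."""
    k = round(seconds * fps / 4)
    return 4 * k + 1


# Values from the Recommended Settings table; key names are illustrative only.
SETTINGS = {
    "steps": 50,
    "width": 832,
    "height": 480,
    "cfg": 6.0,
    "num_frames": frame_count(5),
}

if __name__ == "__main__":
    print(SETTINGS)
```

Under these assumptions a 5-second clip is 81 frames and a 10-second clip is 161 frames.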

Performance

Config                     Time       VRAM    GPU
480P, 5s, no quantization  4 min      8.19GB  RTX 4090
480P, 5s, FP8 quant        ~2-3 min   ~6GB    RTX 4090
720P, 5s                   ~8-10 min  ~16GB   RTX 4090

Source: approved community benchmarks.
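
To compare your own runs against these figures, it helps to normalise by step count and clip length. A minimal sketch of the arithmetic, assuming the 4-minute run used the recommended 50 steps (the table does not state the step count); the function names are illustrative.

```python
def seconds_per_step(total_seconds, steps):
    """Average wall-clock time per sampling step."""
    return total_seconds / steps


def slowdown_vs_realtime(total_seconds, clip_seconds):
    """How many seconds of compute each second of video costs."""
    return total_seconds / clip_seconds


if __name__ == "__main__":
    total = 4 * 60  # 480P, 5s, no quantization, from the table
    print(f"{seconds_per_step(total, 50):.1f} s per step")
    print(f"{slowdown_vs_realtime(total, 5):.0f}x slower than real time")
```

If your per-step time is far above this on comparable hardware, the troubleshooting section is the place to start.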

Speed Optimization

Enable FP8 Quantization

For faster generation with minimal quality loss, use FP8 weights:

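# Download FP8 quantized version
# (confirm the repository name on Hugging Face before running; FP8 builds are often hosted under other accounts)
huggingface-cli download Wan-AI/Wan2.1-T2V-14B-FP8 \
  --local-dir ./models/wan2.1-t2v-14b-fp8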

Switch to the FP8 model in the ComfyUI workflow for approximately 2× speedup.

Flash Attention

Install Flash Attention for RTX 40/50 series to reduce VRAM usage and improve speed:

pip install flash-attn --no-build-isolation
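
To confirm the install landed in the same Python environment that runs ComfyUI, a quick import check is enough. This sketch uses only the standard library; `flash_attn` is the import name of the flash-attn package, and the helper name is illustrative.

```python
import importlib.util


def has_module(name):
    """True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("flash_attn"):
        print("flash_attn is importable in this environment")
    else:
        print("flash_attn not found; install it in the environment that runs ComfyUI")
```

This only shows that the package is present; whether a given workflow actually uses it depends on the attention setting selected in the nodes.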

Troubleshooting

Video is blurry/low quality: Increase steps to 50-80 and check CFG is set to 6.0

OOM on 8GB GPU: Enable FP8 quantization, reduce resolution to 320×480, or start ComfyUI with --lowvram

Black video output: Check that VAE is correctly loaded in the workflow

Slow generation: Check whether Flash Attention is installed; if not, add it with pip install flash-attn --no-build-isolation
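
When lowering resolution to avoid OOM, pixel count gives a rough sense of how much work is saved. The sketch below compares a resolution against the 832×480 baseline from the settings table. Treat the ratio as a lower bound on the savings, not a measurement: attention cost can grow faster than linearly with the number of pixels. The function name is illustrative.

```python
def pixel_ratio(width, height, base=(832, 480)):
    """Pixels per frame relative to the baseline resolution."""
    return (width * height) / (base[0] * base[1])


if __name__ == "__main__":
    print(f"480x320 vs 832x480:  {pixel_ratio(480, 320):.2f}x the pixels")
    print(f"1280x720 vs 832x480: {pixel_ratio(1280, 720):.2f}x the pixels")
```

Dropping to 320×480 leaves a bit under 40% of the baseline pixels per frame, while 720P is more than double.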
