What You'll Build
A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe walks through the ComfyUI native workflow as the canonical path on the RTX 5080, with a QuantStack Q8 GGUF alternative for tighter VRAM or colocation.
Hardware data: RTX 5080 (16GB GDDR7 VRAM, Blackwell sm_120) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data
Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. The four 14B-class variants are MoE models whose single-GPU command the official Wan-Video/Wan2.2 README documents with the note "This command can run on a GPU with at least 80GB VRAM." (far past a 16 GB consumer card at native precision). Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the 14B-A14B siblings need does not apply here. The 14B variants need a different recipe entirely.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (per the official ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading") | RTX 5080 (16 GB, Blackwell sm_120) |
| RAM | 16 GB | 32 GB+ recommended (offloading is RAM-heavy) |
| Storage | ~17 GB (TI2V-5B FP16 weights 9.31 GB + UMT5-XXL FP8 text encoder 6.27 GB + Wan2.2-VAE 1.31 GB) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.12, PyTorch ≥ 2.9 built against CUDA 12.8 / 12.9 (sm_120 kernel coverage) | torch 2.9 + CUDA 12.9 + Python 3.12 |
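The ~17 GB storage figure in the table is the sum of the three file sizes it lists. A minimal sketch of that arithmetic, with a free-space check alongside it; the function names are illustrative, and `free_gb` reports the filesystem that holds the path you pass, so point it at the directory where ComfyUI actually lives:

```shell
#!/bin/sh
# Sum the three on-disk sizes quoted in the requirements table (GB).
total_model_gb() {
    awk 'BEGIN { printf "%.2f\n", 9.31 + 6.27 + 1.31 }'
}

# Free space (decimal GB) on the filesystem containing the given path.
# Illustrative helper: df -Pk prints available space in 1024-byte blocks.
free_gb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { printf "%.2f\n", $4 * 1024 / 1e9 }'
}

echo "models need: $(total_model_gb) GB"
echo "free here:   $(free_gb .) GB"
```

The GGUF alternative adds its own file on top of this total if you keep the FP16 weights as well.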
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". On a 16 GB 5080 you have comfortable headroom above that 8 GB floor, so the offloader can keep more of the model resident on the GPU instead of streaming it from system RAM.
Blackwell sm_120 wheel note. The RTX 5080 is a Blackwell sm_120 card. Make sure your PyTorch build ships sm_120 kernels: that means PyTorch ≥ 2.9 built against CUDA 12.8/12.9 (the cu128/cu129 wheels). Older torch wheels without sm_120 kernels fail at model load on the 50-series. This is the same wheel requirement that applies to every Blackwell GPU (5060 Ti / 5070 / 5080 / 5090).
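One way to confirm the wheel requirement before loading any model is to ask PyTorch which architectures it was compiled for. The sketch below assumes `python` on your PATH is the interpreter ComfyUI uses; `has_sm120` is an illustrative helper, while `torch.cuda.get_arch_list()` is PyTorch's own call:

```shell
#!/bin/sh
# Succeeds if the space-separated arch list (first argument) contains sm_120.
has_sm120() {
    case " $1 " in
        *" sm_120 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# torch.cuda.get_arch_list() returns the architectures the installed wheel
# was compiled for; skip quietly when python or torch is not available.
if command -v python >/dev/null 2>&1 && python -c 'import torch' 2>/dev/null; then
    archs=$(python -c 'import torch; print(" ".join(torch.cuda.get_arch_list()))')
    if has_sm120 "$archs"; then
        echo "sm_120 kernels present: $archs"
    else
        echo "no sm_120 kernels in this wheel: $archs"
    fi
fi
```

An empty list usually means the wheel is a CPU-only build or CUDA is not visible to the process.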
2. Download model files for the native workflow
Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo and place each one in its subfolder under ComfyUI/models/ (the -P flag sets the download directory):
# diffusion model → ComfyUI/models/diffusion_models/
wget -P ComfyUI/models/diffusion_models \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
# text encoder → ComfyUI/models/text_encoders/
wget -P ComfyUI/models/text_encoders \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
# VAE → ComfyUI/models/vae/
wget -P ComfyUI/models/vae \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors
The resulting layout matches what the official template expects:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
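A quick way to confirm that layout before opening the template is to test for each file. This is a minimal sketch; `check_wan_models` and the `COMFY_MODELS` environment variable are illustrative names, not part of ComfyUI:

```shell
#!/bin/sh
# Print any of the three expected files that are missing under a models dir.
# Returns non-zero if at least one is missing.
check_wan_models() {
    models_dir=$1
    missing=0
    for rel in \
        diffusion_models/wan2.2_ti2v_5B_fp16.safetensors \
        text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
        vae/wan2.2_vae.safetensors
    do
        if [ ! -f "$models_dir/$rel" ]; then
            echo "missing: $models_dir/$rel"
            missing=1
        fi
    done
    return "$missing"
}

# COMFY_MODELS is a hypothetical override; the default matches the layout above.
check_wan_models "${COMFY_MODELS:-ComfyUI/models}" \
    || echo "fix the paths above before loading the template"
```

A file that exists but is truncated still passes this check, so compare sizes against the requirements table if a download was interrupted.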
Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
3. (Alternative) Install ComfyUI-GGUF and a Q8 quant
To lower peak VRAM, or to colocate another model on the card, use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF. The QuantStack README describes the file as "a direct conversion of" the canonical Wan-AI checkpoint, and the repo's base_model field is Wan-AI/Wan2.2-TI2V-5B.
Install city96/ComfyUI-GGUF:
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf
Download the Q8_0 file (5.40 GB) and place it in the unet folder:
wget -P ComfyUI/models/unet \
https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q8_0.gguf
In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) and point it at the .gguf file. The text encoder and VAE remain unchanged. Per-tier sizes from the HF tree API: Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB. Q8_0 output is generally hard to distinguish from FP16 for this model; the lower tiers trade more quality for size.
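If you are choosing a tier to fit alongside another model, the size ladder above can be turned into a simple lookup. The sketch below picks the largest tier whose file fits a budget in GB; `pick_quant` is an illustrative helper, and on-disk size is only a rough proxy for VRAM use, so treat the result as a starting point rather than a guarantee:

```shell
#!/bin/sh
# Print the largest quant tier whose file size (GB) fits the given budget.
# Sizes are the on-disk figures quoted above, listed smallest first.
pick_quant() {
    awk -v budget="$1" 'BEGIN {
        n = split("Q4_K_S:3.12 Q5_K_M:3.81 Q6_K:4.21 Q8_0:5.40", tiers, " ")
        best = "none"
        for (i = 1; i <= n; i++) {
            split(tiers[i], kv, ":")
            if (kv[2] + 0 <= budget + 0) best = kv[1]
        }
        print best
    }'
}

pick_quant 4.5   # prints Q6_K
```

A result of `none` means even Q4_K_S does not fit the budget you passed.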
Running
With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, set the frame count for the clip length you want (24 fps × 5 s = 120 frames; Wan's reference CLI expects frame counts of the form 4n+1, so 121 is the nearest valid value), and queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
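The frame-count arithmetic can be sketched as a small helper. It multiplies fps by seconds and rounds up to the next value of the form 4n+1, which is the convention the reference Wan CLI documents for its frame count; that the ComfyUI node follows the same convention is an assumption here, and `frames_for` is an illustrative name:

```shell
#!/bin/sh
# Frames for a clip: fps * seconds, rounded up to the next 4n+1 value.
# The 4n+1 rounding is an assumption carried over from the reference CLI.
frames_for() {
    fps=$1
    seconds=$2
    raw=$(( fps * seconds ))
    n=$(( (raw + 2) / 4 ))          # smallest n with 4n+1 >= raw
    echo $(( 4 * n + 1 ))
}

frames_for 24 5   # prints 121
```

Shorter clips reduce render time roughly in proportion, so a 2-second test (49 frames) is a cheap way to validate a prompt before committing to a full clip.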
For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
If you prefer the command-line route, the official repo documents this exact invocation for TI2V-5B:
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt
python generate.py --task ti2v-5B --size 1280*704 \
--ckpt_dir ./Wan2.2-TI2V-5B \
--offload_model True --convert_model_dtype --t5_cpu \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"
The Wan-AI README annotates that command with "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)." — it is tuned for a 24 GB card. On the 16 GB 5080 the ComfyUI native route (Installation step 2) is the more reliable path, because ComfyUI's runtime offloader is more aggressive than the CLI's static three-flag offload setup and is the path the documented 8 GB working floor refers to.
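The `--ckpt_dir` flag in that command expects the original Wan-AI checkpoint in ./Wan2.2-TI2V-5B, which is a different set of files from the ComfyUI repackaged ones in Installation step 2. A sketch of fetching it with `huggingface-cli` (installed via `pip install "huggingface_hub[cli]"`); the `fetch_ti2v_ckpt` wrapper and its `DRY_RUN` switch are illustrative additions so the command can be inspected before a multi-gigabyte download starts:

```shell
#!/bin/sh
# Fetch the Wan-AI checkpoint that --ckpt_dir points at.
# DRY_RUN (illustrative, defaults to 1) only prints the command.
fetch_ti2v_ckpt() {
    dest=${1:-./Wan2.2-TI2V-5B}
    set -- huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir "$dest"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fetch_ti2v_ckpt   # prints the command; set DRY_RUN=0 to actually download
```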
Results
- Speed: No first-party RTX 5080 measurement for TI2V-5B currently exists in the Wan-AI HF card, the official Wan2.2 README, or the Wan2.2 GitHub Issues tracker (verified at write time). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and the README names the RTX 4090 (Ada sm_89) as the GPU class that single-GPU command targets. The 5080's memory bandwidth (960 GB/s) is slightly below the 4090's (1,008 GB/s) and it has fewer CUDA cores, so a speedup over that 9-minute figure should not be assumed. No published 5080 number exists to quote, so we do not extrapolate one. Report your measured 5080 timing via /contribute to land a first-party benchmark row for this pair.
- VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~8 GB headroom on the 16 GB 5080. The native FP16 diffusion file (9.31 GB) plus UMT5-XXL FP8 text encoder (6.27 GB) plus Wan2.2-VAE (1.31 GB) add up to ~17 GB on disk; runtime peak with ComfyUI's offloader is well below that because weights stream rather than all loading resident. Live data: /check/wan-2-2/rtx-5080.
- Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps — the Wan-AI README states "The TI2V-5B model supports 720P video generation at 24 FPS." Clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent across the canonical FP16 path; there is no per-expert quality-vs-speed dial for this variant (the 14B-A14B siblings expose that via the high-noise / low-noise expert split — TI2V-5B does not).
For the full benchmark data, see /check/wan-2-2/rtx-5080.
Troubleshooting
Model fails to load on the 50-series
The RTX 5080 is a Blackwell sm_120 card. If the model fails at load (or silently falls back to CPU), your PyTorch wheel is probably missing sm_120 kernels. Install a PyTorch ≥ 2.9 build against CUDA 12.8/12.9 (cu128/cu129); wheels built against older CUDA toolkits do not include sm_120 kernels for the 50-series. This is the same requirement across all Blackwell consumer cards.
Out of memory at 720p
Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial relies on it for the 8 GB minimum claim). If the FP16 path still presses against the 16 GB envelope while you have other models loaded, switch the diffusion model to a QuantStack Q8_0 GGUF (5.40 GB on disk) via the Unet Loader (GGUF) node — peak VRAM drops further and quality loss at Q8 is minimal.
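Before switching quants, it can help to see how much of the card other processes already occupy. The sketch below parses the output of an `nvidia-smi` memory query; `vram_headroom_mib` is an illustrative helper that reads a "used, total" line in MiB and prints the difference for the first GPU:

```shell
#!/bin/sh
# Read "used, total" (MiB) lines as printed by
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# and print the free MiB for the first GPU.
vram_headroom_mib() {
    awk -F', *' 'NR == 1 { print $2 - $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits \
        | vram_headroom_mib || echo "nvidia-smi query failed"
fi
```

Run it with ComfyUI stopped to see what other applications hold, then again mid-render to see how close the workflow gets to the 16 GB ceiling.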
No FP8 weight file for TI2V-5B
TI2V-5B is a dense single-checkpoint model — Wan-AI does not publish an FP8 weight path for it (unlike the 14B-A14B siblings, whose FP8-scaled experts ship via the Comfy-Org repackager). On Blackwell sm_120 the FP8 tensor cores are native and would run a quant at hardware speed if one shipped, but since none is published for TI2V-5B, the canonical path is the FP16 safetensors file in Installation step 2, and the VRAM escape hatch is the GGUF quant ladder above — not FP8. Do not look for a *_fp8_scaled.safetensors file for TI2V-5B; it does not exist.
Want the 14B variants?
Per the official README, the 14B / A14B single-GPU commands are annotated "This command can run on a GPU with at least 80GB VRAM." — out of scope for a 16 GB card at native precision. Community GGUF quants of the 14B Wan variants exist but need a separate workflow; file a request on /contribute if you want a 14B-quantized recipe added once a stable 16 GB workflow lands.