
Wan 2.2 TI2V-5B on RTX 5080: 720p Text/Image-to-Video in ComfyUI

video · intermediate · 8GB+ VRAM · May 29, 2026
prerequisites
  • NVIDIA RTX 5080 (16GB GDDR7, Blackwell sm_120) — or any 8GB+ card with ComfyUI native offloading
  • Python 3.12 (recommended for Blackwell sm_120 wheel coverage)
  • PyTorch ≥ 2.9 built against CUDA 12.8 or 12.9 (the cu128/cu129 wheels, which ship sm_120 kernels for the 50-series)
  • ComfyUI installed and updated to a build that ships the Wan 2.2 templates
  • 32GB+ system RAM recommended (offloading is RAM-heavy)

What You'll Build

A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe walks through the ComfyUI native workflow as the canonical path on the RTX 5080, with a QuantStack Q8 GGUF alternative for tighter VRAM or colocation.

Hardware data: RTX 5080 (16GB GDDR7 VRAM, Blackwell sm_120) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data

Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. The four 14B-class variants are MoE models whose single-GPU command the official Wan-Video/Wan2.2 README documents with the note "This command can run on a GPU with at least 80GB VRAM." — far past a 16 GB consumer card at native precision. Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the 14B-A14B siblings need does not apply here. The 14B variants need a different recipe entirely.

Requirements

| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (per the official ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading") | RTX 5080 (16 GB, Blackwell sm_120) |
| RAM | 16 GB | 32 GB+ recommended (offloading is RAM-heavy) |
| Storage | ~17 GB (TI2V-5B FP16 weights 9.31 GB + UMT5-XXL FP8 text encoder 6.27 GB + Wan2.2-VAE 1.31 GB) | |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.12, PyTorch ≥ 2.9 built against CUDA 12.8 / 12.9 (sm_120 kernel coverage) | torch 2.9 + CUDA 12.9 + Python 3.12 |

Installation

1. Install / update ComfyUI

Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". A 16 GB 5080 sits comfortably above that 8 GB floor, so the offloader has less to move between VRAM and system RAM than it would on a minimum-spec card.

Blackwell sm_120 wheel note. The RTX 5080 is a Blackwell sm_120 card. Make sure your PyTorch build ships sm_120 kernels — that means PyTorch ≥ 2.9 built against CUDA 12.8/12.9 (the cu128/cu129 wheels). Older torch wheels without sm_120 kernels fail at model load on the 50-series. This is the same wheel requirement that applies to every Blackwell GPU (5060 Ti / 5070 / 5080 / 5090).
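One way to confirm the wheel requirement before loading any model is to ask PyTorch which CUDA architectures it was compiled for. The sketch below is a minimal check, not part of ComfyUI: the helper names has_sm120 and check_torch_sm120 are made up for illustration, and it assumes `python` on your PATH is the same interpreter ComfyUI uses. It relies only on torch.cuda.get_arch_list(), which returns the architectures compiled into the installed build.

```shell
#!/usr/bin/env bash
# Sketch: does the installed PyTorch build list sm_120 among its CUDA architectures?

# has_sm120 ARCH_LIST -> exit 0 if the space-separated list contains sm_120.
has_sm120() {
  case " $1 " in
    *" sm_120 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_torch_sm120 (hypothetical helper): queries the active Python environment.
check_torch_sm120() {
  local archs
  archs="$(python -c 'import torch; print(" ".join(torch.cuda.get_arch_list()))')" || return 2
  echo "torch arch list: ${archs}"
  if has_sm120 "$archs"; then
    echo "OK: sm_120 kernels present"
  else
    echo "MISSING: no sm_120 kernels in this wheel" >&2
    return 1
  fi
}
```

Source the file and run check_torch_sm120 inside the ComfyUI environment. An empty arch list usually means a CPU-only wheel or that CUDA is not visible to PyTorch.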

2. Download model files for the native workflow

Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo and place them in ComfyUI/models/:

# diffusion model → ComfyUI/models/diffusion_models/
wget -P ComfyUI/models/diffusion_models \
  https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors

# text encoder → ComfyUI/models/text_encoders/
wget -P ComfyUI/models/text_encoders \
  https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors

# VAE → ComfyUI/models/vae/
wget -P ComfyUI/models/vae \
  https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors

The resulting layout matches what the official template expects:

ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
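Before opening the template, it can help to confirm that all three files landed where the layout above says they should. The following is a small sketch under the assumption that ComfyUI lives in a directory you pass as the first argument (default: ./ComfyUI); check_wan_files is an illustrative name, not a ComfyUI command.

```shell
#!/usr/bin/env bash
# check_wan_files COMFY_ROOT -> reports each expected file; exit 1 if any is
# missing or empty (an empty file usually means an interrupted download).
check_wan_files() {
  local root="${1:-ComfyUI}" missing=0 rel
  for rel in \
    models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors \
    models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
    models/vae/wan2.2_vae.safetensors
  do
    if [ -s "${root}/${rel}" ]; then
      echo "ok       ${rel}"
    else
      echo "missing  ${rel}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_wan_files ComfyUI` from the parent directory prints one line per file and returns non-zero if anything is absent.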

Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.

3. (Alternative) Install ComfyUI-GGUF and a Q8 quant

For lower peak VRAM headroom or to colocate another model on the card, use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF. The QuantStack README states it plainly: "This GGUF file is a direct conversion of" the Wan-AI canonical card, and the repo's base_model is Wan-AI/Wan2.2-TI2V-5B.

Install city96/ComfyUI-GGUF:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

Download the Q8_0 file (5.40 GB) and place it in the unet folder:

wget -P ComfyUI/models/unet \
  https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q8_0.gguf

In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) and point it at the .gguf file. The text encoder and VAE remain unchanged. Per-tier sizes from the HF tree API: Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB. Q8_0 output is generally very close to FP16 for this model; the smaller tiers trade more quality for size.
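If you are colocating another model and want to pick a tier by size, the per-tier figures above can be turned into a quick lookup. This is only a sketch: pick_quant is a hypothetical helper, the sizes are the on-disk numbers quoted in this recipe, and on-disk size is an assumption-laden proxy for runtime VRAM (activations and the VAE add to it).

```shell
#!/usr/bin/env bash
# pick_quant BUDGET_GB -> prints the largest GGUF tier whose file size fits the
# budget; exits 1 (printing nothing) if even Q4_K_S is too large.
pick_quant() {
  awk -v budget="$1" '
    BEGIN {
      # Tiers listed largest first, sizes in GB as quoted in the recipe.
      n = split("Q8_0:5.40 Q6_K:4.21 Q5_K_M:3.81 Q4_K_S:3.12", tiers, " ")
      for (i = 1; i <= n; i++) {
        split(tiers[i], kv, ":")
        if (kv[2] + 0 <= budget + 0) { print kv[1]; exit 0 }
      }
      exit 1
    }'
}
```

For example, `pick_quant 4.5` selects Q6_K, while `pick_quant 6` keeps the Q8_0 file this recipe recommends.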

Running

With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, and set the frame count for the clip length you want. At 24 fps a 5-second clip is 5 × 24 = 120 frames; Wan frame counts follow a 4n+1 pattern, so use 121. Then queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
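The frame count for other clip lengths can be worked out the same way. The helper below is a sketch that assumes the Wan frame-count rule is 4n+1 (as the official generate.py help text describes it) and rounds the raw seconds × fps product up to the nearest valid value; wan_frames is an illustrative name, and it only handles whole-second durations.

```shell
#!/usr/bin/env bash
# wan_frames SECONDS [FPS] -> smallest frame count of the form 4n+1 that covers
# SECONDS at FPS (FPS defaults to 24, the TI2V-5B output rate).
wan_frames() {
  local seconds="$1" fps="${2:-24}"
  local raw=$(( seconds * fps ))
  # (raw + 2) / 4 is integer ceil((raw - 1) / 4), i.e. the n in 4n+1.
  echo $(( ((raw + 2) / 4) * 4 + 1 ))
}
```

`wan_frames 5` gives the value to enter in the template's frame count (length) field for a 5-second clip; `wan_frames 3` does the same for a 3-second clip.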

For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.

If you prefer the command-line route, the official repo documents this exact invocation for TI2V-5B:

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt
python generate.py --task ti2v-5B --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True --convert_model_dtype --t5_cpu \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"

The Wan-AI README annotates that command with "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)." — it is tuned for a 24 GB card. On the 16 GB 5080 the ComfyUI native route (Installation step 2) is the more reliable path, because ComfyUI's runtime offloader is more aggressive than the CLI's static three-flag offload setup and is the path the documented 8 GB working floor refers to.
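If you do try the CLI route, note that --ckpt_dir must point at a directory that already holds the official weights. The sketch below wraps the download in a guard so it is skipped when the directory is already populated. The function names are illustrative; the huggingface-cli call follows the pattern in the Wan2.2 README, and newer huggingface_hub releases may prefer a different command name, so treat the exact CLI spelling as an assumption to verify against your installed version.

```shell
#!/usr/bin/env bash
# ckpt_ready DIR -> exit 0 if DIR exists and contains at least one entry.
ckpt_ready() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# fetch_ti2v_ckpt [DIR] (hypothetical helper): downloads Wan-AI/Wan2.2-TI2V-5B
# into DIR unless it is already populated.
fetch_ti2v_ckpt() {
  local dir="${1:-./Wan2.2-TI2V-5B}"
  if ckpt_ready "$dir"; then
    echo "checkpoint dir already populated: ${dir}"
    return 0
  fi
  pip install "huggingface_hub[cli]" &&
    huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir "$dir"
}
```

Run `fetch_ti2v_ckpt` from inside the cloned Wan2.2 directory before calling generate.py. The guard only checks that the directory is non-empty, not that every shard finished downloading.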

Results

  • Speed: No first-party RTX 5080 measurement for TI2V-5B currently exists in the Wan-AI HF card, the official Wan2.2 README, or the Wan2.2 GitHub Issues tracker (verified at write time). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and the README names the RTX 4090 (Ada sm_89) as the GPU class that single-GPU command targets. The 5080's memory bandwidth (960 GB/s) is in the same class as the 4090's (roughly 1 TB/s), so the 9-minute figure is a plausible ballpark rather than a measured result; no published 5080 number exists to quote, so we do not extrapolate one. Report your measured 5080 timing via /contribute to land a first-party benchmark row for this pair.
  • VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~8 GB headroom on the 16 GB 5080. The native FP16 diffusion file (9.31 GB) plus UMT5-XXL FP8 text encoder (6.27 GB) plus Wan2.2-VAE (1.31 GB) add up to ~17 GB on disk; runtime peak with ComfyUI's offloader is well below that because weights stream rather than all loading resident. Live data: /check/wan-2-2/rtx-5080.
  • Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps — the Wan-AI README states "The TI2V-5B model supports 720P video generation at 24 FPS." Clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent across the canonical FP16 path; there is no per-expert quality-vs-speed dial for this variant (the 14B-A14B siblings expose that via the high-noise / low-noise expert split — TI2V-5B does not).
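Since no first-party 5080 numbers exist yet, you may want to capture your own peak VRAM while a render runs. The following is a rough sketch: peak_mib and sample_vram are illustrative names, the sampling uses nvidia-smi's --query-gpu interface at one-second intervals, and it assumes GNU timeout is available. On a multi-GPU machine it reports the maximum across all cards.

```shell
#!/usr/bin/env bash
# peak_mib: reads one integer (MiB) per line on stdin and prints the largest;
# prints 0 when there is no input.
peak_mib() {
  awk 'NF && $1 + 0 > max { max = $1 + 0 } END { print max + 0 }'
}

# sample_vram [SECONDS] (hypothetical wrapper): polls used VRAM once per second
# for SECONDS (default 600) and prints the peak in MiB.
sample_vram() {
  timeout "${1:-600}" nvidia-smi --query-gpu=memory.used \
    --format=csv,noheader,nounits -l 1 | peak_mib
}
```

Start `sample_vram 600` in a second terminal, queue the render in ComfyUI, and note the printed peak alongside the wall-clock time ComfyUI reports for the prompt.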

For the full benchmark data, see /check/wan-2-2/rtx-5080.

Troubleshooting

Model fails to load on the 50-series

The RTX 5080 is a Blackwell sm_120 card. If the model fails at load (or silently falls back to CPU), your PyTorch wheel is probably missing sm_120 kernels. Install a PyTorch ≥ 2.9 build against CUDA 12.8/12.9 (cu128/cu129); those wheels include sm_120 kernels for the 50-series. This is the same requirement across all Blackwell consumer cards.

Out of memory at 720p

Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial relies on it for the 8 GB minimum claim). If the FP16 path still presses against the 16 GB envelope while you have other models loaded, switch the diffusion model to a QuantStack Q8_0 GGUF (5.40 GB on disk) via the Unet Loader (GGUF) node — peak VRAM drops further and quality loss at Q8 is minimal.

No FP8 weight file for TI2V-5B

TI2V-5B is a dense single-checkpoint model — Wan-AI does not publish an FP8 weight path for it (unlike the 14B-A14B siblings, whose FP8-scaled experts ship via the Comfy-Org repackager). On Blackwell sm_120 the FP8 tensor cores are native and would run a quant at hardware speed if one shipped, but since none is published for TI2V-5B, the canonical path is the FP16 safetensors file in Installation step 2, and the VRAM escape hatch is the GGUF quant ladder above — not FP8. Do not look for a *_fp8_scaled.safetensors file for TI2V-5B; it does not exist.

Want the 14B variants?

Per the official README, the 14B / A14B single-GPU commands are annotated "This command can run on a GPU with at least 80GB VRAM." — out of scope for a 16 GB card at native precision. Community GGUF quants of the 14B Wan variants exist but need a separate workflow; file a request on /contribute if you want a 14B-quantized recipe added once a stable 16 GB workflow lands.