What You'll Build
A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe walks through the ComfyUI native workflow as the canonical path and adds an explicit "spending the headroom" section, because TI2V-5B's ~8 GB working floor on a 32 GB Blackwell card leaves substantial spare VRAM for colocation, longer clips, or higher-effort runs.
Hardware data: RTX 5090 (32GB GDDR7 VRAM, Blackwell sm_120) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data
Why TI2V-5B and not the 14B variants? The official Wan-Video/Wan2.2 README reserves T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B for "a GPU with at least 80GB VRAM" at native precision. Only TI2V-5B is documented as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the 14B-A14B siblings need does not apply here. The 32 GB envelope on the 5090 does bring the 14B-A14B variants comfortably into reach via the FP8-scaled Comfy-Org repackaged path with timestep-MoE expert swapping (one ~14 GB expert resident at a time), but that is a separate recipe; see the Wan 2.2 A14B recipes catalogue for the FP8 path.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (per the official ComfyUI tutorial: "should fit well on 8GB vram with the ComfyUI native offloading") | RTX 5090 (32 GB, Blackwell sm_120) |
| RAM | 16 GB | 32 GB+ recommended (offloading is RAM-heavy) |
| Storage | ~13 GB (TI2V-5B FP16 weights + UMT5-XXL FP8 text encoder + Wan2.2-VAE) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.12, PyTorch ≥ 2.9 built against CUDA 12.8 / 12.9 (sm_120 kernel coverage) | torch 2.9 + CUDA 12.9 + Python 3.12 |
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". On a 32 GB 5090 that leaves ~24 GB of headroom on top of the 8 GB floor (see "Spending the headroom" below). The ComfyUI-Wiki Wan2.2 guide makes the same point in its own words: "The Wan2.2 5B version combined with ComfyUI's native offloading function can adapt well to 8GB VRAM, making it an ideal choice for beginner users."
2. Download model files for the native workflow
Per the ComfyUI native workflow docs, place these three files in ComfyUI/models/:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
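Before opening the template, it can help to confirm that all three files landed where the workflow expects them. The following is a minimal sketch, assuming ComfyUI lives in a directory named ComfyUI relative to where you run the script; the helper name missing_model_files is illustrative and not part of ComfyUI.

```python
from pathlib import Path

# Relative locations taken from the three paths listed above.
EXPECTED_FILES = (
    "models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/wan2.2_vae.safetensors",
)


def missing_model_files(comfy_root):
    """Return the expected files that are absent under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install directory; change it to match yours.
    for rel in missing_model_files("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means the layout matches the paths above; it does not verify that the downloads are complete or uncorrupted.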
Or grab the raw weights from the Wan-AI HF repo using the Wan-AI install guide:
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
3. (Recommended for Blackwell) Install Sage Attention 2.2
HF discussion #4 on Wan2.2-Animate-14B reports Sage Attention 2.2 as required for stable Wan 2.2 runs on the 50-series. This is a community report, not first-party guidance. The working combo named in that thread is PyTorch 2.9 with CUDA 12.9 on Python 3.12, plus Sage Attention 2.2. Prebuilt Windows wheels for the Blackwell sm_120 target are available from the woct0rdho/SageAttention v2.2 release. On Linux, build from source against the same CUDA toolkit.
4. (Optional) GGUF quants for colocation or longer clips
If you want to colocate another model on the card (see "Spending the headroom" below), the QuantStack/Wan2.2-TI2V-5B-GGUF repackager publishes per-tier GGUF quants of the TI2V-5B diffusion weights (Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB) with an explicit link-back to the Wan-AI canonical card. Load them via city96/ComfyUI-GGUF. The native FP16 path above is recommended for first-time use on the 32 GB 5090; GGUF is the escape hatch when you need the VRAM for something else.
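To make the tier choice concrete, here is a small sketch that picks the largest listed quant fitting a given size budget. The sizes are the on-disk figures quoted above. Treating file size as a stand-in for the weights' VRAM footprint is an assumption, and runtime peak will be higher once activations and the VAE are counted.

```python
# On-disk sizes in GB, as quoted above for QuantStack/Wan2.2-TI2V-5B-GGUF.
QUANT_SIZES_GB = {
    "Q4_K_S": 3.12,
    "Q5_K_M": 3.81,
    "Q6_K": 4.21,
    "Q8_0": 5.40,
}


def largest_quant_within(budget_gb):
    """Return the largest listed quant whose file size fits budget_gb, or None."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

For example, largest_quant_within(4.0) selects Q5_K_M, while a budget below 3.12 GB returns None because no listed tier fits.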
Running
With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, and set the frame count for the clip length you want. At 24 fps a 5-second clip is nominally 120 frames; the Wan reference scripts expect a frame count of the form 4n+1, so use 121. Then queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
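The frame-count arithmetic can be sketched as below. It assumes the 4n+1 frame-count convention used by the Wan reference generation script, under which a nominal 5-second clip at 24 fps comes out at 121 frames. The helper name frame_count is illustrative, and the ComfyUI node may apply its own rounding.

```python
def frame_count(seconds, fps=24):
    """Nearest 4n+1 frame count for a clip of the given length."""
    n = round(seconds * fps / 4)
    return 4 * n + 1


if __name__ == "__main__":
    for seconds in (3, 5):
        print(seconds, "s ->", frame_count(seconds), "frames")
```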
For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
Results
- Speed: No first-party RTX 5090 measurement for TI2V-5B currently exists in the Wan-AI HF card, the official Wan2.2 README, or the Wan2.2 GitHub Issues tracker (verified at write time). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU" — the 5090 sits in the "single consumer-grade GPU" envelope and the Blackwell sm_120 compute generation is materially faster than the RTX 4090 (Ada sm_89) the README originally targeted, so wall-clock should improve on that 9-minute figure. The exact 5090 number is not published; report yours via /contribute to land a first-party benchmark row for this pair.
- VRAM usage: ~8 GB floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~24 GB headroom on the 5090. The native FP16 diffusion file plus UMT5-XXL FP8 text encoder plus Wan2.2-VAE add up to ~13 GB on disk; runtime peak with ComfyUI's offloader is well below that. Live data: /check/wan-2-2/rtx-5090.
- Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps; clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent across the canonical FP16 path — there is no per-expert quality-vs-speed dial for this variant (the 14B-A14B siblings expose that via the high-noise / low-noise expert split; TI2V-5B does not).
For the full benchmark data, see /check/wan-2-2/rtx-5090.
Spending the headroom (32 GB envelope, ~24 GB spare)
Because TI2V-5B's working floor is ~8 GB and the 5090 has 32 GB, you have ~24 GB of spare VRAM during a render. Concrete ways to spend it on this card:
- Switch to a GGUF quant to free even more VRAM and run a second model in parallel (e.g. a Q8 LLM for prompt extension via /contribute-style workflow loops). Q8_0 (5.40 GB on disk) is the conservative choice; Q5_K_M (3.81 GB) is the more aggressive one. Use city96/ComfyUI-GGUF + QuantStack/Wan2.2-TI2V-5B-GGUF per Installation step 4.
- Step up to a 14B-A14B variant for higher quality — the timestep-MoE structure of the A14B family means one ~14 GB FP8-scaled expert is resident at a time (see the Wan 2.2 A14B recipes for the FP8 ComfyUI path that fits 24 GB; on 32 GB you have additional headroom for higher resolution, more frames, or to drop the FP8 quant in favor of a heavier dtype).
- Disable text-encoder offload for long-context prompts. The UMT5-XXL FP8 text encoder is ~5 GB; on the canonical 8 GB floor ComfyUI keeps it offloaded between samplers, but on 32 GB you can keep it resident for prompt-heavy workflows without spilling.
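A back-of-the-envelope budget for the options above can be sketched as follows. It uses only the figures quoted in this recipe (32 GB card, ~8 GB floor, ~5 GB text encoder) and a deliberately simple additive model; the colocated_gb value is a hypothetical second model, and real usage will differ because of fragmentation and activation memory.

```python
CARD_VRAM_GB = 32.0      # RTX 5090
WORKING_FLOOR_GB = 8.0   # ComfyUI native offloading floor quoted above
TEXT_ENCODER_GB = 5.0    # approximate UMT5-XXL FP8 size quoted above


def spare_vram_gb(keep_text_encoder_resident=False, colocated_gb=0.0):
    """Rough spare VRAM after the render floor and optional extras."""
    used = WORKING_FLOOR_GB + colocated_gb
    if keep_text_encoder_resident:
        used += TEXT_ENCODER_GB
    return CARD_VRAM_GB - used
```

With nothing extra loaded this gives the ~24 GB quoted above; keeping the text encoder resident alongside a hypothetical 14 GB second model leaves roughly 5 GB.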
Troubleshooting
"Incompatibility error with the VGA 5xxx series" on first install
Reported by the original poster in HF discussion #4 on Wan2.2-Animate-14B for the 5060 Ti; the same root cause applies on the 5090. The Blackwell 50-series sm_120 needs a recent CUDA + PyTorch stack, and the working combo reported in that thread is PyTorch 2.9 with CUDA 12.9 on Python 3.12. Older torch wheels (without sm_120 kernels) silently fall back to CPU or fail at model load. A default pip install torch in a fresh environment should now pull a wheel with sm_120 coverage; if you're using an older venv, recreate it against the CUDA 12.8 or 12.9 wheel index.
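One way to confirm that the installed wheel actually carries sm_120 kernels is to inspect PyTorch's compiled architecture list. The sketch below is a quick diagnostic, not an official check: supports_sm120 is an illustrative helper, and the script only reports what it finds.

```python
def supports_sm120(arch_list):
    """True if a PyTorch arch list includes Blackwell consumer (sm_120) kernels."""
    return "sm_120" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # torch.version.cuda is None on CPU-only builds.
        print("torch", torch.__version__, "CUDA", torch.version.cuda)
        print("sm_120 kernels:", supports_sm120(torch.cuda.get_arch_list()))
        if torch.cuda.is_available():
            print("device capability:", torch.cuda.get_device_capability(0))
```

If the sm_120 line prints False on a 50-series card, the venv most likely holds an older wheel and should be recreated as described above.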
CLI path (python generate.py --task ti2v-5B …) is documented for 24 GB, not optimized for 32 GB
The published single-GPU command from the Wan-AI README is python generate.py --task ti2v-5B --size 1280*704 --ckpt_dir ./Wan2.2-TI2V-5B --offload_model True --convert_model_dtype --t5_cpu --prompt "…". It is documented for a 24 GB card and uses all three offload flags. On the 32 GB 5090 you can keep those flags engaged for safety. The README notes "If you are running on a GPU with at least 80GB VRAM, you can remove the --offload_model True, --convert_model_dtype and --t5_cpu options to speed up execution", and 32 GB sits between the two tiers, short of the 80 GB threshold the README clears for unconditional flag removal. Selectively dropping individual flags (e.g. removing --t5_cpu alone to keep the text encoder on GPU) may speed things up, but no first-party guidance covers the 32 GB middle ground. The ComfyUI native workflow (Installation steps 1 and 2 above) remains the recommended canonical path, because its runtime offloader manages memory more dynamically than the CLI's static three-flag setup. Note that on the RTX 3090 (Ampere 24 GB) this same CLI command has been reported to OOM during VAE decode even with all three offload flags engaged (Wan2.2 Issue #90); the 5090's 32 GB envelope should remove that particular concern.
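If you do experiment with dropping individual flags, a small helper keeps the variants reproducible. This is a sketch that only assembles the argument list from the README command quoted above. The build_command helper and its drop parameter are illustrative, the prompt is a placeholder, and whether any reduced flag set is stable on 32 GB is untested here.

```python
import shlex

# The three offload flags from the README command quoted above.
OFFLOAD_FLAGS = {
    "offload_model": ["--offload_model", "True"],
    "convert_model_dtype": ["--convert_model_dtype"],
    "t5_cpu": ["--t5_cpu"],
}


def build_command(prompt, drop=()):
    """Build the README's single-GPU command, minus any flags named in drop."""
    cmd = [
        "python", "generate.py",
        "--task", "ti2v-5B",
        "--size", "1280*704",
        "--ckpt_dir", "./Wan2.2-TI2V-5B",
    ]
    for name, flag in OFFLOAD_FLAGS.items():
        if name not in drop:
            cmd += flag
    cmd += ["--prompt", prompt]
    return cmd


if __name__ == "__main__":
    # Placeholder prompt; keeps the text encoder on GPU by dropping --t5_cpu only.
    print(shlex.join(build_command("a placeholder prompt", drop=("t5_cpu",))))
```

Printing with shlex.join quotes the 1280*704 size argument so the shell does not expand the asterisk when the line is pasted into a terminal.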
No published Wan-AI/NVIDIA NVFP4 mirror for TI2V-5B as of writing
Blackwell sm_120 hardware accelerates both FP8 (E4M3/E5M2) and NVFP4 (microscaling FP4) — a real Blackwell-only speedup that prior consumer generations cannot use. However, as of writing the Wan-AI org has not published an NVFP4 quant for TI2V-5B specifically (a HF API search of the Wan-AI org lists only Wan-AI/Wan2.2-TI2V-5B and Wan-AI/Wan2.2-TI2V-5B-Diffusers, no -NVFP4 mirror). NVIDIA's catalog ships NVFP4 mirrors only for the 14B-A14B variants (nvidia/Wan2.2-T2V-A14B-Diffusers-NVFP4), not for TI2V-5B. Experimental community NVFP4 quants of TI2V-5B exist (e.g. jorismak/wan2.2-ti2v-5b-nvfp4mixed) but carry very low engagement (single-digit downloads) and lack the per-tier file-size + canonical-org link-back combination required for citation as a stable VRAM source. For the canonical TI2V-5B path on the 5090, stick with the FP16 native flow — the model is small enough that FP8/NVFP4 acceleration would yield a minor speedup at the cost of stability, and the 32 GB envelope means the FP16 path is the no-compromise option.
Sibling Issue #134 (8×5090 multi-GPU S2V-14B OOM) does not apply
Wan2.2 Issue #134 "s2v, 8*5090 generate done, but still oom finally" reports an OOM on an 8×RTX 5090 distributed setup running the S2V-14B variant with --ulysses_size 8 --dit_fsdp --t5_fsdp --offload_model True. The failure mode is specific to the 14B S2V variant running FSDP/Ulysses multi-GPU distribution — a different model variant and a different runtime path from this recipe's single-GPU TI2V-5B ComfyUI flow. No model-class-independent failure mode transfers; the Issue is mentioned only because a search for "5090" in the canonical repo's tracker surfaces it. Single-GPU TI2V-5B has no open 5090-tagged Issue as of writing.
Faster generation on a 5090
The "under 9 minutes" number on the Wan-AI HF card is a no-optimization figure for a generic "consumer-grade GPU" (the README names the 4090 as the GPU class the single-GPU CLI command targets). Community LoRAs (4-step / 8-step distillation variants) and ComfyUI runtime optimizations have shipped for the 5B model after the README was published — track the Wan-AI HF repo and the official Wan2.2 GitHub for distilled variants. Report measured RTX 5090 timings via /contribute — the DB currently has no benchmark row for this pair.