What You'll Build
A local pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the dense single-checkpoint variant of the Wan 2.2 family and the only one the official repo documents as runnable on a single consumer-grade GPU. The recipe walks through the ComfyUI native workflow as the recommended path on Ampere (RTX 3090), because the official python generate.py CLI command has been reported to OOM on RTX 3090 24GB even with all the published offload flags (see Troubleshooting below).
Hardware data: RTX 3090 (24GB VRAM, Ampere sm_86) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data
Why TI2V-5B and not the 14B variants? The official Wan-Video/Wan2.2 README reserves T2V-A14B, I2V-A14B, S2V-14B and Animate-14B for "a GPU with at least 80GB VRAM" at native precision; only TI2V-5B is documented as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the A14B siblings need does not apply here: there is no per-expert swap to configure, and Wan-AI publishes no FP8 diffusion weights for it. The 14B variants on a 24 GB card need a different recipe entirely.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 24 GB VRAM. The Wan-AI README documents the CLI single-GPU path for the RTX 4090 specifically ("This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)"); on the RTX 3090 use the ComfyUI native path instead. | RTX 3090 (24GB, Ampere sm_86) |
| RAM | 16 GB | 32 GB recommended |
| Storage | ~13 GB (TI2V-5B FP16 weights + UMT5-XXL FP8 text encoder + Wan2.2-VAE) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.10+, PyTorch ≥ 2.4 | Default pip install torch wheel — no special CUDA pinning needed for Ampere (sm_86) |
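The software and storage rows of the table can be checked before downloading anything. The following is a minimal sketch using only the Python standard library; the function name `preflight` and the choice of `"."` as the drive to inspect are illustrative assumptions, and the 13 GB threshold is simply the approximate figure from the table.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)      # from the Requirements table
MIN_FREE_DISK_GB = 13     # approximate model-file footprint from the table


def preflight(python_version, free_disk_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is older than 3.10"
        )
    free_gb = free_disk_bytes / 1e9
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"only {free_gb:.1f} GB free, need about {MIN_FREE_DISK_GB} GB"
        )
    return problems


if __name__ == "__main__":
    # "." is a placeholder: point it at the drive that will hold ComfyUI/models.
    issues = preflight(sys.version_info, shutil.disk_usage(".").free)
    print("OK" if not issues else "\n".join(issues))
```

GPU checks are left out on purpose so the script runs before PyTorch is installed.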
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". A 24 GB 3090 therefore has substantial headroom, so the offloader should be able to keep far more of the model resident on the GPU instead of swapping it to system RAM. The ComfyUI-Wiki Wan2.2 guide makes the same point in its own words: "The Wan2.2 5B version combined with ComfyUI's native offloading function can adapt well to 8GB VRAM, making it an ideal choice for beginner users."
2. Download model files for the native workflow
Per the ComfyUI native workflow docs, place these three files in ComfyUI/models/:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
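A misplaced file is a common reason for a template to load with red, unresolved nodes, so it can help to verify the layout first. This is a small sketch, not part of ComfyUI itself: the helper name `missing_model_files` is hypothetical, and the root path `"ComfyUI"` assumes the script runs from the directory that contains the checkout.

```python
from pathlib import Path

# Relative paths exactly as listed above.
REQUIRED_FILES = [
    "models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/wan2.2_vae.safetensors",
]


def missing_model_files(comfyui_root):
    """Return the required files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed location; adjust to your install path.
    missing = missing_model_files("ComfyUI")
    for rel in missing:
        print(f"missing: {rel}")
    if not missing:
        print("all three model files found")
```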
Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
3. (Optional fallback) GGUF quants for tighter VRAM headroom
If you want to colocate another model on the card or run longer clips, the QuantStack/Wan2.2-TI2V-5B-GGUF repackager publishes per-tier GGUF quants of the TI2V-5B diffusion weights (Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB), with an explicit link back to the canonical Wan-AI card. Load them with the city96/ComfyUI-GGUF custom nodes. The native FP16 path above is recommended for first-time use on a 24 GB card; GGUF is the escape hatch for VRAM pressure.
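Choosing a tier comes down to picking the largest quant that fits whatever budget you reserve for the diffusion weights. The sketch below encodes the four file sizes listed above; the budget values are hypothetical. File size is only a rough proxy for weight memory: activations, the text encoder and the VAE need room as well.

```python
# File sizes in GB as listed for QuantStack/Wan2.2-TI2V-5B-GGUF above.
QUANT_SIZES_GB = {
    "Q4_K_S": 3.12,
    "Q5_K_M": 3.81,
    "Q6_K": 4.21,
    "Q8_0": 5.40,
}


def largest_quant_within(budget_gb):
    """Return the largest listed quant whose file fits in budget_gb, or None."""
    fitting = {
        name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_quant_within(4.5))  # Q6_K
print(largest_quant_within(3.0))  # None
```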
Running
With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, set the frame count for the clip length you want, and queue. At 24 fps a 5-second clip is nominally 24 × 5 = 120 frames; Wan frame counts take the form 4n+1, so use 121. The first render is slower due to model load; subsequent renders reuse the cached weights.
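The frame-count arithmetic is easy to get off by one. At 24 fps a 5-second clip is nominally 120 frames, but the upstream Wan generate script documents frame counts of the form 4n+1, which makes 121 the nearest valid value. The sketch below rounds up to that form; treat the 4n+1 rule as an assumption to confirm against the default in your template. The dictionary keys are illustrative names, not a confirmed node schema.

```python
VALID_SIZES = {(1280, 704), (704, 1280)}  # the two resolutions named above
FPS = 24


def frame_count(seconds, fps=FPS):
    """Smallest 4n+1 frame count covering `seconds` of video (assumed rule)."""
    nominal = round(seconds * fps)
    remainder = (nominal - 1) % 4
    return nominal if remainder == 0 else nominal + (4 - remainder)


def clip_settings(width, height, seconds):
    """Bundle the clip inputs, rejecting sizes this recipe does not cover."""
    if (width, height) not in VALID_SIZES:
        raise ValueError(f"unsupported size {width}x{height}")
    return {"width": width, "height": height, "length": frame_count(seconds)}


print(clip_settings(1280, 704, 5))
# {'width': 1280, 'height': 704, 'length': 121}
```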
For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
Results
- Speed: No published RTX 3090 measurement for TI2V-5B currently exists in trusted sources. As a sibling reference, the Wan-AI HF card reports the model can "generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU" and the Wan-AI README names the RTX 4090 as the GPU class that single-GPU command targets. Video-diffusion DiT forward passes are compute-bound rather than memory-bound, and the RTX 3090 (Ampere sm_86) has roughly half the FP16 compute throughput of the RTX 4090 (Ada sm_89), so expect substantially longer wall-clock per clip on the 3090 — likely in the 15-minute neighbourhood for the same 5-second 720p workload, though no first-party number has been published. Measured 3090 timings via /contribute would replace this estimate with a citation.
- VRAM usage: ~8 GB floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~16 GB headroom on the 3090. The native FP16 diffusion file plus UMT5-XXL FP8 text encoder plus Wan2.2-VAE add up to ~13 GB on disk; runtime peak with ComfyUI's offloader is well below that. Live data: /check/wan-2-2/rtx-3090.
- Quality notes: TI2V-5B is the only Wan 2.2 variant the official repo documents as runnable on a single 24 GB consumer GPU. Output is 720p (1280×704 or 704×1280) at 24 fps; clip length is configurable via frame count. The 14B-class siblings (T2V-A14B, I2V-A14B, Animate-14B, S2V-14B) need 80 GB+ at native precision and are out of scope for this single-GPU RTX 3090 setup.
For the full benchmark data, see /check/wan-2-2/rtx-3090.
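The speed estimate and the headroom figure above are both simple arithmetic, sketched here so the assumptions are visible. The scaling treats the workload as purely compute-bound, as the text argues. The 0.5 ratio is the "roughly half" from the text, while 0.6 is a hypothetical, slightly more favourable ratio included only to show how sensitive the result is. Neither output is a measurement.

```python
def scaled_minutes(reference_minutes, relative_throughput):
    """Scale a reference render time by relative compute throughput.

    Assumes the workload is purely compute-bound.
    """
    if relative_throughput <= 0:
        raise ValueError("relative_throughput must be positive")
    return reference_minutes / relative_throughput


REFERENCE_MINUTES = 9   # "under 9 minutes" on the Wan-AI HF card (an upper bound)
VRAM_GB = 24
OFFLOAD_FLOOR_GB = 8    # working floor quoted from the ComfyUI tutorial

for ratio in (0.5, 0.6):
    minutes = scaled_minutes(REFERENCE_MINUTES, ratio)
    print(f"ratio {ratio}: ~{minutes:.0f} min")
print(f"headroom: ~{VRAM_GB - OFFLOAD_FLOOR_GB} GB")
```

The two ratios give roughly 18 and 15 minutes, which brackets the estimate in the Speed bullet.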
Troubleshooting
CLI path (python generate.py --task ti2v-5B …) OOMs on the RTX 3090
The published single-GPU command from the Wan-AI README, python generate.py --task ti2v-5B --size 1280*704 --ckpt_dir ./Wan2.2-TI2V-5B --offload_model True --convert_model_dtype --t5_cpu --prompt "…", is documented for the RTX 4090. On the RTX 3090 (24 GB Ampere) it has been reported to OOM during VAE decode even with all three offload flags engaged: see Wan2.2 Issue #90, "CUDA OOM in TI2V-5B with RTX 3090 24GB", where the reporter's error reads "CUDA out of memory. Tried to allocate 2.60 GiB. GPU 0 has a total capacity of 23.68 GiB of which 892.75 MiB is free" and a second user confirmed the failure on a separate run. No maintainer workaround has been posted to that issue as of writing. Use the ComfyUI native path from the Installation section instead: ComfyUI's runtime offloader is more aggressive than the CLI's static three-flag setup, and the official tutorial documents an 8 GB working floor regardless of GPU. Track Issue #90 for any future CLI workaround.
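The quoted error shows how far short the allocation fell. The sketch below parses a message of that shape and reports the gap. The regular expression is written against the single quoted string above and is an assumption about the format, which may differ between PyTorch versions.

```python
import re

# Matches the requested size and the free size in a CUDA OOM message.
OOM_PATTERN = re.compile(
    r"Tried to allocate ([\d.]+) ([GM]iB)\..*?of which ([\d.]+) ([GM]iB) is free"
)
TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}


def oom_shortfall_gib(message):
    """Return (requested, free, shortfall) in GiB, or None if no match."""
    m = OOM_PATTERN.search(message)
    if m is None:
        return None
    requested = float(m[1]) * TO_GIB[m[2]]
    free = float(m[3]) * TO_GIB[m[4]]
    return requested, free, requested - free


MESSAGE = (
    "CUDA out of memory. Tried to allocate 2.60 GiB. GPU 0 has a total "
    "capacity of 23.68 GiB of which 892.75 MiB is free"
)
requested, free, shortfall = oom_shortfall_gib(MESSAGE)
print(f"requested {requested:.2f} GiB, free {free:.2f} GiB, short {shortfall:.2f} GiB")
```

For the Issue #90 message this reports a request of 2.60 GiB against about 0.87 GiB free, a gap of roughly 1.73 GiB.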
Tighter VRAM headroom (colocating with another model, longer clips)
Switch the diffusion model to a GGUF quant from QuantStack/Wan2.2-TI2V-5B-GGUF (Q4_K_S 3.12 GB / Q5_K_M 3.81 GB / Q6_K 4.21 GB / Q8_0 5.40 GB), loaded via city96/ComfyUI-GGUF. The text encoder and VAE remain unchanged. Output at Q8_0 is generally indistinguishable from the FP16 weights; Q5_K_M and Q6_K are the conservative middle ground.
FP8 weight files in the directory but not used in this recipe
TI2V-5B is a dense single-checkpoint model, and Wan-AI publishes no FP8 diffusion weights for it (unlike the A14B siblings, which ship FP8-scaled experts via the Comfy-Org repackager). The RTX 3090 (Ampere sm_86) has no FP8 tensor-core compute in any case: FP8 first shipped on Ada sm_89 and Hopper sm_90, so wherever FP8 weight files are used, including the umt5_xxl_fp8_e4m3fn_scaled text encoder in this recipe, the 3090 dequantizes them to BF16/FP16 on the fly for compute. For the TI2V-5B diffusion model the distinction is moot: the canonical path is the FP16 safetensors file referenced in Installation step 2.
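The architecture cut-off can be written as a one-line comparison on the compute capability. This is a sketch of the rule as stated above (FP8 tensor cores from sm_89 onward), assuming newer architectures keep the feature. In practice the `(major, minor)` tuple would come from `torch.cuda.get_device_capability()`, but the values are hard-coded here so the example runs without a GPU.

```python
FP8_MIN_CAPABILITY = (8, 9)  # Ada sm_89; Hopper sm_90 sorts above it


def has_fp8_tensor_cores(capability):
    """True if a (major, minor) compute capability is sm_89 or newer."""
    return tuple(capability) >= FP8_MIN_CAPABILITY


CARDS = {"RTX 3090": (8, 6), "RTX 4090": (8, 9), "H100": (9, 0)}
for name, cap in CARDS.items():
    supported = has_fp8_tensor_cores(cap)
    print(f"{name} sm_{cap[0]}{cap[1]}: FP8 tensor cores = {supported}")
```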
Faster generation on a 3090
The "under 9 minutes" number on the Wan-AI HF card is a no-optimization figure quoted for a single consumer-grade GPU, and the README names the RTX 4090 as the target for that command. Community LoRAs (4-step / 8-step distillation variants) and ComfyUI runtime optimizations have shipped for the 5B model since the README was published; track the Wan-AI HF repo and the official Wan2.2 GitHub for distilled variants. Report measured RTX 3090 timings via /contribute, since the DB currently has no benchmark row for this pair.