What You'll Build
A local pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant the official repo documents as runnable single-GPU on consumer hardware. The recipe walks through both the official python generate.py --task ti2v-5B CLI path and the ComfyUI native workflow that uses the same weights.
Hardware data: RTX 4090 (24GB VRAM) · 720p (1280×704 / 704×1280) at 24 fps · 5-second clip in under 9 minutes per Wan-AI's first-party documentation · See benchmark data
Why TI2V-5B and not the 14B variants? The official Wan-Video/Wan2.2 README states that T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B all require "at least 80GB VRAM" for single-GPU inference. Only TI2V-5B is documented as a single-consumer-GPU target, and the HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense, not MoE: it has one fused checkpoint rather than the separate high-noise / low-noise expert files used by the A14B siblings. The roughly 3.3× VRAM gap between the 5B model (24 GB target) and the 14B models (80 GB target) is why this recipe pins the 5B variant; the 14B variants on a 24 GB card are covered separately in the Wan 2.2 14B recipe.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 24 GB VRAM (per the Wan-AI README: "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)") | RTX 4090 (24GB) |
| RAM | 16 GB | 32 GB+ recommended (CLI offloading is RAM-heavy) |
| Storage | ~13 GB (TI2V-5B FP16 weights + UMT5-XXL FP8 text encoder + Wan2.2-VAE) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates) OR the Wan-AI CLI, Python 3.10+, PyTorch ≥ 2.4 | Default pip install torch wheel — no special CUDA pinning needed for Ada (sm_89) |
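
The storage row can be checked before downloading anything. The following is a minimal sketch using only the Python standard library; the 13 GB figure is the approximate value from the table, and the 2 GB safety margin and both function names are assumptions for illustration, not part of any Wan-AI or ComfyUI tooling.

```python
import shutil

# Approximate weight footprint from the requirements table above.
REQUIRED_STORAGE_GB = 13


def free_disk_gb(path="."):
    """Return free space on the filesystem containing `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for_weights(path=".", required_gb=REQUIRED_STORAGE_GB, margin_gb=2.0):
    """True if free space covers the weights plus a margin (the margin is an assumption)."""
    return free_disk_gb(path) >= required_gb + margin_gb
```

Calling `has_room_for_weights("/path/to/ComfyUI")` before the download gives a quick yes/no; it says nothing about VRAM or system RAM, which need separate checks.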
Installation
1. (Path A — ComfyUI native) Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". On a 24 GB 4090 you have substantial headroom, so ComfyUI should need to offload far less to system RAM.
2. (Path A continued) Download model files for the native workflow
Per the ComfyUI native workflow docs, place these three files under ComfyUI/models/, each in the subfolder shown:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
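
A missing or misplaced file is the most common reason a template fails to load, so the layout above can be verified with a short script. This is a sketch only: the three relative paths are the ones listed in this step, while `missing_model_files` and its `comfy_root` argument are hypothetical names introduced here.

```python
from pathlib import Path

# Relative paths under ComfyUI/models/, as listed in this step.
EXPECTED_FILES = [
    "diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan2.2_vae.safetensors",
]


def missing_model_files(comfy_root):
    """Return the expected model files that are not present under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [rel for rel in EXPECTED_FILES if not (models_dir / rel).is_file()]
```

An empty list from `missing_model_files("ComfyUI")` means all three files are where the native workflow expects them.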
3. (Path B — official CLI) Clone the Wan-AI repo and pull weights
Setup steps, following the Wan-AI README (the single-GPU generation command itself is under Running):
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
Running
Path A (ComfyUI): With the template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, and set the frame count for the clip length you want. At 24 fps a 5-second clip is nominally 120 frames; Wan frame counts follow a 4n+1 pattern, so use 121. Then queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
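
The frame-count arithmetic can be sketched as a small helper. It assumes Wan's frame counts must have the form 4n+1 (an assumption stated here, not something this recipe's sources quote), which is why a nominal 120-frame, 5-second clip at 24 fps comes out as 121. The function name is hypothetical.

```python
def frames_for_duration(seconds, fps=24):
    """Smallest frame count of the form 4n+1 that covers `seconds` at `fps`.

    The 4n+1 constraint is an assumption about Wan's temporal compression.
    """
    raw = round(seconds * fps)
    # Ceiling division of (raw - 1) by 4, clamped so zero-length input yields 1 frame.
    n = -(-max(raw - 1, 0) // 4)
    return 4 * n + 1


print(frames_for_duration(5))  # 121
```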
Path B (Wan-AI CLI), verbatim from the official README:
python generate.py --task ti2v-5B --size 1280*704 \
--ckpt_dir ./Wan2.2-TI2V-5B \
--offload_model True --convert_model_dtype --t5_cpu \
--prompt "a panda playing guitar by a lake at sunset"
The Wan-AI README documents this exact command as the supported single-GPU invocation: "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)." The --offload_model True, --convert_model_dtype, and --t5_cpu flags are what keep peak resident VRAM inside 24 GB; the README also notes that "If you are running on a GPU with at least 80GB VRAM, you can remove the --offload_model True, --convert_model_dtype and --t5_cpu options to speed up execution" — relevant only if you later move to a workstation card.
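
For scripted or batch runs, the same invocation can be assembled as an argument list. This is a sketch: every flag and value comes from the command above, while `build_ti2v_command`, `as_shell_string`, and the `low_vram` switch are hypothetical names introduced for illustration.

```python
import shlex

# The three flags the README pairs with the 24 GB single-GPU command.
OFFLOAD_ARGS = ["--offload_model", "True", "--convert_model_dtype", "--t5_cpu"]


def build_ti2v_command(prompt, ckpt_dir="./Wan2.2-TI2V-5B", size="1280*704", low_vram=True):
    """Build the generate.py argument list; low_vram=True keeps the offload flags."""
    cmd = ["python", "generate.py", "--task", "ti2v-5B",
           "--size", size, "--ckpt_dir", ckpt_dir]
    if low_vram:
        cmd += OFFLOAD_ARGS
    cmd += ["--prompt", prompt]
    return cmd


def as_shell_string(cmd):
    """Render the argument list as a safely quoted shell command for logging."""
    return shlex.join(cmd)
```

Passing the list to `subprocess.run(cmd, cwd="Wan2.2", check=True)` avoids shell quoting problems with the `*` in `--size` and with prompts that contain quotes.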
Results
- Speed: Per the Wan-AI HF card: "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and the Wan-AI README names the RTX 4090 as the GPU class this single-GPU command targets. The DB benchmark row at /check/wan-2-2/rtx-4090 shows a 12-min/clip figure from wavespeed.ai's LTX-2.3 vs WAN 2.2 comparison. That post does not specify which Wan 2.2 variant was tested and quotes a 20+ GB VRAM working envelope, so it may measure one of the 14B variants rather than TI2V-5B. The first-party "under 9 minutes" figure for the 5B variant is the right anchor for this recipe.
- VRAM usage: 24 GB target on the CLI path with the three offload flags above; ~8 GB floor on the ComfyUI native path with the runtime offloader engaged. Per the official ComfyUI tutorial: "should fit well on 8GB vram with the ComfyUI native offloading" — on the 4090 you have ~16 GB of headroom on top of that floor. Live data: /check/wan-2-2/rtx-4090.
- Quality notes: TI2V-5B is the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. Output is 720p (1280×704 or 704×1280) at 24 fps; clip length is configurable via the frame count. The 14B-class siblings (T2V-A14B, I2V-A14B, Animate-14B, S2V-14B) require 80 GB+ at native precision per the README and are out of scope for the single-GPU 4090 setup — see the dedicated Wan 2.2 14B recipe for the FP8 / Comfy-Org repackaged path that brings them into 24 GB via timestep-MoE expert swapping.
For the full benchmark data, see /check/wan-2-2/rtx-4090.
Troubleshooting
CLI path errors with "CUDA out of memory" at 24 GB
Confirm all three offload flags are present: --offload_model True --convert_model_dtype --t5_cpu. The Wan-AI README only documents removing them on GPUs with at least 80 GB VRAM, so dropping any one of them takes you outside the documented 24 GB configuration. If you still OOM with all three flags engaged, switch to Path A: ComfyUI's runtime offloader is more aggressive than the CLI's static three-flag setup, and the official tutorial documents a much lower floor.
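
When chasing an OOM it helps to see how much VRAM is already in use before generation starts (a browser or another ComfyUI instance can hold several GB). The sketch below shells out to `nvidia-smi` with its CSV query options and parses the result; it assumes `nvidia-smi` is on the PATH, and the function names are hypothetical.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, without headers or units.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Parse 'used, total' lines into a list of (used_mib, total_mib), one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Run nvidia-smi and return parsed memory figures (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)
```

Running `gpu_memory_mib()` in a loop from a second terminal during generation gives a rough picture of peak usage; it samples at whatever interval you poll, so short spikes can be missed.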
Faster generation on a 4090
The first-party "under 9 minutes" figure assumes no specific optimization. Community LoRAs (e.g. 4-step / 8-step distillation variants) and ComfyUI runtime optimizations have shipped for the 5B model after the README was published — track the Wan-AI HF repo and the official Wan2.2 GitHub for distilled variants. Report measured timings via /contribute — the DB benchmark row is currently variant-ambiguous and a TI2V-5B-explicit RTX 4090 measurement would replace it.
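
To produce a timing worth reporting, wall-clock the whole CLI run. This is a minimal sketch using the standard library; `time_command` and `format_minutes` are hypothetical helpers, and the measured time includes model loading, so it will read higher than pure sampling time.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run `cmd` (an argument list) and return (return_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed


def format_minutes(seconds):
    """Format a duration in seconds as 'M min SS s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs:02d} s"
```

When reporting, note the variant (TI2V-5B), resolution, frame count, and which offload flags were active alongside the formatted duration, so the figure is not variant-ambiguous.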
Want the 14B variants on this 24 GB card?
The native 14B path needs 80 GB. The Wan 2.2 14B recipe walks through the FP8-scaled Comfy-Org repackaged path that loads one timestep-MoE expert at a time (high-noise, then low-noise, ~14 GB each at FP8). That brings the 14B family into 24 GB with output quality close to native precision.