What You'll Build
A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe walks through the ComfyUI native workflow as the canonical path on the RTX 4070, with a QuantStack Q8 GGUF alternative for tighter VRAM or colocation.
Hardware data: RTX 4070 (12GB GDDR6X VRAM, Ada Lovelace AD104 sm_89) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data
Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. The four 14B-class variants are MoE models; the official Wan-Video/Wan2.2 README annotates their single-GPU commands with "This command can run on a GPU with at least 80GB VRAM.", which is far past a 12 GB consumer card at native precision. Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the 14B-A14B siblings need does not apply here. The 14B variants need a different recipe entirely.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (per the official ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading") | RTX 4070 (12 GB, Ada Lovelace AD104 sm_89) |
| RAM | 16 GB | 32 GB+ recommended (offloading is RAM-heavy) |
| Storage | ~17 GiB (TI2V-5B FP16 weights 9.31 GiB + UMT5-XXL FP8 text encoder 6.27 GiB + Wan2.2-VAE 1.31 GiB) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.10+, PyTorch ≥ 2.4 (default cu124 stable wheel) | — |
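The storage row is plain addition, and checking free space before downloading avoids a half-finished fetch. The sketch below sums the three file sizes quoted in the table and compares the total with the free space on a target drive. The `has_room` helper and its 2 GiB safety margin are illustrative assumptions, not part of ComfyUI.

```python
import shutil

# File sizes in GiB, copied from the Storage row of the table above.
COMPONENT_GIB = {
    "wan2.2_ti2v_5B_fp16.safetensors": 9.31,
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": 6.27,
    "wan2.2_vae.safetensors": 1.31,
}


def total_gib(components=COMPONENT_GIB):
    """Sum of the listed file sizes, rounded to two decimals."""
    return round(sum(components.values()), 2)


def has_room(path=".", margin_gib=2.0):
    """True if `path` has space for all files plus a margin (2 GiB is an assumed default)."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= total_gib() + margin_gib


if __name__ == "__main__":
    print(f"need ~{total_gib()} GiB; enough free space here: {has_room('.')}")
```

The total comes to just under 17 GiB, which is where the rounded figure in the table comes from.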
The RTX 4070 is an Ada Lovelace card (AD104, sm_89). Unlike Blackwell (50-series) GPUs, no special CUDA-wheel selection is required — the default stable PyTorch wheel (cu124) already ships sm_89 kernels, and prebuilt FlashAttention wheels cover sm_89. (ComfyUI's native diffusion path uses PyTorch SDPA attention, so no FlashAttention wheel is needed for this recipe either way.)
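To confirm that PyTorch sees the card as sm_89, you can read the compute capability it reports. The sketch below is a minimal check that assumes PyTorch is installed in the same environment as ComfyUI. The lookup table is an illustration: only the sm_89 entry comes from this recipe, while the Ampere (sm_86) and Blackwell (sm_120) entries are added here for contrast.

```python
# Compute capability -> architecture label. Only (8, 9) is taken from this
# recipe; the other two entries are added for comparison.
ARCH_BY_CAPABILITY = {
    (8, 6): "Ampere (sm_86)",
    (8, 9): "Ada Lovelace (sm_89)",
    (12, 0): "Blackwell (sm_120)",
}


def describe(capability):
    """Map a (major, minor) compute capability to a readable label."""
    major, minor = capability
    return ARCH_BY_CAPABILITY.get((major, minor), f"other (sm_{major}{minor})")


def detect():
    """Return the label for GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return describe(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    print(detect() or "no CUDA device visible to PyTorch")
```

On an RTX 4070 with a working CUDA wheel this should report the Ada Lovelace label. Anything else suggests the wrong device index or a CPU-only PyTorch build.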
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". On a 12 GB RTX 4070 you have ~4 GB of headroom on top of that 8 GB working floor, so the offloader can keep more of the model resident and has to stream less from system RAM.
2. Download model files for the native workflow
Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo, each into its matching subfolder under ComfyUI/models/:
# diffusion model → ComfyUI/models/diffusion_models/
wget -P ComfyUI/models/diffusion_models \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
# text encoder → ComfyUI/models/text_encoders/
wget -P ComfyUI/models/text_encoders \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
# VAE → ComfyUI/models/vae/
wget -P ComfyUI/models/vae \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors
The resulting layout matches what the official template expects:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
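A misplaced file is the most common reason a template loader shows an empty dropdown, so it can help to check the layout before opening ComfyUI. The sketch below tests for the three paths listed above. The `missing_files` helper and the default `ComfyUI` root are assumptions for illustration; adjust the root to wherever your install lives.

```python
from pathlib import Path

# Paths relative to ComfyUI/models/, as listed in the layout above.
EXPECTED = [
    "diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan2.2_vae.safetensors",
]


def missing_files(comfy_root="ComfyUI"):
    """Return the expected model files that are not present under comfy_root/models."""
    models = Path(comfy_root) / "models"
    return [rel for rel in EXPECTED if not (models / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("missing:", *missing, sep="\n  ")
    else:
        print("all three model files are in place")
```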
Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
3. (Recommended on 12 GB) Install ComfyUI-GGUF and a Q8 quant
On a 12 GB card the FP16 native path fits via offloading, but a Q8 GGUF gives you more headroom for higher frame counts or for colocating another model on the card. Use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF. The repo's base_model is Wan-AI/Wan2.2-TI2V-5B — it is a direct GGUF conversion of the Wan-AI canonical card.
Install city96/ComfyUI-GGUF:
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf
Download the Q8_0 file (5.40 GB) and place it in the unet folder:
wget -P ComfyUI/models/unet \
https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q8_0.gguf
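A truncated download or a saved HTML error page will fail later with a less obvious loader error. GGUF files begin with the four ASCII bytes `GGUF`, so a quick header check catches both cases. The sketch below only inspects that magic value and does not validate the rest of the file; `looks_like_gguf` is a hypothetical helper, not part of ComfyUI-GGUF.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def looks_like_gguf(path):
    """True if `path` exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    target = "ComfyUI/models/unet/Wan2.2-TI2V-5B-Q8_0.gguf"
    print(target, "looks valid" if looks_like_gguf(target) else "is missing or not GGUF")
```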
In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) and point it at the .gguf file. The text encoder and VAE remain unchanged. Per-tier sizes from the HF tree API: Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB — Q8_0 is generally indistinguishable from FP16 for this model, and the lower tiers free even more of the 12 GB envelope if you need it.
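Choosing a tier is a matter of picking the largest quant that fits the VRAM you are willing to give the diffusion weights. The sketch below encodes the four file sizes quoted above and returns the biggest one within a budget. It treats on-disk size as a stand-in for resident weight size, which is an assumption: actual VRAM use also includes activations, the VAE, and whatever the offloader keeps on the card.

```python
# On-disk sizes in GB, as quoted above for QuantStack/Wan2.2-TI2V-5B-GGUF.
QUANT_GB = {"Q4_K_S": 3.12, "Q5_K_M": 3.81, "Q6_K": 4.21, "Q8_0": 5.40}


def pick_quant(weight_budget_gb):
    """Return the largest tier whose file size fits the budget, or None if none fit."""
    fitting = {name: size for name, size in QUANT_GB.items() if size <= weight_budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (6.0, 5.0, 4.0, 3.0):
        print(f"{budget:.1f} GB budget ->", pick_quant(budget))
```

Start from the highest tier that fits and step down only if renders still run out of memory.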
Running
With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, set the frame count for the clip length you want (at 24 fps a 5-second clip is 121 frames, because Wan frame counts take the form 4n+1), and queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
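Turning a clip length into a frame count takes one extra step. A 5-second clip at 24 fps is nominally 120 frames, but the sketch below assumes Wan frame counts must have the form 4n+1 (the Wan reference generation script states this for its frame-count option), so it rounds up to the next valid value. `frames_for` is a hypothetical helper for illustration.

```python
import math


def frames_for(seconds, fps=24):
    """Smallest 4n+1 frame count covering `seconds` of video at `fps`.

    Assumes the 4n+1 constraint described in the lead-in.
    """
    nominal = round(seconds * fps)
    n = max(0, math.ceil((nominal - 1) / 4))
    return 4 * n + 1


if __name__ == "__main__":
    for secs in (2, 3, 5):
        print(f"{secs} s at 24 fps -> {frames_for(secs)} frames")
```

Under that assumption a 5-second clip maps to 121 frames, which is the value to enter in the frame-count field.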
For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
A command-line route exists in the official repo, but it is tuned for a larger card — see the Troubleshooting note below before trying it on 12 GB. The ComfyUI native workflow above is the path the documented 8 GB working floor refers to and the recommended one for the RTX 4070.
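If you want scripted renders without leaving the ComfyUI native path, a running ComfyUI server accepts workflows over HTTP at its `/prompt` endpoint. The sketch below patches the size inputs of the `Wan22ImageToVideoLatent` node in a workflow exported in API format and posts the result. Several details are assumptions to check against your own export: the input names `width`, `height`, and `length`, the default server address `127.0.0.1:8188`, and the helper names themselves.

```python
import json
from urllib import request


def set_latent_size(workflow, width, height, length):
    """Return a copy of an API-format workflow with the latent node's size inputs set.

    Assumes the node's inputs are named width, height and length.
    """
    patched = json.loads(json.dumps(workflow))  # deep copy via JSON round-trip
    for node in patched.values():
        if node.get("class_type") == "Wan22ImageToVideoLatent":
            node["inputs"].update(width=width, height=height, length=length)
    return patched


def queue(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Load your exported workflow with `json.load`, pass it through `set_latent_size`, then hand the result to `queue`. The original dictionary is left unmodified, so the same export can be reused for several resolutions.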
Results
- Speed: No first-party RTX 4070 measurement for TI2V-5B exists in the Wan-AI HF card, the official Wan-Video/Wan2.2 README, or the backend benchmark data (/check/wan-2-2/rtx-4070 returned no benchmark rows at write time). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and that claim names the RTX 4090 (a higher-bandwidth Ada card) as its target consumer GPU, not the RTX 4070. The RTX 4070 has 5888 CUDA cores and ~504 GB/s of memory bandwidth, both materially below the 4090, so the 4090 timing is not extrapolated to this card here. Report your measured 4070 timing via /contribute to add a first-party benchmark row for this pair.
- VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~4 GB headroom on the 12 GB RTX 4070. The native FP16 diffusion file (9.31 GiB) plus UMT5-XXL FP8 text encoder (6.27 GiB) plus Wan2.2-VAE (1.31 GiB) add up to ~17 GiB on disk; runtime peak with ComfyUI's offloader is well below that because weights stream rather than all loading resident. Live data: /check/wan-2-2/rtx-4070.
- Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps — the Wan-Video/Wan2.2 README documents that the TI2V-5B model supports 720P video generation at 24 FPS. Clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent across the canonical FP16 path; there is no per-expert quality-vs-speed dial for this variant (the 14B-A14B siblings expose that via the high-noise / low-noise expert split — TI2V-5B does not).
For the full benchmark data, see /check/wan-2-2/rtx-4070.
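Since no measured 4070 numbers exist yet, a small logger makes it easier to contribute one. The sketch below polls `nvidia-smi` for used memory while a render runs and reports the peak together with elapsed time. It assumes `nvidia-smi` is on the PATH and reads GPU 0 only. The figure is total memory in use on the card, so other processes (a desktop compositor, a browser) are included.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_used_mib(output):
    """Parse nvidia-smi output (one line per GPU, in MiB) and return GPU 0's value."""
    return int(output.strip().splitlines()[0])


def sample_used_mib():
    """Run nvidia-smi once and return memory in use on GPU 0, in MiB."""
    return parse_used_mib(subprocess.check_output(QUERY, text=True))


def watch_peak(duration_s=600, interval_s=1.0):
    """Poll for `duration_s` seconds; return (peak MiB, elapsed seconds)."""
    start = time.monotonic()
    peak = 0
    while time.monotonic() - start < duration_s:
        peak = max(peak, sample_used_mib())
        time.sleep(interval_s)
    return peak, time.monotonic() - start
```

Start `watch_peak()` in a second terminal just before queueing the render and stop it once the video is saved. Divide the peak by 1024 to compare it with the 8 GB working floor quoted above.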
Troubleshooting
Out of memory at 720p
Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial relies on it for the 8 GB minimum claim). On the 12 GB RTX 4070 the FP16 native path fits with the offloader engaged, but the margin is thinner than on a 16 GB card, so if you press against the envelope with other models loaded — or want longer clips — switch the diffusion model to a QuantStack Q8_0 GGUF (5.40 GB on disk) via the Unet Loader (GGUF) node, or step down to Q5_K_M (3.81 GB) / Q4_K_S (3.12 GB) for more headroom. Quality loss at Q8 is minimal.
The command-line generate.py path OOMs
The official repo documents a python generate.py --task ti2v-5B CLI command, but the Wan-Video/Wan2.2 README annotates it with "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)." — it is tuned for a 24 GB card, not a 12 GB RTX 4070. A community report (Issue #90, titled "CUDA OOM in TI2V-5B with RTX 3090 24GB") even shows that exact CLI command OOMing at the VAE-decode stage on an RTX 3090 (Ampere, 24 GB) despite the full 24 GB. That report is specific to the CLI's static offload plan on the 24 GB Ampere tier and does not reflect the ComfyUI native route, whose runtime offloader uses a different, more aggressive memory plan and is the path the documented 8 GB floor refers to. On the RTX 4070, use the ComfyUI native workflow (Installation step 2) rather than the CLI.
Offloading throughput on the RTX 4070 (PCIe Gen4)
The RTX 4070 connects over PCIe Gen4 x16. When the ComfyUI offloader streams weights between system RAM and the GPU (which it does to hit the 8 GB working floor), the transfer rate is capped by that Gen4 link. The model still fits and runs, but the offloaded portion streams at roughly half the peak bandwidth of a PCIe Gen5 x16 link, so use fast system RAM and avoid loading other large models on the card during a render to minimize spillover. The GGUF quants (Q8/Q5/Q4) fit more comfortably in 12 GB and so reduce how much streaming happens at all.
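The "roughly half" figure follows from the link rates: Gen4 runs at 16 GT/s per lane and Gen5 at 32 GT/s, both with 128b/130b encoding. The sketch below turns those rates into theoretical one-direction bandwidth and a best-case time to move the 9.31 GiB FP16 file across the link once. These are upper bounds under ideal conditions. Real transfers are slower, and how often the offloader re-streams weights during a render is not modeled here.

```python
# Raw per-lane signalling rates in GT/s for the two generations discussed above.
GT_PER_S = {4: 16, 5: 32}
ENCODING = 128 / 130  # 128b/130b line encoding used by Gen3 and later


def pcie_gb_per_s(gen, lanes=16):
    """Theoretical one-direction bandwidth in GB/s (decimal gigabytes)."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


def stream_seconds(size_gib, gen, lanes=16):
    """Best-case time in seconds to move `size_gib` GiB across the link once."""
    return size_gib * 2**30 / (pcie_gb_per_s(gen, lanes) * 1e9)


if __name__ == "__main__":
    for gen in (4, 5):
        print(f"Gen{gen} x16: {pcie_gb_per_s(gen):.1f} GB/s, "
              f"9.31 GiB in {stream_seconds(9.31, gen):.2f} s")
```

Gen4 x16 works out to about 31.5 GB/s, so one full pass over the FP16 weights takes on the order of a third of a second at best. That cost recurs whenever offloaded layers have to be streamed again.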
No FP8 weight file for TI2V-5B
TI2V-5B is a dense single-checkpoint model — Wan-AI does not publish an FP8 weight path for it (unlike the 14B-A14B siblings, whose FP8-scaled experts ship via the Comfy-Org repackager). The RTX 4070's Ada sm_89 tensor cores natively support FP8 (E4M3/E5M2), so an FP8 quant would run at hardware speed if one shipped — but since none is published for the DiT, the canonical path is the FP16 safetensors file in Installation step 2, and the VRAM escape hatch is the GGUF quant ladder above — not FP8 (and not NVFP4, which is Blackwell-only). Do not look for a *_fp8_scaled.safetensors file for TI2V-5B; it does not exist. (Note the UMT5-XXL text encoder does ship in FP8 — that is a separate component, not the DiT.)