What You'll Build
A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe walks through the ComfyUI native workflow as the canonical path on the RTX 5070, with a QuantStack Q8 GGUF alternative for tighter VRAM or colocation.
Hardware data: RTX 5070 (12GB GDDR7 VRAM, Blackwell GB205 sm_120) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data
Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. The four 14B-class variants are MoE models whose single-GPU command the official Wan-Video/Wan2.2 README documents with the note "This command can run on a GPU with at least 80GB VRAM.", far past a 12 GB consumer card at native precision. Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the 14B-A14B siblings need does not apply here. The 14B variants need a different recipe entirely.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (per the official ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading") | RTX 5070 (12 GB, Blackwell GB205 sm_120) |
| RAM | 16 GB | 32 GB+ recommended (offloading is RAM-heavy) |
| Storage | ~17 GiB (TI2V-5B FP16 weights 9.31 GiB + UMT5-XXL FP8 text encoder 6.27 GiB + Wan2.2-VAE 1.31 GiB) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.12, PyTorch ≥ 2.9 built against CUDA 12.8 / 12.9 (sm_120 kernel coverage) | torch 2.9 + CUDA 12.9 + Python 3.12 |
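The storage figure in the table is a plain sum of the three file sizes. As a minimal sketch, the arithmetic and a free-space check could look like the following; the function names and the 2 GiB safety margin are illustrative assumptions, not part of ComfyUI or the Wan tooling.

```python
import shutil

# File sizes in GiB, copied from the Requirements table above.
COMPONENT_SIZES_GIB = {
    "wan2.2_ti2v_5B_fp16.safetensors": 9.31,
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": 6.27,
    "wan2.2_vae.safetensors": 1.31,
}


def total_download_gib(sizes=COMPONENT_SIZES_GIB):
    """Sum the component sizes (GiB) for the native FP16 workflow."""
    return sum(sizes.values())


def has_enough_disk(path, required_gib, margin_gib=2.0):
    """Return True if `path` has room for the download plus a margin.

    The 2 GiB default margin is an arbitrary assumption for this sketch.
    """
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= required_gib + margin_gib


if __name__ == "__main__":
    total = total_download_gib()
    print(f"Total download: {total:.2f} GiB")
    print("Enough space here:", has_enough_disk(".", total))
```

The sum comes to 16.89 GiB, which the table rounds to ~17 GiB.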
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". The official ComfyUI Wan 2.2 tutorial states: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". On a 12 GB RTX 5070 that leaves roughly 4 GB of headroom above the 8 GB working floor, so the runtime offloader can keep more weights resident on the GPU and has to spill less to system RAM.
Blackwell sm_120 wheel note. The RTX 5070 is a Blackwell GB205 sm_120 card. Make sure your PyTorch build ships sm_120 kernels: that means PyTorch ≥ 2.9 built against CUDA 12.8/12.9 (the cu128/cu129 wheels). Older torch wheels without sm_120 kernels fail at model load on the 50-series. This is the same wheel requirement that applies to every Blackwell GPU (5060 Ti / 5070 / 5070 Ti / 5080 / 5090). For ComfyUI's native diffusion path the offloader uses PyTorch SDPA attention, so no FlashAttention-2 wheel is needed; the FA2 sm_120 kernel gap that bites direct transformers snippets does not apply here.
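One way to confirm the wheel requirement before loading any model is to ask PyTorch which architectures it was compiled for. The sketch below relies on `torch.cuda.get_arch_list()`; the helper names are illustrative, and the import is guarded so the script also runs where PyTorch is not installed.

```python
def supports_arch(arch_list, target="sm_120"):
    """Return True if `target` appears in a list of compiled architectures."""
    return target in arch_list


def check_installed_torch(target="sm_120"):
    """Check the locally installed PyTorch build, if there is one.

    Returns None when PyTorch is not importable. Note that
    torch.cuda.get_arch_list() returns an empty list when CUDA is not
    available, so a False result can also mean a CPU-only build.
    """
    try:
        import torch
    except ImportError:
        return None
    return supports_arch(torch.cuda.get_arch_list(), target)


if __name__ == "__main__":
    result = check_installed_torch()
    if result is None:
        print("PyTorch is not installed in this environment.")
    elif result:
        print("This PyTorch build includes sm_120 kernels.")
    else:
        print("No sm_120 kernels found; install a cu128/cu129 wheel.")
```

Run it inside the same Python environment that ComfyUI uses, since a system-wide interpreter may have a different torch build.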
2. Download model files for the native workflow
Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo into the matching subfolders of ComfyUI/models/ (run the commands from the directory that contains ComfyUI/):
# diffusion model → ComfyUI/models/diffusion_models/
wget -P ComfyUI/models/diffusion_models \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
# text encoder → ComfyUI/models/text_encoders/
wget -P ComfyUI/models/text_encoders \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
# VAE → ComfyUI/models/vae/
wget -P ComfyUI/models/vae \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors
The resulting layout matches what the official template expects:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
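Before opening the template it can help to confirm that all three files landed where the layout above says they should. The following is a small sketch using only the standard library; the helper name and the default `ComfyUI/models` path are assumptions about where ComfyUI is installed.

```python
from pathlib import Path

# Relative paths taken from the layout listed above.
EXPECTED_FILES = [
    "diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan2.2_vae.safetensors",
]


def missing_model_files(models_dir="ComfyUI/models"):
    """Return the expected files that are not present under `models_dir`."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  ", rel)
    else:
        print("All three model files are in place.")
```

An empty list means the template should find every file; a truncated download would still pass this check, so compare file sizes against the Requirements table if a load fails.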
Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
3. (Recommended on 12 GB) Install ComfyUI-GGUF and a Q8 quant
On a 12 GB card the FP16 native path fits via offloading, but a Q8 GGUF leaves more headroom for higher frame counts or for colocating another model on the card. Use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF. The QuantStack README describes the file as "a direct conversion of" the canonical Wan-AI model, and the repo's base_model is Wan-AI/Wan2.2-TI2V-5B.
Install city96/ComfyUI-GGUF:
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf
Download the Q8_0 file (5.40 GB) and place it in the unet folder:
wget -P ComfyUI/models/unet \
https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q8_0.gguf
In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) and point it at the .gguf file. The text encoder and VAE remain unchanged. Per-tier file sizes from the HF tree API: Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB. Q8_0 output is generally very close to FP16 for this model, and the lower tiers free even more of the 12 GB envelope if you need it, typically at some cost in quality.
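Choosing a tier from the quant ladder amounts to picking the largest file that fits the budget you are willing to give the diffusion weights. The sketch below uses the file sizes quoted above; the budget values are hypothetical, and on-disk size is only a rough proxy, since activations, the VAE, and the text encoder also need memory at runtime.

```python
# On-disk sizes in GB for QuantStack/Wan2.2-TI2V-5B-GGUF, as quoted above.
QUANT_SIZES_GB = {
    "Q4_K_S": 3.12,
    "Q5_K_M": 3.81,
    "Q6_K": 4.21,
    "Q8_0": 5.40,
}


def largest_quant_within(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the name of the largest quant whose file fits in `budget_gb`.

    Returns None when even the smallest tier exceeds the budget.
    """
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (6.0, 4.0, 3.0):  # hypothetical budgets in GB
        print(budget, "->", largest_quant_within(budget))
```

Preferring the largest tier that fits reflects the usual trade-off: higher-bit quants stay closer to the FP16 weights.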
Running
With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, set the frame count for the clip length you want (at 24 fps a 5-second clip is nominally 120 frames; the help text of Wan's official generate.py asks for frame counts of the form 4n+1, so 121 is the nearest such value), and queue. The first render is slower because the model has to load; subsequent renders reuse the cached weights.
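The frame-count arithmetic can be sketched as a small helper. Multiplying duration by frame rate gives 24 × 5 = 120 for a 5-second clip; this sketch additionally assumes the 4n+1 frame-count convention that the official generate.py help text describes, and snaps to the nearest such value. The function name is illustrative, and whether your ComfyUI node enforces the same constraint is something to confirm in the node itself.

```python
def frames_for_duration(seconds, fps=24):
    """Convert a clip length to a frame count of the form 4n + 1.

    Assumption: valid frame counts follow the 4n + 1 pattern, so the raw
    product seconds * fps is snapped to the nearest value of that form.
    """
    raw = round(seconds * fps)
    n = max(0, round((raw - 1) / 4))
    return 4 * n + 1


if __name__ == "__main__":
    for secs in (1, 2, 5):
        print(f"{secs} s at 24 fps -> {frames_for_duration(secs)} frames")
```

Under that assumption a 5-second clip at 24 fps maps to 121 frames rather than the raw 120.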
For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
A command-line route exists in the official repo, but it is tuned for a larger card — see the Troubleshooting note below before trying it on 12 GB. The ComfyUI native workflow above is the path the documented 8 GB working floor refers to and the recommended one for the RTX 5070.
Results
- Speed: No first-party RTX 5070 measurement for TI2V-5B currently exists in the Wan-AI HF card, the official Wan-Video/Wan2.2 README, or the Wan2.2 GitHub Issues tracker (verified at write time). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and that claim names the RTX 4090 (Ada sm_89) as the consumer GPU class it targets — not the RTX 5070. The 5070's Blackwell GB205 sm_120 generation has ~672 GB/s memory bandwidth and 6144 CUDA cores, both materially below the 4090, so we do not forward-extrapolate the 4090 timing to this card. Report your measured 5070 timing via /contribute to land a first-party benchmark row for this pair.
- VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~4 GB headroom on the 12 GB RTX 5070. The native FP16 diffusion file (9.31 GiB) plus UMT5-XXL FP8 text encoder (6.27 GiB) plus Wan2.2-VAE (1.31 GiB) add up to ~17 GiB on disk; runtime peak with ComfyUI's offloader is well below that because weights stream rather than all loading resident. Live data: /check/wan-2-2/rtx-5070.
- Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps — the Wan-Video/Wan2.2 README states that the TI2V-5B model supports 720P video generation at 24 FPS. Clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent across the canonical FP16 path; there is no per-expert quality-vs-speed dial for this variant (the 14B-A14B siblings expose that via the high-noise / low-noise expert split — TI2V-5B does not).
For the full benchmark data, see /check/wan-2-2/rtx-5070.
Troubleshooting
Model fails to load on the 50-series
The RTX 5070 is a Blackwell GB205 sm_120 card. If the model fails at load (or silently falls back to CPU), your PyTorch wheel is probably missing sm_120 kernels. Install a PyTorch ≥ 2.9 build against CUDA 12.8/12.9 (cu128/cu129), the configuration this recipe was tested with; wheels built against CUDA versions older than 12.8 do not include sm_120 kernels for the 50-series. The same requirement applies across all Blackwell consumer cards.
Out of memory at 720p
Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial relies on it for the 8 GB minimum claim). On the 12 GB RTX 5070 the FP16 native path fits with the offloader engaged, but the margin is thinner than on a 16 GB card, so if you press against the envelope with other models loaded — or want longer clips — switch the diffusion model to a QuantStack Q8_0 GGUF (5.40 GB on disk) via the Unet Loader (GGUF) node, or step down to Q5_K_M (3.81 GB) / Q4_K_S (3.12 GB) for more headroom. Quality loss at Q8 is minimal.
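When diagnosing an out-of-memory error it helps to see how much VRAM is already taken before ComfyUI starts. The sketch below shells out to `nvidia-smi` with its `--query-gpu` and `--format` options and parses the result; the helper names are illustrative, and the sample figures in the comments are made up for demonstration.

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' line (MiB) into (used, total, free)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total, total - used


def query_gpu_memory():
    """Return (used, total, free) in MiB for each GPU nvidia-smi reports."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_memory_line(line) for line in output.strip().splitlines()]


if __name__ == "__main__":
    try:
        for index, (used, total, free) in enumerate(query_gpu_memory()):
            print(f"GPU {index}: {used} MiB used, {free} MiB free of {total} MiB")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("Could not query nvidia-smi:", exc)
```

If another process already holds several GB, closing it or moving to a smaller GGUF tier is likely to help more than changing workflow settings.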
The command-line generate.py path OOMs
The official repo documents a python generate.py --task ti2v-5B CLI command, but the Wan-Video/Wan2.2 README annotates it with "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)." — it is tuned for a 24 GB card, not a 12 GB RTX 5070. A community report (Issue #90) even shows that exact CLI command OOMing at the VAE-decode stage on an RTX 3090 (Ampere, 24 GB) despite the full 24 GB. That report is specific to the CLI's static offload plan on Ampere and does not reflect the ComfyUI native route, whose runtime offloader uses a different, more aggressive memory plan and is the path the documented 8 GB floor refers to. On the RTX 5070, use the ComfyUI native workflow (Installation step 2) rather than the CLI.
No FP8 weight file for TI2V-5B
TI2V-5B is a dense single-checkpoint model — Wan-AI does not publish an FP8 weight path for it (unlike the 14B-A14B siblings, whose FP8-scaled experts ship via the Comfy-Org repackager). On Blackwell GB205 sm_120 the FP8 tensor cores are native and would run a quant at hardware speed if one shipped, but since none is published for TI2V-5B, the canonical path is the FP16 safetensors file in Installation step 2, and the VRAM escape hatch is the GGUF quant ladder above — not FP8. Do not look for a *_fp8_scaled.safetensors file for TI2V-5B; it does not exist. (Note the UMT5-XXL text encoder does ship in FP8 — that is a separate component, not the DiT.)