Wan 2.2 TI2V-5B on RTX 5070 Ti: 720p Text/Image-to-Video in ComfyUI

What You'll Build

A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model, the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe uses the ComfyUI native workflow as the canonical path on the RTX 5070 Ti, with a QuantStack Q8 GGUF alternative for when VRAM is tight or another model has to share the card.

Hardware data: RTX 5070 Ti (16GB GDDR7 VRAM, Blackwell GB203 sm_120) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · See benchmark data

Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. For the four 14B-class variants, the official Wan-Video/Wan2.2 README documents the single-GPU command with the note "This command can run on a GPU with at least 80GB VRAM.", which is far past a 16 GB consumer card at native precision. Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing the A14B siblings need does not apply here. The 14B variants need a different recipe entirely.

Requirements

Component	Minimum	Tested
GPU	8 GB VRAM (per the official ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading")	RTX 5070 Ti (16 GB, Blackwell GB203 sm_120)
RAM	16 GB	32 GB+ recommended (offloading is RAM-heavy)
Storage	~17 GiB (TI2V-5B FP16 weights 9.31 GiB + UMT5-XXL FP8 text encoder 6.27 GiB + Wan2.2-VAE 1.31 GiB)	—
Software	ComfyUI (recent build with Wan 2.2 templates), Python 3.12, PyTorch ≥ 2.9 built against CUDA 12.8 / 12.9 (sm_120 kernel coverage)	torch 2.9 + CUDA 12.9 + Python 3.12
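Before downloading, it can help to confirm the target drive has room for the three files in the Storage row. The following is a minimal sketch, assuming the sizes quoted in that row; the 2 GiB safety margin and the function names are illustrative choices, not part of any official tooling.

```python
import shutil

# On-disk sizes in GiB, copied from the Storage row of the requirements table.
COMPONENTS_GIB = {
    "wan2.2_ti2v_5B_fp16.safetensors": 9.31,
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": 6.27,
    "wan2.2_vae.safetensors": 1.31,
}


def required_gib(components=COMPONENTS_GIB):
    """Total download size in GiB for the native workflow."""
    return sum(components.values())


def free_gib(path="."):
    """Free space in GiB on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path=".", margin_gib=2.0):
    """True if `path` can hold the models plus a safety margin.

    margin_gib is an arbitrary illustrative buffer, not a documented figure.
    """
    return free_gib(path) >= required_gib() + margin_gib
```

Calling has_room("ComfyUI/models") against an existing models directory gives a quick yes/no before starting the downloads.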

Installation

1. Install / update ComfyUI

Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". A 16 GB 5070 Ti sits comfortably above that 8 GB floor, so more of the model can stay resident in VRAM and less has to be streamed from system RAM.

Blackwell sm_120 wheel note. The RTX 5070 Ti is a Blackwell GB203 sm_120 card. Make sure your PyTorch build ships sm_120 kernels — that means PyTorch ≥ 2.9 built against CUDA 12.8/12.9 (the cu128/cu129 wheels). Older torch wheels without sm_120 kernels fail at model load on the 50-series. This is the same wheel requirement that applies to every Blackwell GPU (5060 Ti / 5070 Ti / 5080 / 5090).
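One way to check the wheel before loading any model is to compare the GPU's compute capability with the architectures compiled into the installed PyTorch build. This is a minimal sketch: torch.cuda.get_device_capability and torch.cuda.get_arch_list are standard PyTorch calls, while the helper names are illustrative.

```python
def arch_tag(capability):
    """Turn a (major, minor) capability such as (12, 0) into 'sm_120'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports(capability, arch_list):
    """True if the wheel's compiled arch list contains the device's arch tag."""
    return arch_tag(capability) in arch_list


def check_local_build():
    """Print whether the installed PyTorch wheel ships kernels for GPU 0."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)  # (12, 0) corresponds to sm_120
    archs = torch.cuda.get_arch_list()         # arch tags compiled into the wheel
    print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")
    print(f"device: {arch_tag(cap)}, wheel archs: {archs}")
    print("OK" if build_supports(cap, archs) else "wheel lacks kernels for this GPU")
```

Run check_local_build() inside the same Python environment that ComfyUI uses; a separate system Python may carry a different wheel.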

2. Download model files for the native workflow

Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo and place them in ComfyUI/models/:

# diffusion model → ComfyUI/models/diffusion_models/
wget https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors

# text encoder → ComfyUI/models/text_encoders/
wget https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors

# VAE → ComfyUI/models/vae/
wget https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors

The resulting layout matches what the official template expects:

ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
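A quick way to confirm the layout is to test for each file under the models directory. The sketch below assumes ComfyUI lives at ./ComfyUI relative to where the script runs; adjust the path for your install. The helper names are illustrative.

```python
from pathlib import Path

# Paths relative to ComfyUI/models/, as listed above.
EXPECTED = [
    "diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan2.2_vae.safetensors",
]


def missing_models(models_dir, expected=EXPECTED):
    """Return the expected files that are not present under models_dir."""
    root = Path(models_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


def report(models_dir="ComfyUI/models"):
    """Print one status line per expected file, with its size when present."""
    for rel in EXPECTED:
        path = Path(models_dir) / rel
        if path.is_file():
            print(f"ok       {rel} ({path.stat().st_size / 2**30:.2f} GiB)")
        else:
            print(f"MISSING  {rel}")
```

A size far below the figures in the requirements table usually points to an interrupted download.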

Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.

3. (Alternative) Install ComfyUI-GGUF and a Q8 quant

To lower peak VRAM, or to leave room for another model on the same card, use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF. The QuantStack README describes the file as "a direct conversion of" the canonical Wan-AI model, and the repo's base_model field is Wan-AI/Wan2.2-TI2V-5B.

Install city96/ComfyUI-GGUF:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

Download the Q8_0 file (5.40 GB) and place it in the unet folder:

wget -P ComfyUI/models/unet \
  https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q8_0.gguf

In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) and point it at the .gguf file. The text encoder and VAE remain unchanged. Per-tier sizes from the HF tree API: Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB. Q8_0 is the least lossy of these tiers, and its output is usually hard to tell apart from FP16.
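If you are budgeting VRAM around another resident model, the tier choice reduces to picking the largest file that fits. The sketch below uses the file sizes quoted above. File size is only a rough proxy for runtime VRAM, since activations, the text encoder and the VAE come on top, and the function name is illustrative.

```python
# File sizes in GB as quoted above for QuantStack/Wan2.2-TI2V-5B-GGUF.
QUANT_SIZES_GB = {
    "Q4_K_S": 3.12,
    "Q5_K_M": 3.81,
    "Q6_K": 4.21,
    "Q8_0": 5.40,
}


def pick_quant(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the largest tier whose file fits in budget_gb, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    # Larger file means less aggressive quantization within this ladder.
    return max(fitting, key=fitting.get)
```

For example, pick_quant(4.5) returns "Q6_K", and any budget of 5.40 GB or more returns "Q8_0".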

Running

With the Wan2.2 5B video generation template loaded, enter a prompt, set resolution to 1280×704 for landscape or 704×1280 for portrait, set the frame count for the clip length you want (at 24 fps a 5-second clip is 121 frames: 120 frames plus the initial frame, since Wan frame counts take the form 4n + 1), and queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
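A small helper can turn a clip length into a frame count. It assumes the 4n + 1 frame-count rule from the help text of upstream Wan's generate.py and rounds to the nearest valid value; the function name is illustrative.

```python
def frame_count(seconds, fps=24):
    """Nearest frame count of the form 4n + 1 for the requested duration."""
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    # n counts groups of four frames after the initial frame; keep at least one.
    n = max(1, round(seconds * fps / 4))
    return 4 * n + 1
```

With the default 24 fps, frame_count(5) returns 121 and frame_count(2) returns 49.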

For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
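Either mode can also be queued without the browser by posting the workflow to ComfyUI's HTTP endpoint. The sketch below assumes the workflow was saved in API format from the ComfyUI menu and that the server listens on the default 127.0.0.1:8188. Node IDs differ between workflows, so the "6" in the usage note is a hypothetical placeholder to replace with the ID of your positive-prompt node.

```python
import copy
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default listen address


def with_prompt(workflow, node_id, text):
    """Return a copy of an API-format workflow with one text input replaced."""
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"]["text"] = text
    return patched


def queue_prompt(workflow, base_url=COMFY_URL):
    """POST the workflow to /prompt and return the parsed JSON response."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Typical use: load the exported JSON with json.load, call with_prompt(workflow, "6", "your prompt"), then pass the result to queue_prompt. The response includes a prompt_id that identifies the queued job.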

If you prefer the command-line route, the official repo documents this exact invocation for TI2V-5B:

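git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt

# fetch the checkpoint that --ckpt_dir points at
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B

# 720p text-to-video with the README's three offload flags
python generate.py --task ti2v-5B --size 1280*704 \
  --ckpt_dir ./Wan2.2-TI2V-5B \
  --offload_model True --convert_model_dtype --t5_cpu \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"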

The Wan-Video/Wan2.2 README annotates that command with "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU)." — it is tuned for a 24 GB card, not the 16 GB 5070 Ti. On the 16 GB 5070 Ti the ComfyUI native route (Installation step 2) is the more reliable path, because ComfyUI's runtime offloader is more aggressive than the CLI's static three-flag offload setup and is the path the documented 8 GB working floor refers to.

Results

Speed: No first-party RTX 5070 Ti measurement for TI2V-5B currently exists in the Wan-AI HF card, the official Wan-Video/Wan2.2 README, or the Wan2.2 GitHub Issues tracker (verified at write time). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and the README names the RTX 4090 (Ada sm_89) as the GPU class that single-GPU command targets. The 5070 Ti's Blackwell GB203 sm_120 generation has high memory bandwidth (~896 GB/s) and native FP8 tensor cores, but no published 5070 Ti number exists to quote, so we do not forward-extrapolate one. Report your measured 5070 Ti timing via /contribute to land a first-party benchmark row for this pair.
VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~8 GB headroom on the 16 GB 5070 Ti. The native FP16 diffusion file (9.31 GiB) plus UMT5-XXL FP8 text encoder (6.27 GiB) plus Wan2.2-VAE (1.31 GiB) add up to ~17 GiB on disk; runtime peak with ComfyUI's offloader is well below that because weights stream rather than all loading resident. Live data: /check/wan-2-2/rtx-5070-ti.
Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps; the Wan-Video/Wan2.2 README states that the TI2V-5B model supports 720P video generation at 24 FPS. Clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent across the canonical FP16 path; there is no per-expert quality-vs-speed dial for this variant (the A14B siblings expose one through the high-noise / low-noise expert split, TI2V-5B does not).

For the full benchmark data, see /check/wan-2-2/rtx-5070-ti.
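To produce a timing worth reporting, measure wall-clock time from queueing to completion. The sketch below polls until a completion check succeeds. It assumes ComfyUI's /history/<prompt_id> endpoint returns an entry keyed by the prompt ID once the job has finished, and the helper names are illustrative. The measured time includes model load on a first run and is only as precise as the polling interval.

```python
import json
import time
import urllib.request


def history_has(prompt_id, base_url="http://127.0.0.1:8188"):
    """True once ComfyUI lists prompt_id in its history (assumed behaviour)."""
    with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as response:
        return prompt_id in json.loads(response.read())


def wait_until_done(is_done, poll_seconds=2.0, timeout_seconds=3600.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True; return elapsed seconds."""
    start = clock()
    while not is_done():
        if clock() - start > timeout_seconds:
            raise TimeoutError("render did not finish before the timeout")
        sleep(poll_seconds)
    return clock() - start


def seconds_per_frame(elapsed_seconds, frames):
    """Average wall-clock seconds per generated frame."""
    return elapsed_seconds / frames
```

Typical use is elapsed = wait_until_done(lambda: history_has(prompt_id)) right after queueing. Report warm runs separately from the first run, since the first includes loading weights.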

Troubleshooting

Model fails to load on the 50-series

The RTX 5070 Ti is a Blackwell GB203 sm_120 card. If the model fails at load, typically with "CUDA error: no kernel image is available for execution on the device", or ComfyUI ends up running on the CPU, your PyTorch wheel is missing sm_120 kernels. Install a PyTorch ≥ 2.9 build against CUDA 12.8/12.9 (cu128/cu129), which ships sm_120 kernels for the 50-series. This is the same requirement across all Blackwell consumer cards.

Out of memory at 720p

Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial relies on it for the 8 GB minimum claim). If the FP16 path still presses against the 16 GB envelope while you have other models loaded, switch the diffusion model to a QuantStack Q8_0 GGUF (5.40 GB on disk) via the Unet Loader (GGUF) node — peak VRAM drops further and quality loss at Q8 is minimal. Note that the command-line generate.py --task ti2v-5B path is tuned for a 24 GB card, and a community report (Issue #90) shows that exact CLI command OOMing at the VAE-decode stage on an RTX 3090 (Ampere, 24 GB) — another reason to prefer the ComfyUI native route on the 16 GB 5070 Ti, since its offloader uses a different memory plan.
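When chasing an out-of-memory error, first check how much VRAM is actually free before ComfyUI starts, since other processes may already hold part of the 16 GB. This is a minimal sketch: torch.cuda.mem_get_info is a standard PyTorch call returning free and total bytes, while the helper names and the use of the 8 GB working floor as a threshold are illustrative.

```python
def gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


def vram_report(free_bytes, total_bytes, floor_gib=8.0):
    """Summarize free VRAM against a working floor in GiB."""
    free = gib(free_bytes)
    return {
        "free_gib": round(free, 2),
        "total_gib": round(gib(total_bytes), 2),
        "above_floor": free >= floor_gib,
    }


def check_local_vram():
    """Print the report for the current CUDA device, if one is usable."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes
    print(vram_report(free_bytes, total_bytes))
```

If above_floor comes back False with ComfyUI closed, something else on the system is holding VRAM, and switching quant tiers alone may not resolve the error.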

No FP8 weight file for TI2V-5B

TI2V-5B is a dense single-checkpoint model, and Wan-AI does not publish FP8 weights for it (unlike the A14B siblings, whose FP8-scaled experts ship via the Comfy-Org repackaged repo). Blackwell GB203 sm_120 has native FP8 tensor cores and would run such a quant at hardware speed if one shipped, but since none is published for TI2V-5B, the canonical path is the FP16 safetensors file in Installation step 2, and the VRAM escape hatch is the GGUF quant ladder in Installation step 3, not FP8. Do not look for a *_fp8_scaled.safetensors file for TI2V-5B; it does not exist. (The UMT5-XXL text encoder does ship in FP8, but that is a separate component, not the DiT.)

Want the 14B variants?

Per the official README, the 14B / A14B single-GPU commands are annotated "This command can run on a GPU with at least 80GB VRAM." — out of scope for a 16 GB card at native precision. Community GGUF quants of the 14B Wan variants exist but need a separate workflow; file a request on /contribute if you want a 14B-quantized recipe added once a stable 16 GB workflow lands.