What You'll Build
A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model — the only Wan 2.2 variant that fits a consumer 16GB card. The recipe walks through both the official native workflow (FP16 safetensors with built-in offloading) and the QuantStack Q8 GGUF path for tighter VRAM.
Hardware data: RTX 5060 Ti (16GB VRAM) · ~7 min per 1024×574 5s clip with Q8 GGUF + Sage Attention 2.2 · See benchmark data
Why TI2V-5B and not the 14B variants? The official Wan-Video/Wan2.2 repo states that T2V-A14B, I2V-A14B, S2V-14B and Animate-14B all need "at least 80GB VRAM" on a single GPU. Only TI2V-5B (5B dense, not MoE) is documented as a single-consumer-GPU target.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8GB VRAM (per ComfyUI native offloading note) | RTX 5060 Ti (16GB) |
| RAM | 16GB | 64GB (recommended by community for Q8 + Sage Attention) |
| Storage | ~13GB (FP16 5B weights + VAE + text encoder) or ~6GB (Q8 GGUF + VAE + text encoder) | — |
| Software | ComfyUI (recent build), Python 3.10+, PyTorch ≥ 2.4 | torch 2.9 cu12.9 + Python 3.12 |
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". See the ComfyUI Wan 2.2 tutorial for the menu path.
2. Download model files (official FP16 path)
Per the ComfyUI native workflow docs, place these files in ComfyUI/models/:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
Or grab the raw weights from the official HF repo using the Wan-AI install guide:
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
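Before launching ComfyUI, it can help to confirm that the three files from the native-workflow list are where the loader nodes will look for them. The following is a minimal sketch, not part of ComfyUI; the function name `missing_model_files` and the `ComfyUI` install directory are assumptions made for illustration.

```python
from pathlib import Path

# Relative paths copied from the native-workflow file list in this step.
EXPECTED_FILES = (
    "models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/wan2.2_vae.safetensors",
)


def missing_model_files(comfy_root):
    """Return the expected model files that are absent under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install directory; adjust it to your setup.
    for rel in missing_model_files("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means all three files are present. It says nothing about whether the downloads are complete or uncorrupted.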
3. (Alternative) Install ComfyUI-GGUF and a Q8 quant
For lower peak VRAM and faster offloading on a 16GB card, use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF (the repo confirms it is "a direct conversion of Wan-AI/Wan2.2-TI2V-5B").
Install city96/ComfyUI-GGUF:
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf
Download the Q8_0 file (5.4 GB) and place it in the unet folder:
huggingface-cli download QuantStack/Wan2.2-TI2V-5B-GGUF \
Wan2.2-TI2V-5B-Q8_0.gguf \
--local-dir ./ComfyUI/models/unet
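An interrupted download, or a saved error page, can leave a file with the right name but the wrong contents. GGUF files begin with the four ASCII bytes `GGUF`, so a quick header check catches the most obvious failures. This is a sketch; the helper names `looks_like_gguf` and `describe` are illustrative and not part of ComfyUI-GGUF.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path):
    """True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(4) == GGUF_MAGIC


def describe(path):
    """One-line summary: file name, size in decimal GB, header status."""
    size_gb = Path(path).stat().st_size / 1e9
    status = "GGUF header found" if looks_like_gguf(path) else "not a GGUF file"
    return f"{Path(path).name}: {size_gb:.1f} GB, {status}"
```

Calling `describe("ComfyUI/models/unet/Wan2.2-TI2V-5B-Q8_0.gguf")` should report a size close to the 5.4 GB quoted for the Q8_0 file. A much smaller number suggests a truncated download.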
In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) (under the bootleg category) and point it at the .gguf file.
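If you drive ComfyUI from an exported API-format workflow rather than the graph editor, the same swap can be scripted. The sketch below assumes the workflow is a dict mapping node ids to `{"class_type": ..., "inputs": ...}`. It also assumes the stock loader is registered as `UNETLoader` and the GGUF loader as `UnetLoaderGGUF` with a single `unet_name` input. Verify these identifiers against your own exported JSON before relying on them.

```python
import copy


def use_gguf_loader(workflow, gguf_name):
    """Return a copy of an API-format workflow with the diffusion-model
    loader replaced by the GGUF loader (class names are assumptions)."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "UNETLoader":
            node["class_type"] = "UnetLoaderGGUF"
            # The GGUF loader is assumed to take only the file name.
            node["inputs"] = {"unet_name": gguf_name}
    return patched
```

Node ids are left unchanged, so links from downstream nodes to the loader's output keep working. The original dict is not modified.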
4. (Recommended) Install Sage Attention 2.2
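The ~7-minute runtime for a 5-second clip on a 5060 Ti was community-reported in discussion #4 of the Wan2.2-Animate-14B repo with Sage Attention 2.2 installed, so treat it as required to reproduce that figure. Prebuilt Windows wheels are available from the woct0rdho/SageAttention v2.2 release.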
Running
After loading the Wan2.2 5B video generation template, enter a prompt in the positive-prompt node and queue. The Wan22ImageToVideoLatent node exposes resolution and frame count.
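The frame count follows from clip length and frame rate. The Results section gives 24 fps and 5 seconds. The sketch below additionally assumes that Wan-family models expect a frame count of the form 4n+1; that constraint is an assumption here, not something this recipe documents. The helper name `frame_count` is illustrative.

```python
def frame_count(seconds, fps=24):
    """Smallest frame count of the form 4n+1 that covers `seconds` at `fps`."""
    raw = round(seconds * fps)
    n = -(-max(raw - 1, 0) // 4)  # ceiling division by 4
    return 4 * n + 1


print(frame_count(5))  # 5 s at 24 fps
```

For 5 seconds at 24 fps this gives 121 frames. At the reported ~7 minutes per clip, that is roughly 3.5 seconds of compute per frame.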
If you prefer the CLI path from the official repo:
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt
python generate.py --task ti2v-5B --size 1280*704 \
--ckpt_dir ./Wan2.2-TI2V-5B \
--offload_model True --convert_model_dtype --t5_cpu \
--prompt "a panda playing guitar by a lake at sunset"
The CLI command above is the exact invocation the Wan 2.2 README documents for TI2V-5B on a 24GB+ card; on a 16GB 5060 Ti the ComfyUI route with offloading (or the GGUF route) is the more reliable option.
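When scripting the CLI route, passing the arguments as a list avoids a shell quirk: an unquoted `1280*704` is a glob pattern, and some shells (zsh, for example) refuse to run the command when nothing matches it. The sketch below only repackages the flags shown above. The function names and the `Wan2.2` checkout directory are assumptions.

```python
import subprocess
import sys


def build_generate_cmd(prompt, ckpt_dir="./Wan2.2-TI2V-5B", size="1280*704"):
    """Build the generate.py argument list from the flags shown above."""
    return [
        sys.executable, "generate.py",
        "--task", "ti2v-5B",
        "--size", size,
        "--ckpt_dir", ckpt_dir,
        "--offload_model", "True",
        "--convert_model_dtype",
        "--t5_cpu",
        "--prompt", prompt,
    ]


def run_generate(prompt, repo_dir="Wan2.2"):
    """Run generate.py inside the cloned repo (directory name assumed)."""
    # No shell is involved, so the "*" in the size is passed through literally.
    subprocess.run(build_generate_cmd(prompt), cwd=repo_dir, check=True)
```

`run_generate("a panda playing guitar by a lake at sunset")` then corresponds to the shell command above, with the same VRAM caveat.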
Results
- Speed: ~7 minutes for a 1024×574 5-second clip using Q8 GGUF + Sage Attention 2.2 on an RTX 5060 Ti, per Ricardo130's report in HF discussion #4
- VRAM usage: TI2V-5B is documented as fitting "well on 8GB vram with the ComfyUI native offloading" per the official ComfyUI tutorial; a 16GB 5060 Ti has comfortable headroom in either FP16-with-offload or Q8 GGUF mode. Live data: /check/wan-2-2/rtx-5060-ti
- Quality notes: TI2V-5B is the only Wan 2.2 variant the official repo documents as runnable on a single consumer GPU. The 14B-class siblings (T2V-A14B, I2V-A14B, Animate-14B, S2V-14B) are documented as needing 80GB+ and are out of scope for this card unquantized. The TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps for 5 seconds.
For the full benchmark data, see /check/wan-2-2/rtx-5060-ti.
Troubleshooting
"Incompatibility error with the VGA 5xxx series" on first install
Reported by the original poster in discussion #4. The Blackwell 50-series needs a recent CUDA + PyTorch stack. The working combo reported in that thread is PyTorch 2.9 with CUDA 12.9 on Python 3.12. Older torch wheels (without sm_120 kernels) silently fall back to CPU or fail at model load.
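A quick way to see whether an installed PyTorch build targets the card is to compare the GPU's compute capability with the architectures the wheel was compiled for. This is a rough heuristic sketch: it only looks for an exact `sm_XY` entry and ignores PTX (`compute_XY`) fallbacks. The helper name `arch_supported` is illustrative.

```python
def arch_supported(capability, arch_list):
    """True if the build lists an sm_XY target matching the GPU capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("CUDA is not available to this PyTorch build")
        else:
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("torch", torch.__version__, "cuda", torch.version.cuda)
            print("device capability:", cap, "compiled for:", archs)
            print("native kernels for this GPU:", arch_supported(cap, archs))
```

On a 50-series card the capability should be reported as `(12, 0)`, so a working build is expected to list `sm_120`.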
Out of memory on 16GB with the FP16 path
Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial explicitly relies on it for the 8GB minimum claim). If the FP16 path still OOMs at 720p, drop to the QuantStack Q8_0 GGUF (5.4 GB on disk) via the Unet Loader (GGUF) node — peak VRAM drops significantly and quality loss is minimal at Q8.
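The size gap between the two paths can be estimated from bits per weight. The sketch below assumes a round 5 billion parameters. It also assumes the Q8_0 layout stores each block of 32 weights in 34 bytes (32 int8 values plus one 16-bit scale), which works out to 8.5 bits per weight. It covers weights only, not activations, the VAE, or the text encoder.

```python
def weight_gb(params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 5e9               # nominal "5B" parameter count (assumed round number)
FP16_BITS = 16
Q8_0_BITS = 34 * 8 / 32    # a 34-byte block holds 32 weights: 8.5 bits each

print(f"FP16: {weight_gb(PARAMS, FP16_BITS):.1f} GB")
print(f"Q8_0: {weight_gb(PARAMS, Q8_0_BITS):.1f} GB")
```

This prints about 10.0 GB for FP16 and 5.3 GB for Q8_0. The Q8_0 figure is close to the 5.4 GB file size, so the quantized weights take roughly half the room the FP16 weights do.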
Want the 14B variants?
Per the official README, the 14B variants require 80GB+ single-GPU VRAM. Community GGUF quants of the 14B Wan variants exist but are out of scope for this recipe — file a request on /contribute if you want a 14B-quantized recipe added once a stable workflow lands.