What You'll Build
A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model, the only Wan 2.2 variant that fits on a single consumer 16 GB GPU. The recipe walks through both the official native workflow (FP16 safetensors with ComfyUI's built-in offloading) and the QuantStack Q8 GGUF path for tighter VRAM.
Hardware data: RTX 4060 Ti 16GB · 720p (1280×704 / 704×1280) at 24 fps · See benchmark data
Why TI2V-5B and not the 14B variants? The official Wan-Video/Wan2.2 repo states that T2V-A14B, I2V-A14B, S2V-14B and Animate-14B all need at least 80 GB of single-GPU VRAM. Only TI2V-5B (5B dense, not MoE) is documented as a single consumer-GPU target; the 14B-class siblings need five times the VRAM this 16 GB card has and are out of scope. The Wan-AI HF card lists exactly three released variants: T2V-A14B, I2V-A14B, and TI2V-5B.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM with ComfyUI native offloading (per the ComfyUI Wan2.2 tutorial: "should fit well on 8GB vram with the ComfyUI native offloading") | RTX 4060 Ti 16GB |
| RAM | 16 GB | 32 GB+ recommended (offloaded layers live in system RAM) |
| Storage | ~13 GB (FP16 5B weights + UMT5-XXL FP8 text encoder + VAE) or ~6 GB (Q8 GGUF + text encoder + VAE) | — |
| Software | ComfyUI (recent build with Wan 2.2 templates), Python 3.10+, PyTorch ≥ 2.4 | Default pip install torch wheel — no special CUDA pinning needed for Ada (sm_89) |
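The diffusion-model share of the storage figures above can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 5.0e9 parameters (taken from the "5B" in the model name, not a measured count) and uses the GGUF Q8_0 block layout of 32 int8 weights plus one 16-bit scale. It covers the diffusion weights only; the text encoder and VAE come on top.

```python
# Rough weight-size arithmetic. NOMINAL_PARAMS is an assumption based on
# the "5B" in the model name, not a measured parameter count.
NOMINAL_PARAMS = 5.0e9


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


# FP16 stores 16 bits per weight.
fp16_gb = weights_gb(NOMINAL_PARAMS, 16)

# Q8_0 packs 32 int8 weights with one 16-bit scale per block:
# (32 * 8 + 16) / 32 = 8.5 bits per weight.
q8_bits = (32 * 8 + 16) / 32
q8_gb = weights_gb(NOMINAL_PARAMS, q8_bits)

print(f"FP16 ~ {fp16_gb:.1f} GB, Q8_0 ~ {q8_gb:.1f} GB")
```

Under these assumptions the Q8_0 estimate lands close to the 5.4 GB file size quoted for the QuantStack quant, which suggests the nominal parameter count is in the right range.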
Installation
1. Install / update ComfyUI
Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". See the ComfyUI Wan 2.2 tutorial for the menu path.
2. Download model files (official FP16 path)
Per the ComfyUI native workflow docs, place these files in ComfyUI/models/:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
Or grab the raw weights from the official HF repo using the Wan-AI install guide:
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
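Before opening the template, it can help to confirm that the three files from the list above landed in the right subfolders. The following is a minimal sketch: the helper name `missing_model_files` is hypothetical, and the `ComfyUI/models` path assumes the script runs from the directory that contains the ComfyUI checkout.

```python
from pathlib import Path

# Relative paths copied from the file list in this step.
EXPECTED_FILES = [
    "diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan2.2_vae.safetensors",
]


def missing_model_files(models_dir: str) -> list[str]:
    """Return the expected files that are absent or empty under models_dir."""
    root = Path(models_dir)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Assumed location; adjust if ComfyUI lives elsewhere.
    for rel in missing_model_files("ComfyUI/models"):
        print(f"missing: {rel}")
```

An empty result only means the files exist and are non-empty; it does not verify checksums.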
3. (Alternative) Install ComfyUI-GGUF and a Q8 quant
For lower peak VRAM and tighter offloading on a 16 GB card, use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF, which the card explicitly identifies as "a direct conversion of Wan-AI/Wan2.2-TI2V-5B".
Install city96/ComfyUI-GGUF:
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf
Download the Q8_0 file (5.4 GB) and place it in the unet folder:
huggingface-cli download QuantStack/Wan2.2-TI2V-5B-GGUF \
Wan2.2-TI2V-5B-Q8_0.gguf \
--local-dir ./ComfyUI/models/unet
In the official template, swap the Load Diffusion Model node for Unet Loader (GGUF) (under the bootleg category) and point it at the .gguf file.
The QuantStack repo publishes a per-quant file-size table ranging from Q2_K (1.85 GB) through Q8_0 (5.4 GB); Q8_0 is the recommended balance of quality and footprint on this card. Quality loss at Q8 is minimal vs FP16.
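An interrupted download or a saved error page can fail in confusing ways at load time, so a quick header check is a cheap safeguard. GGUF files begin with the four ASCII bytes `GGUF` followed by a 32-bit version number. The sketch below assumes the common little-endian layout, and the helper name `read_gguf_header` is hypothetical.

```python
import struct


def read_gguf_header(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if the magic is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    # The 4 bytes after the magic hold the version as a uint32
    # (little-endian is assumed here).
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

For example, `read_gguf_header("ComfyUI/models/unet/Wan2.2-TI2V-5B-Q8_0.gguf")` should return a small integer for an intact file. This checks only the header, not the full file contents.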
Running
After loading the Wan2.2 5B video generation template, enter a prompt in the positive-prompt node and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
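To relate the frame-count field to clip length, the arithmetic can be sketched as below. Two things here are assumptions that this recipe does not state: that frame counts should take the form 4n + 1, and that the TI2V-5B VAE compresses by 4× in time and 16× in each spatial dimension. Treat the output as a rough guide, not as the node's documented behaviour.

```python
def frame_count(seconds: float, fps: int = 24) -> int:
    """Smallest frame count of the form 4n + 1 that covers seconds * fps frames.

    The 4n + 1 constraint is an assumption, not something this recipe states.
    """
    raw = round(seconds * fps)
    n = -(-max(raw - 1, 0) // 4)  # ceiling division
    return 4 * n + 1


def latent_grid(width: int, height: int, frames: int,
                t_stride: int = 4, s_stride: int = 16) -> tuple[int, int, int]:
    """(latent frames, latent height, latent width) under the assumed strides."""
    return ((frames - 1) // t_stride + 1, height // s_stride, width // s_stride)


print(frame_count(5))                # 121
print(latent_grid(1280, 704, 121))   # (31, 44, 80)
```

Under these assumptions, 1280×704 divides evenly by the spatial stride, which would explain why the template uses 704 and not 720 for the short side.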
If you prefer the CLI path from the official repo:
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt
python generate.py --task ti2v-5B --size 1280*704 \
--ckpt_dir ./Wan2.2-TI2V-5B \
--offload_model True --convert_model_dtype --t5_cpu \
--prompt "a panda playing guitar by a lake at sunset"
The --offload_model True, --convert_model_dtype, and --t5_cpu flags collectively reduce GPU memory usage; the Wan 2.2 README documents this exact command as runnable on "at least 24GB VRAM (e.g, RTX 4090 GPU)". On the 16 GB 4060 Ti, the ComfyUI native-offload route (or the GGUF route) is the reliable path. The 24 GB minimum applies to the CLI's own loading path; ComfyUI's runtime offloader is what brings the 5B model down to the 8 GB floor the tutorial documents.
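If the CLI is driven from a script, building the argument list in Python avoids shell quoting problems such as the `*` in `1280*704` being treated as a glob. This is a minimal sketch that only uses the flags shown in the command above; the helper name `build_generate_command` is hypothetical.

```python
import shlex
import sys


def build_generate_command(ckpt_dir: str, prompt: str,
                           size: str = "1280*704") -> list[str]:
    """Assemble the generate.py invocation from this recipe as an argv list."""
    return [
        sys.executable, "generate.py",
        "--task", "ti2v-5B",
        "--size", size,
        "--ckpt_dir", ckpt_dir,
        "--offload_model", "True",
        "--convert_model_dtype",
        "--t5_cpu",
        "--prompt", prompt,
    ]


if __name__ == "__main__":
    cmd = build_generate_command(
        "./Wan2.2-TI2V-5B",
        "a panda playing guitar by a lake at sunset",
    )
    # Print a shell-safe version of the command without running it.
    print(shlex.join(cmd))
```

To actually launch it, the list can be passed to `subprocess.run(cmd, check=True, cwd="Wan2.2")`, where `cwd` is assumed to be the cloned repository directory.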
Results
- Speed: No 4060 Ti 16GB-specific benchmark for TI2V-5B has been published yet. For a same-architecture (Ada sm_89) reference point, Ikesan's lilting.ch walkthrough on an RTX 4060 8GB reports the 5B model generating in 95–179 seconds for short clips using FP8 weights with --lowvram offloading (the FP8 checkpoint loads as 5.28 GB). The 4060 Ti 16GB has ~6% higher memory bandwidth (288 vs 272 GB/s) and 2× the VRAM, so per-step times should be in the same range with less spillover pressure. Live data once a community benchmark lands: /check/wan-2-2/rtx-4060-ti-16gb.
- VRAM usage: TI2V-5B is documented as fitting "well on 8GB vram with the ComfyUI native offloading" per the official ComfyUI tutorial; a 16 GB 4060 Ti has comfortable headroom in either FP16-with-offload or Q8 GGUF mode. Live data: /check/wan-2-2/rtx-4060-ti-16gb.
- Quality notes: TI2V-5B is the only Wan 2.2 variant the official repo documents as runnable on a single consumer GPU. Output is 720p (1280×704 or 704×1280) at 24 fps for 5 seconds. The 14B-class siblings (T2V-A14B, I2V-A14B, Animate-14B, S2V-14B) require 80 GB+ unquantized and are out of scope for this card.
For the full benchmark data, see /check/wan-2-2/rtx-4060-ti-16gb.
Troubleshooting
Out of memory at 720p with the FP16 path
Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial explicitly relies on it for the 8 GB minimum claim). If FP16 still OOMs at 720p, drop to the QuantStack Q8_0 GGUF (5.4 GB on disk) via the Unet Loader (GGUF) node — peak VRAM drops significantly and quality loss is minimal at Q8.
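Another common cause of unexpected OOM is a different process (a browser, a second ComfyUI instance) already holding VRAM. One way to check before queueing is to read the `nvidia-smi` memory query from Python. The sketch below assumes `nvidia-smi` is on the PATH; the helper names are hypothetical.

```python
import subprocess

# Standard nvidia-smi query: one "used, total" line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one 'used, total' line (MiB) from the query output."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def gpu_memory_mib() -> list[tuple[int, int]]:
    """Run the query and return (used, total) in MiB for each GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]
```

Calling `gpu_memory_mib()` with ComfyUI stopped shows how much VRAM is already in use. If that baseline is large, freeing it may help more than switching quantization.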
CLI path errors with "CUDA out of memory" at 16 GB
The stock python generate.py --task ti2v-5B ... command from the Wan-Video/Wan2.2 repo is documented as needing at least 24 GB of VRAM (e.g. an RTX 4090) even with --offload_model True --convert_model_dtype --t5_cpu (per the official README). On the 16 GB 4060 Ti, prefer the ComfyUI route: the runtime's layer-wise offloader is what enables the 8 GB floor cited in the tutorial. The CLI path is reproduced above for reference but is not the recommended invocation on this card.
Want the 14B variants?
Per the official README, T2V-A14B, I2V-A14B, Animate-14B and S2V-14B all require 80 GB+ single-GPU VRAM. Community GGUF quants of the 14B Wan variants exist but are out of scope for this recipe — file a request on /contribute if you want a 14B-quantized recipe added once a stable consumer-card workflow lands.