How much VRAM does Wan 2.2 need?

About 8 GB for the TI2V-5B variant with ComfyUI's native offloading, which is the minimum this recipe targets. The 14B variants need far more.

How hard is this setup?

Intermediate. Follow the installation steps below.

Wan 2.2 TI2V-5B on RTX 3060: 720p Text/Image-to-Video in ComfyUI

What You'll Build

A local ComfyUI pipeline that turns a text prompt (or a starting image) into a 5-second 720p video with the Wan 2.2 TI2V-5B model, the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. The recipe uses the ComfyUI native workflow as the main path on the RTX 3060, with a QuantStack Q8 GGUF alternative when you need more VRAM headroom or want to keep another model on the card. The official command-line route is documented for the RTX 4090 and has been reported to OOM on Ampere even with all of its offload flags set, so it appears here only in Troubleshooting.

Hardware data: RTX 3060 (12 GB GDDR6 VRAM, Ampere GA106 sm_86, 360 GB/s, 3584 CUDA cores) · 720p (1280×704 / 704×1280) at 24 fps via ComfyUI native offloading · Benchmark data: /check/wan-2-2/rtx-3060

Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. The two A14B variants are the "27B MoE models" the Wan-AI HF card refers to, and any 14B-class checkpoint is far past a 12 GB consumer card at native precision (14B parameters at 2 bytes each is roughly 28 GB of weights alone). The card ties the no-offload path to 80 GB-class hardware: "If you are running on a GPU with at least 80GB VRAM, you can remove the --offload_model True, --convert_model_dtype and --t5_cpu options to speed up execution." Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card is explicit: "In addition to the 27B MoE models, a 5B dense model, i.e., TI2V-5B, is released." TI2V-5B is dense (one fused checkpoint, no high-noise / low-noise expert split), so the timestep-MoE plumbing its A14B siblings need does not apply here. The 14B variants need a different recipe entirely.

Requirements

Component	Minimum	Tested
GPU	8 GB VRAM (per the official ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading")	RTX 3060 (12 GB GDDR6, Ampere GA106 sm_86, 360 GB/s, 3584 CUDA cores — specs per TechPowerUp)
RAM	16 GB	32 GB+ recommended (offloading is RAM-heavy)
Storage	~17 GiB (TI2V-5B FP16 weights 9.31 GiB + UMT5-XXL FP8 text encoder 6.27 GiB + Wan2.2-VAE 1.31 GiB)	—
Software	ComfyUI (recent build with Wan 2.2 templates), Python 3.10+, PyTorch ≥ 2.4 (default cu124/cu121 stable wheel)	—
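A quick preflight before downloading: the sketch below adds up the three native-workflow files from the Storage row (9.31 + 6.27 + 1.31 GiB) and compares the total with the free space `df` reports for a directory. It assumes a POSIX shell with `awk` and `df -P`; the function names are hypothetical helpers for this recipe, not part of ComfyUI, and the optional Q8_0 GGUF (5.40 GB) is not counted.

```sh
# Sizes (GiB) of the three native-workflow files, as quoted in the Storage row.
DIT_GIB=9.31   # wan2.2_ti2v_5B_fp16.safetensors
TE_GIB=6.27    # umt5_xxl_fp8_e4m3fn_scaled.safetensors
VAE_GIB=1.31   # wan2.2_vae.safetensors

# required_gib: total GiB needed for the three files, two decimals.
required_gib() {
  awk -v a="$DIT_GIB" -v b="$TE_GIB" -v c="$VAE_GIB" 'BEGIN { printf "%.2f\n", a + b + c }'
}

# avail_kib DIR: free KiB on the filesystem holding DIR (4th column of `df -Pk`).
avail_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space AVAIL_KIB NEED_GIB: succeed if AVAIL_KIB covers NEED_GIB.
enough_space() {
  awk -v kib="$1" -v gib="$2" 'BEGIN { exit !(kib + 0 >= gib * 1024 * 1024) }'
}

# Usage (hypothetical paths):
#   enough_space "$(avail_kib ComfyUI/models)" "$(required_gib)" && echo "enough space"
```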

The RTX 3060 is an Ampere card (GA106, sm_86). Unlike Blackwell (50-series) GPUs, it needs no special CUDA-wheel selection: the default stable PyTorch wheel (cu124 or cu121) already ships sm_86 kernels, and prebuilt FlashAttention wheels cover sm_86. (ComfyUI's native diffusion path uses PyTorch SDPA attention, so this recipe needs no FlashAttention wheel either way.)
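To confirm the installed wheel really ships sm_86 kernels, you can ask PyTorch for the architectures it was compiled for via `torch.cuda.get_arch_list()` (which returns an empty list if PyTorch cannot see a CUDA device). A minimal sketch, assuming `python` resolves to the interpreter ComfyUI runs under; `has_sm86` is a hypothetical helper that only scans that list.

```sh
# has_sm86: read a space-separated arch list on stdin; succeed if "sm_86" is in it.
has_sm86() {
  tr ' ' '\n' | grep -qx 'sm_86'
}

# Usage:
#   python -c 'import torch; print(" ".join(torch.cuda.get_arch_list()))' | has_sm86 \
#     && echo "wheel ships sm_86 kernels"
```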

Installation

1. Install / update ComfyUI

Use a build new enough to expose the Wan 2.2 templates under Workflow → Browse Templates → Video → "Wan2.2 5B video generation". Per the official ComfyUI Wan 2.2 tutorial, "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading". On a 12 GB RTX 3060 that leaves roughly 4 GB above the 8 GB working floor, so the offloader has to move less of the model out of VRAM than it would on an 8 GB card.

2. Download model files for the native workflow

Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo into the matching ComfyUI/models/ subfolders (run these from the directory that contains ComfyUI/):

# diffusion model → ComfyUI/models/diffusion_models/
wget -P ComfyUI/models/diffusion_models \
  https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors

# text encoder → ComfyUI/models/text_encoders/
wget -P ComfyUI/models/text_encoders \
  https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors

# VAE → ComfyUI/models/vae/
wget -P ComfyUI/models/vae \
  https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors

The resulting layout matches what the official template expects:

ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
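ComfyUI's loader nodes list files from these folders, so a misplaced download simply does not appear in the node's dropdown. A small sketch that checks the three paths, assuming a POSIX shell; `check_layout` is a hypothetical helper and its argument is wherever your ComfyUI checkout lives.

```sh
# check_layout COMFY_ROOT: print ok/MISSING per expected file; fail if any is missing or empty.
check_layout() {
  root="$1"
  missing=0
  for f in \
    models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors \
    models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
    models/vae/wan2.2_vae.safetensors
  do
    if [ -s "$root/$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_layout ComfyUI
```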

Open the Wan2.2 5B video generation template, set the positive prompt, and queue. The Wan22ImageToVideoLatent node exposes resolution (1280×704 or 704×1280) and frame count.
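As a rough sanity check on those settings, the sketch below computes the latent tensor the node would allocate. It assumes the Wan2.2-VAE compresses 16× in height and width and 4× in time into 48 latent channels (the Wan-AI card describes a 16×16×4 compression ratio), and it further assumes width and height must be multiples of 32 and frame counts must be 4n+1; treat those constraints as assumptions, not documented node limits. `wan22_latent_shape` is a hypothetical helper.

```sh
# wan22_latent_shape WIDTH HEIGHT LENGTH
# Prints "channels frames height width" of the latent under the assumptions above.
wan22_latent_shape() {
  w=$1 h=$2 len=$3
  if [ $((w % 32)) -ne 0 ] || [ $((h % 32)) -ne 0 ]; then
    echo "width/height should be multiples of 32 (assumed constraint)" >&2
    return 1
  fi
  if [ $(((len - 1) % 4)) -ne 0 ]; then
    echo "frame count should be 4n+1 (assumed constraint)" >&2
    return 1
  fi
  # 4x temporal compression keeps the first frame: (len - 1) / 4 + 1 latent frames.
  echo "48 $(((len - 1) / 4 + 1)) $((h / 16)) $((w / 16))"
}

# Usage: wan22_latent_shape 1280 704 121   # landscape 720p, ~5 s at 24 fps
```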

3. (Recommended on 12 GB) Install ComfyUI-GGUF and a Q8 quant

On a 12 GB card the FP16 native path fits via offloading, but a Q8 GGUF gives you more headroom for higher frame counts or for keeping another model on the card. Use the community Q8 quant from QuantStack/Wan2.2-TI2V-5B-GGUF. The repo lists Wan-AI/Wan2.2-TI2V-5B as its base_model, i.e. it is a direct GGUF conversion of the official Wan-AI checkpoint.

Install city96/ComfyUI-GGUF:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

Download the Q8_0 file (5.40 GB) and place it in the unet folder:

wget -P ComfyUI/models/unet \
  https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q8_0.gguf
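A cheap integrity check before pointing the loader at the file: every GGUF file begins with the 4-byte magic `GGUF`, so an interrupted or redirected download (for example an HTML error page saved under the .gguf name) fails this test immediately. `is_gguf` is a hypothetical helper; it does not validate the rest of the file, so also compare the size against the 5.40 GB listed above.

```sh
# is_gguf FILE: succeed if FILE starts with the 4-byte GGUF magic.
is_gguf() {
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage: is_gguf ComfyUI/models/unet/Wan2.2-TI2V-5B-Q8_0.gguf && echo "GGUF header OK"
```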

In the official template, replace the Load Diffusion Model node with Unet Loader (GGUF) and point it at the .gguf file. The text encoder and VAE stay unchanged. Per-tier file sizes from the HF tree API: Q4_K_S 3.12 GB, Q5_K_M 3.81 GB, Q6_K 4.21 GB, Q8_0 5.40 GB. Q8_0 is generally regarded as very close to FP16 quality; the lower tiers free more of the 12 GB envelope at the cost of progressively more quantization loss.
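If you want a rule of thumb for picking a tier, the sketch below returns the largest quant whose on-disk size fits a budget you supply. The tier sizes are the ones quoted above; the budget is an assumption you set yourself, because runtime use adds activations and the text encoder/VAE on top of the weight file. `pick_quant` is a hypothetical helper.

```sh
# pick_quant BUDGET_GB: print the largest tier whose file size (GB) fits BUDGET_GB;
# fail if none fits. Subtract your own headroom estimate before calling.
pick_quant() {
  printf '%s\n' "Q8_0 5.40" "Q6_K 4.21" "Q5_K_M 3.81" "Q4_K_S 3.12" |
    awk -v budget="$1" '
      $2 + 0 <= budget + 0 { print $1; found = 1; exit }
      END { exit !found }'
}

# Usage: pick_quant 4.0   # prints Q5_K_M
```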

Running

With the Wan2.2 5B video generation template loaded, enter a prompt, set the resolution to 1280×704 (landscape) or 704×1280 (portrait), set the frame count for the clip length you want, and queue. Wan frame counts follow a 4n+1 pattern, so a 5-second clip at 24 fps is 121 frames rather than 120. The first render is slower because the models have to load; subsequent renders reuse the cached weights.
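The frame-count arithmetic can be sketched as: multiply seconds by fps, then round up to the next 4n+1 value. This assumes the 4n+1 constraint that follows from the Wan2.2-VAE's 4× temporal compression; `wan_frames` is a hypothetical helper taking whole seconds and whole fps.

```sh
# wan_frames SECONDS FPS: smallest 4n+1 frame count that covers SECONDS at FPS.
wan_frames() {
  raw=$(( $1 * $2 ))                   # e.g. 5 s * 24 fps = 120
  echo $(( (raw + 2) / 4 * 4 + 1 ))    # round up to 4n+1, e.g. 120 -> 121
}

# Usage: wan_frames 5 24   # 121 frames, about 5.04 s at 24 fps
```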

For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.

A command-line route exists in the official repo, but it is tuned for a larger card and is reported to OOM on Ampere — see the Troubleshooting note below before trying it on the RTX 3060. The ComfyUI native workflow above is the path the documented 8 GB working floor refers to and the recommended one for this card.

Results

Speed: No first-party RTX 3060 measurement for TI2V-5B currently exists in the Wan-AI HF card or the backend benchmark data (/check/wan-2-2/rtx-3060 returned verdict: unknown with no benchmark rows when this was written). The HF card's only published timing is the model-wide claim "Without specific optimization, TI2V-5B can generate a 5-second 720P video in under 9 minutes on a single consumer-grade GPU", and the consumer GPU it names is the RTX 4090 ("can runs on single consumer-grade GPU such as the 4090"), not the RTX 3060. The RTX 3060's 3584 CUDA cores and 360 GB/s of memory bandwidth are far below the 4090's 16384 cores and 1008 GB/s, so we do not extrapolate the 4090 timing to this card. Report your measured 3060 timing via /contribute to add a first-party benchmark row for this pair.
VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (ComfyUI tutorial: "The Wan2.2 5B version should fit well on 8GB vram with the ComfyUI native offloading"), leaving ~4 GB of headroom on the 12 GB RTX 3060. The native FP16 diffusion file (9.31 GiB), the UMT5-XXL FP8 text encoder (6.27 GiB), and the Wan2.2-VAE (1.31 GiB) add up to ~17 GiB on disk; the runtime peak with ComfyUI's offloader is well below that because the three components are not all held in VRAM at once, and whatever does not fit is offloaded to system RAM. Live data: /check/wan-2-2/rtx-3060.
Quality notes: TI2V-5B outputs 720p (1280×704 or 704×1280) at 24 fps; the official ComfyUI tutorial and the Wan-AI HF card both document 720P generation at 24 FPS for this model. Clip length is set through the frame count. Because TI2V-5B is a single dense checkpoint, it has no per-expert quality-vs-speed dial; the A14B siblings expose one through their high-noise / low-noise expert split, and TI2V-5B does not.

For the full benchmark data, see /check/wan-2-2/rtx-3060.

Troubleshooting

Out of memory at 720p

Make sure ComfyUI's native offloading is active (it is by default in recent builds — the official tutorial relies on it for the 8 GB minimum claim). On the 12 GB RTX 3060 the FP16 native path fits with the offloader engaged, but the margin is thinner than on a 16 GB card, so if you press against the envelope with other models loaded — or want longer clips — switch the diffusion model to a QuantStack Q8_0 GGUF (5.40 GB on disk) via the Unet Loader (GGUF) node, or step down to Q5_K_M (3.81 GB) / Q4_K_S (3.12 GB) for more headroom. Quality loss at Q8 is minimal.
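Before stepping down a tier, it can help to check what else is holding VRAM. A minimal sketch that totals the per-process usage `nvidia-smi` reports; it assumes the output shape of `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits` (one `pid, name, MiB` line per process), and `gpu_apps_total_mib` is a hypothetical helper.

```sh
# gpu_apps_total_mib: read nvidia-smi compute-apps CSV on stdin; print total MiB used.
gpu_apps_total_mib() {
  awk -F', *' '{ total += $3 } END { print total + 0 }'
}

# Usage:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#     --format=csv,noheader,nounits | gpu_apps_total_mib
```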

The command-line `generate.py` path OOMs on Ampere

The Wan-AI HF card documents a single-GPU CLI command:

python generate.py --task ti2v-5B --size 1280*704 --ckpt_dir ./Wan2.2-TI2V-5B --offload_model True --convert_model_dtype --t5_cpu --prompt "…"

It is annotated with "This command can run on a GPU with at least 24GB VRAM (e.g, RTX 4090 GPU).", so it is tuned for a 24 GB Ada card, not a 12 GB RTX 3060. More importantly, that exact command has been reported to OOM on Ampere even with 24 GB: Wan2.2 Issue #90, "CUDA OOM in TI2V-5B with RTX 3090 24GB", where the reporter ran "the examples given in README.md" on a single RTX 3090 and hit an OOM error, and a second user reported the "same issue" on a separate run. No maintainer workaround had been posted to that issue at the time of writing. The RTX 3060 (GA106) belongs to the same Ampere generation (sm_86) as the 3090 (GA102) in that report and has half its VRAM, so the CLI path is not viable here. Use the ComfyUI native workflow (Installation step 2) instead: ComfyUI's runtime offloader uses a different, more aggressive memory plan, and it is the path the documented 8 GB working floor refers to. Track Issue #90 for any future CLI workaround.

Offloading throughput on the RTX 3060 (PCIe Gen4)

The RTX 3060 uses a PCIe 4.0 x16 interface. When the ComfyUI offloader moves weights between system RAM and the GPU (which it does to hit the 8 GB working floor), that link caps the transfer rate; in a PCIe 3.0 slot the card runs at Gen3 speed, roughly half the bandwidth. The model still fits and runs, but use fast system RAM and avoid loading other large models onto the card during a render to minimize spillover. A GGUF tier (Q8/Q5/Q4), which sits more comfortably in 12 GB, reduces how much data has to move at all.
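For a sense of scale, the sketch below computes the theoretical one-direction time to move a given amount of data over a PCIe link with 128b/130b encoding (PCIe 3.0 signals at 8 GT/s per lane, PCIe 4.0 at 16 GT/s). These are bandwidth ceilings, so real transfers take longer; and if part of the model is moved on every sampling step, the cost repeats per step. `pcie_seconds` is a hypothetical helper.

```sh
# pcie_seconds GT_PER_S LANES GB
# Theoretical time to move GB gigabytes (10^9 bytes) one way over PCIe 3.0/4.0.
pcie_seconds() {
  awk -v gts="$1" -v lanes="$2" -v gb="$3" 'BEGIN {
    bw = gts * lanes * 128 / 130 / 8   # GB/s after 128b/130b encoding overhead
    printf "%.2f\n", gb / bw
  }'
}
```

For the 9.31 GiB FP16 diffusion file (about 10.0 GB), `pcie_seconds 16 16 10.0` gives roughly 0.32 s on Gen4 x16 and `pcie_seconds 8 16 10.0` roughly 0.63 s on Gen3 x16, both best cases.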

No FP8 weight file for TI2V-5B (and FP8 would not accelerate on Ampere anyway)

TI2V-5B is a dense single-checkpoint model — Wan-AI does not publish an FP8 weight path for the DiT (unlike the 14B-A14B siblings, whose FP8-scaled experts ship via the Comfy-Org repackager). So the canonical path is the FP16 safetensors file in Installation step 2, and the VRAM escape hatch is the GGUF quant ladder above — not FP8 (and not NVFP4, which is Blackwell-only). Even if an FP8 DiT quant existed, the RTX 3060's Ampere sm_86 tensor cores have no FP8 compute support — FP8 tensor-core acceleration first shipped on Ada (sm_89) and Hopper (sm_90), so an FP8 weight file would only load on the 3060 (the runtime dequantizes it to BF16/FP16 per op): a VRAM escape hatch at best, never a speed win. Do not look for a *_fp8_scaled.safetensors file for the TI2V-5B DiT; it does not exist. (Note the UMT5-XXL text encoder does ship in FP8 — that is a separate component, not the DiT, and on the 3060 it too is dequantized at compute time.)
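To check the FP8 point on any card, compare its compute capability against 8.9. A minimal sketch, assuming a driver recent enough to support `nvidia-smi --query-gpu=compute_cap`; `has_fp8_tensor_cores` is a hypothetical helper that encodes the Ada/Hopper cutoff described above.

```sh
# has_fp8_tensor_cores CAP: CAP is a compute capability string such as "8.6".
# Succeeds for 8.9 (Ada) and higher major versions (Hopper 9.0 and newer);
# fails for Ampere (8.0, 8.6) and older.
has_fp8_tensor_cores() {
  major=${1%%.*}
  minor=${1#*.}
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}

# Usage:
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   has_fp8_tensor_cores "$cap" || echo "no FP8 tensor cores: FP8 weights save memory only"
```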