How much VRAM does Wan 2.2 14B need?

About 24 GB — the minimum this recipe targets.

How hard is this setup?

Intermediate: follow the steps below.

Wan 2.2 T2V-A14B on RTX 5090: 720p text-to-video in ComfyUI with FP8 scaled weights (Blackwell)

What You'll Build

A working ComfyUI text-to-video pipeline that runs the Wan 2.2 T2V-A14B variant (Alibaba's two-expert, high-noise + low-noise video model with 14B parameters active per step) on a single RTX 5090, producing 81-frame clips at 1280×720, about 5 seconds at 16 fps. The native upstream code path requires 80 GB VRAM per the official model card; the consumer-card path is the Comfy-Org repackaged FP8 scaled workflow, which swaps the two experts in and out of GPU memory sequentially during denoising. The same workflow that runs razor-thin on a 24 GB card has ~8 GB of headroom within the 5090's 32 GB. Blackwell's native FP8 tensor cores also compute on the FP8 weights directly instead of dequantizing them back to BF16, unlike Ampere, where the FP8 format is purely a storage win.

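Hardware data: RTX 5090 (32 GB VRAM) · ~24 GB peak at FP8 / 1280×720 / 81 frames / 30 steps (measured on the 24 GB RTX 3090 and 4090; no 5090-specific measurement yet) · Benchmark data: /check/wan-2-2-14b/rtx-5090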

⚠️ Variant pin — this recipe is specifically for T2V-A14B. The Wan 2.2 family ships several variants under one brand: T2V-A14B (text-to-video, this recipe), I2V-A14B (image-to-video), Animate-14B (motion-from-image), S2V-14B (speech-to-video), and TI2V-5B (a dense fused 5B variant — different architecture, fits 16 GB cards). They share family branding but ship different weights and different ComfyUI workflows. If you want any of the others, the install steps below do not apply verbatim — start from the official Wan 2.2 GitHub instead.

ℹ️ Timestep-MoE, not classical sparse MoE. The "A14B from 27B total" framing in the model card describes a two-expert timestep MoE: per the HF card, "a high-noise expert for the early stages, focusing on overall layout; and a low-noise expert for the later stages, refining video details." The switch happens once per generation at a fixed SNR threshold, not per token via a learned router. The practical consequence: only one 14B expert needs to be resident in VRAM at any moment (the other can wait in system RAM or on disk), so peak VRAM ≈ one expert × bytes-per-param + text encoder + VAE + activations. Do NOT size for 27B resident.
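To make the formula concrete, here is a back-of-envelope sketch. It assumes 1 byte per parameter for FP8 and 2 for FP16, uses the 6.7 GB text encoder and ~254 MB VAE sizes from the install steps, and deliberately leaves activations out; `weights_gb` is a hypothetical helper, not part of ComfyUI.

```shell
#!/bin/sh
# weights_gb PARAMS_B BYTES_PER_PARAM ENCODER_GB VAE_GB
# Sum of resident weights for a timestep MoE where only ONE expert is on the
# GPU, plus text encoder and VAE. Activations are NOT included.
# Decimal GB: 1e9 params x 1 byte = 1 GB. Hypothetical helper.
weights_gb() {
  awk -v p="$1" -v b="$2" -v e="$3" -v v="$4" \
    'BEGIN { printf "%.1f\n", p * b + e + v }'
}

# One 14B expert at FP8 (1 byte/param) + FP8 UMT5-XXL (6.7 GB) + VAE (0.254 GB)
echo "FP8 expert, one resident:  $(weights_gb 14 1 6.7 0.254) GB"
# Same stack with an FP16 expert (2 bytes/param)
echo "FP16 expert, one resident: $(weights_gb 14 2 6.7 0.254) GB"
# The sizing mistake the note warns about: all 27B counted as resident
echo "FP8, 27B counted:          $(weights_gb 27 1 6.7 0.254) GB"
```

It prints 21.0, 35.0, and 34.0 GB. The FP8 total plus activations lands roughly where the measured ~24 GB peak sits, while counting all 27B as resident would overstate the weights by about 13 GB.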

ℹ️ FP8 weight + FP8 compute on Blackwell sm_120 — a real speedup, not just a memory escape hatch. The RTX 5090's 5th-generation tensor cores include hardware-accelerated FP8 paths per the RunPod RTX 5090 architecture writeup ("5th generation [tensor cores] with significantly improved FP4 and FP8 throughput for AI inference"). Unlike the RTX 3090 sibling — where Ampere sm_86 has no FP8 tensor cores and the FP8 e4m3fn safetensors get dequantized to BF16 at compute time (memory savings only, no speed boost) — Blackwell computes the FP8 weights natively. The FP8 path is therefore the recommended primary on this card, not just a fallback.
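If you're unsure which side of that line a card falls on, a minimal sketch can classify it from its CUDA compute capability. The 8.9 cut-off (Ada and newer have FP8 tensor cores; Ampere's 8.0/8.6 do not) is our assumption to check against NVIDIA's documentation, and `has_native_fp8` / `describe_fp8` are hypothetical helper names.

```shell
#!/bin/sh
# has_native_fp8 CC: succeed if compute capability CC (e.g. "12.0") has FP8
# tensor cores. Assumption: FP8 tensor cores start at 8.9 (Ada); Ampere
# (8.0, 8.6) can store FP8 weights but computes in BF16/FP16.
has_native_fp8() {
  awk -v cc="$1" 'BEGIN { exit !(cc + 0 >= 8.9) }'
}

# describe_fp8 CC: one-line summary, with the sm_XY name derived from CC.
describe_fp8() {
  local sm="sm_$(printf '%s' "$1" | tr -d .)"
  if has_native_fp8 "$1"; then
    echo "$sm: native FP8 tensor cores"
  else
    echo "$sm: FP8 is storage-only (dequantized at compute time)"
  fi
}

# On a live machine (nvidia-smi's compute_cap query field):
#   describe_fp8 "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
```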

Requirements

Component	Minimum	Tested
GPU	24 GB VRAM (Blackwell sm_120 to benefit from native FP8)	RTX 5090 (32 GB)
RAM	32 GB	—
Storage	~36 GB (two 14.3 GB FP8 experts + 6.7 GB text encoder + ~254 MB VAE)	—
Software	ComfyUI (recent build with Wan2.2 templates)	—
Python	3.10+	—
PyTorch	2.7+ with CUDA 12.8 (`cu128`)	—

ℹ️ cu128 wheel matters on Blackwell. The 5090's sm_120 architecture needs PyTorch built against CUDA 12.8. The default pip install torch may not pull a cu128 build depending on your platform — install explicitly with pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision. ComfyUI's startup banner will print the active CUDA / sm-architecture list; verify sm_120 is in it before proceeding.
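A quick way to confirm the wheel actually carries Blackwell kernels is to ask PyTorch for its compiled architecture list. `torch.cuda.get_arch_list()` is a real PyTorch call; the `has_sm120` wrapper around it is a hypothetical helper for this sketch.

```shell
#!/bin/sh
# Install step from the note above:
#   pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
#
# has_sm120 "ARCH LIST": succeed if the space-separated list contains sm_120.
# Hypothetical helper; feed it torch.cuda.get_arch_list(), e.g.
#   archs=$(python -c "import torch; print(' '.join(torch.cuda.get_arch_list()))")
#   has_sm120 "$archs" && echo "sm_120 kernels present" || echo "not a Blackwell-capable build"
has_sm120() {
  # Pad with spaces so "sm_1200" or "compute_120" cannot match by accident
  case " $1 " in
    *" sm_120 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```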

Installation

1. Install or update ComfyUI

If you don't have ComfyUI yet, follow the official install. If you already have it, update to the latest version — the Wan 2.2 templates and FP8-scaled loader support landed in mid-2025 and you'll want a recent ComfyUI for both. Per the official ComfyUI Wan 2.2 tutorial, the Wan 2.2 14B T2V template is available via Workflow → Browse Templates → Video.

2. Download the two FP8 scaled expert weights

The T2V-A14B model is structured as two 14B experts that activate at different denoising timesteps: the high-noise expert handles early layout and motion; the low-noise expert handles late-stage detail. ComfyUI's workflow loads each into VRAM sequentially (not simultaneously), which is what keeps peak memory at ~24 GB even on a 24 GB card. That ~24 GB already includes the text encoder, VAE, and activations, so on the 5090's 32 GB the same workflow leaves ~8 GB of spare headroom.
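The sequential swap can be pictured as a single resident-expert slot. This sketch assumes, purely for illustration, that the 30 steps are split 15/15 between the experts (the real handoff point is whatever the template's two sampler nodes specify) and prints the load and evict events; `simulate_swap` is a hypothetical helper, not ComfyUI code.

```shell
#!/bin/sh
# simulate_swap TOTAL_STEPS SWITCH_STEP
# Models ONE resident-expert slot: steps [0, SWITCH) need the high-noise
# expert, steps [SWITCH, TOTAL) the low-noise one. Prints a line whenever the
# slot has to change. Hypothetical helper; the split point is an assumption.
simulate_swap() {
  local total=$1 switch=$2 step=0 resident="" want=""
  while [ "$step" -lt "$total" ]; do
    if [ "$step" -lt "$switch" ]; then want=high_noise; else want=low_noise; fi
    if [ "$want" != "$resident" ]; then
      # Evict whatever occupies the slot before loading the next expert
      if [ -n "$resident" ]; then echo "step $step: evict $resident"; fi
      echo "step $step: load $want"
      resident=$want
    fi
    step=$((step + 1))
  done
}

simulate_swap 30 15
```

Because the schedule runs high-noise first and low-noise second, there is exactly one evict/load pair per generation, so the swap cost shows up as a single pause rather than per-step thrashing.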

Download both files from Comfy-Org/Wan_2.2_ComfyUI_Repackaged:

cd ComfyUI/models/diffusion_models

# High-noise expert (14.3 GB)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors \
  --local-dir . --local-dir-use-symlinks False

# Low-noise expert (14.3 GB)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors \
  --local-dir . --local-dir-use-symlinks False

After download, move both .safetensors files out of the nested split_files/diffusion_models/ subfolder into ComfyUI/models/diffusion_models/ directly. The template's loader nodes reference the bare filenames, so files left in the nested folder would have to be re-selected by hand in each node.
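A small helper can do the flattening. `flatten_split` is a hypothetical name; it assumes the download left files under `split_files/diffusion_models/` inside the current folder, as the `--local-dir .` commands above do, and it only removes the nested directories if they end up empty.

```shell
#!/bin/sh
# flatten_split DEST SUBDIR: move *.safetensors from DEST/SUBDIR up into DEST,
# then drop the nested directories if (and only if) they are now empty.
# Hypothetical helper.
flatten_split() {
  local dest=$1 sub=$2 f
  for f in "$dest/$sub"/*.safetensors; do
    [ -e "$f" ] || continue   # the glob matched nothing
    mv -- "$f" "$dest/"
  done
  # rmdir -p walks up SUBDIR; it refuses to delete non-empty directories
  ( cd "$dest" && rmdir -p -- "$sub" 2>/dev/null ) || true
}

# Step 2, from ComfyUI/models/diffusion_models:  flatten_split . split_files/diffusion_models
# Step 3 equivalents:
#   flatten_split ComfyUI/models/text_encoders split_files/text_encoders
#   flatten_split ComfyUI/models/vae split_files/vae
```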

3. Download the text encoder and VAE

# UMT5-XXL text encoder (FP8 e4m3fn scaled, ~6.7 GB)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
  --local-dir ComfyUI/models/text_encoders --local-dir-use-symlinks False

# VAE (shared with Wan 2.1, ~254 MB)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/vae/wan_2.1_vae.safetensors \
  --local-dir ComfyUI/models/vae --local-dir-use-symlinks False

Same note on the nested folder paths — flatten so the encoder file sits directly under text_encoders/ and the VAE sits directly under vae/.
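Before loading the template, it can help to confirm that all four files landed where the loaders expect them. This sketch (`check_wan_layout` is a hypothetical helper) hard-codes the filenames from steps 2 and 3, prints any that are missing, and returns the number missing.

```shell
#!/bin/sh
# check_wan_layout COMFY_ROOT: report which of the four files from steps 2-3
# are missing from their flat locations. Hypothetical helper.
check_wan_layout() {
  local root=$1 missing=0 rel
  for rel in \
    models/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors \
    models/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors \
    models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
    models/vae/wan_2.1_vae.safetensors
  do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $rel"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ] && echo "all 4 files in place"
  return "$missing"
}

# Usage: check_wan_layout ComfyUI
```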

4. Load the official Wan 2.2 14B T2V workflow

Inside ComfyUI:

Click Workflow → Browse Templates → Video
Select Wan2.2 14B T2V
The template instantiates two Load Diffusion Model nodes (one per expert) wired into the dual-sampler chain

Running

With the workflow loaded:

Enter your prompt in the CLIPTextEncode (Positive) node
Confirm resolution is 1280×720 and frame count is 81 in the latent video node (the standard 5-second-clip configuration for this model)
Confirm steps = 30, CFG = 5.0
Click Queue Prompt

The first run will spend extra time loading the high-noise expert into VRAM. When denoising hands off from the high-noise sampler stage to the low-noise stage, ComfyUI evicts the high-noise expert and loads the low-noise expert, so expect a visible pause at the switch. Output lands in ComfyUI/output/ as an MP4 (or as a sequence of frames, depending on your video-saver node).
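If you would rather queue runs from a script than click Queue Prompt, ComfyUI's HTTP server accepts a workflow on its `/prompt` endpoint. This sketch assumes the workflow has been exported in API format (Export (API) in recent ComfyUI frontends) and that the server is on its default 127.0.0.1:8188; `make_prompt_payload` and the file name `wan22_t2v_api.json` are hypothetical.

```shell
#!/bin/sh
# make_prompt_payload API_JSON_FILE: wrap an API-format workflow export in the
# {"prompt": ...} envelope that ComfyUI's POST /prompt endpoint expects.
# Hypothetical helper.
make_prompt_payload() {
  printf '{"prompt": %s}\n' "$(cat -- "$1")"
}

# Usage (default local server, hypothetical file names):
#   make_prompt_payload wan22_t2v_api.json > payload.json
#   curl -s -X POST http://127.0.0.1:8188/prompt \
#     -H 'Content-Type: application/json' --data-binary @payload.json
```

Editing the prompt text, frame count, or seed then means editing the exported JSON rather than the graph, which makes batch runs reproducible.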

Results

Speed: No first-party RTX 5090 timing for Wan 2.2 T2V-A14B at 1280×720, 30 steps has been published as of this writing. The 24 GB siblings measured this exact workload at 7m 10s on RTX 3090 and 4m 20s on RTX 4090 per LocalAIMaster (April 2026); the 5090's 5th-generation tensor cores execute FP8 natively (no Ampere-style dequantize), and Blackwell's compute throughput is roughly 1.3× the 4090 — but until a hands-on RTX 5090 measurement at the same workload lands we are not quoting a number. If you run this, report your timing via the submission form and we'll add it to /check/.
VRAM usage: The 24 GB siblings (RTX 3090 and RTX 4090) measured ~24 GB peak at this configuration; on the 32 GB 5090 the same FP8 workflow should have roughly 8 GB of headroom at the documented settings. Once a 5090-specific measurement lands in the /check/wan-2-2-14b/rtx-5090 row, this estimate will be replaced with a measured number.
Quality notes: LocalAIMaster's head-to-head review calls Wan 2.2-T2V-A14B "the highest-quality open video model" in their four-model comparison; the dual-expert architecture (high-noise = layout/motion, low-noise = texture/detail) is the cited reason for the motion-stability lift over single-expert 14B competitors.
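If you do submit a 5090 timing, one low-tech way to capture both wall time and peak VRAM is to log `nvidia-smi` once per second during the run and take the maximum afterwards. `peak_mib` is a hypothetical helper; the `nvidia-smi` query flags in the comments are standard, but check them against your driver version.

```shell
#!/bin/sh
# peak_mib: read memory.used samples (MiB, one per line) on stdin and print
# the maximum; blank lines are skipped. Hypothetical helper.
peak_mib() {
  awk 'NF { v = $1 + 0; if (v > max) max = v } END { print max + 0 }'
}

# Usage sketch around one generation on GPU 0:
#   nvidia-smi -i 0 --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > vram.log &
#   logger=$!; start=$(date +%s)
#   # ...queue the prompt and wait for the MP4 to appear in ComfyUI/output/...
#   kill "$logger"
#   echo "wall: $(( $(date +%s) - start )) s, peak: $(peak_mib < vram.log) MiB"
```

Note that `memory.used` counts every process on the card, so close other GPU consumers first or the peak will include them.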

For the full benchmark data, see /check/wan-2-2-14b/rtx-5090.

Spending the 32 GB envelope — what the 5090 unlocks here

The 24 GB siblings (3090, 4090) run this same workflow at the ceiling of their cards — "the model wants every byte of a 24 GB card" per LocalAIMaster's writeup. The 5090's extra 8 GB opens up three concrete branches that were unavailable on Ada / Ampere:

Drop the FP8 scaled experts for FP16 experts: Comfy-Org also ships wan2.2_t2v_high_noise_14B_fp16.safetensors and its low-noise FP16 sibling, each ~28.6 GB on disk. With only one expert resident at a time (timestep MoE), a half-precision expert fits the 32 GB envelope, but only just: it leaves a few GB for activations and the staged VAE decode, and the expert plus even the FP8 text encoder (28.6 + 6.7 ≈ 35.3 GB) would overflow the card. Keep umt5_xxl_fp8_e4m3fn_scaled.safetensors (the FP8 text encoder, 6.7 GB) rather than the FP16 one (11.4 GB), and expect it to be offloaded after prompt encoding rather than stay resident during sampling. It's tight, but the quality uplift over FP8 is the kind of headroom upgrade the 24 GB cards couldn't take. Verify peak with nvidia-smi on first run.
Longer clips at FP8 — the 24 GB cards run 81 frames at the ceiling; the 5090's headroom permits experimenting with 97-frame or 113-frame counts in the latent video node at the same 1280×720 / FP8 / 30 steps configuration. There is no first-party measurement for these higher frame counts on the 5090 yet — increase incrementally and watch nvidia-smi.
Colocation with a second model — at FP8 with ~8 GB headroom, the workflow leaves room for a Q3-Q4 LLM (e.g. a quantized prompt-rewriter) to live on-card alongside the video pipeline, eliminating CPU↔GPU round-trips between prompt-enhancement and video generation. The official Wan 2.2 repo's --use_prompt_extend --prompt_extend_method 'local_qwen' flow already supports this on multi-GPU; on the 5090 it can be a single-GPU pattern.
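When trying longer clips, remember that the latent video node works on compressed time. This sketch assumes the VAE compresses time 4× with the first frame kept separate (so valid frame counts are 4n+1) and 16 fps playback; both are assumptions to check against your workflow, and `frame_info` is a hypothetical helper.

```shell
#!/bin/sh
# frame_info FRAMES [FPS]: check a frame count for the latent video node.
# Assumptions: the VAE compresses time 4x with the first frame kept separate
# (so valid counts are 4n+1), and output plays at 16 fps by default.
# Hypothetical helper.
frame_info() {
  local frames=$1 fps=${2:-16} latent
  if [ $(( (frames - 1) % 4 )) -ne 0 ]; then
    echo "$frames frames: not of the form 4n+1"
    return 1
  fi
  latent=$(( (frames - 1) / 4 + 1 ))
  awk -v f="$frames" -v l="$latent" -v r="$fps" \
    'BEGIN { printf "%d frames -> %d latent frames, %.2f s at %d fps\n", f, l, f / r, r }'
}

for n in 81 97 113; do frame_info "$n"; done
```

For 81, 97, and 113 frames this reports 21, 25, and 29 latent frames, or about 5.1, 6.1, and 7.1 seconds of video. Activation memory should grow with latent length, which is why stepping up one size at a time makes sense.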

Troubleshooting

Confirm Blackwell native FP8 acceleration is actually engaging

If generation feels closer to a 4090 than the expected Blackwell uplift, the FP8 fast path may not be active. PyTorch needs to be on a cu128 build (CUDA 12.8) to expose sm_120 kernels — pip list | grep torch should show a +cu128 tag. ComfyUI's startup banner lists active CUDA architectures; verify sm_120 is present. The cu126 and earlier wheels do not contain sm_120 kernels and will fall back to compute paths that don't use the 5th-generation tensor cores.
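The `pip list` check can be scripted. This sketch (`cuda_tag_ok` is a hypothetical helper) pulls the `+cuXYZ` suffix out of the torch line and accepts cu128 or newer, on the assumption that later CUDA wheels also ship sm_120 kernels; CPU-only builds without a `+cu` tag are rejected.

```shell
#!/bin/sh
# cuda_tag_ok "PIP LIST LINE": succeed if the torch line carries a +cuXYZ tag
# of cu128 or newer. Hypothetical helper; "newer is fine" is an assumption.
cuda_tag_ok() {
  local tag
  # Extract the digits after "+cu"; prints nothing for CPU-only builds
  tag=$(printf '%s\n' "$1" | sed -n 's/.*+cu\([0-9][0-9]*\).*/\1/p')
  [ -n "$tag" ] && [ "$tag" -ge 128 ]
}

# Usage:
#   cuda_tag_ok "$(pip list 2>/dev/null | grep -E '^torch ')" \
#     && echo "cu128+ build" || echo "reinstall from the cu128 index"
```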

Out-of-memory at the VAE decode stage

This is much less likely on the 5090 than on the 24 GB siblings (you have ~8 GB of headroom at the documented configuration) — but VAE decode is the spikiest stage of the pipeline, and a long enough clip plus background GPU consumers (browser hardware acceleration, video calls) can still push you over. First step: close other GPU consumers. Second step: drop frame count or resolution. Third step: switch to a GGUF quant via the city96/ComfyUI-GGUF custom node — Q5_K_M (~10.8 GB per expert) or Q6_K (~12 GB per expert) from QuantStack/Wan2.2-T2V-A14B-GGUF. The QuantStack repo ships both HighNoise/ and LowNoise/ subfolders at every quant tier (Q2_K through Q8_0), so the dual-expert swap pattern is preserved.
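To fetch just one tier of both experts, `huggingface-cli download` can filter files with `--include` glob patterns. The sketch assumes the QuantStack files sit under `HighNoise/` and `LowNoise/` with names ending in the quant tag (for example `...Q5_K_M.gguf`); check the actual filenames on the repo. `fetch_gguf_tier` is a hypothetical helper, and `ComfyUI/models/unet` as the target follows the ComfyUI-GGUF README's convention as we understand it.

```shell
#!/bin/sh
# fetch_gguf_tier QUANT DEST: download one quant tier of BOTH experts.
# Assumes QuantStack's layout (HighNoise/ and LowNoise/ folders) and file
# names ending in "<QUANT>.gguf". Hypothetical helper; set HF_CLI=echo to
# print the command instead of running it.
fetch_gguf_tier() {
  local q=$1 dest=$2
  # --include takes a list, so both patterns follow a single --include
  ${HF_CLI:-huggingface-cli} download QuantStack/Wan2.2-T2V-A14B-GGUF \
    --include "HighNoise/*${q}.gguf" "LowNoise/*${q}.gguf" \
    --local-dir "$dest"
}

# Usage: fetch_gguf_tier Q5_K_M ComfyUI/models/unet
```

Keeping the two patterns in one call guarantees the high- and low-noise files come from the same tier, which preserves the dual-expert swap pattern.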

Native Wan 2.2 install (`generate.py`) wants 80 GB

This is expected — the upstream Wan-Video/Wan2.2 repo's single-GPU code path holds both experts resident and requires ~80 GB VRAM per the model card ("This command can run on a GPU with at least 80GB VRAM"). Even the 5090's 32 GB isn't enough for that path. The memory flags --offload_model True --convert_model_dtype --t5_cpu exist but the ComfyUI FP8 scaled workflow is the cleaner consumer-GPU route — stick with it.

NVFP4 path exists but is enterprise-only today

NVIDIA published nvidia/Wan2.2-T2V-A14B-Diffusers-NVFP4 in May 2026: an FP4 microscaling quant that targets Blackwell's 5th-generation tensor cores directly and would, in principle, be the fastest path on this card. The NVIDIA card's Supported Runtime Engine(s) field lists only TRTLLM and SGLang (enterprise serving stacks), and the test hardware is B200 (datacenter Blackwell), not the consumer 5090. There is no ComfyUI loader for NVFP4 weights at the time of writing. Watch Wan2.2 Issue #317, "NVFP4支持" ("NVFP4 support"), and the city96/ComfyUI-GGUF repo for community NVFP4 loader support before treating NVFP4 as a viable consumer path on this card.

"Where do I get the I2V or Animate variant?"

This recipe is T2V (text-to-video) only. For image-to-video (I2V-A14B), Animate-14B, S2V-14B, or the dense 5B sibling (TI2V-5B), the workflows and weight files differ: start from the Wan-AI HF org page and pick the matching ComfyUI repackaged variant. The install pattern is the same, but the files differ (and TI2V-5B is a different architecture altogether). Report your results via the submission form and we'll add sibling recipes.