Wan 2.2 T2V-A14B on RTX 4090: 720p text-to-video in ComfyUI with FP8 scaled weights

What You'll Build

A working ComfyUI text-to-video pipeline that runs the Wan 2.2 T2V-A14B variant, Alibaba's two-expert (high-noise + low-noise) 14B-active video model, at FP8 precision on a single RTX 4090, producing 81-frame clips at 1280×720 (about 5 seconds at the model's 16 fps). The native upstream code path requires 80 GB VRAM per the official model card; the path that actually fits in 24 GB is the Comfy-Org repackaged FP8 scaled workflow, which swaps the two experts in and out of GPU memory sequentially during denoising.

Hardware data: RTX 4090 (24GB VRAM) · 4m 20s per 81-frame 720p clip at FP8, 30 steps · peak 24 GB VRAM · See benchmark data

⚠️ Variant pin — this recipe is specifically for T2V-A14B. The Wan 2.2 14B family ships at least four siblings under a single brand: T2V-A14B (text-to-video, this recipe), I2V-A14B (image-to-video), Animate-14B (motion-from-image), and S2V-14B (speech-to-video). They share architecture but ship different weights and different ComfyUI workflows. If you want any of the other three, the install steps below do not apply verbatim — start from the official Wan 2.2 GitHub instead.

ℹ️ Peak VRAM is tight at 24 GB. Our benchmark (id=252) measured exactly 24 GB peak — there is effectively zero headroom. Close other GPU workloads (browser hardware acceleration, video calls) before running, or expect occasional OOM at the VAE decode stage. Drop to a GGUF Q5_K_M or Q6_K via QuantStack/Wan2.2-T2V-A14B-GGUF if you need more headroom (see Troubleshooting).
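
Before queuing a job, it can help to see how much VRAM other processes already hold. The following is a minimal sketch, not part of the ComfyUI workflow. It assumes `nvidia-smi` is on your PATH, and the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, introduced here only for illustration.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs (one GPU per line) into a list of tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory():
    """Ask nvidia-smi for used/total memory in MiB (assumes nvidia-smi is installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)

def free_mib(rows):
    """Free memory per GPU in MiB, computed as total minus used."""
    return [total - used for used, total in rows]
```

On the target machine, `free_mib(query_gpu_memory())` reports the free memory per GPU. If the 4090 already shows a few hundred MiB in use before ComfyUI starts, that is memory the 24 GB peak cannot spare.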

Requirements

Component	Minimum	Tested
GPU	24 GB VRAM (Ada sm_89 or newer)	RTX 4090 (24 GB)
RAM	32 GB	—
Storage	Over 30 GB (two 14.3 GB FP8 experts total 28.6 GB, plus text encoder and VAE)	—
Software	ComfyUI (recent build with Wan2.2 templates)	—
Python	3.10+	—
PyTorch	2.4+ with CUDA	—

Installation

1. Install or update ComfyUI

If you don't have ComfyUI yet, follow the official install guide. If you already have it, update to the latest version: the Wan 2.2 templates and FP8-scaled loader support landed in mid-2025, and you'll want a recent ComfyUI build for both. Per the official ComfyUI tutorial, the Wan 2.2 14B T2V template is available via Workflow → Browse Templates → Video.

2. Download the two FP8 scaled expert weights

The T2V-A14B model is structured as two 14B experts that activate at different denoising timesteps: the high-noise expert handles early layout and motion, and the low-noise expert handles late-stage detail. ComfyUI's workflow loads each into VRAM sequentially (not simultaneously), which is what keeps peak memory within 24 GB.
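
To make the sequential hand-off concrete, here is a toy sketch of the idea. It is not ComfyUI's scheduler: the hand-off point `switch_step` is a hypothetical parameter with an illustrative value, and the function names are invented for this example. The 14.3 GB figure is the per-expert file size quoted in this recipe.

```python
EXPERT_SIZE_GB = 14.3  # per-expert FP8 file size quoted in this recipe

def expert_for_step(step, switch_step):
    """Return which expert handles a given denoising step (0-indexed)."""
    return "high_noise" if step < switch_step else "low_noise"

def peak_expert_weights_gb(total_steps, switch_step, sequential=True):
    """Peak expert weights held in VRAM over the whole run."""
    used = {expert_for_step(s, switch_step) for s in range(total_steps)}
    if not used:
        return 0.0
    if sequential:
        # Only one expert is resident at a time, so the peak is a single expert.
        return EXPERT_SIZE_GB
    # Keeping every used expert resident means their sizes add up.
    return EXPERT_SIZE_GB * len(used)

# 30 steps with a hypothetical hand-off at step 15 (illustrative value only)
schedule = [expert_for_step(s, 15) for s in range(30)]
swaps = sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)
print("expert swaps:", swaps)
print("sequential peak (GB):", peak_expert_weights_gb(30, 15, sequential=True))
print("resident-both peak (GB):", peak_expert_weights_gb(30, 15, sequential=False))
```

The point of the comparison is that two experts held together would need 28.6 GB for weights alone, which cannot fit on a 24 GB card, while one expert at a time leaves room for activations, the text encoder, and the VAE.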

Download both files from Comfy-Org/Wan_2.2_ComfyUI_Repackaged:

cd ComfyUI/models/diffusion_models

# High-noise expert (14.3 GB)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors \
  --local-dir . --local-dir-use-symlinks False

# Low-noise expert (14.3 GB)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors \
  --local-dir . --local-dir-use-symlinks False

After download, move both .safetensors files out of the nested split_files/diffusion_models/ subfolder into ComfyUI/models/diffusion_models/ directly (ComfyUI looks at the top level of that folder).

3. Download the text encoder and VAE

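# Run these from the directory that contains ComfyUI/
# (if you are still in ComfyUI/models/diffusion_models from step 2, run: cd ../../..)

# UMT5-XXL text encoder (FP8 e4m3fn scaled)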
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
  --local-dir ComfyUI/models/text_encoders --local-dir-use-symlinks False

# VAE (shared with Wan 2.1)
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged \
  split_files/vae/wan_2.1_vae.safetensors \
  --local-dir ComfyUI/models/vae --local-dir-use-symlinks False

The same nested-folder caveat applies here: flatten the paths so the encoder file sits directly under text_encoders/ and the VAE sits directly under vae/.

4. Load the official Wan 2.2 14B T2V workflow

Inside ComfyUI:

Click Workflow → Browse Templates → Video
Select Wan2.2 14B T2V
The template instantiates two Load Diffusion Model nodes (one per expert) wired into the dual-sampler chain

The template is also available as a JSON file, video_wan2_2_14B_t2v.json, which you can drag and drop onto the ComfyUI canvas instead.
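
If you have the JSON file on disk, you can check that it contains two diffusion model loaders before opening it. This is a rough sketch under two assumptions: that the UI-format workflow keeps its nodes in a top-level "nodes" list with a "type" field on each node, and that the diffusion model loader's internal type name is "UNETLoader". The `example` dict is a tiny stand-in, not the real template contents.

```python
import json
from collections import Counter

def load_workflow(path):
    """Read a workflow JSON file from disk."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)

def count_node_types(workflow):
    """Count nodes by their 'type' field in a UI-format workflow dict."""
    return Counter(node.get("type", "unknown") for node in workflow.get("nodes", []))

# Tiny stand-in workflow (illustrative only, not the real template contents)
example = {
    "nodes": [
        {"type": "UNETLoader"},
        {"type": "UNETLoader"},
        {"type": "VAELoader"},
    ]
}
print(count_node_types(example)["UNETLoader"])
```

For the real file, `count_node_types(load_workflow("video_wan2_2_14B_t2v.json"))` prints every node type with its count. If the type names differ in your ComfyUI build, that listing shows what to look for.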

Running

With the workflow loaded:

Enter your prompt in the CLIPTextEncode (Positive) node
Confirm resolution is 1280×720 and frame count is 81 in the latent video node (this is the configuration the benchmark cited below was measured at)
Confirm steps = 30, CFG = 5.0
Click Queue Prompt
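
To see what the resolution and frame-count settings above mean for the sampler, this sketch computes the latent tensor shape for a request. The compression factors (8× spatial, 4× temporal) and the 16 latent channels are assumptions about the Wan 2.1 VAE, not values read from the workflow, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(width, height, frames, spatial=8, temporal=4, channels=16):
    """Latent shape (channels, frames, height, width) for a pixel-space request.

    The compression factors and channel count are assumed values for the Wan 2.1 VAE.
    """
    if width % spatial or height % spatial:
        raise ValueError("width and height must be multiples of the spatial factor")
    if (frames - 1) % temporal:
        raise ValueError("frame count must have the form temporal * n + 1")
    latent_frames = (frames - 1) // temporal + 1
    return (channels, latent_frames, height // spatial, width // spatial)

print(latent_shape(1280, 720, 81))
```

Under these assumptions, 81 frames fits the "4n + 1" pattern (n = 20), which is why frame counts such as 80 or 82 are poor choices.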

The first run will spend extra time loading the high-noise expert into VRAM. Once denoising switches to the low-noise stage, ComfyUI evicts the high-noise expert and loads the low-noise expert — expect a visible pause at the timestep switch. Output lands in ComfyUI/output/ as an MP4 (or as a sequence of frames depending on your video-saver node).

Results

Speed: 4 minutes 20 seconds for an 81-frame 1280×720 clip at 30 steps, FP8 e4m3fn precision — measured on RTX 4090 by LocalAIMaster (April 2026). The same benchmark notes a 3090 (same VRAM tier, older arch) at 7m 10s for the identical workload.
VRAM usage: 24 GB peak at the configuration above — this is the figure recorded in our /check/wan-2-2-14b/rtx-4090 row (id=252, FP8, 1280×720, 30 steps). There is essentially no headroom; see Troubleshooting if you hit OOM.
Quality notes: LocalAIMaster's head-to-head review calls Wan 2.2 "the highest-quality open video model I have benchmarked" with prompt-adherence 8.4, motion-stability 8.7, aesthetic 8.5 in their rubric. The dual-expert architecture (high-noise = layout/motion, low-noise = texture/detail) is the cited reason for the motion-stability lift over single-expert 14B competitors.
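
The cited timings can be turned into per-frame and per-step figures with simple arithmetic. The sketch below uses only the numbers quoted above. The source does not say whether the wall-clock times include model loading and VAE decode, so treat the per-step figure as an upper bound on pure sampling time.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds

FRAMES, STEPS = 81, 30
rtx4090 = to_seconds(4, 20)  # cited RTX 4090 time
rtx3090 = to_seconds(7, 10)  # cited RTX 3090 time

print(f"4090: {rtx4090} s total, {rtx4090 / FRAMES:.2f} s per frame")
print(f"4090: {rtx4090 / STEPS:.2f} s per step (wall clock)")
print(f"3090 takes {rtx3090 / rtx4090:.2f}x as long as the 4090")
```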

For the full benchmark data, see /check/wan-2-2-14b/rtx-4090.

Troubleshooting

Out-of-memory at the VAE decode stage

The cited peak is 24 GB on the nose, so any background GPU consumer (browser hardware acceleration, video conferencing, a second model loaded in another process) can push you over. First step: close everything else using the GPU. Second step: drop resolution to 1280×704 or 960×544. Third step: switch to a GGUF quant, Q5_K_M (10.8 GB) or Q6_K (12 GB), from QuantStack/Wan2.2-T2V-A14B-GGUF. These load through the Unet Loader (GGUF) node from the city96/ComfyUI-GGUF custom node pack instead of the stock Load Diffusion Model node. Note that the GGUF route also needs a pair of files: download the high-noise and low-noise experts at the same quant tier.
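
For a rough sense of how much each fallback resolution reduces the work per frame, the sketch below compares pixel counts against 1280×720. Pixel count is only a proxy: this recipe has no VRAM measurements for the lower resolutions, so the ratios indicate relative size, not measured savings.

```python
BASELINE = (1280, 720)
FALLBACKS = [(1280, 704), (960, 544)]

def pixel_ratio(resolution, baseline=BASELINE):
    """Fraction of the baseline's per-frame pixel count."""
    return (resolution[0] * resolution[1]) / (baseline[0] * baseline[1])

for width, height in FALLBACKS:
    share = pixel_ratio((width, height))
    print(f"{width}x{height}: {share:.1%} of the 1280x720 pixel count")
```

The first fallback trims only about 2% of the pixels, so it helps when you are barely over the limit. The second cuts the frame to a little over half the pixels and is the better choice when OOM is persistent.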

Native Wan 2.2 install (generate.py) wants 80 GB

This is expected: the upstream Wan-Video/Wan2.2 repo's single-GPU code path holds both experts resident and requires ~80 GB VRAM per the model card. Memory flags like --offload_model True --convert_model_dtype --t5_cpu exist, but the ComfyUI FP8 scaled path is the cleaner consumer-GPU route. Don't try to run python generate.py --task t2v-A14B directly on a 4090.

"Where do I get the I2V or Animate variant?"

This recipe is T2V (text-to-video) only. For image-to-video (I2V-A14B), Animate-14B, or S2V-14B, the workflows and weight files differ — start from the Wan-AI HF org page and pick the matching ComfyUI repackaged variant. Same arch family, same install pattern, but different files. Report your results via the submission form and we'll add sibling recipes.