What You'll Build
A local ComfyUI pipeline on a 24 GB Radeon RX 7900 XTX (RDNA3, Navi 31, gfx1100) that turns a text prompt (or a starting image) into a 5-second 720p video using the Wan 2.2 TI2V-5B model, the only Wan 2.2 variant the official repo documents as runnable on a single consumer-grade GPU. It runs on AMD's ROCm stack with PyTorch SDPA as the attention path (not FlashAttention) and uses the ComfyUI native workflow as the primary route. At 24 GB, VRAM is not the bottleneck on this pairing; the fragility of ROCm's memory management on large video-model loads is, and the Troubleshooting section covers it.
Hardware data: RX 7900 XTX (24GB VRAM, RDNA3 / gfx1100) · 720p (1280×704 / 704×1280) at 24 fps · ComfyUI on ROCm · See benchmark data
⚠️ This is a ROCm recipe, not CUDA. The RX 7900 XTX runs on AMD's ROCm/HIP stack: there is no cu124/cu128 wheel, no pip install flash-attn, and no FP8/FP4 path here. RDNA3 has no FP8/FP4 hardware (its WMMA units accept FP16, BF16, INT8, INT4 only), so an FP8 checkpoint just upcasts to FP16 with no memory saving, and at 24 GB you don't need it anyway. The attention path is PyTorch SDPA (ComfyUI's default), not FlashAttention-2. A 7900 XTX owner running Wan via ComfyUI+ROCm reports that forcing FlashAttention-2 increased sampling time by ~50% versus the native SDP path (Wan2.1 AMD support discussion #14). If a guide tells you to install a FlashAttention wheel or pick a cu12x wheel for this card, it's written for the wrong vendor.
Why TI2V-5B and not the 14B variants? The Wan 2.2 family ships five variants: TI2V-5B (this recipe), T2V-A14B, I2V-A14B, S2V-14B, and Animate-14B. The four 14B-class variants are MoE models that the official Wan-Video/Wan2.2 README documents with an 80 GB single-GPU floor, far past a 24 GB card at native precision. Only TI2V-5B is positioned as a single-consumer-GPU target. The Wan-AI HF card describes it as a 5B dense model (one fused checkpoint, no high-noise / low-noise expert split) with a unified text-and-image-to-video architecture and a high-compression 16×16×4 VAE, released alongside the larger 27B-total MoE models. The 14B variants need a different recipe entirely.
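To make the 16×16×4 compression concrete, here is a minimal sketch of the latent grid a clip maps to. It assumes that the first frame is encoded on its own and that each further group of 4 frames collapses into one latent frame; the function name and that temporal rule are illustrative assumptions, not taken from the Wan code.

```python
def latent_grid(width: int, height: int, frames: int,
                spatial: int = 16, temporal: int = 4) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a clip.

    Assumption: frame 0 is encoded alone, then every `temporal` frames
    become one latent frame, so `frames` must be temporal * n + 1.
    """
    if width % spatial or height % spatial:
        raise ValueError("width and height must be multiples of the spatial factor")
    if (frames - 1) % temporal:
        raise ValueError("frames must have the form temporal * n + 1")
    return (frames - 1) // temporal + 1, height // spatial, width // spatial


# 1280x704 landscape, 121 frames
print(latent_grid(1280, 704, 121))  # (31, 44, 80)
```

Under these assumptions the sampler works on a 31 × 44 × 80 grid instead of 121 × 704 × 1280 pixels, which is why a 5B model at 720p stays within reach of a consumer card.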
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (ROCm-supported AMD card) — the official ComfyUI tutorial documents the 5B model fitting on 8 GB with ComfyUI native offloading | RX 7900 XTX (24 GB, RDNA3 / gfx1100) |
| RAM | 16 GB | 32 GB+ recommended (offloading is RAM-heavy) |
| Storage | ~22 GB (TI2V-5B FP16 9.31 GB + UMT5-XXL FP16 text encoder 11.37 GB + Wan2.2-VAE 1.41 GB) | per HF Files tree |
| Driver | AMD ROCm 7.2.x on Linux | — |
| Software | ComfyUI + PyTorch (ROCm 7.2 build), Python 3.10+ | — |
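The Storage row is a plain sum of the three file sizes. As a rough sketch, the snippet below adds them up and compares the total with the free space on the current drive; the `margin_gb` headroom and the decimal-GB conversion are assumptions made for illustration.

```python
import shutil

# Sizes in GB, copied from the Storage row of the table above.
FILES_GB = {
    "TI2V-5B FP16 diffusion model": 9.31,
    "UMT5-XXL FP16 text encoder": 11.37,
    "Wan2.2-VAE": 1.41,
}


def required_gb(files: dict = FILES_GB) -> float:
    """Total download size in GB."""
    return round(sum(files.values()), 2)


def has_room(path: str = ".", margin_gb: float = 2.0) -> bool:
    """True if `path` has space for all files plus a hypothetical safety margin."""
    free_gb = shutil.disk_usage(path).free / 1e9  # assumes decimal GB
    return free_gb >= required_gb() + margin_gb


print(f"need ~{required_gb()} GB; enough free space here: {has_room()}")
```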
The model is released under the Apache 2.0 License (per the Wan-AI HF card frontmatter) and the weights are not gated on Hugging Face — no access request or login is required. The official README states the TI2V-5B single-GPU command "can run on a GPU with at least 24GB VRAM (e.g., RTX 4090 GPU)" — the 7900 XTX's 24 GB sits exactly on that floor.
Installation
1. Install ComfyUI
Per the ComfyUI README, clone the repo:
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
2. Install PyTorch for ROCm
The RX 7900 XTX (gfx1100) is an officially ROCm-supported GPU, so it uses the stable ROCm PyTorch wheel. Per the ComfyUI README "AMD GPUs (Linux)" section, the stable install command is:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2
ℹ️ Verify the ROCm tag before you copy it. As of this writing the ComfyUI README pins rocm7.2 as the stable wheel, but the rocmX.Y tag moves over time (6.3 → 6.4 → 7.x). Read the current line in the live ComfyUI README before running. A nightly variant (https://download.pytorch.org/whl/nightly/rocm7.2) "might have some performance improvements" per the README. The README also lists a separate experimental RDNA3 wheel index (https://rocm.nightlies.amd.com/v2/gfx110X-all/) for Windows and Linux; on officially supported Linux you do not need it, and the stable whl/rocm7.2 wheel above is the canonical path.
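Because the tag moves, it can help to keep it in one place. The sketch below builds the install command from a tag using the two index URL patterns quoted above; the helper name is hypothetical, and the `--pre` flag for nightly wheels is an assumption based on nightlies being pre-release builds.

```python
def torch_install_command(rocm_tag: str, nightly: bool = False) -> list[str]:
    """Build the pip command for a given ROCm tag such as "7.2"."""
    base = "https://download.pytorch.org/whl"
    index = f"{base}/nightly/rocm{rocm_tag}" if nightly else f"{base}/rocm{rocm_tag}"
    cmd = ["pip", "install"]
    if nightly:
        cmd.append("--pre")  # assumption: nightly wheels are pre-releases
    cmd += ["torch", "torchvision", "torchaudio", "--index-url", index]
    return cmd


# Prints the stable command from step 2; change the tag after checking the README.
print(" ".join(torch_install_command("7.2")))
```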
3. Install ComfyUI dependencies
Per the ComfyUI README "Dependencies" section:
pip install -r requirements.txt
4. Download model files for the native workflow
Per the ComfyUI native workflow docs, download these three files from the Comfy-Org Wan 2.2 repackaged repo into the matching subdirectories of ComfyUI/models/ (the commands below assume you are in the ComfyUI repo root). File sizes are verified from the Hugging Face Files tree:
# diffusion model (FP16, ~9.31 GB) → ComfyUI/models/diffusion_models/
wget -P models/diffusion_models/ \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
# text encoder (FP16, ~11.37 GB) → ComfyUI/models/text_encoders/
wget -P models/text_encoders/ \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp16.safetensors
# VAE (~1.41 GB) → ComfyUI/models/vae/
wget -P models/vae/ \
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors
The resulting layout matches what the official template expects:
ComfyUI/models/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors
ComfyUI/models/text_encoders/umt5_xxl_fp16.safetensors
ComfyUI/models/vae/wan2.2_vae.safetensors
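A quick way to confirm the layout before launching is to check that each file exists where the template looks for it. This is a minimal sketch; the relative path `ComfyUI/models` assumes you run it from the directory that contains the ComfyUI checkout.

```python
from pathlib import Path

# Relative paths under ComfyUI/models/, taken from the layout above.
EXPECTED = [
    "diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp16.safetensors",
    "vae/wan2.2_vae.safetensors",
]


def missing_files(models_dir) -> list[str]:
    """Return the expected files that are not present under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


missing = missing_files("ComfyUI/models")
print("all model files present" if not missing else f"missing: {missing}")
```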
ℹ️ Why the FP16 text encoder, not the FP8 one. Most Wan 2.2 ComfyUI guides point you at umt5_xxl_fp8_e4m3fn_scaled.safetensors (6.74 GB). That file is an NVIDIA optimization: on RDNA3 there is no FP8 hardware, so an FP8 text encoder upcasts to FP16 at load, and you pay the upcast with no memory win or speedup. At 24 GB you have the room, so take the native umt5_xxl_fp16.safetensors (11.37 GB) directly. (If you ever colocate another model and need to claw back encoder VRAM, the FP8 file still works; it just won't be faster or smaller in practice on this card.)
Running
Launch ComfyUI from the repo root. Per the ComfyUI README "Running" section:
python main.py
This starts the server (default http://127.0.0.1:8188). Open the Wan2.2 5B video generation template under Workflow → Browse Templates → Video (you need a ComfyUI build new enough to ship it). Set the positive prompt, choose resolution 1280×704 (landscape) or 704×1280 (portrait), set the frame count for clip length (at 24 fps a 5-second clip is 121 frames, since Wan frame counts take the form 4n+1), and queue. The first render is slower due to model load; subsequent renders reuse the cached weights.
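The frame count for a given clip length is simple arithmetic. This sketch assumes Wan's frame counts must have the form 4n+1 (one leading frame plus groups of four, matching the VAE's temporal factor of 4); the helper name is illustrative.

```python
def frame_count(seconds: float, fps: int = 24, temporal: int = 4) -> int:
    """Nearest frame count of the form temporal * n + 1 for a clip length."""
    n = round(seconds * fps / temporal)
    return temporal * max(n, 0) + 1


print(frame_count(5))  # 121
```

For a 5-second clip at 24 fps this gives 30 groups of four plus the leading frame.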
For image-to-video, drop a starting image into the LoadImage node wired into the template's Wan22ImageToVideoLatent input — TI2V is a unified text-and-image-to-video model, so the same workflow file handles both modes.
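Queuing can also be scripted against the running server. The sketch below assumes the workflow has been exported from ComfyUI in API format and posts it to the server's /prompt endpoint; the file name `wan22_ti2v_api.json` is hypothetical, and the server address is the default from above.

```python
import json
import urllib.request


def build_prompt_request(workflow: dict,
                         server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Wrap an API-format workflow in a POST request for the /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def queue_workflow(path: str) -> bytes:
    """Load an exported workflow file and queue it; needs a running server."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_prompt_request(workflow)) as resp:
        return resp.read()


# Example (hypothetical file name, requires ComfyUI to be running):
# print(queue_workflow("wan22_ti2v_api.json"))
```

Keeping request construction separate from the network call makes it easy to inspect the payload before sending anything.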
Attention path: ComfyUI's default attention backend on this stack is PyTorch's scaled-dot-product attention (SDPA), which is the path a 7900 XTX owner reports running Wan through (Wan2.1 AMD discussion #14: "using PyTorch's native Flash attention (via SDP) on PyTorch 2.6+rocm6.2.4"). Do not install or force a FlashAttention-2 wheel on this card: the same user found it ~50% slower than SDP, and the upstream Composable-Kernel FlashAttention build targets CDNA/MI accelerators only, not consumer RDNA3. Stick with the default.
ℹ️ Why ComfyUI native and not the CLI. The official generate.py --task ti2v-5B command is tuned for a 24 GB card, but per Wan2.2 issue #90 it OOMs at the decode stage on a 24 GB GPU (reported on an RTX 3090 24 GB, even with --offload_model True --convert_model_dtype --t5_cpu set). ComfyUI's runtime offloader is more aggressive than the CLI's static offload flags and is the path the documented 8 GB working floor refers to, so it is the reliable choice on the 24 GB 7900 XTX, especially given ROCm's tighter memory headroom.
Results
- Speed: No first-party RX 7900 XTX measurement for Wan 2.2 TI2V-5B exists in the Wan-AI HF card, the official README, or the backend benchmark data (/check/wan-2-2/rx-7900-xtx returns no benchmark rows, verified at write time). The HF card's only published timing is the model-wide claim that TI2V-5B generates a 5-second 720P video in under 9 minutes on a single consumer-grade GPU — measured on an RTX 4090 (NVIDIA), not this card. The closest AMD data point is a Wan 2.1 (not 2.2) run on a 7900 XTX reported in discussion #14: "25 steps is running for ~24 mins at 832×480" via the Kijai WanVideoWrapper. That is a different model version, resolution, and workflow, so it is not quoted as this pairing's speed — treat it only as a rough signal that ROCm video generation on this card is functional but markedly slower than the NVIDIA reference. We do not publish an invented number. If you've measured Wan 2.2 TI2V-5B wall-clock on a 7900 XTX, please contribute it so it lands on /check/wan-2-2/rx-7900-xtx.
- VRAM usage: ~8 GB working floor on the ComfyUI native path with the runtime offloader engaged (per the ComfyUI tutorial), leaving ample headroom on the 24 GB card. The FP16 diffusion file (9.31 GB) + UMT5-XXL FP16 text encoder (11.37 GB) + Wan2.2-VAE (1.41 GB) sum to ~22 GB on disk; runtime peak with ComfyUI's offloader is well below that because weights stream rather than all loading resident. At 24 GB, VRAM is not the constraint on this pairing — ROCm memory-management stability is (see Troubleshooting). Live data: /check/wan-2-2/rx-7900-xtx.
- Quality notes: TI2V-5B output is 720p (1280×704 or 704×1280) at 24 fps; the README documents 720P generation at 24 FPS for this model. Clip length is configurable via frame count. The dense single-checkpoint architecture means quality is consistent on the canonical FP16 path; there is no per-expert quality-vs-speed dial for this variant.
For the full benchmark data and other-GPU comparisons, see /check/wan-2-2/rx-7900-xtx.
Troubleshooting
Model load stalls or hangs ("Requested to load …") on ROCm
The single most common video-model failure on a 7900 XTX is not OOM — it's a load-time stall caused by interactions between ROCm's memory management and ComfyUI's pinned / async / dynamic-VRAM offloading. A 7900 XTX owner hit exactly this on a video model and resolved it with launch flags (ComfyUI issue #13730, reported for LTX on the same card + ROCm 7.2; the same plumbing affects Wan loads):
python main.py --disable-pinned-memory --disable-async-offload --disable-dynamic-vram
The reporter notes --disable-pinned-memory and --disable-async-offload each "seem important." This is an AMD/ROCm-specific issue — the identical model loads fine on NVIDIA without these flags. If a Wan generation hangs at the "load model" step on this card, try these first.
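If you start ComfyUI from a script, the workaround can be kept behind a single switch. This is a minimal sketch that only builds the command line from the three flags quoted above; the helper name and the `workaround` switch are illustrative.

```python
import sys

# Flags from the ComfyUI issue cited above.
ROCM_STALL_FLAGS = [
    "--disable-pinned-memory",
    "--disable-async-offload",
    "--disable-dynamic-vram",
]


def launch_command(workaround: bool = False, python: str = sys.executable) -> list[str]:
    """Return the argv for starting ComfyUI, optionally with the stall workaround."""
    cmd = [python, "main.py"]
    if workaround:
        cmd += ROCM_STALL_FLAGS
    return cmd


print(" ".join(launch_command(workaround=True, python="python")))
```

The returned list can be passed to `subprocess.run(cmd, cwd=...)` with the ComfyUI repo root as the working directory.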
"Torch not compiled with CUDA enabled"
This means a CUDA build of PyTorch got installed instead of the ROCm build. Per the ComfyUI README troubleshooting note, uninstall and reinstall against the ROCm wheel index:
pip uninstall torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2
Confirm the installed build is the ROCm one: python -c "import torch; print(torch.__version__)" should print a +rocm7.2-style suffix, and torch.cuda.is_available() should return True (ROCm builds expose the GPU through the cuda device namespace via HIP).
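The same check can be run as a short script. This sketch inspects the version string for a +rocm local tag and, if PyTorch is importable, also prints `torch.version.hip`, which is set on ROCm builds and None on CUDA builds; the helper name is illustrative.

```python
def is_rocm_build(version: str) -> bool:
    """True if a torch.__version__ string carries a +rocm local tag."""
    _, _, local = version.partition("+")
    return local.startswith("rocm")


try:
    import torch
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    print("version:", torch.__version__)
    print("ROCm wheel:", is_rocm_build(torch.__version__))
    print("HIP runtime:", torch.version.hip)  # None on CUDA builds
    print("GPU visible:", torch.cuda.is_available())
```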
Slow VAE decode at the end of a render
A 7900 XTX user found VAE decode was the slow stage and fixed it by lowering the VAE tile size: per discussion #14, "changing the tile size to 256×256 — a 720×480 video will decode in 16s now." If your workflow exposes a VAE-tiling node, drop the tile to 256×256 when decode dominates wall-clock.
Don't install FlashAttention or a cu12x wheel
HF and ComfyUI guides written for NVIDIA frequently suggest a FlashAttention wheel or a CUDA cu124/cu128 index. On RDNA3 both are the wrong path: the upstream Composable-Kernel FlashAttention build targets CDNA/MI accelerators only and does not cover consumer gfx1100, and a 7900 XTX owner measured FA2 ~50% slower than the default SDP route (discussion #14). ComfyUI already routes attention through PyTorch SDPA on this stack, so stick with the default.
Want the 14B variants?
Per the official README, the 14B / A14B single-GPU commands need at least 80 GB VRAM — out of scope for a 24 GB card at native precision. Community GGUF quants of the 14B Wan variants exist but need a separate workflow and a GGUF loader; file a request on /contribute if you want a 14B-quantized AMD recipe added once a stable 24 GB gfx1100 workflow lands.