
Hunyuan3D-2.1 on RX 7900 XTX: Image-to-Mesh on ROCm (Shape-Only)

3d · advanced · 10GB+ VRAM · Jun 17, 2026

This advanced recipe sets up Hunyuan3D on the RX 7900 XTX, needing about 10 GB of VRAM.

prerequisites
  • AMD Radeon RX 7900 XTX (24 GB VRAM, RDNA3 / Navi 31 / gfx1100) on Linux with the AMD ROCm stack
  • Python 3.10 (the version the community prebuilt custom_rasterizer wheel targets; a 3.13 wheel also exists)
  • PyTorch built for ROCm, pinned to torch 2.7.0 — torch 2.8 is reported broken by the community AMD repo
  • A prebuilt custom_rasterizer wheel from dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux — the in-tree CUDA rasterizer does not build on ROCm without it
  • Compliance with the Tencent Hunyuan 3D 2.1 Community License (license: other / tencent-hunyuan-community, not Apache 2.0; excludes EU/UK/South Korea and restricts products over 1M MAU)

What You'll Build

A working image-to-3D shape pipeline using Tencent's Hunyuan3D-2.1 on a 24 GB Radeon RX 7900 XTX (RDNA3, Navi 31, gfx1100) through the ROCm stack: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). On AMD this is a deliberately shape-only recipe, and not only because of VRAM — the texture stage is a separate, fragile path on ROCm (see the boxes below). The mesh stage runs reliably; the colour stage does not.

Hardware data: RX 7900 XTX (24GB VRAM) · ~10GB peak for shape generation · ROCm + community custom_rasterizer wheel · See benchmark data

⚠️ This is a ROCm recipe, not CUDA, and it is fragile. The RX 7900 XTX runs on AMD's ROCm/HIP stack: there is no cu124/cu128 wheel, no xformers, no FlashAttention build, and no FP8/FP4 path here (RDNA3's WMMA units accept FP16, BF16, INT8, INT4 only). The deciding obstacle on AMD is Hunyuan3D-2.1's custom_rasterizer extension, which ships as a CUDA kernel and does not compile on ROCm out of the box. The community repo dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux, whose author runs this exact card (an "RX 7900 XTX"), solves it by shipping a prebuilt custom_rasterizer wheel plus a hard torch 2.7.0 pin (torch 2.8 is reported broken). This is community tooling, version-pinned and unofficial; set your expectations accordingly.

⚠️ Texture generation is broken on ROCm — shape only. Two independent reasons keep texture off this card. (1) VRAM: the official Hunyuan3D-2.1 README states "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total" — the combined 29GB peak exceeds the 7900 XTX's 24GB. (2) Corruption: even when texture fits, it comes out corrupted on ROCm. AMD's own tracker, ROCm/ROCm Issue #5981, reports "texture corruption observed with Hunyuan3D-2.1 with ROCm ... This is not observed with nVidia GPUs," notes that "mesh generation works correctly; only textures are corrupted," and that the reporter "tested this with ROCm versions all the way from 6.4.0 to 7.2.0 and this issue persists" (so ROCm 7.1 is squarely inside the affected range). Generate the mesh here and texture it downstream on NVIDIA / rented hardware or in a DCC tool.
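The VRAM half of that argument is plain subtraction, sketched below as a minimal example. The three peak figures are the ones quoted from the official README above; the names `STAGE_PEAK_GB` and `headroom_gb` are hypothetical, and the calculation ignores whatever the desktop or other processes already hold on the card.

```python
# Peak VRAM per stage in GB, as quoted from the official Hunyuan3D-2.1 README.
STAGE_PEAK_GB = {
    "shape": 10,
    "texture": 21,
    "shape+texture": 29,
}

CARD_VRAM_GB = 24  # RX 7900 XTX


def headroom_gb(stage: str, card_gb: int = CARD_VRAM_GB) -> int:
    """Return the VRAM (GB) left after the stage's peak; negative means it does not fit."""
    return card_gb - STAGE_PEAK_GB[stage]


for stage in STAGE_PEAK_GB:
    free = headroom_gb(stage)
    verdict = "fits" if free >= 0 else "does not fit"
    print(f"{stage:>14}: {verdict} ({free:+d} GB headroom)")
```

On paper the texture stage alone would fit with 3 GB to spare, which is why the corruption bug, not memory, is the reason it stays off this card.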

ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3d vertical; the shape stage covered here outputs an untextured mesh — colour/material is the separate, broken-on-ROCm texture stage that is out of scope on this card.

Requirements

Component | Minimum | Tested
GPU | 10GB VRAM (shape pipeline) | RX 7900 XTX (24GB, gfx1100)
Driver | AMD ROCm on Linux (community repo author used ROCm 6.4.1) |
RAM | 16GB |
Storage | ~7GB for the shape DiT checkpoint (~15GB if you also pull the texture folders you won't use) |
Software | Python 3.10, PyTorch 2.7.0 for ROCm, community prebuilt custom_rasterizer wheel |

Installation

The weights are not gated on Hugging Face — they download freely on the first pipeline call via huggingface_hub. There is, however, a license you must comply with before deploying: see step 4.

1. Install PyTorch for ROCm — pin torch 2.7.0

The community AMD repo is explicit that torch 2.7.0 is the known-good version and torch 2.8 does not work. Verbatim from its README: "Current Torch 2.8 doesn't work. Previous version 2.8.0.dev20250525+rocm6.4 used to work but it is no longer available. With torch 2.7.0 version it works." The repo author "used ROCM 6.4.1 but maybe it can work with other versions." Install the ROCm build of torch 2.7.0:

# Pin torch 2.7.0 (ROCm build) together with its matching torchvision/torchaudio releases.
# The rocmX.Y tag moves over time, and the live selector only lists the newest release, so
# check pytorch.org/get-started/previous-versions for the 2.7.0 command. Do NOT use torch 2.8.
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/rocm6.3

ℹ️ Verify the ROCm wheel tag before copying. The whl/rocmX.Y index tag (6.3 → 6.4 → 7.x) moves with each PyTorch release; the load-bearing constraint here is the torch version (2.7.0), not a specific ROCm tag — pick whichever ROCm wheel index currently carries a 2.7.0 build. Confirm afterward with python -c "import torch; print(torch.__version__)" (expect a 2.7.0+rocm... suffix) and torch.cuda.is_available() returning True (ROCm masquerades as the cuda device namespace under HIP).
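The two checks described above can be bundled into one small script. This is a sketch under stated assumptions, not part of either repo: `parse_torch_version` and `is_expected_build` are hypothetical helper names, and the only torch attributes touched are `torch.__version__`, `torch.version.hip`, and `torch.cuda.is_available()`.

```python
import re

PINNED = (2, 7, 0)  # the torch release this recipe pins


def parse_torch_version(version: str):
    """Split '2.7.0+rocm6.3' into ((2, 7, 0), 'rocm6.3'); the tag is '' if absent."""
    release, _, local = version.partition("+")
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised torch version: {version!r}")
    return tuple(int(part) for part in match.groups()), local


def is_expected_build(version: str) -> bool:
    """True only for a 2.7.0 release whose local tag starts with 'rocm'."""
    release, local = parse_torch_version(version)
    return release == PINNED and local.startswith("rocm")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch version :", torch.__version__)
        print("expected build:", is_expected_build(torch.__version__))
        print("HIP runtime   :", torch.version.hip)  # None on non-ROCm builds
        print("GPU visible   :", torch.cuda.is_available())
```

Running it again after any later `pip install` is a cheap way to notice that a dependency has silently replaced the pinned build.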

2. Get the community AMD repo and its prebuilt custom_rasterizer wheel

This is the step that makes Hunyuan3D-2.1 work on an AMD GPU at all. The shape pipeline depends on the custom_rasterizer extension, whose stock build is a CUDA kernel that fails on ROCm. The community repo dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux ships prebuilt wheels for it — per its README, "i provide python wheels compiled for python version 3.10 and 3.13." The wheels are committed in the repo's wheels/ directory (e.g. custom_rasterizer-0.1-py310-none-manylinux_2_39_x86_64.whl for Python 3.10, and a py313 variant). The author built and runs them on an RX 7900 XTX, so gfx1100 is the validated target.

git clone https://github.com/dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux.git
cd Hunyuan3d-2-for-AMDGPU-linux

The repo provides an INSTALL.sh driver that wires everything up. Per the README, "you will be asked to select which python version you want to use" (match the wheel: 3.10 or 3.13); the script also asks for a port for the bundled Gradio app:

./INSTALL.sh

If you prefer to wire the wheel into your own Hunyuan3D-2.1 checkout instead of using the script, install the prebuilt custom_rasterizer wheel directly with pip — pick the one matching your Python version:

# from the cloned community repo
pip install wheels/custom_rasterizer-0.1-py310-none-manylinux_2_39_x86_64.whl

This replaces the in-tree CUDA custom_rasterizer build step from the official README — do not run the upstream custom_rasterizer python setup.py install / compile_mesh_painter.sh build on AMD; that is the CUDA path that fails on ROCm and the reason the prebuilt wheel exists.
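Picking the wheel by hand is easy to get wrong when several interpreters are installed, so here is a minimal sketch that derives the filename from the running Python. The helper name `rasterizer_wheel` is hypothetical, and the py313 filename is an assumption extrapolated from the py310 name shown above; confirm it against the repo's wheels/ directory before relying on it.

```python
import sys

# Python versions the community README says it ships wheels for.
SUPPORTED = {(3, 10), (3, 13)}


def rasterizer_wheel(major: int, minor: int) -> str:
    """Return the wheels/ path for this interpreter, following the py310 filename pattern."""
    if (major, minor) not in SUPPORTED:
        raise ValueError(f"no prebuilt custom_rasterizer wheel for Python {major}.{minor}")
    # Assumption: every wheel follows the same naming scheme as the py310 one.
    return f"wheels/custom_rasterizer-0.1-py{major}{minor}-none-manylinux_2_39_x86_64.whl"


if __name__ == "__main__":
    try:
        print("pip install", rasterizer_wheel(*sys.version_info[:2]))
    except ValueError as err:
        print(err)
```

Failing loudly on an unsupported interpreter is deliberate: a wheel built for another Python version will not install cleanly, and falling back to the CUDA source build is the path this recipe avoids.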

3. Install the remaining dependencies

Install the rest of the requirements (the community repo carries its own requirements.txt; if you are working from the official Tencent checkout instead, use that one). Do not install xformers or a flash-attn wheel — neither is the right path on RDNA3. Hunyuan3D-2.1's shape diffusion runs on PyTorch's default SDPA (scaled_dot_product_attention), which needs no extra install and is the supported attention backend on ROCm.

pip install -r requirements.txt
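Because a requirements file can pull in xformers, flash-attn, or a different torch without announcing it, a quick pre-install audit may be worth running. The sketch below is a rough line scanner, not a full requirements parser; `audit_requirements`, `BLOCKED`, and `TORCH_PIN` are hypothetical names, and entries with extras or environment markers may be flagged more eagerly than necessary.

```python
import re
from pathlib import Path

# Packages this recipe says to avoid on RDNA3, plus the torch pin to protect.
BLOCKED = {"xformers", "flash-attn", "flash_attn"}
TORCH_PIN = "2.7.0"


def audit_requirements(lines):
    """Return (line, reason) pairs for entries that conflict with this recipe."""
    findings = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as --index-url
        match = re.match(r"([A-Za-z0-9_.\-]+)\s*(.*)", line)
        if match is None:
            continue  # not a plain "name spec" entry
        name, spec = match.group(1).lower(), match.group(2).strip()
        if name in BLOCKED:
            findings.append((line, "not recommended on RDNA3; SDPA is used instead"))
        elif name == "torch" and spec != f"=={TORCH_PIN}":
            findings.append((line, f"torch is not pinned to {TORCH_PIN}"))
    return findings


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for line, reason in audit_requirements(path.read_text().splitlines()):
            print(f"{line}: {reason}")
```

An empty result means the file does not obviously fight the pin; it does not prove that a transitive dependency will leave torch alone, so the version check from step 1 is still worth repeating after the install.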

4. Read the Tencent license before you deploy

The weights are governed by the Tencent Hunyuan 3D 2.1 Community License — the HF card lists license: tencent-hunyuan-community (license: other), not Apache 2.0. The license header states it "DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA" and requires a separate license from Tencent for any product whose monthly active users exceed 1 million. The free, ungated download does not waive these terms — read the LICENSE in full before deploying anything user-facing.

Running

Run only the shape pipeline — do not instantiate the texture (Hunyuan3DPaintPipeline) path on this card; it both overflows 24GB combined and produces corrupted output on ROCm (see the boxes above). On the first call the shape DiT checkpoint downloads automatically from tencent/Hunyuan3D-2.1.

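import sys

# Make the in-repo hy3dshape package importable; run this from the repo root.
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

# Shape stage only. Loading the pipeline downloads the shape DiT checkpoint on first use.
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')

# The call returns a sequence of meshes; take the first and write it as binary glTF.
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
mesh_untextured.export('output.glb')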

For best results, give the pipeline a clean cutout — the repo ships a BackgroundRemover helper in hy3dshape/rembg.py you can run on the input image first. The pipeline outputs an untextured .glb mesh (glTF 2.0 binary); open the result in Blender, three.js, or any glTF 2.0 viewer.
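Before opening the file in a viewer, it can help to confirm that the export wrote a complete binary glTF container. The sketch below uses only the standard library and reads the 12-byte GLB header (the ASCII magic `glTF`, a container version, and the total byte length, all little-endian); `check_glb` is a hypothetical helper name, and the check says nothing about the quality of the geometry inside.

```python
import struct
from pathlib import Path

GLB_MAGIC = b"glTF"                  # first four bytes of every binary glTF file
GLB_HEADER = struct.Struct("<4sII")  # magic, container version, total length in bytes


def check_glb(path) -> dict:
    """Read the 12-byte GLB header and compare its declared length with the file size."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(GLB_HEADER.size)
    if len(header) < GLB_HEADER.size:
        raise ValueError("file is too short to be a GLB")
    magic, version, declared = GLB_HEADER.unpack(header)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic; not a binary glTF file")
    actual = path.stat().st_size
    return {
        "version": version,
        "declared_bytes": declared,
        "actual_bytes": actual,
        "complete": declared == actual,  # False suggests a truncated write
    }


if __name__ == "__main__":
    target = Path("output.glb")
    if target.exists():
        print(check_glb(target))
```

A version of 2 with `complete` set to True indicates that the container is intact; a mismatch between declared and actual size usually points to an interrupted export.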

The community repo also bundles a gradio_app.py UI (the port you chose during ./INSTALL.sh). Use it for the shape path; leave texture generation disabled.

Results

  • VRAM usage: ~10GB peak for shape generation, cited verbatim from the official Hunyuan3D-2.1 README ("It takes 10 GB VRAM for shape generation"). The 3.3B shape DiT ships as a single ~6.9GB FP16 checkpoint; the ~10GB figure accounts for the VAE and activations on top. This fits the 7900 XTX's 24GB with large headroom, so VRAM is not the constraint on this card; the ROCm texture-corruption bug is. For empirical numbers on this exact model and GPU pairing once a community submission lands, see /check/hunyuan-3d/rx-7900-xtx.
  • Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RX 7900 XTX benchmark for Hunyuan3D-2.1 shape generation was found that could be verified on a source page. Quoting a number from a different card or architecture would mislead. We route timings to /contribute and /check/hunyuan-3d/rx-7900-xtx.
  • Output format: .glb (glTF 2.0 binary). Universal: import to Blender, Unity, Unreal, or three.js, or convert to .obj via trimesh in Python. trimesh has no FBX exporter, so produce .fbx from a DCC tool such as Blender instead.
  • Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject. The shape stage produces watertight geometry suitable for greyboxing, retopology sources, and downstream texturing in a DCC tool. The mesh is the trustworthy output on AMD; do not rely on on-device texture.

For the full benchmark data, see /check/hunyuan-3d/rx-7900-xtx.

Troubleshooting

custom_rasterizer won't build / CUDA errors on import

Do not build the upstream CUDA custom_rasterizer on AMD — it is a CUDA kernel and fails on ROCm. That is the entire reason this recipe routes through dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux, which ships a prebuilt wheel (custom_rasterizer-0.1-py3XX-none-manylinux_2_39_x86_64.whl). Install that wheel matching your Python version (3.10 or 3.13) rather than compiling.

Torch errors / pipeline crashes after a torch upgrade

Re-pin torch 2.7.0. The community repo is explicit: "Current Torch 2.8 doesn't work ... With torch 2.7.0 version it works." If a dependency bumped torch to 2.8, reinstall the 2.7.0 ROCm build, which is the release line the prebuilt custom_rasterizer wheel was built against.

Textures come out corrupted (garbage colours / noise)

Expected on ROCm, and the reason this recipe is shape-only. Per ROCm/ROCm Issue #5981, Hunyuan3D-2.1 texture output is corrupted on ROCm across versions 6.4.0–7.2.0 while "mesh generation works correctly; only textures are corrupted" — and it is not observed on NVIDIA. There is no on-device fix today: take the untextured .glb from the shape stage and texture it downstream (on NVIDIA / rented A100/L40S-class hardware, or in a DCC tool like Substance Painter).

Texture generation fails outright when an iGPU is also present

Separate from the corruption bug: the community repo notes "Texture generation fails when integrated+dedicated gpus are present. Disabling it in bios fixed this issue." If you were attempting the texture path and hit a hard failure (not corruption) on a system with both an integrated GPU and the discrete 7900 XTX, disabling the iGPU in BIOS is the documented workaround. Texture output is still corrupted on ROCm regardless, so shape-only remains the recommendation.

Do not install xformers or FlashAttention

HF/CUDA-oriented guides often suggest pip install xformers or a FlashAttention wheel. On RDNA3 these are the wrong path: the ROCm xformers fork is limited (no FP32, head-dim ≤256) and consumer-card FlashAttention builds are unreliable on gfx1100. Hunyuan3D-2.1's shape diffusion runs on PyTorch SDPA, which is the supported attention backend here — no extra install needed.

For texture and PBR options that exceed this card's reliable envelope, see /check/hunyuan-3d/rx-7900-xtx and contribute your results via /contribute.

common questions
How much VRAM does Hunyuan3D need?

About 10 GB peak for shape generation, which is the only stage this recipe runs.

Which GPUs is Hunyuan3D tested on?

RX 7900 XTX (24 GB).

How hard is this setup?

Advanced — follow the steps above.