What You'll Build
A working image-to-3D shape pipeline using Tencent's Hunyuan3D-2.1 on a 24 GB Radeon RX 7900 XTX (RDNA3, Navi 31, gfx1100) through the ROCm stack: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). On AMD this is a deliberately shape-only recipe, and not only because of VRAM — the texture stage is a separate, fragile path on ROCm (see the boxes below). The mesh stage runs reliably; the colour stage does not.
Hardware data: RX 7900 XTX (24GB VRAM) · ~10GB peak for shape generation · ROCm + community custom_rasterizer wheel · See benchmark data
⚠️ This is a ROCm recipe, not CUDA, and it is fragile. The RX 7900 XTX runs on AMD's ROCm/HIP stack: there is no cu124/cu128 wheel, no xformers, no FlashAttention build, and no FP8/FP4 path here (RDNA3's WMMA units accept FP16, BF16, INT8, INT4 only). The deciding obstacle on AMD is Hunyuan3D-2.1's custom_rasterizer extension, which ships as a CUDA kernel and does not compile on ROCm out of the box. The community repo dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux, whose author runs this exact card (an "RX 7900 XTX"), solves it by shipping a prebuilt custom_rasterizer wheel, plus a hard torch 2.7.0 pin (torch 2.8 is reported broken). This is community tooling, version-pinned and unofficial; set your expectations accordingly.
⚠️ Texture generation is broken on ROCm — shape only. Two independent reasons keep texture off this card. (1) VRAM: the official Hunyuan3D-2.1 README states "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total" — the combined 29GB peak exceeds the 7900 XTX's 24GB. (2) Corruption: even when texture fits, it comes out corrupted on ROCm. AMD's own tracker, ROCm/ROCm Issue #5981, reports "texture corruption observed with Hunyuan3D-2.1 with ROCm ... This is not observed with nVidia GPUs," notes that "mesh generation works correctly; only textures are corrupted," and that the reporter "tested this with ROCm versions all the way from 6.4.0 to 7.2.0 and this issue persists" (so ROCm 7.1 is squarely inside the affected range). Generate the mesh here and texture it downstream on NVIDIA / rented hardware or in a DCC tool.
ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3d vertical; the shape stage covered here outputs an untextured mesh. Colour and material come from the separate texture stage, which is broken on ROCm and out of scope on this card.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 10GB VRAM (shape pipeline) | RX 7900 XTX (24GB, gfx1100) |
| Driver | AMD ROCm on Linux (community repo author used ROCm 6.4.1) | — |
| RAM | 16GB | — |
| Storage | ~7GB for the shape DiT checkpoint (~15GB if you also pull the texture folders you won't use) | — |
| Software | Python 3.10, PyTorch 2.7.0 for ROCm, community prebuilt custom_rasterizer wheel | — |
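The VRAM reasoning behind the shape-only choice is plain arithmetic. A minimal sketch, using the three figures quoted from the official README (10, 21 and 29 GB) and the card's nominal 24 GB; the names `STAGE_VRAM_GB` and `fits` are illustrative, and usable VRAM on a live desktop is somewhat below nominal.

```python
# Peak VRAM per stage in GB, as quoted from the official Hunyuan3D-2.1 README.
STAGE_VRAM_GB = {"shape": 10, "texture": 21, "shape+texture": 29}

CARD_VRAM_GB = 24  # RX 7900 XTX nominal capacity


def fits(stage: str, card_gb: float = CARD_VRAM_GB) -> bool:
    """Return True if the quoted peak for `stage` is within the card's nominal VRAM."""
    return STAGE_VRAM_GB[stage] <= card_gb


for stage, need in STAGE_VRAM_GB.items():
    verdict = "fits" if fits(stage) else "does not fit"
    print(f"{stage}: {need} GB peak, {verdict} in {CARD_VRAM_GB} GB")
```

Note that texture alone passes this check on paper; it is the ROCm corruption bug, not capacity, that rules it out.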
Installation
The weights are not gated on Hugging Face — they download freely on the first pipeline call via huggingface_hub. There is, however, a license you must comply with before deploying: see step 4.
1. Install PyTorch for ROCm — pin torch 2.7.0
The community AMD repo is explicit that torch 2.7.0 is the known-good version and torch 2.8 does not work. Verbatim from its README: "Current Torch 2.8 doesn't work. Previous version 2.8.0.dev20250525+rocm6.4 used to work but it is no longer available. With torch 2.7.0 version it works." The repo author "used ROCM 6.4.1 but maybe it can work with other versions." Install the ROCm build of torch 2.7.0:
# Pin torch 2.7.0 (ROCm build). The rocmX.Y tag moves over time — read the live
# selector at pytorch.org/get-started/locally and match a 2.7.0 wheel. Do NOT use torch 2.8.
pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
ℹ️ Verify the ROCm wheel tag before copying. The whl/rocmX.Y index tag (6.3 → 6.4 → 7.x) moves with each PyTorch release; the load-bearing constraint here is the torch version (2.7.0), not a specific ROCm tag, so pick whichever ROCm wheel index currently carries a 2.7.0 build. Confirm afterward with python -c "import torch; print(torch.__version__)" (expect a 2.7.0+rocm... suffix) and check that torch.cuda.is_available() returns True (ROCm masquerades as the cuda device namespace under HIP).
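The post-install check can be scripted so it is easy to rerun after any dependency change. A minimal sketch, assuming the pin is exactly 2.7.0 as the community README states; `check_torch_build` is a hypothetical helper name, and it only inspects `torch.__version__` and `torch.version.hip` (the latter is `None` on non-ROCm builds).

```python
import re


def check_torch_build(version: str, hip_version) -> list:
    """Return a list of problems with a torch build; an empty list means it looks right.

    `version` is torch.__version__ and `hip_version` is torch.version.hip
    (None on CUDA or CPU-only builds).
    """
    problems = []
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None or match.groups() != ("2", "7", "0"):
        problems.append(f"expected torch 2.7.0, found {version}")
    if "+rocm" not in version or hip_version is None:
        problems.append("not a ROCm build (no +rocm suffix or torch.version.hip is None)")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        for problem in check_torch_build(torch.__version__, torch.version.hip):
            print("WARNING:", problem)
        # Under HIP, the GPU is exposed through the torch.cuda namespace.
        print("GPU visible:", torch.cuda.is_available())
```

An empty problem list plus `GPU visible: True` is the state the rest of this recipe assumes.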
2. Get the community AMD repo and its prebuilt custom_rasterizer wheel
This is the step that makes Hunyuan3D-2.1 work on an AMD GPU at all. The shape pipeline depends on the custom_rasterizer extension, whose stock build is a CUDA kernel that fails on ROCm. The community repo dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux ships prebuilt wheels for it — per its README, "i provide python wheels compiled for python version 3.10 and 3.13." The wheels are committed in the repo's wheels/ directory (e.g. custom_rasterizer-0.1-py310-none-manylinux_2_39_x86_64.whl for Python 3.10, and a py313 variant). The author built and runs them on an RX 7900 XTX, so gfx1100 is the validated target.
git clone https://github.com/dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux.git
cd Hunyuan3d-2-for-AMDGPU-linux
The repo provides an INSTALL.sh driver that wires everything up. Per the README, "you will be asked to select which python version you want to use" (match the wheel — 3.10 or 3.13) and a port for the bundled Gradio app:
./INSTALL.sh
If you prefer to wire the wheel into your own Hunyuan3D-2.1 checkout instead of using the script, install the prebuilt custom_rasterizer wheel directly with pip — pick the one matching your Python version:
# from the cloned community repo
pip install wheels/custom_rasterizer-0.1-py310-none-manylinux_2_39_x86_64.whl
This replaces the in-tree CUDA custom_rasterizer build step from the official README — do not run the upstream custom_rasterizer python setup.py install / compile_mesh_painter.sh build on AMD; that is the CUDA path that fails on ROCm and the reason the prebuilt wheel exists.
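Picking the wheel by hand is error-prone if you switch interpreters. A minimal sketch that maps the running Python version to a wheel path; `wheel_for` is a hypothetical helper, and the py313 filename is an assumption derived from the py310 name above (check the repo's wheels/ directory for the actual file). The manylinux_2_39 tag in the filename also implies a glibc 2.39 or newer system.

```python
import sys

# Python versions the community repo's README says it ships wheels for.
SUPPORTED = {(3, 10), (3, 13)}


def wheel_for(version_info=sys.version_info) -> str:
    """Return the assumed wheel path under wheels/ for the given interpreter version."""
    key = (version_info[0], version_info[1])
    if key not in SUPPORTED:
        raise RuntimeError(
            f"No prebuilt custom_rasterizer wheel for Python {key[0]}.{key[1]}; "
            "the repo provides 3.10 and 3.13 only."
        )
    # Filename pattern assumed from the py310 wheel named in the README.
    tag = f"py{key[0]}{key[1]}"
    return f"wheels/custom_rasterizer-0.1-{tag}-none-manylinux_2_39_x86_64.whl"


if __name__ == "__main__":
    try:
        print("pip install", wheel_for())
    except RuntimeError as err:
        print(err)
```

Run it from the cloned community repo so the relative wheels/ path resolves.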
3. Install the remaining dependencies
Install the rest of the requirements (the community repo carries its own requirements.txt; if you are working from the official Tencent checkout instead, use that one). Do not install xformers or a flash-attn wheel — neither is the right path on RDNA3. Hunyuan3D-2.1's shape diffusion runs on PyTorch's default SDPA (scaled_dot_product_attention), which needs no extra install and is the supported attention backend on ROCm.
pip install -r requirements.txt
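A requirements file or an older environment can pull in the attention packages this step says to avoid. A minimal sketch that reports whether they are importable; `UNWANTED` and `installed` are illustrative names, and `flash_attn` is the import name of the flash-attn package.

```python
import importlib.util

# Import names of the packages this recipe says not to install on RDNA3.
UNWANTED = ("xformers", "flash_attn")


def installed(names=UNWANTED) -> list:
    """Return the subset of `names` that can be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


found = installed()
if found:
    print("Consider uninstalling:", ", ".join(found))
else:
    print("Neither xformers nor flash_attn is importable, as this recipe expects.")
```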
4. Read the Tencent license before you deploy
The weights are governed by the Tencent Hunyuan 3D 2.1 Community License — the HF card lists license: tencent-hunyuan-community (license: other), not Apache 2.0. The license header states it "DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA" and requires a separate license from Tencent for any product whose monthly active users exceed 1 million. The free, ungated download does not waive these terms — read the LICENSE in full before deploying anything user-facing.
Running
Run only the shape pipeline — do not instantiate the texture (Hunyuan3DPaintPipeline) path on this card; it both overflows 24GB combined and produces corrupted output on ROCm (see the boxes above). On the first call the shape DiT checkpoint downloads automatically from tencent/Hunyuan3D-2.1.
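import sys
# Make the repo's hy3dshape package importable; run this from the repo root.
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline
# The first call downloads the shape DiT checkpoint from Hugging Face.
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')
# The pipeline returns a list; the first element is the generated mesh.
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
mesh_untextured.export('output.glb')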
For best results, give the pipeline a clean cutout — the repo ships a BackgroundRemover helper in hy3dshape/rembg.py you can run on the input image first. The pipeline outputs an untextured .glb mesh (glTF 2.0 binary); open the result in Blender, three.js, or any glTF 2.0 viewer.
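Before opening the result in a viewer, you can confirm the export really is a glTF 2.0 binary. A minimal sketch using only the standard library: a GLB file starts with a 12-byte little-endian header (the ASCII magic `glTF`, a container version, and the total file length). The function names are illustrative, and this checks the container header only, not the mesh contents.

```python
import struct

GLB_MAGIC = b"glTF"  # first four bytes of every binary glTF file


def read_glb_header(data: bytes) -> dict:
    """Parse the 12-byte GLB header: magic, container version, total length."""
    if len(data) < 12:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack("<4sII", data[:12])
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic bytes")
    return {"version": version, "declared_length": length}


def looks_like_glb2(path: str) -> bool:
    """True if the file is a glTF 2.0 binary whose declared length matches its size."""
    with open(path, "rb") as fh:
        data = fh.read()
    header = read_glb_header(data)
    return header["version"] == 2 and header["declared_length"] == len(data)
```

Calling `looks_like_glb2("output.glb")` after the export step is a quick sanity check; a truncated or interrupted write typically shows up as a length mismatch.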
The community repo also bundles a gradio_app.py UI (the port you chose during ./INSTALL.sh). Use it for the shape path; leave texture generation disabled.
Results
- VRAM usage: ~10GB peak for shape generation, cited verbatim from the official Hunyuan3D-2.1 README ("It takes 10 GB VRAM for shape generation"). The 3.3B shape DiT ships as a single ~6.9GB FP16 checkpoint; the ~10GB figure accounts for the VAE and activations on top. This fits the 7900 XTX's 24GB with large headroom — VRAM is not the constraint on this card; the ROCm texture-corruption bug is. For empirical numbers on this exact GPU pair once a community submission lands, see /check/hunyuan-3d/rx-7900-xtx.
- Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RX 7900 XTX benchmark for Hunyuan3D-2.1 shape generation was found that could be verified on a source page. Quoting a number from a different card or architecture would mislead. We route timings to /contribute and /check/hunyuan-3d/rx-7900-xtx.
- Output format: .glb (glTF 2.0 binary). Universal: import to Blender, Unity, Unreal, three.js, or convert to .obj via trimesh in Python (.fbx is not a trimesh export format; use a DCC tool such as Blender for that).
- Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject. The shape stage produces watertight geometry suitable for greyboxing, retopology sources, and downstream texturing in a DCC tool. The mesh is the trustworthy output on AMD; do not rely on on-device texture.
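Since no verified timing exists for this card, a consistent measurement method matters if you plan to contribute one. A minimal sketch of a wall-clock harness; `time_runs` is a hypothetical helper, and the warmup run is there so that checkpoint download and first-call initialisation do not pollute the samples.

```python
import statistics
import time


def time_runs(fn, repeats: int = 3, warmup: int = 1) -> dict:
    """Call `fn` `warmup` times untimed, then `repeats` times timed; report seconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

With the pipeline from the Running section loaded, `time_runs(lambda: shape_pipeline(image='assets/demo.png'))` would give a median over three generations. If you also want peak VRAM, PyTorch exposes `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`, which under ROCm report for the HIP device; treat those readings as allocator-level figures rather than total card usage.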
For the full benchmark data, see /check/hunyuan-3d/rx-7900-xtx.
Troubleshooting
custom_rasterizer won't build / CUDA errors on import
Do not build the upstream CUDA custom_rasterizer on AMD — it is a CUDA kernel and fails on ROCm. That is the entire reason this recipe routes through dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux, which ships a prebuilt wheel (custom_rasterizer-0.1-py3XX-none-manylinux_2_39_x86_64.whl). Install that wheel matching your Python version (3.10 or 3.13) rather than compiling.
Torch errors / pipeline crashes after a torch upgrade
Re-pin torch 2.7.0. The community repo is explicit: "Current Torch 2.8 doesn't work ... With torch 2.7.0 version it works." If a dependency bumped torch to 2.8, reinstall torch 2.7.0 (ROCm build) and then reinstall the prebuilt custom_rasterizer wheel, which was built against that release line.
Textures come out corrupted (garbage colours / noise)
Expected on ROCm, and the reason this recipe is shape-only. Per ROCm/ROCm Issue #5981, Hunyuan3D-2.1 texture output is corrupted on ROCm across versions 6.4.0–7.2.0 while "mesh generation works correctly; only textures are corrupted" — and it is not observed on NVIDIA. There is no on-device fix today: take the untextured .glb from the shape stage and texture it downstream (on NVIDIA / rented A100/L40S-class hardware, or in a DCC tool like Substance Painter).
Texture generation fails outright when an iGPU is also present
Separate from the corruption bug: the community repo notes "Texture generation fails when integrated+dedicated gpus are present. Disabling it in bios fixed this issue." If you were attempting the texture path and hit a hard failure (not corruption) on a system with both an integrated and the discrete 7900 XTX, disabling the iGPU in BIOS is the documented workaround — but note texture output is still corrupted on ROCm regardless, so shape-only remains the recommendation.
Do not install xformers or FlashAttention
HF/CUDA-oriented guides often suggest pip install xformers or a FlashAttention wheel. On RDNA3 these are the wrong path: the ROCm xformers fork is limited (no FP32, head-dim ≤256) and consumer-card FlashAttention builds are unreliable on gfx1100. Hunyuan3D-2.1's shape diffusion runs on PyTorch SDPA, which is the supported attention backend here — no extra install needed.
For texture and PBR options that exceed this card's reliable envelope, see /check/hunyuan-3d/rx-7900-xtx and contribute your results via /contribute.