Hunyuan3D-2.1 on RX 7800 XT: Image-to-Mesh on ROCm (Shape-Only)

3d · advanced · 10GB+ VRAM · Jun 19, 2026

This advanced recipe sets up Hunyuan3D-2.1 shape generation on the RX 7800 XT via ROCm; the shape stage needs about 10 GB of VRAM.

prerequisites
  • AMD Radeon RX 7800 XT (16 GB VRAM, RDNA3 / Navi 32 / gfx1101) on Linux with the AMD ROCm stack
  • Python 3.10 (the version the community prebuilt custom_rasterizer wheel targets; a 3.13 wheel also exists)
  • PyTorch built for ROCm, pinned to torch 2.7.0 — torch 2.8 is reported broken by the community AMD repo
  • A prebuilt custom_rasterizer wheel from dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux — the in-tree CUDA rasterizer does not build on ROCm without it; the published wheel ships native gfx1101 kernels
  • Compliance with the Tencent Hunyuan 3D 2.1 Community License (license: other / tencent-hunyuan-community, not Apache 2.0; excludes EU/UK/South Korea and restricts products over 1M MAU)

What You'll Build

A working image-to-3D shape pipeline using Tencent's Hunyuan3D-2.1 on a 16 GB Radeon RX 7800 XT (RDNA3, Navi 32, gfx1101) through the ROCm stack: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). On AMD this is a deliberately shape-only recipe — the shape stage's ~10 GB peak fits the 16 GB card comfortably, but the texture stage does not (its 21 GB and 29 GB peaks both overflow 16 GB; see the boxes below). The mesh stage is the trustworthy output on this card.

Hardware data: RX 7800 XT (16GB VRAM) · ~10GB peak for shape generation · ROCm + prebuilt custom_rasterizer wheel (native gfx1101 kernels) · See benchmark data

⚠️ This is a ROCm recipe, not CUDA — and it is fragile. The RX 7800 XT runs on AMD's ROCm/HIP stack: there is no cu124/cu128 wheel, no xformers, no FlashAttention build, and no FP8/FP4 path here (RDNA3's WMMA units accept FP16, BF16, INT8, INT4 only). The deciding obstacle on AMD is Hunyuan3D-2.1's custom_rasterizer extension, which ships as a CUDA kernel and does not compile on ROCm out of the box. The community repo dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux solves it by shipping a prebuilt custom_rasterizer wheel, plus a hard torch 2.7.0 pin (torch 2.8 is reported broken). This is community tooling, version-pinned and unofficial. Frame your expectations accordingly.

⚠️ Built for the 7900 XTX (gfx1100) — but the wheel ships native gfx1101 kernels too. The community repo author ran an RX 7900 XTX and explicitly warns "if you own other models further modifications may be needed." The 7800 XT is a different RDNA3 die — Navi 32, gfx1101 — and the repo's INSTALL.sh hardcodes gpu="gfx1100" / GPU_ARCHS="gfx1100", which targets the XTX, not your card. The good news, verified by inspecting the published wheel's compiled kernel directly: the prebuilt custom_rasterizer .so is a HIP fat binary that embeds native code objects for gfx1101 (the 7800 XT's target) alongside gfx1100, gfx1102 and the rest of the RDNA2/3/4 lineup. HIP's loader auto-selects the matching gfx1101 code object at runtime, so the rasterizer kernel runs natively on the 7800 XT with no recompile. You only need to change the GPU_ARCHS env var to gfx1101 so the rest of the runtime targets your card (the steps below do this). If any other ROCm component still ships gfx1100-only kernels, the documented Linux fallback is HSA_OVERRIDE_GFX_VERSION=11.0.0, which masquerades gfx1101 as gfx1100 — see Troubleshooting.

⚠️ Texture generation does not fit 16 GB — shape only. The official Hunyuan3D-2.1 README states "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total." Both the 21 GB texture-only peak and the 29 GB combined peak exceed this card's 16 GB by a wide margin — texture is out of reach on the 7800 XT for VRAM reasons alone. (Separately, texture output on ROCm has a corruption history; see the box below for the current, narrower picture.) Generate the mesh here and texture it downstream on NVIDIA / rented hardware or in a DCC tool.
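
The fit arithmetic behind that warning can be sketched in a few lines. This is only an illustration: the three peak figures are the ones quoted from the official README, while the helper name `vram_headroom` is made up for this example and is not part of any Hunyuan3D code.

```python
# VRAM peaks in GB, as quoted from the official Hunyuan3D-2.1 README.
STAGE_PEAK_GB = {
    "shape": 10,
    "texture": 21,
    "shape+texture": 29,
}


def vram_headroom(card_gb: float) -> dict[str, float]:
    """Return card_gb minus each stage's peak; a negative value means no fit.

    Hypothetical helper for illustration only.
    """
    return {stage: card_gb - peak for stage, peak in STAGE_PEAK_GB.items()}


if __name__ == "__main__":
    # 16 GB is the RX 7800 XT's VRAM.
    for stage, headroom in vram_headroom(16).items():
        verdict = "fits" if headroom >= 0 else "does not fit"
        print(f"{stage}: {verdict} ({headroom:+.0f} GB)")
```

On a 16 GB card only the shape stage comes out with positive headroom, which is why the recipe stops at the mesh.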

ℹ️ ROCm texture-corruption status is now narrower than it was — but still doesn't help this card. AMD's tracker ROCm/ROCm Issue #5981 originally reported Hunyuan3D-2.1 texture corruption on ROCm (filed against an AMD R9700) that was "not observed with nVidia GPUs". The thread has since narrowed: a community tester (zichguan-amd, a non-maintainer per GitHub) re-ran it on a 7900 XTX, got what was confirmed as correct output, then reproduced corruption only on the gfx12 R9700 and concluded "This does appear to be a gfx12 specific issue" — i.e. RDNA4 (gfx1200/1201), not RDNA3 (gfx1100/gfx1101). So the corruption bug is not confirmed for the 7800 XT's gfx1101. It doesn't change the recommendation, though: texture still can't fit 16 GB regardless. Shape-only stands.

ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3d vertical; the shape stage covered here outputs an untextured mesh — colour/material is the separate texture stage that is out of scope on this card.

Requirements

Component | Minimum | Tested
GPU | 10GB VRAM (shape pipeline) | RX 7800 XT (16GB, gfx1101)
Driver | AMD ROCm on Linux (community repo author used ROCm 6.4.1) |
RAM | 16GB |
Storage | ~7GB for the shape DiT checkpoint (~15GB if you also pull the texture folders you won't use) |
Software | Python 3.10, PyTorch 2.7.0 for ROCm, prebuilt custom_rasterizer wheel (ships gfx1101 kernels) |

Installation

The weights are not gated on Hugging Face — they download freely on the first pipeline call via huggingface_hub. There is, however, a license you must comply with before deploying: see step 4.

1. Install PyTorch for ROCm — pin torch 2.7.0

The community AMD repo is explicit that torch 2.7.0 is the known-good version and torch 2.8 does not work. Verbatim from its README: "Current Torch 2.8 doesn't work. Previous version 2.8.0.dev20250525+rocm6.4 used to work but it is no longer available. With torch 2.7.0 version it works." The repo author used ROCm 6.4.1 but adds that "maybe it can work with other versions". Install the ROCm build of torch 2.7.0:

# Pin torch 2.7.0 (ROCm build). The rocmX.Y tag moves over time — read the live
# selector at pytorch.org/get-started/locally and match a 2.7.0 wheel. Do NOT use torch 2.8.
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/rocm6.3

ℹ️ Verify the ROCm wheel tag before copying. The whl/rocmX.Y index tag (6.3 → 6.4 → 7.x) moves with each PyTorch release; the load-bearing constraint here is the torch version (2.7.0), not a specific ROCm tag — pick whichever ROCm wheel index currently carries a 2.7.0 build. Confirm afterward with python -c "import torch; print(torch.__version__)" (expect a 2.7.0+rocm... suffix) and torch.cuda.is_available() returning True (ROCm masquerades as the cuda device namespace under HIP).
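
The two checks above can be wrapped into one small script. This is a minimal sketch, assuming the ROCm build reports a `+rocm` local-version suffix as described; `check_torch_version` is a hypothetical helper name, not a PyTorch API.

```python
import re

REQUIRED_TORCH = (2, 7, 0)  # the pin this recipe relies on


def check_torch_version(version: str) -> tuple[bool, bool]:
    """Return (matches_pin, is_rocm_build) for a torch.__version__ string.

    Hypothetical helper for illustration only.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return (False, False)
    release = tuple(int(part) for part in match.groups())
    # ROCm wheels carry a local version suffix such as "+rocm6.3".
    is_rocm = "+rocm" in version
    return (release == REQUIRED_TORCH, is_rocm)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        pinned, rocm = check_torch_version(torch.__version__)
        print(f"torch {torch.__version__}: pinned={pinned}, rocm_build={rocm}")
        # On ROCm the GPU is exposed through the torch.cuda namespace.
        print("GPU visible:", torch.cuda.is_available())
```

Running it after any dependency install is a cheap way to notice that pip silently moved torch off 2.7.0.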

2. Get the community AMD repo and its prebuilt custom_rasterizer wheel

This is the step that makes Hunyuan3D-2.1 work on an AMD GPU at all. The shape pipeline depends on the custom_rasterizer extension, whose stock build is a CUDA kernel that fails on ROCm. The community repo dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux ships prebuilt wheels for it — per its README, "i provide python wheels compiled for python version 3.10 and 3.13." The wheels live in the repo's wheels/ directory (custom_rasterizer-0.1-py310-none-manylinux_2_39_x86_64.whl for Python 3.10, and a py313 variant). Although the author built and ran them on an RX 7900 XTX, the compiled kernel inside the wheel is a HIP fat binary that includes native gfx1101 code objects — so the rasterizer runs on the 7800 XT without a recompile.

git clone https://github.com/dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux.git
cd Hunyuan3d-2-for-AMDGPU-linux

The repo provides an INSTALL.sh driver that wires everything up. Per the README, "You will be asked to select which python version you want to use" (match the wheel: 3.10 or 3.13); the installer also asks for a port for the bundled Gradio app:

./INSTALL.sh

⚠️ Re-target the arch to gfx1101 for the 7800 XT. INSTALL.sh hardcodes gpu="gfx1100" and writes export GPU_ARCHS="gfx1100" into the generated launch scripts (hunyuan.sh / hunyuan-mv.sh) — that targets the 7900 XTX, not your Navi 32 card. After running the installer, edit those launch scripts (and your own shell) to export GPU_ARCHS="gfx1101" so non-rasterizer runtime kernels build/dispatch for your card. The prebuilt rasterizer wheel itself already carries gfx1101 kernels and needs no change.
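
If you would rather script that edit than do it by hand, something along these lines could work. It is a sketch under one assumption: that the generated scripts contain the line exactly in the form `export GPU_ARCHS="gfx1100"` described above. The function names are hypothetical, and you should diff the scripts afterwards.

```python
import re
from pathlib import Path

# Matches a whole line of the form: export GPU_ARCHS="gfx1100" (quotes optional).
_ARCH_LINE = re.compile(
    r'^([ \t]*export[ \t]+GPU_ARCHS=)"?gfx1100"?[ \t]*$', re.MULTILINE
)


def retarget_arch(script_text: str, target: str = "gfx1101") -> str:
    """Rewrite the GPU_ARCHS export from gfx1100 to the given target."""
    return _ARCH_LINE.sub(lambda m: f'{m.group(1)}"{target}"', script_text)


def retarget_files(paths) -> list[str]:
    """Apply retarget_arch in place; return the paths that were changed."""
    changed = []
    for path in map(Path, paths):
        original = path.read_text()
        updated = retarget_arch(original)
        if updated != original:
            path.write_text(updated)
            changed.append(str(path))
    return changed


if __name__ == "__main__":
    # Launch script names as generated by INSTALL.sh; skip any that are absent.
    scripts = [p for p in ("hunyuan.sh", "hunyuan-mv.sh") if Path(p).exists()]
    print("updated:", retarget_files(scripts))
```

The rewrite is idempotent, so running it twice leaves an already re-targeted script untouched.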

If you prefer to wire the wheel into your own Hunyuan3D-2.1 checkout instead of using the script, install the prebuilt custom_rasterizer wheel directly with pip — pick the one matching your Python version:

# from the cloned community repo
pip install wheels/custom_rasterizer-0.1-py310-none-manylinux_2_39_x86_64.whl

This replaces the in-tree CUDA custom_rasterizer build step from the official README — do not run the upstream custom_rasterizer pip install -e . / compile_mesh_painter.sh build on AMD; that is the CUDA path that fails on ROCm and the reason the prebuilt wheel exists.
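
Picking the wheel that matches your interpreter can be sketched as below. The py310 filename is the one given above; the py313 filename is an assumption that follows the same naming pattern, so confirm it against the repo's wheels/ directory. `wheel_filename` is a hypothetical helper.

```python
import sys

# Python versions the community repo ships wheels for, per its README.
SUPPORTED_TAGS = {(3, 10): "py310", (3, 13): "py313"}


def wheel_filename(major: int, minor: int) -> str:
    """Return the expected custom_rasterizer wheel name for a Python version.

    The py313 name is assumed from the py310 naming pattern.
    """
    try:
        tag = SUPPORTED_TAGS[(major, minor)]
    except KeyError:
        raise RuntimeError(
            f"no prebuilt custom_rasterizer wheel for Python {major}.{minor}"
        ) from None
    return f"custom_rasterizer-0.1-{tag}-none-manylinux_2_39_x86_64.whl"


if __name__ == "__main__":
    try:
        print("wheels/" + wheel_filename(*sys.version_info[:2]))
    except RuntimeError as err:
        print(err)
```

Failing early on an unsupported interpreter is deliberate: a mismatched wheel would otherwise only surface as an install or import error later.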

3. Install the remaining dependencies

Install the rest of the requirements (the community repo carries its own requirements.txt; if you are working from the official Tencent checkout instead, use that one). Do not install xformers or a flash-attn wheel — neither is the right path on RDNA3. Hunyuan3D-2.1's shape diffusion runs on PyTorch's default SDPA (scaled_dot_product_attention), which needs no extra install and is the supported attention backend on ROCm.

pip install -r requirements.txt

4. Read the Tencent license before you deploy

The weights are governed by the Tencent Hunyuan 3D 2.1 Community License — the HF card lists license: tencent-hunyuan-community (license: other), not Apache 2.0. The license header states it "DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA" and requires a separate license from Tencent for any product whose monthly active users exceed 1 million. The free, ungated download does not waive these terms — read the LICENSE in full before deploying anything user-facing.

Running

Run only the shape pipeline — do not instantiate the texture (Hunyuan3DPaintPipeline) path on this card; both its 21 GB texture-only peak and its 29 GB combined peak overflow the 16 GB card (see the boxes above). On the first call the shape DiT checkpoint downloads automatically from tencent/Hunyuan3D-2.1.

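import sys

# Make the in-repo hy3dshape package importable (run from the repo root).
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

# Shape stage only; the shape DiT checkpoint is downloaded on first use.
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')
# The pipeline returns a list of meshes; take the first one.
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
# Write the untextured mesh as a binary glTF file.
mesh_untextured.export('output.glb')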

For best results, give the pipeline a clean cutout — the repo ships a BackgroundRemover helper in hy3dshape/rembg.py you can run on the input image first. The pipeline outputs an untextured .glb mesh (glTF 2.0 binary); open the result in Blender, three.js, or any glTF 2.0 viewer.
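
Before opening the file in a viewer, you can sanity-check that it really is a glTF 2.0 binary. The sketch below relies only on the 12-byte GLB header defined by the glTF 2.0 specification (magic, version, total length); `read_glb_header` is a hypothetical helper, and `output.glb` is the filename used in the snippet above.

```python
import struct
from pathlib import Path

GLB_MAGIC = 0x46546C67  # the ASCII bytes "glTF" read as a little-endian uint32


def read_glb_header(data: bytes) -> tuple[int, int]:
    """Return (container_version, declared_total_length) from a GLB header."""
    if len(data) < 12:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack("<III", data[:12])
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic; not a GLB file")
    return version, length


if __name__ == "__main__":
    path = Path("output.glb")
    if path.exists():
        raw = path.read_bytes()
        version, length = read_glb_header(raw)
        print(f"glTF container version {version}")
        print(f"declared {length} bytes, actual {len(raw)} bytes")
    else:
        print("output.glb not found; run the shape pipeline first")
```

A version other than 2, or a declared length that differs from the file size, suggests a truncated or failed export.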

The community repo also bundles a gradio_app.py UI (the port you chose during ./INSTALL.sh). Use it for the shape path; leave texture generation disabled.

Results

  • VRAM usage: ~10GB peak for shape generation, cited verbatim from the official Hunyuan3D-2.1 README ("It takes 10 GB VRAM for shape generation"). The 3.3B shape DiT ships as a single ~6.9GB FP16 checkpoint; the ~10GB figure accounts for the VAE and activations on top. This fits the 7800 XT's 16GB with ~6 GB of headroom, so VRAM is comfortable for shape on this card. Texture, at 21 GB / 29 GB, is the part that does not fit. For empirical numbers on this exact model and GPU pairing once a community submission lands, see /check/hunyuan-3d/rx-7800-xt.
  • Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RX 7800 XT benchmark for Hunyuan3D-2.1 shape generation was found that could be verified on a source page. Quoting a number from a different card or architecture would mislead. If you measure it, please send timings to /contribute — they will populate /check/hunyuan-3d/rx-7800-xt for the next reader.
  • Output format: .glb (glTF 2.0 binary). Universal — import to Blender, Unity, Unreal, three.js, or convert to .obj/.fbx via trimesh in Python.
  • Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject. The shape stage produces watertight geometry suitable for greyboxing, retopology sources, and downstream texturing in a DCC tool. The mesh is the trustworthy output on AMD; do not rely on on-device texture.

For the full benchmark data, see /check/hunyuan-3d/rx-7800-xt.

Troubleshooting

custom_rasterizer won't build / CUDA errors on import

Do not build the upstream CUDA custom_rasterizer on AMD — it is a CUDA kernel and fails on ROCm. That is the entire reason this recipe routes through dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux, which ships a prebuilt wheel (custom_rasterizer-0.1-py3XX-none-manylinux_2_39_x86_64.whl). Install that wheel matching your Python version (3.10 or 3.13) rather than compiling. The wheel's kernel includes native gfx1101 code objects, so it runs on the 7800 XT as shipped.

Rasterizer import fails or no-kernel-image error on gfx1101

The prebuilt wheel embeds gfx1101 kernels, so this should not happen for the rasterizer. If another ROCm component (a kernel built only for gfx1100) raises a "no kernel image is available" error on your Navi 32 card, the documented Linux fallback is to masquerade as the XTX target:

export HSA_OVERRIDE_GFX_VERSION=11.0.0

This tells HIP to treat gfx1101 as gfx1100 at runtime. It is a legacy fallback, not generally required for an officially-supported card (the 7800 XT is a ROCm-supported gfx1101 target), and it is Linux-only. Also ensure you switched the repo's GPU_ARCHS from gfx1100 to gfx1101 (Installation step 2).
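
If you launch the pipeline from a Python wrapper, the two variables can be set on the child process instead of in your shell. This is a sketch, not the community repo's launcher: `build_launch_env` is a hypothetical helper, and it assumes the variables take effect only if they are present before the ROCm runtime initialises in the launched process.

```python
import os


def build_launch_env(base=None, masquerade_as_gfx1100: bool = False) -> dict:
    """Return a copy of the environment prepared for a 7800 XT launch.

    GPU_ARCHS is always set to gfx1101. HSA_OVERRIDE_GFX_VERSION is added
    only when the legacy fallback is requested, and removed otherwise so a
    stale value from the parent shell does not leak in.
    """
    env = dict(os.environ if base is None else base)
    env["GPU_ARCHS"] = "gfx1101"
    if masquerade_as_gfx1100:
        env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"
    else:
        env.pop("HSA_OVERRIDE_GFX_VERSION", None)
    return env


if __name__ == "__main__":
    env = build_launch_env(masquerade_as_gfx1100=False)
    print("GPU_ARCHS =", env["GPU_ARCHS"])
    print("HSA_OVERRIDE_GFX_VERSION =", env.get("HSA_OVERRIDE_GFX_VERSION"))
    # Pass env=env to subprocess.run(...) when starting your own entry point.
```

Keeping the override off by default matches the guidance above: try the native gfx1101 path first and enable the masquerade only after a no-kernel-image error.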

Torch errors / pipeline crashes after a torch upgrade

Re-pin torch 2.7.0. The community repo is explicit: "Current Torch 2.8 doesn't work. […] With torch 2.7.0 version it works." If a dependency bumped torch to 2.8, reinstall 2.7.0 (ROCm build); the prebuilt custom_rasterizer wheel was built against that line.

Out of memory if you try the texture path

Expected — texture does not fit 16 GB. The official README quotes "21GB for texture generation and 29GB for shape and texture generation in total"; both exceed this card's 16 GB. Keep Hunyuan3DPaintPipeline disabled, run shape only, and texture the resulting .glb downstream (on NVIDIA / rented A100/L40S-class hardware, or in a DCC tool like Substance Painter). Note also that on ROCm a community tester narrowed the texture-corruption bug (Issue #5981) to gfx12 (RDNA4) hardware, with the gfx1100 7900 XTX producing correct output — but VRAM, not corruption, is the binding blocker for the gfx1101 7800 XT.

Texture generation fails outright when an iGPU is also present

Separate from VRAM: the community repo notes "Texture generation fails when integrated+dedicated gpus are present. Disabling it in bios fixed this issue." If you attempted the texture path on a system with both an integrated GPU and the discrete 7800 XT and hit a hard failure, disabling the iGPU in the BIOS is the documented workaround. Texture still won't fit in 16 GB regardless, so shape-only remains the recommendation.

Do not install xformers or FlashAttention

HF/CUDA-oriented guides often suggest pip install xformers or a FlashAttention wheel. On RDNA3 these are the wrong path: the ROCm xformers fork is limited (no FP32, head-dim ≤256) and consumer-card FlashAttention builds are unreliable on gfx1101/gfx1100. Hunyuan3D-2.1's shape diffusion runs on PyTorch SDPA, which is the supported attention backend here — no extra install needed.
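
A quick way to see whether an earlier guide already pulled either package into your environment is to probe for them without importing them. This is a small sketch using the standard library; `xformers` and `flash_attn` are the import names of those two packages, and `installed_unwanted` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names of the packages this recipe advises against on RDNA3.
UNWANTED = ("xformers", "flash_attn")


def installed_unwanted(modules=UNWANTED) -> list[str]:
    """Return the subset of modules that can be found in this environment."""
    return [name for name in modules if find_spec(name) is not None]


if __name__ == "__main__":
    found = installed_unwanted()
    if found:
        print("consider uninstalling:", ", ".join(found))
    else:
        print("neither xformers nor flash_attn is installed")
```

Using `find_spec` rather than a plain import avoids triggering a broken native extension just to learn that it is present.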

For texture and PBR options that exceed this card's reliable envelope, see /check/hunyuan-3d/rx-7800-xt and contribute your results via /contribute.

common questions
How much VRAM does Hunyuan3D need?

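About 10 GB for shape generation, the only stage this recipe targets. Texture generation needs 21 GB, and shape plus texture needs 29 GB in total.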

Which GPUs is Hunyuan3D tested on?

RX 7800 XT (16 GB).

How hard is this setup?

Advanced — follow the steps above.