How much VRAM does Hunyuan3D need?

About 21 GB — the minimum this recipe targets.

How hard is this setup?

Advanced. Follow the steps below.

Hunyuan3D-2.1 on RX 7900 XTX: Textured Image-to-Mesh on ROCm

What You'll Build

A working image-to-3D pipeline using Tencent's Hunyuan3D-2.1 on a 24 GB Radeon RX 7900 XTX (RDNA3, Navi 31, gfx1100) through the ROCm stack: drop in a reference photo, get back a textured .glb mesh (with PBR metallic/roughness maps) ready for any DCC tool (Blender, Unity, Unreal, three.js). Both stages — shape generation and texture generation — run on this card. The catch is memory, not correctness: the two pipelines together exceed 24 GB, so you run them sequentially (generate the mesh, free the shape pipeline, then texture), keeping only one pipeline resident at a time.

Hardware data: RX 7900 XTX (24GB VRAM) · ~10 GB shape stage, ~21 GB texture stage (run sequentially) · ROCm + custom_rasterizer / DifferentiableRenderer · See benchmark data

✅ Texture works on the 7900 XTX; the corruption bug is RDNA4-only. You may have seen reports that Hunyuan3D-2.1 texture output is garbled on ROCm. That is real, but it is specific to the gfx12 / RDNA4 generation (the Radeon AI PRO R9700), not the RX 7900 XTX (gfx1100 / RDNA3). AMD's tracker, ROCm/ROCm Issue #5981, names the R9700 in its title ("corrupted texture output with ROCm 7.1 on AMD r9700") and the reporter scopes the corruption to AMD r9700 GPUs. In the thread, a tester on an AMD-handle account (zichguan-amd) ran the exact pipeline on a 7900 XTX, and the reporter confirmed that output is the expected output. That same tester then reproduced the garbling on an R9700 and concluded that it "does appear to be a gfx12 specific issue" (later fixed on the R9700 with TheRock gfx1201 wheels, after which the issue was closed). On RDNA3 (gfx1100), texture renders correctly, so this recipe textures on-device.

ℹ️ Why sequential, not "out of scope": the 29 GB combined peak. The only reason to split is VRAM. The official Hunyuan3D-2.1 README states: "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total." The 29 GB combined figure (both pipelines resident at once) overflows the 7900 XTX's 24 GB, but the 21 GB texture stage alone fits. So you run shape → export mesh → free the pipeline → texture, never holding both at once. This is exactly what the AMD-verified split below does: the reporter created code that splits the generation into 2 parts, a mesh generator and a texture generator, precisely to avoid the combined peak.
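The arithmetic behind the split can be sketched in a few lines. The three figures are the README's; the function name and the mode labels are illustrative only and are not Hunyuan3D-2.1 options. Usable VRAM on a real card is somewhat below the nominal figure, so treat the result as a rough guide.

```python
# VRAM figures (GB) quoted from the official Hunyuan3D-2.1 README.
SHAPE_GB = 10
TEXTURE_GB = 21
COMBINED_GB = 29


def plan_for(vram_gb: float) -> str:
    """Return which run mode fits a card with `vram_gb` of VRAM.

    The labels are illustrative only; they are not Hunyuan3D-2.1 flags.
    """
    if vram_gb >= COMBINED_GB:
        return "single-process"  # both pipelines can stay resident
    if vram_gb >= max(SHAPE_GB, TEXTURE_GB):
        return "sequential"      # one pipeline resident at a time
    if vram_gb >= SHAPE_GB:
        return "shape-only"      # the texture stage will not fit
    return "insufficient"


if __name__ == "__main__":
    print(plan_for(24))  # nominal VRAM of the RX 7900 XTX
```

For the 24 GB card in this recipe the function returns "sequential", which is the mode the rest of the guide follows.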

ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3d vertical; this recipe outputs a textured mesh — geometry plus a baked-on PBR texture set (albedo + metallic + roughness).

Requirements

Component	Minimum	Tested
GPU	21 GB VRAM (texture stage; ~10 GB for shape-only)	RX 7900 XTX (24 GB, gfx1100)
Driver	AMD ROCm on Linux — ROCm 7.1 for the verified textured path; ROCm 6.4.x also works for shape	—
RAM	16GB	—
Storage	~15GB (shape DiT ~7GB + Paint-PBR 2B model + RealESRGAN upscaler)	—
Software	Python 3.11, PyTorch 2.9.1 for ROCm, custom_rasterizer + DifferentiableRenderer built for ROCm	—

Installation

The weights are not gated on Hugging Face — they download freely on the first pipeline call via huggingface_hub. There is, however, a license you must comply with before deploying: see step 5.

There are two AMD-verified setups. For the full textured pipeline, lead with the manjunaths path — it is the setup a tester (zichguan-amd) ran on a 7900 XTX to confirm correct texture (Issue #5981). For shape-only, the lighter dgarcia prebuilt-wheel path also works (see the note at the end of this section).

1. Start from the verified ROCm PyTorch base

The verified textured path (manjunaths/HY3D-2.1-ROCm) builds on AMD's official ROCm PyTorch image — its Dockerfile starts:

FROM rocm/pytorch:rocm7.1_ubuntu22.04_py3.11_pytorch_release_2.9.1

That pins ROCm 7.1, Python 3.11, PyTorch 2.9.1 — the combination zichguan-amd verified produces correct texture on the 7900 XTX. If you build outside Docker, install a matching ROCm PyTorch (the rocmX.Y wheel tag moves between releases — read the live selector at pytorch.org/get-started/locally). Confirm afterward with python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())" — expect a +rocm suffix and True (ROCm masquerades as the cuda device namespace under HIP).
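If you prefer a scripted check over the one-liner, the sketch below compares the installed version string against the pins from the image tag. The helper name and the example version strings in the usage note are hypothetical, and the exact suffix your build reports may differ from the "+rocm" form assumed here.

```python
import re

# Release pinned by the verified base image tag (pytorch_release_2.9.1).
EXPECTED_TORCH = "2.9.1"


def check_torch_version(version: str, expected: str = EXPECTED_TORCH) -> list[str]:
    """Return a list of human-readable problems; an empty list means it looks right."""
    problems = []
    if "rocm" not in version:
        problems.append(f"{version!r} has no rocm tag; this may be a CUDA or CPU build")
    # Compare only the numeric release part, e.g. '2.9.1' out of '2.9.1+rocm7.1'.
    match = re.match(r"\d+\.\d+\.\d+", version)
    release = match.group(0) if match else ""
    if release != expected:
        problems.append(f"release {release or '?'} differs from the verified {expected}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        for problem in check_torch_version(torch.__version__):
            print("warning:", problem)
        print("GPU visible:", torch.cuda.is_available())
```

A string such as "2.9.1+rocm7.1" (an assumed example) yields no warnings, while a CUDA build or a different release yields one warning per mismatch.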

2. Clone Hunyuan3D-2.1 and the ROCm split scripts

git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1
git clone https://github.com/manjunaths/HY3D-2.1-ROCm
# copy the split-pipeline scripts into the Hunyuan3D-2.1 tree
cp HY3D-2.1-ROCm/{requirements.txt,generate3D.py,shape_gen.py,paint_gen.py} Hunyuan3D-2.1/
cd Hunyuan3D-2.1
pip install -r requirements.txt

The three scripts (shape_gen.py, paint_gen.py, generate3D.py) are the key to fitting 24 GB: generate3D.py runs shape_gen.py as a separate subprocess (which exits and frees its VRAM) before launching paint_gen.py, so the shape and texture pipelines are never resident at the same time.
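The pattern itself is small. The following is a minimal sketch of running stages as separate processes, not the actual contents of generate3D.py; the function name is hypothetical and the demo commands are harmless placeholders.

```python
import subprocess
import sys


def run_sequentially(commands: list[list[str]]) -> None:
    """Run each command to completion before starting the next.

    Each stage is its own OS process, so the GPU memory it allocated is
    released when it exits. check=True stops the chain if a stage fails.
    """
    for number, command in enumerate(commands, start=1):
        print(f"Stage {number}: {' '.join(command)}")
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder stages so the sketch runs anywhere. In a real run these
    # would be the shape and texture scripts, whose arguments may differ.
    run_sequentially([
        [sys.executable, "-c", "print('shape stage placeholder')"],
        [sys.executable, "-c", "print('texture stage placeholder')"],
    ])
```

The reason to use processes rather than deleting the pipeline object in one interpreter is that process exit releases everything the stage held, including allocator caches that can otherwise linger.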

3. Build the texture extensions for ROCm

The texture stage needs two compiled extensions — the same two the official README builds, but compiled on ROCm. Per the manjunaths Dockerfile:

# system libraries the renderer needs
apt install -y libgl1 libx11-dev libxi-dev libxxf86vm-dev libboost-dev \
  libxrender-dev libglu1-mesa-dev freeglut3-dev mesa-common-dev
pip install pymeshlab onnxruntime open3d

# 1) custom_rasterizer (also used by the shape stage)
cd hy3dpaint/custom_rasterizer
python setup.py bdist_wheel
pip install dist/custom_rasterizer-0.1-cp311-cp311-linux_x86_64.whl
cd ../..

# 2) DifferentiableRenderer (the mesh painter)
cd hy3dpaint/DifferentiableRenderer
bash compile_mesh_painter.sh
cd ../..

These are the same hy3dpaint/custom_rasterizer and hy3dpaint/DifferentiableRenderer builds the official README documents (there as pip install -e . followed by bash compile_mesh_painter.sh); the manjunaths Dockerfile builds custom_rasterizer as a wheel instead and wraps both steps for the ROCm 7.1 / Python 3.11 base. Do not install xformers or a flash-attn wheel on RDNA3; neither is the right path here. Hunyuan3D-2.1 runs on PyTorch's default SDPA (scaled_dot_product_attention), which is the supported attention backend on ROCm and needs no extra install.

4. Download the texture upscaler checkpoint

The Paint-PBR pipeline uses a RealESRGAN upscaler. Per the official README:

wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth -P hy3dpaint/ckpt

5. Read the Tencent license before you deploy

The weights are governed by the Tencent Hunyuan 3D 2.1 Community License — the HF card lists license: tencent-hunyuan-community (license: other), not Apache 2.0. The license header states it DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA and requires a separate license from Tencent for any product whose monthly active users exceed 1 million. The free, ungated download does not waive these terms — read the LICENSE in full before deploying anything user-facing.

Shape-only alternative (lighter setup). If you only need untextured geometry, the prebuilt-wheel path dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux avoids the from-source custom_rasterizer build. It ships prebuilt wheels ("i provide python wheels compiled for python version 3.10 and 3.13"), pins torch 2.7.0 on ROCm 6.4.1 ("Current Torch 2.8 doesn't work. Previous version 2.8.0.dev20250525+rocm6.4 used to work but it is no longer available. With torch 2.7.0 version it works."), and runs the shape stage in ~10 GB. Clone it, run ./INSTALL.sh, and pick your Python version (3.10 or 3.13). This is the shape-only subset of what the manjunaths path above does end-to-end.

Running

Run the two stages sequentially so the shape and texture pipelines never coexist in VRAM. The manjunaths generate3D.py driver does this for you — it runs the shape stage as a subprocess, lets it exit (freeing VRAM), then runs the texture stage:

python generate3D.py assets/demo.png

You will see "Stage 1 finished. VRAM cleared. Starting Stage 2: Texturing..." between the stages; freeing that VRAM is what keeps the run inside 24 GB. The output is a textured .glb plus the baked texture maps:

demo_shape_<timestamp>.glb        # untextured mesh (Stage 1 output)
demo_textured_<timestamp>.glb     # textured mesh (final)
demo_textured_<timestamp>.jpg     # albedo
demo_textured_<timestamp>_metallic.jpg
demo_textured_<timestamp>_roughness.jpg

Under the hood the two stages call the standard pipelines. Stage 1 (shape_gen.py) runs Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1') after a BackgroundRemover pass on the input; Stage 2 (paint_gen.py) runs Hunyuan3DPaintPipeline(Hunyuan3DPaintConfig(6, 512)) on the mesh + original image. Open the textured .glb in Blender, three.js, or any glTF 2.0 viewer.
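Before opening the result in a viewer, you can sanity-check that the file is a well-formed binary glTF. The sketch below relies only on the 12-byte header defined by the glTF 2.0 specification (the magic bytes "glTF", a version, and the total length, all little-endian); the function names are hypothetical and it does not validate the mesh or textures inside.

```python
import struct
from pathlib import Path

GLB_MAGIC = b"glTF"   # first four bytes of every binary glTF file
GLB_HEADER_SIZE = 12  # magic + version + total length, little-endian


def read_glb_header(data: bytes) -> tuple[int, int]:
    """Return (version, declared_length) parsed from the start of a .glb file."""
    if len(data) < GLB_HEADER_SIZE:
        raise ValueError("too short to be a .glb file")
    magic, version, length = struct.unpack("<4sII", data[:GLB_HEADER_SIZE])
    if magic != GLB_MAGIC:
        raise ValueError(f"bad magic {magic!r}; not a binary glTF file")
    return version, length


def check_glb(path: str) -> bool:
    """True if the file is glTF version 2 and its size matches the header."""
    file = Path(path)
    with file.open("rb") as handle:
        version, length = read_glb_header(handle.read(GLB_HEADER_SIZE))
    return version == 2 and length == file.stat().st_size
```

Calling check_glb on the textured output should return True; a False result or a ValueError usually points to a truncated write, for example when the texture stage was killed partway through.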

If you ever want to run both pipelines in one Python process (the README's single-file example), Hunyuan3D-2.1 also ships a --low_vram_mode flag — but on a 24 GB card the sequential subprocess split above is the reliable way to stay under the 29 GB combined peak.

Results

VRAM usage: ~10 GB peak for the shape stage and ~21 GB for the texture stage, cited verbatim from the official Hunyuan3D-2.1 README ("It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total"). Run sequentially, the resident peak is the 21 GB texture stage, which fits the 7900 XTX's 24 GB; the 29 GB combined figure (both pipelines at once) is the number you avoid by splitting. For empirical numbers on this exact model/GPU pair once a community submission lands, see /check/hunyuan-3d/rx-7900-xtx.
Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RX 7900 XTX benchmark for Hunyuan3D-2.1 was found that could be verified on a source page. Quoting a number from a different card or architecture would mislead. If you run this pair, please contribute your timings via /contribute so they land on /check/hunyuan-3d/rx-7900-xtx.
Texture correctness: verified on the 7900 XTX (gfx1100 / RDNA3) — see Issue #5981, where a tester's (zichguan-amd) 7900 XTX run was confirmed correct by the reporter. The texture-corruption bug is gfx12 / RDNA4-specific (R9700) and does not affect this card.
Output format: textured .glb (glTF 2.0 binary) plus separate albedo / metallic / roughness JPEGs. Universal — import to Blender, Unity, Unreal, three.js.
Quality notes: image-to-3D quality is best when the input photo has a clean background and a single subject (the shape stage runs a BackgroundRemover for you). The pipeline produces watertight geometry with a baked PBR texture set suitable for greyboxing, look-dev, and downstream refinement in a DCC tool.

For the full benchmark data, see /check/hunyuan-3d/rx-7900-xtx.

Troubleshooting

Out of memory during texturing (or when running shape + texture together)

Do not run both pipelines in the same process on a 24 GB card — the combined peak is 29 GB per the official README. Use the sequential split (generate3D.py), which runs shape and texture as separate subprocesses so the shape pipeline's VRAM is freed before texturing starts. The texture stage alone needs ~21 GB, which fits.
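If texturing still runs out of memory, it is worth checking how much VRAM is actually free before the stage starts, since a desktop session or another process can hold several gigabytes. The sketch below is a hypothetical preflight check built on torch.cuda.mem_get_info; reading the README's 21 GB as GiB is a conservative assumption, and the helper name is illustrative.

```python
GIB = 1024 ** 3
TEXTURE_STAGE_GB = 21  # README figure for texture generation


def has_room(free_bytes: int, needed_gb: float = TEXTURE_STAGE_GB) -> bool:
    """True if the free VRAM reported by the driver covers `needed_gb`."""
    return free_bytes >= needed_gb * GIB


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Returns (free, total) in bytes for the current device.
        free_bytes, total_bytes = torch.cuda.mem_get_info()
        print(f"free {free_bytes / GIB:.1f} GiB of {total_bytes / GIB:.1f} GiB")
        if not has_room(free_bytes):
            print("Less than ~21 GB free: close other GPU processes first.")
    else:
        print("No GPU visible to PyTorch.")
```

Run it between the two stages or before launching the driver script; a failing check before Stage 2 suggests something other than the shape pipeline is occupying memory.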

Textures come out corrupted (garbage colours / noise)

On the RX 7900 XTX (gfx1100 / RDNA3) this should not happen: texture renders correctly per ROCm/ROCm Issue #5981, where a tester's (zichguan-amd) 7900 XTX output was confirmed correct by the reporter. The corruption bug in that issue is specific to gfx12 / RDNA4 (the Radeon AI PRO R9700), where that tester concluded that it "does appear to be a gfx12 specific issue" (fixed there with TheRock gfx1201 wheels). If you do see corruption on a 7900 XTX, re-check that you are on the verified ROCm 7.1 / PyTorch 2.9.1 base rather than chasing the R9700 fix, which does not apply to RDNA3.

`custom_rasterizer` won't build / CUDA errors on import

The stock custom_rasterizer build assumes CUDA. On ROCm, build it from the verified base (ROCm 7.1 / py3.11) as in step 3, or — for shape-only — use the prebuilt wheel from dgarcia1985/Hunyuan3d-2-for-AMDGPU-linux (custom_rasterizer-0.1-py310-...whl / py313). Do not run the upstream CUDA build path on AMD.

Texture generation fails outright when an iGPU is also present

Separate from any memory issue: the dgarcia community repo notes, "Texture generation fails when integrated+dedicated gpus are present. Disabling it in bios fixed this issue." If you hit a hard texture failure (not corruption) on a system with both an integrated GPU and the discrete 7900 XTX, disabling the iGPU in BIOS is the documented workaround.

Torch errors / pipeline crashes after a torch upgrade

For the shape-only dgarcia path, re-pin torch 2.7.0. The repo is explicit: "Current Torch 2.8 doesn't work. Previous version 2.8.0.dev20250525+rocm6.4 used to work but it is no longer available. With torch 2.7.0 version it works." For the textured manjunaths path, stay on the verified ROCm 7.1 / PyTorch 2.9.1 base; a stray torch upgrade can break the compiled extensions, which were built against that release.

Do not install xformers or FlashAttention

HF/CUDA-oriented guides often suggest pip install xformers or a FlashAttention wheel. On RDNA3 these are the wrong path: the ROCm xformers fork is limited (no FP32, head-dim ≤256) and consumer-card FlashAttention builds are unreliable on gfx1100. Hunyuan3D-2.1 runs on PyTorch SDPA, which is the supported attention backend here — no extra install needed.

For empirical timings and texture-quality reports on this exact card, see /check/hunyuan-3d/rx-7900-xtx and contribute your results via /contribute.