What You'll Build
A working image-to-3D pipeline using Tencent's Hunyuan3D-2.1 shape model: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). Texture generation is skipped on purpose — see the box below.
Hardware data: RTX 5080 (16GB VRAM) · 10GB peak for shape generation · See benchmark data
⚠️ Why shape-only on 16GB? The official Hunyuan3D-2.1 README states: "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total." The texture stage's 21GB peak is above the 5080's 16GB ceiling, so it will OOM. This isn't theoretical for the 5080 specifically: an RTX 5080 owner reports in Issue #15 being "able to generate models just fine on my mobile 5080 but unfortunately lack the VRAM for texture painting." Generate the mesh here and texture it on a rented A100/L40S, in a tool like Substance Painter, or with the lighter Hunyuan3D-Omni control variant. If you must texture locally, see the mmgp-offload workaround in Troubleshooting.
ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3d vertical; the shape stage covered here outputs an untextured mesh. Colour and material come from the separate texture stage, which is out of scope on this 16GB card.
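The VRAM reasoning above can be written down as a small lookup. This is a minimal sketch: the three peak figures are the ones quoted from the official README, while the function name `stages_that_fit` and the `headroom_gb` parameter are hypothetical helpers introduced for illustration, not part of the Hunyuan3D-2.1 code.

```python
# Peak VRAM per stage in GB, as quoted from the official README above.
STAGE_PEAK_GB = {"shape": 10, "texture": 21, "shape+texture": 29}


def stages_that_fit(vram_gb, headroom_gb=0.0):
    """Return the stages whose quoted peak fits in vram_gb minus headroom_gb.

    headroom_gb is an assumption knob for VRAM already taken by the desktop,
    a browser, or other processes; it is not a figure from the README.
    """
    budget = vram_gb - headroom_gb
    return [stage for stage, peak in STAGE_PEAK_GB.items() if peak <= budget]


if __name__ == "__main__":
    for card, vram_gb in [("RTX 5080", 16), ("RTX 5090", 32)]:
        print(card, stages_that_fit(vram_gb))
```

On a 16GB card only the shape stage clears the bar, which is why the rest of this guide stops at the untextured mesh.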
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 10GB VRAM (shape pipeline) | RTX 5080 (16GB) |
| RAM | 16GB | — |
| Storage | ~15GB for shape weights | — |
| Software | Python 3.10, CUDA 12.8, PyTorch 2.7.0+cu128 | — |
Installation
1. Clone the official repository
```bash
git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1.git
cd Hunyuan3D-2.1
```
2. Install PyTorch with cu128 wheels (Blackwell override)
The README pins torch==2.5.1+cu124, which does not include Blackwell sm_120 kernels and fails at first inference on a 5080 with the error "no kernel image is available for execution on the device". Use the cu128 build instead. The shape pipeline needs only the base requirements: the in-tree texture extensions (custom_rasterizer, compile_mesh_painter.sh, Real-ESRGAN checkpoint) belong to the paint stage, which we don't run on a 16GB card, so they can be skipped.
```bash
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
```
The torch==2.7.0 + cu128 combination is the community-confirmed Blackwell 50-series path documented in Issue #22, which was opened by community user pengpeng to upgrade the repo for 50-series cards and detailed by PlanarFox with a working FROM nvidia/cuda:12.8.0-devel Dockerfile. The same thread also documents TORCH_CUDA_ARCH_LIST="12.0" for anyone who additionally needs to build the texture extensions; the shape-only path here does not require it. This is community-aligned guidance, not official Tencent support.
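Before running inference, it can help to confirm that the installed wheel actually ships kernels for your GPU. The sketch below compares the device's compute capability against the architectures compiled into the wheel. `wheel_supports` is a hypothetical helper name, and the check is deliberately simplified: it looks only for a native `sm_XY` entry and ignores PTX (`compute_XY`) fallbacks.

```python
def wheel_supports(arch_list, capability):
    """True if arch_list contains native kernels for the (major, minor) capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch")
        return
    capability = torch.cuda.get_device_capability(0)  # e.g. (12, 0) for sm_120
    arch_list = torch.cuda.get_arch_list()  # architectures compiled into this wheel
    print("device capability:", capability)
    print("wheel architectures:", arch_list)
    if wheel_supports(arch_list, capability):
        print("OK: this wheel ships native kernels for the GPU")
    else:
        print("Mismatch: reinstall PyTorch from the cu128 index (step 2)")


if __name__ == "__main__":
    main()
```

If the script reports a mismatch on a 5080, the environment most likely still has the README's cu124 wheel installed.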
3. Read the Tencent license before you deploy
The weights are not gated on Hugging Face (they download freely on the first pipeline call via huggingface_hub, no click-through), but they are governed by the Tencent Hunyuan 3D 2.1 Community License — license: other, not Apache 2.0. Verbatim from the license header: "THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW." It also requires you to request a separate license from Tencent for any product whose monthly active users exceed 1 million. The free download does not waive these terms — read the LICENSE in full before deploying anything user-facing.
Running
Use the official Python API from the Hunyuan3D-2.1 README. Run only the shape pipeline — do not instantiate Hunyuan3DPaintPipeline on a 16GB card.
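```python
import sys

# Make the in-tree hy3dshape package importable when running from the repo root.
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

# The weights download from Hugging Face on the first call.
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')
# Shape stage only: the first returned item is the untextured mesh.
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
mesh_untextured.export('output.glb')
```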
The pipeline outputs an untextured .glb mesh (glTF 2.0 binary). Open the result in Blender, three.js, or any glTF 2.0 viewer.
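To process a folder of reference photos rather than a single image, the same call can be wrapped in a loop. This is a sketch under stated assumptions: `generate_batch` is a hypothetical helper, and it relies only on the call shape shown above (the pipeline accepts `image=<path>` and returns a sequence whose first item has an `export` method).

```python
from pathlib import Path


def generate_batch(pipeline, image_dir, out_dir, pattern="*.png"):
    """Run pipeline on every matching image and write one .glb per input.

    pipeline is any callable used like shape_pipeline(image=path)[0] above.
    Returns the list of written paths in sorted input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(image_dir).glob(pattern)):
        mesh = pipeline(image=str(image_path))[0]
        target = out_dir / f"{image_path.stem}.glb"
        mesh.export(str(target))
        written.append(target)
    return written


# Usage with the pipeline object created above (hypothetical folder names):
#   generate_batch(shape_pipeline, "photos", "meshes")
```

Creating the pipeline once and reusing it across images avoids reloading the weights for every photo.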
If you prefer a UI, run the official Gradio app with the low-VRAM flag. It still won't let you texture-generate on 16GB, but it works for shape:
```bash
python3 gradio_app.py --model_path tencent/Hunyuan3D-2.1 \
  --subfolder hunyuan3d-dit-v2-1 \
  --texgen_model_path tencent/Hunyuan3D-2.1 \
  --low_vram_mode
```
Results
- VRAM usage: 10GB peak for shape generation, cited verbatim from the official README and corroborated for this card class by the 5080 owner in Issue #15. Once a community submission lands, empirical numbers for this exact model and GPU pairing will appear at /check/hunyuan-3d/rtx-5080.
- Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RTX 5080 benchmark for Hunyuan3D-2.1 exists yet. The only community timings (in Issue #24) are for the RTX 4090 (Ada sm_89) and RTX 3090 (Ampere sm_86) — different architectures running full mesh+texture, not a 5080 shape-only run, so they don't belong here. The 5080's ~960 GB/s memory bandwidth (vs the 4090's ~1008 GB/s and the 3090's ~936 GB/s) puts it in the same broad performance class, but until a 5080-named measurement appears we route timings to /contribute and /check/hunyuan-3d/rtx-5080.
- Output format: .glb (glTF 2.0 binary). Universal: import into Blender, Unity, Unreal, or three.js, or convert to .obj with trimesh in Python (for .fbx, export from a DCC tool such as Blender).
- Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject.
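The trimesh conversion mentioned in the output-format note might look like the sketch below. `converted_path` and `convert_mesh` are hypothetical helper names; the trimesh calls used are `trimesh.load(..., force="mesh")` and `Trimesh.export`, and the target extension must be one trimesh can write (.obj is the default here).

```python
from pathlib import Path


def converted_path(src, ext=".obj"):
    """Return the output path: same location and stem as src, new extension."""
    return Path(src).with_suffix(ext)


def convert_mesh(src, ext=".obj"):
    """Load a .glb as a single mesh, print basic stats, and re-export it."""
    import trimesh  # imported lazily; install with `pip install trimesh`

    # force="mesh" collapses the glTF scene into one Trimesh object.
    mesh = trimesh.load(str(src), force="mesh")
    print(f"{len(mesh.vertices)} vertices, {len(mesh.faces)} faces, "
          f"watertight={mesh.is_watertight}")
    target = converted_path(src, ext)
    mesh.export(str(target))  # the format is inferred from the file extension
    return target


# Usage, assuming output.glb from the Running section exists:
#   convert_mesh("output.glb")
```

The printed vertex and face counts double as a quick sanity check that the export from the shape pipeline is not empty.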
For the full benchmark data, see /check/hunyuan-3d/rtx-5080.
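If you want to record your own numbers for a submission, a wrapper along these lines captures wall-clock time and peak allocated VRAM for one run. It is a sketch, not an official benchmark harness: `measure` is a hypothetical helper, and `torch.cuda.max_memory_allocated()` counts only memory held by PyTorch's allocator, so it will typically read lower than the figure nvidia-smi shows for the whole process.

```python
import time


def measure(run, use_cuda=True):
    """Call run() once and return (result, seconds, peak_gib).

    peak_gib is None when CUDA statistics are unavailable or use_cuda is False.
    """
    torch = None
    if use_cuda:
        try:
            import torch as _torch
            if _torch.cuda.is_available():
                torch = _torch
        except ImportError:
            pass
    if torch is not None:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()  # do not time work queued before this call
    start = time.perf_counter()
    result = run()
    if torch is not None:
        torch.cuda.synchronize()  # wait for GPU work to finish before stopping the clock
    seconds = time.perf_counter() - start
    peak_gib = None
    if torch is not None:
        peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    return result, seconds, peak_gib


# Usage with the pipeline from the Running section:
#   mesh, seconds, peak_gib = measure(lambda: shape_pipeline(image='assets/demo.png')[0])
```

A first run usually includes one-time warm-up costs, so it is worth measuring a second run as well and reporting both.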
Troubleshooting
CUDA OOM during texture generation
Don't run Hunyuan3DPaintPipeline on the 5080. As cited in the README, the texture stage alone needs 21GB — above the 16GB envelope. Generate the mesh here and texture it downstream on bigger hardware or in a non-AI workflow.
no kernel image is available for execution on the device
You installed the README's pinned torch==2.5.1+cu124 wheel, which does not include sm_120 kernels for Blackwell. Reinstall with the cu128 wheel per step 2 above (Issue #22 tracks the 50-series upgrade; Issue #122 carries a full Blackwell Dockerfile).
FlashAttention-2 errors / wheel-build failures
Hunyuan3D-2.1's shape diffusion uses PyTorch's default SDPA (scaled_dot_product_attention) backend, which has full Blackwell sm_120 support via the cu128 wheel and needs no extra install. If a dependency tree pulls in flash-attn for an unrelated reason and you hit Could not build wheels for flash-attn, note that the canonical FA2 wheel does not yet ship sm_120 kernels — tracked upstream at Dao-AILab/flash-attention#2168. Pin the dependency tree to skip flash-attn or run without it (SDPA is the default path).
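To see whether flash-attn was pulled into the environment at all, a quick standard-library check is enough. This is a minimal sketch; `is_installed` is a hypothetical helper, and `flash_attn` is the import name of the flash-attn package.

```python
import importlib.util


def is_installed(module_name):
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("flash_attn"):
        print("flash_attn is present; some dependency pulled it in")
    else:
        print("flash_attn is not installed; nothing to build")
    if is_installed("torch"):
        import torch.nn.functional as F
        # SDPA ships with PyTorch itself and needs no extra package.
        print("SDPA available:", hasattr(F, "scaled_dot_product_attention"))
```

If flash_attn turns out to be absent and the shape pipeline still runs, the build failure came from an optional dependency rather than from Hunyuan3D-2.1 itself.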
Want to texture locally anyway? mmgp offloading
Texture generation needs 21GB, but the deepbeepmeep/Hunyuan3D-2GP memory-management layer (pip install mmgp) lets you stream the paint pipeline through limited VRAM. The 5080 owner in Issue #15 confirmed this works for them after adapting the demo script (community workaround, not official Tencent support). Expect it to be slower than a card that fits the full 29GB combined peak — for a no-compromise local mesh+texture pass you need ≥29GB (e.g. an RTX 5090).
Want less VRAM up front? Look at Hunyuan3D-Omni
The sibling model Hunyuan3D-Omni adds multi-modal control (point cloud / voxel / pose / bounding-box conditioning on top of image input) and its HF card states "It takes 10 GB VRAM for generation." Same install (cu128 Blackwell wheel), same license. Useful if you want skeletal or voxel control over the generated mesh.