Hunyuan3D-2.1 on RTX 5080: Image-to-Mesh 3D Generation (Shape-Only)

What You'll Build

A working image-to-3D pipeline using Tencent's Hunyuan3D-2.1 shape model: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). Texture generation is skipped on purpose — see the box below.

Hardware data: RTX 5080 (16GB VRAM) · 10GB peak for shape generation · Benchmark data: /check/hunyuan-3d/rtx-5080

⚠️ Why shape-only on 16GB? The official Hunyuan3D-2.1 README states: "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total." The texture stage's 21GB peak is above the 5080's 16GB ceiling, so it will OOM. This isn't theoretical for the 5080 specifically: an RTX 5080 owner reports in Issue #15 being "able to generate models just fine on my mobile 5080 but unfortunately lack the VRAM for texture painting." Generate the mesh here and texture it on a rented A100/L40S, in a tool like Substance Painter, or with the lighter Hunyuan3D-Omni control variant. If you must texture locally, see the mmgp-offload workaround in Troubleshooting.
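The arithmetic behind the shape-only decision can be sketched as a small lookup. This is an illustrative helper, not part of the Hunyuan3D-2.1 codebase: `STAGE_PEAK_GB` and `stages_that_fit` are hypothetical names, and the only real data is the three peak figures quoted from the README above. It assumes a stage "fits" when its quoted peak is at or below the card's total VRAM, which ignores memory already held by the desktop or other processes.

```python
# Peak VRAM figures quoted from the official README (in GB).
STAGE_PEAK_GB = {
    "shape": 10,
    "texture": 21,
    "shape+texture": 29,
}


def stages_that_fit(vram_gb: float) -> list[str]:
    """Return the stages whose quoted peak is at or below the given VRAM."""
    return [stage for stage, peak in STAGE_PEAK_GB.items() if peak <= vram_gb]


print("16GB card:", stages_that_fit(16))  # the RTX 5080 case
print("32GB card:", stages_that_fit(32))  # a larger card, for comparison
```

On a 16GB card only the shape stage passes this check, which is why the rest of this guide stops at the untextured mesh.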

ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3D vertical. The shape stage covered here outputs an untextured mesh; colour and material come from the separate texture stage, which is out of scope on this 16GB card.

Requirements

Component	Minimum	Tested
GPU	10GB VRAM (shape pipeline)	RTX 5080 (16GB)
RAM	16GB	—
Storage	~15GB for shape weights	—
Software	Python 3.10, CUDA 12.8, PyTorch 2.7.0+cu128	—

Installation

1. Clone the official repository

git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1.git
cd Hunyuan3D-2.1

2. Install PyTorch with cu128 wheels (Blackwell override)

The README pins torch==2.5.1+cu124, which does not include Blackwell sm_120 kernels. On a 5080 that build fails at first inference with the error `no kernel image is available for execution on the device`. Use the cu128 build instead. The shape pipeline needs only the base requirements: the in-tree texture extensions (custom_rasterizer, compile_mesh_painter.sh, Real-ESRGAN checkpoint) belong to the paint stage, which we don't run on a 16GB card, so they can be skipped.

pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

The torch==2.7.0 + cu128 combination is the community-confirmed Blackwell 50-series path documented in Issue #22. That issue was opened by community user pengpeng to upgrade the repo for 50-series cards, and PlanarFox detailed the fix there with a working FROM nvidia/cuda:12.8.0-devel Dockerfile. The same thread also documents TORCH_CUDA_ARCH_LIST="12.0" for anyone who additionally needs to build the texture extensions; the shape-only path here does not require it. This is community-aligned guidance, not official Tencent support.
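Before running inference, it can help to confirm that the installed wheel really carries sm_120 kernels. The following is a minimal sketch, not an official check: `has_blackwell_kernels` and `describe_torch_build` are hypothetical helper names, while `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are standard PyTorch calls. The script degrades to a plain message if torch or a CUDA device is missing.

```python
def has_blackwell_kernels(arch_list) -> bool:
    """True if the wheel's compiled arch list includes sm_120 (RTX 50-series)."""
    return "sm_120" in arch_list


def describe_torch_build() -> str:
    """Summarise the installed torch build and whether it targets sm_120."""
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device visible"
    archs = torch.cuda.get_arch_list()  # architectures compiled into the wheel
    major, minor = torch.cuda.get_device_capability(0)
    verdict = "OK" if has_blackwell_kernels(archs) else "missing sm_120 kernels"
    return f"torch {torch.__version__}, GPU sm_{major}{minor}, wheel archs {archs}: {verdict}"


print(describe_torch_build())
```

If the verdict reports missing sm_120 kernels, the environment is most likely still on the cu124 wheel and the install step above needs to be repeated.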

3. Read the Tencent license before you deploy

The weights are not gated on Hugging Face (they download freely on the first pipeline call via huggingface_hub, no click-through), but they are governed by the Tencent Hunyuan 3D 2.1 Community License — license: other, not Apache 2.0. Verbatim from the license header: "THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW." It also requires you to request a separate license from Tencent for any product whose monthly active users exceed 1 million. The free download does not waive these terms — read the LICENSE in full before deploying anything user-facing.

Running

Use the official Python API from the Hunyuan3D-2.1 README. Run only the shape pipeline — do not instantiate Hunyuan3DPaintPipeline on a 16GB card.

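import sys

# Make the in-repo hy3dshape package importable when running from the repo root.
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

# The weights download from Hugging Face on the first call.
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')
# The pipeline returns a sequence; take the first generated mesh.
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
mesh_untextured.export('output.glb')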

The pipeline outputs an untextured .glb mesh (glTF 2.0 binary). Open the result in Blender, three.js, or any glTF 2.0 viewer.
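A quick way to sanity-check the exported file without opening a viewer is to read the GLB container directly. The sketch below uses only the standard library and follows the glTF 2.0 binary layout (a 12-byte header, then a JSON chunk). `inspect_glb` is a hypothetical helper name, not something shipped with Hunyuan3D-2.1, and it only reports what the file declares rather than validating the geometry.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON" chunk type


def inspect_glb(data: bytes) -> dict:
    """Parse the GLB header and its first (JSON) chunk from raw bytes."""
    if len(data) < 20:
        raise ValueError("too short to be a GLB file")
    magic, version, declared_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing 'glTF' magic; not a binary glTF file")
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk is not the JSON chunk")
    scene = json.loads(data[20:20 + chunk_length].decode("utf-8"))
    return {
        "version": version,
        "declared_length": declared_length,
        "actual_length": len(data),
        "mesh_count": len(scene.get("meshes", [])),
        "material_count": len(scene.get("materials", [])),
    }
```

After the export above, `inspect_glb(open('output.glb', 'rb').read())` should report version 2 and at least one mesh; a mismatch between the declared and actual length suggests a truncated write.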

If you prefer a UI, run the official Gradio app with the low-VRAM flag. Texture generation still won't fit in 16GB, but shape generation works:

python3 gradio_app.py --model_path tencent/Hunyuan3D-2.1 \
  --subfolder hunyuan3d-dit-v2-1 \
  --texgen_model_path tencent/Hunyuan3D-2.1 \
  --low_vram_mode

Results

VRAM usage: 10GB peak for shape generation, cited verbatim from the official README and corroborated for this card class by the 5080 owner in Issue #15. For empirical numbers on this exact model/GPU pair once a community submission lands, see /check/hunyuan-3d/rtx-5080.
Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RTX 5080 benchmark for Hunyuan3D-2.1 exists yet. The only community timings (in Issue #24) are for the RTX 4090 (Ada sm_89) and RTX 3090 (Ampere sm_86) — different architectures running full mesh+texture, not a 5080 shape-only run, so they don't belong here. The 5080's ~960 GB/s memory bandwidth (vs the 4090's ~1008 GB/s and the 3090's ~936 GB/s) puts it in the same broad performance class, but until a 5080-named measurement appears we route timings to /contribute and /check/hunyuan-3d/rtx-5080.
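Output format: .glb (glTF 2.0 binary). Widely supported: import into Blender, Unity, Unreal, or three.js, or convert to .obj, .stl, or .ply via trimesh in Python. trimesh has no FBX exporter, so use Blender or another DCC tool if you need .fbx.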
Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject.
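The bandwidth comparison in the Speed note can be made concrete with a few lines of arithmetic. This is only a worked ratio of the approximate figures quoted there; `BANDWIDTH_GBPS` and `relative_bandwidth` are illustrative names, and a bandwidth ratio is not a prediction of generation time.

```python
# Approximate memory bandwidth figures quoted in the Speed note (GB/s).
BANDWIDTH_GBPS = {"RTX 5080": 960, "RTX 4090": 1008, "RTX 3090": 936}


def relative_bandwidth(card: str, baseline: str) -> float:
    """Ratio of one card's memory bandwidth to a baseline card's."""
    return BANDWIDTH_GBPS[card] / BANDWIDTH_GBPS[baseline]


for baseline in ("RTX 4090", "RTX 3090"):
    ratio = relative_bandwidth("RTX 5080", baseline)
    print(f"RTX 5080 vs {baseline}: {ratio:.3f}x bandwidth")
```

The 5080 lands within about 5% of the 4090 and about 3% above the 3090 on this one metric, which is what "same broad performance class" refers to.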

For the full benchmark data, see /check/hunyuan-3d/rtx-5080.
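If you want to contribute a 5080 measurement, a small timing harness keeps the numbers comparable. The sketch below is one possible approach, not an official benchmark script: `benchmark` is a hypothetical helper, and the hooks are passed in as callables so the function itself has no torch dependency. The commented usage assumes the `shape_pipeline` object from the Running section and uses standard PyTorch calls (`torch.cuda.reset_peak_memory_stats`, `torch.cuda.synchronize`, `torch.cuda.max_memory_allocated`).

```python
import time


def benchmark(run, reset_peak=lambda: None, sync=lambda: None,
              read_peak_bytes=lambda: 0):
    """Time one call to run() and report the peak memory the hooks expose."""
    reset_peak()                      # clear any earlier peak counter
    sync()                            # make sure pending GPU work is finished
    start = time.perf_counter()
    result = run()
    sync()                            # wait for the GPU before stopping the clock
    seconds = time.perf_counter() - start
    peak_gb = read_peak_bytes() / 1024**3
    return result, {"seconds": seconds, "peak_gb": peak_gb}


# Usage with PyTorch (uncomment inside the repo environment):
# import torch
# mesh, stats = benchmark(
#     lambda: shape_pipeline(image='assets/demo.png')[0],
#     reset_peak=torch.cuda.reset_peak_memory_stats,
#     sync=torch.cuda.synchronize,
#     read_peak_bytes=torch.cuda.max_memory_allocated,
# )
# print(stats)
```

Note that `max_memory_allocated` counts tensors held by PyTorch's allocator only, so the figure will usually be lower than what nvidia-smi shows for the whole process. Report which one you measured.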

Troubleshooting

CUDA OOM during texture generation

Don't run Hunyuan3DPaintPipeline on the 5080. As cited in the README, the texture stage alone needs 21GB — above the 16GB envelope. Generate the mesh here and texture it downstream on bigger hardware or in a non-AI workflow.

`no kernel image is available for execution on the device`

You installed the README's pinned torch==2.5.1+cu124 wheel, which does not include sm_120 kernels for Blackwell. Reinstall with the cu128 wheel per step 2 above (Issue #22 tracks the 50-series upgrade; Issue #122 carries a full Blackwell Dockerfile).

FlashAttention-2 errors / wheel-build failures

Hunyuan3D-2.1's shape diffusion uses PyTorch's default SDPA (scaled_dot_product_attention) backend, which has full Blackwell sm_120 support via the cu128 wheel and needs no extra install. If a dependency tree pulls in flash-attn for an unrelated reason and you hit Could not build wheels for flash-attn, note that the canonical FA2 wheel does not yet ship sm_120 kernels — tracked upstream at Dao-AILab/flash-attention#2168. Pin the dependency tree to skip flash-attn or run without it (SDPA is the default path).
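To find out whether flash-attn was pulled into the environment at all, a standard-library lookup is enough. This is a small diagnostic sketch: `is_installed` is a hypothetical helper, and `flash_attn` is the import name of the flash-attn distribution. It only checks for presence and does not test whether the package works on sm_120.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("flash_attn"):
    print("flash_attn is present; check which dependency pulled it in")
else:
    print("flash_attn is not installed; the default SDPA path does not need it")
```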

Want to texture locally anyway? mmgp offloading

Texture generation needs 21GB, but the deepbeepmeep/Hunyuan3D-2GP memory-management layer (pip install mmgp) lets you stream the paint pipeline through limited VRAM. The 5080 owner in Issue #15 confirmed this works for them after adapting the demo script (community workaround, not official Tencent support). Expect it to be slower than a card that fits the full 29GB combined peak — for a no-compromise local mesh+texture pass you need ≥29GB (e.g. an RTX 5090).

Want more control at the same 10GB? Look at Hunyuan3D-Omni

The sibling model Hunyuan3D-Omni adds multi-modal control (point cloud / voxel / pose / bounding-box conditioning on top of image input) and its HF card states "It takes 10 GB VRAM for generation." Same install (cu128 Blackwell wheel), same license. Useful if you want skeletal or voxel control over the generated mesh.