Hunyuan3D-2.1 on RTX 4080: Image-to-Mesh 3D Generation (Shape-Only)

What You'll Build

A working image-to-3D pipeline using Tencent's Hunyuan3D-2.1 shape model: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). Texture generation is skipped on purpose — see the box below.

Hardware data: RTX 4080 (16GB VRAM) · ~10GB peak for shape generation · Benchmark data: /check/hunyuan-3d/rtx-4080

⚠️ Why shape-only on 16GB? The official Hunyuan3D-2.1 README states: "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total." Both the texture stage's 21GB peak and the 29GB combined peak exceed the 4080's 16GB, so the paint pipeline will run out of memory (OOM). A user in Issue #80 reports the texture stage overflowing even a 24GB card. Generate the mesh here and texture it downstream, either on rented A100/L40S-class hardware or in a tool like Substance Painter. If you must texture locally, see the mmgp-offload workaround in Troubleshooting.
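The fit decision above is plain arithmetic, and a minimal sketch makes it easy to repeat for other cards. The peak figures are the README numbers quoted above; the helper name `stages_that_fit` and the dictionary layout are illustrative assumptions, not part of the Hunyuan3D-2.1 codebase.

```python
from typing import Mapping

# Peak VRAM per stage in GB, taken from the README quote above.
README_PEAK_GB = {
    "shape": 10,
    "texture": 21,
    "shape+texture": 29,
}

def stages_that_fit(vram_gb: float, peaks: Mapping[str, float] = README_PEAK_GB) -> dict:
    """Map each stage to True if its quoted peak fits in vram_gb (hypothetical helper)."""
    return {stage: peak <= vram_gb for stage, peak in peaks.items()}

if __name__ == "__main__":
    # 16 is the RTX 4080's VRAM in GB.
    for stage, fits in stages_that_fit(16).items():
        print(f"{stage:>14}: {'fits' if fits else 'exceeds 16GB'}")
```

A nominal fit is not a guarantee: the same check reports the texture stage as fitting a 24GB card, yet the Issue #80 report mentioned above describes it overflowing one. Treat the result as a first filter only.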

ℹ️ Meshes, not images. Hunyuan3D-2.1 produces 3D geometry (.glb meshes), not 2D pictures. It sits in our 3d vertical; the shape stage covered here outputs an untextured mesh — colour/material is the separate texture stage that is out of scope on this 16GB card.

Requirements

Component	Minimum	Tested
GPU	10GB VRAM (shape pipeline)	RTX 4080 (16GB)
RAM	16GB	—
Storage	~7GB for the shape DiT checkpoint (~15GB if you also pull the VAE/texture folders)	—
Software	Python 3.10, PyTorch 2.5.1+cu124	—

Installation

The weights are not gated on Hugging Face — they download freely on the first pipeline call via huggingface_hub, with no click-through to accept. There is, however, a license you must comply with before deploying: see step 3.

1. Clone the official repository

git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1.git
cd Hunyuan3D-2.1

2. Install PyTorch and dependencies (Ada uses the stock cu124 wheel)

The README tests with Python 3.10 and torch==2.5.1+cu124. On the RTX 4080 (Ada Lovelace, sm_89) this is the right wheel: the stock cu124 build runs on Ada as-is, so the standard cu124 index in the command below is all you need, with no nightly build or alternative index. (This is the key difference from Blackwell 50-series cards, which need a newer cu128 wheel for sm_120 kernels; Ada does not.)

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
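After installing, it can be worth confirming that the wheel actually covers your GPU. The sketch below applies CUDA's compatibility rules as I understand them: a compiled kernel for sm_XY runs on devices with the same major version and an equal or higher minor version, and PTX (compute_XY) can be JIT-compiled for any newer device. The helper names are hypothetical. `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are real PyTorch calls, and they are only used if PyTorch with CUDA is present.

```python
import re

def parse_arch(tag: str):
    """Split a tag such as 'sm_86' or 'compute_90' into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", tag)
    if match is None:
        raise ValueError(f"unrecognised architecture tag: {tag!r}")
    kind, major, minor = match.groups()
    return kind, int(major), int(minor)

def build_covers_device(arch_list, capability) -> bool:
    """Return True if a build with arch_list can run on a (major, minor) device."""
    dev_major, dev_minor = capability
    for tag in arch_list:
        kind, major, minor = parse_arch(tag)
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True  # compiled kernel that is binary-compatible with the device
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True  # PTX that the driver can JIT-compile for a newer device
    return False

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        covered = build_covers_device(torch.cuda.get_arch_list(), cap)
        print(f"torch {torch.__version__}, device capability {cap}, covered: {covered}")
    else:
        print("PyTorch with CUDA is not available; nothing to check")
```

On a working install the script should report the 4080's capability as (8, 9) and `covered: True`. If it prints `False`, the wheel does not match the GPU and shape generation is unlikely to start.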

The in-tree texture extensions (hy3dpaint/custom_rasterizer, compile_mesh_painter.sh, and the Real-ESRGAN checkpoint) are only needed for the paint stage, which does not fit on a 16GB card, so skip building them for the shape-only path. If you do build them later, Ada sm_89 needs no architecture-specific workaround.

3. Read the Tencent license before you deploy

The weights are governed by the Tencent Hunyuan 3D 2.1 Community License — the HF card lists license: other (tencent-hunyuan-community), not Apache 2.0. Verbatim from the license header: "THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW." It also requires a separate license from Tencent for any product whose monthly active users exceed 1 million. The free, ungated download does not waive these terms — read the LICENSE in full before deploying anything user-facing.

Running

Use the official Python API from the Hunyuan3D-2.1 README. Run only the shape pipeline — do not instantiate Hunyuan3DPaintPipeline on a 16GB card. On the first call the shape DiT checkpoint (hunyuan3d-dit-v2-1/model.fp16.ckpt, ~6.9GB) downloads automatically.

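import sys

# Make the in-repo hy3dshape package importable (run this from the repository root)
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

# Load the shape pipeline only; the first call downloads the shape DiT checkpoint
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')

# The pipeline returns a list of meshes; take the first and export it as binary glTF
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
mesh_untextured.export('output.glb')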

For best results, give the pipeline a clean cutout — the repo ships a BackgroundRemover helper in hy3dshape/rembg.py you can run on the input image before passing it in. The pipeline outputs an untextured .glb mesh (glTF 2.0 binary); open the result in Blender, three.js, or any glTF 2.0 viewer.
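Before opening the export in a viewer, a quick sanity check of the file header can catch a truncated or mislabelled file. The sketch below reads the 12-byte GLB header defined by the glTF 2.0 specification (magic `glTF`, version, total length, all little-endian). The function names are illustrative and use only the standard library, not any Hunyuan3D API.

```python
import struct

GLB_MAGIC = b"glTF"   # first four bytes of every binary glTF file
GLB_HEADER_SIZE = 12  # magic + version + total length, 4 bytes each

def read_glb_header(data: bytes) -> dict:
    """Parse the GLB header and report whether the declared length matches."""
    if len(data) < GLB_HEADER_SIZE:
        raise ValueError("too short to contain a GLB header")
    magic, version, declared_length = struct.unpack_from("<4sII", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing 'glTF' magic; not a binary glTF file")
    return {
        "version": version,
        "declared_length": declared_length,
        "length_matches": declared_length == len(data),
    }

def check_glb_file(path: str) -> dict:
    """Read a file from disk and validate its GLB header."""
    with open(path, "rb") as handle:
        return read_glb_header(handle.read())
```

Calling `check_glb_file('output.glb')` on a healthy export should give a version of 2 and `length_matches` set to True. A mismatch usually means the write was interrupted.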

If you prefer a UI, run the official Gradio app with the low-VRAM flag. The flag does not make texture generation fit in 16GB, but shape generation works:

python3 gradio_app.py \
  --model_path tencent/Hunyuan3D-2.1 \
  --subfolder hunyuan3d-dit-v2-1 \
  --texgen_model_path tencent/Hunyuan3D-2.1 \
  --low_vram_mode

Results

VRAM usage: ~10GB peak for shape generation, quoted from the official README ("It takes 10 GB VRAM for shape generation"). The 3.3B shape DiT ships as a single ~6.9GB FP16 checkpoint; the ~10GB figure accounts for the VAE and activations on top. This fits the 4080's 16GB with headroom. For empirical numbers on this exact model and GPU pairing once a community submission lands, see /check/hunyuan-3d/rtx-4080.
Speed: intentionally omitted. Tencent publishes no per-GPU timings, and no RTX 4080 benchmark for Hunyuan3D-2.1 exists yet. The community timings that do exist cover full mesh+texture runs on other architectures, not a shape-only run on a 4080, so quoting one here would mislead. Timings are collected via /contribute and published at /check/hunyuan-3d/rtx-4080.
Output format: .glb (glTF 2.0 binary). Widely supported: import it into Blender, Unity, Unreal, or three.js, or convert it to .obj with trimesh in Python. For .fbx, export from a DCC tool such as Blender, since trimesh does not write FBX.
Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject. The shape stage produces watertight geometry suitable for greyboxing, retopology sources, and downstream texturing in a DCC tool.
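Since no measured numbers exist for this card yet, here is a minimal sketch of how you might collect your own. It times one call and, if PyTorch with CUDA is present, reads the allocator's peak via `torch.cuda.max_memory_allocated()`. The `measure` helper is hypothetical. The peak it reports covers tensors managed by PyTorch only, so the figure shown by nvidia-smi will typically be higher.

```python
import time

def measure(fn, *args, **kwargs):
    """Run fn once; return (result, seconds, peak_vram_gb or None)."""
    try:
        import torch
        use_cuda = torch.cuda.is_available()
    except ImportError:
        torch, use_cuda = None, False

    if use_cuda:
        torch.cuda.synchronize()              # finish any earlier GPU work first
        torch.cuda.reset_peak_memory_stats()  # start the peak counter from zero

    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if use_cuda:
        torch.cuda.synchronize()              # wait for queued GPU work before stopping the clock
    seconds = time.perf_counter() - start

    peak_gb = torch.cuda.max_memory_allocated() / 1e9 if use_cuda else None
    return result, seconds, peak_gb
```

With the pipeline from the Running section loaded, usage would look like `meshes, seconds, peak_gb = measure(shape_pipeline, image='assets/demo.png')`. Measure a second call rather than the first, so the checkpoint download and one-time warm-up do not inflate the timing.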

For the full benchmark data, see /check/hunyuan-3d/rtx-4080.

Troubleshooting

CUDA OOM during texture generation

Don't run Hunyuan3DPaintPipeline on the 4080. Per the README, the texture stage alone needs 21GB and the combined shape+texture run needs 29GB — both above the 16GB envelope. Generate the mesh here and texture it downstream on bigger hardware or in a non-AI workflow.

Want to texture locally anyway? mmgp offloading

The texture stage needs 21GB, but the memory-management layer from deepbeepmeep/Hunyuan3D-2GP (pip install mmgp) streams the paint pipeline through limited VRAM. This is a community workaround, not official Tencent support, and it is slower than running on a card that holds the whole pipeline. For a local mesh+texture pass without offloading, plan around the README's 29GB combined peak: per Issue #80 even a 24GB card can fall short, so the comfortable target is a card with more than 29GB (e.g. an RTX 5090, A100, or L40S).

FlashAttention build failures

Hunyuan3D-2.1's shape diffusion uses PyTorch's default SDPA (scaled_dot_product_attention) backend, which needs no extra install and is fully supported on Ada sm_89. Ada is not affected by the Blackwell sm_120 FlashAttention wheel gap — prebuilt flash-attn wheels cover sm_89 — so if a dependency pulls in flash-attn and the build fails, it is an unrelated toolchain issue, not an architecture gap. The shape path runs fine on SDPA without flash-attn.

Want more control at the same VRAM? Look at Hunyuan3D-Omni

The sibling model Hunyuan3D-Omni adds multi-modal control (point cloud / voxel / pose / bounding-box conditioning on top of image input) and its HF card states "It takes 10 GB VRAM for generation." Same install path, same license. Useful if you want skeletal or voxel control over the generated mesh.

For texture and PBR options that exceed 16GB, see /check/hunyuan-3d/rtx-4080 and contribute your results via /contribute.