Hunyuan3D-2.1 on RTX 5060 Ti: Image-to-Mesh 3D Generation (Shape-Only)

What You'll Build

A working image-to-3D pipeline using Tencent's Hunyuan3D-2.1 shape model: drop in a reference photo, get back an untextured .glb mesh ready for any DCC tool (Blender, Unity, Unreal, three.js). Texture generation is skipped on purpose — see the box below.

Hardware data: RTX 5060 Ti (16GB VRAM) · 10GB peak for shape generation (official README figure) · Benchmark data: /check/hunyuan-3d/rtx-5060-ti

⚠️ Why shape-only on 16GB? The official Hunyuan3D-2.1 README states: "It takes 10 GB VRAM for shape generation, 21GB for texture generation and 29GB for shape and texture generation in total." The --low_vram_mode flag does not close the gap: the DeepWiki memory-optimization page says it "Reduces peak memory from 29GB to 21GB", which is still above the 5060 Ti's 16GB ceiling. The texture stage will OOM. Generate the mesh here, then texture it on a rented A100/L40S or in a tool like Substance Painter.
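The arithmetic behind that warning can be made explicit with a small budgeting helper. This is a sketch: the peak figures are the README and DeepWiki numbers quoted above, and the 1GB headroom for the CUDA context, display output and allocator fragmentation is an illustrative assumption, not a documented value. stages_that_fit is a hypothetical helper name.

```python
# Peak-VRAM planning figures in GB, quoted from the Hunyuan3D-2.1 README and
# the DeepWiki memory-optimization page. Planning numbers, not guarantees.
README_PEAK_GB = {
    "shape": 10,
    "texture": 21,
    "shape+texture": 29,
    "shape+texture (--low_vram_mode)": 21,
}


def stages_that_fit(vram_gb, headroom_gb=1.0):
    """Return the configurations whose quoted peak fits in vram_gb.

    headroom_gb is an assumed safety margin (CUDA context, display, allocator
    fragmentation); it is not a figure from the README.
    """
    budget = vram_gb - headroom_gb
    return [name for name, peak in README_PEAK_GB.items() if peak <= budget]


if __name__ == "__main__":
    # On a 16GB card only the shape stage should be listed.
    for name in stages_that_fit(16):
        print("fits on 16GB:", name)
```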

Requirements

Component	Minimum	Tested
GPU	10GB VRAM (shape pipeline)	RTX 5060 Ti (16GB)
RAM	16GB	—
Storage	~15GB for weights	—
Software	Python 3.10, CUDA 12.4, PyTorch 2.5.1	—

Installation

1. Clone the official repository

git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1.git
cd Hunyuan3D-2.1

2. Install pinned PyTorch and dependencies

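The README pins exact wheel versions. Start from them rather than from whatever torch is already installed; that is the combination the repo was released and documented against.

Caveat for this card: the RTX 5060 Ti is a Blackwell GPU (compute capability 12.0, sm_120). PyTorch's CUDA 12.4 wheels, 2.5.1 included, ship no sm_120 kernels; Blackwell support arrived with the CUDA 12.8 wheels in PyTorch 2.7. If torch warns that sm_120 is not compatible with your installation, you will need a cu128 build instead of the pin, which is a deviation from the README that you should verify with a test run.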

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt

Dependencies include diffusers, transformers, omegaconf, trimesh, and rembg (for background removal), per the DeepWiki Getting Started page.
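Once requirements.txt is installed, a quick presence check can confirm those packages resolve in the active environment. This minimal sketch uses importlib.util.find_spec from the standard library; it only checks that each module can be found, not its version. The module list is the one named above plus torch from step 2, and missing_modules is a hypothetical helper name.

```python
import importlib.util

# Top-level import names for torch plus the dependencies named above.
# For these six packages the import name matches the pip package name.
REQUIRED_MODULES = ("torch", "diffusers", "transformers", "omegaconf", "trimesh", "rembg")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```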

3. Accept the Tencent license

Weights are released under the Tencent Hunyuan 3D 2.1 Community License. It is not Apache 2.0: it limits use to a defined territory that excludes the EU, the UK, and South Korea, and it requires explicit approval from Tencent for products with more than 1M monthly active users. Read it before deploying anything user-facing.

Log in to Hugging Face and accept the license on the model page: https://huggingface.co/tencent/Hunyuan3D-2.1. The weights download via huggingface_hub on the first from_pretrained call.
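If you would rather fetch the shape weights ahead of time (for example before going offline), huggingface_hub's snapshot_download can pull just the shape model's subfolder. This is a hedged sketch: the subfolder name comes from the --subfolder flag in the Gradio command below, and it assumes the shape pipeline reads from the default Hugging Face cache and needs nothing outside that subfolder. If that assumption is wrong, from_pretrained will simply download whatever is missing. prefetch_shape_weights is a hypothetical helper; run huggingface-cli login first if your account must be authenticated for the repo.

```python
def prefetch_shape_weights(repo_id="tencent/Hunyuan3D-2.1",
                           subfolder="hunyuan3d-dit-v2-1",
                           downloader=None):
    """Download only `subfolder` of the repo into the Hugging Face cache.

    `downloader` defaults to huggingface_hub.snapshot_download; it is a
    parameter (hypothetical design choice) so the logic can be tested offline.
    Returns the local snapshot path reported by the downloader.
    """
    if downloader is None:
        from huggingface_hub import snapshot_download  # lazy: only needed for real downloads
        downloader = snapshot_download
    # allow_patterns restricts the download to files under the shape subfolder
    return downloader(repo_id=repo_id, allow_patterns=[f"{subfolder}/*"])


if __name__ == "__main__":
    print("cached at:", prefetch_shape_weights())
```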

Running

Use the official Python API from the Hunyuan3D-2.1 README. Run only the shape pipeline — do not instantiate Hunyuan3DPaintPipeline on a 16GB card.

import sys
sys.path.insert(0, './hy3dshape')
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2.1')
mesh_untextured = shape_pipeline(image='assets/demo.png')[0]
mesh_untextured.export('output.glb')
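Loading the pipeline is the expensive step, so when converting several photos it pays to build shape_pipeline once and reuse it. The run_batch helper below is a hypothetical sketch that assumes only what the README example shows: the pipeline is called with image=<path>, returns a list, and the first item has an .export() method.

```python
from pathlib import Path


def run_batch(pipeline, input_dir, output_dir,
              patterns=("*.png", "*.jpg", "*.jpeg", "*.webp")):
    """Run an already-loaded shape pipeline over every matching image in input_dir.

    Returns the list of written .glb paths, one per input image.
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # A set drops duplicates if patterns overlap; sorting keeps the order stable.
    images = sorted({p for pattern in patterns for p in in_dir.glob(pattern)})
    written = []
    for image in images:
        mesh = pipeline(image=str(image))[0]  # same call shape as the README example
        dst = out_dir / f"{image.stem}.glb"
        mesh.export(str(dst))
        written.append(dst)
    return written
```

For example, run_batch(shape_pipeline, 'inputs', 'meshes') writes meshes/<name>.glb for each image in inputs/. Glob patterns are case-sensitive on Linux, and two inputs sharing a stem (photo.png, photo.jpg) would overwrite each other's output, so adjust patterns or naming if either applies.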

The pipeline call returns a list of meshes; exporting the first one produces an untextured .glb, the output format documented on the DeepWiki Getting Started page ("Untextured .glb mesh"). Open the result in Blender, three.js, or any glTF 2.0 viewer.
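To sanity-check an export without opening a DCC tool, you can read the GLB container directly. The sketch below follows the glTF 2.0 binary layout (a 12-byte header holding the glTF magic, version 2 and total length, followed by a JSON chunk) using only the standard library. inspect_glb is a hypothetical helper; it reports a few counts rather than validating the full specification.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # b"glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # b"JSON" read as a little-endian uint32


def inspect_glb(data: bytes) -> dict:
    """Parse the GLB header and first (JSON) chunk; raise ValueError if malformed."""
    if len(data) < 20:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic")
    if version != 2:
        raise ValueError(f"expected a glTF 2.0 container, got version {version}")
    if length != len(data):
        raise ValueError("header length does not match the file size")
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk must be JSON")
    if 20 + chunk_len > len(data):
        raise ValueError("JSON chunk runs past the end of the file")
    gltf = json.loads(data[20:20 + chunk_len])  # trailing space padding is valid JSON whitespace
    return {
        "asset_version": gltf.get("asset", {}).get("version"),
        "meshes": len(gltf.get("meshes", [])),
        "materials": len(gltf.get("materials", [])),
    }
```

Usage: inspect_glb(open('output.glb', 'rb').read()) returns the asset version plus mesh and material counts, or raises ValueError if the file is not a glTF 2.0 binary.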

If you prefer a UI, run the Gradio app with the official low-VRAM flag. Texture generation will still run out of memory on 16GB, so use the app for shape only:

python3 gradio_app.py --model_path tencent/Hunyuan3D-2.1 \
  --subfolder hunyuan3d-dit-v2-1 \
  --texgen_model_path tencent/Hunyuan3D-2.1 \
  --low_vram_mode

The --low_vram_mode flag, per the DeepWiki memory page, "enables sequential model loading" and "calls torch.cuda.empty_cache() after each stage". It may also give shape-only runs more headroom, but no shape-only figure below the README's 10GB is documented, so measure before relying on it.
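The sequential-loading idea is easy to sketch: build one model, run it, drop every reference, collect garbage and release the CUDA cache before the next stage starts. The code below illustrates that general pattern; it is not the repository's implementation, and run_sequentially plus the load/run callables are hypothetical placeholders.

```python
import gc


def _release_cuda_cache():
    """Return cached, unused allocator blocks to the driver if PyTorch with CUDA is present."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_sequentially(stages, value):
    """Run (load, run) pairs so that only one model is resident at a time.

    load() builds a model; run(model, value) returns the input for the next stage.
    """
    for load, run in stages:
        model = load()
        try:
            value = run(model, value)
        finally:
            del model              # drop the last reference so the weights can be freed
            gc.collect()           # collect reference cycles that may still hold tensors
            _release_cuda_cache()  # hand freed blocks back before the next stage loads
    return value
```

The order matters: torch.cuda.empty_cache() only releases blocks that no live tensor is using, which is why the del and gc.collect() come first.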

Results

VRAM usage: 10GB peak for shape generation, cited verbatim from the official README. Empirical numbers for this model on this GPU will appear at /check/hunyuan-3d/rtx-5060-ti once a community submission lands.
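Output format: .glb (glTF 2.0 binary). Imports into Blender, Unreal, and three.js (Unity needs a glTF importer package such as glTFast), or convert to .obj, .stl, or .ply via trimesh in Python. trimesh has no FBX exporter; for .fbx, re-export from Blender.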
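Quality notes: Image-to-3D quality is best when the input photo has a clean background and a single subject. rembg is installed as a dependency, but whether background removal runs automatically depends on the entry point; the safe route is to pass an RGBA image with the background already cut out.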
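The Output format note mentions trimesh for conversion. A minimal sketch of that, assuming a plain pip install of trimesh: load the .glb, merge its scene into a single mesh with force="mesh", and export to a format trimesh writes natively. convert_glb and its SUPPORTED set are hypothetical; the set is deliberately conservative.

```python
from pathlib import Path

# Formats trimesh writes without optional plugins (a conservative, assumed list).
# FBX is not among trimesh's exporters, so it is rejected up front.
SUPPORTED = {"obj", "stl", "ply"}


def convert_glb(src, fmt="obj"):
    """Convert a .glb to another mesh format next to the source file; return the new path."""
    fmt = fmt.lower().lstrip(".")
    if fmt not in SUPPORTED:
        raise ValueError(f"unsupported format {fmt!r}; re-export from Blender for FBX")
    import trimesh  # imported lazily so the format check works even without trimesh

    # force="mesh" concatenates the glTF scene graph into a single Trimesh
    mesh = trimesh.load(str(src), force="mesh")
    dst = Path(src).with_suffix("." + fmt)
    mesh.export(str(dst))
    return dst
```

For example, convert_glb('output.glb', 'obj') writes output.obj alongside the original. An untextured mesh loses nothing in the conversion that matters here, since there are no textures to carry over.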
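Following the quality note, one way to hand the pipeline a clean cut-out is to run rembg yourself, but only when the photo has no transparency yet. This sketch assumes Pillow and rembg's remove() function, which accepts and returns a PIL image; has_transparency and prepare_input are hypothetical helpers. The result is saved as PNG so the alpha channel survives, and you would then pass that path to shape_pipeline(image=...).

```python
from PIL import Image


def has_transparency(img: Image.Image) -> bool:
    """True if img is RGBA and at least one pixel is not fully opaque."""
    if img.mode != "RGBA":
        return False
    low, _high = img.getchannel("A").getextrema()
    return low < 255


def prepare_input(src, dst="input_rgba.png"):
    """Write an RGBA PNG of src, removing the background only if none is cut out yet."""
    img = Image.open(src).convert("RGBA")  # RGB photos become fully opaque RGBA
    if not has_transparency(img):
        from rembg import remove  # lazy import; rembg downloads a segmentation model on first use
        img = remove(img)  # returns a PIL image with the background made transparent
    img.save(dst)  # keep a .png suffix so the alpha channel is preserved
    return dst
```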

For the full benchmark data, see /check/hunyuan-3d/rtx-5060-ti.
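If you want your own numbers for this card in the meantime, PyTorch's allocator statistics give a first approximation. The vram_probe context manager below is a hypothetical helper built on torch.cuda.reset_peak_memory_stats, max_memory_allocated and max_memory_reserved. These count only PyTorch's caching allocator, so nvidia-smi will report more (the CUDA context and other processes are included there).

```python
import contextlib
import time


@contextlib.contextmanager
def vram_probe(label="run"):
    """Time the wrapped block and, if CUDA is available, record PyTorch's peak memory in GiB."""
    try:
        import torch
        use_cuda = torch.cuda.is_available()
    except ImportError:
        torch, use_cuda = None, False
    stats = {"label": label, "seconds": None,
             "peak_allocated_gb": None, "peak_reserved_gb": None}
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()  # start peak tracking from this point
    start = time.perf_counter()
    try:
        yield stats
    finally:
        if use_cuda:
            torch.cuda.synchronize()  # wait for queued kernels before reading the clock
        stats["seconds"] = time.perf_counter() - start
        if use_cuda:
            gib = 1024 ** 3
            stats["peak_allocated_gb"] = torch.cuda.max_memory_allocated() / gib
            stats["peak_reserved_gb"] = torch.cuda.max_memory_reserved() / gib
```

Wrap only the call you care about, for example `with vram_probe("shape") as s: mesh = shape_pipeline(image='assets/demo.png')[0]`, then print s. Wrapping from_pretrained as well would fold model loading into the peak.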

Troubleshooting

CUDA OOM during texture generation

Don't run Hunyuan3DPaintPipeline on the 5060 Ti. As the README states, the texture stage alone needs 21GB. Generate the mesh here and texture it downstream, on bigger hardware or in a non-AI workflow.

`torch` / CUDA version mismatch

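The repo pins torch==2.5.1 built for CUDA 12.4. The pip wheel bundles its own CUDA runtime, so a system CUDA toolkit only matters if you compile extensions; what must line up is your driver version and the GPU kernels compiled into the wheel. On the 5060 Ti the usual culprit is the architecture: Blackwell (sm_120) needs a CUDA 12.8 build of PyTorch (2.7 or later), so switching to an older wheel such as cu121 will not help. Mismatches of this kind typically fail loudly, for example with "CUDA error: no kernel image is available for execution on the device", rather than quietly producing a bad mesh.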
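A quick way to tell which problem you have is to compare the GPU's compute capability with the architectures compiled into your torch build. The sketch keeps the decision logic in a pure diagnose function (a hypothetical helper) so it can be tested without a GPU; the __main__ block feeds it real values from torch.cuda.get_device_capability() and torch.cuda.get_arch_list(). It ignores PTX (compute_XX) entries, which the driver can sometimes JIT-compile, so treat a miss as a strong hint rather than proof.

```python
README_TORCH_PIN = "2.5.1"


def diagnose(torch_version, cuda_version, arch_list, capability):
    """Return human-readable warnings for a torch build versus the local GPU.

    arch_list:  e.g. torch.cuda.get_arch_list() -> ["sm_50", ..., "sm_90"]
    capability: e.g. torch.cuda.get_device_capability() -> (12, 0) on Blackwell
    """
    warnings = []
    major, minor = capability
    needed = f"sm_{major}{minor}"
    # Only native SASS entries are checked; PTX (compute_XX) fallbacks are ignored.
    if needed not in arch_list:
        warnings.append(f"torch {torch_version} (CUDA {cuda_version}) has no {needed} kernels for this GPU")
    if not torch_version.startswith(README_TORCH_PIN):
        warnings.append(f"torch {torch_version} differs from the README pin {README_TORCH_PIN}")
    return warnings


if __name__ == "__main__":
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available to this torch build")
    else:
        for msg in diagnose(torch.__version__, torch.version.cuda,
                            torch.cuda.get_arch_list(), torch.cuda.get_device_capability()):
            print("WARNING:", msg)
```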

Want more control over the shape? Look at Hunyuan3D-Omni

The sibling model Hunyuan3D-Omni adds multi-modal control (point cloud / voxel / pose / bounding-box conditioning on top of image input) and states "It takes 10 GB VRAM for generation" on its HF card. That is the same figure as the 2.1 shape stage, so Omni is not a lower-VRAM option, but it fits the same 16GB budget. Same install (pinned to torch 2.5.1 cu124), same license. Useful if you want skeletal or voxel control over the generated mesh.