
TRELLIS image-large on RTX 5080: Image-to-3D Mesh Generation at the 16 GB Floor

3d · advanced · 16GB+ VRAM · May 29, 2026
models
tools
  • PyTorch
  • Conda
prerequisites
  • NVIDIA RTX 5080 16 GB (Blackwell sm_120)
  • Linux (Ubuntu tested; Windows requires extra steps — see Issue #3)
  • CUDA Toolkit 12.8 (required for sm_120)
  • Conda + Python 3.10+

What You'll Build

A working install of Microsoft's TRELLIS image-large (1.2B-parameter image-to-3D mesh generator, MIT-licensed, arXiv:2412.01506) on an RTX 5080 16 GB — capable of converting a single input image into a textured GLB mesh, a Gaussian-splat representation, or a radiance field. The 5080 sits exactly at the model's officially-stated 16 GB floor, so the recipe is structured around the default code path (no offloading, no quantization tricks); the Blackwell sm_120 build path here mirrors the upstream community fix collected on Issue #243.

ℹ️ Image-to-3D, not text-to-3D. TRELLIS image-large takes a single image as input and produces 3D representations (mesh / Gaussian splat / radiance field) — it does not generate 3D from a text prompt. It lives in our 3d vertical because the catalogue groups all 3D-asset generators together; the model card is explicit that the input is an image. Bring your own reference picture (or generate one with an image model first).

Hardware data: RTX 5080 (16 GB VRAM) · canonical 16 GB minimum per TRELLIS README and Issue #5 (Microsoft collaborator JeffreyXiang) · See benchmark data

⚠️ Known issue: Stock TRELLIS fails on the RTX 5080 with the error "NVIDIA GeForce RTX 5080 with CUDA capability sm_120 is not compatible with the current PyTorch installation." The default setup.sh installs PyTorch 2.4.0 + CUDA 11.8 wheels that predate Blackwell, so PyTorch must be replaced with a cu128 build and the CUDA extensions (kaolin, xformers, diffoctreerast) built against it. The same fix path applies across the Blackwell consumer lineup; see Issue #243 and Issue #343.

ℹ️ Tight floor, no headroom. The RTX 5080 has the same 16 GB of GDDR7 as the 16 GB RTX 5060 Ti, so both sit at the model's floor. The 5080's higher memory bandwidth and compute make each pass faster but add no VRAM headroom: the default code path fits in 16 GB, yet texture baking at simplify=0 can OOM on detailed meshes (per Issue #31); see Troubleshooting for the mode='fast' and simplify=0.95 workarounds. If you need more headroom, the RTX 5090 sibling recipe runs the same model with ~16 GB of slack above the floor, and the Microsoft team has TRELLIS.2-4B (a different model entirely, 24 GB minimum) for higher-VRAM cards.

Requirements

Component | Minimum | Tested
GPU | 16 GB VRAM (per README, verified on A100 / A6000) | RTX 5080 (16 GB)
RAM | 16 GB system RAM |
Storage | ~3.30 GB for model weights (HF tree API); ~20 GB total with conda env and CUDA extensions |
Software | CUDA 12.8, Conda, Python 3.10+, PyTorch ≥ 2.7.1 + cu128 |

Installation

The default setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast from the TRELLIS README is hard-coded to PyTorch 2.4.0 + CUDA 11.8 and will fail on Blackwell. The steps below follow the community-tested Blackwell path documented in the maepopi fork README, which targets "RTX 5090 (or other Blackwell GPU)"; the sm_120 fix is shared across all Blackwell consumer cards, the RTX 5080 included. That path is corroborated on Issue #243 by Caenorst (a contributor to NVIDIA's Kaolin repo), who confirmed that kaolin v0.18.0 supports current PyTorch / CUDA versions.

1. Verify CUDA 12.8 toolkit

nvcc --version
# Expected: release 12.8 or newer

If nvcc reports an older release, install CUDA Toolkit 12.8 before continuing.
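If you want to script this check, the following is a minimal sketch that parses the text printed by nvcc --version and compares the release against 12.8. The helper names (parse_nvcc_release, toolkit_ok) are hypothetical, and the regular expression assumes the usual "release X.Y" wording in nvcc's output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output: str):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_ok(output: str, minimum=(12, 8)) -> bool:
    """True when the reported release is at least `minimum`."""
    release = parse_nvcc_release(output)
    return release is not None and release >= minimum


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        text = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        ).stdout
        print("release:", parse_nvcc_release(text), "ok:", toolkit_ok(text))
```

Tuple comparison keeps the check correct for future major versions, where a string comparison such as "13.0" >= "12.8" would only work by accident.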

2. Clone the repo

git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS

3. Run partial setup (skip xformers / diffoctreerast / kaolin — we'll build those from source)

. ./setup.sh --new-env --basic --spconv --nvdiffrast

This creates a fresh trellis conda env, installs basic Python dependencies, installs spconv-cu120, and builds nvdiffrast. We deliberately omit --flash-attn here because FlashAttention 2 does not yet ship sm_120 kernels (see Troubleshooting) — TRELLIS runs fine on the xformers backend on Blackwell. Activate the env if not already active: conda activate trellis.

4. Replace torch with cu128 wheel (sm_120 support)

PyTorch 2.7.1 and later ship pre-built CUDA 12.8 wheels with native sm_120 support:

pip install torch==2.7.1+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

This replaces the torch installed in step 3. CUDA extensions built in step 3 may need to be rebuilt against the new torch; if you hit an undefined symbol error later, rebuild the offending extension.
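One way to confirm the new wheel actually targets the card is to compare the device's compute capability with the architectures the wheel was compiled for. The sketch below uses torch.cuda.get_device_capability and torch.cuda.get_arch_list; the helper names are hypothetical, and the exact-match test is a simplification that ignores forward compatibility between minor architectures.

```python
def arch_supported(arch_list, capability):
    """True if the (major, minor) capability has an sm_XY entry in arch_list."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def describe_gpu_support():
    """Summarize whether the installed torch wheel lists the local GPU's arch."""
    try:
        import torch  # imported lazily so the helper above works without torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device visible"
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    verdict = "listed" if arch_supported(archs, capability) else "NOT listed"
    return (
        f"torch {torch.__version__} (CUDA {torch.version.cuda}), "
        f"device sm_{capability[0]}{capability[1]}: {verdict} in wheel arch list"
    )


if __name__ == "__main__":
    print(describe_gpu_support())
```

On an RTX 5080 the capability should be (12, 0), so the check looks for sm_120 in the wheel's list.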

5. Build xformers from source

The PyPI xformers wheels lag behind PyTorch 2.7.1+cu128; build against your installed torch — this is the attention backend TRELLIS will use on Blackwell:

git clone --recurse-submodules https://github.com/facebookresearch/xformers.git
cd xformers
pip install -e .
cd ..

6. Build diffoctreerast from source

mkdir -p /tmp/extensions
git clone --recurse-submodules https://github.com/JeffreyXiang/diffoctreerast.git /tmp/extensions/diffoctreerast
pip install /tmp/extensions/diffoctreerast

7. Install kaolin v0.18.0+ (sm_120 support)

Caenorst, an NVIDIAGameWorks/kaolin contributor, noted on Issue #243 that kaolin v0.18.0 supports current PyTorch / CUDA versions:

git clone https://github.com/NVIDIAGameWorks/kaolin
cd kaolin
export IGNORE_TORCH_VER=1
pip install "Cython >= 0.29.37"
pip install -e .
cd ..

If pip install -e . fails on a cuda_post_cflags / Unknown CUDA arch ("12.0") or GPU not supported error, ensure you're on kaolin master (v0.18.0+); older releases hard-coded the supported arch list.
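To check which kaolin version ended up installed, a small sketch along these lines may help. It assumes the package is registered under the distribution name "kaolin" (an assumption about the editable install), and the helper names are hypothetical. Pre-release suffixes such as "0.18.0a0" are treated conservatively because parsing stops at the first non-numeric component.

```python
from importlib import metadata


def version_tuple(version: str):
    """Leading numeric components, e.g. '0.18.0+cu128' -> (0, 18, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)


def at_least(version: str, minimum=(0, 18, 0)) -> bool:
    """True when the parsed version is >= minimum."""
    return version_tuple(version) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("kaolin")  # assumed distribution name
        print("kaolin", installed, "ok:", at_least(installed))
    except metadata.PackageNotFoundError:
        print("kaolin is not installed in this environment")
```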

8. Install Gradio demo dependencies (optional but recommended)

. ./setup.sh --demo

9. Re-pin torchvision (the demo setup may downgrade it)

pip uninstall -y torchvision
pip install torchvision --index-url https://download.pytorch.org/whl/cu128
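After the re-pin, torch and torchvision should both carry the cu128 build tag. The sketch below reads the local version tag (the part after "+") from each package's __version__ string; the helper names are hypothetical, and it assumes wheels from the cu128 index carry that tag.

```python
def local_tag(version: str) -> str:
    """Build tag after '+', e.g. '2.7.1+cu128' -> 'cu128' ('' if absent)."""
    return version.partition("+")[2]


def same_cuda_build(torch_version, torchvision_version, expected="cu128"):
    """True when both version strings carry the expected CUDA build tag."""
    return (
        local_tag(torch_version) == expected
        and local_tag(torchvision_version) == expected
    )


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except ImportError as exc:
        print("could not import:", exc)
    else:
        ok = same_cuda_build(torch.__version__, torchvision.__version__)
        print(torch.__version__, torchvision.__version__, "match:", ok)
```

A mismatch (for example a torchvision tagged cu118) suggests the demo setup replaced the wheel and step 9 needs to be repeated.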

Running

Verify the install with the upstream example.py — it downloads the weights from HuggingFace on first run (~3.30 GB to ~/.cache/huggingface/hub/):

python example.py

You should get five files in the working directory:

  • sample_gs.mp4 — turntable video of the 3D Gaussian representation
  • sample_rf.mp4 — turntable of the radiance field
  • sample_mesh.mp4 — turntable of the normal-shaded mesh
  • sample.glb — textured GLB exportable to Blender / Unity / web viewers
  • sample.ply — raw 3D Gaussian point cloud
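A quick way to confirm the run completed is to check that all five files exist and are non-empty. This is a minimal sketch; the file names come from the list above, while the helper name missing_outputs is hypothetical.

```python
from pathlib import Path

# Output names produced by example.py, as listed above.
EXPECTED = (
    "sample_gs.mp4",
    "sample_rf.mp4",
    "sample_mesh.mp4",
    "sample.glb",
    "sample.ply",
)


def missing_outputs(directory=".", expected=EXPECTED):
    """Return the expected names that are absent or empty in `directory`."""
    root = Path(directory)
    return [
        name
        for name in expected
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_outputs()
    print("all outputs present" if not missing else f"missing: {missing}")
```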

For the interactive Gradio demo:

python app.py

Then open the URL it prints (default http://127.0.0.1:7860). The demo lets you drop in a single image, runs the same TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large") pipeline, and previews the Gaussian / radiance / mesh outputs side-by-side.

Tightening texture baking for the 16 GB floor

The default postprocessing_utils.to_glb(...) call in example.py keeps simplify=0.95 and texture_size=1024, which fits the 5080 comfortably. If you call the pipeline directly with simplify=0 (no mesh decimation) on a complex input, the texture-baking stage can OOM even on 24 GB cards (per PladsElsker on Issue #31). Keep simplify ≥ 0.9 on this card, and for very dense meshes set mode='fast' in to_glb (0lento's workaround on Issue #31).
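When exporting programmatically, a small guard can keep the decimation ratio in the safe range before it reaches to_glb. The sketch below is illustrative: clamp_simplify and export_glb are hypothetical helpers, the 0.98 upper cap is an assumption, and the to_glb call mirrors the one in upstream example.py (outputs is the dictionary returned by pipeline.run).

```python
def clamp_simplify(simplify: float, floor: float = 0.9, cap: float = 0.98) -> float:
    """Keep the decimation ratio within [floor, cap]; the cap is an assumption."""
    return min(max(simplify, floor), cap)


def export_glb(outputs, path="sample.glb", simplify=0.95, texture_size=1024):
    """Export a GLB from pipeline outputs with a clamped simplify ratio."""
    # Imported lazily: this only works inside the TRELLIS environment.
    from trellis.utils import postprocessing_utils

    glb = postprocessing_utils.to_glb(
        outputs["gaussian"][0],
        outputs["mesh"][0],
        simplify=clamp_simplify(simplify),
        texture_size=texture_size,
    )
    glb.export(path)
    return path
```

With this guard, a caller passing simplify=0 gets 0.9 instead, which avoids the undecimated bake that Issue #31 reports as the OOM trigger.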

Results

  • Speed: No RTX 5080–named TRELLIS measurement has been published. The 5080 has roughly 2× the memory bandwidth of the 5060 Ti (~960 GB/s vs ~448 GB/s) and more compute, so each pass will be meaningfully faster than the 16 GB-floor sibling card — but with no published 5080-named figure, quoting a number here would be a guess. Once a community benchmark lands via /contribute, this section will pick it up. For now, see /check/trellis-image-large/rtx-5080 for the live data.
  • VRAM usage: The canonical TRELLIS README states "An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs." — and Microsoft collaborator JeffreyXiang reiterated this in Issue #5: "Currently at least 16GB VRAM is required." The 5080 is at the floor — the default code path fits, but with no headroom for simplify=0 texture bakes (see Troubleshooting).
  • Quality notes: TRELLIS image-large is a 1.2B-parameter SLAT (Structured LATent) flow model — see the arXiv paper for the architecture. It outputs three representations from one pass; the GLB export from postprocessing_utils.to_glb(...) is the most directly usable downstream artifact (drop into Blender, Three.js, or any GLTF-aware viewer).

For the full benchmark data, see /check/trellis-image-large/rtx-5080.

Troubleshooting

NVIDIA GeForce RTX 5080 with CUDA capability sm_120 is not compatible with the current PyTorch installation

The pre-built PyTorch 2.4.0 wheel that setup.sh --basic installs is compiled for CUDA 11.8 and predates Blackwell. The fix is step 4 above — install PyTorch 2.7.1+cu128 from the cu128 index. The canonical tracking thread is Issue #243, which collects working install paths from multiple contributors (maepopi, SanBingYouYong, zhizdev, Caenorst). RTX 50-series Blackwell support is also tracked in Issue #343, where IgorAherne confirms his recompiled trellis-stable-projectorz build supports "5000 cards".

Unknown CUDA arch ("12.0") or GPU not supported

Reported by Polytoo on Issue #243 — fires when an installed extension's bundled torch.utils.cpp_extension doesn't recognize compute_120. Rebuild the offending extension after step 4: usually kaolin (step 7) or xformers (step 5). Make sure you're on the upstream master of each (kaolin v0.18.0+, xformers latest) — older tagged releases pre-date Blackwell.

Texture-baking OOM at the 16 GB floor

The texture-bake stage in postprocessing_utils.to_glb(...) is the single largest VRAM consumer in the pipeline. On a 16 GB card with the default simplify=0.95 and texture_size=1024 the bake fits, but on detailed meshes with simplify=0 it can OOM (per Issue #31). Three remediations, in order:

  1. Keep simplify ≥ 0.9 (the default 0.95 is already safe).
  2. Set mode='fast' in the to_glb call (0lento's diff).
  3. If calling the pipeline programmatically (not via app.py), del pipeline before invoking to_glb to free the SLAT decoder's VRAM for the bake stage (same comment).
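The third remediation can be sketched as follows. release_cuda_memory is a hypothetical helper; it relies on gc.collect, torch.cuda.empty_cache, and torch.cuda.mem_get_info, and it only helps if no other variable still references the pipeline after del pipeline.

```python
import gc


def release_cuda_memory():
    """Collect garbage and release cached CUDA blocks.

    Returns the free VRAM in bytes, or None when torch or CUDA is unavailable.
    """
    gc.collect()
    try:
        import torch  # imported lazily so the helper is safe without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes


# Intended order inside a script that calls the pipeline directly:
#   outputs = pipeline.run(image, seed=1)
#   del pipeline                      # drop the last reference to the model
#   free = release_cuda_memory()      # hand cached blocks back before baking
#   glb = postprocessing_utils.to_glb(outputs["gaussian"][0], outputs["mesh"][0])
```

Printing the returned value before and after the call gives a rough idea of how much VRAM the bake stage has to work with.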

If you still OOM, you have effectively outgrown the 16 GB floor; the 32 GB RTX 5090 (see the sibling recipe) is the next stop.

flash_attn import fails (undefined symbol: _ZN3c105ErrorC...)

This recipe skips --flash-attn in step 3 precisely because FlashAttention 2 does not currently ship sm_120 kernels — coverage is tracked at Dao-AILab/flash-attention#2168. If you installed flash-attn anyway and hit an ABI / undefined symbol error after pinning PyTorch 2.7.1+cu128, force the xformers backend before importing TRELLIS:

import os
os.environ['ATTN_BACKEND'] = 'xformers'  # before any TRELLIS import

TRELLIS supports both flash-attn and xformers attention backends — see the Minimal Example at the top of the upstream README. The xformers backend (step 5) works on Blackwell.
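If the same script has to run on machines with and without a working flash-attn build, the backend can be chosen by probing the import. This is a sketch under assumptions: choose_attn_backend is a hypothetical helper, and a successful import does not prove that sm_120 kernels exist, so on Blackwell forcing xformers unconditionally remains the simpler option.

```python
import importlib
import os


def choose_attn_backend(probe_module: str = "flash_attn") -> str:
    """Return 'flash-attn' if the probe module imports cleanly, else 'xformers'."""
    try:
        importlib.import_module(probe_module)
    except Exception:  # ImportError also covers "undefined symbol" ABI failures
        return "xformers"
    return "flash-attn"


def configure_attn_backend() -> str:
    """Set ATTN_BACKEND unless the user already exported one; return the value."""
    os.environ.setdefault("ATTN_BACKEND", choose_attn_backend())
    return os.environ["ATTN_BACKEND"]


if __name__ == "__main__":
    # Call this before any TRELLIS import so the backend is picked up.
    print("ATTN_BACKEND =", configure_attn_backend())
```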

GLIBCXX_3.4.30 not found at import time

conda install -c conda-forge libstdcxx-ng

The system libstdc++ shipped with older Ubuntu LTS lags the version Caffe2 / PyTorch needs. The conda-forge package is the safe override.
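To see which GLIBCXX versions a given libstdc++ actually provides, the version strings embedded in the binary can be scanned. The sketch below is illustrative: the helper names are hypothetical, and the $CONDA_PREFIX/lib/libstdc++.so.6 location is an assumption about where the conda-forge package places the library.

```python
import os
import re
from pathlib import Path


def glibcxx_versions(data: bytes):
    """Set of GLIBCXX version tuples embedded in a libstdc++ binary."""
    found = re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data)
    return {tuple(int(p) for p in v.decode().split(".")) for v in found}


def provides(data: bytes, needed=(3, 4, 30)) -> bool:
    """True when the binary advertises the needed GLIBCXX version."""
    return needed in glibcxx_versions(data)


if __name__ == "__main__":
    # Assumed location inside the active conda env.
    lib = Path(os.environ.get("CONDA_PREFIX", "")) / "lib" / "libstdc++.so.6"
    if lib.is_file():
        print(lib, "provides GLIBCXX_3.4.30:", provides(lib.read_bytes()))
    else:
        print("libstdc++.so.6 not found at", lib)
```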

Tremendous VRAM allocation request (Tried to allocate 196.89 GiB)

Issue #79: diffoctreerast can mis-size its allocation when given certain input image shapes (transparency / unusual aspect ratios). Pre-process input images to a square aspect ratio (the upstream app.py does this automatically; if calling pipeline.run directly, mirror its preprocessing).
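As a rough illustration of squaring an input, the sketch below centers an image on a square canvas using Pillow. It is not upstream's preprocessing: the helper names and the white fill are assumptions, and converting to RGB discards any alpha channel.

```python
def square_paste_box(width: int, height: int):
    """Square side length and the (x, y) offset that centers the image on it."""
    side = max(width, height)
    return side, ((side - width) // 2, (side - height) // 2)


def pad_to_square(image, fill=(255, 255, 255)):
    """Center a PIL image on an opaque square canvas (requires Pillow)."""
    from PIL import Image  # imported lazily; only needed for the actual padding

    side, offset = square_paste_box(*image.size)
    canvas = Image.new("RGB", (side, side), fill)
    canvas.paste(image.convert("RGB"), offset)  # alpha is dropped here
    return canvas
```

For an 800×600 input this yields an 800×800 canvas with the picture pasted 100 pixels down, so the aspect ratio passed on to the pipeline is 1:1.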

Windows install

Windows is documented as not fully tested by Microsoft — see Issue #3. For RTX 50-series on Windows, Issue #259 collects a full tutorial with pre-compiled libraries from FurkanGozukara, and trellis-stable-projectorz v40 (recompiled for "5000 cards" per #343 comment) is the path of least resistance for first-class Blackwell support on Windows. The steps above target Linux.