What You'll Build
A working install of Microsoft's TRELLIS image-large (1.2B-parameter image-to-3D mesh generator, MIT-licensed, arXiv:2412.01506) on an RTX 5080 16 GB — capable of converting a single input image into a textured GLB mesh, a Gaussian-splat representation, or a radiance field. The 5080 sits exactly at the model's officially-stated 16 GB floor, so the recipe is structured around the default code path (no offloading, no quantization tricks); the Blackwell sm_120 build path here mirrors the upstream community fix collected on Issue #243.
ℹ️ Image-to-3D, not text-to-3D. TRELLIS image-large takes a single image as input and produces 3D representations (mesh / Gaussian splat / radiance field); it does not generate 3D from a text prompt. It lives in our 3d vertical because the catalogue groups all 3D-asset generators together; the model card is explicit that the input is an image. Bring your own reference picture (or generate one with an image model first).
Hardware data: RTX 5080 (16 GB VRAM) · canonical 16 GB minimum per TRELLIS README and Issue #5 (Microsoft collaborator JeffreyXiang) · See benchmark data
⚠️ Known issue: Stock TRELLIS fails on RTX 5080 with the error "NVIDIA GeForce RTX 5080 with CUDA capability sm_120 is not compatible with the current PyTorch installation." The default setup.sh installs PyTorch 2.4.0 + CUDA 11.8 wheels that predate Blackwell; multiple CUDA submodules (kaolin, xformers, diffoctreerast) must be built against PyTorch's cu128 wheel. The same fix path applies across the Blackwell consumer lineup; see Issue #243 and Issue #343.
ℹ️ Tight floor, no headroom. The RTX 5080's 16 GB GDDR7 envelope is the same envelope as the 5060 Ti; both sit at the model's floor. The 5080's higher memory bandwidth and compute will make each pass faster, but it does not buy you VRAM headroom: the default code path fits in 16 GB, but texture baking at simplify=0 can OOM on detailed meshes (per Issue #31); see Troubleshooting for the mode='fast' and simplify=0.95 workarounds. If you need more headroom, the RTX 5090 sibling recipe runs the same model with ~16 GB of slack above the floor, and the Microsoft team has TRELLIS.2-4B (a different model entirely, 24 GB minimum) for higher-VRAM cards.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16 GB VRAM (per README, verified on A100 / A6000) | RTX 5080 (16 GB) |
| RAM | 16 GB system RAM | — |
| Storage | ~3.30 GB for model weights (HF tree API); ~20 GB total with conda env and CUDA extensions | — |
| Software | CUDA 12.8, Conda, Python 3.10+, PyTorch ≥ 2.7.1 + cu128 | — |
Installation
The default setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast from the TRELLIS README is hard-coded to PyTorch 2.4.0 + CUDA 11.8 and will fail on Blackwell. The steps below follow the community-tested Blackwell path documented in the maepopi fork README (explicitly "RTX 5090 (or other Blackwell GPU)" — the sm_120 fix is shared across all Blackwell consumer cards, the RTX 5080 included) and corroborated by a confirmation on Issue #243 from Caenorst (a contributor to NVIDIA's Kaolin repo) that kaolin v0.18.0 supports current PyTorch / CUDA versions.
1. Verify CUDA 12.8 toolkit
nvcc --version
# Expected: release 12.8 or newer
If nvcc reports an older release, install CUDA Toolkit 12.8 before continuing.
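The version check in this step can also be scripted. The sketch below is a minimal, hypothetical helper (the names parse_nvcc_release and nvcc_release are illustrative, not part of TRELLIS or the CUDA toolkit); it assumes nvcc prints a "release X.Y" token, as current toolkits do, and compares it against the 12.8 minimum this recipe uses.

```python
import re
import subprocess

MIN_RELEASE = (12, 8)  # minimum CUDA toolkit release used by this recipe


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_release():
    """Run nvcc and return its release tuple, or None if nvcc is unavailable."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = nvcc_release()
    if release is None:
        print("nvcc not found or its output was not recognized")
    elif release < MIN_RELEASE:
        print(f"CUDA {release[0]}.{release[1]} is older than 12.8; upgrade first")
    else:
        print(f"CUDA {release[0]}.{release[1]} is new enough")
```

Tuples compare element by element, so (12, 8) correctly sorts above (11, 8) and below (13, 0) without any string handling.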
2. Clone the repo
git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS
3. Run partial setup (skip xformers / diffoctreerast / kaolin — we'll build those from source)
. ./setup.sh --new-env --basic --spconv --nvdiffrast
This creates a fresh trellis conda env, installs basic Python dependencies, installs spconv-cu120, and builds nvdiffrast. We deliberately omit --flash-attn here because FlashAttention 2 does not yet ship sm_120 kernels (see Troubleshooting) — TRELLIS runs fine on the xformers backend on Blackwell. Activate the env if not already active: conda activate trellis.
4. Replace torch with cu128 wheel (sm_120 support)
PyTorch 2.7.1+ shipped pre-built CUDA 12.8 wheels with native sm_120 support:
pip install torch==2.7.1+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
This replaces the torch installed in step 3. CUDA extensions built in step 3 may need to be rebuilt against the new torch; if you hit an undefined symbol error later, rebuild the offending extension.
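Before building the remaining extensions, it can help to confirm that the new wheel actually carries sm_120 kernels. The following is a small sketch using standard torch.cuda queries; the helper names supports_sm120 and report are hypothetical. On an RTX 5080 the reported capability should be (12, 0), which corresponds to sm_120.

```python
def supports_sm120(arch_list):
    """Return True if a torch arch list contains native sm_120 kernels."""
    return "sm_120" in arch_list


def report():
    """Print what the installed torch build and the visible GPU report."""
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
        return
    print("torch:", torch.__version__, "| CUDA runtime:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No CUDA device is visible to torch")
        return
    print("device:", torch.cuda.get_device_name(0))
    print("capability:", torch.cuda.get_device_capability(0))
    # get_arch_list() lists the architectures this torch build was compiled for.
    print("sm_120 in build:", supports_sm120(torch.cuda.get_arch_list()))


if __name__ == "__main__":
    report()
```

If the last line prints False, the cu128 wheel was probably not the one that got installed, and repeating step 4 is the first thing to try.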
5. Build xformers from source
The PyPI xformers wheels lag behind PyTorch 2.7.1+cu128; build against your installed torch — this is the attention backend TRELLIS will use on Blackwell:
git clone --recurse-submodules https://github.com/facebookresearch/xformers.git
cd xformers
pip install -e .
cd ..
6. Build diffoctreerast from source
mkdir -p /tmp/extensions
git clone --recurse-submodules https://github.com/JeffreyXiang/diffoctreerast.git /tmp/extensions/diffoctreerast
pip install /tmp/extensions/diffoctreerast
7. Install kaolin v0.18.0+ (sm_120 support)
Caenorst, an NVIDIAGameWorks/kaolin contributor, noted on Issue #243 that kaolin v0.18.0 supports current PyTorch / CUDA versions:
git clone https://github.com/NVIDIAGameWorks/kaolin
cd kaolin
export IGNORE_TORCH_VER=1
pip install "Cython >= 0.29.37"
pip install -e .
cd ..
If pip install -e . fails on a cuda_post_cflags / Unknown CUDA arch ("12.0") or GPU not supported error, ensure you're on kaolin master (v0.18.0+); older releases hard-coded the supported arch list.
8. Install Gradio demo dependencies (optional but recommended)
. ./setup.sh --demo
9. Re-pin torchvision (the demo setup may downgrade it)
pip uninstall -y torchvision
pip install torchvision --index-url https://download.pytorch.org/whl/cu128
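Because the demo setup can silently swap packages, a quick post-install check of the installed build tags is worthwhile. This is a sketch under the assumption that the cu128 wheels report a "+cu128" local version suffix (as in the torch==2.7.1+cu128 pin from step 4); cuda_tag and check_versions are hypothetical helper names. It reads package metadata rather than importing the packages, so it still works if a mismatched torchvision would fail at import time.

```python
from importlib import metadata

EXPECTED_TAG = "cu128"  # build tag of the wheels installed in steps 4 and 9


def cuda_tag(version):
    """Return the local build tag of a version string ('2.7.1+cu128' -> 'cu128')."""
    _, _, tag = version.partition("+")
    return tag


def check_versions(packages=("torch", "torchvision")):
    """Print each package's installed version and whether its tag matches."""
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if cuda_tag(version) == EXPECTED_TAG else "unexpected build tag"
        print(f"{name}: {version} ({status})")


if __name__ == "__main__":
    check_versions()
```

An "unexpected build tag" on torch after step 8 suggests the demo dependencies replaced it, in which case step 4 needs repeating before step 9.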
Running
Verify the install with the upstream example.py — it downloads the weights from HuggingFace on first run (~3.30 GB to ~/.cache/huggingface/hub/):
python example.py
You should get five files in the working directory:
- sample_gs.mp4: turntable video of the 3D Gaussian representation
- sample_rf.mp4: turntable of the radiance field
- sample_mesh.mp4: turntable of the normal-shaded mesh
- sample.glb: textured GLB exportable to Blender / Unity / web viewers
- sample.ply: raw 3D Gaussian point cloud
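To confirm the run produced everything, the five output names listed above can be checked from a short script. This is an illustrative sketch; missing_outputs is a hypothetical helper, and it assumes example.py was run from the directory being checked.

```python
from pathlib import Path

# The five files example.py is expected to write, per the list above.
EXPECTED_OUTPUTS = (
    "sample_gs.mp4",
    "sample_rf.mp4",
    "sample_mesh.mp4",
    "sample.glb",
    "sample.ply",
)


def missing_outputs(directory="."):
    """Return the expected output files that are not present in `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing outputs:", ", ".join(missing))
    else:
        print("All five outputs are present")
```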
For the interactive Gradio demo:
python app.py
Then open the URL it prints (default http://127.0.0.1:7860). The demo lets you drop in a single image, runs the same TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large") pipeline, and previews the Gaussian / radiance / mesh outputs side-by-side.
Tightening texture baking for the 16 GB floor
The default postprocessing_utils.to_glb(...) call in example.py keeps simplify=0.95 and texture_size=1024, which fits the 5080 comfortably. If you call the pipeline directly with simplify=0 (no mesh decimation) on a complex input, the texture-baking stage can OOM even on 24 GB cards (per PladsElsker on Issue #31). Keep simplify ≥ 0.9 on this card, and for very dense meshes set mode='fast' in to_glb (0lento's workaround on Issue #31).
Results
- Speed: No RTX 5080–named TRELLIS measurement has been published. The 5080 has roughly 2× the memory bandwidth of the 5060 Ti (~960 GB/s vs ~448 GB/s) and more compute, so each pass will be meaningfully faster than the 16 GB-floor sibling card — but with no published 5080-named figure, quoting a number here would be a guess. Once a community benchmark lands via /contribute, this section will pick it up. For now, see /check/trellis-image-large/rtx-5080 for the live data.
- VRAM usage: The canonical TRELLIS README states "An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs." Microsoft collaborator JeffreyXiang reiterated this in Issue #5: "Currently at least 16GB VRAM is required." The 5080 is at the floor: the default code path fits, but with no headroom for simplify=0 texture bakes (see Troubleshooting).
- Quality notes: TRELLIS image-large is a 1.2B-parameter SLAT (Structured LATent) flow model; see the arXiv paper for the architecture. It outputs three representations from one pass; the GLB export from postprocessing_utils.to_glb(...) is the most directly usable downstream artifact (drop into Blender, Three.js, or any GLTF-aware viewer).
For the full benchmark data, see /check/trellis-image-large/rtx-5080.
Troubleshooting
NVIDIA GeForce RTX 5080 with CUDA capability sm_120 is not compatible with the current PyTorch installation
The pre-built PyTorch 2.4.0 wheel that setup.sh --basic installs is compiled for CUDA 11.8 and predates Blackwell. The fix is step 4 above — install PyTorch 2.7.1+cu128 from the cu128 index. The canonical tracking thread is Issue #243, which collects working install paths from multiple contributors (maepopi, SanBingYouYong, zhizdev, Caenorst). RTX 50-series Blackwell support is also tracked in Issue #343, where IgorAherne confirms his recompiled trellis-stable-projectorz build supports "5000 cards".
Unknown CUDA arch ("12.0") or GPU not supported
Reported by Polytoo on Issue #243 — fires when an installed extension's bundled torch.utils.cpp_extension doesn't recognize compute_120. Rebuild the offending extension after step 4: usually kaolin (step 7) or xformers (step 5). Make sure you're on the upstream master of each (kaolin v0.18.0+, xformers latest) — older tagged releases pre-date Blackwell.
Texture-baking OOM at the 16 GB floor
The texture-bake stage in postprocessing_utils.to_glb(...) is the single largest VRAM consumer in the pipeline. On a 16 GB card with the default simplify=0.95 and texture_size=1024 the bake fits, but on detailed meshes with simplify=0 it can OOM (per Issue #31). Three remediations, in order:
- Keep simplify ≥ 0.9 (the default 0.95 is already safe).
- Set mode='fast' in the to_glb call (0lento's diff).
- If calling the pipeline programmatically (not via app.py), del pipeline before invoking to_glb to free the SLAT decoder's VRAM for the bake stage (same comment).
If you still OOM, you have effectively outgrown the 16 GB floor; the RTX 5090 sibling recipe (32 GB) is the next stop.
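The first and third remediations can be combined in a programmatic run. The sketch below follows the call pattern of the upstream example.py (pipeline.run, then postprocessing_utils.to_glb with simplify and texture_size); safe_simplify and export_glb_low_vram are hypothetical helper names, and the gc.collect() / torch.cuda.empty_cache() calls after del pipeline are an assumption about how to make the freed memory available to the bake, not something the issue thread prescribes. The mode='fast' change is not shown because it is applied as a source diff.

```python
import gc

MIN_SIMPLIFY = 0.9  # floor recommended above for a 16 GB card


def safe_simplify(requested):
    """Raise a requested decimation ratio to at least MIN_SIMPLIFY."""
    return max(requested, MIN_SIMPLIFY)


def export_glb_low_vram(image_path, out_path, simplify=0.95):
    """Run image-to-3D, release the pipeline, then bake and export a GLB."""
    # Imported here so safe_simplify stays usable outside the trellis env.
    import torch
    from PIL import Image
    from trellis.pipelines import TrellisImageTo3DPipeline
    from trellis.utils import postprocessing_utils

    pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
    pipeline.cuda()
    outputs = pipeline.run(Image.open(image_path), seed=1)

    # Drop the pipeline before baking so its weights stop occupying VRAM.
    del pipeline
    gc.collect()
    torch.cuda.empty_cache()  # assumption: return cached blocks to the driver

    glb = postprocessing_utils.to_glb(
        outputs["gaussian"][0],
        outputs["mesh"][0],
        simplify=safe_simplify(simplify),  # never below the 0.9 floor
        texture_size=1024,
    )
    glb.export(out_path)
```

Calling export_glb_low_vram("input.png", "out.glb", simplify=0) would therefore bake at 0.9 rather than 0, which trades some mesh detail for staying inside the 16 GB envelope.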
flash_attn import fails (undefined symbol: _ZN3c105ErrorC...)
This recipe skips --flash-attn in step 3 precisely because FlashAttention 2 does not currently ship sm_120 kernels — coverage is tracked at Dao-AILab/flash-attention#2168. If you installed flash-attn anyway and hit an ABI / undefined symbol error after pinning PyTorch 2.7.1+cu128, force the xformers backend before importing TRELLIS:
import os
os.environ['ATTN_BACKEND'] = 'xformers' # before any TRELLIS import
TRELLIS supports both flash-attn and xformers attention backends — see the Minimal Example at the top of the upstream README. The xformers backend (step 5) works on Blackwell.
GLIBCXX_3.4.30 not found at import time
conda install -c conda-forge libstdcxx-ng
The system libstdc++ shipped with older Ubuntu LTS lags the version Caffe2 / PyTorch needs. The conda-forge package is the safe override.
Tremendous VRAM allocation request (Tried to allocate 196.89 GiB)
Issue #79 — diffoctreerast can mis-size its allocation when given certain input image shapes (transparency / unusual aspect ratios). Pre-process input images to a square aspect ratio (the upstream app.py does this automatically; if calling pipeline.run directly, mirror its preprocessing).
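One simple way to get a square input when calling pipeline.run directly is to center the image on a transparent square canvas. The sketch below uses Pillow and is only an approximation under that assumption; it is not a copy of the upstream app.py preprocessing, and square_box and pad_to_square are hypothetical helper names.

```python
def square_box(width, height):
    """Return (side, left, top): canvas size and paste offset to center an image."""
    side = max(width, height)
    left = (side - width) // 2
    top = (side - height) // 2
    return side, left, top


def pad_to_square(image):
    """Center a PIL image on a transparent square canvas of its longer side."""
    from PIL import Image  # imported here so square_box works without Pillow

    rgba = image.convert("RGBA")
    side, left, top = square_box(*rgba.size)
    canvas = Image.new("RGBA", (side, side), (0, 0, 0, 0))
    canvas.paste(rgba, (left, top))
    return canvas
```

Padding rather than cropping keeps the whole subject in frame; whether transparent padding is the right choice for a given image is a judgment call, so compare results against the Gradio demo on the same input.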
Windows install
Windows is documented as not fully tested by Microsoft — see Issue #3. For RTX 50-series on Windows, Issue #259 collects a full tutorial with pre-compiled libraries from FurkanGozukara, and trellis-stable-projectorz v40 (recompiled for "5000 cards" per #343 comment) is the path of least resistance for first-class Blackwell support on Windows. The steps above target Linux.