What You'll Build
A working install of Microsoft's TRELLIS image-large (1.2B-parameter image-to-3D mesh generator, MIT-licensed) on an RTX 5090 — capable of converting a single input image into a textured GLB mesh, a Gaussian-splat representation, or a radiance field. The default setup.sh install path does not work on Blackwell GPUs; this recipe walks through the community-maintained Blackwell build path that compiles xformers, kaolin, and diffoctreerast from source against PyTorch + CUDA 12.8.
Hardware data: RTX 5090 (32 GB VRAM) · canonical 16 GB minimum per TRELLIS README · See benchmark data
⚠️ Known issue: Stock TRELLIS fails on RTX 5090 with the error "NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation". Multiple submodules (kaolin, xformers, diffoctreerast, spconv) need to be built against PyTorch's cu128 wheel rather than the default cu118. See Issue #243.
ℹ️ Image-to-3D mesh, not video or interactive world model. TRELLIS image-large turns a single image into a 3D asset (Gaussians, radiance field, or textured mesh, exportable as .glb and .ply). It is distinct from the waypoint-1-5 real-time world model (also in our 3d vertical) and from the newer TRELLIS.2-4B (Microsoft's 4B-parameter successor with a 24 GB minimum).
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16 GB VRAM (per README, verified on A100 / A6000) | RTX 5090 (32 GB) |
| RAM | 16 GB system RAM | — |
| Storage | ~3.3 GB for model weights (HF tree); ~20 GB total with conda env and CUDA extensions | — |
| Software | CUDA 12.8, Conda, Python 3.10+, PyTorch ≥ 2.7.1 + cu128 | — |
Installation
The default setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast from the TRELLIS README is hard-coded to PyTorch 2.4.0 + CUDA 11.8 and will fail on Blackwell. The steps below follow the community-tested Blackwell path documented in maepopi's fork README and corroborated in TRELLIS Issue #243. Note that the pull request from maepopi's fork (PR #257) was closed without merge, so the upstream README still does not contain Blackwell instructions and this remains the de facto community recipe.
1. Verify CUDA 12.8 toolkit
nvcc --version
# Expected: release 12.8 or newer
If nvcc reports an older release, install CUDA Toolkit 12.8 before continuing.
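The version check above can also be scripted. The following is a minimal sketch, assuming nvcc prints a line containing "release X.Y" (as current toolkits do); the helper names parse_nvcc_release and toolkit_is_new_enough are hypothetical and not part of TRELLIS.

```python
import re
import shutil
import subprocess

MIN_CUDA = (12, 8)  # minimum toolkit release this recipe targets


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_is_new_enough(text, minimum=MIN_CUDA):
    """True if the parsed release is at least `minimum`."""
    release = parse_nvcc_release(text)
    return release is not None and release >= minimum


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        ).stdout
        ok = toolkit_is_new_enough(out)
        print("CUDA toolkit OK" if ok else "Install CUDA Toolkit 12.8 or newer")
```

Comparing (major, minor) tuples avoids the string-comparison trap where "12.10" would sort below "12.8".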
2. Clone the repo
git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS
3. Run partial setup (skip xformers / diffoctreerast / kaolin — we'll build those from source)
. ./setup.sh --new-env --basic --flash-attn --spconv --nvdiffrast
This creates a fresh trellis conda env, installs basic Python dependencies, builds flash-attn, installs spconv-cu120, and builds nvdiffrast. Activate the env if not already active: conda activate trellis.
4. Replace torch with cu128 wheel (sm_120 support)
PyTorch 2.7.0+ shipped pre-built CUDA 12.8 wheels with native sm_120 support; install the latest nightly to be safe:
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
This replaces the torch installed in step 3. CUDA extensions built in step 3 may need to be rebuilt against the new torch; if you hit an undefined symbol error later, rebuild the offending extension.
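Before building the extensions, it may be worth confirming that the new wheel actually targets the card. A minimal sketch, assuming torch.cuda.get_arch_list() and torch.cuda.get_device_capability() report what the wheel was compiled for and what the GPU is; arch_supported is a hypothetical helper, and its exact-match rule is deliberately conservative rather than a copy of PyTorch's own compatibility logic.

```python
def arch_supported(arch_list, capability):
    """True if arch_list has an exact sm_XY or compute_XY entry for capability."""
    major, minor = capability
    tag = f"{major}{minor}"  # (12, 0) -> "120"
    return f"sm_{tag}" in arch_list or f"compute_{tag}" in arch_list


if __name__ == "__main__":
    try:
        import torch  # imported here so the helper works without torch installed
    except Exception as exc:
        print(f"torch is not importable: {exc}")
    else:
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
            arch_list = torch.cuda.get_arch_list()
            print("device capability:", capability)
            print("wheel arch list:", arch_list)
            print("supported:", arch_supported(arch_list, capability))
        else:
            print("torch cannot see a CUDA device")
```

On an RTX 5090 the capability should be (12, 0), so the arch list needs an sm_120 entry for this check to pass.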
5. Build xformers from source
The PyPI xformers wheels lag behind PyTorch nightly; build against your installed torch:
git clone --recurse-submodules https://github.com/facebookresearch/xformers.git
cd xformers
pip install -e .
cd ..
6. Build diffoctreerast from source
mkdir -p /tmp/extensions
git clone --recurse-submodules https://github.com/JeffreyXiang/diffoctreerast.git /tmp/extensions/diffoctreerast
pip install /tmp/extensions/diffoctreerast
7. Install kaolin v0.18.0+ (sm_120 support)
NVIDIA Kaolin maintainer Caenorst confirmed kaolin v0.18.0 supports the latest PyTorch / CUDA versions (Issue #243 comment):
git clone https://github.com/NVIDIAGameWorks/kaolin
cd kaolin
export IGNORE_TORCH_VER=1
pip install "Cython >= 0.29.37"
pip install -e .
cd ..
If pip install -e . fails on a cuda_post_cflags / Unknown CUDA arch ("12.0") or GPU not supported error, ensure you're on kaolin master (v0.18.0+); older releases hard-coded the supported arch list.
8. Install Gradio demo dependencies (optional but recommended)
. ./setup.sh --demo
9. Re-pin torchvision (the demo setup may downgrade it)
pip uninstall -y torchvision
pip install --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
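With all nine steps done, a quick import sweep can catch a broken build before the first real run. This is a minimal sketch: the module names in EXTENSIONS are assumed to be the import names of the packages built above, and failed_imports is a hypothetical helper.

```python
import importlib

# Assumed import names for the packages installed in steps 3 to 7.
EXTENSIONS = ["torch", "xformers", "kaolin", "diffoctreerast", "spconv", "nvdiffrast"]


def failed_imports(names):
    """Import each module; return {name: error message} for those that fail."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or OSError from a broken .so
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    problems = failed_imports(EXTENSIONS)
    for name, error in problems.items():
        print(f"{name}: {error}")
    if not problems:
        print("all extensions import cleanly")
```

A clean import only shows that the shared objects load; it does not prove the kernels run on sm_120, so example.py remains the real test.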
Running
Verify the install with the upstream example.py — it downloads the weights from HuggingFace on first run (~3.3 GB to ~/.cache/huggingface/hub/):
python example.py
You should get five files in the working directory:
- sample_gs.mp4: turntable video of the 3D Gaussian representation
- sample_rf.mp4: turntable of the radiance field
- sample_mesh.mp4: turntable of the normal-shaded mesh
- sample.glb: textured GLB exportable to Blender / Unity / web viewers
- sample.ply: raw 3D Gaussian point cloud
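To confirm the run produced everything, the file list above can be checked programmatically. A minimal sketch; missing_outputs is a hypothetical helper, and treating zero-byte files as missing is an assumption of this sketch.

```python
from pathlib import Path

# The five files example.py is expected to write.
EXPECTED = [
    "sample_gs.mp4",
    "sample_rf.mp4",
    "sample_mesh.mp4",
    "sample.glb",
    "sample.ply",
]


def missing_outputs(directory=".", expected=EXPECTED):
    """Return the expected files that are absent or empty in `directory`."""
    root = Path(directory)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_outputs()
    print("all outputs present" if not missing else f"missing: {missing}")
```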
For the interactive Gradio demo:
python app.py
Then open the URL it prints (default http://127.0.0.1:7860). The demo lets you drop in a single image, runs the same TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large") pipeline, and previews the Gaussian / radiance / mesh outputs side-by-side.
Results
- Speed: No RTX 5090–named TRELLIS measurement has been published. Once a community benchmark lands via /contribute, this section will pick it up. For now, see /check/trellis-image-large/rtx-5090 for the live data.
- VRAM usage: The canonical TRELLIS README states "An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs." The 5090's 32 GB envelope comfortably exceeds that floor with ~16 GB of headroom — see "Spending the headroom" below for productive uses of the spare capacity.
- Quality notes: TRELLIS image-large is a 1.2B-parameter SLAT (Structured LATent) flow model (see the arXiv paper for the architecture). It outputs three representations from one pass; the GLB export from postprocessing_utils.to_glb(...) is the most directly usable downstream artifact (drop into Blender, Three.js, or any GLTF-aware viewer).
For the full benchmark data, see /check/trellis-image-large/rtx-5090.
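Until a published number exists, you can take your own measurement. The sketch below wraps any callable with wall-clock timing and, when torch and a CUDA device are available, PyTorch's peak-allocation counter. measure is a hypothetical helper, and torch.cuda.max_memory_allocated() only covers memory handed out by PyTorch's allocator, so allocations made directly by custom CUDA extensions may not be counted.

```python
import time


def measure(fn, *args, **kwargs):
    """Run fn and return (result, seconds, peak_gib); peak_gib is None without CUDA."""
    try:
        import torch  # optional: timing still works if torch is missing
        use_cuda = torch.cuda.is_available()
    except Exception:
        torch, use_cuda = None, False

    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()  # do not time work queued before the call

    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work to finish
    seconds = time.perf_counter() - start

    peak_gib = torch.cuda.max_memory_allocated() / 2**30 if use_cuda else None
    return result, seconds, peak_gib
```

A call such as measure(pipeline.run, image, seed=1) would time one generation, assuming pipeline and image are set up as in example.py.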
Spending the headroom
A 5090 (32 GB) is roughly 2× over-provisioned for the canonical TRELLIS image-large workload (16 GB minimum per README). Concrete uses for the spare ~16 GB:
- Larger texture maps. The postprocessing_utils.to_glb(...) call in example.py defaults to texture_size=1024; bump to 2048 or 4096 for higher-fidelity surface detail on the exported .glb. Texture baking is where VRAM pressure spikes in the mesh stage.
- Skip the offload forks. Community forks like 0lento/TRELLIS (8 GB target) and the FP16 TRELLIS-BOX (~50% VRAM reduction) stream models between CPU and GPU to fit smaller cards; on a 5090 you can keep everything resident for faster per-image throughput.
- Multi-image conditioning. Use the multi-image conditioning support added 2024-12-18 to condition on 2-4 input views simultaneously; each extra view costs more VRAM but produces noticeably more consistent geometry.
- Colocate with an image generator. Pair TRELLIS with a smaller image-gen model (e.g. flux-2-klein-4b at ~9 GB FP8) on the same card to build a text→image→3D pipeline without a second GPU.
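The texture-size bump from the first bullet can be sketched as below, assuming outputs is the dictionary returned by pipeline.run(...) as in the upstream example.py. export_glb and validate_texture_size are hypothetical helpers, and the allowed sizes are a guard of this sketch (the three values named above), not a restriction imposed by TRELLIS.

```python
def validate_texture_size(size, allowed=(1024, 2048, 4096)):
    """Return size if it is one of the sizes discussed above, else raise."""
    if size not in allowed:
        raise ValueError(f"texture_size must be one of {allowed}, got {size}")
    return size


def export_glb(outputs, path="sample.glb", texture_size=2048, simplify=0.95):
    """Bake a GLB from TRELLIS outputs at a larger texture size."""
    # Imported lazily so this file can be loaded without TRELLIS installed.
    from trellis.utils import postprocessing_utils

    glb = postprocessing_utils.to_glb(
        outputs["gaussian"][0],
        outputs["mesh"][0],
        simplify=simplify,  # same value example.py passes
        texture_size=validate_texture_size(texture_size),
    )
    glb.export(path)
    return path
```

Watching nvidia-smi while stepping from 2048 to 4096 is a simple way to see how much of the headroom the bake actually uses.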
Troubleshooting
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation
The pre-built PyTorch 2.4.0 wheel that setup.sh --basic installs is compiled for CUDA 11.8 and predates Blackwell. The fix is step 4 above — install PyTorch from the nightly/cu128 index. The canonical tracking thread is Issue #243, which collects working install paths from multiple contributors (maepopi, SanBingYouYong, zhizdev).
Unknown CUDA arch ("12.0") or GPU not supported
Reported by Polytoo on Issue #243 — fires when an installed extension's bundled torch.utils.cpp_extension doesn't recognize compute_120. Rebuild the offending extension after step 4: usually kaolin (step 7) or xformers (step 5). Make sure you're on the upstream master of each (kaolin v0.18.0+, xformers latest) — older tagged releases pre-date Blackwell.
flash_attn import fails after step 4 (undefined symbol: _ZN3c105ErrorC...)
A flash_attn build compiled against the torch from step 3 is often ABI-incompatible with the PyTorch nightly installed in step 4. Either rebuild flash_attn from source against the installed torch, or set the xformers backend before importing TRELLIS:
import os
os.environ['ATTN_BACKEND'] = 'xformers' # before any TRELLIS import
TRELLIS supports both flash-attn and xformers attention backends — see the Minimal Example at the top of the upstream README. FlashAttention 2 itself does not currently ship sm_120 kernels — coverage tracked at Dao-AILab/flash-attention#2168. The xformers backend works on Blackwell.
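The backend choice can also be made automatically. The following is a minimal sketch that probes whether flash_attn imports and falls back to xformers otherwise; pick_attention_backend is a hypothetical helper. An import probe only catches load-time failures such as the undefined-symbol error, not missing sm_120 kernels, so on Blackwell it may be simpler to set xformers unconditionally.

```python
import importlib
import os


def pick_attention_backend(probe_module="flash_attn"):
    """Return 'flash-attn' if probe_module imports cleanly, else 'xformers'."""
    try:
        importlib.import_module(probe_module)
    except Exception:  # ImportError covers the undefined-symbol case
        return "xformers"
    return "flash-attn"


# Must run before any TRELLIS import; an existing ATTN_BACKEND value is kept.
os.environ.setdefault("ATTN_BACKEND", pick_attention_backend())
```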
GLIBCXX_3.4.30 not found at import time
conda install -c conda-forge libstdcxx-ng
The system libstdc++ shipped with older Ubuntu LTS lags the version Caffe2 / PyTorch nightly needs. The conda-forge package is the safe override.
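To see whether a given libstdc++ already exports the needed symbol version, you can scan the library's bytes for version tags. A minimal sketch; the path under CONDA_PREFIX is an assumption about a typical conda layout, and glibcxx_versions and has_glibcxx are hypothetical helpers.

```python
import os
import re
from pathlib import Path


def glibcxx_versions(data):
    """Return the set of GLIBCXX_x.y.z tags found in raw library bytes."""
    return {m.decode() for m in re.findall(rb"GLIBCXX_\d+(?:\.\d+)*", data)}


def has_glibcxx(data, version="3.4.30"):
    """True if the exact tag is present (3.4.3 does not count as 3.4.30)."""
    return f"GLIBCXX_{version}" in glibcxx_versions(data)


if __name__ == "__main__":
    # Assumed location inside an activated conda env; adjust for your system.
    lib = Path(os.environ.get("CONDA_PREFIX", "/usr")) / "lib" / "libstdc++.so.6"
    if lib.is_file():
        print(lib, "has GLIBCXX_3.4.30:", has_glibcxx(lib.read_bytes()))
    else:
        print(f"{lib} not found")
```

Matching whole tags rather than substrings matters here, because GLIBCXX_3.4.3 is a prefix of GLIBCXX_3.4.30.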
Tremendous VRAM allocation request (Tried to allocate 196.89 GiB)
Issue #79 — diffoctreerast can mis-size its allocation when given certain input image shapes (transparency / unusual aspect ratios). Pre-process input images to a square aspect ratio (the upstream app.py does this automatically; if calling pipeline.run directly, mirror its preprocessing).
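If you call pipeline.run directly, one simple way to get a square input is to centre the image on a transparent square canvas. This is a minimal sketch using Pillow and is not the upstream app.py preprocessing, only a padding stand-in; square_canvas and pad_to_square are hypothetical helpers.

```python
def square_canvas(width, height):
    """Return (side, left, top): canvas side and offsets that centre the image."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2


def pad_to_square(image):
    """Return an RGBA copy of a PIL image centred on a transparent square."""
    from PIL import Image  # imported lazily; requires Pillow

    side, left, top = square_canvas(*image.size)
    canvas = Image.new("RGBA", (side, side), (0, 0, 0, 0))
    canvas.paste(image.convert("RGBA"), (left, top))
    return canvas
```

Padding keeps the subject undistorted, whereas resizing straight to a square would stretch it.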
Windows install
Windows is documented as not fully tested by Microsoft — see Issue #3. Issue #259 collects a community Windows installer with pre-compiled libraries for RTX 50-series. For first-class Blackwell support on Windows, that installer is currently the path of least resistance; the steps above target Linux.