TRELLIS.2-4B on Apple M2 Max: image-to-3D in unified memory via the community Metal port

3d · advanced · 17GB+ VRAM · Jun 22, 2026

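This advanced recipe sets up TRELLIS.2-4B on the Apple M2 Max. Apple Silicon has no separate VRAM, so the "17 GB" figure refers to GPU-addressable unified memory needed for the model weights.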

prerequisites
  • Apple M2 Max with 64 GB unified memory (~48 GB GPU-addressable) — the README recommends 24 GB+ unified memory; the 64 GB M2 Max clears this with room to spare
  • macOS on Apple Silicon (M1 or later), with the Xcode Metal Toolchain for the accelerated texture baker
  • Python 3.11+
  • ~18 GB free disk for TRELLIS.2-4B (15.12 GiB) + DINOv3 (1.21 GB) + RMBG-2.0 (0.88 GB)
  • A Hugging Face account with access granted to the gated DINOv3 and RMBG-2.0 weights
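
The prerequisites above can be checked before cloning anything. The following is a minimal sketch, not part of the port: `check_prereqs` is a hypothetical helper, and the 18 GB threshold is simply the approximate disk figure quoted in the list. It does not check the Xcode Metal Toolchain or Hugging Face access.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 11)   # "Python 3.11+" from the prerequisites
MIN_FREE_GB = 18       # approximate weight footprint quoted in this recipe


def check_prereqs(system, machine, python_version, free_bytes):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if system != "Darwin":
        problems.append(f"expected macOS (Darwin), found {system}")
    if machine != "arm64":
        # Native Python on Apple Silicon reports arm64; Rosetta reports x86_64.
        problems.append(f"expected Apple Silicon (arm64), found {machine}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or newer is required")
    if free_bytes < MIN_FREE_GB * 10**9:
        problems.append(f"less than {MIN_FREE_GB} GB free on this volume")
    return problems


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    found = check_prereqs(platform.system(), platform.machine(),
                          sys.version_info, free)
    for problem in found:
        print("-", problem)
    if not found:
        print("basic prerequisites look fine")
```

Run it from the volume where the weights will be cached, since `shutil.disk_usage(".")` measures the current directory's filesystem.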

What You'll Build

Generate high-fidelity textured 3D assets from a single image locally with TRELLIS.2-4B — Microsoft's 4-billion-parameter image-to-3D flow-matching transformer — on an Apple M2 Max, with no NVIDIA GPU and no CUDA. The model outputs PBR-ready meshes (base color, roughness, metallic) at voxel-grid resolutions from 512³ up to 1536³, using its O-Voxel sparse representation (Microsoft model card: "a state-of-the-art large 3D generative model designed for high-fidelity image-to-3D generation", "O-Voxel", "Parameters: 4 Billion", "Resolution: Varies from 512³ to 1536³"). The path to Apple Silicon is a community Metal/MPS port, not Microsoft's official repo.

Hardware data: Apple M2 Max (64 GB unified memory, ~48 GB GPU-addressable) · 4B image-to-3D pipeline, ~18 GB weights on disk · See benchmark data

⚠️ The Apple path is a community Metal port, not official Microsoft support. Microsoft's canonical TRELLIS.2 repo is CUDA-only — it builds flex_gemm and nvdiffrast and assumes an NVIDIA GPU. The recipe below uses shivampkumar/trellis-mac (412★, MIT), a community port whose README line 1 reads "TRELLIS.2 for Apple Silicon" and which describes itself as "a port of Microsoft's TRELLIS.2 — a state-of-the-art image-to-3D model — from CUDA-only to Apple Silicon via PyTorch MPS. No NVIDIA GPU required." It works (a real M-series run is cited below) but is community-maintained — treat it accordingly.

Variant pin — this is v2, not v1. This recipe targets TRELLIS.2-4B at canonical slug microsoft/TRELLIS.2-4B (note the dot in TRELLIS.2). It is NOT the older microsoft/TRELLIS-image-large (the 2024 v1, a different architecture). The distinction is load-bearing on Apple: v1 depends on spconv, which has no Metal port, so v1 has no working Apple path. v2's sparse-conv kernel (flex_gemm) does have a Metal replacement (mtlgemm), which is why an Apple port exists for v2 only.

Requirements

Component | Minimum | Tested
GPU / memory | 24 GB+ unified memory recommended (per the port README) | Apple M2 Max (64 GB unified memory, ~48 GB GPU-addressable)
RAM | Same pool (unified) | 64 GB unified
Storage | ~18 GB (TRELLIS.2-4B 15.12 GiB + DINOv3 1.21 GB + RMBG-2.0 0.88 GB) | ~18 GB
Software | macOS on Apple Silicon (M1+), Python 3.11+, Xcode Metal Toolchain |

The binding constraint on Apple Silicon is addressable unified memory, not raw capacity. The M2 Max's 64 GB of unified memory is shared by CPU and GPU; macOS lets the GPU address roughly 75% of it by default (~48 GB via Metal's recommendedMaxWorkingSetSize). The TRELLIS.2-4B checkpoints total 15.12 GiB on disk (HF tree API), and the port also downloads the DINOv3 image encoder (1.21 GB) and the RMBG-2.0 background remover (0.88 GB): about 17 GiB, or roughly 18 GB, of weights in all. The cited reference run (README Results) was made on a 24 GB M4 Pro, so the 64 GB M2 Max, with ~48 GB GPU-addressable, fits this pipeline comfortably and needs no wired_limit tuning for the default 512 pipeline.
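
The memory budget can be worked through as plain arithmetic. This sketch takes the units quoted above at face value (GiB for the TRELLIS.2-4B checkpoints, decimal GB for the two auxiliary models), which is an assumption, and uses the approximate 75% default GPU share. It only counts weights; activations and sparse-voxel structures come on top.

```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one decimal gigabyte


def gib_to_gb(gib):
    """Convert gibibytes to decimal gigabytes."""
    return gib * GIB / GB


unified_gb = 64
gpu_fraction = 0.75                          # approximate macOS default
addressable_gb = unified_gb * gpu_fraction   # GPU-addressable pool

# Weight sizes as quoted in this recipe (units assumed accurate).
weights_gb = gib_to_gb(15.12) + 1.21 + 0.88
weights_gib = weights_gb * GB / GIB
headroom_gb = addressable_gb - weights_gb

print(f"GPU-addressable: {addressable_gb:.1f} GB")
print(f"weights: {weights_gb:.2f} GB ({weights_gib:.2f} GiB)")
print(f"headroom before activations: {headroom_gb:.1f} GB")
```

Under these assumptions the weights come to a little over 18 GB, which is the same quantity as roughly 17 GiB; that unit difference accounts for both figures appearing in this recipe.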

Installation

The install path is the shivampkumar/trellis-mac port. There is nothing CUDA-shaped to install: no cu12x PyTorch wheel, no flash-attn build, no flex_gemm/nvdiffrast compile. The port replaces each CUDA kernel with a Metal/MPS equivalent (see What Was Ported): flex_gemm → mtlgemm, nvdiffrast → mtldiffrast, flash_attn → PyTorch SDPA, with a cumesh decode step skipped in favour of fast_simplification.
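
The replacement mapping above can double as a filter for instructions copied from CUDA-oriented TRELLIS.2 tutorials. This is an illustrative sketch only: `flag_cuda_steps` is a hypothetical helper, the matching is naive substring search, and `flash-attn` is included as the pip spelling of `flash_attn`.

```python
# CUDA-only component -> replacement named in the port's "What Was Ported".
REPLACEMENTS = {
    "flex_gemm": "mtlgemm",
    "nvdiffrast": "mtldiffrast",
    "flash_attn": "PyTorch SDPA",
    "flash-attn": "PyTorch SDPA",   # pip package spelling of flash_attn
}


def flag_cuda_steps(commands):
    """Return (command, replacement) pairs for steps to skip on a Mac."""
    flagged = []
    for command in commands:
        for cuda_name, replacement in REPLACEMENTS.items():
            if cuda_name in command:
                flagged.append((command, replacement))
                break   # one match per command is enough
    return flagged


if __name__ == "__main__":
    tutorial = [
        "pip install flash-attn",
        "pip install numpy",
        "pip install ./nvdiffrast",
    ]
    for command, replacement in flag_cuda_steps(tutorial):
        print(f"skip '{command}': the port uses {replacement} instead")
```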

1. Clone the port

git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac

2. Download the Xcode Metal Toolchain (for the accelerated texture baker)

xcodebuild -downloadComponent MetalToolchain

Per the README, this lets setup.sh build the Metal-accelerated texture baker. Without it, setup "falls back to a pure Python KDTree baker (slower, slightly lower quality)." To skip the Metal build entirely (e.g. on older hardware), prefix setup with SKIP_METAL=1 (see step 4).

3. Log into Hugging Face and request the gated weights

hf auth login

TRELLIS.2-4B itself is MIT-licensed and ungated, but the port also downloads two gated auxiliary models: the DINOv3 image encoder and the RMBG-2.0 background remover. Request access on each model's Hugging Face page (the README notes "usually instant approval" for these) before running setup.

4. Run setup (creates the venv, installs deps, clones & patches TRELLIS.2, builds the Metal backends)

bash setup.sh
source .venv/bin/activate

To skip the Metal build and use the pure-Python baker:

SKIP_METAL=1 bash setup.sh
source .venv/bin/activate

setup.sh pre-clones its Git dependencies into deps/ so all network I/O happens up front; the TRELLIS.2-4B / DINOv3 / RMBG-2.0 weights (~18 GB) download on first generation and cache locally.
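
The choice between the two setup invocations in step 4 could be scripted. The sketch below is an assumption-laden convenience, not something the port ships: `build_setup_invocation` is a hypothetical helper, and finding `xcodebuild` on the PATH does not prove that the Metal Toolchain component itself has been downloaded.

```python
import os
import shutil


def build_setup_invocation(have_xcodebuild, base_env=None):
    """Return (command, environment) for running the port's setup.sh.

    SKIP_METAL=1 is set only when xcodebuild is missing, which selects
    the pure-Python baker described in the README.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not have_xcodebuild:
        env["SKIP_METAL"] = "1"
    return ["bash", "setup.sh"], env


if __name__ == "__main__":
    have_xcodebuild = shutil.which("xcodebuild") is not None
    command, env = build_setup_invocation(have_xcodebuild)
    skip = env.get("SKIP_METAL", "unset")
    print(f"would run: {' '.join(command)} (SKIP_METAL={skip})")
    # To execute it from inside the trellis-mac checkout:
    #   import subprocess; subprocess.run(command, env=env, check=True)
```

The helper only builds the command so that it can be inspected first; activating `.venv` afterwards still has to happen in the shell.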

Running

After activating the environment, generate a 3D model from a single image:

python generate.py path/to/image.png

With options (verbatim from the port's Usage section):

python generate.py photo.png --seed 123 --output my_model --pipeline-type 512

Documented flags: --seed (default 42), --output (default output_3d), --pipeline-type (512, 1024, 1024_cascade; default 512), --texture-size (512, 1024, 2048; default 1024), and --no-texture to export geometry only. The output is a GLB with base-color, metallic, and roughness textures, written to the current directory (or the --output path). The default 512 pipeline is the lightest path and the one the cited reference run used.
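
For batch runs, the documented flags can be assembled and validated in Python before calling `generate.py`. This is a sketch under stated assumptions: `build_generate_cmd` is a hypothetical wrapper, the allowed values and defaults are copied from the list above, and omitting `--texture-size` when `--no-texture` is set is a choice made here, not documented behaviour.

```python
PIPELINE_TYPES = ("512", "1024", "1024_cascade")
TEXTURE_SIZES = (512, 1024, 2048)


def build_generate_cmd(image, seed=42, output="output_3d",
                       pipeline_type="512", texture_size=1024,
                       no_texture=False):
    """Build an argv list for generate.py using the documented flags."""
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"pipeline_type must be one of {PIPELINE_TYPES}")
    if texture_size not in TEXTURE_SIZES:
        raise ValueError(f"texture_size must be one of {TEXTURE_SIZES}")
    cmd = ["python", "generate.py", image,
           "--seed", str(seed),
           "--output", output,
           "--pipeline-type", pipeline_type]
    if no_texture:
        cmd.append("--no-texture")          # geometry-only export
    else:
        cmd += ["--texture-size", str(texture_size)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_generate_cmd("photo.png", seed=123, output="my_model")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the activated environment.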

Results

  • Speed: No first-party Apple M2 Max benchmark for this pair exists yet — /check/trellis-2/m2-max returns verdict: unknown with zero measurements. We deliberately do not quote an M2 Max time. The port's README cites "~5 minutes 13 seconds on M4 Pro (24GB, cold start, weights cached, cool machine, pipeline type 512)" (README Results) — but the M4 Pro is a different chip (different memory bandwidth) than the M2 Max, so that figure is an existence-proof that the pipeline runs on Apple Silicon, not an M2 Max throughput number. If you run this on an M2 Max, please contribute your timing so we can seed a real datapoint.
  • Memory usage: ~18 GB of weights resident (TRELLIS.2-4B + DINOv3 + RMBG-2.0), plus generation/baking activations and the sparse-voxel structures. The M2 Max's ~48 GB default GPU-addressable pool clears this with headroom; the port recommends 24 GB+ unified memory and the cited run fit a 24 GB machine.
  • Quality notes: TRELLIS.2-4B outputs PBR-ready meshes (base color, roughness, metallic, opacity) at 512³–1536³ (model card). The port's reference output is "400K+ vertex meshes with baked PBR textures". Note two community-port quality caveats from the What Was Ported table: SDPA attention is "padded, not fused — room for improvement", and the cumesh decode step is skipped on meshes large enough to crash the Metal port (replaced with fast_simplification before baking). Without the Metal Toolchain, texture baking falls back to a pure-Python KDTree baker that is "slower, slightly lower quality."
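
To contribute a timing like the one requested above, wall-clock time around a full run is enough. A minimal sketch, assuming it is run from the trellis-mac checkout with the environment activated; `time_command` and `format_duration` are hypothetical helpers, and the image path is a placeholder.

```python
import subprocess
import sys
import time


def format_duration(seconds):
    """Format a duration in seconds as 'M min SS s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs:02d} s"


def time_command(cmd):
    """Run cmd and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder demo command; for a real measurement substitute e.g.
    #   [sys.executable, "generate.py", "path/to/image.png"]
    code, elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, elapsed {format_duration(elapsed)}")
```

When reporting a number, note the same conditions the README lists for its M4 Pro figure: pipeline type, whether weights were already cached, and whether the machine was cool.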

For the full benchmark data (and to be the first to populate it), see /check/trellis-2/m2-max.

Troubleshooting

Tried to install flash-attn / a cu12x wheel / build flex_gemm or nvdiffrast — it failed

None of those apply on Apple Silicon. There is no CUDA, no FlashAttention, and no flex_gemm/nvdiffrast build on a Mac. The shivampkumar/trellis-mac port replaces every CUDA kernel with a Metal/MPS equivalent (mtlgemm, mtldiffrast, PyTorch SDPA) and patches all .cuda() calls to use the active MPS device. If a generic TRELLIS.2 tutorial tells you to pip install flash-attn, install a cu128 PyTorch wheel, or compile flex_gemm for sm_120, skip those steps entirely — bash setup.sh from the port is the complete Apple path.
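
A quick way to confirm that the environment is on the MPS path rather than looking for CUDA is to ask PyTorch directly. This sketch uses the standard `torch.backends.mps` checks and degrades gracefully when torch is not installed; `mps_status` is a hypothetical helper name.

```python
def mps_status():
    """Report whether PyTorch can use the Metal (MPS) backend here."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    if torch.backends.mps.is_available():
        return "mps-available"
    if torch.backends.mps.is_built():
        # Built with MPS support, but no usable device or OS support found.
        return "mps-built-but-unavailable"
    return "mps-not-built"


if __name__ == "__main__":
    print(mps_status())
```

Inside the port's `.venv` on an Apple Silicon Mac the expected answer is `mps-available`; any other answer suggests the wrong interpreter or a non-Mac PyTorch build is active.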

"Access to model … is restricted" when weights download

DINOv3 and RMBG-2.0 are gated on Hugging Face. Run hf auth login and accept the license on each model page before generating: DINOv3 ViT-L (manual approval) and RMBG-2.0 (auto approval, usually instant). TRELLIS.2-4B itself is ungated and MIT-licensed, so it downloads without an access step. Licensing note: while TRELLIS.2-4B is MIT, RMBG-2.0 is CC BY-NC 4.0 (non-commercial). Because it sits in the default pipeline as the background remover, 3D assets you generate this way are restricted to non-commercial use unless you swap in a different background-removal step or obtain BRIA's commercial agreement.
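
Before debugging access errors, it can help to confirm that a Hugging Face token is present at all. The sketch below assumes the usual huggingface_hub conventions (the `HF_TOKEN` environment variable, and a `token` file under `HF_HOME`, which defaults to `~/.cache/huggingface`); `find_hf_token_source` is a hypothetical helper. A token being present does not show that access to the gated models has been granted.

```python
import os
from pathlib import Path


def find_hf_token_source(env, home):
    """Return where a Hugging Face token was found, or None."""
    if env.get("HF_TOKEN"):
        return "HF_TOKEN environment variable"
    # Assumed default cache layout: $HF_HOME/token
    hf_home = Path(env.get("HF_HOME", Path(home) / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    if token_file.is_file() and token_file.read_text().strip():
        return str(token_file)
    return None


if __name__ == "__main__":
    source = find_hf_token_source(os.environ, Path.home())
    if source:
        print("token found via", source)
    else:
        print("no token found; run: hf auth login")
```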

Texture baking is slow or lower quality than expected

The Metal-accelerated texture baker requires the Xcode Metal Toolchain (xcodebuild -downloadComponent MetalToolchain). If you ran SKIP_METAL=1 bash setup.sh or the toolchain was unavailable, the port falls back to a pure-Python KDTree baker (xatlas UV unwrap + scipy cKDTree inverse-distance weighting), which the README calls "slower, slightly lower quality." Re-run setup with the toolchain installed to enable the Metal stack (mtldiffrast / mtlbvh / mtlmesh).
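
The inverse-distance weighting used by the fallback baker can be illustrated in a few lines. This is not the port's code: it is a pure-Python sketch that swaps scipy's cKDTree for a brute-force nearest-neighbour search (same result, far slower), and `idw_sample` is a hypothetical name. Each texel takes a blend of the k nearest surface samples, weighted by 1/distance.

```python
import math


def idw_sample(query, points, values, k=4, eps=1e-9):
    """Blend the values of the k nearest points, weighted by 1/distance."""
    # Brute-force nearest neighbours; a KD-tree finds the same set faster.
    nearest = sorted(
        (math.dist(query, p), i) for i, p in enumerate(points)
    )[:k]
    closest_dist, closest_idx = nearest[0]
    if closest_dist < eps:
        # Query coincides with a sample: return its value unchanged.
        return tuple(values[closest_idx])
    weights = [1.0 / dist for dist, _ in nearest]
    total = sum(weights)
    channels = len(values[0])
    return tuple(
        sum(w * values[i][c] for w, (_, i) in zip(weights, nearest)) / total
        for c in range(channels)
    )


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    colors = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    # Midway between a black and a white sample.
    print(idw_sample((1.0, 0.0, 0.0), points, colors, k=2))
```

Because nearby samples are averaged, fine texture detail is softened, which is consistent with the README describing this path as slightly lower quality.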

Mesh extraction or BVH instability on very large meshes

The port pre-simplifies the decoder mesh to ~200K faces with fast_simplification before handing it to the Metal BVH, because "the BVH builder is unstable on 800K+ face inputs" (What Was Ported). If you hit instability at higher pipeline resolutions, stay on the default --pipeline-type 512, or remove deps/ and re-run bash setup.sh to reset local clone state (a documented recovery step in the README).
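
The size of the pre-simplification step follows from simple arithmetic. The sketch below computes the fraction of faces that must be removed to reach the ~200K target quoted above; `target_reduction` is a hypothetical helper, and how the port actually parameterises fast_simplification is not stated in this recipe.

```python
TARGET_FACES = 200_000   # approximate face budget quoted above


def target_reduction(n_faces, target_faces=TARGET_FACES):
    """Fraction of faces to remove so that about target_faces remain."""
    if n_faces <= target_faces:
        return 0.0   # already within budget, nothing to remove
    return 1.0 - target_faces / n_faces


if __name__ == "__main__":
    for n_faces in (150_000, 400_000, 800_000):
        print(f"{n_faces} faces -> remove {target_reduction(n_faces):.0%}")
    # Assumption: a value like this could be passed as the target_reduction
    # argument of fast_simplification.simplify(points, faces, ...).
```

An 800K-face mesh, the size at which the BVH builder is described as unstable, needs three quarters of its faces removed to fit the budget.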

No other widely-reported issues on this Apple port. Report problems via the submission form.

common questions
How much VRAM does TRELLIS.2-4B need?

About 17 GB — the minimum this recipe targets.

Which GPUs is TRELLIS.2-4B tested on?

Apple M2 Max (64 GB).

How hard is this setup?

Advanced — follow the steps above.