What You'll Build
Generate high-fidelity textured 3D assets from a single image locally with TRELLIS.2-4B — Microsoft's 4-billion-parameter image-to-3D flow-matching transformer — on an Apple M4 Max, with no NVIDIA GPU and no CUDA. The model outputs PBR-ready meshes (base color, roughness, metallic) at voxel-grid resolutions from 512³ up to 1536³, using its O-Voxel sparse representation (Microsoft model card: "a state-of-the-art large 3D generative model designed for high-fidelity image-to-3D generation", "O-Voxel", "Parameters: 4 Billion", "Resolution: Varies from 512³ to 1536³"). The path to Apple Silicon is a community Metal/MPS port, not Microsoft's official repo.
Hardware data: Apple M4 Max (48 GB unified memory, ~32 GB GPU-addressable) · 4B image-to-3D pipeline, ~18 GB weights on disk · See benchmark data
⚠️ The Apple path is a community Metal port, not official Microsoft support. Microsoft's canonical TRELLIS.2 repo is CUDA-only: it builds flex_gemm and nvdiffrast and assumes an NVIDIA GPU. The recipe below uses shivampkumar/trellis-mac, a community port whose README line 1 reads "TRELLIS.2 for Apple Silicon" and which describes itself as "a port of Microsoft's TRELLIS.2 — a state-of-the-art image-to-3D model — from CUDA-only to Apple Silicon via PyTorch MPS. No NVIDIA GPU required." It works (a real M-series run is cited below) but is community-maintained, so treat it accordingly.
Variant pin: this is v2, not v1. This recipe targets TRELLIS.2-4B at canonical slug microsoft/TRELLIS.2-4B (note the dot in TRELLIS.2). It is NOT the older microsoft/TRELLIS-image-large (the 2024 v1, a different architecture). The distinction is load-bearing on Apple: v1 depends on spconv, which has no Metal port, so v1 has no working Apple path. v2's sparse-conv kernel (flex_gemm) does have a Metal replacement (mtlgemm), which is why an Apple port exists for v2 only.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU / memory | 24 GB+ unified memory recommended (per the port README) | Apple M4 Max (48 GB unified memory, ~32 GB GPU-addressable) |
| RAM | Same pool — unified | 48 GB unified |
| Storage | ~18 GB (TRELLIS.2-4B 15.12 GiB + DINOv3 1.21 GB + RMBG-2.0 0.88 GB) | ~18 GB |
| Software | macOS on Apple Silicon (M1+), Python 3.11+, Xcode Metal Toolchain | — |
The binding constraint on Apple Silicon is addressable unified memory, not raw capacity. The M4 Max's 48 GB of unified memory is shared by CPU and GPU; on a sub-64 GB Mac, macOS lets the GPU address roughly two-thirds of it by default: about 32 GB as a safe figure (~36 GB optimistic) via Metal's recommendedMaxWorkingSetSize. The TRELLIS.2-4B checkpoints total 15.12 GiB on disk (HF tree API), and the port also downloads the DINOv3 image encoder (1.21 GB) and the RMBG-2.0 background remover (0.88 GB), for about 18 GB of weights in all. The port itself reports that "Memory usage peaks at around 18GB unified memory during generation" (README Performance notes). At ~18 GB peak against the M4 Max's ~32 GB default GPU-addressable pool, this pipeline fits comfortably and needs no iogpu.wired_limit_mb tuning for the default 512 pipeline.
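The arithmetic in this paragraph can be reproduced with a short sketch. The function names are illustrative, and the two-thirds GPU fraction is the rule of thumb stated above, not a value queried from Metal.

```python
GIB = 2**30  # bytes in one gibibyte (binary unit used for the checkpoint size)
GB = 10**9   # bytes in one gigabyte (decimal unit used for the other figures)


def weights_on_disk_gb() -> float:
    """Sum the three downloads quoted in this recipe, in decimal GB."""
    trellis_gb = 15.12 * GIB / GB  # TRELLIS.2-4B checkpoints, quoted in GiB
    dinov3_gb = 1.21               # DINOv3 image encoder
    rmbg_gb = 0.88                 # RMBG-2.0 background remover
    return trellis_gb + dinov3_gb + rmbg_gb


def gpu_headroom_gb(unified_gb: float, peak_gb: float = 18.0,
                    gpu_fraction: float = 2 / 3) -> float:
    """Estimate spare GPU-addressable memory.

    gpu_fraction is an assumption (the sub-64 GB rule of thumb from the
    text); peak_gb defaults to the port's reported ~18 GB peak.
    """
    return unified_gb * gpu_fraction - peak_gb


if __name__ == "__main__":
    print(f"weights on disk: {weights_on_disk_gb():.1f} GB")
    print(f"headroom on a 48 GB M4 Max: {gpu_headroom_gb(48):.1f} GB")
```

With these inputs the weights come to roughly 18.3 GB and the headroom to roughly 14 GB. Because the GPU fraction is a rule of thumb and not a hard limit, treat the headroom as an estimate.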
Installation
The install path is the shivampkumar/trellis-mac port. There is nothing CUDA-shaped to install — no cu12x PyTorch wheel, no flash-attn build, no flex_gemm/nvdiffrast compile. The port replaces each CUDA kernel with a Metal/MPS equivalent (see What Was Ported): flex_gemm → mtlgemm, nvdiffrast → mtldiffrast, flash_attn → PyTorch SDPA, with a cumesh decode step skipped in favour of fast_simplification.
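Before cloning, a small preflight script can confirm the software prerequisites from the Requirements table. This is a minimal sketch using only the standard library; the check names are illustrative, and finding xcodebuild on PATH is only a proxy, since it does not prove the Metal Toolchain component has been downloaded.

```python
import platform
import shutil
import sys


def preflight() -> dict[str, bool]:
    """Return a pass/fail map for this recipe's stated software requirements."""
    return {
        # Native Apple Silicon reports Darwin + arm64 (x86_64 under Rosetta).
        "macos_apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # The Requirements table lists Python 3.11+.
        "python_3_11_plus": sys.version_info >= (3, 11),
        # git is needed for the clone in step 1.
        "git_on_path": shutil.which("git") is not None,
        # Proxy only: presence of xcodebuild, not of the Metal Toolchain itself.
        "xcodebuild_on_path": shutil.which("xcodebuild") is not None,
        # The Hugging Face CLI used by `hf auth login` in step 3.
        "hf_cli_on_path": shutil.which("hf") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok  ' if ok else 'FAIL'}  {name}")
```

A failed xcodebuild check is not fatal: as described in step 2, setup can still fall back to the pure-Python baker.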
1. Clone the port
git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac
2. Download the Xcode Metal Toolchain (for the accelerated texture baker)
xcodebuild -downloadComponent MetalToolchain
Per the README, this lets setup.sh build the Metal-accelerated texture baker. Without it, setup falls back to a pure "Python KDTree baker (slower, slightly lower quality)." To skip the Metal build entirely (e.g. on older hardware), prefix setup with SKIP_METAL=1 (see step 4).
3. Log into Hugging Face and request the gated weights
hf auth login
TRELLIS.2-4B itself is MIT-licensed and ungated, but the port also downloads two gated auxiliary models. Request access (the README notes "usually instant approval" for these) before running setup:
- facebook/dinov3-vitl16-pretrain-lvd1689m: the DINOv3 image encoder (gated: Meta custom license, review before commercial use)
- briaai/RMBG-2.0: the background remover (gated; license CC BY-NC 4.0)
4. Run setup (creates the venv, installs deps, clones & patches TRELLIS.2, builds the Metal backends)
bash setup.sh
source .venv/bin/activate
To skip the Metal build and use the pure-Python baker:
SKIP_METAL=1 bash setup.sh
source .venv/bin/activate
setup.sh pre-clones its Git dependencies into deps/ so all network I/O happens up front; the TRELLIS.2-4B / DINOv3 / RMBG-2.0 weights (~18 GB) download on first generation and cache locally.
Running
After activating the environment, generate a 3D model from a single image:
python generate.py path/to/image.png
With options (verbatim from the port's Usage section):
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512
Documented flags: --seed (default 42), --output (default output_3d), --pipeline-type (512, 1024, 1024_cascade; default 512), --texture-size (512, 1024, 2048; default 1024), and --no-texture to export geometry only. The output is a GLB with base-color, metallic, and roughness textures, written to the current directory (or the --output path). The default 512 pipeline is the lightest path and the one the cited reference run used.
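For scripted or batch runs, the documented flags can be assembled and validated before anything is launched. The sketch below is a hypothetical helper, not part of the port: it only builds an argument list from the flags and defaults listed above, and it assumes (unverified) that --texture-size is unnecessary when --no-texture is set.

```python
import shlex

# Choices as documented in the port's Usage section.
PIPELINE_TYPES = ("512", "1024", "1024_cascade")
TEXTURE_SIZES = (512, 1024, 2048)


def build_generate_command(image, seed=42, output="output_3d",
                           pipeline_type="512", texture_size=1024,
                           no_texture=False):
    """Build an argv list for generate.py, rejecting undocumented values."""
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"pipeline_type must be one of {PIPELINE_TYPES}")
    if texture_size not in TEXTURE_SIZES:
        raise ValueError(f"texture_size must be one of {TEXTURE_SIZES}")
    cmd = ["python", "generate.py", str(image),
           "--seed", str(seed),
           "--output", str(output),
           "--pipeline-type", pipeline_type]
    if no_texture:
        # Geometry-only export; texture size is omitted (an assumption).
        cmd.append("--no-texture")
    else:
        cmd += ["--texture-size", str(texture_size)]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command("photo.png", seed=123, output="my_model")
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Building a list instead of a shell string avoids quoting problems with image paths that contain spaces.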
Results
- Speed: No first-party Apple M4 Max benchmark for this pair exists yet; /check/trellis-2/m4-max returns verdict: unknown with zero measurements. We deliberately do not quote an M4 Max time. The port's README cites "~5 minutes 13 seconds on M4 Pro" (24GB, cold start, weights cached, cool machine, pipeline type 512) (README). That confirms the pipeline runs on M4-generation Apple Silicon, but the M4 Pro is a different chip from the M4 Max (the M4 Max has substantially higher memory bandwidth, ~546 GB/s vs the M4 Pro's 273 GB/s), so that timing is an existence proof that the path works, not an M4 Max throughput number. If you run this on an M4 Max, please contribute your timing so we can seed a real datapoint.
- Memory usage: The port reports generation "peaks at around 18GB unified memory" (README Performance notes): TRELLIS.2-4B + DINOv3 + RMBG-2.0 resident, plus generation/baking activations and the sparse-voxel structures. The M4 Max's ~32 GB default GPU-addressable pool clears this with headroom; the port recommends 24 GB+ unified memory and the cited run fit a 24 GB machine.
- Quality notes: TRELLIS.2-4B outputs PBR-ready meshes (base color, roughness, metallic, opacity) at 512³–1536³ (model card). The port's reference output is "400K+ vertex meshes with baked PBR textures". Note two community-port quality caveats from the What Was Ported table: SDPA attention is "padded, not fused — room for improvement", and the cumesh decode step is skipped on meshes large enough to crash the Metal port (replaced with fast_simplification before baking). Without the Metal Toolchain, texture baking falls back to a pure-Python KDTree baker that is "slower, slightly lower quality": the README measures the bake taking ~15s instead of ~11s, with coverage near UV chart boundaries slightly softer.
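One way to sanity-check an exported GLB against these figures is to read its JSON chunk directly. The sketch below relies only on the glTF 2.0 binary container layout (a 12-byte header followed by a JSON chunk) and the standard pbrMetallicRoughness material fields; the function names are illustrative, and the vertex total simply sums POSITION accessor counts, so accessors shared between primitives would be counted more than once.

```python
import json
import struct

GLB_MAGIC = 0x46546C67        # ASCII "glTF", little-endian
JSON_CHUNK_TYPE = 0x4E4F534A  # ASCII "JSON", little-endian


def read_glb_json(data: bytes) -> dict:
    """Parse the GLB header and return the decoded JSON chunk."""
    magic, version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file (bad magic)")
    if version != 2:
        raise ValueError(f"unsupported GLB version {version}")
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != JSON_CHUNK_TYPE:
        raise ValueError("first chunk is not JSON")
    # The chunk may be space-padded to 4 bytes; json.loads ignores that.
    return json.loads(data[20:20 + chunk_len].decode("utf-8"))


def summarize_glb(data: bytes) -> dict:
    """Report vertex count and which PBR textures the materials reference."""
    gltf = read_glb_json(data)
    accessors = gltf.get("accessors", [])
    vertices = 0
    for mesh in gltf.get("meshes", []):
        for prim in mesh.get("primitives", []):
            pos = prim.get("attributes", {}).get("POSITION")
            if pos is not None:
                vertices += accessors[pos]["count"]
    materials = gltf.get("materials", [])
    pbr = [m.get("pbrMetallicRoughness", {}) for m in materials]
    return {
        "vertices": vertices,
        "materials": len(materials),
        "has_base_color_texture": any("baseColorTexture" in p for p in pbr),
        "has_metallic_roughness_texture": any(
            "metallicRoughnessTexture" in p for p in pbr
        ),
    }
```

To use it, read the exported file's bytes (for example with pathlib.Path(...).read_bytes()) and pass them to summarize_glb; the output filename depends on your --output setting.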
For the full benchmark data (and to be the first to populate it), see /check/trellis-2/m4-max.
Troubleshooting
Tried to install flash-attn / a cu12x wheel / build flex_gemm or nvdiffrast — it failed
None of those apply on Apple Silicon. There is no CUDA, no FlashAttention, and no flex_gemm/nvdiffrast build on a Mac. The shivampkumar/trellis-mac port replaces every CUDA kernel with a Metal/MPS equivalent (mtlgemm, mtldiffrast, PyTorch SDPA) and patches all .cuda() calls to use the active MPS device. If a generic TRELLIS.2 tutorial tells you to pip install flash-attn, install a cu128 PyTorch wheel, or compile flex_gemm for sm_120, skip those steps entirely — bash setup.sh from the port is the complete Apple path.
"Access to model … is restricted" when weights download
DINOv3 and RMBG-2.0 are gated on Hugging Face. Run hf auth login and accept the license on each model page before generating: DINOv3 ViT-L and RMBG-2.0 (the README notes "usually instant approval"). TRELLIS.2-4B itself is ungated and MIT-licensed, so it downloads without an access step. Licensing note: while TRELLIS.2-4B is MIT, RMBG-2.0 is CC BY-NC 4.0 (non-commercial). Because it sits in the default pipeline as the background remover, 3D assets you generate this way are restricted to non-commercial use unless you swap in a different background-removal step or obtain BRIA's commercial agreement.
Texture baking is slow or lower quality than expected
The Metal-accelerated texture baker requires the Xcode Metal Toolchain (xcodebuild -downloadComponent MetalToolchain). If you ran SKIP_METAL=1 bash setup.sh or the toolchain was unavailable, the port falls back to a pure-Python KDTree baker (xatlas UV unwrap + scipy cKDTree inverse-distance weighting), which the README calls "slower, slightly lower quality". Re-run setup with the toolchain installed to enable the Metal stack (mtldiffrast / mtlbvh / mtlmesh).
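To make the fallback concrete, here is a minimal sketch of inverse-distance weighting, the interpolation scheme named above. It is not the port's code: a brute-force sort stands in for the cKDTree nearest-neighbour query, and k and power are illustrative defaults, since the port's actual parameters are not stated here.

```python
import math


def idw_sample(query, points, values, k=4, power=2.0):
    """Interpolate a value at `query` from its k nearest samples.

    points: sample positions (tuples); values: per-sample channels
    (e.g. RGB tuples). Requires at least one sample. Brute-force search
    is used here in place of a KD-tree query.
    """
    nearest = sorted(
        zip(points, values), key=lambda pv: math.dist(query, pv[0])
    )[:k]
    weights = []
    for point, value in nearest:
        d = math.dist(query, point)
        if d == 0.0:
            # Exact hit: return the sample itself, avoiding division by zero.
            return tuple(value)
        weights.append(1.0 / d**power)  # closer samples weigh more
    total = sum(weights)
    channels = len(nearest[0][1])
    return tuple(
        sum(w * v[c] for w, (_, v) in zip(weights, nearest)) / total
        for c in range(channels)
    )
```

Because each texel is a weighted blend of nearby samples, texels with few close neighbours are averaged over more distant ones, which is one plausible reason for softer results than a rasterised bake.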
Mesh extraction or BVH instability on very large meshes
The port pre-simplifies the decoder mesh to ~200K faces with fast_simplification before handing it to the Metal BVH, because "the BVH builder is unstable on 800K+ face inputs" (What Was Ported). If you hit instability at higher pipeline resolutions, stay on the default --pipeline-type 512, or remove deps/ and re-run bash setup.sh to reset local clone state (a documented recovery step in the README).
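The size of that pre-simplification step is simple arithmetic. The sketch below computes the fraction of faces to remove in order to reach a target count; the helper is hypothetical, and whether the port expresses its target as a fraction or as a face count is an assumption not confirmed by the source.

```python
def target_reduction(face_count: int, target_faces: int = 200_000) -> float:
    """Fraction of faces to remove so that roughly target_faces remain.

    target_faces defaults to the ~200K figure quoted above. Returns 0.0
    when the mesh is already at or below the target.
    """
    if face_count <= 0:
        raise ValueError("face_count must be positive")
    if face_count <= target_faces:
        return 0.0
    return 1.0 - target_faces / face_count


if __name__ == "__main__":
    for faces in (150_000, 800_000, 2_000_000):
        print(f"{faces:>9} faces -> remove {target_reduction(faces):.0%}")
```

For an 800K-face mesh, the size at which the BVH builder is described as unstable, this works out to removing three quarters of the faces.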
No other widely-reported issues on this Apple port. Report problems via the submission form.