self-hosted/ai
§01·recipe · video

LTX Video 2.3 on Apple M2 Pro (16 GB): experimental 22B audio-video in unified memory via the MLX --low-ram port

videoadvanced14GB+ VRAMJun 25, 2026

This advanced recipe sets up LTX Video 2.3 on the Apple M2 Pro, needing about 14 GB of VRAM.

models
tools
prerequisites
  • Apple M2 Pro with 16GB unified memory — the tightest config this model runs on, and only in a streaming low-RAM mode
  • macOS Sonoma 14 or Sequoia 15+ (for the unified-memory wired-limit raise)
  • Python 3.11+ and ffmpeg for the MLX path
  • ~19GB free disk space for the distilled int4 / 12GB-build MLX weights

What You'll Build

Generate short text-to-video and image-to-video clips — with joint audio — locally on a 16 GB Apple M2 Pro using LTX Video 2.3, the 22B-parameter audio-video DiT from Lightricks. This is the one open video model with a real, cited Apple-Silicon path that stretches down to a 16 GB Mac, and it sits at the very edge of what the chip can hold: it only runs through a block-streaming, low-RAM MLX pipeline, and even then expect preview-quality output, long generation times, low resolution, and constant memory pressure. The lead path is the community ltx-2-mlx runner with its --low-ram block-streaming flag, paired with an int4 / mixed-precision build of the weights.

Hardware data: Apple M2 Pro (16GB unified memory, ~200 GB/s, ~10.5GB GPU-addressable by default) · LTX-2.3 22B distilled int4 · See benchmark data

⚠️ Experimental / preview-quality only — and the tightest Apple config we document. Video is the weakest local-AI tier on Apple Silicon, and 16 GB is the floor for it. The 22B transformer alone is ~9–11 GB quantized; on a 16 GB Mac (only ~10.5 GB is GPU-addressable by default) it does not fit without block-streaming weights from disk. You must use the --low-ram streaming mode (or a purpose-built low-RAM build) and raise the unified-memory wired limit. Treat every clip as a preview: keep resolution and frame count small, and expect minutes-per-clip with the system under memory pressure the whole time.

ℹ️ No Draw Things, no int8, no big-Metal path here. On a 64 GB M2 Max this model also runs in Draw Things' native Metal app and at int8 — both of those want far more memory than a 16 GB Mac has and are dropped from this recipe. On the M2 Pro the only viable path is the smallest MLX build with block streaming. There is also no first-party Apple benchmark for this pair yet (/check/ltx-video-2-3/m2-pro returns unknown), so no speed figure is quoted below.

Requirements

ComponentMinimumTested
GPU / chipApple Silicon, 16GB unified memory (streaming low-RAM mode)Apple M2 Pro (16GB unified memory)
Unified memory16GB — raise the GPU wired limit; this is the floor, not headroom16GB
Storage~19GB (distilled int4 / 12GB-build MLX weights)~19GB
SoftwaremacOS Sonoma 14+ · Python 3.11+ · ffmpegmacOS Sequoia 15

There is no CUDA, no FlashAttention, no torch.compile, and no FP8 path on this platform — skip every pip install flash-attn step and every --use-cuda / cu128 wheel index you'll see in the NVIDIA LTX guides, and do not download the FP8 checkpoints (they do not run on Metal — see Troubleshooting). The Apple-native MLX path below replaces that entire stack.

Installation

The lead path is the pure-MLX ltx-2-mlx runner (community, not official Lightricks) with its block-streaming --low-ram mode, which is what makes the 22B model fit a 16 GB Mac at all.

1. Install the MLX runner

# Clone and set up the MLX port (community, native Apple Silicon — no CUDA)
git clone https://github.com/dgrauet/ltx-2-mlx.git
cd ltx-2-mlx
uv sync --all-extras

The runner's own Requirements line states the memory tiers explicitly: "32GB+ RAM recommended (int8) or 16GB+ with --low-ram. 16GB minimum (int4 without streaming)" (dgrauet/ltx-2-mlx README). The --low-ram flag "stream[s] transformer blocks from disk so q8 fits 16 GB Macs" per the same README — that block-streaming behaviour is the load-bearing capability for this recipe.

2. Choose a 16 GB-class weight build

Two cited builds fit the M2 Pro's envelope. Either works with the runner above:

  • dgrauet/ltx-2.3-mlx-q4 — the int4 conversion. The distilled transformer is 11.32 GB on disk, plus a 6.34 GB connector and small VAE/upscaler/vocoder files (dgrauet/ltx-2.3-mlx-q4 file tree). Pair it with --low-ram so the transformer streams rather than fully resident.
  • baa-ai/LTX-2.3-22B-RAM-12GB-MLX — a purpose-built mixed-precision (avg 4.6-bit) build whose model card lists a ~9.3 GB transformer, ~18 GB total, and a requirement of just "~14 GB unified memory" (baa-ai/LTX-2.3-22B-RAM-12GB-MLX). It is the explicit low-RAM sibling of baa-ai/LTX-2.3-22B-RAM-24GB-MLX, and runs on the same ltx-2-mlx runner.

Both builds are based on Lightricks/LTX-2.3 and carry the ltx-2-community-license-agreement license (Lightricks/LTX-2.3).

Running

Use the --distilled pipeline (the fastest mode — it is DistilledPipeline, the distilled-only two-stage path, and is classified Stable in the runner's maturity doc) with --low-ram and a small resolution / frame count. Frame count must follow the model's 32k + 1 rule (33, 65, 97 …) (baa-ai 12GB card).

# int4 build + block streaming — the 16 GB M2 Pro path. Keep it small.
ltx-2-mlx generate \
  --prompt "A serene mountain lake at sunrise, mist over calm water" \
  --distilled --low-ram \
  --model dgrauet/ltx-2.3-mlx-q4 \
  -H 480 -W 704 --frames 65 \
  -o lake.mp4

The command writes an .mp4 with joint audio. On a 16 GB Mac, leave nothing else running, keep the resolution and frame count low, and watch Activity Monitor's Memory-Pressure gauge — this build runs right up against the chip's ceiling. Subcommands flagged [beta] or [experimental] in --help (a2v, retake/extend, lipdub) may have known quality limitations; the --distilled generate path is the stable lead.

A reference stock-PyTorch (MPS) pipeline from the Lightricks model card also exists, but it is the slowest and most fragile option on a Mac (no torch.compile, op-by-op MPS fallbacks, and a known noise bug on recent torch builds — see Troubleshooting). Use the MLX --low-ram path instead.

Results

  • Speed: Omitted — no first-party benchmark exists for this pair. Our /check data has no Apple measurement here (/check/ltx-video-2-3/m2-pro returns unknown). The M2 Pro's ~200 GB/s memory bandwidth is the lowest of any Apple chip we cover (half the M2/M3 Max's ~400 GB/s), so any timing you may have seen for a larger Mac does not apply — and under block streaming, throughput is gated by disk I/O on top of that. If you run this, please contribute your numbers so the next reader gets a real M2 Pro figure.
  • Unified-memory usage: The low-RAM builds are designed to keep the working set near the chip's limit, not under it comfortably — the baa-ai 12GB build documents a "~14 GB unified memory" requirement, and the int4 dgrauet build relies on --low-ram block streaming to fit. On a 16 GB M2 Pro (only ~10.5 GB GPU-addressable by default) this means you must raise the wired limit (Troubleshooting), keep clips short, and expect sustained memory pressure.
  • Quality notes: Preview-grade. Distilled / heavily-quantized checkpoints trade fidelity for fewer steps and lower memory; the community thread for the LTX family reports a practical frame-count ceiling before noise appears (one user noted a limit around 4 seconds on a Mac, Lightricks/LTX-Video discussion #26). The CUDA-only IC-LoRAs that sharpen the NVIDIA output are unavailable here.

For the full benchmark data, see /check/ltx-video-2-3/m2-pro.

Troubleshooting

Out of memory / the model won't load on 16 GB

A 16 GB Mac only exposes ~10.5 GB to the GPU by default (Metal's recommendedMaxWorkingSetSize is ~66% of unified memory on sub-64 GB machines). The 22B transformer exceeds that even quantized, so two things are mandatory: (1) use a low-RAM build with --low-ram block streaming, and (2) raise the GPU wired limit before generating (macOS Sonoma 14 / Sequoia 15+):

# Allow the GPU to wire down ~13GB on a 16GB Mac. Leaves only ~3GB for macOS — risky.
sudo sysctl iogpu.wired_limit_mb=13312
# Reset to default afterwards:
sudo sysctl iogpu.wired_limit_mb=0

This is temporary and resets on reboot. Pushing the wired limit this high on a 16 GB Mac leaves very little for macOS — close every other app, watch Activity Monitor's Memory-Pressure gauge, and back off (lower frames/resolution, or =0) the moment it pushes toward red. On older macOS (Monterey/Ventura) the key is debug.iogpu.wired_limit in bytes instead.

fp8 weights fail on Metal

fp8-quantized checkpoints (common in the NVIDIA workflow) do not run on Apple's Metal backend — a community user in the LTX-2 Mac thread notes "fp8 models don't work on metal...change the model to b16 or a gguf" (Lightricks/LTX-2 discussion #43). Use the MLX int4 / mixed-precision builds linked above, never an fp8 file. There is no FP8 or NVFP4 tensor-core hardware on Apple Silicon to accelerate those formats anyway.

Noisy / garbage video on the stock PyTorch (MPS) path

If you fall back to the reference Diffusers/MPS path rather than the MLX runner, recent PyTorch builds can produce noise on Apple Silicon. A community user reported "torch 2.5 or above would give me noise" […] "Backed down to 2.4.1 solved the problem" (Lightricks/LTX-Video discussion #26). The simplest fix is to avoid the stock-PyTorch path entirely and use the MLX --low-ram port, whose engine doesn't hit this bug.

It's slow

Expected, and more so than on a larger Mac. Video generation on Apple Silicon is minutes-per-clip with no torch.compile acceleration, and on the M2 Pro you're adding block-streaming disk I/O on top of the slowest unified-memory bandwidth (~200 GB/s) in the lineup. Use the distilled int4 weights, keep resolution and frame count low, and treat the output as a preview. Report your timings via the submission form so we can seed a real M2 Pro benchmark.

common questions
How much VRAM does LTX Video 2.3 need?

About 14 GB — the minimum this recipe targets.

Which GPUs is LTX Video 2.3 tested on?

Apple M2 Pro (16 GB).

How hard is this setup?

Advanced — follow the steps above.