What You'll Build
Generate short text-to-video and image-to-video clips — with joint audio — locally on a 16 GB Apple M2 Pro using LTX Video 2.3, the 22B-parameter audio-video DiT from Lightricks. This is the one open video model with a real, cited Apple-Silicon path that stretches down to a 16 GB Mac, and it sits at the very edge of what the chip can hold: it only runs through a block-streaming, low-RAM MLX pipeline, and even then expect preview-quality output, long generation times, low resolution, and constant memory pressure. The lead path is the community ltx-2-mlx runner with its --low-ram block-streaming flag, paired with an int4 / mixed-precision build of the weights.
Hardware data: Apple M2 Pro (16GB unified memory, ~200 GB/s, ~10.5GB GPU-addressable by default) · LTX-2.3 22B distilled int4 · See benchmark data
⚠️ Experimental / preview-quality only, and the tightest Apple config we document. Video is the weakest local-AI tier on Apple Silicon, and 16 GB is the floor for it. The 22B transformer alone is ~9–11 GB quantized; on a 16 GB Mac (only ~10.5 GB is GPU-addressable by default) it does not fit without block-streaming weights from disk. You must use the --low-ram streaming mode (or a purpose-built low-RAM build) and raise the unified-memory wired limit. Treat every clip as a preview: keep resolution and frame count small, and expect minutes-per-clip with the system under memory pressure the whole time.
ℹ️ No Draw Things, no int8, no big-Metal path here. On a 64 GB M2 Max this model also runs in Draw Things' native Metal app and at int8; both want far more memory than a 16 GB Mac has and are dropped from this recipe. On the M2 Pro the only viable path is the smallest MLX build with block streaming. There is also no first-party Apple benchmark for this pair yet (/check/ltx-video-2-3/m2-pro returns unknown), so no speed figure is quoted below.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU / chip | Apple Silicon, 16GB unified memory (streaming low-RAM mode) | Apple M2 Pro (16GB unified memory) |
| Unified memory | 16GB — raise the GPU wired limit; this is the floor, not headroom | 16GB |
| Storage | ~19GB (distilled int4 / 12GB-build MLX weights) | ~19GB |
| Software | macOS Sonoma 14+ · Python 3.11+ · ffmpeg | macOS Sequoia 15 |
There is no CUDA, no FlashAttention, no torch.compile, and no FP8 path on this platform — skip every pip install flash-attn step and every --use-cuda / cu128 wheel index you'll see in the NVIDIA LTX guides, and do not download the FP8 checkpoints (they do not run on Metal — see Troubleshooting). The Apple-native MLX path below replaces that entire stack.
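The requirements above can be checked before cloning anything. The following is a minimal sketch, not part of the runner: the function name `preflight` is hypothetical, and it assumes a native (non-Rosetta) Python, where `platform.machine()` reports `arm64` on Apple Silicon.

```python
import platform
import shutil
import sys
from typing import List, Optional

MIN_PYTHON = (3, 11)  # minimum version from the Requirements table


def preflight(machine: str, python_version: tuple, ffmpeg_path: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # A Python running under Rosetta reports x86_64 and cannot use the native MLX path.
    if machine != "arm64":
        problems.append(f"expected Apple Silicon (arm64), found {machine!r}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11+ is required")
    if ffmpeg_path is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight(platform.machine(), sys.version_info, shutil.which("ffmpeg"))
    for issue in issues:
        print("problem:", issue)
    print("preflight ok" if not issues else "preflight failed")
```

The check only covers what the table lists; it does not verify the macOS version or the amount of unified memory.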
Installation
The lead path is the pure-MLX ltx-2-mlx runner (community, not official Lightricks) with its block-streaming --low-ram mode, which is what makes the 22B model fit a 16 GB Mac at all.
1. Install the MLX runner
# Clone and set up the MLX port (community, native Apple Silicon — no CUDA)
git clone https://github.com/dgrauet/ltx-2-mlx.git
cd ltx-2-mlx
uv sync --all-extras
The runner's own Requirements line states the memory tiers explicitly: "32GB+ RAM recommended (int8) or 16GB+ with --low-ram. 16GB minimum (int4 without streaming)" (dgrauet/ltx-2-mlx README). The --low-ram flag "stream[s] transformer blocks from disk so q8 fits 16 GB Macs" per the same README — that block-streaming behaviour is the load-bearing capability for this recipe.
2. Choose a 16 GB-class weight build
Two cited builds fit the M2 Pro's envelope. Either works with the runner above:
- dgrauet/ltx-2.3-mlx-q4: the int4 conversion. The distilled transformer is 11.32 GB on disk, plus a 6.34 GB connector and small VAE/upscaler/vocoder files (dgrauet/ltx-2.3-mlx-q4 file tree). Pair it with --low-ram so the transformer streams rather than staying fully resident.
- baa-ai/LTX-2.3-22B-RAM-12GB-MLX: a purpose-built mixed-precision (avg 4.6-bit) build whose model card lists a ~9.3 GB transformer, ~18 GB total, and a requirement of just "~14 GB unified memory" (baa-ai/LTX-2.3-22B-RAM-12GB-MLX). It is the explicit low-RAM sibling of baa-ai/LTX-2.3-22B-RAM-24GB-MLX, and runs on the same ltx-2-mlx runner.
Both builds are based on Lightricks/LTX-2.3 and carry the ltx-2-community-license-agreement license (Lightricks/LTX-2.3).
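The download sizes quoted above add up to most of the ~19 GB storage figure, so it is worth confirming free disk space before pulling the weights. A small sketch follows; the helper name `has_room` and the 2 GB safety margin are assumptions for illustration, not values from either model card.

```python
import shutil

# Sizes quoted for the int4 build: transformer + connector.
# The VAE, upscaler, and vocoder files are small extras on top of this.
LISTED_GB = 11.32 + 6.34
REQUIRED_GB = 19.0  # storage figure from the Requirements table


def has_room(free_bytes: int, required_gb: float = REQUIRED_GB, margin_gb: float = 2.0) -> bool:
    """True if free space covers the weights plus a safety margin (the margin is an assumption)."""
    return free_bytes >= (required_gb + margin_gb) * 1000**3


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    print(f"listed components: {LISTED_GB:.2f} GB")
    print(f"free: {free / 1000**3:.1f} GB, enough: {has_room(free)}")
```

Run it from the volume where the weights will be stored, since `shutil.disk_usage(".")` reports on the current directory's filesystem.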
Running
Use the --distilled pipeline (the fastest mode — it is DistilledPipeline, the distilled-only two-stage path, and is classified Stable in the runner's maturity doc) with --low-ram and a small resolution / frame count. Frame count must follow the model's 32k + 1 rule (33, 65, 97 …) (baa-ai 12GB card).
# int4 build + block streaming — the 16 GB M2 Pro path. Keep it small.
ltx-2-mlx generate \
--prompt "A serene mountain lake at sunrise, mist over calm water" \
--distilled --low-ram \
--model dgrauet/ltx-2.3-mlx-q4 \
-H 480 -W 704 --frames 65 \
-o lake.mp4
The command writes an .mp4 with joint audio. On a 16 GB Mac, leave nothing else running, keep the resolution and frame count low, and watch Activity Monitor's Memory-Pressure gauge — this build runs right up against the chip's ceiling. Subcommands flagged [beta] or [experimental] in --help (a2v, retake/extend, lipdub) may have known quality limitations; the --distilled generate path is the stable lead.
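The 32k + 1 frame rule quoted above is easy to get wrong when picking a clip length by hand. This sketch rounds a requested frame count down to the nearest valid value, which also errs on the side of lower memory use. The function name is hypothetical, and the step of 32 is taken from the rule as cited in this guide; adjust it if the model card you use states otherwise.

```python
def valid_frame_count(requested: int, step: int = 32) -> int:
    """Largest value of the form step*k + 1 (k >= 1) that does not exceed `requested`.

    Requests below the smallest valid count are raised to step + 1.
    """
    if requested < step + 1:
        return step + 1
    return ((requested - 1) // step) * step + 1


if __name__ == "__main__":
    for wanted in (10, 65, 96, 100):
        print(wanted, "->", valid_frame_count(wanted))
```

Pass the result to `--frames` in the generate command.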
A reference stock-PyTorch (MPS) pipeline from the Lightricks model card also exists, but it is the slowest and most fragile option on a Mac (no torch.compile, op-by-op MPS fallbacks, and a known noise bug on recent torch builds — see Troubleshooting). Use the MLX --low-ram path instead.
Results
- Speed: Omitted, because no first-party benchmark exists for this pair. Our /check data has no Apple measurement here (/check/ltx-video-2-3/m2-pro returns unknown). The M2 Pro's ~200 GB/s memory bandwidth is the lowest of any Apple chip we cover (half the M2/M3 Max's ~400 GB/s), so any timing you may have seen for a larger Mac does not apply, and under block streaming, throughput is gated by disk I/O on top of that. If you run this, please contribute your numbers so the next reader gets a real M2 Pro figure.
- Unified-memory usage: The low-RAM builds are designed to keep the working set near the chip's limit, not comfortably under it. The baa-ai 12GB build documents a "~14 GB unified memory" requirement, and the int4 dgrauet build relies on --low-ram block streaming to fit. On a 16 GB M2 Pro (only ~10.5 GB GPU-addressable by default) this means you must raise the wired limit (Troubleshooting), keep clips short, and expect sustained memory pressure.
- Quality notes: Preview-grade. Distilled / heavily-quantized checkpoints trade fidelity for fewer steps and lower memory; the community thread for the LTX family reports a practical frame-count ceiling before noise appears (one user noted a limit around 4 seconds on a Mac, Lightricks/LTX-Video discussion #26). The CUDA-only IC-LoRAs that sharpen the NVIDIA output are unavailable here.
For the full benchmark data, see /check/ltx-video-2-3/m2-pro.
Troubleshooting
Out of memory / the model won't load on 16 GB
A 16 GB Mac only exposes ~10.5 GB to the GPU by default (Metal's recommendedMaxWorkingSetSize is ~66% of unified memory on sub-64 GB machines). The 22B transformer exceeds that even quantized, so two things are mandatory: (1) use a low-RAM build with --low-ram block streaming, and (2) raise the GPU wired limit before generating (macOS Sonoma 14 / Sequoia 15+):
# Allow the GPU to wire down ~13GB on a 16GB Mac. Leaves only ~3GB for macOS — risky.
sudo sysctl iogpu.wired_limit_mb=13312
# Reset to default afterwards:
sudo sysctl iogpu.wired_limit_mb=0
This is temporary and resets on reboot. Pushing the wired limit this high on a 16 GB Mac leaves very little for macOS: close every other app, watch Activity Monitor's Memory-Pressure gauge, and back off (lower the frame count or resolution, or reset the limit with iogpu.wired_limit_mb=0) the moment it pushes toward red. On older macOS (Monterey/Ventura) the key is debug.iogpu.wired_limit in bytes instead.
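The numbers in this section follow from simple arithmetic, sketched below. The ~66% default fraction and the 3 GB left for macOS are the figures used in the text above; the function names are hypothetical, and the sketch only prints the command rather than running it.

```python
def default_gpu_budget_gb(unified_gb: float, fraction: float = 0.66) -> float:
    """Approximate GPU-addressable memory at the default ~66% working-set fraction."""
    return unified_gb * fraction


def wired_limit_mb(unified_gb: int, reserve_for_os_gb: int = 3) -> int:
    """Value for iogpu.wired_limit_mb that leaves `reserve_for_os_gb` GB for macOS."""
    if reserve_for_os_gb >= unified_gb:
        raise ValueError("reserve must be smaller than total unified memory")
    return (unified_gb - reserve_for_os_gb) * 1024


if __name__ == "__main__":
    print(f"default GPU budget on 16 GB: ~{default_gpu_budget_gb(16):.1f} GB")
    print(f"sudo sysctl iogpu.wired_limit_mb={wired_limit_mb(16)}")
```

With 16 GB and a 3 GB reserve this reproduces the 13312 value used above. A larger reserve gives macOS more room at the cost of GPU headroom.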
fp8 weights fail on Metal
fp8-quantized checkpoints (common in the NVIDIA workflow) do not run on Apple's Metal backend — a community user in the LTX-2 Mac thread notes "fp8 models don't work on metal...change the model to b16 or a gguf" (Lightricks/LTX-2 discussion #43). Use the MLX int4 / mixed-precision builds linked above, never an fp8 file. There is no FP8 or NVFP4 tensor-core hardware on Apple Silicon to accelerate those formats anyway.
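If you are unsure whether a downloaded checkpoint is fp8, its header can be inspected without loading any weights. This is a sketch under two assumptions: the file is in the safetensors format (an 8-byte little-endian header length followed by a JSON header), and fp8 tensors carry the `F8_E4M3` or `F8_E5M2` dtype tags. Other container formats such as GGUF are not covered, and the function names are hypothetical.

```python
import json
import struct

FP8_DTYPES = {"F8_E4M3", "F8_E5M2"}  # fp8 dtype tags in safetensors headers


def read_header(fileobj) -> dict:
    """Read the JSON header from an open binary safetensors file object."""
    (header_len,) = struct.unpack("<Q", fileobj.read(8))
    return json.loads(fileobj.read(header_len))


def fp8_tensor_names(header: dict) -> list:
    """Names of tensors whose dtype is fp8; the __metadata__ entry is skipped."""
    return [
        name
        for name, meta in header.items()
        if name != "__metadata__" and meta.get("dtype") in FP8_DTYPES
    ]


def file_has_fp8(path: str) -> bool:
    """True if any tensor in the file at `path` is stored as fp8."""
    with open(path, "rb") as f:
        return bool(fp8_tensor_names(read_header(f)))
```

Calling `file_has_fp8("model.safetensors")` on a local file (the path here is a placeholder) returns `True` when the checkpoint should be swapped for one of the MLX builds above.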
Noisy / garbage video on the stock PyTorch (MPS) path
If you fall back to the reference Diffusers/MPS path rather than the MLX runner, recent PyTorch builds can produce noise on Apple Silicon. A community user reported "torch 2.5 or above would give me noise" […] "Backed down to 2.4.1 solved the problem" (Lightricks/LTX-Video discussion #26). The simplest fix is to avoid the stock-PyTorch path entirely and use the MLX --low-ram port, whose engine doesn't hit this bug.
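For anyone who does stay on the stock-PyTorch path, the installed version can be compared against the one in that report. The sketch below is illustrative only: the 2.5 threshold comes from a single community report, not a confirmed bug range, and the function name is hypothetical.

```python
from importlib import metadata


def torch_may_produce_noise(version: str) -> bool:
    """True for torch 2.5 or newer, the range one community report ties to noisy MPS output."""
    # Drop any local build suffix such as "+cpu", then compare major.minor numerically.
    parts = version.split("+")[0].split(".")
    return (int(parts[0]), int(parts[1])) >= (2, 5)


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        print(installed, "-> possibly affected:", torch_may_produce_noise(installed))
```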
It's slow
Expected, and more so than on a larger Mac. Video generation on Apple Silicon is minutes-per-clip with no torch.compile acceleration, and on the M2 Pro you're adding block-streaming disk I/O on top of the slowest unified-memory bandwidth (~200 GB/s) in the lineup. Use the distilled int4 weights, keep resolution and frame count low, and treat the output as a preview. Report your timings via the submission form so we can seed a real M2 Pro benchmark.
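To report a timing, wall-clock time around the whole command is enough. This is a minimal sketch with hypothetical helper names; it measures end-to-end time, including model load and block streaming, not per-step throughput.

```python
import subprocess
import time


def time_command(argv: list) -> tuple:
    """Run a command and return (exit_code, wall_clock_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(argv)
    return result.returncode, time.perf_counter() - start


def seconds_per_frame(total_seconds: float, frames: int) -> float:
    """Average wall-clock seconds per generated frame."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return total_seconds / frames
```

Pass the same arguments as the generate command in Running, as a list, for example `time_command(["ltx-2-mlx", "generate", "--distilled", "--low-ram", ...])` with the remaining flags filled in, then divide by the frame count you requested.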