Mistral Small 3.2 24B
Mistral Small 3.2 24B (release 2506) is Mistral AI's dense, 24-billion-parameter, instruction-tuned model: the newest generalist in the Small line, superseding 3.1 (2503). Built on the Mistral 3 architecture with a Pixtral vision tower, it accepts text and images and answers in text, with a 128K-token context window and the Tekken tokenizer. It is licensed under Apache-2.0, which permits commercial use. Mistral reports gains over 3.1 in instruction following and function calling, along with fewer repetitive generations (Wildbench v2 65.3%, Arena Hard v2 43.1%), strong STEM and coding scores (MMLU Pro 69.1%, MATH 69.4%, HumanEval Plus 92.9%), and coverage across 23 languages. No first-party GGUF ships; the community GGUF builds from bartowski and unsloth run on current llama.cpp out of the box. The Q4_K_M quantization (~14.3 GB) fits a 16 GB card; Q6_K and Q8_0 are for cards with 24 GB or more. Vision input needs the separate mmproj projector file.
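As a rough illustration of the mmproj requirement, the sketch below assembles a `llama-server` command line in Python. The flags `-m`, `--mmproj`, `-c` and `-ngl` are standard llama.cpp options; the file names, the 8192-token context and the `-ngl 99` value are placeholder assumptions rather than settings taken from this page, and `llama_server_cmd` is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def llama_server_cmd(
    model_path: str,
    mmproj_path: Optional[str] = None,
    ctx_size: int = 8192,    # assumption: a modest context to limit KV-cache memory
    gpu_layers: int = 99,    # assumption: a value above the layer count offloads all layers
) -> List[str]:
    """Build an argument list for llama.cpp's llama-server (hypothetical helper)."""
    cmd = [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx_size),
        "-ngl", str(gpu_layers),
    ]
    # Image input only works when the projector file is passed alongside the model.
    if mmproj_path is not None:
        cmd += ["--mmproj", mmproj_path]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute the GGUF and mmproj files you downloaded.
    cmd = llama_server_cmd("model-Q4_K_M.gguf", mmproj_path="mmproj.gguf")
    print(shlex.join(cmd))
```

Leaving out `mmproj_path` yields a text-only server command; the resulting list can be passed to `subprocess.run` as-is.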
| GPU | VRAM | Series | Best speed | Min VRAM | Works | Benchmarks | Recipe | |
|---|---|---|---|---|---|---|---|---|
| Apple M2 Max | 64GB | apple | | | ~ | 0 | recipe | check ↗ |
| Apple M3 Max | 48GB | apple | | | ~ | 0 | recipe | check ↗ |
| RTX 3090 | 24GB | 30 | | | ~ | 0 | recipe | check ↗ |
| RTX 3090 Ti | 24GB | 30 | | | ~ | 0 | recipe | check ↗ |
| RTX 4080 | 16GB | 40 | | | ~ | 0 | recipe | check ↗ |
| RTX 4090 | 24GB | 40 | | | ~ | 0 | recipe | check ↗ |
| RTX 5090 | 32GB | 50 | | | ~ | 0 | recipe | check ↗ |
| RX 7900 XTX | 24GB | amd | | | ~ | 0 | recipe | check ↗ |
✓ benchmarked · ~ runs via recipe (not benchmarked) · — untested · ✕ doesn't fit
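The "fits a 16 GB card" claim comes down to simple arithmetic, sketched below under stated assumptions. Only the Q4_K_M size (~14.3 GB) and the VRAM figures come from this page; the 1.5 GB headroom for KV cache and compute buffers is an illustrative guess, not a measurement, and `Gpu` and `fits` are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: memory reserved for KV cache, compute buffers and the desktop.
# 1.5 GB is an illustrative placeholder; real needs grow with context length.
HEADROOM_GB = 1.5

# Size of the Q4_K_M build as quoted in the description above.
Q4_K_M_GB = 14.3


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: float


def fits(model_file_gb: float, gpu: Gpu, headroom_gb: float = HEADROOM_GB) -> bool:
    """Return True if the weights plus headroom fit in the GPU's memory."""
    return model_file_gb + headroom_gb <= gpu.vram_gb


# A few discrete cards from the table, with their listed VRAM.
GPUS = [Gpu("RTX 4080", 16), Gpu("RTX 3090", 24), Gpu("RTX 5090", 32)]

if __name__ == "__main__":
    for gpu in GPUS:
        verdict = "fits" if fits(Q4_K_M_GB, gpu) else "does not fit"
        print(f"Q4_K_M on {gpu.name} ({gpu.vram_gb:g} GB): {verdict}")
```

Raising `headroom_gb` is a quick way to see how little slack a 16 GB card has with this quantization once longer contexts or the vision projector are loaded.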