Phi-4
Phi-4 is Microsoft's dense 14-billion-parameter model, released in December 2024 and the flagship of the Phi-4 family. It is text-only, has a 16K-token context window, is built on the phi3 architecture, and was trained heavily on curated and synthetic data to deliver strong STEM, math, and reasoning performance at a small size. It is MIT-licensed, so commercial use is permitted. Microsoft reports MMLU 84.8, GPQA 56.1, MATH 80.4, HumanEval 82.6, and MGSM 80.6, which is competitive with much larger models on reasoning benchmarks. Microsoft publishes a first-party GGUF (microsoft/phi-4-gguf); community builds from unsloth and bartowski add the conventional K_M quant ladder. The model loads on current llama.cpp out of the box: Q4_K_M (~8.9 GB) fits a 12 GB card, while an 8 GB card cannot hold the whole file and needs some layers offloaded to the CPU; Q6_K or Q8_0 suit cards with 16 GB or more. GGUFs from early 2025 had an EOS/chat-template bug that has since been fixed, so use a current llama.cpp build and a recent GGUF.
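The quant guidance above can be sketched as a small helper. This is only a rule-of-thumb illustration, not part of llama.cpp or any Microsoft tooling: the function name `pick_quant` and the 1.5 GB headroom for KV cache and compute buffers are assumptions, and only the ~8.9 GB Q4_K_M size and the 16 GB threshold come from the text. Because an ~8.9 GB file is larger than 8 GB of VRAM, the sketch treats smaller cards as needing partial CPU offload.

```python
# Rule-of-thumb quant picker for Phi-4 GGUFs (illustrative sketch only).
Q4_K_M_FILE_GB = 8.9  # approximate file size quoted in the text
HEADROOM_GB = 1.5     # ASSUMPTION: room for KV cache and buffers; tune for your context length


def pick_quant(vram_gb: float) -> str:
    """Suggest a Phi-4 quant for a card with `vram_gb` of VRAM."""
    if vram_gb >= 16:
        # The text recommends the larger quants for 16 GB and up.
        return "Q6_K or Q8_0"
    if vram_gb >= Q4_K_M_FILE_GB + HEADROOM_GB:
        # The whole Q4_K_M file plus headroom fits on the GPU.
        return "Q4_K_M"
    # The file does not fit entirely in VRAM; some layers must stay on the CPU.
    return "Q4_K_M with partial CPU offload"


if __name__ == "__main__":
    for name, vram in [("RTX 4070", 12), ("RTX 4080", 16), ("RTX 3090", 24)]:
        print(f"{name} ({vram} GB): {pick_quant(vram)}")
```

The headroom value matters most near the boundary: a longer context needs more KV-cache memory, so a card that holds Q4_K_M comfortably at a short context may need offload at the full 16K.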
| GPU | VRAM | Series | Best speed | Min VRAM | Works | Benchmarks | Recipe | |
|---|---|---|---|---|---|---|---|---|
| Apple M2 Pro | 16GB | apple | ~ | 0 | recipe | check ↗ | ||
| Apple M3 Max | 48GB | apple | ~ | 0 | recipe | check ↗ | ||
| RTX 3090 | 24GB | 30 | ~ | 0 | recipe | check ↗ | ||
| RTX 4070 | 12GB | 40 | ~ | 0 | recipe | check ↗ | ||
| RTX 4080 | 16GB | 40 | ~ | 0 | recipe | check ↗ | ||
| RTX 4090 | 24GB | 40 | ~ | 0 | recipe | check ↗ | ||
| RTX 5090 | 32GB | 50 | ~ | 0 | recipe | check ↗ | ||
| RX 7800 XT | 16GB | amd | ~ | 0 | recipe | check ↗ | ||
✓ benchmarked · ~ runs via recipe (not benchmarked) · — untested · ✕ doesn't fit