§01·model

Phi-4

llm · active · MIT

Phi-4 is Microsoft's dense 14-billion-parameter model, released in December 2024 and the flagship of the Phi-4 family. It is text-only with a 16K-token context window, is built on the phi3 architecture, and was trained heavily on curated and synthetic data to deliver strong STEM, math and reasoning performance at a small size. It is MIT-licensed, so commercial use is permitted. Microsoft reports MMLU 84.8, GPQA 56.1, MATH 80.4, HumanEval 82.6 and MGSM 80.6, which is competitive with much larger models on reasoning benchmarks.

Microsoft publishes a first-party GGUF (microsoft/phi-4-gguf); community builds from unsloth and bartowski add the conventional K_M quant ladder. The model loads on current llama.cpp out of the box. Q4_K_M is about 8.9 GB, which fits a 12 GB card (an 8 GB card cannot hold it entirely and needs partial CPU offload); use Q6_K or Q8_0 on cards with 16 GB or more. Early-2025 GGUFs had an EOS/chat-template bug that has since been fixed, so use a current llama.cpp build and a recent GGUF.
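The quant-versus-VRAM guidance above comes down to simple arithmetic: weights plus KV cache plus some working memory must stay under the card's VRAM. The sketch below is a rough estimate only. The layer count, KV-head count and head dimension are assumptions about Phi-4's attention shape (they are not stated on this page; check the model's config.json), the 0.5 GB overhead is a guess, and the function names are hypothetical helpers, not part of llama.cpp.

```python
# Rough VRAM-fit estimate for a GGUF quant. Every constant here is an assumption.
GB = 1000**3

# Assumed Phi-4 attention shape (not given in the text above; verify in config.json).
N_LAYERS = 40
N_KV_HEADS = 10
HEAD_DIM = 128


def kv_cache_bytes(ctx_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes used by the K and V caches for ctx_tokens tokens (fp16 by default)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem * ctx_tokens


def estimate_total_gb(weights_gb: float, ctx_tokens: int, overhead_gb: float = 0.5) -> float:
    """Weights + KV cache + a guessed allowance for compute buffers, in GB."""
    return weights_gb + kv_cache_bytes(ctx_tokens) / GB + overhead_gb


def fits(weights_gb: float, vram_gb: float, ctx_tokens: int, overhead_gb: float = 0.5) -> bool:
    """True if the estimated footprint stays within the card's VRAM."""
    return estimate_total_gb(weights_gb, ctx_tokens, overhead_gb) <= vram_gb


if __name__ == "__main__":
    q4_k_m_gb = 8.9  # Q4_K_M size quoted above
    for vram in (8, 12, 16):
        for ctx in (4096, 16384):
            total = estimate_total_gb(q4_k_m_gb, ctx)
            verdict = "fits" if total <= vram else "needs offload or a smaller context"
            print(f"{vram} GB card, {ctx}-token context: ~{total:.1f} GB -> {verdict}")
```

Under these assumptions the KV cache grows linearly with context length, so a card that holds Q4_K_M at a short context may still need a smaller context or partial offload at the full 16K window. Treat the numbers as a starting point and confirm against actual memory use.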

Download · 5 variants
§02·GPUs that run this model
8 total
GPU            VRAM   Series  Best speed  Min VRAM  Works  Benchmarks  Recipe
Apple M2 Pro   16GB   apple   -           -         ~      0           recipe
Apple M3 Max   48GB   apple   -           -         ~      0           recipe
RTX 3090       24GB   30      -           -         ~      0           recipe
RTX 4070       12GB   40      -           -         ~      0           recipe
RTX 4080       16GB   40      -           -         ~      0           recipe
RTX 4090       24GB   40      -           -         ~      0           recipe
RTX 5090       32GB   50      -           -         ~      0           recipe
RX 7800 XT     16GB   amd     -           -         ~      0           recipe

benchmarked · ~ runs via recipe (not benchmarked) · untested · doesn't fit