self-hosted/ai
§01·model · /models

Mistral Nemo 12B

llmactiveApache-2.0

Mistral Nemo 12B (Instruct, release 2407) is a dense 12-billion-parameter model built by Mistral AI in collaboration with NVIDIA. Text-only, with a 128K-token context window and the Tekken tokenizer (its first use). Licensed Apache-2.0. Trained with quantization awareness for FP8 inference and tuned for function calling and multilingual use, it was positioned as a drop-in upgrade to Mistral 7B. Mistral reports MMLU 68.0% and HellaSwag 83.5%, with solid multilingual MMLU (French/German/Spanish ~62-65%). No first-party GGUF; community bartowski/unsloth GGUF builds load on current llama.cpp with no special patch (Q4_K_M ~7.5 GB fits an 8 GB card; Q6_K/Q8_0 for 12-24 GB). Recommended sampling temperature is a low ~0.3.

§02·GPUs that run this model
8 total
GPUVRAMSeriesBest speedMin VRAMWorksBenchmarksRecipe
Apple M2 Pro16GBapple~0recipecheck ↗
Apple M3 Max48GBapple~0recipecheck ↗
RTX 309024GB30~0recipecheck ↗
RTX 40608GB40~0recipecheck ↗
RTX 407012GB40~0recipecheck ↗
RTX 408016GB40~0recipecheck ↗
RTX 409024GB40~0recipecheck ↗
RX 7800 XT16GBamd~0recipecheck ↗

benchmarked·~ runs via recipe (not benchmarked)· untested·doesn't fit