§01·model

Mistral Nemo 12B

llm · active · Apache-2.0

Mistral Nemo 12B (Instruct, release 2407) is a dense 12-billion-parameter model built by Mistral AI in collaboration with NVIDIA. It is text-only, has a 128K-token context window, and is the first model to use the Tekken tokenizer. It is licensed under Apache-2.0. Trained with quantization awareness for FP8 inference and tuned for function calling and multilingual use, it was positioned as a drop-in replacement for Mistral 7B. Mistral reports MMLU 68.0% and HellaSwag 83.5%, with multilingual MMLU of roughly 62-65% for French, German, and Spanish. There is no first-party GGUF; community GGUF builds from bartowski and unsloth load on current llama.cpp without patches. Q4_K_M (~7.5 GB) fits an 8 GB card, and Q6_K or Q8_0 suit 12-24 GB cards. The recommended sampling temperature is low, around 0.3.
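The quantization sizes above only cover the weights; the KV cache grows with context length and shares the same VRAM. The following is a rough sketch of that arithmetic, not a measured result. The attention shape (40 layers, 8 KV heads, head dimension 128), the fp16 cache, and the Q6_K and Q8_0 file sizes are assumptions added for illustration; only the ~7.5 GB Q4_K_M figure and the 128K context limit come from the description. The function names are hypothetical.

```python
# Rough estimate of how much context fits next to a GGUF file in VRAM.
# Only the Q4_K_M size and the 128K limit come from the model description;
# everything else below is an assumption for illustration.

GGUF_SIZE_GB = {
    "Q4_K_M": 7.5,   # from the description
    "Q6_K": 10.1,    # assumed, approximate
    "Q8_0": 13.0,    # assumed, approximate
}

# Assumed attention shape for Mistral Nemo 12B.
N_LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2          # assumes an unquantized fp16 KV cache
MAX_CONTEXT = 128 * 1024     # 128K-token context window


def kv_bytes_per_token() -> int:
    """Bytes of KV cache per token: one K and one V vector per layer."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def max_context_tokens(quant: str, vram_gb: float, overhead_gb: float = 0.0) -> int:
    """Largest context whose KV cache fits beside the weights.

    Returns 0 if the weights alone do not fit. overhead_gb is a
    hypothetical allowance for compute buffers and the display.
    """
    free_bytes = (vram_gb - GGUF_SIZE_GB[quant] - overhead_gb) * 1e9
    if free_bytes <= 0:
        return 0
    return min(int(free_bytes // kv_bytes_per_token()), MAX_CONTEXT)


if __name__ == "__main__":
    for quant, vram in [("Q4_K_M", 8.0), ("Q6_K", 12.0), ("Q8_0", 24.0)]:
        print(quant, vram, "GB ->", max_context_tokens(quant, vram), "tokens")
```

Under these assumptions an 8 GB card running Q4_K_M has room for only a few thousand tokens of context with every layer on the GPU. Real runtimes also reserve memory for compute buffers, so treat the result as an upper bound; partial CPU offload or a quantized KV cache changes the picture.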

huggingface.co ↗ · mistral.ai ↗

§02·GPUs that run this model

8 total

GPU	VRAM	Series	Works	Recipe	Check
Apple M2 Pro	16GB	apple	~	recipe	check ↗
Apple M3 Max	48GB	apple	~	recipe	check ↗
RTX 3090	24GB	30	~	recipe	check ↗
RTX 4060	8GB	40	~	recipe	check ↗
RTX 4070	12GB	40	~	recipe	check ↗
RTX 4080	16GB	40	~	recipe	check ↗
RTX 4090	24GB	40	~	recipe	check ↗
RX 7800 XT	16GB	amd	~	recipe	check ↗

✓ benchmarked · ~ runs via recipe (not benchmarked) · — untested · ✕ doesn't fit