Gemma 4 12B
Gemma 4 12B (Instruct) is Google DeepMind's dense, ~12-billion-parameter model from the Gemma 4 family (released 2026). It is a 'unified' multimodal model, tagged any-to-any, that accepts text, image and audio inputs and produces text output. Its encoder-free design projects raw image patches and audio directly into the decoder, and it offers a 256K-token context window with hybrid sliding-window/global attention. It is licensed under Apache-2.0 (Gemma 4 moved off the Gemma Terms of Use), with a separate Prohibited Use Policy. Google reports strong reasoning scores: MMLU Pro 77.2%, GPQA Diamond 78.8%, AIME 2026 77.5%, LiveCodeBench v6 72.0%. Google ships a first-party quantization-aware-trained (QAT) Q4_0 GGUF (~7.0 GB) alongside official ggml-org GGUFs; community unsloth builds add the full K_M ladder. The model loads on current llama.cpp out of the box, and the Q4_K_M file (~7.1 GB) fits on an 8 GB card. Vision and audio input require the separate mmproj projector file.
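
The file sizes quoted above can be turned into a rough sizing check. The sketch below is only back-of-the-envelope arithmetic, not a measurement: it assumes exactly 12e9 parameters (the text says "~12 billion"), treats the quoted sizes as decimal gigabytes, and uses the hypothetical helper names bits_per_weight and headroom_gb.

```python
# Rough sizing arithmetic using the approximate figures quoted in the text.
PARAMS = 12e9   # assumption: "~12 billion" taken as exactly 12e9 parameters
GB = 1e9        # assumption: quoted file sizes are decimal gigabytes


def bits_per_weight(file_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a file of this size."""
    return file_gb * GB * 8 / params


def headroom_gb(file_gb: float, vram_gb: float) -> float:
    """VRAM left after the weights, before KV cache and compute buffers."""
    return vram_gb - file_gb


if __name__ == "__main__":
    for name, size_gb in [("Q4_0 (QAT)", 7.0), ("Q4_K_M", 7.1)]:
        print(
            f"{name}: {bits_per_weight(size_gb):.2f} bits/weight, "
            f"{headroom_gb(size_gb, 8.0):.1f} GB left on an 8 GB card"
        )
```

Under these assumptions both files leave roughly 1 GB or less on an 8 GB card, and that remainder also has to hold the KV cache, compute buffers and, for image or audio input, the mmproj file. Context lengths anywhere near the 256K maximum would therefore likely need more memory or partial CPU offload.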