North Mini Code 1.0
Cohere Labs' first developer-focused model — an open, Apache-2.0 licensed agentic-coding LLM built for software engineering and terminal tasks (not general chat). A 30B-A3B Mixture-of-Experts (128 experts, 8 active per token; ~3B active) with a 256K context window, interleaved sliding-window/global attention, native tool-calling via JSON-schema, and interleaved thinking for multi-step agent loops. Vendor evals: 80.2% pass@10 on SWE-Bench Verified and 55.1% pass@10 on Terminal-Bench v2 (SFT), 61.0% pass@1 with mini-SWE-Agent. Runs locally via llama.cpp/Ollama serving a GGUF quant behind an OpenAI-compatible API, paired with an agent client (OpenHands, Aider, Cline). Because an MoE keeps all 128 experts resident in VRAM, footprint is the full quant file: Q4_K_M is ~17.5 GB (comfortable on 24 GB), down to Q2_K ~10.3 GB for tighter cards; long 256K context needs 32 GB+ or KV-cache quantization. Notably permissive vs Cohere Labs' usual non-commercial releases.
| GPU | VRAM | Series | Best speed | Min VRAM | Works | Benchmarks | Recipe | |
|---|---|---|---|---|---|---|---|---|
| Apple M2 Max | 64GB | apple | ~ | 0 | recipe | check ↗ | ||
| Apple M3 Max | 48GB | apple | ~ | 0 | recipe | check ↗ | ||
| RTX 3090 | 24GB | 30 | ~ | 0 | recipe | check ↗ | ||
| RTX 3090 Ti | 24GB | 30 | ~ | 0 | recipe | check ↗ | ||
| RTX 4080 | 16GB | 40 | ~ | 0 | recipe | check ↗ | ||
| RTX 4090 | 24GB | 40 | ~ | 0 | recipe | check ↗ | ||
| RTX 5090 | 32GB | 50 | ~ | 0 | recipe | check ↗ | ||
| RX 7900 XTX | 24GB | amd | ~ | 0 | recipe | check ↗ |
✓ benchmarked·~ runs via recipe (not benchmarked)·— untested·✕doesn't fit