§01·model · /models

North Mini Code 1.0

llmactiveApache-2.0

Cohere Labs' first developer-focused model — an open, Apache-2.0 licensed agentic-coding LLM built for software engineering and terminal tasks (not general chat). A 30B-A3B Mixture-of-Experts (128 experts, 8 active per token; ~3B active) with a 256K context window, interleaved sliding-window/global attention, native tool-calling via JSON-schema, and interleaved thinking for multi-step agent loops. Vendor evals: 80.2% pass@10 on SWE-Bench Verified and 55.1% pass@10 on Terminal-Bench v2 (SFT), 61.0% pass@1 with mini-SWE-Agent. Runs locally via llama.cpp/Ollama serving a GGUF quant behind an OpenAI-compatible API, paired with an agent client (OpenHands, Aider, Cline). Because an MoE keeps all 128 experts resident in VRAM, footprint is the full quant file: Q4_K_M is ~17.5 GB (comfortable on 24 GB), down to Q2_K ~10.3 GB for tighter cards; long 256K context needs 32 GB+ or KV-cache quantization. Notably permissive vs Cohere Labs' usual non-commercial releases.

huggingface.co ↗huggingface.co ↗

§02·GPUs that run this model

8 total

GPU	VRAM	Series	Works	Recipe
Apple M2 Max	64GB	apple	~	recipe	check ↗
Apple M3 Max	48GB	apple	~	recipe	check ↗
RTX 3090	24GB	30	~	recipe	check ↗
RTX 3090 Ti	24GB	30	~	recipe	check ↗
RTX 4080	16GB	40	~	recipe	check ↗
RTX 4090	24GB	40	~	recipe	check ↗
RTX 5090	32GB	50	~	recipe	check ↗
RX 7900 XTX	24GB	amd	~	recipe	check ↗

✓ benchmarked·~ runs via recipe (not benchmarked)·— untested·✕doesn't fit