§01·compatibility · /check
gpt-oss 20B on RTX 3060
Yes — gpt-oss 20B runs on the RTX 3060 (12 GB). Fastest community-measured result: 64 tokens/s.
✓ runs · llm · active · 30 series · 12 GB VRAM
§02·benchmarks
| Task | Quant | Speed | VRAM | Works | Confidence | Source | Verified |
|---|---|---|---|---|---|---|---|
| llm | MXFP4 | 64 tokens/s | | ✓ | | github.com · web | 2026-06-13 |
§03·common questions
Can you run gpt-oss 20B on RTX 3060?
Yes — gpt-oss 20B runs on the RTX 3060 (12 GB). Fastest community-measured result: 64 tokens/s.
Which quantizations have been tested for gpt-oss 20B on RTX 3060?
MXFP4 — measured in community benchmarks.
How fast is gpt-oss 20B on RTX 3060?
Up to 64 tokens/s (llm), the fastest community-measured result.
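The 64 tokens/s figure is easier to judge as wall-clock time per reply. The snippet below is a minimal sketch, assuming a steady decode rate and ignoring prompt processing and model load time. The function name and the reply lengths are hypothetical illustration values, not measurements.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time for num_tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Fastest community-measured speed reported on this page (MXFP4).
MEASURED_TOKENS_PER_SECOND = 64.0

# Hypothetical reply lengths, chosen only for illustration.
for n in (128, 512, 2048):
    seconds = generation_seconds(n, MEASURED_TOKENS_PER_SECOND)
    print(f"{n} tokens: ~{seconds:.1f} s")
```

Real runs will likely take somewhat longer than this estimate, since throughput tends to drop as the context grows.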
Are there step-by-step instructions for gpt-oss 20B on RTX 3060?
Yes — 5 published recipes document this setup; see the recipes listed below.
§04·related recipes
- llm · intermediate · 12 GB+
gpt-oss 20B on RTX 3060: MXFP4 Chat at 64 tok/s in 12 GB via llama.cpp Expert Offload
- llm · intermediate · 12 GB+
gpt-oss 20B on RTX 4070 Ti: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
- llm · intermediate · 12 GB+
gpt-oss 20B on RTX 4070 SUPER: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
- llm · intermediate · 12 GB+
gpt-oss 20B on RTX 4070: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
- llm · intermediate · 12 GB+
gpt-oss 20B on RTX 5070: MXFP4 Chat in 12 GB via llama.cpp Expert Offload