§01·compatibility · /check
Qwen3-8B on RTX 3080 Ti
✓ runs · llm · active · 30 series · 12GB VRAM
§02·benchmarks
| Task | Quant | Speed | VRAM | Works | Confidence | Source | Verified |
|---|---|---|---|---|---|---|---|
| llm | Q4_K | 4211.7 prefill tokens/s | | ✓ | | hardware-corner.net · web | 2026-05-15 |
| llm | Q4_K | 115.2 tokens/s | | ✓ | | hardware-corner.net · web | 2026-05-15 |
§03·related recipes
- llm · beginner · 6GB+
  Qwen3-8B on RTX 5090: Q4_K_M GGUF with 26 GB of Headroom for Colocation, BF16, or Full 131K Context
- llm · beginner · 6GB+
  Qwen3-8B on RTX 3090: Q4_K_M GGUF with 18 GB of Headroom for Colocation or Long Context
- llm · beginner · 6GB+
  Qwen3-8B on RTX 4090: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 16GB+
  Qwen3-8B on RTX 4060 Ti 16GB: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 16GB+
  Qwen3-8B on RTX 5060 Ti: Q4_K_M GGUF via Ollama or llama.cpp