§01·compatibility
Llama 3.3 70B on RTX 4090
Yes. Llama 3.3 70B runs on the RTX 4090 (24 GB). The fastest community-measured result is 17.79 tokens/s at Q4_K_XL.
✓ runs · llm · active · 40 series · 24 GB VRAM
§02·benchmarks
| Task | Quant | Speed | VRAM | Works | Confidence | Source | Verified |
|---|---|---|---|---|---|---|---|
| llm | Q4_K_XL | 17.79 tokens/s | | ✓ | | hardware-corner.net · web | 2026-06-26 |
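The benchmark row does not report a VRAM figure. As a rough orientation only, the weight footprint of a 4-bit-class quantization can be estimated from the parameter count. In the sketch below, the 70e9 parameter count and the 4.5 bits-per-weight average are illustrative assumptions, not values taken from the benchmark source, and the helper name `estimate_weight_gb` is hypothetical.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9         # assumption: nominal parameter count implied by "70B"
BITS_PER_WEIGHT = 4.5   # assumption: rough average for a 4-bit-class quant
VRAM_GB = 24.0          # RTX 4090 VRAM, as stated on this page

weights_gb = estimate_weight_gb(N_PARAMS, BITS_PER_WEIGHT)
fits_entirely = weights_gb <= VRAM_GB

print(f"~{weights_gb:.1f} GB of weights vs {VRAM_GB:.0f} GB VRAM")
print(f"fits entirely in VRAM: {fits_entirely}")
```

Under these assumptions the weights alone come out larger than 24 GB, so a single-card run would presumably keep part of the model outside VRAM. The page does not say how the measured run was configured, so treat the estimate as context for the table, not as a description of the benchmark setup.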
§03·common questions
Can you run Llama 3.3 70B on RTX 4090?
Yes. Llama 3.3 70B runs on the RTX 4090 (24 GB). The fastest community-measured result is 17.79 tokens/s at Q4_K_XL.
Which quantizations have been tested for Llama 3.3 70B on RTX 4090?
Only Q4_K_XL so far, measured in community benchmarks.
How fast is Llama 3.3 70B on RTX 4090?
Up to 17.79 tokens/s (llm), the fastest community-measured result.
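To put the throughput figure in practical terms, the following minimal sketch converts tokens per second into approximate generation time for a response of a given length. The response lengths are arbitrary examples, the helper name `generation_seconds` is hypothetical, and the calculation assumes a constant generation rate with prompt processing time ignored.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a constant rate (prompt processing ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


SPEED_TOKENS_PER_S = 17.79  # fastest community-measured result from this page

# Example response lengths are illustrative, not from the benchmark source.
for n_tokens in (100, 500, 1000):
    seconds = generation_seconds(n_tokens, SPEED_TOKENS_PER_S)
    print(f"{n_tokens:>5} tokens: about {seconds:.1f} s")
```

At the measured rate, a 500-token answer takes a little under half a minute of generation time. Real-world latency will also depend on prompt length and context size, which this sketch does not model.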
Where can I find step-by-step recipes for Llama 3.3 70B?
No recipe targets the RTX 4090 specifically yet, but 3 published recipes cover Llama 3.3 70B on other GPUs — a solid starting point. See the recipes listed below.
§04·more Llama 3.3 70B recipes