§01·compatibility

Llama 3.3 70B on RTX 4090

Yes. Llama 3.3 70B runs on the RTX 4090 (24 GB). The fastest community-measured result is 17.79 tokens/s at Q4_K_XL quantization.

runs · llm · active · 40 series · 24 GB VRAM
model
  name: Llama 3.3 70B
  slug: llama-3-3-70b
  vertical: llm
  status: active
gpu
  name: RTX 4090
  slug: rtx-4090
  vram: 24 GB
  series: 40
§02·benchmarks
Task | Quant   | Speed          | VRAM | Works | Confidence | Source                    | Verified
llm  | Q4_K_XL | 17.79 tokens/s |      |       |            | hardware-corner.net (web) | 2026-06-26
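The benchmark row does not report VRAM use. A rough cross-check is that the quantized weights take about parameters × bits per weight ÷ 8 bytes. The sketch below makes three assumptions that do not come from the benchmark source: a nominal 70 billion parameters, 4.5 to 5 bits per weight for a Q4_K-class quant, and no allowance for the KV cache or runtime overhead. The helper name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


VRAM_GB = 24  # RTX 4090

# Assumption: a Q4_K-class quant averages roughly 4.5-5 bits per weight.
for bpw in (4.5, 5.0):
    size = weight_footprint_gb(70e9, bpw)  # assumption: nominal 70B parameters
    print(f"{bpw} bpw: ~{size:.1f} GB of weights, fits in {VRAM_GB} GB: {size <= VRAM_GB}")
```

Under these assumptions the weights alone come to more than 24 GB. The measured setup may therefore have offloaded some layers to system RAM or used more than one GPU. The benchmark row does not say which, so check the source before relying on the speed figure for a single-card build.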
§03·common questions
Can you run Llama 3.3 70B on RTX 4090?

Yes. Llama 3.3 70B runs on the RTX 4090 (24 GB). The fastest community-measured result is 17.79 tokens/s at Q4_K_XL quantization.

Which quantizations have been tested for Llama 3.3 70B on RTX 4090?

Q4_K_XL is the only quantization with a community benchmark listed so far.

How fast is Llama 3.3 70B on RTX 4090?

Up to 17.79 tokens/s on the llm task at Q4_K_XL. This is the fastest community-measured result, reported by hardware-corner.net.
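To put the rate in practical terms, the time to generate a reply is the token count divided by the decode rate. This sketch assumes a steady 17.79 tokens/s and ignores prompt-processing time, so real responses will take somewhat longer. The function name `seconds_for_tokens` is hypothetical.

```python
def seconds_for_tokens(n_tokens: int, tokens_per_s: float) -> float:
    """Wall-clock time to decode n_tokens at a steady generation rate."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return n_tokens / tokens_per_s


BEST_RATE = 17.79  # tokens/s, best community-measured result (Q4_K_XL)

# Assumption: constant decode rate; prompt processing is not included.
for n in (100, 500, 1000):
    print(f"{n:>5} tokens: {seconds_for_tokens(n, BEST_RATE):.1f} s")
```

At that rate, a 500-token answer takes a little over 28 seconds to generate.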

Where can I find step-by-step recipes for Llama 3.3 70B?

No recipe targets the RTX 4090 specifically yet. However, three published recipes cover Llama 3.3 70B on other GPUs and are a reasonable starting point. They are listed below.

§04·more Llama 3.3 70B recipes