Question 1

Can you run Llama 3.3 70B on RTX 4090?

Accepted Answer

Yes. Llama 3.3 70B runs on the RTX 4090 (24 GB). The fastest community-measured result is 17.79 tokens/s.

Question 2

Which quantizations have been tested for Llama 3.3 70B on RTX 4090?

Accepted Answer

Q4_K_XL has been measured in community benchmarks.
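
For a rough sense of why quantization matters on a 24 GB card, here is a minimal sketch that estimates the size of the weights alone. The parameter count is the nominal 70B, and the 4.5 bits per weight is an assumed average for a 4-bit K-quant, not a measured figure for Q4_K_XL; the function name `estimate_weight_gib` is made up for this example. KV cache and runtime overhead are ignored.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 70e9        # nominal "70B" parameter count (rounded, assumption)
BITS_PER_WEIGHT = 4.5  # assumed average for a 4-bit K-quant, not measured
VRAM_GIB = 24          # RTX 4090

weights_gib = estimate_weight_gib(N_PARAMS, BITS_PER_WEIGHT)
fits_in_vram = weights_gib <= VRAM_GIB

print(f"~{weights_gib:.1f} GiB of weights; fits in {VRAM_GIB} GiB VRAM: {fits_in_vram}")
```

Under these assumptions the weights alone come out larger than 24 GB, so a single-card run would likely need part of the model offloaded to system memory or a lower-bit quantization. Treat this as a back-of-the-envelope check, not a description of how the benchmarked setup was configured.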

Question 3

How fast is Llama 3.3 70B on RTX 4090?

Accepted Answer

Up to 17.79 tokens/s (llm), the fastest community-measured result.
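
To put that number in practical terms, the short sketch below converts the quoted 17.79 tokens/s into approximate generation times. It assumes a constant decode rate and ignores prompt processing, so real latency will differ; `generation_seconds` is a hypothetical helper written for this example.

```python
TOKENS_PER_SECOND = 17.79  # fastest community-measured result quoted above


def generation_seconds(n_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Time to generate n_tokens at a constant decode rate (assumption)."""
    return n_tokens / tokens_per_second


for n in (100, 500, 2000):
    print(f"{n} tokens: ~{generation_seconds(n):.1f} s")
```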

Question 4

Where can I find step-by-step recipes for Llama 3.3 70B?

Accepted Answer

No recipe targets the RTX 4090 specifically yet, but 3 published recipes cover Llama 3.3 70B on other hardware (Apple M-series Max chips with MLX), which makes them a solid starting point. See the recipes listed below.

Llama 3.3 70B on RTX 4090

Llama 3.3 70B on Apple M3 Max: 70B-class chat in 48 GB unified memory with MLX

Llama 3.3 70B on Apple M4 Max: 70B-class chat in 48 GB unified memory with MLX

Llama 3.3 70B on Apple M2 Max: 70B-class chat in 64 GB unified memory with MLX