§01·model

Qwen3 32B

llm · active · Apache-2.0

32B dense LLM by Alibaba (Qwen3) with hybrid thinking / non-thinking modes. Apache-2.0.

Download · 4 variants

§02·GPUs that run this model

7 total

GPU	VRAM	Series	Best speed	Min VRAM	Works	Benchmarks	Recipe	Check
RTX 5090	32GB	50	61.4 tokens/s		✓	3	recipe	check ↗
RTX 3090 Ti	24GB	30	38 tokens/s	24GB	✓	2	recipe	check ↗
RTX 3090	24GB	30	35.1 tokens/s	24GB	✓	2	recipe	check ↗
Apple M2 Max	64GB	apple			~	0	recipe	check ↗
Apple M3 Max	48GB	apple			~	0	recipe	check ↗
Apple M4 Max	48GB	apple			~	0	recipe	check ↗
RTX 4090	24GB	40			~	0	recipe	check ↗

✓ benchmarked · ~ runs via recipe (not benchmarked) · — untested · ✕ doesn't fit