live · 154 benchmarks tracked
Will it run on your GPU?
Community benchmarks for self-hosting open-weights AI models. Real speed numbers, real peak VRAM, real consumer hardware. No vendor marketing.
§02·latest · recipes & guides
- image · intermediate · 16GB+
Training a reusable character LoRA for Z-Image-Turbo
- multimodal · intermediate · 24GB+
Qwen3.5-35B-A3B on RTX 5090: Blackwell MXFP4 MoE Chat at 165 tok/s
- multimodal · intermediate · 20GB+
Qwen3.5 27B on RTX 5090: Q4_K GGUF local chat via llama.cpp
- video · advanced · 14GB+
LTX-2.3 on RTX 4060 Ti 16GB: 22B Audio-Video at the 16 GB Floor via Distilled GGUF + Streamed Encoder
- llm · advanced · 24GB+
Llama 3.3 70B on RTX 4090: 70B-Class Chat on One 24 GB Card (Q4 Offload or Fully-On-GPU IQ2)
- llm · beginner · 12GB+
Qwen3-14B on RTX 4060 Ti 16GB: Q4_K_M GGUF via Ollama or llama.cpp
§03·contribute
Ran a benchmark?
Share the numbers.
Drop a GPU, a model, and the numbers you measured. A source link — a forum post, a gist, a screenshot — helps us cross-check before the entry shows up in the dataset.
open data · CC BY-SA
Submit a benchmark