§01·spec
RTX 5060 Ti
NVIDIA · 50 series · 16 GB VRAM
§02·models that run on this GPU
9 total

| Model | Vertical | Best speed | Min VRAM | Works | Benchmarks |
|---|---|---|---|---|---|
| Llama 3.2 1B | llm | 192 tokens/s | 16 GB | ✓ | 2 |
| gpt-oss 20B | llm | 92.1 tokens/s | 16 GB | ✓ | 2 |
| GPT-OSS 20B MoE | llm | 82 tokens/s | 16 GB | ✓ | 1 |
| Qwen3-8B | llm | 69.2 tokens/s | 16 GB | ✓ | 2 |
| Llama 3.1 8B | llm | 55.5 tokens/s | 16 GB | ✓ | 2 |
| Qwen 3.5 35B-A3B | llm | 44 tokens/s | 16 GB | ✓ | 2 |
| Qwen3 14B | llm | 41.1 tokens/s | 16 GB | ✓ | 2 |
| Qwen 2.5 14B | llm | 33 tokens/s | 16 GB | ✓ | 2 |
| Flux.1 Dev | image | | 16 GB | ✓ | 1 |
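
The Min VRAM column can be cross-checked with simple arithmetic: weight memory is roughly the parameter count times the bits per weight. The sketch below is only a back-of-the-envelope estimate, not the method behind the table. The function names, the ~4.8 bits per weight used for a 4-bit-class quantization, the flat 1.5 GiB overhead allowance, and treating the card as 16 GiB are all assumptions made for illustration; real usage also depends on context length and the runtime.

```python
# Rough VRAM estimate for quantized model weights.
# All constants are illustrative assumptions, not measured values.

GIB = 1024 ** 3


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float = 16.0, overhead_gib: float = 1.5) -> bool:
    """True if weights plus a flat overhead allowance fit in vram_gib."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Hypothetical inputs: dense 8B and 14B models at two precisions.
    for params in (8.0, 14.0):
        for label, bpw in (("4-bit-class (~4.8 bpw)", 4.8), ("FP16 (16 bpw)", 16.0)):
            size = weight_gib(params, bpw)
            verdict = "fits" if fits(params, bpw) else "does not fit"
            print(f"{params:.0f}B at {label}: {size:.1f} GiB weights, {verdict}")
```

Under these assumptions an 8B model at 16 bits per weight already exceeds the budget once overhead is added, while 4-bit-class quantizations of 8B and 14B models leave headroom.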
§03·tested recipes
showing 6 of 20

- llm · beginner · 16 GB+ · recipe
  Qwen3-8B on RTX 5060 Ti: Q4_K_M GGUF via Ollama or llama.cpp
- multimodal · beginner · 6 GB+ · recipe
  Gemma 4 E4B on RTX 5060 Ti: Multimodal Inference with transformers or llama.cpp
- tts · intermediate · 5 GB+ · recipe
  OpenAudio S1 Mini on RTX 5060 Ti: 13-Language Distilled TTS in ~5 GB VRAM
- tts · intermediate · 4 GB+ · recipe
  OmniVoice on RTX 5060 Ti: Zero-Shot Voice Cloning Across 646 Languages
- image · intermediate · 12 GB+ · recipe
  SenseNova U1 (8B-MoT) on RTX 5060 Ti: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
- image · intermediate · 16 GB+ · recipe
  LongCat-Image (base T2I) on RTX 5060 Ti: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
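
As an illustration of the first recipe's Ollama route, here is a minimal sketch of querying a locally running Ollama server from Python using only the standard library. It is not taken from the recipe itself. The endpoint is Ollama's default local address, the model tag `qwen3:8b` is an assumption (whether it resolves to a Q4_K_M build is not confirmed here), and the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed model tag; use whatever tag is actually installed locally.
MODEL_TAG = "qwen3:8b"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

With the server running and the model pulled, `generate("Summarize GGUF in one sentence.")` would return the completion as a string. Setting `"stream": False` keeps the sketch simple by requesting a single JSON response instead of a stream of chunks.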