Recipes
716 community-tested setups for running open-weights AI models on real consumer GPUs.
- multimodal · beginner · 20GB+
Gemma 4 E4B on RTX 3090: Multimodal Inference via BF16 (with 16 GB of Headroom to Spend)
- 3d · intermediate · 12GB+
Waypoint 1.5 on RTX 3090: Real-Time Interactive World Model at 720p, 30 FPS
- image · beginner · 13GB+
Flux.2 Klein 4B on RTX 3090: BFL-Recommended ~13 GB CPU-Offload Path for 4-Step Text-to-Image
- image · intermediate · 13GB+
Qwen-Image on RTX 3090: 20B Text-to-Image via ComfyUI GGUF (Ampere Path — No FP8)
- video · intermediate · 24GB+
HunyuanVideo-1.5 on RTX 3090: 480p Step-Distilled Image-to-Video on a Razor-Thin 24 GB Envelope
- video · intermediate · 24GB+
Wan 2.2 T2V-A14B on RTX 3090: 720p text-to-video in ComfyUI with FP8 weights (Ampere)
- video · intermediate · 14GB+
LightX2V on RTX 3090: 4-Step Text-to-Video with Distilled Wan2.1-14B via INT8 / BF16 Offload
- video · intermediate · 22GB+
Mochi 1 on RTX 3090: 49-frame 480p Text-to-Video with Diffusers
- video · intermediate · 14GB+
CogVideoX 1.5 5B on RTX 3090: 1360x768 Text-to-Video with Diffusers
- video · intermediate · 24GB+
Wan 2.2 TI2V-5B on RTX 3090: 720p Text/Image-to-Video in ComfyUI
- llm · intermediate · 22GB+
Qwen3-32B on RTX 3090: UD-Q4_K_XL GGUF via llama.cpp
- llm · intermediate · 18GB+
Gemma 4 26B A4B-it on RTX 3090: Local Multimodal Chat via Q4_K_M GGUF + llama.cpp
- llm · beginner · 10GB+
Qwen3-14B on RTX 3090: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 12GB+
DeepSeek-R1-Distill-Qwen-14B on RTX 3090 via Ollama Q4_K_M GGUF
- llm · beginner · 6GB+
Qwen3-8B on RTX 3090: Q4_K_M GGUF with 18 GB of Headroom for Colocation or Long Context
- llm · beginner · 10GB+
Llama 3.1 8B on RTX 3090: Local Chat via llama.cpp + Unsloth UD-Q4_K_XL GGUF
- llm · beginner · 16GB+
gpt-oss 20B on RTX 3090: MXFP4 Chat at 147 tok/s via Ollama or vLLM
- image · intermediate · 20GB+
HiDream-O1-Image Full BF16 on RTX 3090: 2048×2048 Text-to-Image in ComfyUI
- image · intermediate · 18GB+
LongCat-Image (base T2I) on RTX 3090: Bilingual 6B Text-to-Image via diffusers BF16 + CPU Offload
- image · intermediate · 24GB+
Chroma1-Base (V48) on RTX 3090: Uncensored 8.9B FLUX.1-Schnell De-Distillation via Diffusers BF16
- image · intermediate · 16GB+
Juggernaut Z on RTX 3090: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI
- image · beginner · 16GB+
Z-Image Turbo on RTX 3090: 8-Step 1024x1024 Text-to-Image at BF16 with Diffusers or ComfyUI
- multimodal · intermediate · 4GB+
MiniMind-O on RTX 4060 Ti 16GB: 0.1B Omni Model with Headroom to Spare
- tts · intermediate · 8GB+
Foundation-1 on RTX 4060 Ti 16GB: Structured Music Sample Generation
- tts · intermediate · 12GB+
ACE-Step 1.5 XL on RTX 4060 Ti 16GB: Text-to-Music Generation in ComfyUI
- tts · beginner · 2GB+
Kokoro TTS on RTX 4060 Ti 16GB: 82M-Parameter Text-to-Speech, 54 Voices, Under 3 GB VRAM
- video · advanced · 16GB+
Sulphur 2 on RTX 4060 Ti 16GB: Uncensored LTX-2.3 Video via ComfyUI GGUF
- tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 4060 Ti 16GB: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
- tts · beginner · 5GB+
VoxCPM-0.5B on RTX 4060 Ti 16GB: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
- tts · intermediate · 4GB+
OmniVoice on RTX 4060 Ti 16GB: Zero-Shot Voice Cloning Across 646 Languages with Room to Spare
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 4060 Ti 16GB: 13-Language Distilled TTS in ~5 GB VRAM
- image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 4060 Ti 16GB: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
- video · intermediate · 8GB+
LightX2V on RTX 4060 Ti 16GB: 4-Step Text-to-Video via the LightX2V Framework with Distilled Wan2.1-14B
- tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 4060 Ti 16GB: local speech understanding in ~9.5 GB
- tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 4060 Ti 16GB: local audio understanding in ~12 GB
- tts · beginner · 8GB+
VoxCPM2 on RTX 4060 Ti 16GB: 30-Language 48kHz Voice Cloning with Headroom to Spare
- multimodal · beginner · 20GB+
Gemma 4 E4B on RTX 4090: Multimodal Inference via BF16 (with optional Q8_0 / Q4_K_M GGUF)
- 3d · intermediate · 12GB+
Waypoint 1.5 on RTX 4090: Real-Time Interactive World Model at 720p
- video · intermediate · 24GB+
LightX2V on RTX 4090: 4-Step Text-to-Video with Distilled Wan2.1-14B
- video · intermediate · 24GB+
Wan 2.2 TI2V-5B on RTX 4090: 720p Text/Image-to-Video in ComfyUI
- image · beginner · 20GB+
Flux.2 Klein 4B on RTX 4090: BF16 Full-Resident 4-Step Text-to-Image via Diffusers or ComfyUI
- image · intermediate · 16GB+
Juggernaut Z on RTX 4090: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI
- llm · beginner · 16GB+
gpt-oss 20B on RTX 4090: MXFP4 chat via Ollama or vLLM
- llm · beginner · 10GB+
DeepSeek-R1-Distill-Qwen-14B on RTX 4090 via Ollama Q4_K_M GGUF
- llm · intermediate · 18GB+
Gemma 4 26B A4B-it on RTX 4090: Local Multimodal Chat via Q4_K_M GGUF + llama.cpp
- llm · intermediate · 22GB+
Qwen3-32B on RTX 4090: UD-Q4_K_XL GGUF via llama.cpp
- llm · beginner · 10GB+
Qwen3-14B on RTX 4090: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 6GB+
Qwen3-8B on RTX 4090: Q4_K_M GGUF via Ollama or llama.cpp
- image · intermediate · 21GB+
Qwen-Image on RTX 4090: 20B Text-to-Image via ComfyUI FP8 (Native Path)
- image · intermediate · 18GB+
LongCat-Image (base T2I) on RTX 4090: Bilingual 6B Text-to-Image via diffusers
- image · intermediate · 24GB+
Chroma1-Base (V48) on RTX 4090: Uncensored 8.9B FLUX.1-Schnell De-Distillation via Diffusers BF16
- image · intermediate · 20GB+
HiDream-O1-Image on RTX 4090: 2048×2048 Text-to-Image with BF16 in ComfyUI
- image · beginner · 16GB+
Z-Image Turbo on RTX 4090: 8-Step 1024x1024 Text-to-Image at BF16 in ~2.3s with Diffusers or ComfyUI
- llm · beginner · 10GB+
Llama 3.1 8B on RTX 4090: Local Chat via llama.cpp + Unsloth UD-Q4_K_XL GGUF
- video · intermediate · 14GB+
HunyuanVideo-1.5 on RTX 4090: 480p Step-Distilled Text-to-Video in ~75 Seconds
- video · intermediate · 24GB+
Wan 2.2 T2V-A14B on RTX 4090: 720p text-to-video in ComfyUI with FP8 scaled weights
- video · intermediate · 14GB+
CogVideoX 1.5 5B on RTX 4090: 1360x768 Text-to-Video with Diffusers
- video · intermediate · 20GB+
Mochi 1 on RTX 4090: 49-frame 480p Text-to-Video with Diffusers
- image · intermediate · 10GB+
HiDream-O1-Image on RTX 4060 Ti 16GB: 2048×2048 Text-to-Image with FP8 in ComfyUI
- image · intermediate · 16GB+
Chroma1-Base (V48) on RTX 4060 Ti 16GB: Uncensored 8.9B FLUX.1-Schnell De-Distillation via GGUF in ComfyUI
- image · intermediate · 16GB+
Juggernaut Z on RTX 4060 Ti 16GB: Cinematic Photoreal Fine-Tune of Z-Image Base
- image · intermediate · 16GB+
LongCat-Image (base T2I) on RTX 4060 Ti 16GB: Bilingual 6B Text-to-Image via ComfyUI GGUF
- image · intermediate · 13GB+
Qwen-Image on RTX 4060 Ti 16GB: 20B Text-to-Image via ComfyUI GGUF
- video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 4060 Ti 16GB: 720p Text/Image-to-Video in ComfyUI
- image · intermediate · 16GB+
Z-Image Turbo on RTX 4060 Ti 16GB: 8-Step Text-to-Image at BF16 with Diffusers or ComfyUI
- multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 4060 Ti 16GB: Multimodal Inference via Q4_K_M GGUF (with optional Q8_0 / BF16)
- llm · beginner · 16GB+
Qwen3-8B on RTX 4060 Ti 16GB: Q4_K_M GGUF via Ollama or llama.cpp
- specialized · beginner · 4GB+
SAM 3 on RTX 4060 Ti 16GB: Promptable Image and Video Segmentation
- tts · intermediate · 8GB+
Foundation-1 on RTX 4060: Structured Music Sample Generation at the 8 GB Floor
- tts · beginner · 5GB+
VoxCPM on RTX 4060: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
- tts · intermediate · 4GB+
OmniVoice on RTX 4060: Zero-Shot Voice Cloning Across 646 Languages in 8 GB
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 4060: 13-Language Distilled TTS in ~5 GB VRAM
- llm · beginner · 4GB+
Qwen3-4B on RTX 4060: Q4_K_M GGUF via Ollama or llama.cpp
- multimodal · intermediate · 4GB+
MiniMind-O on RTX 4060: 0.1B Omni Model (Text + Speech + Image In/Out)
- specialized · beginner · 4GB+
SAM 3 on RTX 4060: Promptable Image and Video Segmentation
- tts · beginner · 2GB+
Kokoro TTS on RTX 4060: 82M-Parameter Text-to-Speech, 54 Voices, Under 3 GB VRAM
- llm · beginner · 16GB+
Qwen3-8B on RTX 5060 Ti: Q4_K_M GGUF via Ollama or llama.cpp
- multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 5060 Ti: Multimodal Inference with transformers or llama.cpp
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 5060 Ti: 13-Language Distilled TTS in ~5 GB VRAM
- tts · intermediate · 4GB+
OmniVoice on RTX 5060 Ti: Zero-Shot Voice Cloning Across 646 Languages
- image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 5060 Ti: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
- image · intermediate · 16GB+
LongCat-Image (base T2I) on RTX 5060 Ti: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
- tts · intermediate · 8GB+
Foundation-1 on RTX 5060 Ti: Structured Music Sample Generation
- tts · intermediate · 12GB+
ACE-Step 1.5 XL on RTX 5060 Ti: Text-to-Music Generation in ComfyUI
- tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 5060 Ti: Multilingual Voice Cloning in 10 Languages
- image · intermediate · 16GB+
Chroma V48 on RTX 5060 Ti: Uncensored 8.9B Flux.1-Schnell De-Distillation via GGUF in ComfyUI
- tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 5060 Ti: local audio understanding in ~12 GB
- video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 5060 Ti: 720p Text/Image-to-Video in ComfyUI
- image · intermediate · 13GB+
Qwen-Image on RTX 5060 Ti: 20B Text-to-Image via GGUF Quantization
- tts · beginner · 2GB+
Kokoro TTS on RTX 5060 Ti: 82M-Parameter Text-to-Speech, 54 Voices, Under 3 GB VRAM
- tts · beginner · 8GB+
VoxCPM2 on RTX 5060 Ti: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
- video · intermediate · 8GB+
LightX2V on RTX 5060 Ti: 4-Step Text-to-Video with Distilled Wan2.1-14B
- video · advanced · 16GB+
Sulphur 2 on RTX 5060 Ti: Uncensored LTX-2.3 Video via GGUF in ComfyUI
- image · intermediate · 16GB+
Juggernaut Z on RTX 5060 Ti: Cinematic Photoreal Fine-Tune of Z-Image Base
- tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 5060 Ti: local speech understanding in ~9.5 GB
- image · intermediate · 10GB+
HiDream-O1-Image on RTX 5060 Ti: 2048×2048 Text-to-Image with FP8 in ComfyUI
- 3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 5060 Ti: Image-to-Mesh 3D Generation (Shape-Only)
- multimodal · intermediate · 4GB+
MiniMind-O on RTX 5060 Ti: 0.1B Omni Model (Text + Speech + Image In/Out)
- tts · beginner · 5GB+
VoxCPM on RTX 5060 Ti: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
- specialized · intermediate · 3GB+
KiMoDo on RTX 5060 Ti: Text-to-3D-Motion Generation Guide