Recipes
716 community-tested setups for running open-weights AI models on real consumer GPUs.
Page 3 of 8
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 4060 Ti 8GB: 13-Language Distilled TTS in ~5 GB VRAM
- tts · beginner · 2GB+
Kokoro TTS on RTX 4060 Ti 8GB: 82M-Parameter Text-to-Speech, 47 Voices, Under 3 GB VRAM
- llm · beginner · 4GB+
Qwen3-4B on RTX 4060 Ti 8GB: Q4_K_M GGUF via Ollama or llama.cpp
- image · intermediate · 11GB+
Chroma1-Base (V48) on RTX 3080 Ti: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8-Weight + CPU-Offload T5 in ComfyUI
- image · intermediate · 12GB+
Qwen-Image on RTX 3080 Ti: 20B Text-to-Image via ComfyUI GGUF Q3 (Ampere sm_86, 12 GB)
- image · intermediate · 12GB+
Juggernaut Z on RTX 3080 Ti: Cinematic Photoreal Fine-Tune of Z-Image Base via FP8 in ComfyUI
- image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 3080 Ti: VAE-Free Unified Image Gen + Understanding via Q4 GGUF + Layer Offload
- image · intermediate · 10GB+
ERNIE-Image-Turbo on RTX 3080 Ti: 8-step text-to-image via GGUF in ComfyUI
- tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 3080 Ti: local audio understanding in a tight 12 GB
- tts · intermediate · 8GB+
ACE-Step 1.5 XL on RTX 3080 Ti: Text-to-Music Generation via the 8 GB Optimization Path
- video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 3080 Ti: 720p Text/Image-to-Video in ComfyUI
- video · intermediate · 12GB+
LightX2V on RTX 3080 Ti: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via INT8 + Offload
- tts · intermediate · 8GB+
Foundation-1 on RTX 3080 Ti: Structured Music Sample Generation
- tts · intermediate · 4GB+
OmniVoice on RTX 3080 Ti: Zero-Shot Voice Cloning Across 646 Languages
- tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 3080 Ti: local speech understanding in ~9.5 GB
- tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 3080 Ti: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
- tts · beginner · 8GB+
VoxCPM2 on RTX 3080 Ti: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 3080 Ti: 13-Language Distilled TTS in ~5 GB VRAM
- tts · beginner · 5GB+
VoxCPM-0.5B on RTX 3080 Ti: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
- tts · beginner · 2GB+
Kokoro TTS on RTX 3080 Ti: 82M-Parameter Text-to-Speech, 54 Voices, ~10 GB Free to Colocate a Second Model
- specialized · intermediate · 3GB+
KiMoDo on RTX 3080 Ti: Text-to-3D-Motion Generation Guide
- specialized · beginner · 4GB+
SAM 3 on RTX 3080 Ti: Promptable Image and Video Segmentation
- multimodal · intermediate · 4GB+
MiniMind-O on RTX 3080 Ti: 0.1B Omni Model with Headroom to Spare
- 3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 3080 Ti: Image-to-Mesh 3D Generation (Shape-Only)
- image · beginner · 8GB+
Flux.2 Klein 4B on RTX 3080 Ti: FP8 ComfyUI 4-Step Text-to-Image at ~8.4 GB
- multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 3080 Ti: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit)
- image · intermediate · 10GB+
HiDream-O1-Image on RTX 3080 Ti: 2048×2048 Text-to-Image with FP8 in ComfyUI
- image · beginner · 7GB+
Anima 2B on RTX 3080 Ti: Native ComfyUI Anime Text-to-Image
- llm · intermediate · 12GB+
gpt-oss 20B on RTX 3080 Ti: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
- llm · beginner · 10GB+
Llama 3.1 8B on RTX 3080 Ti: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
- llm · beginner · 12GB+
Qwen3-14B on RTX 3080 Ti: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 12GB+
Qwen3-8B on RTX 3080 Ti 12GB: Q4_K_M GGUF via Ollama or llama.cpp
- image · intermediate · 11GB+
Chroma1-Base (V48) on RTX 3060: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8-Weight + CPU-Offload T5 in ComfyUI
- image · intermediate · 12GB+
Qwen-Image on RTX 3060: 20B Text-to-Image via ComfyUI GGUF Q3 (Ampere sm_86, 12 GB)
- image · intermediate · 12GB+
Juggernaut Z on RTX 3060: Cinematic Photoreal Fine-Tune of Z-Image Base via FP8 in ComfyUI
- image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 3060: VAE-Free Unified Image Gen + Understanding via Q4 GGUF + Layer Offload
- image · intermediate · 10GB+
ERNIE-Image-Turbo on RTX 3060: 8-step text-to-image via GGUF in ComfyUI
- tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 3060: local audio understanding in a tight 12 GB
- tts · intermediate · 8GB+
ACE-Step 1.5 XL on RTX 3060: Text-to-Music Generation via the 8 GB Optimization Path
- video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 3060: 720p Text/Image-to-Video in ComfyUI
- video · intermediate · 12GB+
LightX2V on RTX 3060: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via INT8 + Offload
- tts · intermediate · 8GB+
Foundation-1 on RTX 3060: Structured Music Sample Generation
- tts · intermediate · 4GB+
OmniVoice on RTX 3060: Zero-Shot Voice Cloning Across 646 Languages
- tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 3060: local speech understanding in ~9.5 GB
- tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 3060: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
- tts · beginner · 8GB+
VoxCPM2 on RTX 3060: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 3060: 13-Language Distilled TTS in ~5 GB VRAM
- tts · beginner · 5GB+
VoxCPM-0.5B on RTX 3060: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
- tts · beginner · 2GB+
Kokoro TTS on RTX 3060: 82M-Parameter Text-to-Speech, 54 Voices, ~10 GB Free to Colocate a Second Model
- specialized · intermediate · 3GB+
KiMoDo on RTX 3060: Text-to-3D-Motion Generation Guide
- specialized · beginner · 4GB+
SAM 3 on RTX 3060: Promptable Image and Video Segmentation
- multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 3060: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit)
- multimodal · intermediate · 4GB+
MiniMind-O on RTX 3060: 0.1B Omni Model with Headroom to Spare
- 3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 3060: Image-to-Mesh 3D Generation (Shape-Only)
- image · beginner · 8GB+
Flux.2 Klein 4B on RTX 3060: FP8 ComfyUI 4-Step Text-to-Image at ~8.4 GB
- image · intermediate · 10GB+
HiDream-O1-Image on RTX 3060: 2048×2048 Text-to-Image with FP8 in ComfyUI
- image · beginner · 7GB+
Anima 2B on RTX 3060: Native ComfyUI Anime Text-to-Image
- llm · intermediate · 12GB+
gpt-oss 20B on RTX 3060: MXFP4 Chat at 64 tok/s in 12 GB via llama.cpp Expert Offload
- llm · beginner · 10GB+
Llama 3.1 8B on RTX 3060: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
- llm · beginner · 12GB+
Qwen3-14B on RTX 3060 12GB: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 12GB+
Qwen3-8B on RTX 3060 12GB: Q4_K_M GGUF via Ollama or llama.cpp
- image · intermediate · 11GB+
Chroma1-Base (V48) on RTX 4070 Ti: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8 + CPU-Offload T5 in ComfyUI
- image · intermediate · 12GB+
Qwen-Image on RTX 4070 Ti: 20B Text-to-Image via ComfyUI GGUF Q3 (Ada sm_89, 12 GB)
- image · intermediate · 12GB+
Juggernaut Z on RTX 4070 Ti: Cinematic Photoreal Fine-Tune of Z-Image Base via FP8 in ComfyUI
- image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 4070 Ti: VAE-Free Unified Image Gen + Understanding via Q4 GGUF + Layer Offload
- image · intermediate · 12GB+
ERNIE-Image-Turbo on RTX 4070 Ti: 8-step text-to-image via GGUF in ComfyUI
- tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 4070 Ti: local audio understanding in a tight 12 GB
- tts · intermediate · 8GB+
ACE-Step 1.5 XL on RTX 4070 Ti: Text-to-Music Generation via the 8 GB Optimization Path
- video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 4070 Ti: 720p Text/Image-to-Video in ComfyUI
- video · intermediate · 8GB+
LightX2V on RTX 4070 Ti: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via Ada FP8 + Offload
- tts · intermediate · 8GB+
Foundation-1 on RTX 4070 Ti: Structured Music Sample Generation
- tts · intermediate · 4GB+
OmniVoice on RTX 4070 Ti: Zero-Shot Voice Cloning Across 646 Languages
- tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 4070 Ti: local speech understanding in ~9.5 GB
- tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 4070 Ti: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
- tts · beginner · 8GB+
VoxCPM2 on RTX 4070 Ti: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
- tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 4070 Ti: 13-Language Distilled TTS in ~5 GB VRAM
- tts · beginner · 5GB+
VoxCPM-0.5B on RTX 4070 Ti: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
- tts · beginner · 2GB+
Kokoro TTS on RTX 4070 Ti: 82M-Parameter Text-to-Speech, 54 Voices, ~10 GB Free to Colocate a Second Model
- specialized · intermediate · 3GB+
KiMoDo on RTX 4070 Ti: Text-to-3D-Motion Generation Guide
- specialized · beginner · 4GB+
SAM 3 on RTX 4070 Ti: Promptable Image and Video Segmentation
- multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 4070 Ti: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit)
- multimodal · intermediate · 4GB+
MiniMind-O on RTX 4070 Ti: 0.1B Omni Model with Headroom to Spare
- 3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 4070 Ti: Image-to-Mesh 3D Generation (Shape-Only)
- image · beginner · 8GB+
Flux.2 Klein 4B on RTX 4070 Ti: FP8 ComfyUI 4-Step Text-to-Image at ~8.4 GB
- image · intermediate · 10GB+
HiDream-O1-Image on RTX 4070 Ti: 2048×2048 Text-to-Image with FP8 in ComfyUI
- image · beginner · 7GB+
Anima 2B on RTX 4070 Ti: Native ComfyUI Anime Text-to-Image
- llm · intermediate · 12GB+
gpt-oss 20B on RTX 4070 Ti: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
- llm · beginner · 10GB+
Llama 3.1 8B on RTX 4070 Ti: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
- llm · beginner · 12GB+
Qwen3-14B on RTX 4070 Ti: Q4_K_M GGUF via Ollama or llama.cpp
- llm · beginner · 12GB+
Qwen3-8B on RTX 4070 Ti: Q4_K_M GGUF via Ollama or llama.cpp
- image · intermediate · 11GB+
Chroma1-Base (V48) on RTX 4070 SUPER: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8 + CPU-Offload T5 in ComfyUI
- image · intermediate · 12GB+
Qwen-Image on RTX 4070 SUPER: 20B Text-to-Image via ComfyUI GGUF Q3 (Ada sm_89, 12 GB)
- image · intermediate · 12GB+
Juggernaut Z on RTX 4070 SUPER: Cinematic Photoreal Fine-Tune of Z-Image Base via FP8 in ComfyUI
- image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 4070 SUPER: VAE-Free Unified Image Gen + Understanding via Q4 GGUF + Layer Offload
- image · intermediate · 12GB+
ERNIE-Image-Turbo on RTX 4070 SUPER: 8-step text-to-image via GGUF in ComfyUI
- tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 4070 SUPER: local audio understanding in a tight 12 GB
- tts · intermediate · 8GB+
ACE-Step 1.5 XL on RTX 4070 SUPER: Text-to-Music Generation via the 8 GB Optimization Path
- video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 4070 SUPER: 720p Text/Image-to-Video in ComfyUI
- video · intermediate · 8GB+
LightX2V on RTX 4070 SUPER: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via Ada FP8 + Offload
- tts · intermediate · 8GB+
Foundation-1 on RTX 4070 SUPER: Structured Music Sample Generation