Recipes

716 community-tested setups for running open-weights AI models on real consumer GPUs.

Page 5 of 8

image · beginner · 7 GB+
Anima 2B on RTX 5070 Ti: Native ComfyUI Anime Text-to-Image
image · beginner · 13 GB+
Flux.2 Klein 4B on RTX 5070 Ti: Blackwell-Native FP8 4-Step Text-to-Image at ~8.4 GB
image · intermediate · 12 GB+
SenseNova U1 (8B-MoT) on RTX 5070 Ti: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
image · intermediate · 16 GB+
LongCat-Image (base T2I) on RTX 5070 Ti: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
image · intermediate · 13 GB+
Qwen-Image on RTX 5070 Ti: 20B Text-to-Image via ComfyUI GGUF (Blackwell sm_120, 16 GB)
image · intermediate · 14 GB+
Chroma1-Base (V48) on RTX 5070 Ti: Uncensored 8.9B FLUX.1-Schnell De-Distillation via Blackwell-Native FP8 in ComfyUI
image · intermediate · 10 GB+
HiDream-O1-Image on RTX 5070 Ti: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · intermediate · 16 GB+
Juggernaut Z on RTX 5070 Ti: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16
image · intermediate · 16 GB+
Z-Image Turbo on RTX 5070 Ti: 8-Step 1024x1024 Text-to-Image at BF16 with Diffusers or ComfyUI
llm · beginner · 16 GB+
gpt-oss 20B on RTX 5070 Ti: MXFP4 Chat at 156 tok/s via Ollama or vLLM
llm · beginner · 16 GB+
Qwen3-14B on RTX 5070 Ti: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 16 GB+
Qwen3-8B on RTX 5070 Ti: Q4_K_M GGUF via Ollama or llama.cpp
specialized · intermediate · 3 GB+
KiMoDo on RTX 4080 SUPER: Text-to-3D-Motion Generation Guide
specialized · beginner · 4 GB+
SAM 3 on RTX 4080 SUPER: Promptable Image and Video Segmentation
multimodal · intermediate · 4 GB+
MiniMind-O on RTX 4080 SUPER: 0.1B Omni Model with Headroom to Spare
3d · advanced · 16 GB+
TRELLIS image-large on RTX 4080 SUPER: Image-to-3D Mesh Generation at the 16 GB Floor
3d · intermediate · 10 GB+
Hunyuan3D-2.1 on RTX 4080 SUPER: Image-to-Mesh 3D Generation (Shape-Only)
3d · intermediate · 12 GB+
Waypoint 1.5 on RTX 4080 SUPER: Real-Time Interactive World Model at 720p
tts · intermediate · 12 GB+
ACE-Step 1.5 XL on RTX 4080 SUPER: Text-to-Music Generation in ComfyUI
tts · intermediate · 8 GB+
Qwen3-TTS 1.7B-Base on RTX 4080 SUPER: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
tts · intermediate · 12 GB+
MOSS-Audio 4B-Instruct on RTX 4080 SUPER: local audio understanding in ~12 GB
tts · intermediate · 8 GB+
Foundation-1 on RTX 4080 SUPER: Structured Music Sample Generation
tts · intermediate · 10 GB+
Voxtral Mini 3B on RTX 4080 SUPER: local speech understanding in ~9.5 GB
tts · intermediate · 4 GB+
OmniVoice on RTX 4080 SUPER: Zero-Shot Voice Cloning Across 600+ Languages with Room to Spare
tts · beginner · 8 GB+
VoxCPM2 on RTX 4080 SUPER: 30-Language 48kHz Voice Cloning with 8 GB to Spare
tts · intermediate · 5 GB+
OpenAudio S1 Mini on RTX 4080 SUPER: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 5 GB+
VoxCPM-0.5B on RTX 4080 SUPER: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · beginner · 1 GB+
Kokoro TTS on RTX 4080 SUPER: Universal 82M Voice Synthesis with 15 GB to Spare
video · intermediate · 8 GB+
Wan 2.2 TI2V-5B on RTX 4080 SUPER: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 8 GB+
LightX2V on RTX 4080 SUPER: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via FP8/INT8 + Offload
video · advanced · 16 GB+
Sulphur 2 on RTX 4080 SUPER: Uncensored LTX-2.3 Video via ComfyUI GGUF
video · advanced · 16 GB+
LTX-2.3 on RTX 4080 SUPER: 22B Audio-Video at the 16 GB Floor via GGUF + CPU-Offloaded Gemma
image · intermediate · 12 GB+
SenseNova U1 (8B-MoT) on RTX 4080 SUPER: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
image · intermediate · 12 GB+
ERNIE-Image-Turbo on RTX 4080 SUPER: 8-step text-to-image via GGUF in ComfyUI
image · beginner · 7 GB+
Anima 2B on RTX 4080 SUPER: Native ComfyUI Anime Text-to-Image
image · intermediate · 16 GB+
LongCat-Image (base T2I) on RTX 4080 SUPER: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
image · beginner · 13 GB+
Flux.2 Klein 4B on RTX 4080 SUPER: BFL-Recommended ~13 GB CPU-Offload Path for 4-Step Text-to-Image
image · intermediate · 13 GB+
Qwen-Image on RTX 4080 SUPER: 20B Text-to-Image via ComfyUI GGUF (Ada sm_89, 16 GB)
image · intermediate · 16 GB+
Chroma1-Base (V48) on RTX 4080 SUPER: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8-Scaled in ComfyUI
image · intermediate · 10 GB+
HiDream-O1-Image on RTX 4080 SUPER: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · intermediate · 16 GB+
Juggernaut Z on RTX 4080 SUPER: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI
image · intermediate · 16 GB+
Z-Image Turbo on RTX 4080 SUPER: 8-Step 1024x1024 Text-to-Image at BF16 with Diffusers or ComfyUI
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 4080 SUPER: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · beginner · 16 GB+
gpt-oss 20B on RTX 4080 SUPER: MXFP4 chat at 139 tok/s via Ollama or vLLM
llm · beginner · 16 GB+
Qwen3-14B on RTX 4080 SUPER: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 6 GB+
Qwen3-8B on RTX 4080 SUPER: Q4_K_M GGUF via Ollama or llama.cpp
specialized · intermediate · 3 GB+
KiMoDo on RTX 4070 Ti SUPER: Text-to-3D-Motion Generation Guide
specialized · beginner · 4 GB+
SAM 3 on RTX 4070 Ti SUPER: Promptable Image and Video Segmentation
multimodal · intermediate · 4 GB+
MiniMind-O on RTX 4070 Ti SUPER: 0.1B Omni Model with Headroom to Spare
3d · advanced · 16 GB+
TRELLIS image-large on RTX 4070 Ti SUPER: Image-to-3D Mesh Generation at the 16 GB Floor
3d · intermediate · 10 GB+
Hunyuan3D-2.1 on RTX 4070 Ti SUPER: Image-to-Mesh 3D Generation (Shape-Only)
3d · intermediate · 12 GB+
Waypoint 1.5 on RTX 4070 Ti SUPER: Real-Time Interactive World Model at 720p
tts · intermediate · 12 GB+
ACE-Step 1.5 XL on RTX 4070 Ti SUPER: Text-to-Music Generation in ComfyUI
tts · intermediate · 12 GB+
MOSS-Audio 4B-Instruct on RTX 4070 Ti SUPER: local audio understanding in ~12 GB
tts · intermediate · 8 GB+
Foundation-1 on RTX 4070 Ti SUPER: Structured Music Sample Generation
tts · intermediate · 10 GB+
Voxtral Mini 3B on RTX 4070 Ti SUPER: local speech understanding in ~9.5 GB
tts · intermediate · 8 GB+
Qwen3-TTS 1.7B-Base on RTX 4070 Ti SUPER: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
tts · beginner · 5 GB+
VoxCPM-0.5B on RTX 4070 Ti SUPER: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · intermediate · 4 GB+
OmniVoice on RTX 4070 Ti SUPER: Zero-Shot Voice Cloning Across 600+ Languages with Room to Spare
tts · beginner · 8 GB+
VoxCPM2 on RTX 4070 Ti SUPER: 30-Language 48kHz Voice Cloning with 8 GB to Spare
tts · intermediate · 5 GB+
OpenAudio S1 Mini on RTX 4070 Ti SUPER: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 1 GB+
Kokoro TTS on RTX 4070 Ti SUPER: Universal 82M Voice Synthesis with 15 GB to Spare
video · intermediate · 8 GB+
Wan 2.2 TI2V-5B on RTX 4070 Ti SUPER: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 8 GB+
LightX2V on RTX 4070 Ti SUPER: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via FP8/INT8 + Offload
video · advanced · 16 GB+
Sulphur 2 on RTX 4070 Ti SUPER: Uncensored LTX-2.3 Video via ComfyUI GGUF
video · advanced · 16 GB+
LTX-2.3 on RTX 4070 Ti SUPER: 22B Audio-Video at the 16 GB Floor via GGUF + CPU-Offloaded Gemma
image · intermediate · 12 GB+
SenseNova U1 (8B-MoT) on RTX 4070 Ti SUPER: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
image · intermediate · 12 GB+
ERNIE-Image-Turbo on RTX 4070 Ti SUPER: 8-step text-to-image via GGUF in ComfyUI
image · beginner · 7 GB+
Anima 2B on RTX 4070 Ti SUPER: Native ComfyUI Anime Text-to-Image
image · intermediate · 16 GB+
LongCat-Image (base T2I) on RTX 4070 Ti SUPER: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
image · beginner · 13 GB+
Flux.2 Klein 4B on RTX 4070 Ti SUPER: BFL-Recommended ~13 GB CPU-Offload Path for 4-Step Text-to-Image
image · intermediate · 13 GB+
Qwen-Image on RTX 4070 Ti SUPER: 20B Text-to-Image via ComfyUI GGUF (Ada sm_89, 16 GB)
image · intermediate · 16 GB+
Chroma1-Base (V48) on RTX 4070 Ti SUPER: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8-Scaled in ComfyUI
image · intermediate · 10 GB+
HiDream-O1-Image on RTX 4070 Ti SUPER: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · intermediate · 16 GB+
Juggernaut Z on RTX 4070 Ti SUPER: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI
image · intermediate · 16 GB+
Z-Image Turbo on RTX 4070 Ti SUPER: 8-Step 1024x1024 Text-to-Image at BF16 with Diffusers or ComfyUI
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 4070 Ti SUPER: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · beginner · 16 GB+
gpt-oss 20B on RTX 4070 Ti SUPER: MXFP4 chat at 129 tok/s via Ollama or vLLM
llm · beginner · 16 GB+
Qwen3-14B on RTX 4070 Ti SUPER: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 6 GB+
Qwen3-8B on RTX 4070 Ti SUPER: Q4_K_M GGUF via Ollama or llama.cpp
specialized · intermediate · 3 GB+
KiMoDo on RTX 4080: Text-to-3D-Motion Generation Guide
specialized · beginner · 4 GB+
SAM 3 on RTX 4080: Promptable Image and Video Segmentation
multimodal · intermediate · 4 GB+
MiniMind-O on RTX 4080: 0.1B Omni Model with Headroom to Spare
3d · advanced · 16 GB+
TRELLIS image-large on RTX 4080: Image-to-3D Mesh Generation at the 16 GB Floor
3d · intermediate · 10 GB+
Hunyuan3D-2.1 on RTX 4080: Image-to-Mesh 3D Generation (Shape-Only)
3d · intermediate · 12 GB+
Waypoint 1.5 on RTX 4080: Real-Time Interactive World Model at 720p
tts · intermediate · 12 GB+
ACE-Step 1.5 XL on RTX 4080: Text-to-Music Generation in ComfyUI
tts · intermediate · 8 GB+
Qwen3-TTS 1.7B-Base on RTX 4080: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
tts · intermediate · 12 GB+
MOSS-Audio 4B-Instruct on RTX 4080: local audio understanding in ~12 GB
tts · intermediate · 8 GB+
Foundation-1 on RTX 4080: Structured Music Sample Generation
tts · intermediate · 10 GB+
Voxtral Mini 3B on RTX 4080: local speech understanding in ~9.5 GB
tts · beginner · 8 GB+
VoxCPM2 on RTX 4080: 30-Language 48kHz Voice Cloning with 8 GB to Spare
tts · beginner · 5 GB+
VoxCPM-0.5B on RTX 4080: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · intermediate · 5 GB+
OpenAudio S1 Mini on RTX 4080: 13-Language Distilled TTS in ~5 GB VRAM
tts · intermediate · 4 GB+
OmniVoice on RTX 4080: Zero-Shot Voice Cloning Across 600+ Languages with Room to Spare
tts · beginner · 1 GB+
Kokoro TTS on RTX 4080: Universal 82M Voice Synthesis with 15 GB to Spare
video · advanced · 16 GB+
LTX-2.3 on RTX 4080: 22B Audio-Video at the 16 GB Floor via GGUF + CPU-Offloaded Gemma
video · intermediate · 8 GB+
LightX2V on RTX 4080: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via FP8/INT8 + Offload
video · advanced · 16 GB+
Sulphur 2 on RTX 4080: Uncensored LTX-2.3 Video via ComfyUI GGUF
video · intermediate · 8 GB+
Wan 2.2 TI2V-5B on RTX 4080: 720p Text/Image-to-Video in ComfyUI
