Recipes

716 community-tested setups for running open-weights AI models on real consumer GPUs. Page 4 of 8.

tts · intermediate · 4GB+
OmniVoice on RTX 4070 SUPER: Zero-Shot Voice Cloning Across 646 Languages
tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 4070 SUPER: local speech understanding in ~9.5 GB
tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 4070 SUPER: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
tts · beginner · 8GB+
VoxCPM2 on RTX 4070 SUPER: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 4070 SUPER: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 5GB+
VoxCPM-0.5B on RTX 4070 SUPER: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · beginner · 2GB+
Kokoro TTS on RTX 4070 SUPER: 82M-Parameter Text-to-Speech, 54 Voices, ~10 GB Free to Colocate a Second Model
specialized · intermediate · 3GB+
KiMoDo on RTX 4070 SUPER: Text-to-3D-Motion Generation Guide
specialized · beginner · 4GB+
SAM 3 on RTX 4070 SUPER: Promptable Image and Video Segmentation
multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 4070 SUPER: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit)
multimodal · intermediate · 4GB+
MiniMind-O on RTX 4070 SUPER: 0.1B Omni Model with Headroom to Spare
3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 4070 SUPER: Image-to-Mesh 3D Generation (Shape-Only)
image · beginner · 8GB+
Flux.2 Klein 4B on RTX 4070 SUPER: FP8 ComfyUI 4-Step Text-to-Image at ~8.4 GB
image · intermediate · 10GB+
HiDream-O1-Image on RTX 4070 SUPER: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · beginner · 7GB+
Anima 2B on RTX 4070 SUPER: Native ComfyUI Anime Text-to-Image
llm · intermediate · 12GB+
gpt-oss 20B on RTX 4070 SUPER: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
llm · beginner · 10GB+
Llama 3.1 8B on RTX 4070 SUPER: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · beginner · 12GB+
Qwen3-14B on RTX 4070 SUPER: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 12GB+
Qwen3-8B on RTX 4070 SUPER: Q4_K_M GGUF via Ollama or llama.cpp
multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 5070: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit)
image · intermediate · 11GB+
Chroma1-Base (V48) on RTX 4070: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8 + CPU-Offload T5 in ComfyUI
image · intermediate · 12GB+
Qwen-Image on RTX 4070: 20B Text-to-Image via ComfyUI GGUF Q3 (Ada sm_89, 12 GB)
image · intermediate · 12GB+
Juggernaut Z on RTX 4070: Cinematic Photoreal Fine-Tune of Z-Image Base via FP8 in ComfyUI
image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 4070: VAE-Free Unified Image Gen + Understanding via Q4 GGUF + Layer Offload
image · intermediate · 12GB+
ERNIE-Image-Turbo on RTX 4070: 8-step text-to-image via GGUF in ComfyUI
tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 4070: local audio understanding in a tight 12 GB
tts · intermediate · 8GB+
ACE-Step 1.5 XL on RTX 4070: Text-to-Music Generation via the 8 GB Optimization Path
video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 4070: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 8GB+
LightX2V on RTX 4070: 4-Step Text-to-Video with Distilled Wan2.1-T2V-14B via Ada FP8 + Offload
tts · intermediate · 8GB+
Foundation-1 on RTX 4070: Structured Music Sample Generation
tts · intermediate · 4GB+
OmniVoice on RTX 4070: Zero-Shot Voice Cloning Across 646 Languages
tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 4070: local speech understanding in ~9.5 GB
tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 4070: Multilingual Voice Cloning in 10 Languages with FlashAttention-2
tts · beginner · 8GB+
VoxCPM2 on RTX 4070: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 4070: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 5GB+
VoxCPM-0.5B on RTX 4070: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · beginner · 2GB+
Kokoro TTS on RTX 4070: 82M-Parameter Text-to-Speech, 54 Voices, ~10 GB Free to Colocate a Second Model
specialized · intermediate · 3GB+
KiMoDo on RTX 4070: Text-to-3D-Motion Generation Guide
specialized · beginner · 4GB+
SAM 3 on RTX 4070: Promptable Image and Video Segmentation
multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 4070: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit)
multimodal · intermediate · 4GB+
MiniMind-O on RTX 4070: 0.1B Omni Model with Headroom to Spare
3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 4070: Image-to-Mesh 3D Generation (Shape-Only)
image · beginner · 8GB+
Flux.2 Klein 4B on RTX 4070: FP8 ComfyUI 4-Step Text-to-Image at ~8.4 GB
image · intermediate · 10GB+
HiDream-O1-Image on RTX 4070: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · beginner · 7GB+
Anima 2B on RTX 4070: Native ComfyUI Anime Text-to-Image
llm · beginner · 10GB+
Llama 3.1 8B on RTX 5070 Ti: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · beginner · 10GB+
Llama 3.1 8B on RTX 5070: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · intermediate · 12GB+
gpt-oss 20B on RTX 4070: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
llm · beginner · 10GB+
Llama 3.1 8B on RTX 4070: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · beginner · 12GB+
Qwen3-14B on RTX 4070: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 12GB+
Qwen3-8B on RTX 4070: Q4_K_M GGUF via Ollama or llama.cpp
image · intermediate · 11GB+
Chroma1-Base (V48) on RTX 5070: Uncensored 8.9B FLUX.1-Schnell De-Distillation via GGUF Q4_K_M in ComfyUI
image · intermediate · 12GB+
Qwen-Image on RTX 5070: 20B Text-to-Image via ComfyUI GGUF Q3 (Blackwell sm_120, 12 GB)
image · intermediate · 12GB+
Juggernaut Z on RTX 5070: Cinematic Photoreal Fine-Tune of Z-Image Base via FP8 in ComfyUI
image · intermediate · 12GB+
SenseNova U1 (8B-MoT) on RTX 5070: VAE-Free Unified Image Gen + Understanding via Q4 GGUF + Layer Offload
image · intermediate · 12GB+
ERNIE-Image-Turbo on RTX 5070: 8-step text-to-image via GGUF in ComfyUI
video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 5070: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 8GB+
LightX2V on RTX 5070: 4-Step Text-to-Video with Distilled Wan2.1-14B via Blackwell-Native FP8 + Offload
tts · intermediate · 8GB+
ACE-Step 1.5 XL on RTX 5070: Text-to-Music Generation via the 8 GB Optimization Path
tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 5070: local audio understanding in a tight 12 GB
tts · intermediate · 8GB+
Foundation-1 on RTX 5070: Structured Music Sample Generation
tts · intermediate · 4GB+
OmniVoice on RTX 5070: Zero-Shot Voice Cloning Across 646 Languages
tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 5070: local speech understanding in ~9.5 GB
tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 5070: Multilingual Voice Cloning in 10 Languages
tts · beginner · 8GB+
VoxCPM2 on RTX 5070: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 5070: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 5GB+
VoxCPM-0.5B on RTX 5070: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · beginner · 2GB+
Kokoro TTS on RTX 5070: 82M-Parameter Text-to-Speech, 54 Voices, ~10 GB Free to Colocate a Second Model
specialized · intermediate · 3GB+
KiMoDo on RTX 5070: Text-to-3D-Motion Generation Guide
specialized · beginner · 4GB+
SAM 3 on RTX 5070: Promptable Image and Video Segmentation
multimodal · intermediate · 4GB+
MiniMind-O on RTX 5070: 0.1B Omni Model with Headroom to Spare
3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 5070: Image-to-Mesh 3D Generation (Shape-Only)
image · beginner · 7GB+
Anima 2B on RTX 5070: Native ComfyUI Anime Text-to-Image
image · beginner · 8GB+
Flux.2 Klein 4B on RTX 5070: Blackwell-Native FP8 4-Step Text-to-Image at ~8.4 GB
image · intermediate · 10GB+
HiDream-O1-Image on RTX 5070: 2048×2048 Text-to-Image with FP8 in ComfyUI
llm · intermediate · 12GB+
gpt-oss 20B on RTX 5070: MXFP4 Chat in 12 GB via llama.cpp Expert Offload
llm · beginner · 12GB+
Qwen3-14B on RTX 5070: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 12GB+
Qwen3-8B on RTX 5070: Q4_K_M GGUF via Ollama or llama.cpp
specialized · intermediate · 3GB+
KiMoDo on RTX 5070 Ti: Text-to-3D-Motion Generation Guide
specialized · beginner · 4GB+
SAM 3 on RTX 5070 Ti: Promptable Image and Video Segmentation
multimodal · beginner · 6GB+
Gemma 4 E4B on RTX 5070 Ti: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama — BF16 will not fit comfortably)
multimodal · intermediate · 4GB+
MiniMind-O on RTX 5070 Ti: 0.1B Omni Model with Headroom to Spare
3d · advanced · 16GB+
TRELLIS image-large on RTX 5070 Ti: Image-to-3D Mesh Generation at the 16 GB Floor
3d · intermediate · 10GB+
Hunyuan3D-2.1 on RTX 5070 Ti: Image-to-Mesh 3D Generation (Shape-Only)
3d · intermediate · 12GB+
Waypoint 1.5 on RTX 5070 Ti: Real-Time Interactive World Model at 720p
tts · intermediate · 12GB+
ACE-Step 1.5 XL on RTX 5070 Ti: Text-to-Music Generation in ComfyUI
tts · intermediate · 12GB+
MOSS-Audio 4B-Instruct on RTX 5070 Ti: local audio understanding in ~12 GB
tts · intermediate · 8GB+
Foundation-1 on RTX 5070 Ti: Structured Music Sample Generation
tts · intermediate · 10GB+
Voxtral Mini 3B on RTX 5070 Ti: local speech understanding in ~9.5 GB
tts · intermediate · 8GB+
Qwen3-TTS 1.7B-Base on RTX 5070 Ti: Multilingual Voice Cloning in 10 Languages
tts · intermediate · 4GB+
OmniVoice on RTX 5070 Ti: Zero-Shot Voice Cloning Across 646 Languages
tts · beginner · 8GB+
VoxCPM2 on RTX 5070 Ti: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
tts · intermediate · 5GB+
OpenAudio S1 Mini on RTX 5070 Ti: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 5GB+
VoxCPM-0.5B on RTX 5070 Ti: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · beginner · 2GB+
Kokoro TTS on RTX 5070 Ti: 82M-Parameter Text-to-Speech, 54 Voices, ~13 GB Free to Colocate a Second Model
video · intermediate · 8GB+
Wan 2.2 TI2V-5B on RTX 5070 Ti: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 8GB+
LightX2V on RTX 5070 Ti: 4-Step Text-to-Video with Distilled Wan2.1-14B via Blackwell-Native FP8
video · advanced · 16GB+
Sulphur 2 on RTX 5070 Ti: Uncensored LTX-2.3 Video via ComfyUI GGUF
video · advanced · 16GB+
LTX-2.3 on RTX 5070 Ti: 22B Audio-Video at the 16 GB Floor via GGUF + CPU-Offloaded Gemma
image · intermediate · 12GB+
ERNIE-Image-Turbo on RTX 5070 Ti: 8-step text-to-image via GGUF in ComfyUI
