Recipes

716 community-tested setups for running open-weights AI models on real consumer GPUs.

Page 6 of 8

image · intermediate · 12 GB+
ERNIE-Image-Turbo on RTX 4080: 8-step text-to-image via GGUF in ComfyUI
image · beginner · 7 GB+
Anima 2B on RTX 4080: Native ComfyUI Anime Text-to-Image
image · beginner · 13 GB+
Flux.2 Klein 4B on RTX 4080: BFL-Recommended ~13 GB CPU-Offload Path for 4-Step Text-to-Image
image · intermediate · 12 GB+
SenseNova U1 (8B-MoT) on RTX 4080: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
image · intermediate · 16 GB+
LongCat-Image (base T2I) on RTX 4080: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
image · intermediate · 13 GB+
Qwen-Image on RTX 4080: 20B Text-to-Image via ComfyUI GGUF (Ada sm_89, 16 GB)
image · intermediate · 16 GB+
Chroma1-Base (V48) on RTX 4080: Uncensored 8.9B FLUX.1-Schnell De-Distillation via FP8-Scaled in ComfyUI
image · intermediate · 10 GB+
HiDream-O1-Image on RTX 4080: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · intermediate · 16 GB+
Juggernaut Z on RTX 4080: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI
image · intermediate · 16 GB+
Z-Image Turbo on RTX 4080: 8-Step 1024x1024 Text-to-Image at BF16 with Diffusers or ComfyUI
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 4080: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
llm · beginner · 16 GB+
gpt-oss 20B on RTX 4080: MXFP4 chat at 136 tok/s via Ollama or vLLM
llm · beginner · 16 GB+
Qwen3-14B on RTX 4080: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 6 GB+
Qwen3-8B on RTX 4080: Q4_K_M GGUF via Ollama or llama.cpp
specialized · intermediate · 3 GB+
KiMoDo on RTX 5080: Text-to-3D-Motion Generation Guide
specialized · beginner · 4 GB+
SAM 3 on RTX 5080: Promptable Image and Video Segmentation
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 5080: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
multimodal · beginner · 6 GB+
Gemma 4 E4B on RTX 5080: Multimodal Inference via Q4_K_M GGUF (llama.cpp or Ollama; BF16 will not fit comfortably)
multimodal · intermediate · 4 GB+
MiniMind-O on RTX 5080: 0.1B Omni Model with Headroom to Spare
3d · advanced · 16 GB+
TRELLIS image-large on RTX 5080: Image-to-3D Mesh Generation at the 16 GB Floor
3d · intermediate · 10 GB+
Hunyuan3D-2.1 on RTX 5080: Image-to-Mesh 3D Generation (Shape-Only)
3d · intermediate · 12 GB+
Waypoint 1.5 on RTX 5080: Real-Time Interactive World Model at 720p
tts · intermediate · 12 GB+
ACE-Step 1.5 XL on RTX 5080: Text-to-Music Generation in ComfyUI
tts · intermediate · 12 GB+
MOSS-Audio 4B-Instruct on RTX 5080: local audio understanding in ~12 GB
tts · intermediate · 8 GB+
Foundation-1 on RTX 5080: Structured Music Sample Generation
tts · intermediate · 10 GB+
Voxtral Mini 3B on RTX 5080: local speech understanding in ~9.5 GB
tts · intermediate · 8 GB+
Qwen3-TTS 1.7B-Base on RTX 5080: Multilingual Voice Cloning in 10 Languages
tts · intermediate · 4 GB+
OmniVoice on RTX 5080: Zero-Shot Voice Cloning Across 646 Languages
tts · beginner · 8 GB+
VoxCPM2 on RTX 5080: 30-Language 48kHz Voice Cloning in ~8 GB VRAM
tts · intermediate · 5 GB+
OpenAudio S1 Mini on RTX 5080: 13-Language Distilled TTS in ~5 GB VRAM
tts · beginner · 5 GB+
VoxCPM-0.5B on RTX 5080: Zero-Shot Voice Cloning TTS in ~5 GB VRAM
tts · beginner · 2 GB+
Kokoro TTS on RTX 5080: 82M-Parameter Text-to-Speech, 54 Voices, 13 GB Free to Colocate a Second Model
video · intermediate · 8 GB+
Wan 2.2 TI2V-5B on RTX 5080: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 8 GB+
LightX2V on RTX 5080: 4-Step Text-to-Video with Distilled Wan2.1-14B via Blackwell-Native FP8
video · advanced · 16 GB+
Sulphur 2 on RTX 5080: Uncensored LTX-2.3 Video via ComfyUI GGUF
video · advanced · 16 GB+
LTX-2.3 on RTX 5080: 22B Audio-Video at the 16 GB Floor via GGUF + CPU-Offloaded Gemma
image · intermediate · 16 GB+
LongCat-Image (base T2I) on RTX 5080: Bilingual 6B Text-to-Image at 16 GB via ComfyUI GGUF
image · intermediate · 12 GB+
SenseNova U1 (8B-MoT) on RTX 5080: VAE-Free Unified Image Gen + Understanding via Q4 GGUF
image · intermediate · 13 GB+
Qwen-Image on RTX 5080: 20B Text-to-Image via ComfyUI GGUF (Blackwell sm_120, 16 GB)
image · intermediate · 12 GB+
ERNIE-Image-Turbo on RTX 5080: 8-step text-to-image via GGUF in ComfyUI
image · intermediate · 14 GB+
Chroma1-Base (V48) on RTX 5080: Uncensored 8.9B FLUX.1-Schnell De-Distillation via Blackwell-Native FP8 in ComfyUI
image · intermediate · 10 GB+
HiDream-O1-Image on RTX 5080: 2048×2048 Text-to-Image with FP8 in ComfyUI
image · beginner · 13 GB+
Flux.2 Klein 4B on RTX 5080: Blackwell-Native FP8 4-Step Text-to-Image at ~8.4 GB
image · intermediate · 16 GB+
Juggernaut Z on RTX 5080: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16
image · beginner · 7 GB+
Anima 2B on RTX 5080: Native ComfyUI Anime Text-to-Image
image · intermediate · 16 GB+
Z-Image Turbo on RTX 5080: 8-Step 1024x1024 Text-to-Image at BF16 with Diffusers or ComfyUI
llm · beginner · 16 GB+
gpt-oss 20B on RTX 5080: MXFP4 Chat at 172 tok/s via Ollama or vLLM
llm · beginner · 16 GB+
Qwen3-14B on RTX 5080: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 6 GB+
Qwen3-8B on RTX 5080: Q4_K_M GGUF via Ollama or llama.cpp
image · intermediate · 20 GB+
HiDream-O1-Image Full BF16 on RTX 3090 Ti: 2048×2048 Text-to-Image in ComfyUI
image · intermediate · 18 GB+
LongCat-Image (base T2I) on RTX 3090 Ti: Bilingual 6B Text-to-Image via diffusers BF16 + CPU Offload
video · intermediate · 14 GB+
LightX2V on RTX 3090 Ti: 4-Step Text-to-Video with Distilled Wan2.1-14B via INT8 / BF16 Offload
video · intermediate · 14 GB+
CogVideoX 1.5 5B on RTX 3090 Ti: 1360x768 Text-to-Video with Diffusers
video · intermediate · 24 GB+
Wan 2.2 TI2V-5B on RTX 3090 Ti: 720p Text/Image-to-Video in ComfyUI
video · intermediate · 24 GB+
Wan 2.2 T2V-A14B on RTX 3090 Ti: 720p text-to-video in ComfyUI with FP8 weights (Ampere)
video · intermediate · 22 GB+
Mochi 1 on RTX 3090 Ti: 85-frame 480p Text-to-Video with Diffusers
video · intermediate · 24 GB+
HunyuanVideo-1.5 on RTX 3090 Ti: 480p Step-Distilled Image-to-Video on the Same Razor-Thin 24 GB Envelope
image · beginner · 13 GB+
Flux.2 Klein 4B on RTX 3090 Ti: BFL-Recommended ~13 GB CPU-Offload Path for 4-Step Text-to-Image
3d · intermediate · 12 GB+
Waypoint 1.5 on RTX 3090 Ti: Real-Time Interactive World Model at 720p, ~32 FPS
multimodal · beginner · 20 GB+
Gemma 4 E4B on RTX 3090 Ti: Multimodal Inference via BF16 (with 16 GB of Headroom to Spend)
llm · intermediate · 18 GB+
Gemma 4 26B A4B-it on RTX 3090 Ti: Local Multimodal Chat via Q4_K_M GGUF + llama.cpp
llm · beginner · 12 GB+
DeepSeek-R1-Distill-Qwen-14B on RTX 3090 Ti via Ollama Q4_K_M GGUF
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 3090 Ti: Local Chat via llama.cpp + Unsloth UD-Q4_K_XL GGUF
image · intermediate · 24 GB+
Chroma1-Base (V48) on RTX 3090 Ti: Uncensored 8.9B FLUX.1-Schnell De-Distillation via Diffusers BF16
image · intermediate · 13 GB+
Qwen-Image on RTX 3090 Ti: 20B Text-to-Image via ComfyUI GGUF (Ampere Path, No FP8)
image · intermediate · 16 GB+
Juggernaut Z on RTX 3090 Ti: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI
llm · intermediate · 22 GB+
Qwen3-32B on RTX 3090 Ti: UD-Q4_K_XL GGUF via llama.cpp
llm · beginner · 10 GB+
Qwen3-14B on RTX 3090 Ti: Q4_K_M GGUF via Ollama or llama.cpp
llm · beginner · 16 GB+
gpt-oss 20B on RTX 3090 Ti: MXFP4 Chat at 160 tok/s via Ollama or vLLM
image · beginner · 16 GB+
Z-Image Turbo on RTX 3090 Ti: 8-Step 1024x1024 Text-to-Image at BF16 in ~6.7s with Diffusers or ComfyUI
llm · beginner · 6 GB+
Qwen3-8B on RTX 3090 Ti: Q4_K_M GGUF with 18 GB of Headroom for Colocation or Long Context
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 5060 Ti: Local Chat via Ollama or llama.cpp + Unsloth UD-Q4_K_XL GGUF
3d · advanced · 16 GB+
TRELLIS image-large on RTX 5060 Ti: Image-to-3D Mesh Generation at the 16 GB Floor
video · advanced · 31 GB+
Sulphur 2 on RTX 5090: Uncensored LTX-2.3 Video at fp8mixed, Native
llm · beginner · 10 GB+
Llama 3.1 8B on RTX 5090: Local Chat via llama.cpp + Unsloth UD-Q4_K_XL GGUF
3d · advanced · 24 GB+
TRELLIS.2-4B on RTX 5090: First Consumer Card That Hits the 24 GB Floor for Image-to-3D
3d · intermediate · 29 GB+
Hunyuan3D-2.1 on RTX 5090: Image-to-Mesh + PBR Texture in One Pass
3d · advanced · 16 GB+
TRELLIS image-large on RTX 5090: Image-to-3D Mesh Generation with the Blackwell Build Path
3d · intermediate · 12 GB+
Waypoint 1.5 on RTX 5090: Real-Time Interactive World Model at 720p, 72 FPS
image · intermediate · 24 GB+
ERNIE-Image-Turbo on RTX 5090: 8-Step Text-to-Image at BF16 in ComfyUI
video · intermediate · 28 GB+
LTX-2.3 on RTX 5090: First Card That Hits the 32 GB Floor
tts · beginner · 3 GB+
Kokoro TTS on RTX 5090: 82M-Parameter Text-to-Speech, 54 Voices, 30 GB Free for a Multi-Model Server
multimodal · beginner · 20 GB+
Gemma 4 E4B on RTX 5090: Multimodal Inference via BF16 (with 24 GB of Headroom to Spend)
image · intermediate · 21 GB+
Qwen-Image on RTX 5090: 20B Text-to-Image via ComfyUI FP8 (Blackwell Native Path)
image · beginner · 9 GB+
Flux.2 Klein 4B on RTX 5090: FP8 1.2-Second Generation, Blackwell-Native Speed Win
image · intermediate · 18 GB+
LongCat-Image (base T2I) on RTX 5090: Bilingual 6B Text-to-Image via diffusers BF16 with 14 GB Headroom
image · intermediate · 24 GB+
Chroma1-Base (V48) on RTX 5090: Uncensored 8.9B FLUX.1-Schnell De-Distillation via Diffusers BF16
image · intermediate · 17 GB+
HiDream-O1-Image on RTX 5090: 2048×2048 Text-to-Image with MXFP8 Blackwell-Native Acceleration in ComfyUI
llm · intermediate · 29 GB+
Qwen3-32B on RTX 5090: Q6_K_XL GGUF via llama.cpp (with AWQ-INT4 + 128K context alternative)
llm · intermediate · 29 GB+
Gemma 4 26B A4B-it on RTX 5090: Q8_0 Quality Tier via ggml-org GGUF + llama.cpp
llm · beginner · 10 GB+
DeepSeek-R1-Distill-Qwen-14B on RTX 5090: 128K Reasoning Context Unlocked
llm · intermediate · 17 GB+
Qwen3-14B on RTX 5090: FP8 via vLLM with Native Blackwell Acceleration
llm · beginner · 6 GB+
Qwen3-8B on RTX 5090: Q4_K_M GGUF with 26 GB of Headroom for Colocation, BF16, or Full 131K Context
llm · beginner · 16 GB+
gpt-oss 20B on RTX 5090: MXFP4 Chat at 298 tok/s via Ollama or vLLM
video · intermediate · 14 GB+
HunyuanVideo-1.5 on RTX 5090: 480p Step-Distilled I2V with an 8 GB Headroom Unlock
video · intermediate · 22 GB+
LightX2V on RTX 5090: 4-Step Text-to-Video with Distilled Wan2.1-14B (Blackwell-Native FP8 + Future NVFP4 Path)
video · intermediate · 24 GB+
Wan 2.2 T2V-A14B on RTX 5090: 720p text-to-video in ComfyUI with FP8 scaled weights (Blackwell)
video · intermediate · 20 GB+
CogVideoX 1.5 5B on RTX 5090: 1360x768 Text-to-Video, No-Sequential-Offload Path
video · intermediate · 22 GB+
Mochi 1 on RTX 5090: 85-frame 480p Text-to-Video with Diffusers
video · intermediate · 8 GB+
Wan 2.2 TI2V-5B on RTX 5090: 720p Text/Image-to-Video in ComfyUI
