Krea 2 Turbo on RTX 4070 SUPER (12GB) via ComfyUI + GGUF: 8-Step Text-to-Image

image · intermediate · 12GB+ VRAM · Jun 25, 2026

This intermediate recipe sets up Krea 2 Turbo on the RTX 4070 SUPER using a GGUF quant sized to fit the card's 12 GB of VRAM.

Prerequisites
  • NVIDIA RTX 4070 SUPER (12GB VRAM, Ada AD104, sm_89) or any consumer GPU with 12GB+ VRAM
  • ComfyUI 0.25.0 or newer + the ComfyUI-GGUF custom node (city96)
  • Recent PyTorch with a CUDA build (cu121 or cu124) — the RTX 4070 SUPER's Ada sm_89 is stock-supported, no special wheel
  • ~10GB free disk for the GGUF diffusion model, plus the text encoder and VAE

What You'll Build

A local install of Krea 2 Turbo — the distilled, few-step variant of Krea AI's from-scratch aesthetic-first text-to-image foundation model (released 2026-06-23) — running 8-step text-to-image at up to 1280×720 on a 12GB RTX 4070 SUPER, inside ComfyUI. The 16GB-class recipes for this model lead with the community FP8 build, but that transformer is 12.01 GiB — it does not leave room on a 12GB card once the encoder, VAE, and activations are added. So this recipe leads with a community GGUF quantization loaded through the ComfyUI-GGUF node: on 12GB the Q5_K_M tier (8.87 GB) is the sweet spot, with Q4_K_M for more headroom. A clearly-labelled section below covers the full-quality Krea 2 Raw (undistilled, CFG) tier, which also has a GGUF build that fits 12GB at the lower quants.

Hardware data: RTX 4070 SUPER (12GB VRAM, Ada AD104, sm_89) · Krea 2 Turbo GGUF, 8 steps at 1280×720 · See benchmark data

ℹ️ This is Krea 2, not FLUX.1-Krea-dev. Krea 2 is Krea AI's own from-scratch ~12.9B-parameter DiT released 2026-06-23 — a different model from the 2025 black-forest-labs/FLUX.1-Krea-dev (a BFL×Krea collaboration built on FLUX). Don't mix their weights, sizes, or workflows.

⚠️ On 12GB, use GGUF — not the FP8 build. Krea 2 Turbo's community FP8 (float8_e4m3fn) build is 12.01 GiB (AlperKTS/Krea2_FP8 documents it as targeting "16GB and 24GB GPUs"). On a 12GB card that transformer alone leaves no room for the text encoder, VAE, and sampling activations — it overflows. The path that fits 12GB is a GGUF quant (this recipe's lead), loaded via the ComfyUI-GGUF node; the Q5_K_M tier is 8.87 GB. (GGUF dequantizes to the card's BF16/FP16 compute path — it is a memory-fit choice, not an FP8-acceleration one.) Pin the format before you download.

⚠️ Two variants, two fits. Krea 2 ships as Turbo (distilled, 8 steps — this recipe's lead) and Raw / Base (undistilled, CFG, 52 steps). Both have community GGUF builds; on 12GB stick to the lower GGUF tiers for either. See "The Raw quality tier" below. Pin the variant before you download.

ℹ️ Where the weights come from. Krea published the official weights as gated repos under its verified org — krea/Krea-2-Raw and krea/Krea-2-Turbo (access-restricted; license approval required). An ungated community mirror of the official turbo.safetensors is the krea-community/krea-2 bucket. The GGUF quants used here are a community conversion of the official Turbo weights, published at vantagewithai/Krea-2-Turbo-GGUF (which also ships a ready ComfyUI workflow JSON). Model identity and license come from krea.ai; read the license before any commercial use (see Requirements).

Requirements

Component | Minimum | Tested
GPU | 12GB VRAM consumer card | RTX 4070 SUPER (12GB, Ada AD104, sm_89)
RAM | 16GB system RAM (32GB comfortable) |
Storage | ~14GB (Q5_K_M GGUF 8.87 GB + 4.88 GiB FP8 text encoder + 0.24 GiB VAE) |
Software | ComfyUI 0.25.0+ · ComfyUI-GGUF (city96) · PyTorch cu121/cu124 (sm_89) | ComfyUI native Krea2 + UnetLoaderGGUF

The RTX 4070 SUPER is Ada (sm_89) and runs on any recent PyTorch CUDA build (cu121 or cu124) — no special wheel is required. Its 192-bit memory bus makes it more bandwidth-limited than the higher-tier Ada cards, but it runs the same GGUF workflow unchanged.

Licensing — read before commercial use. Krea 2 is released under the Krea 2 Community License. Key terms: you own the Outputs you generate; commercial use is free only if your company's total annual revenue is under $1,000,000 USD (above that requires an Enterprise License); any derivative AI model name must begin with "Krea"; you must implement reasonable content-filtering; and you may not circumvent or remove the model's content-provenance or watermarking mechanisms.

Installation

1. Install / update ComfyUI to 0.25.0+

ComfyUI 0.25.0 and newer have built-in Krea 2 model support, per the AlperKTS/Krea2_FP8 model card. Update via ComfyUI Manager → "Update ComfyUI", or pull the latest and reinstall requirements:

cd ComfyUI
git pull
pip install -r requirements.txt

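Note: No special wheel is required for the RTX 4070 SUPER (Ada sm_89). If torch.cuda.is_available() returns False after the update, PyTorch was installed without CUDA support (a plain pip install torch on Windows pulls a CPU-only build). Reinstall from a CUDA wheel index, for example: pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu124.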
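Before moving on, it can help to confirm what PyTorch actually sees. The following is a minimal sketch, not part of ComfyUI: summarize_gpu and probe are hypothetical helper names, and the 11.0 GiB threshold is an assumption chosen because a "12GB" card typically reports slightly under 12 GiB.

```python
"""Pre-flight check: does PyTorch see a CUDA GPU with roughly 12GB of VRAM?"""


def summarize_gpu(capability, total_bytes, min_vram_gib=11.0):
    """Summarize a device from its (major, minor) capability and VRAM in bytes.

    min_vram_gib is an assumed threshold, not a figure from the recipe.
    """
    major, minor = capability
    vram_gib = total_bytes / 2**30
    return {
        "arch": f"sm_{major}{minor}",
        "vram_gib": round(vram_gib, 2),
        "enough_vram": vram_gib >= min_vram_gib,
    }


def probe():
    """Return a summary for CUDA device 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return summarize_gpu(torch.cuda.get_device_capability(0), props.total_memory)


if __name__ == "__main__":
    print(probe() or "No CUDA device visible to PyTorch")
```

Run it with the same Python interpreter that launches ComfyUI; on the RTX 4070 SUPER the reported architecture should be sm_89.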

2. Install the ComfyUI-GGUF custom node

Unlike the 16GB FP8 path, the GGUF diffusion model loads through ComfyUI-GGUF (city96) — a custom node, not built-in. Install it via ComfyUI Manager ("Install Custom Nodes" → search "GGUF"), or by hand:

cd ComfyUI/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
pip install --upgrade gguf

Restart ComfyUI; you should now have the Unet Loader (GGUF) node.
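To confirm the node registered without opening the canvas, you can ask a running ComfyUI server for its node list. This is a sketch under assumptions: the server address http://127.0.0.1:8188 is ComfyUI's usual default but may differ on your machine, and node_registered and fetch_object_info are hypothetical helper names.

```python
import json
import urllib.request

NODE_CLASS = "UnetLoaderGGUF"  # class name behind the "Unet Loader (GGUF)" node


def node_registered(object_info, node_class=NODE_CLASS):
    """True if a parsed /object_info payload lists the given node class."""
    return node_class in object_info


def fetch_object_info(base_url="http://127.0.0.1:8188", node_class=NODE_CLASS):
    """Fetch the node description from a running ComfyUI server.

    base_url is an assumption; adjust it to wherever your server listens.
    """
    url = f"{base_url}/object_info/{node_class}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        found = node_registered(fetch_object_info())
        print("GGUF loader registered" if found else "GGUF loader missing")
    except OSError as exc:  # covers connection refused and URL errors
        print(f"Could not reach ComfyUI: {exc}")
```

An empty response means the server is up but the custom node did not load; check the ComfyUI startup log for an import error from ComfyUI-GGUF.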

3. Download the model files

Pick one GGUF tier (see the table under "Running" — Q5_K_M is the 12GB lead), plus the text encoder and VAE. File-to-folder mapping follows the vantagewithai/Krea-2-Turbo-GGUF workflow:

# from your ComfyUI root

# GGUF diffusion model (Q5_K_M = 8.87 GB, the 12GB lead) → unet/
cd models/unet
wget https://huggingface.co/vantagewithai/Krea-2-Turbo-GGUF/resolve/main/krea2_turbo-Q5_K_M.gguf

# FP8-scaled Qwen3-VL 4B text encoder (4.88 GiB) → text_encoders/
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen3-VL/resolve/main/text_encoders/qwen3vl_4b_fp8_scaled.safetensors

# Qwen-Image VAE (242 MiB) → vae/
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors

Krea 2's text encoder is Qwen/Qwen3-VL-4B-Instruct and its VAE is the Qwen-Image autoencoder (AutoencoderKLQwenImage, f8, 16 latent channels), per the Krea-2-Base-Diffusers model card. The FP8-scaled encoder is the right choice on the RTX 4070 SUPER: at 4.88 GiB it is a little over half the size of the ~8 GiB BF16 encoder, and Ada supports FP8 natively, so the encode stage stays light on a 12GB card.
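A quick way to catch a file saved to the wrong folder is to check the layout before launching ComfyUI. The sketch below assumes the Q5_K_M tier and the exact filenames from the download commands above; missing_model_files is a hypothetical helper name.

```python
from pathlib import Path

# Paths relative to the ComfyUI root, taken from the download commands above.
# Swap the .gguf name if you chose a different tier.
EXPECTED_FILES = (
    "models/unet/krea2_turbo-Q5_K_M.gguf",
    "models/text_encoders/qwen3vl_4b_fp8_scaled.safetensors",
    "models/vae/qwen_image_vae.safetensors",
)


def missing_model_files(comfy_root, expected=EXPECTED_FILES):
    """Return the expected files that are not present under comfy_root."""
    root = Path(comfy_root)
    return sorted(rel for rel in expected if not (root / rel).is_file())


if __name__ == "__main__":
    missing = missing_model_files(".")  # run from your ComfyUI root
    print("All model files found" if not missing else f"Missing: {missing}")
```

This only checks that the files exist, not that the downloads completed; compare file sizes against the figures above if a load fails.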

4. Load the workflow

Drag the Vantage_Krea-2-Turbo.json workflow onto the ComfyUI canvas. Set the Unet Loader (GGUF) node to the .gguf tier you downloaded, the encoder loader to qwen3vl_4b_fp8_scaled.safetensors, and the VAE to qwen_image_vae.safetensors.

Running

Edit the prompt node and click Queue Prompt. The Turbo defaults baked into the workflow, per the AlperKTS/Krea2_FP8 model card, are:

  • Steps: 8
  • CFG: 1.0
  • Sampler: er_sde
  • Scheduler: simple
  • Resolution: 1280×720
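If you drive ComfyUI from a script, the defaults listed above can be applied to a workflow dictionary. This is a minimal sketch under two assumptions that are not confirmed for Vantage_Krea-2-Turbo.json: that the workflow was exported in ComfyUI's API format, and that it samples with a stock KSampler node. apply_turbo_defaults is a hypothetical helper name.

```python
import copy

# Values from the Turbo defaults listed above.
TURBO_DEFAULTS = {"steps": 8, "cfg": 1.0, "sampler_name": "er_sde", "scheduler": "simple"}
RESOLUTION = {"width": 1280, "height": 720}


def apply_turbo_defaults(workflow):
    """Return a patched copy of an API-format workflow; the input is not modified."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {})
        # Assumption: sampling is done by a stock KSampler node.
        if node.get("class_type") == "KSampler":
            inputs.update(TURBO_DEFAULTS)
        # Resize any node that holds literal width/height values
        # (linked inputs are lists in API format and are left alone).
        if isinstance(inputs.get("width"), int) and isinstance(inputs.get("height"), int):
            inputs.update(RESOLUTION)
    return patched
```

The patched dictionary can then be sent to a running ComfyUI server as the "prompt" field of a JSON POST to /prompt. If the workflow uses a different sampler node, adjust the class_type check accordingly.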

Sequential encode is what makes 12GB work. ComfyUI runs the FP8 Qwen3-VL encoder (4.88 GiB) to encode your prompt, then frees it before the diffusion sampling stage — so the encoder and the GGUF transformer are never both resident. The sampling-stage peak is therefore near the GGUF tier you chose plus the VAE and activations, not the sum of both. Output PNGs land in ComfyUI/output/.

GGUF quant tiers (byte sizes verified via the HuggingFace tree API; pick by VRAM and quality target):

Tier | File size | Notes
Q8_0 | 13.71 GB | Near-BF16; overflows 12GB (this is the 24GB-card lead)
Q6_K | 10.58 GB | High quality; tight on 12GB, leaves little room for activations
Q5_K_M | 8.87 GB | Strong quality/size balance; lead on 12GB
Q4_K_M | 7.49 GB | Lighter/faster; comfortable headroom on 12GB
Q3_K_M | 6.01 GB | Visible degradation; only if tight
Q2_K | 4.89 GB | Lowest tier; quality drops noticeably
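The tier choice can be written down as simple arithmetic: take the largest file that still leaves a reserve for the VAE, activations, and anything else on the GPU. The sketch below uses the file sizes from the table; the 2.5 GB default reserve is a hypothetical allowance, not a measured figure, and pick_tier is a hypothetical helper name.

```python
# File sizes in GB, from the tier table above.
TIERS_GB = {
    "Q8_0": 13.71,
    "Q6_K": 10.58,
    "Q5_K_M": 8.87,
    "Q4_K_M": 7.49,
    "Q3_K_M": 6.01,
    "Q2_K": 4.89,
}


def pick_tier(vram_gb, reserve_gb=2.5, tiers=TIERS_GB):
    """Return the largest tier whose file fits in vram_gb minus a reserve, else None.

    reserve_gb is an assumed allowance for the VAE, activations and desktop use.
    """
    budget = vram_gb - reserve_gb
    fitting = {name: size for name, size in tiers.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None


if __name__ == "__main__":
    for vram in (12, 24):
        print(vram, "GB ->", pick_tier(vram))
```

With the default reserve, a 12GB card lands on Q5_K_M and a 24GB card on Q8_0, matching the leads named in the table. Shrinking the reserve to 1 GB makes Q6_K "fit" on 12GB, which is the tight case the table warns about.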

Tip — natural-language prompts. Krea 2 is prompted in natural language; long, detailed descriptions yield the best results, and words to be rendered as text in the image are wrapped in quotes (per the Krea-2-Base-Diffusers model card).

The Raw quality tier (full-quality, undistilled)

Krea 2 Raw / Base is the undistilled foundation checkpoint — no step or guidance distillation, run with classifier-free guidance (recommended settings: 52 steps, CFG 3.5, up to 1024×1024). It is the LoRA-training base (LoRAs trained on Base apply to Turbo). On a 12GB card, run Raw as GGUF too — community Raw GGUFs are published at vantagewithai/Krea-2-Raw-GGUF (same tier ladder as Turbo: Q5_K_M 8.87 GB, Q4_K_M 7.49 GB), loaded through the same Unet Loader (GGUF) node. Stick to the Q4/Q5 Raw tiers on 12GB, and expect substantially longer generations (52 vs 8 steps, plus CFG roughly doubling the per-step work); verify your chosen tier fits before a long run.
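To put "substantially longer" in rough numbers, count transformer forward passes per image. This is a back-of-the-envelope sketch, assuming every pass costs the same (it ignores the resolution difference between the two presets) and that no unconditional pass is run when CFG is 1.0; forward_passes is a hypothetical helper name.

```python
def forward_passes(steps, cfg):
    """Transformer forward passes per image.

    Assumption: CFG above 1.0 runs a conditional and an unconditional pass
    per step, while CFG 1.0 runs only the conditional pass.
    """
    passes_per_step = 2 if cfg > 1.0 else 1
    return steps * passes_per_step


turbo = forward_passes(steps=8, cfg=1.0)   # Turbo defaults
raw = forward_passes(steps=52, cfg=3.5)    # Raw recommended settings
print(f"Turbo: {turbo} passes, Raw: {raw} passes, ratio: {raw / turbo:.1f}x")
```

Under these assumptions Raw needs 104 passes against Turbo's 8, about 13 times the transformer work per image; treat that as an order-of-magnitude guide, not a benchmark.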

Results

  • Speed: No community benchmark exists for Krea 2 on the RTX 4070 SUPER yet — the /check/krea-2/rtx-4070-super endpoint currently returns verdict: unknown with no benchmark rows. GGUF inference is dequantized to the card's BF16/FP16 compute (the GGUF path does not use Ada's fp8 acceleration), so throughput is governed by the card's native diffusion path; no vendor figure names this card, so we omit a measured number rather than quote different hardware. If you run it, please submit your numbers so they appear on /check/krea-2/rtx-4070-super.
  • VRAM usage: The lead Q5_K_M GGUF transformer is 8.87 GB on disk; tiers run 4.89–13.71 GB (verified via the HuggingFace tree API). Because ComfyUI frees the 4.88 GiB FP8 encoder before sampling (see "Running"), the sampling-stage peak sits near the chosen GGUF tier plus the VAE and activations — within the RTX 4070 SUPER's 12GB at the Q4–Q5 tiers. Q6_K is tight and Q8_0 overflows 12GB. Live measurements will land at /check/krea-2/rtx-4070-super.
  • Quality notes: Q5_K_M is a strong quality/size balance; Q6_K is higher quality but tight on 12GB; below Q4 expect visible degradation. Turbo is distilled for 8-step CFG-1.0 generation; for maximum fidelity use the Raw tier (52 steps, CFG 3.5) — see above. Architecture is a single-stream DiT, 12.9B parameters, 28 blocks at width 6144, with grouped-query attention and flow-matching sampling, per the Krea-2-Base-Diffusers model card.
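The tier sizes can be sanity-checked against the 12.9B parameter count by converting each file size to effective bits per weight. This is a rough sketch: it assumes the GGUF sizes are decimal gigabytes and that each file holds only the transformer weights, and bits_per_weight is a hypothetical helper name.

```python
PARAMS = 12.9e9  # parameter count quoted above
GB = 10**9       # decimal gigabyte (assumed unit for the GGUF sizes)
GIB = 2**30      # binary gibibyte (unit quoted for the FP8 build)


def bits_per_weight(size_bytes, params=PARAMS):
    """Average number of bits stored per parameter for a file of size_bytes."""
    return size_bytes * 8 / params


sizes = {
    "FP8 (12.01 GiB)": 12.01 * GIB,
    "Q8_0": 13.71 * GB,
    "Q5_K_M": 8.87 * GB,
    "Q4_K_M": 7.49 * GB,
}
for name, size in sizes.items():
    print(f"{name}: {bits_per_weight(size):.2f} bits/weight")
```

The FP8 build works out to about 8 bits per weight and Q5_K_M to about 5.5, which is consistent with the stated parameter count and explains why the FP8 file alone is roughly the size of the card's entire VRAM.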

For the full benchmark data, see /check/krea-2/rtx-4070-super.

Troubleshooting

"Unet Loader (GGUF) node not found"

Install ComfyUI-GGUF (city96) in custom_nodes/, run pip install --upgrade gguf, and restart ComfyUI. The base ComfyUI install cannot load .gguf diffusion models without it — this is the one custom node this recipe needs.

Out of memory during sampling

On 12GB, drop to a lighter GGUF tier (Q6_K → Q5_K_M → Q4_K_M). Each step down frees roughly the difference between the two file sizes in the tier table: about 1.7 GB from Q6_K to Q5_K_M and about 1.4 GB from Q5_K_M to Q4_K_M. Q6_K is tight and Q8_0 will not fit; prefer Q5_K_M or Q4_K_M. Keep to the default 1280×720 resolution and close other GPU apps. Do not substitute the full-precision BF16 Qwen3-VL-4B encoder (~8 GiB); the FP8-scaled encoder is what keeps the encode stage light on 12GB.

torch.cuda.is_available() is False

If ComfyUI falls back to CPU (extremely slow) or reports no CUDA device, your PyTorch install is missing CUDA support. The RTX 4070 SUPER (Ada sm_89) is stock-supported by any recent CUDA PyTorch build. Reinstall from a CUDA wheel index (cu124 shown; on Windows a plain pip install torch pulls a CPU-only build):

pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu124

ComfyUI's sampling uses PyTorch SDPA, so you do not need FlashAttention for this recipe. Skip any pip install flash-attn step; it is not required for the ComfyUI Krea 2 path.

Common questions
How much VRAM does Krea 2 need?

12 GB is the minimum this recipe targets, and it requires a GGUF quant (Q5_K_M or Q4_K_M). The 12.01 GiB FP8 build targets 16GB and 24GB cards.

Which GPUs is Krea 2 tested on?

RTX 4070 SUPER (12 GB).

How hard is this setup?

Intermediate — follow the steps above.