Krea 2 Turbo (FP8) on RTX 5070 Ti via ComfyUI: 8-Step Text-to-Image in 16GB

image · intermediate · 16GB+ VRAM · Jun 23, 2026

This intermediate recipe sets up Krea 2 Turbo (FP8) in ComfyUI on an RTX 5070 Ti and needs about 16 GB of VRAM.

Prerequisites
  • NVIDIA RTX 5070 Ti (16GB VRAM) or any consumer GPU with 16GB+ VRAM
  • ComfyUI 0.25.0 or newer (native Krea 2 support, no custom nodes)
  • CUDA 12.8+ PyTorch build (cu128 — required for Blackwell sm_120)
  • ~18GB free disk for the FP8 diffusion model, FP8 text encoder, and VAE

What You'll Build

A local install of Krea 2 Turbo — the distilled, few-step variant of Krea AI's from-scratch aesthetic-first text-to-image foundation model (released 2026-06-23) — running 8-step text-to-image at up to 1280×720 on a 16GB RTX 5070 Ti, entirely inside native ComfyUI with no custom nodes. The lead configuration is the community FP8 (float8_e4m3fn) Turbo build, which shrinks the transformer from 24.76 GiB (BF16) to 12.01 GiB so it fits a 16GB card. A clearly-labelled section below covers the full-quality Krea 2 Raw (undistilled, CFG) tier and exactly what it takes to fit it on the same card.

Hardware data: RTX 5070 Ti (16GB VRAM) · Krea 2 Turbo FP8, 8 steps at 1280×720 · See benchmark data

ℹ️ This is Krea 2, not FLUX.1-Krea-dev. Krea 2 is Krea AI's own from-scratch ~12.9B-parameter DiT released 2026-06-23 — a different model from the 2025 black-forest-labs/FLUX.1-Krea-dev (a BFL×Krea collaboration built on FLUX). Don't mix their weights, sizes, or workflows.

⚠️ Two variants, two very different fits. Krea 2 ships as Turbo (distilled, 8 steps, fits 16GB at FP8 — this recipe's lead) and Raw / Base (undistilled, CFG, 52 steps, 24.76 GiB BF16). As of release day there is no pre-quantized sub-16GB Raw build — see "The Raw quality tier" below for the honest options. Pin the variant before you download.

ℹ️ Where the weights come from. Krea published the official weights as gated repos under its verified org — krea/Krea-2-Raw and krea/Krea-2-Turbo (access-restricted; license approval required). An ungated community mirror of the same raw.safetensors + turbo.safetensors (plus reference inference.py) is the practical download today: the krea-community/krea-2 bucket. Neither is a ComfyUI checkpoint — the ComfyUI-loadable FP8 build used in this recipe is a community conversion of the official Turbo weights. Model identity and license come from krea.ai; read the license before any commercial use (see Requirements).

Requirements

Component | Minimum | Tested
GPU | 16GB VRAM consumer card | RTX 5070 Ti (16GB, Blackwell GB203, sm_120)
RAM | 16GB system RAM (32GB comfortable) |
Storage | ~18GB (12.01 GiB FP8 transformer + 4.88 GiB FP8 text encoder + 0.24 GiB VAE) |
Software | ComfyUI 0.25.0+, PyTorch cu128 (sm_120) | ComfyUI native Krea2 nodes

The FP8 Turbo build is documented as runnable on "standard consumer hardware (such as 16GB and 24GB GPUs)" per the AlperKTS/Krea2_FP8 model card. The RTX 5070 Ti's 16GB matches the lower bound of that target.
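The storage figure can be checked before downloading. The following is a minimal sketch, not part of the original recipe: it sums the three file sizes quoted above (taken as GiB) and compares the total with the free space on the current drive. The function names are hypothetical, and it assumes you run it from the drive that holds ComfyUI.

```python
import shutil

GIB = 2**30

# File sizes quoted in this recipe, in GiB.
FILES_GIB = {
    "FP8 Turbo transformer": 12.01,
    "FP8 text encoder": 4.88,
    "VAE": 0.24,
}


def required_bytes(files_gib=FILES_GIB):
    """Total download size in bytes for the listed files."""
    return int(sum(files_gib.values()) * GIB)


def has_room(free_bytes, files_gib=FILES_GIB):
    """True if free_bytes can hold every listed file."""
    return free_bytes >= required_bytes(files_gib)


if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # free space on the current drive
    need = required_bytes()
    print(f"need {need / 1e9:.2f} GB ({need / GIB:.2f} GiB), free {free / 1e9:.2f} GB")
    print("enough space" if has_room(free) else "not enough space")
```

The three files add up to 17.13 GiB, which is slightly more than 18 GB in decimal units, so treat "~18GB" as a floor and leave some margin.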

Licensing — read before commercial use. Krea 2 is released under the Krea 2 Community License. Key terms: you own the Outputs you generate; commercial use is free only if your company's total annual revenue is under $1,000,000 USD (above that requires an Enterprise License); any derivative AI model name must begin with "Krea"; you must implement reasonable content-filtering; and you may not circumvent or remove the model's content-provenance or watermarking mechanisms.

Installation

1. Install / update ComfyUI to 0.25.0+

ComfyUI 0.25.0 and newer have built-in Krea 2 support — no custom nodes needed, per the AlperKTS/Krea2_FP8 model card. Update via ComfyUI Manager → "Update ComfyUI", or pull the latest and reinstall requirements:

cd ComfyUI
git pull
pip install -r requirements.txt
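After updating, it is worth confirming that the reported version meets the 0.25.0 minimum. The sketch below is an illustration only: it takes the version string as input (for example, the one ComfyUI shows at startup) and compares it numerically. The helper names are hypothetical, and it assumes plain dotted versions with no suffix such as "-rc1".

```python
def parse_version(text):
    """Turn '0.25.0' or 'v0.25.0' into a tuple of ints: (0, 25, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(version, minimum="0.25.0"):
    """Compare versions component by component, not as strings."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("0.25.0", "0.26.1", "0.3.75"):
        print(candidate, meets_minimum(candidate))
```

The numeric comparison matters here: as plain strings, "0.3.75" sorts after "0.25.0", but as a version it is older (3 is less than 25).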

Blackwell (RTX 50-series) note: The RTX 5070 Ti is sm_120 and needs a CUDA 12.8+ PyTorch build. If torch.cuda.is_available() is False or you see "no kernel image" errors, reinstall the cu128 wheel:

pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu128
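To see whether the installed wheel actually ships kernels for the card, you can compare the GPU's compute capability with the architectures PyTorch was built for. This is a minimal sketch under stated assumptions: it uses torch.cuda.get_device_capability and torch.cuda.get_arch_list, the helper names are hypothetical, and it ignores PTX forward compatibility (compute_XX entries), so treat a False as a hint rather than a verdict.

```python
def arch_name(capability):
    """Map a (major, minor) compute capability to an sm_XY name."""
    major, minor = capability
    return f"sm_{major}{minor}"


def wheel_supports(arch_list, capability):
    """True if the wheel's compiled arch list contains the GPU's sm_XY."""
    return arch_name(capability) in arch_list


def report():
    """Describe the local PyTorch/CUDA setup without crashing if torch is absent."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"CUDA not available (torch built for CUDA {torch.version.cuda})"
    cap = torch.cuda.get_device_capability(0)
    ok = wheel_supports(torch.cuda.get_arch_list(), cap)
    return f"GPU {arch_name(cap)}, wheel CUDA {torch.version.cuda}, kernels present: {ok}"


if __name__ == "__main__":
    print(report())
```

On an RTX 5070 Ti the capability is (12, 0), so the wheel's arch list needs to include sm_120.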

2. Download the three model files

Place each file in the indicated ComfyUI/models/ subfolder. File-to-folder mapping and sources are from the AlperKTS/Krea2_FP8 model card:

# from your ComfyUI root

# FP8 Turbo diffusion model (12.01 GiB) → unet/
cd models/unet
wget https://huggingface.co/AlperKTS/Krea2_FP8/resolve/main/krea2_turbo_fp8.safetensors

# FP8-scaled Qwen3-VL 4B text encoder (4.88 GiB) → text_encoders/
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen3-VL/resolve/main/text_encoders/qwen3vl_4b_fp8_scaled.safetensors

# Qwen-Image VAE (242 MiB) → vae/
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors
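A quick way to confirm the files landed in the right folders is sketched below. It only checks that each filename from the commands above exists under ComfyUI/models/. The function name is hypothetical, and the script assumes it is run from the ComfyUI root.

```python
from pathlib import Path

# Folder -> filename, exactly as used in the download commands above.
EXPECTED = {
    "unet": "krea2_turbo_fp8.safetensors",
    "text_encoders": "qwen3vl_4b_fp8_scaled.safetensors",
    "vae": "qwen_image_vae.safetensors",
}


def missing_model_files(comfy_root):
    """Return 'folder/filename' for every expected file that is not present."""
    models = Path(comfy_root) / "models"
    return [
        f"{folder}/{name}"
        for folder, name in EXPECTED.items()
        if not (models / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all three model files found")
```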

Krea 2's text encoder is Qwen/Qwen3-VL-4B-Instruct and its VAE is the Qwen-Image autoencoder (AutoencoderKLQwenImage, f8, 16 latent channels), per the Krea-2-Base-Diffusers model card. The Comfy-Org repackaged files above are the ComfyUI-loader-compatible versions of those two components.
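As a rough illustration of what "f8, 16 latent channels" means for image size, the sketch below computes the VAE latent shape for a given resolution. It assumes "f8" denotes an 8× spatial downscale in each dimension; the function name is hypothetical, and any further patching the transformer applies is not modelled.

```python
def latent_shape(width, height, downscale=8, channels=16):
    """Return (channels, latent_height, latent_width) for an f8, 16-channel VAE."""
    if width % downscale or height % downscale:
        raise ValueError(f"width and height must be multiples of {downscale}")
    return (channels, height // downscale, width // downscale)


if __name__ == "__main__":
    print(latent_shape(1280, 720))   # the Turbo default resolution
    print(latent_shape(1024, 1024))  # the Raw recommended maximum
```

At 1280×720 the latent is 16 channels of 90×160, so both dimensions divide evenly by 8.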

3. Load the workflow

The FP8 repo ships native ComfyUI workflow JSONs. Drag workflows/Krea 2 simple workflow.json (or krea2_native_workflow.json) from the AlperKTS/Krea2_FP8 repo onto your ComfyUI canvas. The workflow wires the unet/, text_encoders/, and vae/ files into the native Krea 2 sampler graph.

Running

Edit the prompt node and click Queue Prompt. The Turbo defaults shipped in the workflow, per the AlperKTS/Krea2_FP8 model card, are:

  • Steps: 8
  • CFG: 1.0
  • Sampler: er_sde
  • Scheduler: simple
  • Resolution: 1280×720

ComfyUI loads and runs the FP8 Qwen3-VL text encoder to encode your prompt, then frees it before the diffusion sampling stage, so the text encoder (4.88 GiB) and the 12.01 GiB diffusion model are not both resident at peak. This sequential encode-then-sample pattern keeps the sampling-stage footprint near the 12 GiB transformer plus the small VAE and activations. That fits inside 16GB at the default resolution, though with limited headroom (see "Out of memory during sampling" under Troubleshooting). Output PNGs land in ComfyUI/output/.
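The effect of the encode-then-sample order can be seen with simple arithmetic. The sketch below is a weights-only estimate using the file sizes quoted in this recipe. It treats the card as a nominal 16 GiB and leaves out activations, the CUDA context, and anything else on the GPU, so the real peak is higher.

```python
# On-disk sizes quoted in this recipe, in GiB (weights only).
TEXT_ENCODER = 4.88
TRANSFORMER = 12.01
VAE = 0.24
CARD = 16.0  # nominal capacity; assumption, usable VRAM is slightly lower


def peak_all_resident():
    """Peak if encoder, transformer, and VAE were all loaded at once."""
    return TEXT_ENCODER + TRANSFORMER + VAE


def peak_sequential():
    """Peak if the encoder is freed before the transformer and VAE load."""
    return max(TEXT_ENCODER, TRANSFORMER + VAE)


if __name__ == "__main__":
    print(f"all resident: {peak_all_resident():.2f} GiB")
    print(f"sequential:   {peak_sequential():.2f} GiB")
    print(f"left for activations and overhead: {CARD - peak_sequential():.2f} GiB")
```

Holding everything at once would need 17.13 GiB of weights and would not fit; the sequential order peaks at 12.25 GiB, leaving about 3.75 GiB for everything else.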

Tip — natural-language prompts. Krea 2 is prompted in natural language; long, detailed descriptions yield the best results, and words to be rendered as text in the image are wrapped in quotes (per the Krea-2-Base-Diffusers model card).

The Raw quality tier (full-quality, undistilled)

Krea 2 Raw / Base is the undistilled foundation checkpoint — no step or guidance distillation, run with classifier-free guidance. Its recommended settings are 52 steps, CFG 3.5, up to 1024×1024, which trades much longer generation time for maximum diversity and malleability (it is also the checkpoint intended for LoRA training — LoRAs trained on Base apply cleanly to Turbo).

Fitting Raw on 16GB is the hard part, and as of release day it is not turnkey. Raw ships only at full precision, in two forms:

  • the official single-file raw.safetensors (26.6 GB BF16) in the krea-community/krea-2 bucket, which is bound to Krea's reference inference.py and is not a ComfyUI checkpoint;
  • the equivalent 24.76 GiB BF16 diffusers shards at CalamitousFelicitousness/Krea-2-Base-Diffusers (six diffusion_pytorch_model-0000X-of-00006.safetensors files, verified totalling 26,585,322,200 bytes).

There is no pre-quantized sub-16GB Raw build yet. Winnougan/Krea-2-Base-Turbo-NVFP4-FP8-INT8, despite its name, ships only Turbo quantizations (Krea2_Turbo_*: FP8 12.01 GiB, INT8 12.02 GiB, convrot-INT8 12.02 GiB, MXFP8 12.39 GiB, NVFP4 6.76 GiB) and contains no Base/Raw weight at all.
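The 26.6 GB and 24.76 GiB figures for the BF16 Raw weights describe the same amount of data in different units. A small sketch of the conversion, using the verified shard total quoted above (the helper names are hypothetical):

```python
RAW_SHARDS_BYTES = 26_585_322_200  # verified total of the six diffusers shards


def to_gb(num_bytes):
    """Decimal gigabytes (10**9 bytes), as drive vendors count."""
    return num_bytes / 10**9


def to_gib(num_bytes):
    """Binary gibibytes (2**30 bytes), as most VRAM tools count."""
    return num_bytes / 2**30


if __name__ == "__main__":
    print(f"{to_gb(RAW_SHARDS_BYTES):.1f} GB")
    print(f"{to_gib(RAW_SHARDS_BYTES):.2f} GiB")
```

When comparing against a 16GB card, the GiB figure is the more useful one.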

Two cited ways to run Raw on a 16GB card today:

  1. On-the-fly FP8 cast in ComfyUI (once a single-file Raw checkpoint exists, or via the launch flag). ComfyUI's Load Diffusion Model node exposes a weight_dtype setting. As the official ComfyUI examples document, setting weight_dtype to fp8 in that node lowers memory usage by about half (with a small possible quality cost). The equivalent launch flag is --fp8_e4m3fn-unet, whose help text in ComfyUI's cli_args.py reads Store unet weights in fp8_e4m3fn. Casting the 24.76 GiB BF16 Raw transformer to fp8_e4m3fn yields roughly the same ~12 GiB resident footprint as the Turbo FP8 build. Caveat: the Load Diffusion Model node needs a ComfyUI-format single-file checkpoint. The official raw.safetensors is a single file but is bound to Krea's reference inference.py (custom layout), not the ComfyUI key layout — so this path is fully turnkey only once a ComfyUI-format Raw checkpoint is published; until then it requires converting the official weights (the same step the community already did for Turbo).

  2. Diffusers with CPU offload (slow). Run the BF16 Raw checkpoint through diffusers with enable_model_cpu_offload() / sequential offload so layers stream between system RAM and the GPU. This fits within 16GB VRAM but is substantially slower than a resident FP8 run, and it needs ample system RAM to hold the offloaded weights.
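The "about half" claim for the FP8 cast in option 1 follows from bytes per weight: BF16 stores two bytes per value and fp8_e4m3fn stores one. The sketch below is an estimate only. It assumes every tensor in the Raw checkpoint is BF16 and is cast to FP8, whereas real conversions often keep some tensors at higher precision, so the true footprint may differ.

```python
RAW_BF16_BYTES = 26_585_322_200  # verified BF16 shard total quoted above
CARD_GIB = 16.0                  # nominal capacity; assumption


def fp8_estimate_gib(bf16_bytes):
    """Estimate the FP8 weight footprint: one byte per weight instead of two."""
    return bf16_bytes / 2 / 2**30


if __name__ == "__main__":
    estimate = fp8_estimate_gib(RAW_BF16_BYTES)
    print(f"estimated FP8 Raw weights: {estimate:.2f} GiB")
    print("fits weights-only on the card:", estimate < CARD_GIB)
```

The estimate comes out near 12.4 GiB, in line with the roughly 12 GiB Turbo FP8 build, before activations and overhead are counted.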

If you want maximum quality and don't need it today, the cleaner option is to wait for a native FP8 (or GGUF) single-file Raw build — at which point Raw drops onto this same RTX 5070 Ti the way Turbo does.

Results

  • Speed: No community benchmark exists for Krea 2 on the RTX 5070 Ti yet — the /check/krea-2/rtx-5070-ti endpoint currently returns verdict: unknown with no benchmark rows. Krea AI's vendor materials describe Turbo as a fast few-step model, but no vendor-published figure names this consumer card, so we omit a measured speed here rather than quote a number from different hardware. If you run it, please submit your numbers so they appear on /check/krea-2/rtx-5070-ti.
  • VRAM usage: The Turbo FP8 transformer is 12.01 GiB on disk (down from 24.76 GiB BF16) per the AlperKTS/Krea2_FP8 model card; the FP8-scaled text encoder is 4.88 GiB and the VAE 0.24 GiB (verified via the HuggingFace tree API). Because ComfyUI encodes the prompt and frees the text encoder before sampling, the sampling-stage peak sits near the 12 GiB transformer plus VAE and activations — within the RTX 5070 Ti's 16GB. Live measurements will land at /check/krea-2/rtx-5070-ti.
  • Quality notes: Turbo is distilled for 8-step CFG-1.0 generation; for maximum fidelity and diversity use the Raw tier (52 steps, CFG 3.5) — see above. Architecture is a single-stream DiT, 12.9B parameters, 28 blocks at width 6144, with grouped-query attention and flow-matching sampling, per the Krea-2-Base-Diffusers model card.

For the full benchmark data, see /check/krea-2/rtx-5070-ti.

Troubleshooting

"No kernel image is available" / CUDA error on first generation (Blackwell)

The RTX 5070 Ti is Blackwell sm_120 and needs a CUDA 12.8+ PyTorch build. A cu126 (or older) wheel will load but crash at the first kernel launch. Reinstall the cu128 wheel:

pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu128

ComfyUI's native sampling uses PyTorch SDPA, so you do not need FlashAttention for this recipe — which is helpful because prebuilt FlashAttention-2 wheels still lag for sm_120. Skip any pip install flash-attn step; it is not required for the ComfyUI Krea 2 path.

Out of memory during sampling

If the FP8 Turbo build OOMs (e.g. another app is holding VRAM, or you raised the resolution well above the 1280×720 default), close other GPU apps and keep to the default resolution. The 12.01 GiB FP8 transformer plus the small VAE leaves limited headroom on a 16GB card, so very large resolutions or batch sizes can exceed it. The FP8-scaled text encoder is the recommended encoder precisely because it keeps the encode stage light; do not substitute the full-precision BF16 Qwen3-VL-4B encoder (~8 GiB).
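To check whether something else is already holding VRAM before you queue a prompt, you can read the used and total memory from nvidia-smi. This is a minimal sketch, assuming nvidia-smi is on your PATH and using its --query-gpu CSV output; the helper names are hypothetical.

```python
import shutil
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into two ints."""
    used, total = (int(field) for field in line.strip().split(","))
    return used, total


def free_mib(line):
    """MiB not currently in use, from one 'used, total' line."""
    used, total = parse_memory_line(line)
    return total - used


def query_first_gpu():
    """Return the 'used, total' line for GPU 0, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if result.returncode == 0 and lines else None


if __name__ == "__main__":
    line = query_first_gpu()
    if line is None:
        print("nvidia-smi not available")
    else:
        print(f"free VRAM: {free_mib(line)} MiB")
```

If the free figure is well below the card's total before ComfyUI starts, another process is holding VRAM and is worth closing first.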

ComfyUI doesn't recognize the Krea 2 nodes

Native Krea 2 support requires ComfyUI 0.25.0 or newer per the AlperKTS/Krea2_FP8 model card. Update via ComfyUI Manager → "Update ComfyUI" → restart. No custom nodes are needed; if you installed a third-party "Krea" node pack, remove it to avoid conflicts.

I downloaded "NVFP4/INT8/Base" weights but they look like Turbo

The Winnougan/Krea-2-Base-Turbo-NVFP4-FP8-INT8 repo's name is misleading — every file in it is a Turbo quant (Krea2_Turbo_*: FP8, INT8, convrot-INT8, MXFP8, and NVFP4, ranging 6.76–12.39 GiB). The NVFP4 and INT8 files do exist, but they are quantizations of Turbo, not of the Base/Raw model — there is no Raw weight in the repo despite the "Base" in its name. For Turbo, prefer the documented AlperKTS/Krea2_FP8 repo (it ships the native workflow JSONs); for Raw, see "The Raw quality tier" above.

Common Questions
How much VRAM does Krea 2 need?

About 16 GB — the minimum this recipe targets.

Which GPUs is Krea 2 tested on?

RTX 5070 Ti (16 GB).

How hard is this setup?

Intermediate — follow the steps above.