What You'll Build
A local install of Krea 2 — Krea AI's from-scratch aesthetic-first text-to-image foundation model (released 2026-06-23) — on a 32GB RTX 5090. Two configurations:
- Krea 2 Turbo (FP8) — the distilled, few-step variant, running 8-step text-to-image at up to 1280×720 inside native ComfyUI with no custom nodes. The 12.01 GiB FP8 transformer leaves the 5090 with ~20GB of headroom, so Turbo is effectively unconstrained here.
- Krea 2 Raw / Base (full BF16) — the undistilled, full-quality tier. Unlike a 16GB card, the RTX 5090's 32GB fits the 24.76 GiB BF16 Raw transformer resident (encoder run sequentially), so you get maximum quality without any FP8 quantization or CPU offload. See "The Raw quality tier" below.
Hardware data: RTX 5090 (32GB VRAM) · Krea 2 Turbo FP8, 8 steps at 1280×720 (Raw BF16 also resident) · See benchmark data
ℹ️ This is Krea 2, not FLUX.1-Krea-dev. Krea 2 is Krea AI's own from-scratch ~12.9B-parameter DiT released 2026-06-23, a different model from the 2025 black-forest-labs/FLUX.1-Krea-dev (a BFL×Krea collaboration built on FLUX). Don't mix their weights, sizes, or workflows.
⚠️ Two variants — and on 32GB both fit. Krea 2 ships as Turbo (distilled, 8 steps, ComfyUI-native today — this recipe's lead) and Raw / Base (undistilled, CFG, 52 steps, 24.76 GiB BF16). On a 16GB card Raw needs FP8 casting or CPU offload; on the RTX 5090's 32GB the BF16 Raw transformer fits resident at full precision. The only day-zero catch is toolchain, not memory — see "The Raw quality tier" below. Pin the variant before you download.
ℹ️ Where the weights come from. Krea published the official weights as gated repos under its verified org: krea/Krea-2-Raw and krea/Krea-2-Turbo (access-restricted; license approval required). An ungated community mirror of the same raw.safetensors + turbo.safetensors (plus the reference inference.py) is the practical download today: the krea-community/krea-2 bucket. Neither is a ComfyUI checkpoint; the ComfyUI-loadable FP8 Turbo build used in this recipe is a community conversion of the official Turbo weights. Model identity and license come from krea.ai; read the license before any commercial use (see Requirements).
Requirements
| Component | Minimum (Turbo) | Tested |
|---|---|---|
| GPU | 16GB VRAM consumer card | RTX 5090 (32GB, Blackwell GB202, sm_120) |
| RAM | 16GB system RAM (32GB+ comfortable, more for Raw) | — |
| Storage | ~18GB Turbo (12.01 GiB FP8 transformer + 4.88 GiB FP8 text encoder + 0.24 GiB VAE ≈ 17.1 GiB); +~30GB for the full BF16 Raw weights | — |
| Software | ComfyUI 0.25.0+, PyTorch cu128 (sm_120) | ComfyUI native Krea2 nodes |
The FP8 Turbo build is documented as runnable on "standard consumer hardware (such as 16GB and 24GB GPUs)" per the AlperKTS/Krea2_FP8 model card. The RTX 5090's 32GB clears that target with room to spare, and is the smallest consumer card on which full BF16 Raw fits resident.
Licensing — read before commercial use. Krea 2 is released under the Krea 2 Community License. Key terms: you own the Outputs you generate; commercial use is free only if your company's total annual revenue is under $1,000,000 USD (above that requires an Enterprise License); any derivative AI model name must begin with "Krea"; you must implement reasonable content-filtering; and you may not circumvent or remove the model's content-provenance or watermarking mechanisms.
Installation
1. Install / update ComfyUI to 0.25.0+
ComfyUI 0.25.0 and newer have built-in Krea 2 support — no custom nodes needed, per the AlperKTS/Krea2_FP8 model card. Update via ComfyUI Manager → "Update ComfyUI", or pull the latest and reinstall requirements:
cd ComfyUI
git pull
pip install -r requirements.txt
Blackwell (RTX 50-series) note: The RTX 5090 is sm_120 and needs a CUDA 12.8+ PyTorch build. If torch.cuda.is_available() is False or you see "no kernel image" errors, reinstall the cu128 wheel:
pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu128
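Before reinstalling anything, it can help to confirm which case you are in. The following is a minimal sketch, not part of the recipe's tooling: the helper names `parse_cuda_version` and `sm120_build_ok` are made up for illustration, and the only rule it encodes is the one stated above (sm_120 needs a CUDA 12.8+ build). It reads the GPU's compute capability and the CUDA version the installed PyTorch was built against, and degrades gracefully if PyTorch is missing.

```python
MIN_CUDA_FOR_SM120 = (12, 8)  # from the note above: sm_120 needs a CUDA 12.8+ build


def parse_cuda_version(version):
    """Turn a string such as '12.8' into (12, 8); None (CPU-only build) becomes (0, 0)."""
    if not version:
        return (0, 0)
    parts = [int(p) for p in version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def sm120_build_ok(capability, cuda_version):
    """Hypothetical helper: apply only the sm_120 rule from this recipe.

    Cards below compute capability 12.0 are reported as OK because the
    rule simply does not apply to them; this is not a general compatibility check.
    """
    if tuple(capability) < (12, 0):
        return True
    return parse_cuda_version(cuda_version) >= MIN_CUDA_FOR_SM120


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        built_for = torch.version.cuda  # None on CPU-only wheels
        if not torch.cuda.is_available():
            print(f"CUDA not available (wheel built for CUDA: {built_for})")
        else:
            cap = torch.cuda.get_device_capability(0)
            print(f"capability={cap} cuda={built_for} ok={sm120_build_ok(cap, built_for)}")
```

On a 5090 with a cu126 wheel this should print `ok=False`, which is the cue to reinstall from the cu128 index.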
2. Download the three model files (Turbo)
Place each file in the indicated ComfyUI/models/ subfolder. File-to-folder mapping and sources are from the AlperKTS/Krea2_FP8 model card:
# from your ComfyUI root
# FP8 Turbo diffusion model (12.01 GiB) → unet/
cd models/unet
wget https://huggingface.co/AlperKTS/Krea2_FP8/resolve/main/krea2_turbo_fp8.safetensors
# FP8-scaled Qwen3-VL 4B text encoder (4.88 GiB) → text_encoders/
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen3-VL/resolve/main/text_encoders/qwen3vl_4b_fp8_scaled.safetensors
# Qwen-Image VAE (242 MiB) → vae/
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors
Krea 2's text encoder is Qwen/Qwen3-VL-4B-Instruct and its VAE is the Qwen-Image autoencoder (AutoencoderKLQwenImage, f8, 16 latent channels), per the Krea-2-Base-Diffusers model card. The Comfy-Org repackaged files above are the ComfyUI-loader-compatible versions of those two components.
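A truncated download is an easy mistake with multi-gigabyte files, so a quick sanity check after step 2 can save a confusing load error later. This sketch assumes the folder layout and approximate sizes listed in this recipe; the `check_models` helper and its 5% tolerance are illustrative choices, and a size match is not a substitute for a checksum.

```python
from pathlib import Path

GIB = 1024 ** 3

# Relative paths and approximate sizes (GiB) taken from the download step in this recipe.
EXPECTED = {
    "unet/krea2_turbo_fp8.safetensors": 12.01,
    "text_encoders/qwen3vl_4b_fp8_scaled.safetensors": 4.88,
    "vae/qwen_image_vae.safetensors": 0.24,
}


def check_models(models_dir, tolerance=0.05):
    """Return a list of problems; an empty list means every file exists at roughly the expected size."""
    problems = []
    for rel_path, expected_gib in EXPECTED.items():
        path = Path(models_dir) / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual_gib = path.stat().st_size / GIB
        # Allow a small relative tolerance because the listed sizes are rounded.
        if abs(actual_gib - expected_gib) > tolerance * expected_gib:
            problems.append(
                f"size mismatch: {rel_path} is {actual_gib:.2f} GiB, expected ~{expected_gib} GiB"
            )
    return problems


if __name__ == "__main__":
    # Assumes the script is run from the ComfyUI root.
    for line in check_models("models") or ["all three Turbo files look right"]:
        print(line)
```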
32GB note: you can use the full-precision encoder for Turbo. On a 16GB card the FP8-scaled encoder is mandatory to keep the encode stage light. On the 5090 you have the headroom to use the full BF16 Qwen/Qwen3-VL-4B-Instruct encoder (~8 GiB) alongside the 12.01 GiB Turbo transformer if you prefer a touch more prompt fidelity; with both resident the total is ~20GB, well inside 32GB. The FP8 encoder above is still the simplest drop-in.
3. Load the workflow
The FP8 repo ships native ComfyUI workflow JSONs. Drag workflows/Krea 2 simple workflow.json (or krea2_native_workflow.json) from the AlperKTS/Krea2_FP8 repo onto your ComfyUI canvas. The workflow wires the unet/, text_encoders/, and vae/ files into the native Krea 2 sampler graph.
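If a loader node shows a red "file not found" state after you drop the workflow in, the usual cause is a filename in the JSON that differs from what you downloaded. The sketch below lists every `.safetensors` name a workflow refers to. It assumes the UI-format export, where the top-level object has a `nodes` list and each node carries `type` and `widgets_values`; the helper names are hypothetical, and an API-format export would need a different walk.

```python
import json


def load_workflow(path):
    """Read a workflow JSON file from disk."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def referenced_weights(workflow):
    """Collect (node type, filename) pairs for every widget value ending in .safetensors."""
    found = []
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values") or []
        if isinstance(values, dict):  # some nodes store widgets as a mapping
            values = list(values.values())
        for value in values:
            if isinstance(value, str) and value.endswith(".safetensors"):
                found.append((node.get("type", "?"), value))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python list_weights.py "Krea 2 simple workflow.json"
    for node_type, filename in referenced_weights(load_workflow(sys.argv[1])):
        print(f"{node_type}: {filename}")
```

Compare the printed names against the files in models/unet, models/text_encoders, and models/vae.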
Running
Edit the prompt node and click Queue Prompt. The Turbo defaults shipped in the workflow, per the AlperKTS/Krea2_FP8 model card, are:
- Steps: 8
- CFG: 1.0
- Sampler: er_sde
- Scheduler: simple
- Resolution: 1280×720
On the 5090 the 12.01 GiB Turbo transformer plus the VAE and activations leave roughly 20GB free, so you can comfortably raise the resolution or batch multiple images per queue without approaching the memory ceiling. Output PNGs land in ComfyUI/output/.
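To get a feel for how resolution and batch size scale the work, you can compute the latent tensor size. This is a rough sketch using the f8 downscale factor and 16 latent channels quoted in this recipe; the `(batch, channels, height, width)` layout and the helper names are illustrative assumptions, and the real pipeline may add extra dimensions or patchify further.

```python
VAE_DOWNSCALE = 8     # "f8" autoencoder, per the model card cited in this recipe
LATENT_CHANNELS = 16  # 16 latent channels, same source


def latent_shape(width, height, batch_size=1):
    """Return an assumed (batch, channels, latent_height, latent_width) shape."""
    if width % VAE_DOWNSCALE or height % VAE_DOWNSCALE:
        raise ValueError(f"width and height must be multiples of {VAE_DOWNSCALE}")
    return (batch_size, LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


def latent_elements(width, height, batch_size=1):
    """Total number of latent values, a crude proxy for how activation memory scales."""
    total = 1
    for dim in latent_shape(width, height, batch_size):
        total *= dim
    return total


if __name__ == "__main__":
    base = latent_elements(1280, 720)
    for w, h, b in [(1280, 720, 1), (1920, 1080, 1), (1280, 720, 4)]:
        ratio = latent_elements(w, h, b) / base
        print(f"{w}x{h} batch {b}: shape {latent_shape(w, h, b)}, {ratio:.2f}x the default")
```

The default 1280×720 maps to a 160×90 latent grid; 1920×1080 is 2.25 times as many latent values, and a batch of four at the default size is 4 times as many.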
Tip — natural-language prompts. Krea 2 is prompted in natural language; long, detailed descriptions yield the best results, and words to be rendered as text in the image are wrapped in quotes (per the Krea-2-Base-Diffusers model card).
The Raw quality tier (full-quality — and it fits resident on 32GB)
Krea 2 Raw / Base is the undistilled foundation checkpoint — no step or guidance distillation, run with classifier-free guidance. Its recommended settings are 52 steps, CFG 3.5, up to 1024×1024, which trades much longer generation time for maximum diversity and malleability (it is also the checkpoint intended for LoRA training — LoRAs trained on Base apply cleanly to Turbo).
On the RTX 5090, Raw is a memory non-issue — it fits resident at full BF16. Raw ships at full precision as the official single-file raw.safetensors (26.6 GB BF16) in the krea-community/krea-2 bucket, and as the equivalent 24.76 GiB BF16 diffusers shards at CalamitousFelicitousness/Krea-2-Base-Diffusers (six diffusion_pytorch_model-0000X-of-00006.safetensors files, verified totalling 26,585,322,200 bytes). The 24.76 GiB transformer sits inside the 5090's 32GB with headroom for the VAE and activations, as long as the text encoder is run-then-freed (sequential encode) rather than held alongside it. No FP8 cast, no CPU offload, no quality compromise — the exact opposite of the 16GB story.
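The fit argument above is simple arithmetic, sketched here so you can plug in your own numbers. The byte count and the ~8 GiB BF16 encoder figure come from this recipe; the 2 GiB allowance for VAE plus activations is a placeholder assumption, not a measurement, and the card is treated as a flat 32 GiB even though usable VRAM is somewhat lower in practice.

```python
GIB = 1024 ** 3

RAW_TRANSFORMER_BYTES = 26_585_322_200  # six BF16 shards, as listed in this recipe
ENCODER_BF16_GIB = 8.0                  # approximate BF16 text encoder size from this recipe
OTHER_GIB = 2.0                         # ASSUMPTION: VAE + activations, placeholder only
CARD_GIB = 32.0                         # ASSUMPTION: nominal capacity, not usable VRAM


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / GIB


def peak_gib(transformer, encoder, other, sequential):
    """Estimate peak VRAM in GiB.

    Sequential encode: the encoder is freed before the transformer loads, so the
    peak is whichever stage is larger. Co-resident: everything is held at once.
    """
    if sequential:
        return max(encoder, transformer + other)
    return transformer + encoder + other


if __name__ == "__main__":
    transformer = bytes_to_gib(RAW_TRANSFORMER_BYTES)
    for sequential in (True, False):
        peak = peak_gib(transformer, ENCODER_BF16_GIB, OTHER_GIB, sequential)
        label = "sequential" if sequential else "co-resident"
        print(f"{label}: {peak:.2f} GiB, fits in {CARD_GIB:.0f} GiB: {peak <= CARD_GIB}")
```

Under these assumptions the sequential peak is about 26.76 GiB and the co-resident total is about 34.76 GiB, which is why the run-then-free encoder matters even on 32GB.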
The one day-zero caveat is toolchain, not memory: the official Raw weights are not yet a ComfyUI single-file checkpoint (raw.safetensors is bound to Krea's reference inference.py, and the diffusers release is multi-shard), so the turnkey ComfyUI drag-and-drop path is Turbo for now. Two ways to run full Raw on the 5090 today:
- Diffusers reference pipeline (works today, full BF16). Load the BF16 Raw weights through diffusers / Krea's reference inference.py (Krea2Pipeline). With 32GB you can keep the transformer resident on the GPU at full precision; you do not need enable_model_cpu_offload() the way a 16GB card would. At most, run the encoder sequentially so it isn't held alongside the transformer. This is the full-quality, no-compromise path, and the 5090 is the smallest consumer card where it's comfortable.
- ComfyUI native Raw (once a single-file Raw checkpoint lands). When a ComfyUI-format single-file Raw checkpoint is published, drop it into models/unet/ and load it with the Load Diffusion Model node at its native BF16 weight_dtype. On the 5090 there's no need to set the fp8 weight_dtype that a 16GB card would require (the official ComfyUI examples document that fp8 weight_dtype is a memory-saving knob with a small possible quality cost, unnecessary here). The fp8 launch flag --fp8_e4m3fn-unet (help text in ComfyUI's cli_args.py: "Store unet weights in fp8_e4m3fn.") is likewise something you can leave off on a 5090.
Note that Winnougan/Krea-2-Base-Turbo-NVFP4-FP8-INT8 — despite its name — ships only Turbo quantizations (Krea2_Turbo_*: FP8 12.01 GiB, INT8 12.02 GiB, convrot-INT8 12.02 GiB, MXFP8 12.39 GiB, NVFP4 6.76 GiB) with no Base/Raw weight at all, so it is not a shortcut to Raw.
Results
- Speed: No community benchmark exists for Krea 2 on the RTX 5090 yet; the /check/krea-2/rtx-5090 endpoint currently returns verdict: unknown with no benchmark rows. Krea AI's vendor materials describe Turbo as a fast few-step model, but no vendor-published figure names this consumer card, so we omit a measured speed here rather than quote a number from different hardware. If you run it, please submit your numbers so they appear on /check/krea-2/rtx-5090.
- VRAM usage: The Turbo FP8 transformer is 12.01 GiB on disk per the AlperKTS/Krea2_FP8 model card; the FP8-scaled text encoder is 4.88 GiB and the VAE 0.24 GiB (verified via the HuggingFace tree API), so the Turbo sampling-stage peak (~12 GiB + VAE + activations) leaves the 5090 ~20GB idle. The full BF16 Raw transformer is 24.76 GiB (verified totalling 26,585,322,200 bytes across six shards), which fits resident on 32GB with the encoder run sequentially. Live measurements will land at /check/krea-2/rtx-5090.
- Quality notes: Turbo is distilled for 8-step CFG-1.0 generation; for maximum fidelity and diversity use the Raw tier (52 steps, CFG 3.5) — which, on this card, runs at full BF16 with no quantization. Architecture is a single-stream DiT, 12.9B parameters, 28 blocks at width 6144, with grouped-query attention and flow-matching sampling, per the Krea-2-Base-Diffusers model card.
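As a rough consistency check on the sizes quoted in this recipe, you can estimate weight size from the parameter count and bytes per parameter. This is back-of-envelope arithmetic only: it assumes every parameter is stored at the same width (1 byte for FP8, 2 bytes for BF16) and ignores file metadata and any tensors kept at a different precision.

```python
GIB = 1024 ** 3

# Bytes per parameter for the two storage formats discussed in this recipe.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weights_gib(n_params, dtype):
    """Estimate on-disk weight size in GiB for a uniform-precision checkpoint."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


if __name__ == "__main__":
    n_params = 12.9e9  # approximate parameter count quoted in this recipe
    for dtype in ("fp8", "bf16"):
        print(f"{dtype}: ~{weights_gib(n_params, dtype):.2f} GiB")
```

The FP8 estimate comes out at about 12.01 GiB, matching the Turbo file. The BF16 estimate is about 24.03 GiB, a little under the 24.76 GiB listed for Raw; the sketch does not try to account for that difference, and the parameter count itself is approximate.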
For the full benchmark data, see /check/krea-2/rtx-5090.
Troubleshooting
"No kernel image is available" / CUDA error on first generation (Blackwell)
The RTX 5090 is Blackwell sm_120 and needs a CUDA 12.8+ PyTorch build. A cu126 (or older) wheel will load but crash at the first kernel launch. Reinstall the cu128 wheel:
pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu128
ComfyUI's native sampling uses PyTorch SDPA, so you do not need FlashAttention for this recipe — which is helpful because prebuilt FlashAttention-2 wheels still lag for sm_120. Skip any pip install flash-attn step; it is not required for the ComfyUI Krea 2 path.
Out of memory during sampling
With 32GB, Turbo OOM is essentially a non-issue — the 12.01 GiB FP8 transformer leaves wide headroom, so this only bites at extreme resolutions or large batch sizes. If you hit it (or another app is holding VRAM), close other GPU apps and lower the resolution or batch size. For full BF16 Raw, the limiting factor is keeping the 24.76 GiB transformer resident without also holding the BF16 text encoder (~8 GiB) — run the encoder sequentially (encode, then free) so the two are not co-resident.
ComfyUI doesn't recognize the Krea 2 nodes
Native Krea 2 support requires ComfyUI 0.25.0 or newer per the AlperKTS/Krea2_FP8 model card. Update via ComfyUI Manager → "Update ComfyUI" → restart. No custom nodes are needed; if you installed a third-party "Krea" node pack, remove it to avoid conflicts.
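If you script your setup, compare versions as numbers rather than strings, because a plain string comparison ranks "0.3.40" above "0.25.0". A minimal sketch follows; the 0.25.0 threshold is the one quoted above, the helper names are hypothetical, and how you obtain the installed version string (for example from the ComfyUI interface) is left to you.

```python
import re

MINIMUM_VERSION = (0, 25, 0)  # native Krea 2 support threshold quoted in this recipe


def parse_version(text):
    """Parse 'v0.25.0', '0.25.0', or '0.25' into a tuple of three integers."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def supports_krea2(version_text):
    """True if the given ComfyUI version meets the documented minimum."""
    return parse_version(version_text) >= MINIMUM_VERSION


if __name__ == "__main__":
    for candidate in ("0.3.40", "0.24.9", "0.25.0", "v0.26.1"):
        print(candidate, supports_krea2(candidate))
```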
I downloaded "NVFP4/INT8/Base" weights but they look like Turbo
The Winnougan/Krea-2-Base-Turbo-NVFP4-FP8-INT8 repo's name is misleading — every file in it is a Turbo quant (Krea2_Turbo_*: FP8, INT8, convrot-INT8, MXFP8, and NVFP4, ranging 6.76–12.39 GiB). The NVFP4 and INT8 files do exist, but they are quantizations of Turbo, not of the Base/Raw model — there is no Raw weight in the repo despite the "Base" in its name. For Turbo, prefer the documented AlperKTS/Krea2_FP8 repo (it ships the native workflow JSONs); for Raw, see "The Raw quality tier" above.