How much memory does Krea 2 need on a Mac?

About 24 GB of unified memory (roughly 16 GB GPU-addressable), the minimum this recipe targets.

How hard is this setup?

Intermediate: follow the steps below.

Krea 2 Turbo on Apple M2 Max: 8-Step Text-to-Image in Unified Memory via ComfyUI (MPS)

What You'll Build

A local install of Krea 2 Turbo, the distilled few-step variant of Krea AI's from-scratch, aesthetic-first text-to-image foundation model (released 2026-06-23), generating 8-step text-to-image at up to 1280×720 on an Apple M2 Max (64 GB unified memory) inside ComfyUI on the Metal/MPS backend. ComfyUI 0.25.0+ has built-in Krea 2 support and runs on Apple Silicon, so the model is fully usable on a Mac. This recipe leads with the community GGUF build (loaded through the ComfyUI-GGUF node), which sidesteps the precision pitfalls of MPS and is light on memory. The M2 Max's large unified memory makes fit a non-issue: the constraint that binds discrete GPUs (where Krea 2's BF16 transformer overflows 24 GB cards) simply doesn't apply here.

Hardware data: Apple M2 Max (64 GB unified memory) · Krea 2 Turbo GGUF in ComfyUI/MPS, 8 steps at 1280×720 · benchmark data: /check/krea-2/m2-max

ℹ️ This is Krea 2, not FLUX.1-Krea-dev. Krea 2 is Krea AI's own from-scratch ~12.9B-parameter DiT released 2026-06-23 — a different model from the 2025 black-forest-labs/FLUX.1-Krea-dev (a BFL×Krea collaboration built on FLUX). Don't mix their weights, sizes, or workflows.

ℹ️ Unified memory is not VRAM. The M2 Max has 64 GB of unified memory shared by CPU and GPU — not 64 GB of dedicated VRAM. macOS lets the GPU address only a fraction by default — about 75% on a ≥64 GB Mac via Metal's recommendedMaxWorkingSetSize, so plan against ~48 GB addressable. That is far more than this recipe needs: the lead GGUF Turbo build is ~13.7 GB, and even the full BF16 Raw transformer (24.76 GiB) fits comfortably — so the M2 Max is the roomiest Krea 2 target in this catalog.
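The ~48 GB figure follows from a simple rule of thumb. The POSIX sh sketch below reproduces the two data points quoted in this recipe (64 GB → ~48 GB, 24 GB → ~16 GB). The 75% / two-thirds split is an assumption fitted to those two points, not Metal's actual formula, because macOS chooses recommendedMaxWorkingSetSize itself at runtime. The helper name addressable_gb is hypothetical.

```sh
#!/bin/sh
# Rough GPU-addressable budget on an Apple Silicon Mac.
# ASSUMPTION: ~75% at >= 64 GB and ~2/3 below, fitted to this recipe's two
# data points; the real cap is Metal's recommendedMaxWorkingSetSize.
addressable_gb() {  # usage: addressable_gb <total_unified_gb>
  if [ "$1" -ge 64 ]; then
    echo $(( $1 * 3 / 4 ))
  else
    echo $(( $1 * 2 / 3 ))
  fi
}

lead_gguf_gb=13.71   # Q8_0 Turbo file size quoted in this recipe
for mem in 24 64; do
  echo "${mem} GB unified -> ~$(addressable_gb "$mem") GB addressable (Q8_0 weights: ${lead_gguf_gb} GB)"
done
```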

⚠️ On a Mac, the path is ComfyUI on Metal (MPS), because the MLX tools don't support Krea 2 yet. The most-maintained Apple-native image path, mflux, does not list Krea 2 among its supported models, and Draw Things has no confirmed Krea 2 build. ComfyUI running on PyTorch's MPS backend is the working local route. It is slower than a CUDA box (no torch.compile, and some ops fall back to CPU), and bf16 weights frequently break on MPS, which is exactly why this recipe leads with the GGUF build (it dequantizes cleanly) rather than the raw BF16/FP8 safetensors. Set PYTORCH_ENABLE_MPS_FALLBACK=1 before launching.

ℹ️ Where the weights come from. Krea published the official weights as gated repos under its verified org (krea/Krea-2-Raw, krea/Krea-2-Turbo). An ungated community mirror of the official turbo.safetensors is the krea-community/krea-2 bucket. The GGUF quants used here are a community conversion of the official Turbo weights, at vantagewithai/Krea-2-Turbo-GGUF. Model identity and license come from krea.ai; read the license before any commercial use (see Requirements).

Requirements

Component	Minimum	Tested
GPU / memory	24 GB unified memory (~16 GB addressable — fits GGUF Turbo)	Apple M2 Max (64 GB unified, ~48 GB addressable)
OS	macOS Sonoma 14 / Sequoia 15+	macOS Sequoia 15
Storage	~23 GB (Q8_0 GGUF 13.71 GB + BF16 encoder 8.26 GiB ≈ 8.9 GB + VAE 242 MiB ≈ 0.25 GB)	—
Software	ComfyUI 0.25.0+ · ComfyUI-GGUF (city96) · PyTorch nightly w/ MPS	ComfyUI native Krea2 + UnetLoaderGGUF on MPS
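The Storage row mixes GB, GiB and MiB, which makes the total easy to misread. This POSIX sh sketch (awk does the floating-point math) converts each file to bytes using GB = 10^9, GiB = 2^30 and MiB = 2^20, then sums the three downloads for the Q8_0 setup. The sizes are the ones quoted in this recipe; storage_total_gb is a hypothetical helper name.

```sh
#!/bin/sh
# Sum the download sizes for the Q8_0 recipe, reported in decimal GB.
storage_total_gb() {
  awk 'BEGIN {
    unet = 13.71 * 1e9   # krea2_turbo-Q8_0.gguf, 13.71 GB (decimal)
    enc  = 8.26 * 2^30   # qwen3vl_4b_bf16.safetensors, 8.26 GiB (binary)
    vae  = 242 * 2^20    # qwen_image_vae.safetensors, 242 MiB (binary)
    printf "%.2f\n", (unet + enc + vae) / 1e9
  }'
}
echo "Q8_0 setup needs ~$(storage_total_gb) GB of disk"
```

This prints about 22.83 GB. Swapping in a lighter tier changes only the first term; Q4_K_M (7.49 GB), for example, brings the total to roughly 16.6 GB.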

The binding constraint on Apple Silicon is addressable unified memory, not raw capacity — but for Krea 2 that constraint is slack on a 64 GB Mac (~48 GB addressable vs a ~13.7 GB GGUF build). ComfyUI 0.25.0+ has built-in Krea 2 support, confirmed by ComfyUI's Krea 2 announcement, and runs on Apple Silicon via PyTorch's MPS backend (Apple "Accelerated PyTorch on Mac").

Licensing — read before commercial use. Krea 2 is released under the Krea 2 Community License. Key terms: you own the Outputs you generate; commercial use is free only if your company's total annual revenue is under $1,000,000 USD (above that requires an Enterprise License); any derivative AI model name must begin with "Krea"; you must implement reasonable content-filtering; and you may not circumvent or remove the model's content-provenance or watermarking mechanisms.

Installation

1. Install ComfyUI with the MPS (Metal) PyTorch backend

On Apple Silicon, ComfyUI runs on PyTorch's MPS backend (Metal) — there is nothing CUDA-shaped to install (no cu12x wheel index, no FlashAttention, no bitsandbytes). Install a PyTorch nightly with MPS per Apple's "Accelerated PyTorch on Mac" guide, then ComfyUI:

# inside a Python 3.10+ venv
# the nightly "cpu" index is where PyTorch publishes its macOS arm64 wheels; they include MPS
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
git clone https://github.com/Comfy-Org/ComfyUI && cd ComfyUI
pip install -r requirements.txt
export PYTORCH_ENABLE_MPS_FALLBACK=1   # let unimplemented ops fall back to CPU
python main.py

ComfyUI auto-detects the Apple GPU through MPS. The PYTORCH_ENABLE_MPS_FALLBACK=1 flag is important — a few diffusion ops still lack a Metal kernel and would otherwise error.
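A quick post-install check can catch two easy-to-miss problems: a Python interpreter running under Rosetta (x86_64) and a PyTorch build without MPS. This is a minimal POSIX sh sketch. torch.backends.mps.is_built(), torch.backends.mps.is_available() and Python's platform.machine() are real calls; the helper names check_platform and check_mps are hypothetical.

```sh
#!/bin/sh
# Post-install sanity check; run inside the ComfyUI venv.
check_platform() {  # optional args make it testable: check_platform <os> <arch>
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  if [ "$os" = Darwin ] && [ "$arch" = arm64 ]; then
    echo "platform: Apple Silicon"
  else
    echo "platform: $os/$arch (not Apple Silicon; the MPS route does not apply)"
    return 1
  fi
}

check_mps() {
  # Reports the interpreter's architecture (x86_64 means Rosetta) and MPS status;
  # prints a fallback message if torch cannot be imported at all.
  python3 - <<'EOF' 2>/dev/null || echo "mps: torch not importable in this environment"
import platform
import torch
print("python arch:", platform.machine())
print("mps built:", torch.backends.mps.is_built())
print("mps available:", torch.backends.mps.is_available())
EOF
}

echo "PYTORCH_ENABLE_MPS_FALLBACK=${PYTORCH_ENABLE_MPS_FALLBACK:-unset}"
```

Run `check_platform && check_mps` from the ComfyUI venv; on a working M2 Max setup the Python architecture should read arm64 and both MPS lines True.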

2. Install the ComfyUI-GGUF custom node

The GGUF diffusion model loads through ComfyUI-GGUF (city96). Install it via ComfyUI Manager ("Install Custom Nodes" → search "GGUF"), or by hand:

cd ComfyUI/custom_nodes   # run from the folder that contains your ComfyUI checkout
git clone https://github.com/city96/ComfyUI-GGUF
pip install --upgrade gguf   # with the ComfyUI venv active

Restart ComfyUI; you should now have the Unet Loader (GGUF) node.
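If the node does not appear, the usual suspects are a clone in the wrong folder or the gguf package installed into a different environment. A minimal POSIX sh sketch that checks both, assuming its first argument is your ComfyUI root (the folder containing main.py); check_gguf_node is a hypothetical helper.

```sh
#!/bin/sh
# Confirm ComfyUI-GGUF sits where ComfyUI loads custom nodes and that the
# gguf Python package imports in the active environment.
check_gguf_node() {  # usage: check_gguf_node [comfyui_root]
  local root="${1:-.}" rc=0
  if [ -d "$root/custom_nodes/ComfyUI-GGUF" ]; then
    echo "ComfyUI-GGUF: found"
  else
    echo "ComfyUI-GGUF: missing (clone it into $root/custom_nodes/)"
    rc=1
  fi
  if python3 -c "import gguf" 2>/dev/null; then
    echo "gguf package: importable"
  else
    echo "gguf package: missing (pip install --upgrade gguf)"
    rc=1
  fi
  return "$rc"
}
```

For example, `check_gguf_node ~/ComfyUI` (the path is illustrative); a non-zero exit status means at least one line reported "missing".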

3. Download the model files

Pick one GGUF tier (Q8_0 is the lead — it fits easily here), plus the text encoder and VAE. File-to-folder mapping follows the vantagewithai/Krea-2-Turbo-GGUF workflow:

# from your ComfyUI root; macOS ships curl rather than wget
# (-f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name)

# GGUF diffusion model (Q8_0 = 13.71 GB, near-BF16 quality) → unet/
cd models/unet
curl -fLO https://huggingface.co/vantagewithai/Krea-2-Turbo-GGUF/resolve/main/krea2_turbo-Q8_0.gguf

# Qwen3-VL 4B text encoder, BF16 (8.26 GiB) → text_encoders/
cd ../text_encoders
curl -fLO https://huggingface.co/Comfy-Org/Qwen3-VL/resolve/main/text_encoders/qwen3vl_4b_bf16.safetensors

# Qwen-Image VAE (242 MiB) → vae/
cd ../vae
curl -fLO https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors

Krea 2's text encoder is Qwen/Qwen3-VL-4B-Instruct and its VAE is the Qwen-Image autoencoder (AutoencoderKLQwenImage, f8, 16 latent channels), per the Krea-2-Base-Diffusers model card. There is a single VAE file, and it does not change with the GGUF tier you pick. ComfyUI runs the encoder and VAE in fp16 on MPS; if you hit a bf16-related MPS error on the encoder, that is the expected MPS limitation (see Troubleshooting).
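Before loading the workflow, a quick check that each file landed in the folder ComfyUI scans can save debugging later. This POSIX sh sketch assumes the file names from the download commands above; verify_models and its optional second argument (the name of the .gguf you actually downloaded) are hypothetical. A file of only a few KB usually means an error page was saved instead of the weights, so compare the reported byte counts with the sizes listed above.

```sh
#!/bin/sh
# Check that the three model files exist, are non-empty, and sit in the
# folders ComfyUI scans. $1 = ComfyUI root, $2 = GGUF file name (default Q8_0).
verify_models() {  # usage: verify_models [comfyui_root] [gguf_file_name]
  local root="${1:-.}" gguf="${2:-krea2_turbo-Q8_0.gguf}" rc=0 f
  for f in \
    "models/unet/$gguf" \
    models/text_encoders/qwen3vl_4b_bf16.safetensors \
    models/vae/qwen_image_vae.safetensors; do
    if [ -s "$root/$f" ]; then
      # wc -c behaves the same on macOS and Linux (stat flags differ)
      printf '%-50s %s bytes\n' "$f" "$(wc -c < "$root/$f" | tr -d ' ')"
    else
      printf '%-50s MISSING or empty\n' "$f"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `verify_models ~/ComfyUI` (the path is illustrative).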

4. Load the workflow

Drag the Vantage_Krea-2-Turbo.json workflow onto the ComfyUI canvas. Set the Unet Loader (GGUF) node to the .gguf tier you downloaded, the encoder loader to qwen3vl_4b_bf16.safetensors, and the VAE to qwen_image_vae.safetensors.

Running

Edit the prompt node and click Queue Prompt. The Turbo defaults baked into the workflow are:

Steps: 8
CFG: 1.0
Sampler: er_sde
Scheduler: simple
Resolution: 1280×720

ComfyUI runs the Qwen3-VL encoder to encode your prompt, then frees it before the diffusion sampling stage — so the encoder and the GGUF transformer are not both resident at peak. On the M2 Max's ~48 GB addressable pool there is wide headroom at every tier. Output PNGs land in ComfyUI/output/.
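The "wide headroom at every tier" claim can be made concrete with back-of-envelope arithmetic. In this POSIX sh sketch (awk does the math), the per-tier file sizes and the 242 MiB VAE come from this recipe and the encoder is assumed already freed as described above. The act_gb allowance for activations plus ComfyUI overhead is a placeholder guess, not a measurement; peak_table is a hypothetical helper.

```sh
#!/bin/sh
# Back-of-envelope sampling-stage peak per GGUF tier against an addressable budget.
# ASSUMPTIONS: text encoder already freed; act_gb is a placeholder, not measured.
peak_table() {  # usage: peak_table [act_gb] [budget_gb]
  awk -v act="${1:-4}" -v budget="${2:-48}" 'BEGIN {
    vae = 242 * 2^20 / 1e9   # Qwen-Image VAE: 242 MiB expressed in GB
    n = split("Q8_0 Q6_K Q5_K_M Q4_K_M Q3_K_M Q2_K", tier, " ")
    split("13.71 10.58 8.87 7.49 6.01 4.89", size, " ")
    for (i = 1; i <= n; i++) {
      peak = size[i] + vae + act
      printf "%-7s peak ~%.1f GB, headroom ~%.1f GB\n", tier[i], peak, budget - peak
    }
  }'
}
peak_table
```

With the default 4 GB placeholder, even Q8_0 leaves roughly 30 GB of the ~48 GB pool unused; rerun with a larger first argument if your own measurements show bigger activations.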

GGUF quant tiers (byte sizes verified via the HuggingFace tree API; on a 64 GB Mac you can run any of them — Q8_0 is the quality lead):

Tier	File size	Notes
Q8_0	13.71 GB	Near-BF16 quality — lead
Q6_K	10.58 GB	Excellent quality, lighter
Q5_K_M	8.87 GB	Strong quality/size balance
Q4_K_M	7.49 GB	Lighter/faster
Q3_K_M	6.01 GB	Visible degradation
Q2_K	4.89 GB	Lowest tier
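The tier sizes line up with GGUF's block formats. Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale (34 bytes, so 8.5 bits per weight), and Q6_K packs 256 weights into 210 bytes (6.5625 bits per weight). This POSIX sh sketch divides each file size by the 12.9B parameter count from the model card; the _K_M tiers mix several quant types, so their figures are blends. File metadata and any tensors kept at higher precision make the results approximate, and effective_bpw is a hypothetical helper.

```sh
#!/bin/sh
# Effective bits per weight = file bytes * 8 / parameter count.
# ASSUMPTIONS: 12.9B parameters (model card); GB = 10^9 bytes.
effective_bpw() {
  awk 'BEGIN {
    params = 12900000000
    n = split("Q8_0 Q6_K Q5_K_M Q4_K_M Q3_K_M Q2_K", tier, " ")
    split("13.71 10.58 8.87 7.49 6.01 4.89", size, " ")
    for (i = 1; i <= n; i++)
      printf "%-7s %.2f bits/weight\n", tier[i], size[i] * 1e9 * 8 / params
  }'
}
effective_bpw
```

Q8_0 comes out at about 8.50 and Q6_K at about 6.56, which suggests nearly every weight in those files is stored at the named type.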

Tip — natural-language prompts. Krea 2 is prompted in natural language; long, detailed descriptions yield the best results, and words to be rendered as text in the image are wrapped in quotes (per the Krea-2-Base-Diffusers model card).

The Raw quality tier (full-quality, undistilled)

Krea 2 Raw / Base is the undistilled foundation checkpoint: no step or guidance distillation, run with classifier-free guidance (recommended settings: 52 steps, CFG 3.5, up to 1024×1024). On 24 GB discrete cards the 24.76 GiB BF16 Raw transformer overflows, but the M2 Max's ~48 GB addressable pool fits it comfortably, so on this Mac Raw is genuinely in reach. Run it either as the Raw GGUF (vantagewithai/Krea-2-Raw-GGUF, through the same Unet Loader (GGUF) node; this is the cleaner choice on MPS) or as the full BF16 build through ComfyUI's fp16 path. Expect substantially longer generations: 52 steps instead of 8, and with CFG above 1.0 every step evaluates the transformer twice (once conditioned on the prompt, once unconditioned).
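The Raw slowdown is mostly step arithmetic. This POSIX sh sketch counts transformer evaluations per image under a simplifying assumption: CFG above 1.0 costs two evaluations per step (conditional and unconditional, whether or not they are batched together), CFG 1.0 costs one, and encoder/VAE time and Raw's different resolution are ignored. passes is a hypothetical helper.

```sh
#!/bin/sh
# Transformer evaluations per image: steps * evaluations per step.
passes() {  # usage: passes <steps> <evaluations_per_step>
  echo $(( $1 * $2 ))
}
turbo=$(passes 8 1)   # Turbo: 8 steps at CFG 1.0 -> one evaluation per step
raw=$(passes 52 2)    # Raw: 52 steps at CFG 3.5 -> two evaluations per step
echo "Turbo: $turbo evaluations, Raw: $raw evaluations, ratio ~$(( raw / turbo ))x"
```

By this count Raw does about 13 times the transformer work of Turbo per image.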

Results

Speed: No community benchmark exists for Krea 2 on the M2 Max yet; /check/krea-2/m2-max returns verdict: unknown with no rows. We deliberately omit a seconds-per-image figure: no chip-specific first-party number exists for Krea 2 here, and throughput on Apple Silicon depends on the M2 Max's ~400 GB/s unified-memory bandwidth and its GPU core count (30 or 38 cores, depending on configuration). The ComfyUI-MPS path is also slower than a CUDA box with similar memory (no torch.compile, some ops fall back to CPU), and CUDA/ROCm latency numbers from the NVIDIA/AMD recipes do not carry over to Apple Silicon. If you run it, please submit your numbers so they seed /check/krea-2/m2-max.
Memory usage: The lead Q8_0 GGUF transformer is 13.71 GB on disk; lighter tiers run 4.89–10.58 GB (verified via the HuggingFace tree API). Because ComfyUI frees the ~8 GiB BF16 encoder before sampling, the sampling-stage peak sits near the chosen GGUF tier plus the VAE and activations — a small fraction of the M2 Max's ~48 GB addressable pool. Even BF16 Raw (24.76 GiB) fits. Live measurements will land at /check/krea-2/m2-max.
Quality notes: Q8_0 is near-indistinguishable from BF16; Q6_K/Q5_K_M are strong; below Q4 expect visible degradation. Turbo is distilled for 8-step CFG-1.0 generation; for maximum fidelity use the Raw tier — see above. Architecture is a single-stream DiT, 12.9B parameters, 28 blocks at width 6144, with grouped-query attention and flow-matching sampling, per the Krea-2-Base-Diffusers model card.

For the full benchmark data, see /check/krea-2/m2-max.

Troubleshooting

Tried to install FlashAttention / bitsandbytes / a `cu12x` wheel and it failed

None of those apply on Apple Silicon. There is no CUDA, no FlashAttention, and no GPU bitsandbytes / GPTQ / AWQ / FP8 / NVFP4 kernel on macOS — ComfyUI runs on the MPS (Metal) backend, and the model is quantized via the GGUF file, not --load-in-4bit or an FP8 build. If a generic Krea 2 or diffusers tutorial tells you to pip install flash-attn, pick a cu128 wheel index, or load the FP8 safetensors, skip those steps — the MPS + GGUF path above is the complete Apple route.

`bf16`-style precision errors or black/garbled output on MPS

bf16 weights frequently break on the MPS backend. This recipe avoids that by leading the GGUF build (dequantized cleanly by ComfyUI-GGUF) rather than the BF16/FP8 safetensors. Make sure PYTORCH_ENABLE_MPS_FALLBACK=1 is set so unimplemented ops fall back to CPU. If you insist on the raw BF16 build, run ComfyUI in its fp16 mode; the GGUF path is the more reliable one on Apple.
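If you do try the raw BF16 build, a small launcher keeps the fallback variable and the precision choice together. This POSIX sh sketch assumes your ComfyUI version still accepts the --force-fp16 flag (confirm with python main.py --help); launch_comfy and its DRY_RUN switch are hypothetical conveniences, and any extra arguments are passed through to main.py unchanged.

```sh
#!/bin/sh
# Launch ComfyUI from its root with CPU fallback for missing Metal ops and
# fp16 forced. ASSUMPTION: --force-fp16 exists in your ComfyUI build.
launch_comfy() {
  export PYTORCH_ENABLE_MPS_FALLBACK=1
  set -- python main.py --force-fp16 "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    echo "PYTORCH_ENABLE_MPS_FALLBACK=$PYTORCH_ENABLE_MPS_FALLBACK $*"
  else
    "$@"
  fi
}
```

Usage: `launch_comfy` from the ComfyUI root, or `DRY_RUN=1 launch_comfy` to print the command without running it.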

"Unet Loader (GGUF) node not found"

Install ComfyUI-GGUF (city96) in custom_nodes/, run pip install --upgrade gguf, and restart ComfyUI.

Generation is slow

The ComfyUI-MPS path is the fallback route on Apple — the most-maintained MLX image tool (mflux) does not support Krea 2 yet, and Draw Things has no confirmed Krea 2 build, so there is no faster Metal-native option for this model today. Expect slower per-image times than a CUDA GPU; drop to a lighter GGUF tier or lower the resolution/steps for faster iteration. If macOS reports memory pressure on a heavy Raw run, raise the GPU wired limit with sudo sysctl iogpu.wired_limit_mb=<MB> (Sonoma 14 / Sequoia 15+; older macOS uses debug.iogpu.wired_limit in bytes), leaving 8–16 GB headroom for the OS — temporary, resets on reboot.
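To turn the <MB> placeholder into a number, here is a minimal POSIX sh sketch. It assumes a 64 GB M2 Max and a 12 GB OS headroom picked from the 8-16 GB range above (both are example choices), treats 1 GB as 1024 MB, and only prints the command rather than running sudo. wired_limit_mb is a hypothetical helper; on the Mac itself, sysctl -n hw.memsize reports total memory in bytes if you prefer to derive the total instead of typing it.

```sh
#!/bin/sh
# Compute an iogpu.wired_limit_mb value that leaves headroom for macOS.
wired_limit_mb() {  # usage: wired_limit_mb <total_gb> <headroom_gb>
  local total_gb="$1" headroom_gb="$2"
  if [ "$headroom_gb" -ge "$total_gb" ]; then
    echo "headroom must be smaller than total memory" >&2
    return 1
  fi
  echo $(( (total_gb - headroom_gb) * 1024 ))
}

echo "sudo sysctl iogpu.wired_limit_mb=$(wired_limit_mb 64 12)"
```

With these example values the printed command is `sudo sysctl iogpu.wired_limit_mb=53248`.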