Krea 2 Turbo on RX 7900 XTX via ComfyUI + ROCm: GGUF Text-to-Image in 24GB

image · intermediate · 16GB+ VRAM · Jun 24, 2026

This intermediate recipe sets up Krea 2 on the RX 7900 XTX, needing about 16 GB of VRAM.

prerequisites
  • AMD Radeon RX 7900 XTX (24GB VRAM), Linux with ROCm 6.2+ (gfx1100, officially supported)
  • ComfyUI 0.25.0 or newer + the ComfyUI-GGUF custom node (city96)
  • PyTorch built for ROCm (rocm6.x wheel) — NOT a CUDA build
  • ~22GB free disk for the lead setup: ~14GB for the Q8_0 GGUF diffusion model, plus ~8GB for the text encoder and ~0.25GB for the VAE

What You'll Build

A local install of Krea 2 Turbo — the distilled, few-step variant of Krea AI's from-scratch aesthetic-first text-to-image foundation model (released 2026-06-23) — running 8-step text-to-image at up to 1280×720 on an AMD RX 7900 XTX (24GB), inside ComfyUI on ROCm. Because RDNA 3 has no FP8 hardware, this recipe does not use the NVIDIA FP8 build; it leads with a community GGUF quantization loaded through the ComfyUI-GGUF node, which is the reliable AMD path. On the RX 7900 XTX's 24GB you can run the near-full-quality Q8_0 tier (13.71 GB) with comfortable headroom.

Hardware data: RX 7900 XTX (24GB VRAM, RDNA 3, gfx1100) · Krea 2 Turbo GGUF, 8 steps at 1280×720 · See benchmark data

ℹ️ This is Krea 2, not FLUX.1-Krea-dev. Krea 2 is Krea AI's own from-scratch ~12.9B-parameter DiT released 2026-06-23 — a different model from the 2025 black-forest-labs/FLUX.1-Krea-dev (a BFL×Krea collaboration built on FLUX). Don't mix their weights, sizes, or workflows.

⚠️ On AMD, use GGUF — not the FP8 build. RDNA 3 (gfx1100) has no FP8 tensor hardware (FP8 matrix ops arrived with RDNA 4 / CDNA 3). An FP8 safetensors loads on ROCm but upcasts to BF16 for compute — so it gives no memory saving and the upcast ~24 GiB transformer would overflow even 24GB. Likewise, the full BF16 Turbo transformer is 24.76 GiB, which also overflows a 24GB card. The path that fits and runs is a GGUF quant (this recipe's lead) via the ComfyUI-GGUF node. Pin the format before you download.
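The sizes in the warning above can be sanity-checked with simple arithmetic. This is a minimal sketch, assuming every weight is stored at the stated width and that Q8_0 costs about 8.5 bits per weight (32 int8 values plus one fp16 scale per block). The parameter count is the approximate ~12.9B figure quoted in this recipe, so treat the results as estimates, not measurements.

```python
# Rough weight-footprint estimate: parameters x bits per weight.
# Assumptions: uniform storage width for all weights; Q8_0 ~ 8.5 bits/weight.
PARAMS = 12.9e9  # approximate parameter count quoted in this recipe


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Return the estimated size of the weights in bytes."""
    return params * bits_per_weight / 8


def to_gib(n_bytes: float) -> float:
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


def to_gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


bf16 = weight_bytes(PARAMS, 16)   # also what an FP8 file costs after upcast
q8_0 = weight_bytes(PARAMS, 8.5)

print(f"BF16 (or FP8 after upcast): ~{to_gib(bf16):.1f} GiB")
print(f"Q8_0 GGUF:                  ~{to_gb(q8_0):.1f} GB")
```

With the rounded parameter count this gives roughly 24.0 GiB for BF16 and 13.7 GB for Q8_0. The BF16 estimate lands a little under the 24.76 GiB quoted above, which is expected from a rounded count, and either way it leaves no room on a 24GB card.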

ℹ️ Where the weights come from. Krea published the official weights as gated repos under its verified org — krea/Krea-2-Raw and krea/Krea-2-Turbo (access-restricted). An ungated community mirror of the official turbo.safetensors is the krea-community/krea-2 bucket. The GGUF quants used here are a community conversion of the official Turbo weights, published at vantagewithai/Krea-2-Turbo-GGUF (which also ships a ready ComfyUI workflow JSON). Model identity and license come from krea.ai; read the license before any commercial use (see Requirements).

Requirements

Component | Minimum | Tested
GPU | 16GB VRAM RDNA 3 card | RX 7900 XTX (24GB, RDNA 3 Navi 31, gfx1100)
OS / driver | Linux + ROCm 6.2+ (gfx1100 is officially supported) |
RAM | 16GB system RAM (32GB comfortable) |
Storage | ~22GB (Q8_0 GGUF 13.71GB + ~8GB BF16 encoder + 0.24GB VAE) |
Software | ComfyUI 0.25.0+ · ComfyUI-GGUF (city96) · PyTorch for ROCm | ComfyUI native Krea2 + UnetLoaderGGUF

The RX 7900 XTX is an officially ROCm-supported card (gfx1100) per AMD's ROCm system-requirements matrix; HSA_OVERRIDE_GFX_VERSION is not required for it. ROCm is Linux-first — on Windows, use WSL2.

Licensing — read before commercial use. Krea 2 is released under the Krea 2 Community License. Key terms: you own the Outputs you generate; commercial use is free only if your company's total annual revenue is under $1,000,000 USD (above that requires an Enterprise License); any derivative AI model name must begin with "Krea"; you must implement reasonable content-filtering; and you may not circumvent or remove the model's content-provenance or watermarking mechanisms.

Installation

1. Install ComfyUI on ROCm

Install a ROCm PyTorch build (not CUDA) and run ComfyUI on it. The exact rocmX.Y wheel tag moves over time — read the live selector at pytorch.org/get-started/locally and pick the ROCm option, e.g.:

# Linux, inside your ComfyUI venv — pick the current ROCm tag from the selector
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
# then ComfyUI itself
git clone https://github.com/Comfy-Org/ComfyUI && cd ComfyUI
pip install -r requirements.txt

Launch ComfyUI with PyTorch's cross-attention backend — on RDNA 3 this is the stable attention path and avoids a known ComfyUI ROCm VAE-decode crash (ComfyUI #11551):

python main.py --use-pytorch-cross-attention

ROCm notes. Do not install FlashAttention (pip install flash-attn) — upstream CK FlashAttention does not build on gfx1100; ComfyUI's PyTorch SDPA path is what you use. The RX 7900 XTX is officially supported, so HSA_OVERRIDE_GFX_VERSION is a legacy masquerade you do not need here. If a large model load stalls, add --disable-smart-memory.
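If you prefer to start ComfyUI from a script, the launch line and the optional stall workaround can be assembled as below. This is a small sketch: the helper name `comfyui_argv` is hypothetical, and it assumes you run it from the ComfyUI checkout with the ROCm venv active.

```python
import sys


def comfyui_argv(disable_smart_memory: bool = False) -> list[str]:
    """Assemble the ComfyUI launch command described above."""
    argv = [sys.executable, "main.py", "--use-pytorch-cross-attention"]
    if disable_smart_memory:
        # Only add this if a large model load stalls.
        argv.append("--disable-smart-memory")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) from the ComfyUI directory to launch for real.
    print(" ".join(comfyui_argv()))
```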

2. Install the ComfyUI-GGUF custom node

The GGUF diffusion model loads through ComfyUI-GGUF (city96). Install it via ComfyUI Manager ("Install Custom Nodes" → search "GGUF"), or by hand:

cd ComfyUI/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
pip install --upgrade gguf

Restart ComfyUI; you should now have the Unet Loader (GGUF) node.

3. Download the model files

Pick one GGUF tier (see the table under "Running" — Q8_0 is the lead on 24GB), plus the text encoder and VAE. File-to-folder mapping follows the vantagewithai/Krea-2-Turbo-GGUF workflow:

# from your ComfyUI root

# GGUF diffusion model (Q8_0 = 13.71 GB, the 24GB lead) → unet/
cd models/unet
wget https://huggingface.co/vantagewithai/Krea-2-Turbo-GGUF/resolve/main/krea2_turbo-Q8_0.gguf

# Qwen3-VL 4B text encoder, BF16 (8.26 GiB) → text_encoders/
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/Qwen3-VL/resolve/main/text_encoders/qwen3vl_4b_bf16.safetensors

# Qwen-Image VAE (242 MiB) → vae/
cd ../vae
wget https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors

Krea 2's text encoder is Qwen/Qwen3-VL-4B-Instruct and its VAE is the Qwen-Image autoencoder (AutoencoderKLQwenImage, f8, 16 latent channels), per the Krea-2-Base-Diffusers model card. The VAE file is precision-independent (same file used on NVIDIA). On AMD the BF16 encoder is the clean choice — the workflow's qwen3vl_4b_fp8_scaled file also loads, but on RDNA 3 it upcasts to BF16 anyway (no memory saving), so there's no reason to prefer it here.
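Before loading the workflow, it can help to confirm that the three downloads landed in the folders used above. The check below is a minimal sketch: the file and folder names are the ones from the download step, while the helper name `missing_models` and the `ComfyUI` root path are assumptions to adjust for your install.

```python
from pathlib import Path

# Folder and file names as used in the download step above.
EXPECTED = {
    "unet": "krea2_turbo-Q8_0.gguf",
    "text_encoders": "qwen3vl_4b_bf16.safetensors",
    "vae": "qwen_image_vae.safetensors",
}


def missing_models(comfy_root: str) -> list[str]:
    """Return relative paths of expected model files that are absent or empty."""
    models = Path(comfy_root) / "models"
    missing = []
    for folder, name in EXPECTED.items():
        path = models / folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(f"models/{folder}/{name}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust to your own path.
    for rel in missing_models("ComfyUI"):
        print("missing:", rel)
```

An empty result only means the files exist and are non-empty; it does not verify checksums, so an interrupted download can still pass.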

4. Load the workflow

Drag the Vantage_Krea-2-Turbo.json workflow onto the ComfyUI canvas. Set the Unet Loader (GGUF) node to the .gguf tier you downloaded, the encoder loader to qwen3vl_4b_bf16.safetensors, and the VAE to qwen_image_vae.safetensors.

Running

Edit the prompt node and click Queue Prompt. The Turbo defaults baked into the workflow are:

  • Steps: 8
  • CFG: 1.0
  • Sampler: er_sde
  • Scheduler: simple
  • Resolution: 1280×720
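For a sense of what the sampler works on at the default resolution: the VAE described earlier is an f8 autoencoder with 16 latent channels, so each spatial side shrinks by a factor of 8. The sketch below computes that latent shape. It ignores the batch dimension and any further patching inside the transformer, neither of which this recipe specifies.

```python
def latent_shape(width: int, height: int, factor: int = 8, channels: int = 16):
    """Return (channels, latent_height, latent_width) for an f8 VAE."""
    if width % factor or height % factor:
        raise ValueError("width and height should be multiples of the VAE factor")
    return channels, height // factor, width // factor


print(latent_shape(1280, 720))  # (16, 90, 160)
```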

ComfyUI runs the Qwen3-VL encoder to encode your prompt, then frees it before the diffusion sampling stage — so the encoder (~8 GiB resident) and the GGUF transformer are not both in VRAM at peak. On the RX 7900 XTX's 24GB this leaves wide headroom at every tier. Output PNGs land in ComfyUI/output/.
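The effect of freeing the encoder before sampling can be illustrated with a back-of-the-envelope comparison. This is a rough sketch using the sizes quoted in this recipe (which mix GB and GiB), and the activation allowance is a hypothetical placeholder, not a measurement.

```python
# Sizes as quoted in this recipe; units are mixed there (GB and GiB),
# so treat the results as rough estimates only.
ENCODER = 8.26       # Qwen3-VL 4B BF16 text encoder
TRANSFORMER = 13.71  # Q8_0 GGUF diffusion model
VAE = 0.24           # Qwen-Image VAE
ACTIVATIONS = 2.0    # hypothetical allowance for latents/activations


def all_resident(encoder, transformer, vae, activations) -> float:
    """Peak if every component stayed in VRAM at the same time."""
    return encoder + transformer + vae + activations


def staged_peak(encoder, transformer, vae, activations) -> float:
    """Peak if the encoder is freed before the sampling stage."""
    encode_stage = encoder
    sample_stage = transformer + vae + activations
    return max(encode_stage, sample_stage)


print(f"all resident: {all_resident(ENCODER, TRANSFORMER, VAE, ACTIVATIONS):.2f}")
print(f"staged peak:  {staged_peak(ENCODER, TRANSFORMER, VAE, ACTIVATIONS):.2f}")
```

Under these assumptions the staged peak is set by the sampling stage alone, which is why the encoder's size does not add to the Q8_0 budget on a 24GB card.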

GGUF quant tiers (byte sizes verified via the HuggingFace tree API; pick by VRAM and quality target):

Tier | File size | Notes
Q8_0 | 13.71 GB | Near-BF16 quality; lead on 24GB
Q6_K | 10.58 GB | Excellent quality, lighter
Q5_K_M | 8.87 GB | Strong quality/size balance (good 16GB lead)
Q4_K_M | 7.49 GB | Smallest practical quality tier
Q3_K_M | 6.01 GB | Visible degradation; only if tight
Q2_K | 4.89 GB | Lowest tier; quality drops noticeably
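One way to turn the table into a choice is to pick the largest tier whose file fits after setting aside some headroom. This is a minimal sketch: the helper name `pick_tier` is hypothetical, and the 6 GB default reserve is an assumed allowance for the VAE, activations and other GPU users, chosen here only because it reproduces the table's 24GB and 16GB suggestions.

```python
# File sizes in GB, copied from the tier table above.
TIERS_GB = {
    "Q8_0": 13.71,
    "Q6_K": 10.58,
    "Q5_K_M": 8.87,
    "Q4_K_M": 7.49,
    "Q3_K_M": 6.01,
    "Q2_K": 4.89,
}


def pick_tier(vram_gb: float, reserve_gb: float = 6.0):
    """Return the largest tier that fits in vram_gb - reserve_gb, or None."""
    budget = vram_gb - reserve_gb
    fitting = [(size, name) for name, size in TIERS_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None


print(pick_tier(24))  # Q8_0
print(pick_tier(16))  # Q5_K_M
```

File size is only a proxy for VRAM use, so verify the chosen tier with a short test generation before relying on it.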

Tip — natural-language prompts. Krea 2 is prompted in natural language; long, detailed descriptions yield the best results, and words to be rendered as text in the image are wrapped in quotes (per the Krea-2-Base-Diffusers model card).

The Raw quality tier (full-quality, undistilled)

Krea 2 Raw / Base is the undistilled foundation checkpoint — no step or guidance distillation, run with classifier-free guidance (recommended settings: 52 steps, CFG 3.5, up to 1024×1024). It is the LoRA-training base (LoRAs trained on Base apply to Turbo). For the same no-FP8 reason, run Raw on AMD as GGUF too — community Raw GGUFs are published at vantagewithai/Krea-2-Raw-GGUF, loaded through the same Unet Loader (GGUF) node. Expect substantially longer generations (52 vs 8 steps, plus CFG doubling the per-step work); on 24GB the higher Raw GGUF tiers have room, but verify your chosen tier fits before a long run.
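To put "substantially longer" into numbers, count transformer forward passes per image. The sketch below assumes two passes per step when CFG is active, one pass per step at CFG 1.0 (where the unconditional pass can be skipped), equal cost per pass, and the same resolution for both runs. The helper name `model_evals` is hypothetical.

```python
def model_evals(steps: int, cfg: float) -> int:
    """Transformer forward passes per image: two per step with CFG, else one."""
    passes_per_step = 1 if cfg == 1.0 else 2
    return steps * passes_per_step


turbo = model_evals(8, 1.0)   # Turbo defaults from this recipe
raw = model_evals(52, 3.5)    # Raw recommended settings from this recipe

print(raw, turbo, raw / turbo)  # 104 8 13.0
```

Under these assumptions a Raw image costs about 13 times the transformer work of a Turbo image, before accounting for any per-pass speed difference between quant tiers.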

Results

  • Speed: No community benchmark exists yet for Krea 2 on the RX 7900 XTX; the /check/krea-2/rx-7900-xtx endpoint currently returns verdict: unknown with no benchmark rows. On ROCm, GGUF weights are dequantized to the card's native compute format (RDNA 3 offers no FP8/INT acceleration for diffusion here), so expect throughput to be governed by the 7900 XTX's FP16/BF16 path. No vendor figure names this card, so we omit a number rather than quote different hardware. If you run it, please submit your numbers so they appear on /check/krea-2/rx-7900-xtx.
  • VRAM usage: The lead Q8_0 GGUF transformer is 13.71 GB on disk; lighter tiers run 4.89–10.58 GB (verified via the HuggingFace tree API). Because ComfyUI encodes the prompt and frees the ~8 GiB BF16 encoder before sampling, the sampling-stage peak sits near the chosen GGUF tier plus the VAE and activations — well within the RX 7900 XTX's 24GB at every tier. Live measurements will land at /check/krea-2/rx-7900-xtx.
  • Quality notes: Q8_0 is near-indistinguishable from BF16; Q6_K/Q5_K_M are strong; below Q4 expect visible degradation. Turbo is distilled for 8-step CFG-1.0 generation; for maximum fidelity use the Raw tier (52 steps, CFG 3.5) — see above. Architecture is a single-stream DiT, 12.9B parameters, 28 blocks at width 6144, with grouped-query attention and flow-matching sampling, per the Krea-2-Base-Diffusers model card.

For the full benchmark data, see /check/krea-2/rx-7900-xtx.

Troubleshooting

ComfyUI ROCm VAE-decode crash / black or garbled output

Launch ComfyUI with --use-pytorch-cross-attention (PyTorch SDPA). On RDNA 3 this is the attention path confirmed in ComfyUI #11551; the alternative --bf16-vae is not the fix (it is contested and can inflate decode VRAM). Do not install FlashAttention — it does not build on gfx1100, and ComfyUI does not need it.

"Unet Loader (GGUF) node not found"

Install ComfyUI-GGUF (city96) in custom_nodes/, run pip install --upgrade gguf, and restart ComfyUI. The base ComfyUI install cannot load .gguf diffusion models without it.

Out of memory during sampling

Drop to a lighter GGUF tier (Q6_K → Q5_K_M → Q4_K_M); each step down shrinks the transformer's footprint roughly in line with the file-size column above. Close other GPU apps. If a large load stalls rather than OOMs, add --disable-smart-memory to the launch command (a known ROCm memory-management workaround).

Verify ROCm sees the GPU

If ComfyUI falls back to CPU, confirm python -c "import torch; print(torch.cuda.is_available(), torch.version.hip)" prints True and a HIP version. If torch.version.hip is None, you installed a CUDA wheel — reinstall the ROCm build from the pytorch.org selector. The RX 7900 XTX (gfx1100) is officially supported, so no HSA_OVERRIDE_GFX_VERSION masquerade is needed.
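The same check can be written as a small script that explains what the two printed values mean. This is a sketch: the helper name `classify_build` and the wording of its messages are assumptions, while `torch.cuda.is_available()` and `torch.version.hip` are the calls used in the one-liner above.

```python
def classify_build(cuda_available: bool, hip_version) -> str:
    """Map the two values printed by the one-liner above to a diagnosis."""
    if hip_version is None:
        # No HIP version means this is not a ROCm wheel.
        return "not a ROCm wheel: reinstall the ROCm build"
    if not cuda_available:
        return "ROCm wheel, but no GPU visible to PyTorch"
    return "ROCm wheel with a visible GPU"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(classify_build(torch.cuda.is_available(), torch.version.hip))
        if torch.cuda.is_available():
            # On a working setup this should name the Radeon card.
            print(torch.cuda.get_device_name(0))
```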

common questions
How much VRAM does Krea 2 need?

About 16 GB — the minimum this recipe targets.

Which GPUs is Krea 2 tested on?

RX 7900 XTX (24 GB).

How hard is this setup?

Intermediate — follow the steps above.