Flux.2 Klein 4B on RTX 5060 Ti: ComfyUI & Diffusers Setup

image · intermediate · 13GB+ VRAM · May 18, 2026
prerequisites
  • NVIDIA RTX 5060 Ti (16GB VRAM) or equivalent 13GB+ consumer GPU
  • Python 3.10+ (Python 3.12 for the official BFL repo)
  • ComfyUI (latest) or `diffusers` Python package
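
The prerequisites above can be checked before downloading anything. The following is a minimal sketch, not an official tool: the helper names python_ok and vram_ok are hypothetical, the 13GB threshold is read as decimal gigabytes (an assumption), and the GPU query only runs if PyTorch happens to be installed.

```python
import sys

MIN_PYTHON = (3, 10)   # from the prerequisites list
MIN_VRAM_GB = 13       # BFL's documented ~13GB figure, read here as decimal GB (assumption)


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter is at least Python 3.10."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def vram_ok(total_bytes, required_gb=MIN_VRAM_GB):
    """Return True when a card reporting total_bytes of memory meets the requirement."""
    return total_bytes >= required_gb * 1000**3


if __name__ == "__main__":
    print("Python OK:", python_ok())
    try:
        import torch  # only needed to query the GPU
    except ImportError:
        print("PyTorch not installed; skipping the VRAM check")
    else:
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            print(props.name, "VRAM OK:", vram_ok(props.total_memory))
        else:
            print("No CUDA device visible to PyTorch")
```

Total memory is only a rough proxy: other processes using the GPU reduce what is actually available at generation time.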

What You'll Build

Generate 1024×1024 images locally with Black Forest Labs' Flux.2 Klein 4B — the smallest, fastest member of the Flux.2 family — on an RTX 5060 Ti. Klein is Apache-2.0 licensed and explicitly targeted at consumer hardware: BFL's model card states it "fits in ~13GB VRAM and is accessible on NVIDIA RTX 3090/4070 and above", leaving room to spare on the 5060 Ti's 16GB.

Hardware data: RTX 5060 Ti (16GB VRAM) · ~13GB peak VRAM per BFL's official card · Benchmark data: /check/flux-2-klein-4b/rtx-5060-ti

Requirements

Component | Minimum | Tested
GPU | 13GB VRAM (RTX 3090 / 4070 and above, per the official model card) | RTX 5060 Ti (16GB)
RAM | 16GB | -
Storage | ~24GB (full diffusers checkout) or ~8GB (FP8 ComfyUI single-file) | -
Software | ComfyUI (latest) or diffusers, transformers, accelerate | -

The full diffusers checkout from black-forest-labs/FLUX.2-klein-4B is ~24GB on disk (the consolidated flux-2-klein-4b.safetensors alone is 7.75GB; the rest is the Qwen3 text encoder and VAE). The ComfyUI FP8 single-file path is leaner; see Path B, step 2.
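
Since the two paths differ by roughly 16GB on disk, it can help to confirm free space first. This is a small sketch using only the standard library; has_room is a hypothetical helper, and the sizes are the approximate figures quoted above, treated as decimal gigabytes.

```python
import shutil

DIFFUSERS_CHECKOUT_GB = 24   # full diffusers checkout, per the text above
COMFYUI_FP8_GB = 8           # FP8 ComfyUI single-file path, per the text above


def has_room(path, needed_gb):
    """Return True if the filesystem holding path has at least needed_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1000**3


if __name__ == "__main__":
    for label, size in [("diffusers", DIFFUSERS_CHECKOUT_GB), ("ComfyUI FP8", COMFYUI_FP8_GB)]:
        print(label, "fits:", has_room(".", size))
```

Point path at the drive that will hold the download (the Hugging Face cache directory for Path A, the ComfyUI checkout for Path B), not necessarily the current directory.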

Installation

Two supported paths — pick one. The diffusers path is the most direct reproduction of the official example; the ComfyUI path is preferred if you already have a Flux.1 workflow set up.

Path A — Diffusers (Python, official example)

1. Install dependencies

pip install -U diffusers transformers accelerate

2. Run the official example

This is the exact snippet published on the model card at huggingface.co/black-forest-labs/FLUX.2-klein-4B. Save it as flux_klein.py, the filename the Running section invokes:

import torch
from diffusers import Flux2KleinPipeline

device = "cuda"
dtype = torch.bfloat16

pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=dtype,
)
pipe.enable_model_cpu_offload()  # save VRAM by offloading to CPU

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("flux-klein.png")

enable_model_cpu_offload() is what keeps peak VRAM near the documented ~13GB — leave it in unless you have spare headroom and want raw throughput.
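
To see how close your own run comes to the documented figure, PyTorch's peak-memory counters can be wrapped around the generation call. This is a sketch under assumptions: run_and_report_peak and bytes_to_gb are hypothetical helpers, and torch.cuda.max_memory_allocated() counts only memory allocated by PyTorch tensors, so tools such as nvidia-smi will usually show a somewhat higher number.

```python
def bytes_to_gb(num_bytes):
    """Convert a byte count to decimal gigabytes."""
    return num_bytes / 1000**3


def run_and_report_peak(generate):
    """Call generate() and print the peak memory PyTorch allocated on the GPU."""
    import torch  # imported lazily so bytes_to_gb works without a GPU stack

    torch.cuda.reset_peak_memory_stats()   # start counting from this point
    result = generate()
    torch.cuda.synchronize()               # wait for queued GPU work to finish
    peak = torch.cuda.max_memory_allocated()
    print(f"Peak allocated VRAM: {bytes_to_gb(peak):.2f} GB")
    return result
```

In the snippet above, this would wrap the pipe(...) call, for example run_and_report_peak(lambda: pipe(prompt=prompt, height=1024, width=1024, guidance_scale=1.0, num_inference_steps=4).images[0]).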

Path B — ComfyUI

1. Update ComfyUI to the latest build

Klein support landed in ComfyUI's nightly nodes; an older build will fail to load the workflow. From your ComfyUI checkout:

git pull
pip install -r requirements.txt

2. Download the three required files

Klein needs the Flux.2 family VAE (AutoencoderKLFlux2, shared across Klein / Dev / Pro per Klein's model_index.json, and distinct from the Flux.1 VAE) plus a Qwen3 text encoder. The standard ComfyUI layout (also used by the official Flux.2 Dev ComfyUI tutorial) is:

File | Folder
flux-2-klein-4b-fp8.safetensors (or the bf16 single-file) | ComfyUI/models/diffusion_models/
qwen_3_4b.safetensors | ComfyUI/models/text_encoders/
flux2-vae.safetensors | ComfyUI/models/vae/

The text encoder identity is confirmed by Klein's model_index.json, which declares "text_encoder": ["transformers", "Qwen3ForCausalLM"].
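
A misplaced file is a common reason a workflow fails to load, so the layout above can be verified with a few lines of Python. This is a minimal sketch: missing_files is a hypothetical helper, and it checks only the FP8 diffusion model filename from the table, so adjust EXPECTED if you downloaded the bf16 single-file instead.

```python
from pathlib import Path

# Folder under ComfyUI/models/ -> expected filename, copied from the table above.
EXPECTED = {
    "diffusion_models": "flux-2-klein-4b-fp8.safetensors",
    "text_encoders": "qwen_3_4b.safetensors",
    "vae": "flux2-vae.safetensors",
}


def missing_files(comfyui_root):
    """Return 'folder/filename' for every expected file not found under comfyui_root."""
    models = Path(comfyui_root) / "models"
    return [
        f"{folder}/{name}"
        for folder, name in EXPECTED.items()
        if not (models / folder / name).is_file()
    ]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains ComfyUI/.
    for entry in missing_files("ComfyUI"):
        print("missing:", entry)
```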

Running

Diffusers

python flux_klein.py

Output flux-klein.png lands in your working directory. First run downloads the weights from the Hub (~24GB) into your local ~/.cache/huggingface/.

ComfyUI

python main.py --listen

Open http://localhost:8188 and load a Klein workflow by dragging its .json file onto the canvas (see "Where do I find the workflow JSON?" under Troubleshooting for sources). The --listen flag also exposes the server to other machines on your network; drop it if you only browse from the same machine. For the distilled variant use 4 steps at CFG 1.0; for the base variant use 25–50 steps at CFG 5.0, per the published Klein ComfyUI walkthrough.
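
Mixing up the two variants' sampler settings is easy to do when switching workflows. The sketch below encodes the values quoted above in a lookup table; RECOMMENDED and settings_warnings are hypothetical names for illustration and are not part of ComfyUI.

```python
# Step range and CFG per variant, as quoted in the text above.
RECOMMENDED = {
    "distilled": {"min_steps": 4, "max_steps": 4, "cfg": 1.0},
    "base": {"min_steps": 25, "max_steps": 50, "cfg": 5.0},
}


def settings_warnings(variant, steps, cfg):
    """Return human-readable warnings for settings outside the recommended values."""
    rec = RECOMMENDED[variant]
    warnings = []
    if not rec["min_steps"] <= steps <= rec["max_steps"]:
        warnings.append(
            f"{variant}: steps={steps} is outside {rec['min_steps']}-{rec['max_steps']}"
        )
    if cfg != rec["cfg"]:
        warnings.append(f"{variant}: cfg={cfg} differs from the recommended {rec['cfg']}")
    return warnings
```

For example, settings_warnings("distilled", 25, 5.0) returns two warnings, which is the pattern you would see after loading a base-variant workflow with the distilled weights.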

Results

  • VRAM usage: ~13GB peak per BFL's official model card, which leaves roughly 3GB of headroom on the 5060 Ti's 16GB.
  • Quality notes: Klein is the small/distilled member of the Flux.2 family; expect strong prompt adherence at a fraction of the parameter count, with the usual distillation tradeoffs (less prompt-style flexibility than the base Flux.2 Dev at the same resolution).
  • License: Apache-2.0 — commercial use permitted (per the model card).

For the full benchmark data, see /check/flux-2-klein-4b/rtx-5060-ti.

Troubleshooting

"Distorted colors / washed-out output"

You're loading the wrong VAE. Klein must use flux2-vae.safetensors (the Flux.2 family VAE, shared across Klein/Dev/Pro per model_index.json) — loading any other VAE (Flux.1, SDXL, SD1.5) will produce broken output. Confirm the VAE file in ComfyUI/models/vae/ matches the filename above.

"Text encoder shape mismatch / config error"

Klein uses a Qwen3 text encoder per its model_index.json (text_encoder = ['transformers', 'Qwen3ForCausalLM']), not the T5 family that Flux.1 used. Make sure you downloaded qwen_3_4b.safetensors (or the equivalent diffusers shards from the text_encoder/ subfolder), not a Flux.1 T5 file you might still have on disk.
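
If you have the diffusers checkout on disk, the declared component classes can be read straight from model_index.json rather than inferred from filenames. This is a sketch, assuming each component entry is a two-item [library, class] list as in the text_encoder entry quoted above; component_class is a hypothetical helper, and the "diffusers" library name in the sample vae entry is an assumption.

```python
import json

# Sample modeled on the entries quoted in this guide; the "diffusers" library
# name for the VAE is an assumption made for illustration.
SAMPLE_INDEX = """
{
  "text_encoder": ["transformers", "Qwen3ForCausalLM"],
  "vae": ["diffusers", "AutoencoderKLFlux2"]
}
"""


def component_class(model_index_text, component):
    """Return the class name declared for component, or None if it is absent."""
    entry = json.loads(model_index_text).get(component)
    if isinstance(entry, list) and len(entry) == 2:
        return entry[1]
    return None


if __name__ == "__main__":
    print(component_class(SAMPLE_INDEX, "text_encoder"))
```

To check a real checkout, pass the contents of its model_index.json file in place of SAMPLE_INDEX.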

OOM on the first generation

Stick to the distilled variant (4 steps, CFG 1.0) on a 16GB card. If you still see OOM in ComfyUI, launch with the standard low-VRAM flag:

python main.py --listen --lowvram

For diffusers, leave pipe.enable_model_cpu_offload() enabled; it is what keeps peak VRAM near the documented ~13GB.

"Where do I find the workflow JSON?"

The BFL GitHub repo is the canonical home of the official Flux.2 workflows and command-line tooling; the model card on Hugging Face links community workflows in the discussions tab. If neither has what you need, report your setup via the submission form.