What You'll Build
Generate 1024×1024 images locally with Black Forest Labs' Flux.2 Klein 4B — the smallest, fastest member of the Flux.2 family — on an RTX 5060 Ti. Klein is Apache-2.0 licensed and explicitly targeted at consumer hardware: BFL's model card states it "fits in ~13GB VRAM and is accessible on NVIDIA RTX 3090/4070 and above", leaving room to spare on the 5060 Ti's 16GB.
Hardware data: RTX 5060 Ti (16GB VRAM) · ~13GB peak VRAM per BFL's official card · See benchmark data at /check/flux-2-klein-4b/rtx-5060-ti
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 13GB VRAM (RTX 3090 / 4070 and above, per the official model card) | RTX 5060 Ti (16GB) |
| RAM | 16GB | — |
| Storage | ~24GB (full diffusers checkout) or ~8GB (FP8 ComfyUI single-file) | — |
| Software | ComfyUI (latest) or diffusers, transformers, accelerate | — |
The full diffusers checkout from black-forest-labs/FLUX.2-klein-4B is ~24GB on disk: the consolidated flux-2-klein-4b.safetensors alone is 7.75GB, and the rest is the Qwen3 text encoder and VAE. The ComfyUI FP8 single-file path is leaner; see step 2 of Path B below.
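Because the two install paths differ by roughly 16GB on disk, a quick preflight check can save a failed download. The sketch below uses only the standard library. The size figures come from the requirements table, while the 2GB headroom and the function names are illustrative assumptions rather than anything BFL publishes.

```python
import shutil
from pathlib import Path

# Approximate on-disk sizes quoted in this guide, in GB (1 GB = 1e9 bytes).
REQUIRED_GB = {"diffusers": 24.0, "comfyui_fp8": 8.0}


def free_gb(path):
    """Return free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(install, free, headroom_gb=2.0):
    """True if `free` GB covers the install plus an assumed safety margin."""
    return free >= REQUIRED_GB[install] + headroom_gb


# The Hugging Face cache lives under the home directory by default.
available = free_gb(Path.home())
for install in REQUIRED_GB:
    verdict = "ok" if has_room(install, available) else "not enough space"
    print(f"{install}: needs ~{REQUIRED_GB[install]:.0f}GB, "
          f"{available:.1f}GB free, {verdict}")
```

If your ComfyUI checkout or Hugging Face cache sits on a different drive, pass that location to `free_gb` instead of the home directory.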
Installation
Two supported paths — pick one. The diffusers path is the most direct reproduction of the official example; the ComfyUI path is preferred if you already have a Flux.1 workflow set up.
Path A — Diffusers (Python, official example)
1. Install dependencies
pip install -U diffusers transformers accelerate
2. Run the official example
This is the exact snippet published on the model card at huggingface.co/black-forest-labs/FLUX.2-klein-4B:
import torch
from diffusers import Flux2KleinPipeline

device = "cuda"
dtype = torch.bfloat16

pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=dtype,
)
pipe.enable_model_cpu_offload()  # save VRAM by offloading to CPU

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("flux-klein.png")
enable_model_cpu_offload() is what keeps peak VRAM near the documented ~13GB — leave it in unless you have spare headroom and want raw throughput.
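With only about 3GB between the documented peak and the card's 16GB, VRAM already held by a browser or desktop compositor matters. A minimal sketch for checking free VRAM before launching, assuming `nvidia-smi` is on your PATH: the function names are hypothetical, and treating the card's "~13GB" as 13 GiB is a conservative assumption, not a BFL figure.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse 'total, used' rows (MiB) from nvidia-smi CSV output, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        try:
            total, used = (int(field.strip()) for field in line.split(","))
        except ValueError:
            continue  # skip rows nvidia-smi could not report, e.g. "[N/A]"
        rows.append((total, used))
    return rows


def query_gpu_memory():
    """Return [(total_mib, used_mib), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_memory_csv(result.stdout)


def enough_for_klein(total_mib, used_mib, needed_gib=13.0):
    """True if free VRAM covers the assumed ~13 GiB peak."""
    return (total_mib - used_mib) / 1024 >= needed_gib


for total, used in query_gpu_memory():
    status = "ok" if enough_for_klein(total, used) else "tight, close other GPU apps"
    print(f"{total - used} MiB free of {total} MiB: {status}")
```

On a machine without an NVIDIA driver the loop simply prints nothing.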
Path B — ComfyUI
1. Update ComfyUI to the latest build
Klein support landed in ComfyUI's nightly nodes; an older build will fail to load the workflow. From your ComfyUI checkout:
git pull
pip install -r requirements.txt
2. Download the three required files
Klein needs the Flux.2 family VAE (AutoencoderKLFlux2, shared across Klein / Dev / Pro per Klein's model_index.json, and distinct from the Flux.1 VAE) plus a Qwen3 text encoder. The standard ComfyUI layout (also used by the official Flux.2 Dev ComfyUI tutorial) is:
| File | Folder |
|---|---|
| flux-2-klein-4b-fp8.safetensors (or the bf16 single-file) | ComfyUI/models/diffusion_models/ |
| qwen_3_4b.safetensors | ComfyUI/models/text_encoders/ |
| flux2-vae.safetensors | ComfyUI/models/vae/ |
The text encoder identity is confirmed by Klein's model_index.json, which declares "text_encoder": ["transformers", "Qwen3ForCausalLM"].
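A misplaced file is the most common reason a workflow fails to load, so it can help to verify the layout before starting ComfyUI. The sketch below checks for the three filenames from the table. The `ComfyUI` root path and the helper name are assumptions, so adjust them to your checkout, and swap in the bf16 filename if that is the one you downloaded.

```python
from pathlib import Path

# Folder (under ComfyUI/models/) -> filename, as listed in the table above.
EXPECTED = {
    "diffusion_models": "flux-2-klein-4b-fp8.safetensors",
    "text_encoders": "qwen_3_4b.safetensors",
    "vae": "flux2-vae.safetensors",
}


def missing_files(comfy_root):
    """Return the expected model paths that do not exist under `comfy_root`."""
    models = Path(comfy_root) / "models"
    return [
        models / folder / name
        for folder, name in EXPECTED.items()
        if not (models / folder / name).is_file()
    ]


# "ComfyUI" is an assumed checkout location; pass your own path.
for path in missing_files("ComfyUI"):
    print(f"missing: {path}")
```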
Running
Diffusers
python flux_klein.py
Output flux-klein.png lands in your working directory. First run downloads the weights from the Hub (~24GB) into your local ~/.cache/huggingface/.
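If the first run is interrupted, comparing the cache size against the expected ~24GB is a rough way to tell whether the download finished. This sketch sums regular files under the cache directory named above and skips symlinks, since the Hugging Face cache typically links snapshot entries to shared blobs and following them would double-count. The helper name is hypothetical.

```python
import os
from pathlib import Path


def dir_size_gb(root):
    """Sum regular-file sizes under `root` in GB, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total / 1e9


cache = Path.home() / ".cache" / "huggingface"
print(f"{cache}: {dir_size_gb(cache):.1f}GB")
```

The total covers every model in the cache, not only Klein, so treat it as a sanity check rather than an exact measurement.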
ComfyUI
python main.py --listen
Open http://localhost:8188 and load a Klein workflow by dragging its .json file onto the canvas. BFL ships official workflow JSONs in the comfyanonymous / BFL ecosystem; see "Where do I find the workflow JSON?" under Troubleshooting for where to look. For the distilled variant use 4 steps at CFG 1.0; for the base variant use 25–50 steps at CFG 5.0, per the published Klein ComfyUI walkthrough.
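Mixing up the two variants' sampler settings is an easy mistake when switching workflows. The values quoted above can be captured in a small lookup, sketched here. The class and function names are illustrative and not part of ComfyUI or diffusers, and defaulting the base variant to the low end of its step range is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSettings:
    steps: int
    cfg: float


# Step range quoted for the base (non-distilled) variant.
BASE_STEP_RANGE = (25, 50)


def settings_for(variant, steps=None):
    """Return the sampler settings quoted above for a Klein variant."""
    if variant == "distilled":
        # The distilled variant uses a fixed 4 steps; `steps` is ignored.
        return SamplerSettings(steps=4, cfg=1.0)
    if variant == "base":
        low, high = BASE_STEP_RANGE
        chosen = low if steps is None else steps
        if not low <= chosen <= high:
            raise ValueError(f"base variant expects {low}-{high} steps, got {chosen}")
        return SamplerSettings(steps=chosen, cfg=5.0)
    raise ValueError(f"unknown variant: {variant!r}")


print(settings_for("distilled"))
print(settings_for("base", steps=30))
```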
Results
- VRAM usage: ~13GB peak per BFL's official model card, well under the 5060 Ti's 16GB budget.
- Quality notes: Klein is the small, distilled member of the Flux.2 family. Expect strong prompt adherence at a fraction of the parameter count, with the usual distillation tradeoffs (less prompt-style flexibility than the base Flux.2 Dev at the same resolution).
- License: Apache-2.0; commercial use is permitted (per the model card).
For the full community benchmark data, see /check/flux-2-klein-4b/rtx-5060-ti.
Troubleshooting
"Distorted colors / washed-out output"
You're loading the wrong VAE. Klein must use flux2-vae.safetensors (the Flux.2 family VAE, shared across Klein/Dev/Pro per model_index.json) — loading any other VAE (Flux.1, SDXL, SD1.5) will produce broken output. Confirm the VAE file in ComfyUI/models/vae/ matches the filename above.
"Text encoder shape mismatch / config error"
Klein uses a Qwen3 text encoder per its model_index.json ("text_encoder": ["transformers", "Qwen3ForCausalLM"]), not the T5 family that Flux.1 used. Make sure you downloaded qwen_3_4b.safetensors (or the equivalent diffusers shards from the text_encoder/ subfolder), not a Flux.1 T5 file you might still have on disk.
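For a diffusers checkout, you can confirm the encoder class directly from model_index.json rather than guessing from filenames. A minimal sketch, assuming you pass in the file's contents as a string: the helper names are hypothetical, and the sample below is a trimmed stand-in containing only the key quoted in this guide, not the full file.

```python
import json

EXPECTED_TEXT_ENCODER = ["transformers", "Qwen3ForCausalLM"]


def text_encoder_entry(model_index_text):
    """Return the text_encoder entry from model_index.json contents, or None."""
    return json.loads(model_index_text).get("text_encoder")


def uses_qwen3(model_index_text):
    """True if the declared text encoder matches the Qwen3 entry quoted above."""
    return text_encoder_entry(model_index_text) == EXPECTED_TEXT_ENCODER


# Trimmed, illustrative stand-in for the real file.
sample = '{"text_encoder": ["transformers", "Qwen3ForCausalLM"]}'
print(uses_qwen3(sample))
```

To check a real checkout, read model_index.json from the repository root and pass its text to `uses_qwen3`.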
OOM on the first generation
Stick to the distilled variant (4 steps, CFG 1.0) on a 16GB card. If you still see OOM in ComfyUI, launch with the standard low-VRAM flag:
python main.py --listen --lowvram
For diffusers, leave pipe.enable_model_cpu_offload() enabled; it's what keeps peak VRAM near the documented ~13GB.
"Where do I find the workflow JSON?"
The BFL GitHub repo is the canonical home of the official Flux.2 workflows and command-line tooling; the model card on Hugging Face links community workflows in the discussions tab. If neither has what you need, report your setup via the submission form.