How much VRAM does Flux.1 Dev need?

About 16GB, the minimum this recipe targets, and only with FP8 or GGUF weights plus CPU offload; the FP16 transformer alone is 23.8GB.

How hard is this setup?

Beginner. Follow the steps below.

Flux.1 Dev on RTX 5060 Ti: ComfyUI Setup Guide

What You'll Build

Generate images locally with Flux.1 Dev, Black Forest Labs' 12B-parameter guidance-distilled rectified-flow text-to-image transformer, on an RTX 5060 Ti (16GB). The catch: the full-precision FP16 transformer does not fit on a 16GB card, so this guide runs the model the way that does work on 16GB: FP8 or GGUF quantized weights plus sequential CPU offload. The backend benchmark confirms it runs on this card with "standard ComfyUI workarounds (sequential CPU offload, fp8 / GGUF weights)."

Hardware data: RTX 5060 Ti (16GB VRAM) · runs via FP8/GGUF + CPU offload · no measured speed yet

⚠️ FP16 will not fit. The full-precision flux1-dev.safetensors transformer is 23.8GB on disk (HF file tree) — far over the 16GB the RTX 5060 Ti has. Don't download the full FP16 repo expecting it to run here. Use the FP8 single-file checkpoint or a GGUF quant (below) and let ComfyUI offload to CPU.
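The 23.8GB figure follows from simple arithmetic: weight size in bytes is parameters times bits per parameter divided by 8. The sketch below assumes the nominal 12B parameter count (the 23.8GB file implies slightly under 12B) and counts weights only, so real peak usage is always higher:

```python
# Weight-only footprint at different precisions.
# Assumption: 12B parameters (the nominal Flux.1 Dev size). This ignores
# activations, text encoders, the VAE, and CUDA context overhead, so real
# peak usage is always higher than these numbers.

VRAM_GB = 16  # RTX 5060 Ti


def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# ~5.5 bits/param is a rough average for a Q5_K_S GGUF, not an exact figure.
for label, bits in [("FP16/BF16", 16), ("FP8", 8), ("GGUF ~Q5", 5.5)]:
    size = weight_gb(12, bits)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{label:10s} {size:5.1f} GB of weights -> {verdict} in {VRAM_GB} GB")
```

The FP8 row shows why the quantized transformer alone can sit on the card while the 17.2GB all-in-one file cannot be fully resident: the bundled text encoders and VAE push it past 16GB, which is where offload comes in.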

⚠️ Known issue — back-to-back generation hang. The backend benchmark and the HF discussion thread #547 note a persistent CUDA memory reservation that can hang a second back-to-back generation on 4060/5060-series cards in some ComfyUI configs. The first generation completes correctly; the workarounds below clear it.
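If you drive ComfyUI from scripts, a lightweight form of the "free memory between runs" workaround is to ask the server to unload models between jobs. This sketch assumes your ComfyUI build exposes a POST /free route that accepts unload_models and free_memory flags (recent builds do; check server.py in your checkout) and that the server is on the default 127.0.0.1:8188. It has not been verified against the #547 hang specifically:

```python
import json
import urllib.request


def free_request(base_url="http://127.0.0.1:8188", unload_models=True, free_memory=True):
    """Build a POST to ComfyUI's /free route asking it to unload models and free VRAM."""
    body = json.dumps({"unload_models": unload_models, "free_memory": free_memory}).encode()
    return urllib.request.Request(
        base_url + "/free",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_vram(base_url="http://127.0.0.1:8188"):
    """Send the request; returns the HTTP status code (200 means the server accepted it)."""
    with urllib.request.urlopen(free_request(base_url), timeout=10) as resp:
        return resp.status
```

Call free_vram() after a generation finishes. The server applies these flags from its queue worker, so they should take effect between jobs rather than mid-generation.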

ℹ️ Non-commercial license. Flux.1 Dev ships under the FLUX.1 [dev] Non-Commercial License. Personal and research use is fine; commercial use of the weights or outputs requires a separate license from Black Forest Labs.

Requirements

Component	Minimum	Notes
GPU	16GB VRAM	RTX 5060 Ti (16GB) — runs via FP8/GGUF + offload, not FP16
RAM	16GB	32GB recommended — CPU offload spills weights to system RAM
Storage	~7GB (GGUF Q4) – ~18GB (FP8)	GGUF path: plus ~6GB for text encoders + VAE (the FP8 file already bundles them)
Software	ComfyUI, Python 3.10+	recent ComfyUI for offload; ComfyUI-GGUF custom node for GGUF
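A quick pre-flight check against this table, using only the standard library. The thresholds are the table's approximate figures, and the plan names (gguf_q4, fp8_all_in_one) are made up for this sketch. It checks the Python version and free disk space only; VRAM and RAM need a GPU-aware tool such as nvidia-smi:

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the Requirements table

# Storage figures from the Requirements table (decimal GB, approximate).
NEEDED_GB = {
    "gguf_q4": 7 + 6,      # Q4 transformer + ~6GB of text encoders and VAE
    "fp8_all_in_one": 18,  # the bundled file already includes encoders and VAE
}


def preflight(path=".", plan="gguf_q4"):
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.10")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < NEEDED_GB[plan]:
        problems.append(f"only {free_gb:.1f} GB free, ~{NEEDED_GB[plan]} GB needed")
    return problems


print(preflight("."))
```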

This is the FP8/GGUF + sequential CPU offload path, not full FP16. Verified file sizes from the HF tree API:

FP16 transformer flux1-dev.safetensors: 23.8GB — over the 16GB ceiling, do not use (HF tree).
All-in-one FP8 flux1-dev-fp8.safetensors: 17.2GB, bundling the transformer, both text encoders, and the VAE (Comfy-Org/flux1-dev tree). It is larger than 16GB on disk, but ComfyUI runs it on a 16GB card by offloading whatever doesn't fit to system RAM.
GGUF transformer-only quants (city96/FLUX.1-dev-gguf tree): Q4_K_S 6.8GB · Q5_K_S 8.3GB · Q6_K 9.9GB · Q8_0 12.7GB. These leave the most VRAM headroom on a 16GB card.
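To turn those sizes into a choice, pick the largest quant that still leaves working room on the card. In this sketch the 4GB default headroom is an assumption meant to cover activations, the CUDA context, and anything else ComfyUI keeps resident; it is not a measured figure:

```python
# GGUF transformer sizes (GB) from the city96/FLUX.1-dev-gguf file list above.
GGUF_SIZES_GB = {"Q8_0": 12.7, "Q6_K": 9.9, "Q5_K_S": 8.3, "Q4_K_S": 6.8}


def pick_quant(vram_gb, headroom_gb=4.0):
    """Return the largest quant whose weights leave `headroom_gb` free, or None."""
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in GGUF_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None  # nothing fits; fall back to heavier offload
    return max(fitting, key=fitting.get)


print(pick_quant(16))
```

With the default headroom a 16GB card lands on Q6_K, inside the Q4 to Q6 sweet spot; shrink headroom_gb only if you know the rest of the pipeline stays off the GPU.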

Installation

1. Install ComfyUI

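git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
# RTX 50-series cards need a PyTorch build for CUDA 12.8 or newer; install it first
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt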

2. Accept the license and authenticate

Flux.1 Dev is a gated repo — accept the license on the model card first, then log in:

pip install huggingface_hub
huggingface-cli login
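To confirm the login took before starting multi-gigabyte downloads, huggingface-cli whoami is the authoritative check. The stdlib sketch below is a rougher version, assuming the token comes from the HF_TOKEN environment variable or from the default token file under HF_HOME (~/.cache/huggingface/token unless HF_HOME is set). A True result only means a token exists, not that it has accepted the FLUX.1-dev license:

```python
import os
from pathlib import Path


def hf_token_configured(env=None, home=None):
    """Best-effort check for a Hugging Face token in its default locations.

    Assumption: HF_TOKEN env var first, then $HF_HOME/token, with HF_HOME
    defaulting to ~/.cache/huggingface.
    """
    env = os.environ if env is None else env
    if env.get("HF_TOKEN"):
        return True
    hf_home = env.get("HF_HOME") or str(Path(home or Path.home()) / ".cache" / "huggingface")
    token_file = Path(hf_home) / "token"
    return token_file.is_file() and token_file.read_text().strip() != ""


print(hf_token_configured())
```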

3. Download quantized weights (pick ONE path)

Path A — GGUF (recommended for 16GB; most VRAM headroom). GGUF runs through the ComfyUI-GGUF custom node: clone city96/ComfyUI-GGUF into ComfyUI/custom_nodes/ and install its gguf Python package (pip install --upgrade gguf) first, then grab a quant (Q4 to Q6 are the sweet spot for 16GB):

# Transformer-only GGUF quant into the diffusion_models folder
huggingface-cli download city96/FLUX.1-dev-gguf flux1-dev-Q5_K_S.gguf \
  --local-dir ./models/diffusion_models/
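Before pointing ComfyUI at the quant, you can confirm the GGUF node and its Python dependency are in place. This sketch assumes the node was cloned under its repo name (ComfyUI-GGUF), that you run it from inside the ComfyUI folder, and that you use the same Python interpreter ComfyUI runs on:

```python
import importlib.util
from pathlib import Path


def gguf_node_ready(comfy_root="."):
    """List what is missing for the GGUF path (an empty list means it looks ready)."""
    missing = []
    # Assumption: the custom node folder keeps its repo name, ComfyUI-GGUF.
    if not (Path(comfy_root) / "custom_nodes" / "ComfyUI-GGUF").is_dir():
        missing.append("custom_nodes/ComfyUI-GGUF")
    # ComfyUI-GGUF reads .gguf files through the `gguf` Python package.
    if importlib.util.find_spec("gguf") is None:
        missing.append("python package: gguf")
    return missing


print(gguf_node_ready("."))
```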

GGUF ships the transformer only, so you still need the text encoders and VAE:

# Text encoders (use the fp8-scaled T5 to save memory)
huggingface-cli download comfyanonymous/flux_text_encoders \
  clip_l.safetensors t5xxl_fp8_e4m3fn_scaled.safetensors \
  --local-dir ./models/text_encoders/

# VAE
huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors \
  --local-dir ./models/vae/
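The Path A downloads can also be scripted with huggingface_hub's hf_hub_download, which takes repo_id, filename, and local_dir. A sketch, assuming you run it from inside the ComfyUI folder where step 1 left you and are already logged in, since the FLUX.1-dev repo is gated:

```python
# (repo_id, filename, local_dir) for the GGUF path, relative to the ComfyUI folder.
DOWNLOADS = [
    ("city96/FLUX.1-dev-gguf", "flux1-dev-Q5_K_S.gguf", "models/diffusion_models"),
    ("comfyanonymous/flux_text_encoders", "clip_l.safetensors", "models/text_encoders"),
    ("comfyanonymous/flux_text_encoders", "t5xxl_fp8_e4m3fn_scaled.safetensors", "models/text_encoders"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors", "models/vae"),
]


def download_all(plan=DOWNLOADS):
    """Download each file into its ComfyUI folder and return the local paths."""
    # Imported here so the plan above can be inspected without the library installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
        for repo_id, filename, local_dir in plan
    ]
```

Swap the GGUF filename if you picked a different quant, then call download_all().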

Place files in the current ComfyUI folder layout:

flux1-dev-Q5_K_S.gguf → ComfyUI/models/diffusion_models/
clip_l.safetensors → ComfyUI/models/text_encoders/
t5xxl_fp8_e4m3fn_scaled.safetensors → ComfyUI/models/text_encoders/
ae.safetensors → ComfyUI/models/vae/
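A quick way to catch misplaced files is to check each expected path. This sketch hard-codes the Path A filenames listed above and assumes you run it from inside the ComfyUI folder; edit EXPECTED if you chose a different quant:

```python
from pathlib import Path

# Expected locations for the GGUF path (Path A), relative to the ComfyUI folder.
EXPECTED = {
    "models/diffusion_models": ["flux1-dev-Q5_K_S.gguf"],
    "models/text_encoders": ["clip_l.safetensors", "t5xxl_fp8_e4m3fn_scaled.safetensors"],
    "models/vae": ["ae.safetensors"],
}


def missing_files(comfy_root=".", expected=EXPECTED):
    """Return expected files that are not where ComfyUI's loaders look for them."""
    root = Path(comfy_root)
    return [
        f"{folder}/{name}"
        for folder, names in expected.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


print(missing_files("."))
```

An empty list means every file is where the loaders look for it.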

Path B — FP8 all-in-one (simplest; one file). The bundled FP8 checkpoint packs the transformer, both text encoders, and the VAE into a single 17.2GB file:

huggingface-cli download Comfy-Org/flux1-dev flux1-dev-fp8.safetensors \
  --local-dir ./models/checkpoints/

Place flux1-dev-fp8.safetensors → ComfyUI/models/checkpoints/. At 17.2GB on disk it's over 16GB, so ComfyUI must offload — launch with --lowvram (see Running).

Folder names matter. Current ComfyUI uses models/diffusion_models/ (not the old models/unet/) and models/text_encoders/ (not the old models/clip/). Older guides that point at unet/ and clip/ are stale. And do not dump the whole FLUX.1-dev repo into one folder — download only the specific quantized file you need.
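If you followed an older guide, this sketch lists weight files sitting in the legacy unet/ and clip/ folders together with the current folder each belongs in. It only reports and does not move anything. The suffix filter (.safetensors, .gguf) is an assumption about what counts as a weight file:

```python
from pathlib import Path

# Legacy folder -> current folder, per the folder-name note above.
LEGACY = {"models/unet": "models/diffusion_models", "models/clip": "models/text_encoders"}
WEIGHT_SUFFIXES = {".safetensors", ".gguf"}


def legacy_placements(comfy_root="."):
    """Return (file, suggested_folder) pairs for weights found in legacy folders."""
    root = Path(comfy_root)
    found = []
    for old, new in LEGACY.items():
        old_dir = root / old
        if not old_dir.is_dir():
            continue  # legacy folder absent: nothing to report
        for f in sorted(old_dir.iterdir()):
            if f.suffix in WEIGHT_SUFFIXES:
                found.append((str(f), str(root / new)))
    return found


for path, target in legacy_placements("."):
    print(f"{path} -> move to {target}")
```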

Running

Start ComfyUI with low-VRAM offload so weights spill to CPU as needed on a 16GB card:

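python main.py --lowvram

Add --listen only if you need to reach the UI from another machine; it binds the server to all network interfaces instead of just localhost.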
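When scripting, you can wait for the server instead of refreshing the browser. This sketch polls ComfyUI over HTTP, assuming the default 127.0.0.1:8188 and the /system_stats route that current ComfyUI builds serve (poll the base URL instead if yours doesn't):

```python
import time
import urllib.error
import urllib.request


def wait_for_comfyui(base_url="http://127.0.0.1:8188", timeout_s=60.0, poll_s=1.0):
    """Poll ComfyUI until it answers HTTP 200, or give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/system_stats", timeout=poll_s) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            time.sleep(poll_s)  # server not up yet; try again
    return False
```

Models load on the first generation, not at startup, so True here only means the UI is reachable.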

Navigate to http://localhost:8188:

GGUF (Path A): load a Flux workflow and swap its model loader for the Unet Loader (GGUF) node from ComfyUI-GGUF, pointing it at your .gguf file. Load clip_l and t5xxl_fp8_e4m3fn_scaled in a DualCLIPLoader with its type set to flux, and load ae.safetensors in the VAE loader.
FP8 (Path B): load the official Flux.1 Dev workflow and point the checkpoint loader at flux1-dev-fp8.safetensors.

Recommended settings (RTX 5060 Ti, 16GB)

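CFG / guidance: leave the sampler CFG at 1.0. Flux.1 Dev is guidance-distilled and does not use classifier-free guidance like older SD models; prompt adherence is set instead by the distilled guidance value in the FluxGuidance node (3.5 is the usual default).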
Steps: 20 is a good starting point for Flux.1 Dev.
Resolution: start at 1024×1024 (1 megapixel); drop lower if you hit OOM.
Offload: keep --lowvram on; for the tightest fits, add a CPU-offload / VRAM-management node to the graph.
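For non-square images, this sketch picks dimensions near the 1-megapixel starting point for a given aspect ratio. It assumes dimensions should be multiples of 16 (Flux's VAE downsamples by 8 and the transformer packs 2x2 latent patches) and takes "1 megapixel" as 1024x1024 pixels; lower megapixels to back off after an OOM:

```python
import math


def flux_dims(aspect_w, aspect_h, megapixels=1.0, multiple=16):
    """Pick width/height near `megapixels` for an aspect ratio, snapped to `multiple`."""
    target_px = megapixels * 1024 * 1024  # "1 megapixel" taken as 1024x1024 here
    width = math.sqrt(target_px * aspect_w / aspect_h)
    height = target_px / width

    def snap(v):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)
```

For example, flux_dims(16, 9) gives 1360x768, and flux_dims(1, 1, megapixels=0.5) gives 720x720.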

Results

Speed: the backend benchmark records no measured speed for this card (peak ~16.0GB, confidence 0.9). We won't invent a number. If you benchmark your setup, please submit your results so we can publish a real figure.
VRAM: the model runs at ~16GB peak via FP8/GGUF + sequential CPU offload — it fits only because ComfyUI offloads to system RAM, which is why 32GB RAM is recommended.
Quality: GGUF Q5/Q6 and FP8 are visually close to FP16 for most prompts. Lower quants (Q4 and below) trade more fidelity for headroom — go higher if your VRAM allows.
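If you do benchmark, report a median over several runs rather than a single number, and leave out the first run, which includes loading weights. This sketch only summarizes times you record yourself; summarize_runs is a hypothetical helper, and steps=20 matches the recommended setting above:

```python
import statistics


def summarize_runs(seconds_per_image, steps=20, warmup=1):
    """Summarize timed generations, dropping the first `warmup` runs (model load)."""
    timed = seconds_per_image[warmup:]
    if not timed:
        raise ValueError("need at least one run after warm-up")
    median_s = statistics.median(timed)
    return {
        "runs": len(timed),
        "median_s_per_image": median_s,
        "median_it_per_s": steps / median_s,
    }
```

If the back-to-back hang from #547 stops you from running consecutively, time each run after a fresh start and say so when you submit.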

For the full benchmark data, see /check/flux-1-dev/rtx-5060-ti.

Troubleshooting

Second generation hangs (CUDA memory reservation): the known issue from discussion #547. The first run is fine; the second can hang because VRAM stays reserved on 4060/5060-series cards in some ComfyUI configs. Fixes: (1) restart the ComfyUI process between sessions; (2) run with --lowvram so ComfyUI manages memory aggressively; (3) add a VRAM/cleanup node to the workflow to free memory between runs.
Out of memory: you're likely on FP16 by mistake (23.8GB won't fit) — switch to a GGUF quant or the FP8 checkpoint. Then lower resolution, drop to a smaller GGUF quant (Q4_K_S), and confirm --lowvram is set.
Blank or gray output: the text encoders aren't loaded. Confirm clip_l.safetensors and t5xxl_fp8_e4m3fn_scaled.safetensors are in ComfyUI/models/text_encoders/ (Path A), or use the all-in-one FP8 checkpoint which bundles them (Path B).
License / access error on download: accept the license at huggingface.co/black-forest-labs/FLUX.1-dev and run huggingface-cli login first. The repo is gated; unauthenticated downloads return a 401-style error.
Slow generation: ensure CUDA is active — torch.cuda.is_available() should return True. Offload-heavy runs are inherently slower; that's the trade-off for fitting on 16GB.
Negative prompt has no effect: at CFG 1, Flux ignores negative prompts by design. If you need them, install ComfyUI-ppm, which provides nodes that enable negative guidance at CFG 1.
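For the CUDA check under slow generation, this sketch reports what PyTorch sees, assuming you run it with ComfyUI's Python interpreter. On an RTX 5060 Ti the device capability should read (12, 0); if sm_120 is missing from the build's arch list, that PyTorch build is the likely problem:

```python
import importlib.util


def cuda_report():
    """Summarize whether PyTorch can drive the GPU (safe to call without torch installed)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": False}
    import torch

    report = {"torch": True, "version": torch.__version__, "cuda": torch.cuda.is_available()}
    if report["cuda"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
        # Compute architectures this PyTorch build was compiled for, e.g. "sm_120".
        report["arch_list"] = torch.cuda.get_arch_list()
    return report


print(cuda_report())
```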