How much VRAM does Flux.1 Dev need?

About 24 GB — the minimum this recipe targets.

How hard is this setup?

Beginner — follow the steps above.

Flux.1 Dev on RTX 4090: ComfyUI Image Generation Guide

What You'll Build

Generate high-quality 1-megapixel images locally on your RTX 4090 using Flux.1 Dev — Black Forest Labs' 12B guidance-distilled rectified-flow text-to-image transformer. No cloud services, no API costs.

Benchmark: FP8 · 1.85 it/s · peak VRAM ~24.5GB (24564 MB) · 1024×1024, 20 steps · See all data

⚠️ The measured path is FP8, and it nearly fills the card. The backend benchmark for this pair (source: ComfyUI discussion #9002) records FP8 at 1.85 it/s with a 24564 MB (~24.5GB) peak — right at the 4090's 24GB ceiling. Full-precision FP16 is tighter still: the flux1-dev.safetensors transformer alone is 23.8GB on disk and loads near/over the limit, so OOM is likely if the 4090 is also driving your display. Lead with FP8; drop to a GGUF Q-quant (below) if you need real headroom.

ℹ️ Non-commercial license. Flux.1 Dev ships under the FLUX.1 [dev] Non-Commercial License. Personal and research use is allowed; commercial use of the weights or outputs requires a separate license from Black Forest Labs.

Requirements

Component	Minimum (sub-24GB fallback)	Tested (measured path)
GPU	12–16GB via GGUF Q4/Q8	RTX 4090 (24GB), FP8
VRAM	~5–17GB (GGUF, quant-dependent)	~24.5GB peak (FP8, #9002)
RAM	16GB	32GB+ (recommended for FP16 t5xxl)
Storage	~17GB (FP8) / ~24GB (FP16)	35GB
Software	ComfyUI, Python 3.10+	—

File sizes (HF file tree): the full FP16 transformer flux1-dev.safetensors is 23.8GB and the VAE ae.safetensors is 0.34GB. The all-in-one FP8 checkpoint flux1-dev-fp8.safetensors is 17.2GB (Comfy-Org/flux1-dev file tree) — this is the practical 24GB path. GGUF Q-quants (below) are the only genuinely sub-24GB route.

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

2. Accept the license and authenticate

Flux.1 Dev is a gated repo — accept the license on the model card first, then log in:

pip install huggingface_hub
huggingface-cli login

3. (Recommended) FP8 single-file checkpoint — the measured path

The bundled FP8 checkpoint packs the transformer, both text encoders, and the VAE into one 17.2GB file. This is the configuration the backend benchmark measured at 1.85 it/s:

huggingface-cli download Comfy-Org/flux1-dev flux1-dev-fp8.safetensors \
  --local-dir ./ComfyUI/models/checkpoints/

Place flux1-dev-fp8.safetensors → ComfyUI/models/checkpoints/.

4. (Alternative) FP16 full-precision path

For maximum fidelity, download the separate FP16 files — but expect this to sit at/over the 24GB ceiling. Per the official ComfyUI Flux examples, the regular full version needs four files:

huggingface-cli download black-forest-labs/FLUX.1-dev \
  flux1-dev.safetensors ae.safetensors --local-dir ./flux-download/

Then download the text encoders (from the ComfyUI examples page) and place every file in its modern ComfyUI folder:

flux1-dev.safetensors → ComfyUI/models/diffusion_models/
clip_l.safetensors → ComfyUI/models/text_encoders/
t5xxl_fp16.safetensors → ComfyUI/models/text_encoders/
ae.safetensors (VAE) → ComfyUI/models/vae/

The official guidance: "You can use t5xxl_fp8_e4m3fn_scaled.safetensors instead for lower memory usage but the fp16 one is recommended if you have more than 32GB ram." On a 4090 that also drives a display, the t5xxl_fp8_e4m3fn_scaled encoder is the safer choice.

5. Install ComfyUI Manager (optional)

cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager

This lets you install workflow dependencies with one click.

Running

Start ComfyUI:

python main.py --listen

Navigate to http://localhost:8188, then download the official Flux.1 Dev workflow and drag it into the canvas.

Recommended settings (RTX 4090)

Precision: FP8 (the benchmarked, comfortable default); FP16 for maximum fidelity if you have headroom
Steps: 20 (the benchmarked configuration)
CFG / guidance: 1.0 — Flux is guidance-distilled and does not use classifier-free guidance like older SD models
Resolution: 1024×1024 (1 megapixel — the benchmarked resolution)

Performance

Config	Speed	Peak VRAM	Source
FP8, 20 steps, 1024×1024	1.85 it/s	~24.5GB (24564 MB)	#9002 · /check
FP16, 20 steps, 1024×1024	not separately benchmarked here	transformer alone 23.8GB on disk; loads near/over 24GB	HF tree

The only measured datapoint for this pair is the FP8 run above. FP16 fidelity is higher but rides the VRAM ceiling — treat it as best-effort, not a guaranteed fit. Have a measured FP16 number on a 4090? Contribute it.

Quantization Options

The FP8 path (~24.5GB peak) is the realistic full-quality route on a 24GB card. If you need to run on less than 24GB, or want faster generation, drop to a GGUF Q-quant via ComfyUI-GGUF:

Format	On-disk size	VRAM footprint	Quality
FP16 (full)	23.8GB transformer	near/over 24GB	Best
FP8 (single-file)	17.2GB	~24.5GB peak (measured)	Excellent
Q8_0 GGUF	~12GB	~12–14GB	Very close to FP16
Q4_K_M GGUF	~6.8GB	~6–8GB	Good

GGUF footprints are quant-dependent and unverified by our backend — install ComfyUI-GGUF, download the matching flux1-dev-Q*.gguf weights, and use the GGUF loader node in place of the standard checkpoint loader.

Troubleshooting

Out of Memory (FP16 or FP8 near the ceiling): Most common when the 4090 also drives your display. Fixes, in order: (1) use the FP8 single-file checkpoint (Step 3); (2) on the FP16 path, swap t5xxl_fp16.safetensors for t5xxl_fp8_e4m3fn_scaled.safetensors; (3) drop to a GGUF Q-quant; (4) launch with python main.py --listen --lowvram to let ComfyUI offload aggressively.

Slow generation: Ensure the GPU is selected — check torch.cuda.is_available() returns True in Python.

Blank/gray output: The T5 text encoder must be loaded. On the FP16 path, verify both clip_l.safetensors and your chosen t5xxl_*.safetensors are present in ComfyUI/models/text_encoders/. The FP8 single-file checkpoint bundles the encoders, so this is not an issue there.

License / access error: Accept the license at huggingface.co/black-forest-labs/FLUX.1-dev and run huggingface-cli login before downloading — the repo is gated, and an unauthenticated download returns a 401-style error.