How much VRAM does Flux.1 Dev need?

About 24 GB for the full-precision FP16 path this recipe targets; the FP8 path runs in roughly 12 GB.

How hard is this setup?

Beginner. Follow the installation steps below.

Flux.1 Dev on RTX 3090 Ti: ComfyUI Image Generation Guide

What You'll Build

Generate high-quality 1-megapixel images locally on your RTX 3090 Ti using Flux.1 Dev — Black Forest Labs' 12B guidance-distilled rectified-flow text-to-image transformer. No cloud, no API costs. This guide leads with the full-precision FP16 path and shows the lighter FP8 path for when you want more headroom or faster runs.

Hardware data: RTX 3090 Ti (24GB VRAM) · ~30s per 1024×1024 image at FP16 · See benchmark data

⚠️ FP16 sits near the 24GB ceiling. The full-precision flux1-dev.safetensors transformer is 23.8GB on disk and loads close to the card's 24GB limit. The benchmark author warns that the FP16 run may not fit into the 3090 Ti's VRAM, especially when the card is your primary GPU and Windows is also using VRAM for the desktop. If the FP16 path runs out of memory (OOM), switch to the FP8 path below (~12GB at runtime, comfortable headroom).

ℹ️ Non-commercial license. Flux.1 Dev ships under the FLUX.1 [dev] Non-Commercial License. You may generate images for personal and research use, but commercial use of the weights or outputs requires a separate license from Black Forest Labs.

Requirements

Component	Minimum	Tested
GPU	24GB VRAM (FP16) / 16GB (FP8 single-file)	RTX 3090 Ti (24GB)
RAM	16GB	32GB+ (recommended for FP16 t5xxl)
Storage	~24GB (FP16 transformer + VAE; text encoders extra) / ~17GB (FP8)	35GB
Software	ComfyUI, Python 3.10+	—

The full transformer flux1-dev.safetensors is 23.8GB and the VAE ae.safetensors is 0.34GB (HF file tree). The all-in-one FP8 checkpoint flux1-dev-fp8.safetensors is 17.2GB (Comfy-Org/flux1-dev file tree).
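As a quick check on the disk budget, the sizes quoted above can simply be added up. The sketch below uses a hypothetical helper, sum_gb (not part of ComfyUI or the Hugging Face CLI), and covers only the files whose sizes are stated here; the text encoders come on top.

```shell
# sum_gb: hypothetical helper that adds sizes given in GB and prints the total.
sum_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# FP16 path, transformer + VAE only (text encoder sizes are not stated above).
sum_gb 23.8 0.34   # prints 24.14
```

Compare the result with the free space on the drive that holds ComfyUI/models/ before starting the download.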

Installation

1. Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

2. Accept the license and authenticate

Flux.1 Dev is a gated repo — accept the license on the model card first, then log in:

pip install huggingface_hub
huggingface-cli login

3. Download model files (FP16 full-precision path)

Per the official ComfyUI Flux examples, the regular full version needs four files. Two of them, the transformer and the VAE, come from the gated Black Forest Labs repo:

huggingface-cli download black-forest-labs/FLUX.1-dev \
  flux1-dev.safetensors ae.safetensors --local-dir ./flux-download/

Then download the text encoders (from the ComfyUI examples page) and place every file in its ComfyUI folder:

flux1-dev.safetensors → ComfyUI/models/diffusion_models/
clip_l.safetensors → ComfyUI/models/text_encoders/
t5xxl_fp16.safetensors → ComfyUI/models/text_encoders/
ae.safetensors (VAE) → ComfyUI/models/vae/
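The placement above can be sketched as a small script. This is only an illustration: place_flux_files is a hypothetical helper, and it assumes all four files were saved into one source folder (for example ./flux-download/). Files that are not there yet are skipped rather than treated as errors.

```shell
# place_flux_files SRC_DIR COMFYUI_DIR
# Hypothetical helper: moves each file into the ComfyUI folder listed above.
place_flux_files() {
  src=$1
  comfy=$2
  # "file:subfolder" pairs, copied from the list above.
  for pair in \
    "flux1-dev.safetensors:diffusion_models" \
    "clip_l.safetensors:text_encoders" \
    "t5xxl_fp16.safetensors:text_encoders" \
    "ae.safetensors:vae"
  do
    file=${pair%%:*}
    dest="$comfy/models/${pair#*:}"
    mkdir -p "$dest"
    if [ -f "$src/$file" ]; then
      mv "$src/$file" "$dest/"
      echo "moved $file -> $dest/"
    else
      echo "skipped $file (not found in $src)"
    fi
  done
}

# Example call, run from the directory that contains both folders:
#   place_flux_files ./flux-download ./ComfyUI
```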

The official guidance: "You can use t5xxl_fp8_e4m3fn_scaled.safetensors instead for lower memory usage but the fp16 one is recommended if you have more than 32GB ram." On a 3090 Ti that runs the FP16 transformer while also driving your display, t5xxl_fp8_e4m3fn_scaled.safetensors is the safer choice.
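The RAM rule quoted above can be written as a one-line decision. A minimal sketch, assuming a hypothetical helper named pick_t5_encoder that takes system RAM in whole GB; it encodes only the "more than 32GB" threshold and knows nothing about VRAM pressure on the GPU.

```shell
# pick_t5_encoder RAM_GB
# Hypothetical helper: prints the T5 file suggested by the quoted guidance.
pick_t5_encoder() {
  ram_gb=$1
  if [ "$ram_gb" -gt 32 ]; then
    echo "t5xxl_fp16.safetensors"
  else
    echo "t5xxl_fp8_e4m3fn_scaled.safetensors"
  fi
}

# Linux-only assumption: total RAM can be read from /proc/meminfo (in kB).
#   ram_gb=$(awk '/MemTotal/ { print int($2 / 1048576) }' /proc/meminfo)
#   pick_t5_encoder "$ram_gb"
```

Because the kernel reports slightly less than the installed amount, a machine with exactly 32GB installed lands on the FP8 encoder, which is consistent with the "more than 32GB" wording.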

4. (Alternative) FP8 single-file checkpoint

If the FP16 path is tight, grab the bundled FP8 checkpoint instead — it packs the transformer, both text encoders, and the VAE into one 17.2GB file:

huggingface-cli download Comfy-Org/flux1-dev flux1-dev-fp8.safetensors \
  --local-dir ./ComfyUI/models/checkpoints/

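The command above already writes to ComfyUI/models/checkpoints/ when run from the directory that contains ComfyUI. If you downloaded the file elsewhere, place flux1-dev-fp8.safetensors → ComfyUI/models/checkpoints/.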

Running

Start ComfyUI:

python main.py --listen

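Navigate to http://localhost:8188, then download the official Flux.1 Dev workflow and drag it into the canvas. Note that --listen also makes the server reachable from other machines on your network; omit it if you only use ComfyUI on the same PC.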

Recommended settings (RTX 3090 Ti)

Precision: FP16 for best quality; FP8 for headroom/speed
Steps: 20 (the benchmarked configuration)
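CFG: 1.0 in the sampler. Flux.1 Dev is guidance-distilled and does not use classifier-free guidance like older SD models; its distilled guidance strength is a separate setting (the FluxGuidance node in ComfyUI), so leave that at the workflow's value.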
Resolution: 1024×1024 (1 megapixel — the benchmarked resolution)

Results

Speed (FP16, primary): ~30 seconds per 1024×1024 image at 20 steps. The SECourses benchmark reports the RTX 3090 Ti taking around 30 seconds per image at 1 megapixel, corroborated by the backend benchmark at /check/flux-1-dev/rtx-3090-ti.
Speed (FP8, faster option): 1.27 s/it at 1 megapixel and 20 steps, which works out to about 25–26 seconds per image (20 × 1.27 ≈ 25.4 s) per the same source. FP8 trades a little precision for headroom and a slightly faster run. See /check/flux-1-dev/rtx-3090-ti.
VRAM usage: The FP16 transformer (23.8GB on disk) loads close to the 24GB ceiling, so leave headroom for your own setup and expect OOM risk when the 3090 Ti also drives a display. The FP8 path runs comfortably at roughly 12GB. The backend has no measured peak yet; help us by submitting yours via /contribute.
Quality notes: FP16 is the highest-fidelity path; FP8 is visually very close and the practical default when VRAM is contended.
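Per-image time is step count combined with sampler speed, and the unit matters. In the sketch below, seconds_per_image is a hypothetical helper that ignores model loading and VAE decode time. It shows that 20 steps at 1.27 seconds per iteration gives about 25.4 s, which matches the 25–26 s FP8 figure, whereas 1.27 iterations per second would give only about 15.7 s.

```shell
# seconds_per_image STEPS SPEED UNIT
# Hypothetical helper. UNIT is "s/it" (seconds per step) or "it/s" (steps per second).
seconds_per_image() {
  awk -v steps="$1" -v speed="$2" -v unit="$3" 'BEGIN {
    if (unit == "s/it") printf "%.1f\n", steps * speed
    else                printf "%.1f\n", steps / speed
  }'
}

seconds_per_image 20 1.27 s/it   # prints 25.4
seconds_per_image 20 1.27 it/s   # prints 15.7
```

When comparing your own runs against the figures above, check which unit your sampler progress bar reports before converting.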

For the full benchmark data, see /check/flux-1-dev/rtx-3090-ti.

Troubleshooting

Out of memory on the FP16 path

Most common when the 3090 Ti also drives your display — Windows/desktop compositing eats into the 24GB. Fixes, in order: (1) switch to the FP8 single-file checkpoint (Step 4); (2) use the t5xxl_fp8_e4m3fn_scaled.safetensors text encoder instead of t5xxl_fp16; (3) launch with python main.py --listen --lowvram to let ComfyUI offload aggressively.
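Before retrying the FP16 path, it can help to see how much VRAM the desktop already occupies. A minimal sketch, assuming the NVIDIA driver's nvidia-smi tool is on your PATH and a POSIX shell is available (Git Bash or WSL on Windows); vram_headroom_mib is a hypothetical helper that only does the subtraction.

```shell
# vram_headroom_mib: hypothetical helper. Reads "used, total" pairs in MiB
# (one line per GPU) on stdin and prints the free MiB for each GPU.
vram_headroom_mib() {
  awk -F', *' '{ print $2 - $1 }'
}

# On a machine with an NVIDIA GPU, feed it live numbers:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#     --format=csv,noheader,nounits | vram_headroom_mib
```

Run it with ComfyUI closed. If the reported headroom is already well below the card's total, the 23.8GB FP16 transformer is unlikely to fit, and the FP8 path is the better starting point.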

Blank or gray output

The T5 text encoder must be loaded. Verify both clip_l.safetensors and your chosen t5xxl_*.safetensors are present in ComfyUI/models/text_encoders/ (FP16 path) — or use the all-in-one FP8 checkpoint which bundles them.
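The file check described above can be sketched as a short script. check_text_encoders is a hypothetical helper, not a ComfyUI command; it only confirms that the files exist and does not validate their contents.

```shell
# check_text_encoders COMFYUI_DIR
# Hypothetical helper: returns 0 when clip_l.safetensors and at least one
# t5xxl_*.safetensors file are present in models/text_encoders/, else 1.
check_text_encoders() {
  dir="$1/models/text_encoders"
  status=0
  if [ -f "$dir/clip_l.safetensors" ]; then
    echo "ok: clip_l.safetensors"
  else
    echo "missing: clip_l.safetensors"
    status=1
  fi
  # Any t5xxl variant counts (fp16 or fp8 scaled).
  found_t5=no
  for f in "$dir"/t5xxl_*.safetensors; do
    if [ -f "$f" ]; then
      echo "ok: ${f##*/}"
      found_t5=yes
    fi
  done
  if [ "$found_t5" = no ]; then
    echo "missing: t5xxl_*.safetensors"
    status=1
  fi
  return "$status"
}

# Example call from the directory that contains ComfyUI:
#   check_text_encoders ./ComfyUI
```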

License / access error on download

Accept the license at huggingface.co/black-forest-labs/FLUX.1-dev and run huggingface-cli login before downloading. The repo is gated; an unauthenticated download returns a 401-style access error.