What You'll Build
Generate high-quality 1-megapixel images locally on your RTX 3090 Ti using Flux.1 Dev — Black Forest Labs' 12B guidance-distilled rectified-flow text-to-image transformer. No cloud, no API costs. This guide leads with the full-precision FP16 path and shows the lighter FP8 path for when you want more headroom or faster runs.
Hardware data: RTX 3090 Ti (24GB VRAM) · ~30s per 1024×1024 image at FP16 · benchmark data at /check/flux-1-dev/rtx-3090-ti
⚠️ FP16 sits near the 24GB ceiling. The full-precision flux1-dev.safetensors transformer is 23.8GB on disk and loads close to the card's 24GB limit. The benchmark author warns that the FP16 run may not fit into the 3090 Ti's VRAM, especially when the card is your primary GPU, because Windows also uses VRAM for the desktop. If the FP16 path runs out of memory, switch to the FP8 path below (~12GB runtime, comfortable headroom).
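Whether the 3090 Ti is driving a display is the main factor in the FP16 OOM risk, and nvidia-smi can report it through its display_active query field. A minimal bash sketch, assuming nvidia-smi is on PATH (on Windows, run it from Git Bash or WSL) and that GPU index 0 is the 3090 Ti; fp16_hint is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: is the 3090 Ti also driving a display (the main FP16 OOM risk)?
# Assumptions: nvidia-smi is on PATH and GPU index 0 is the 3090 Ti.

# Map nvidia-smi's display_active value ("Enabled"/"Disabled") to a hint.
fp16_hint() {
  case "$1" in
    Enabled)  echo "GPU drives a display: FP16 may OOM, keep the FP8 path ready" ;;
    Disabled) echo "No display on this GPU: FP16 has its best chance to fit" ;;
    *)        echo "Unrecognized display_active value: '$1'" ;;
  esac
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Strip whitespace (and any Windows \r) from the single CSV value.
  state=$(nvidia-smi -i 0 --query-gpu=display_active --format=csv,noheader | tr -d '[:space:]')
  fp16_hint "$state"
fi
```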
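ℹ️ Non-commercial license. Flux.1 Dev ships under the FLUX.1 [dev] Non-Commercial License. You may use the weights for personal and research purposes; commercial use of the weights requires a separate license from Black Forest Labs. The license treats generated outputs separately and permits their commercial use, except for uses it prohibits, such as training a competing model.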
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 24GB VRAM (FP16) / 16GB (FP8 single-file) | RTX 3090 Ti (24GB) |
| RAM | 16GB | 32GB+ (recommended for FP16 t5xxl) |
| Storage | ~24GB (FP16 transformer + VAE) plus text encoders / ~17GB (FP8) | 35GB |
| Software | ComfyUI, Python 3.10+ | — |
The full transformer flux1-dev.safetensors is 23.8GB and the VAE ae.safetensors is 0.34GB (HF file tree). The all-in-one FP8 checkpoint flux1-dev-fp8.safetensors is 17.2GB (Comfy-Org/flux1-dev file tree).
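Before the multi-gigabyte downloads, a free-space check can save a failed transfer. The bash sketch below compares df's available space against sizes from this guide (17.2GB for the FP8 file, 35GB for the tested setup); treating the Hugging Face sizes as decimal gigabytes is an assumption, and has_space_gb is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: does directory $1 have at least $2 GB (decimal, may be fractional) free?
has_space_gb() {
  local dir=$1 need_gb=$2 avail_kb
  [[ -d $dir ]] || return 1
  # df -P: one line per filesystem; -k: 1024-byte blocks; column 4 = available.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  awk -v a="$avail_kb" -v n="$need_gb" 'BEGIN { exit !(a * 1024 >= n * 1e9) }'
}

# Check the current folder (the ComfyUI folder after step 1) against both sizes.
for need in 17.2 35; do
  if has_space_gb . "$need"; then
    echo "at least ${need}GB free in $PWD"
  else
    echo "less than ${need}GB free in $PWD"
  fi
done
```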
Installation
1. Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
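After pip finishes, it is worth confirming the Python 3.10+ requirement and that PyTorch can see the GPU. A minimal bash sketch, assuming python is the interpreter that ran the install above; py_version_ok is a hypothetical helper. If it reports "CUDA available: False", install a CUDA-enabled PyTorch build as described in the ComfyUI README, then rerun pip install -r requirements.txt.

```shell
#!/usr/bin/env bash
# Sketch: post-install checks for Python 3.10+ and CUDA visibility.
# Assumption: "python" is the interpreter that ran pip install.

# Succeeds if a "major.minor[.patch]" string is 3.10 or newer.
py_version_ok() {
  local v=$1 major minor rest
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  (( major > 3 || (major == 3 && minor >= 10) ))
}

if command -v python >/dev/null 2>&1; then
  ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if py_version_ok "$ver"; then echo "Python $ver OK"; else echo "Python $ver is too old (need 3.10+)"; fi
  # Prints True when this PyTorch build can use the GPU.
  python -c 'import torch; print("CUDA available:", torch.cuda.is_available())' \
    || echo "PyTorch is not importable from this Python"
fi
```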
2. Accept the license and authenticate
Flux.1 Dev is a gated repo — accept the license on the model card first, then log in:
pip install huggingface_hub
huggingface-cli login
3. Download model files (FP16 full-precision path)
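Per the official ComfyUI Flux examples, the regular full version needs four files: the transformer and VAE from the FLUX.1-dev repo, plus two text encoders. Download the first two with: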
huggingface-cli download black-forest-labs/FLUX.1-dev \
flux1-dev.safetensors ae.safetensors --local-dir ./flux-download/
Then download the text encoders (from the ComfyUI examples page) and place every file in its ComfyUI folder:
- flux1-dev.safetensors → ComfyUI/models/diffusion_models/
- clip_l.safetensors → ComfyUI/models/text_encoders/
- t5xxl_fp16.safetensors → ComfyUI/models/text_encoders/
- ae.safetensors (VAE) → ComfyUI/models/vae/
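Moving four similarly named files by hand is easy to get wrong. A minimal bash sketch, assuming all four files were downloaded into ./flux-download/ inside the ComfyUI folder (where step 1's cd ComfyUI leaves you); place_flux_fp16 is a hypothetical helper whose optional third argument swaps in t5xxl_fp8_e4m3fn_scaled.safetensors. The commented download line assumes the encoders come from the comfyanonymous/flux_text_encoders repository, taken here to be the one the examples page links to.

```shell
#!/usr/bin/env bash
# Sketch: move the FP16-path files into their ComfyUI model folders.
# place_flux_fp16 is hypothetical. Args: download dir, ComfyUI root,
# optional T5 filename (defaults to t5xxl_fp16.safetensors).
place_flux_fp16() {
  local src=$1 root=$2 t5=${3:-t5xxl_fp16.safetensors}
  mkdir -p "$root/models/diffusion_models" "$root/models/text_encoders" \
           "$root/models/vae" || return
  mv "$src/flux1-dev.safetensors" "$root/models/diffusion_models/" &&
    mv "$src/clip_l.safetensors" "$src/$t5" "$root/models/text_encoders/" &&
    mv "$src/ae.safetensors" "$root/models/vae/"
}

# Example, run from inside the ComfyUI folder (encoder repo name is an assumption):
#   huggingface-cli download comfyanonymous/flux_text_encoders \
#     clip_l.safetensors t5xxl_fp16.safetensors --local-dir ./flux-download/
#   place_flux_fp16 ./flux-download .
```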
The official guidance: "You can use t5xxl_fp8_e4m3fn_scaled.safetensors instead for lower memory usage but the fp16 one is recommended if you have more than 32GB ram." If you run the FP16 transformer and the 3090 Ti is also your primary (display) GPU, the t5xxl_fp8_e4m3fn_scaled encoder is the safer choice.
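The encoder rule above comes down to two inputs: system RAM and whether the 3090 Ti drives your display. The bash sketch below encodes that rule (pick_t5 is a hypothetical helper); reading RAM from /proc/meminfo works on Linux only, so on Windows pass the number yourself.

```shell
#!/usr/bin/env bash
# Sketch of the T5 encoder rule: t5xxl_fp16 only with more than 32GB of RAM
# and a 3090 Ti that is not the primary GPU; otherwise the scaled FP8 file.

# $1 = system RAM in GB, $2 = "yes" if the 3090 Ti is your primary GPU.
pick_t5() {
  local ram_gb=$1 primary=$2
  if (( ram_gb > 32 )) && [[ $primary != yes ]]; then
    echo t5xxl_fp16.safetensors
  else
    echo t5xxl_fp8_e4m3fn_scaled.safetensors
  fi
}

# Linux only: MemTotal is in kB; integer-divide down to GiB.
if [[ -r /proc/meminfo ]]; then
  ram=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
  echo "RAM ~${ram}GB, 3090 Ti as primary GPU -> $(pick_t5 "$ram" yes)"
  echo "RAM ~${ram}GB, 3090 Ti not primary    -> $(pick_t5 "$ram" no)"
fi
```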
4. (Alternative) FP8 single-file checkpoint
If the FP16 path is tight, grab the bundled FP8 checkpoint instead — it packs the transformer, both text encoders, and the VAE into one 17.2GB file:
huggingface-cli download Comfy-Org/flux1-dev flux1-dev-fp8.safetensors \
--local-dir ./models/checkpoints/
Run this from inside the ComfyUI folder (where step 1 left you) so that flux1-dev-fp8.safetensors lands in ComfyUI/models/checkpoints/.
Running
Start ComfyUI:
python main.py --listen
Navigate to http://localhost:8188, then download the official Flux.1 Dev workflow and drag it into the canvas.
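If the page does not load, first check whether the server answers at all. A minimal bash sketch, assuming the default port 8188 and ComfyUI's /system_stats route (which returns JSON with system and device info); comfy_up is a hypothetical helper. Note that --listen binds to all network interfaces, so other machines on your network can reach the UI as well; leave it off if you only need local access.

```shell
#!/usr/bin/env bash
# Sketch: confirm the ComfyUI server answers before opening the browser.
# Assumptions: default port 8188; /system_stats is ComfyUI's JSON status route.

# Succeeds if a ComfyUI server at base URL $1 responds within 5 seconds.
comfy_up() {
  local base=${1:-http://127.0.0.1:8188}
  curl -sf --max-time 5 "$base/system_stats" >/dev/null
}

if comfy_up; then
  echo "ComfyUI is up at http://127.0.0.1:8188"
else
  echo "No answer on port 8188 (still starting, crashed, or started with a different --port?)"
fi
```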
Recommended settings (RTX 3090 Ti)
- Precision: FP16 for best quality; FP8 for headroom/speed
- Steps: 20 (the benchmarked configuration)
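- CFG: 1.0 in the sampler. Flux is guidance-distilled and does not use classifier-free guidance like older SD models; its distilled guidance strength is a separate setting (the FluxGuidance node, 3.5 in the official workflow)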
- Resolution: 1024×1024 (1 megapixel — the benchmarked resolution)
Results
- Speed (FP16, main path): ~30 seconds per 1024×1024 image at 20 steps. The SECourses benchmark reports the RTX 3090 Ti taking around 30 seconds per image at 1 megapixel, corroborated by the backend benchmark at /check/flux-1-dev/rtx-3090-ti.
- Speed (FP8, faster option): 1.27 s/it at 1 megapixel / 20 steps, i.e. 20 × 1.27 ≈ 25.4 seconds, in line with the roughly 25–26 seconds per image reported by the same source. FP8 trades a little precision for headroom and a slightly faster run. See /check/flux-1-dev/rtx-3090-ti.
- VRAM usage: The FP16 transformer (23.8GB on disk) loads close to the 24GB ceiling, leaving little room for activations, so account for the VRAM your desktop already uses and expect OOM risk when the 3090 Ti also drives a display. The FP8 path runs comfortably at roughly 12GB. The backend has no measured FP16 peak yet; if you record one, submit it via /contribute.
- Quality notes: FP16 is the highest-fidelity path; FP8 is visually very close and the practical default when VRAM is contended.
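Sampler speed is quoted either as seconds per iteration (s/it) or iterations per second (it/s), and mixing them up changes the estimate a lot. The awk-based bash sketch below converts both for the 20-step setting; the ~25–26 s per image reported for FP8 lines up with 1.27 read as s/it. The helper names are hypothetical, and the computed values cover the sampler steps only, so text encoding, VAE decode, and a cold model load add to them.

```shell
#!/usr/bin/env bash
# Sketch: sampling time per image from the two common speed units.
# Sampler steps only; text encoding, VAE decode and model loading come on top.
secs_from_s_per_it() { awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n * s }'; }  # steps, s/it
secs_from_it_per_s() { awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'; }  # steps, it/s

echo "20 steps at 1.27 s/it: $(secs_from_s_per_it 20 1.27) s"
echo "20 steps at 1.27 it/s: $(secs_from_it_per_s 20 1.27) s"
```

Run as-is, it prints 25.4 s for the s/it reading and 15.7 s for the it/s reading.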
For the full benchmark data, see /check/flux-1-dev/rtx-3090-ti.
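To turn "close to the ceiling" into a number, subtract what is already in use and the size of the FP16 weights from the card's total. A minimal bash sketch, assuming the 23.8GB file size is decimal (23.8 × 10^9 bytes ≈ 22,697 MiB), that GPU 0 is the 3090 Ti, and that you run it before starting ComfyUI; headroom_mib is a hypothetical helper. Whatever is left must still hold activations, the CUDA context, and the VAE, so a small or negative result points to the FP8 path.

```shell
#!/usr/bin/env bash
# Sketch: VRAM left after loading the FP16 transformer weights alone.
# Assumption: 23.8GB on Hugging Face means 23.8e9 bytes.
weights_mib=$(( 23800000000 / 1048576 ))   # 22697 MiB

# $1 = total MiB, $2 = MiB already in use -> MiB left after the weights (may be negative)
headroom_mib() { echo $(( $1 - $2 - weights_mib )); }

# Run before launching ComfyUI so "in use" reflects the desktop and other apps.
if command -v nvidia-smi >/dev/null 2>&1; then
  IFS=', ' read -r total used < <(nvidia-smi -i 0 \
    --query-gpu=memory.total,memory.used --format=csv,noheader,nounits | tr -d '\r')
  echo "total ${total} MiB, in use ${used} MiB, left after FP16 weights: $(headroom_mib "$total" "$used") MiB"
fi
```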
Troubleshooting
Out of memory on the FP16 path
Most common when the 3090 Ti also drives your display — Windows/desktop compositing eats into the 24GB. Fixes, in order: (1) switch to the FP8 single-file checkpoint (Step 4); (2) use the t5xxl_fp8_e4m3fn_scaled.safetensors text encoder instead of t5xxl_fp16; (3) launch with python main.py --listen --lowvram to let ComfyUI offload aggressively.
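To see how close a run actually gets to 24GB (and to get a peak figure worth submitting via /contribute), log VRAM once per second while generating. The bash sketch below relies on nvidia-smi's --query-gpu and -l (loop) options; peak_mib is a hypothetical helper that reports the highest logged value.

```shell
#!/usr/bin/env bash
# Sketch: report peak VRAM (MiB) from a log with one value per line.
# peak_mib is hypothetical; stray \r characters from Windows tools are stripped.
peak_mib() {
  awk 'BEGIN { max = 0 } { gsub(/\r/, "") } $1 + 0 > max { max = $1 + 0 } END { print max }'
}

# Usage: start the logger, generate one image in ComfyUI, then stop it with Ctrl+C.
#   nvidia-smi -i 0 --query-gpu=memory.used --format=csv,noheader,nounits -l 1 | tee vram.log
#   peak_mib < vram.log
```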
Blank or gray output
The T5 text encoder must be loaded. Verify both clip_l.safetensors and your chosen t5xxl_*.safetensors are present in ComfyUI/models/text_encoders/ (FP16 path) — or use the all-in-one FP8 checkpoint which bundles them.
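A quick way to confirm which setup is complete is to check for the exact files this guide places. A minimal bash sketch, meant to be run from the ComfyUI folder; flux_ready and has_t5 are hypothetical helpers, and any t5xxl_*.safetensors variant counts as the T5 encoder.

```shell
#!/usr/bin/env bash
# Sketch: report which Flux setup is complete under a ComfyUI root (default: .).

# Succeeds if any t5xxl_*.safetensors encoder exists in directory $1.
has_t5() {
  local f
  for f in "$1"/t5xxl_*.safetensors; do
    [[ -f $f ]] && return 0   # an unmatched glob stays literal and fails -f
  done
  return 1
}

flux_ready() {
  local m=${1:-.}/models
  if [[ -f $m/diffusion_models/flux1-dev.safetensors ]] &&
     [[ -f $m/vae/ae.safetensors ]] &&
     [[ -f $m/text_encoders/clip_l.safetensors ]] &&
     has_t5 "$m/text_encoders"; then
    echo fp16-split        # all four files for the split FP16 path
  elif [[ -f $m/checkpoints/flux1-dev-fp8.safetensors ]]; then
    echo fp8-checkpoint    # all-in-one FP8 file
  else
    echo incomplete        # something is missing, e.g. a text encoder
  fi
}

flux_ready .
```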
License / access error on download
Accept the license at huggingface.co/black-forest-labs/FLUX.1-dev and run huggingface-cli login before downloading. The repo is gated; an unauthenticated download returns a 401-style access error.
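When a download fails, the HTTP status usually tells you which step is missing. A bash sketch, assuming Hugging Face returns 401 when no valid token is sent and 403 when the account has not accepted the license, and that huggingface-cli login saved the token to its default location (~/.cache/huggingface/token); explain_status and hf_access_status are hypothetical helpers. The request is a HEAD, so nothing is downloaded.

```shell
#!/usr/bin/env bash
# Sketch: separate "not logged in" from "license not accepted" for the gated repo.
# Assumptions: token in HF_TOKEN or the default huggingface-cli token file;
# 401 = no valid credentials, 403 = no access, 200/302 = access granted.

explain_status() {
  case "$1" in
    200|302) echo "access OK" ;;
    401)     echo "not authenticated: run huggingface-cli login" ;;
    403)     echo "no access yet: accept the license on the model card" ;;
    *)       echo "unexpected HTTP status: $1" ;;
  esac
}

# Prints the HTTP status of a HEAD request for one file in the repo.
hf_access_status() {
  local token=${HF_TOKEN:-$(cat ~/.cache/huggingface/token 2>/dev/null)}
  local auth=()
  [[ -n $token ]] && auth=(-H "Authorization: Bearer $token")
  curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' "${auth[@]}" \
    https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors
}

# Usage:
#   explain_status "$(hf_access_status)"
```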