What You'll Build
Generate high-quality images locally on your RTX 5090 using Flux.1 Dev — a 12 billion parameter rectified flow transformer text-to-image model from Black Forest Labs. The 5090's 32 GB of VRAM lets you run the full FP16 weights with comfortable headroom, so there is no need to quantize for fit — though Blackwell's native FP8 support gives you a faster path when you want it.
Hardware data: RTX 5090 (32GB VRAM) · ~9.55s per 1024×1024 image at FP16, 20 steps · See benchmark data
⚠️ Blackwell toolchain requirement: The RTX 5090 is an sm_120 (Blackwell) GPU. Per the official ComfyUI 50-series support thread, you need "a pytorch that has been built against cuda 12.8". Stable PyTorch shipped without sm_120 kernels at launch, so use a recent ComfyUI build with a cu128 (CUDA 12.8) or newer PyTorch wheel. A PyTorch build that is too old will fail at the first inference call.
ℹ️ License — non-commercial. Flux.1 Dev weights are released under the FLUX.1 [dev] Non-Commercial License. You may generate images for personal, scientific, and evaluation purposes, but commercial use of the weights requires a separate license from Black Forest Labs.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 24GB VRAM (FP16) | RTX 5090 (32GB) |
| RAM | 32GB | — |
| Storage | 35GB | 35GB |
| Software | ComfyUI + PyTorch cu128 (CUDA 12.8+) | ComfyUI, Python 3.10+ |
The full FP16 transformer weight (flux1-dev.safetensors) is ~23.8 GB on disk (23,802,932,552 bytes), so at full precision plan on a 24 GB-class card as the floor. On the 5090's 32 GB it fits with room to spare for the T5 text encoder and activations.
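The sizing above can be checked with a few lines of arithmetic. The sketch below uses only the byte count quoted in this section. The 2 bytes per FP16 parameter is standard, but the implied parameter count is an estimate (the file also carries a small header), and treating a "24 GB" or "32 GB" card as 24 or 32 GiB is an assumption about how the vendor figure maps to usable memory.

```python
# Rough size arithmetic for the FP16 transformer weight (a sketch, not a measurement).
FILE_BYTES = 23_802_932_552    # on-disk size of flux1-dev.safetensors quoted above
BYTES_PER_PARAM_FP16 = 2       # FP16 stores each weight in 2 bytes


def to_gb(n_bytes):
    """Decimal gigabytes (10**9 bytes), the unit used for file sizes."""
    return n_bytes / 10**9


def to_gib(n_bytes):
    """Binary gibibytes (2**30 bytes), the unit most VRAM monitors report."""
    return n_bytes / 2**30


def implied_params_billions(n_bytes, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate parameter count implied by the file size, in billions."""
    return n_bytes / bytes_per_param / 1e9


def headroom_gib(card_gib, n_bytes=FILE_BYTES):
    """Memory left after loading the weights, assuming the card exposes card_gib GiB."""
    return card_gib - to_gib(n_bytes)


if __name__ == "__main__":
    print(f"weights: {to_gb(FILE_BYTES):.2f} GB = {to_gib(FILE_BYTES):.2f} GiB")
    print(f"implied parameters: ~{implied_params_billions(FILE_BYTES):.2f} B")
    for card in (24, 32):
        print(f"{card} GiB card: ~{headroom_gib(card):.2f} GiB left for encoders and activations")
```

The headroom figure covers only the transformer weights; the text encoders, the VAE, and activations all draw on the remainder, which is why a 24 GB card is a floor rather than a comfortable fit.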
Installation
1. Install ComfyUI with a Blackwell-compatible PyTorch
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
# Install a PyTorch wheel built against CUDA 12.8 for Blackwell (sm_120).
# At launch this was the nightly cu128 channel; a recent stable cu128 wheel also works.
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -r requirements.txt
Do not install xformers on a Blackwell nightly stack — it can force-downgrade PyTorch back to a stable build that lacks sm_120 kernels. ComfyUI's native SDPA attention path works without it.
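After installing, it can help to confirm that the wheel you ended up with really targets Blackwell before launching ComfyUI. The following is a minimal sketch, assuming it runs in the same Python environment as ComfyUI. It relies on `torch.version.cuda`, `torch.cuda.get_arch_list()`, and `torch.cuda.get_device_capability()`; the helper names `cuda_at_least`, `has_sm120_kernels`, and `report` are illustrative and not part of PyTorch or ComfyUI.

```python
def cuda_at_least(version, minimum=(12, 8)):
    """Return True if a CUDA version string such as '12.8' meets the minimum."""
    if not version:  # CPU-only wheels report no CUDA version
        return False
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= minimum


def has_sm120_kernels(arch_list):
    """Return True if the build's architecture list names sm_120."""
    return "sm_120" in arch_list


def report():
    """Print what the installed PyTorch build says about CUDA and sm_120."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    cuda_version = torch.version.cuda
    gpu_visible = torch.cuda.is_available()
    arch_list = torch.cuda.get_arch_list() if gpu_visible else []
    print(f"torch {torch.__version__}, built against CUDA {cuda_version}")
    print(f"CUDA 12.8 or newer: {cuda_at_least(cuda_version)}")
    print(f"sm_120 kernels present: {has_sm120_kernels(arch_list)}")
    if gpu_visible:
        print(f"device capability: {torch.cuda.get_device_capability(0)}")


if __name__ == "__main__":
    report()
```

If the CUDA check or the sm_120 check comes back False, reinstall the cu128 wheel before going further. Rerunning the script after installing any extra package is a cheap way to catch a silent PyTorch downgrade.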
2. Download Flux.1 Dev Model Files
Flux.1 Dev is a gated repo — accept the license on HuggingFace first, then authenticate:
pip install huggingface_hub
huggingface-cli login # paste a token from an account that accepted the FLUX.1-dev license
huggingface-cli download black-forest-labs/FLUX.1-dev \
--local-dir ./models/
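Place the files in the ComfyUI model directories:
- flux1-dev.safetensors → ComfyUI/models/unet/
- ae.safetensors (VAE) → ComfyUI/models/vae/
- clip_l.safetensors → ComfyUI/models/clip/
- t5xxl_fp16.safetensors → ComfyUI/models/clip/
Note that clip_l.safetensors and t5xxl_fp16.safetensors do not ship as single files in the FLUX.1-dev repo; the ComfyUI Flux examples link to the comfyanonymous/flux_text_encoders repo on Hugging Face for them.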
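A misplaced file is the most common reason a workflow fails to load, so a quick check of the layout can save a restart. The sketch below assumes the four paths listed in this step and takes the ComfyUI checkout directory as an argument. The `EXPECTED_FILES` tuple and the `missing_model_files` helper are illustrative names, not part of ComfyUI.

```python
from pathlib import Path

# Paths relative to the ComfyUI checkout, as listed in this step.
EXPECTED_FILES = (
    "models/unet/flux1-dev.safetensors",
    "models/vae/ae.safetensors",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
)


def missing_model_files(comfy_root):
    """Return the expected model files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("ComfyUI")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All Flux.1 Dev model files are in place.")
```

Run it from the directory that contains your ComfyUI checkout, or pass a different root to `missing_model_files`.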
3. Install ComfyUI Manager (Recommended)
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
Running
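Start ComfyUI. The --listen flag makes the server reachable from other machines on your network; omit it if you only need access from the local machine: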
python main.py --listen
Navigate to http://localhost:8188 in your browser. Download the official Flux.1 Dev workflow and drag it into the canvas.
Recommended Settings
For RTX 5090 with 32GB VRAM:
- Precision: FP16 (best quality, fits comfortably). Switch to FP8 for a faster path — Blackwell has native FP8 acceleration.
- Steps: 20–30 (20 is the benchmarked baseline)
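- CFG Scale: 1.0. Flux.1 Dev is guidance-distilled, so it does not run classifier-free guidance the way older SD models do; prompt adherence is set with the separate guidance value (the FluxGuidance node in ComfyUI) rather than with CFG.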
- Resolution: 1024×1024 (≈1 megapixel) standard; the headroom supports higher resolutions
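For aspect ratios other than square, one way to stay near the 1 megapixel budget is to derive width and height from the ratio and snap both to a fixed multiple. The sketch below is illustrative only: the `dims_for_aspect` helper is hypothetical, and snapping to multiples of 16 is an assumption about which latent sizes Flux handles cleanly, not a documented requirement from this guide.

```python
import math


def dims_for_aspect(aspect_w, aspect_h, megapixels=1.0, multiple=16):
    """Return (width, height) near the pixel budget for the given aspect ratio.

    Both sides are rounded to the nearest `multiple` (assumed to be 16 here).
    """
    target_pixels = megapixels * 1024 * 1024
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value):
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for aspect in ((1, 1), (16, 9), (3, 2), (9, 16)):
        width, height = dims_for_aspect(*aspect)
        print(f"{aspect[0]}:{aspect[1]} -> {width}x{height} ({width * height / 1e6:.2f} MP)")
```

Raising `megapixels` is a simple way to probe how far the 5090's headroom stretches. Generation time grows with the pixel count, so the 20-step benchmark figure applies only at roughly 1 megapixel.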
Results
- Speed (FP16, full precision): 9.55 seconds per image at 20 steps, 1 megapixel — about 3× faster than an RTX 3090 Ti (measured by SECourses/FurkanGozukara). This is the headline full-precision metric.
- Speed (FP8, fast path): roughly 2.2–2.38 it/s at 20 steps / 1 megapixel. The SECourses wiki puts this at about 2.5× faster than a 3090 Ti at 8-bit, and a user in the Comfy-Org ComfyUI discussion reported "8.78s at 2.38it/s for 3 runs". Per-image times in that thread cluster around 8 seconds unoptimized (one user reached 5.46s with ComfyUI tuning), so treat FP8 throughput as a band, not a single number.
- VRAM usage: the FP16 transformer weight is ~23.8 GB on disk; on the 5090's 32 GB it loads with headroom for the encoder and activations. No measured peak is in the backend yet — see /check/flux-1-dev/rtx-5090.
- Quality notes: FP16 is the reference quality. FP8 trades a small amount of fidelity for speed and is the natural choice on Blackwell given the native FP8 path.
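The FP8 figures above mix iterations per second with seconds per image, so converting between the two makes them easier to compare with the FP16 number. The sketch below uses only values quoted in this section. Reading the gap between the pure sampling time and the reported wall-clock time as non-sampling overhead (text encoding, VAE decode) is an assumption, not something the sources state.

```python
STEPS = 20  # benchmark step count used throughout this section


def seconds_per_image(steps, its_per_second):
    """Sampling time for one image at a given iteration rate."""
    return steps / its_per_second


def overhead_seconds(reported_total, steps, its_per_second):
    """Reported wall-clock time minus pure sampling time."""
    return reported_total - seconds_per_image(steps, its_per_second)


if __name__ == "__main__":
    for rate in (2.2, 2.38):
        print(f"{rate} it/s -> {seconds_per_image(STEPS, rate):.2f} s of sampling per image")
    # The thread's "8.78s at 2.38it/s" report, split into sampling and everything else.
    extra = overhead_seconds(8.78, STEPS, 2.38)
    print(f"non-sampling share of the 8.78 s report: ~{extra:.2f} s")
```

At 20 steps the quoted FP8 band works out to roughly 8.4 to 9.1 seconds of sampling per image, which is why the thread's per-image times land close to the 9.55 second FP16 figure unless ComfyUI is tuned further.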
For the full benchmark data, see /check/flux-1-dev/rtx-5090. Measured a peak-VRAM figure of your own? Add it via the submission form — it becomes the next reader's first-party datapoint.
Troubleshooting
Crash at first inference / "no kernel image is available for execution"
This is the Blackwell sm_120 kernel gap. Your PyTorch was not built against CUDA 12.8. Reinstall with the cu128 wheel as in Installation step 1, per the official ComfyUI 50-series thread, and avoid xformers, which can silently revert PyTorch to a stable build without sm_120 kernels.
License / 403 error on download
Flux.1 Dev is gated. Accept the license at huggingface.co/black-forest-labs/FLUX.1-dev with the same account whose token you pass to huggingface-cli login.
Blank or gray output
The T5 text encoder must be loaded. Confirm both clip_l.safetensors and t5xxl_fp16.safetensors are present in ComfyUI/models/clip/.