What You'll Build
A local text-to-image and instruction-edit pipeline for HiDream-O1-Image, an 8B unified pixel-space transformer with a reasoning-driven prompt agent, generating up to 2048×2048 natively. The FP8 variant fits in roughly 10 GB of VRAM, leaving comfortable headroom on a 16 GB RTX 5060 Ti.
Hardware data: RTX 5060 Ti (16 GB VRAM) · FP8 fits in ~10 GB per the drbaph FP8 model card · Benchmark data: /check/hidream-o1-image/rtx-5060-ti
Architecture note: HiDream-O1 is a Pixel-level Unified Transformer (UiT) with no external VAE and no separate text encoder. Pixels, text, and task conditions share a single token space on a Qwen3-VL backbone. That backbone is also why PyTorch 2.9.x is flagged as incompatible (a Qwen3-VL bug; see Troubleshooting).
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 12 GB VRAM (FP8) — BF16/FP16 needs ~18–20 GB | RTX 5060 Ti (16 GB) |
| RAM | 16 GB system RAM | — |
| Storage | Check total file sizes on the drbaph FP8 HF files page and the canonical HiDream-ai files page before downloading | — |
| Software | ComfyUI + HiDream_O1-ComfyUI custom node; transformers 4.57.1–5.3; any PyTorch except 2.9.x | — |
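The FP8 and BF16 rows differ mostly because of bytes per parameter. The sketch below is back-of-envelope arithmetic. It assumes exactly 8 billion parameters, all stored at one width. The real FP8-mixed checkpoint keeps some layers at higher precision, and activations plus runtime overhead come on top.

```python
# Back-of-envelope weight memory for an ~8B-parameter model.
# Assumption: every parameter is stored at the listed width. The FP8-mixed
# checkpoint keeps some layers wider, and activations, the CUDA context,
# and framework overhead are NOT included here.
PARAMS = 8e9

BYTES_PER_PARAM = {
    "fp8": 1,
    "bf16": 2,
    "fp16": 2,
}


def weight_gib(params: float, dtype: str) -> float:
    """Return weight storage in GiB for `params` parameters stored as `dtype`."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("fp8", "bf16"):
    print(f"{dtype}: {weight_gib(PARAMS, dtype):.1f} GiB of weights")
```

FP8 weights come to roughly 7.5 GiB and BF16 to roughly 14.9 GiB. The quoted ~10 GB and ~18–20 GB peaks are roughly consistent with a few extra GB of activations and overhead. That is why only the FP8 build leaves headroom on a 16 GB card.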
Installation
1. Install the ComfyUI custom node
From a working ComfyUI checkout:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI.git
cd HiDream_O1-ComfyUI
python -m pip install -r requirements.txt
```
The Saganaki22/HiDream_O1-ComfyUI node ships model loaders for the canonical HiDream-ai weights as well as the FP8 entry points.
2. Download the FP8 weights
```bash
huggingface-cli download drbaph/HiDream-O1-Image-FP8 \
  --local-dir ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8
```
drbaph/HiDream-O1-Image-FP8 is the FP8-mixed quantization of the canonical HiDream-ai/HiDream-O1-Image weights — the model card links to the source repo and preserves the MIT license.
HiDream-ai also publishes HiDream-O1-Image-Dev in its canonical org, a distilled variant trained for 28 sampling steps. Community FP8 redistributions of Dev may exist; search Hugging Face if you want the faster variant. This recipe targets the full FP8 model.
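If you would rather script the download, `huggingface_hub.snapshot_download` does the same job as the huggingface-cli command in this step. This is a minimal sketch. It assumes `huggingface_hub` is installed and that you run it from the directory containing `ComfyUI/`. The helper name `download_fp8_weights` is illustrative, not part of any package.

```python
# Python alternative to the huggingface-cli download command.
# Assumption: huggingface_hub is installed and the working directory
# contains ComfyUI/. download_fp8_weights is a hypothetical helper name.
from pathlib import Path

REPO_ID = "drbaph/HiDream-O1-Image-FP8"
LOCAL_DIR = Path("ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8")


def download_fp8_weights(local_dir: Path = LOCAL_DIR) -> str:
    """Download a snapshot of the FP8 repo into local_dir and return its path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
```

Call `download_fp8_weights()` once. It returns the local path of the downloaded snapshot.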
3. (Optional) Install flash-attn
The HiDream pipeline assumes flash-attn is available. If you skip it, the ComfyUI node falls back to standard attention automatically. The upstream inference.py, however, requires you to edit models/pipeline.py and change use_flash_attn from True to False.
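For the upstream CLI, the manual edit can be automated. The helpers below are hypothetical, not part of the HiDream repository. One checks whether `flash_attn` is importable; the other rewrites a `use_flash_attn` flag set to `True` into `False`. The regex accepts both the quoted dict-key spelling from the model card (`"use_flash_attn": True`) and a keyword-argument form. That is an assumption about how the flag appears, so inspect the file before running this.

```python
# Hypothetical helpers: detect flash-attn and, if missing, disable the
# use_flash_attn flag in the upstream models/pipeline.py.
# Assumption: the flag appears as "use_flash_attn": True (or use_flash_attn=True).
import importlib.util
import re
from pathlib import Path

# Captures the key and separator so only the True literal is replaced.
FLAG_ON = re.compile(r'(\buse_flash_attn["\']?\s*[:=]\s*)True\b')


def flash_attn_available() -> bool:
    """True if the flash_attn package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def disable_flash_attn(pipeline_py: Path) -> int:
    """Rewrite use_flash_attn True -> False in pipeline_py; return the edit count."""
    source = pipeline_py.read_text(encoding="utf-8")
    patched, count = FLAG_ON.subn(r"\1False", source)
    if count:
        pipeline_py.write_text(patched, encoding="utf-8")
    return count
```

Typical use from the upstream checkout: `if not flash_attn_available(): disable_flash_attn(Path("models/pipeline.py"))`. A return value of 0 means no matching flag was found, and the file is left untouched.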
Running
Launch ComfyUI as usual:
```bash
python main.py
```
In the ComfyUI graph, load the bundled HiDream-O1 example workflow from the custom node's workflows/ folder. The default loader points at HiDream-O1-Image-fp8 (the full model, 50 steps). The Dev variant uses 28 steps, matching its canonical training. If you find an FP8 redistribution of it, you can trade some long-text prompt adherence for fewer steps.
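To put the 50-versus-28-step trade-off in numbers, here is a rough estimate. It assumes each denoising step costs about the same on both variants, so sampling time scales with step count. That may not hold if the distilled Dev model also changes how guidance is applied. Model loading and saving are excluded.

```python
# Rough sampling-time comparison: full model (50 steps) vs Dev (28 steps).
# Assumption: per-step cost is identical on both variants, so sampling time
# is proportional to step count. Load time and saving are not included.
FULL_STEPS = 50
DEV_STEPS = 28


def relative_sampling_time(steps: int, baseline_steps: int = FULL_STEPS) -> float:
    """Fraction of the baseline's sampling time for a run with `steps` steps."""
    return steps / baseline_steps


ratio = relative_sampling_time(DEV_STEPS)
print(f"Dev sampling time is about {ratio:.0%} of the full model's "
      f"({1 - ratio:.0%} fewer steps)")
```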
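For a single text-to-image run via the official inference script (CLI, no ComfyUI), use the command below. /path/to/HiDream-O1-Image stands for the canonical checkpoint. The requirements table puts that checkpoint at ~18–20 GB in BF16/FP16, above the RTX 5060 Ti's 16 GB, so check whether the script accepts the FP8 files before running it on this card.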
```bash
python inference.py \
  --model_path /path/to/HiDream-O1-Image \
  --prompt "A bookstore window at dusk with a hand-lettered sign reading 'OPEN LATE'" \
  --output_image results/t2i.png \
  --height 2048 \
  --width 2048
```
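The first run downloads sub-modules and loads the FP8 weights into memory, so expect a noticeable cold-start delay before the first sampling step. Within a ComfyUI session, later runs reuse the loaded model. Each CLI invocation of inference.py, by contrast, starts a new process and loads the weights again.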
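To render several prompts through the CLI, a small wrapper can build one inference.py call per prompt. This is a hypothetical sketch that uses only the flags shown in the command above. `MODEL_PATH` is the same placeholder as in that command, and the output file naming is illustrative. `dry_run=True` only prints the commands, so you can inspect them before spending GPU time.

```python
# Hypothetical batch wrapper around the upstream inference.py.
# Uses only the flags shown in the recipe; MODEL_PATH is a placeholder.
from __future__ import annotations

import shlex
import subprocess
import sys
from pathlib import Path

MODEL_PATH = "/path/to/HiDream-O1-Image"


def build_command(prompt: str, output: Path, size: int = 2048) -> list[str]:
    """Return the argv list for one square text-to-image run of inference.py."""
    return [
        sys.executable, "inference.py",
        "--model_path", MODEL_PATH,
        "--prompt", prompt,
        "--output_image", str(output),
        "--height", str(size),
        "--width", str(size),
    ]


def run_batch(prompts: list[str], out_dir: Path, size: int = 2048,
              dry_run: bool = True) -> list[list[str]]:
    """Print (dry_run) or execute one inference.py call per prompt."""
    commands = [
        build_command(p, out_dir / f"t2i_{i:03d}.png", size)
        for i, p in enumerate(prompts)
    ]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # shell-quoted, copy-pasteable
        else:
            # check=True stops the batch at the first failing run (e.g. OOM).
            subprocess.run(cmd, check=True)
    return commands
```

Because every call is a separate process, each image pays the full cold-start cost. For many images, the ComfyUI route, which keeps the model loaded between runs, is usually the better fit.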
Results
- Speed: Not quoted. Neither the official model card nor the FP8 redistributor reports generation time on consumer GPUs. Empirical numbers will be posted at /check/hidream-o1-image/rtx-5060-ti once a community benchmark is submitted.
- VRAM usage: ~10 GB peak with the FP8 model. The drbaph/HiDream-O1-Image-FP8 card states verbatim: "By quantizing to 8-bit floats, the model fits comfortably within ~10 GB of VRAM — making it accessible on 12 GB GPUs (RTX 3080/4070/4080, etc.) with minimal quality trade-off." The Saganaki22 ComfyUI node corroborates "~10–11 GB" for FP8 and "~18–20 GB" for BF16/FP16.
- Quality notes: HiDream-O1 specializes in long-text rendering (0.978 on LongText-Bench-EN/ZH per the official card) and prompt adherence (DPG-Bench 89.83, GenEval 0.90). The built-in Reasoning-Driven Prompt Agent rewrites raw user input through layout, subject, and physics reasoning before generation — that's the meaning of the "-O1" suffix.
For the full benchmark data, see /check/hidream-o1-image/rtx-5060-ti.
Troubleshooting
PyTorch 2.9.x crashes on first inference
The Saganaki22/HiDream_O1-ComfyUI README and the official HF card both flag this: "PyTorch 2.9.x is not recommended due to the issue", referring to a Qwen3-VL backbone incompatibility. Use PyTorch 2.8.x or 2.10+ until upstream patches land.
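A quick pre-flight check can catch the version problems before a long model load. This is a sketch, not part of either project. It parses version strings by hand (leading numeric parts only, local tags such as `+cu128` dropped) to avoid depending on `packaging`. It encodes only this recipe's constraints: PyTorch must not be 2.9.x, and transformers must fall within 4.57.1–5.3. Treating "5.3" as covering every 5.3.x release is an assumption.

```python
# Pre-flight version check for this recipe's constraints (a sketch).
# Rules: reject PyTorch 2.9.x; require transformers in 4.57.1 .. 5.3.x.
from __future__ import annotations

import re
from importlib import metadata


def numeric_version(version: str) -> tuple[int, ...]:
    """'2.8.0+cu128' -> (2, 8, 0); '5.3.0.dev0' -> (5, 3, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at pre/dev/post segments such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def torch_ok(version: str) -> bool:
    # Only 2.9.x is flagged; releases older than 2.8 are not covered by this rule.
    return numeric_version(version)[:2] != (2, 9)


def transformers_ok(version: str) -> bool:
    v = numeric_version(version)
    return (4, 57, 1) <= v[:3] and v[:2] <= (5, 3)


def check_installed() -> dict[str, bool | None]:
    """Check installed packages; None means the package is not installed."""
    results: dict[str, bool | None] = {}
    for name, ok in (("torch", torch_ok), ("transformers", transformers_ok)):
        try:
            results[name] = ok(metadata.version(name))
        except metadata.PackageNotFoundError:
            results[name] = None
    return results


print(check_installed())
```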
Out of memory at 2048×2048
If the FP8 model still spikes past 16 GB with some samplers, drop the resolution to 1536×1536 (the loader snaps to valid resolutions internally). Alternatively, switch to the Dev variant: its weights have the same memory footprint, but its peak activations are slightly lower, and it samples in 28 steps instead of 50.
flash-attn install fails on Windows / older CUDA
Skip it. The ComfyUI custom node falls back to the standard attention path automatically. If you're running the upstream CLI directly, edit models/pipeline.py line 341 and set "use_flash_attn": False as documented on the official model card.