
HiDream-O1-Image on RTX 5060 Ti: 2048×2048 Text-to-Image with FP8 in ComfyUI

image · intermediate · 10 GB+ VRAM · May 18, 2026
Prerequisites
  • NVIDIA RTX 5060 Ti (16 GB VRAM) or any 12 GB+ consumer GPU
  • Python 3.10+ and a working ComfyUI install
  • PyTorch built against CUDA 12.x (12.8 or newer for RTX 50-series cards such as the 5060 Ti); avoid PyTorch 2.9.x (Qwen3-VL incompatibility)

What You'll Build

A local text-to-image and instruction-edit pipeline for HiDream-O1-Image, an 8B unified pixel-space transformer with a reasoning-driven prompt agent, generating up to 2048×2048 natively. The FP8 variant fits in roughly 10 GB of VRAM, leaving comfortable headroom on a 16 GB RTX 5060 Ti.

Hardware data: RTX 5060 Ti (16 GB VRAM) · FP8 fits in ~10 GB per the drbaph FP8 model card · Benchmark data: /check/hidream-o1-image/rtx-5060-ti
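To see where the ~10 GB (FP8) and ~18–20 GB (BF16/FP16) figures come from, a weights-only estimate helps. This is a back-of-the-envelope sketch, assuming a flat 8e9 parameters (the "8B" in the model description) and 1 byte per parameter for FP8 versus 2 for BF16/FP16. It ignores activations, the CUDA context, and any layers an "FP8-mixed" checkpoint keeps at higher precision, which is presumably where the rest of the quoted footprint goes.

```python
# Weights-only VRAM estimate: parameters x bytes per parameter.
# Assumption: a flat 8e9 parameters; activations, CUDA context and any
# higher-precision layers in an FP8-mixed checkpoint are NOT included.

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2, "fp16": 2}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 8e9
    for dtype in ("fp8", "bf16"):
        print(f"{dtype}: {weight_gib(params, dtype):.1f} GiB of weights")
```

This prints roughly 7.5 GiB for FP8 and 14.9 GiB for BF16. Halving the bytes per parameter is what moves the model from "needs a 24 GB card" to "fits on 12–16 GB".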

Architecture note: HiDream-O1 is a Pixel-level Unified Transformer (UiT) — no external VAE, no disjoint text encoder. Pixels, text, and task conditions share a single token space, with a Qwen3-VL backbone. This is why PyTorch 2.9.x is flagged as incompatible (Qwen3-VL bug).
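Working directly in pixel space means the image itself becomes a token sequence, which is why native 2048×2048 is demanding. The sketch below counts image tokens for non-overlapping square patches. HiDream-O1's real patch size is not stated in this recipe, so the value 16 is hypothetical and chosen purely for illustration; text and task-condition tokens would be appended to the same sequence on top of these.

```python
def num_image_tokens(height: int, width: int, patch: int) -> int:
    """Tokens produced by tiling an image into non-overlapping patch x patch squares."""
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")
    return (height // patch) * (width // patch)


HYPOTHETICAL_PATCH = 16  # illustrative only; not taken from the model card

if __name__ == "__main__":
    for side in (1024, 1536, 2048):
        n = num_image_tokens(side, side, HYPOTHETICAL_PATCH)
        # Full self-attention scores every token against every other token,
        # so the score matrix has n * n entries per head.
        print(f"{side}x{side}: {n} image tokens, {n * n:,} attention score entries")
```

Whatever the real patch size, doubling the side length quadruples the token count and multiplies the attention score matrix by sixteen.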

Requirements

  • GPU: 12 GB VRAM minimum for FP8 (BF16/FP16 needs ~18–20 GB); tested on RTX 5060 Ti (16 GB)
  • RAM: 16 GB system RAM minimum
  • Storage: verify the download size against the drbaph FP8 HF files page and the canonical HiDream-ai files page before downloading
  • Software: ComfyUI + HiDream_O1-ComfyUI custom node; transformers 4.57.1–5.3
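Before installing the node, a quick preflight can confirm that the installed transformers falls inside the 4.57.1–5.3 window. This is a minimal sketch using only the standard library. It assumes the "5.3" upper bound means "any 5.3.x patch release", and its version parsing is deliberately simplified: pre-release and local tags are truncated rather than compared.

```python
import re
from importlib import metadata

# Supported range from the requirements above.
# Assumption: the "5.3" upper bound includes every 5.3.x patch release.
LOWER = (4, 57, 1)
UPPER_MINOR = (5, 3)


def release_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric release segment: '4.57.1' -> (4, 57, 1), '5.0.0rc1' -> (5, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def transformers_supported(version: str) -> bool:
    rel = (release_tuple(version) + (0, 0, 0))[:3]  # pad '5.3' to (5, 3, 0)
    return LOWER <= rel and rel[:2] <= UPPER_MINOR


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        verdict = "OK" if transformers_supported(installed) else "outside 4.57.1 to 5.3"
        print(f"transformers {installed}: {verdict}")
```

Reading the version through importlib.metadata avoids importing transformers itself, so the check stays fast even in a heavy environment.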

Installation

1. Install the ComfyUI custom node

From a working ComfyUI checkout:

cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI.git
cd HiDream_O1-ComfyUI
python -m pip install -r requirements.txt

The Saganaki22/HiDream_O1-ComfyUI node ships loaders for both the canonical HiDream-ai weights and the FP8 checkpoint.

2. Download the FP8 weights

huggingface-cli download drbaph/HiDream-O1-Image-FP8 \
    --local-dir ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8
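If you script your setup in Python rather than the shell, the huggingface_hub library's snapshot_download function covers the same download. This is a sketch that mirrors the CLI arguments above; the dry_run switch is a hypothetical convenience added here for previewing the request without touching the network.

```python
REPO_ID = "drbaph/HiDream-O1-Image-FP8"
LOCAL_DIR = "ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8"


def download_fp8(dry_run: bool = False) -> dict:
    """Python equivalent of the huggingface-cli download command above."""
    kwargs = {"repo_id": REPO_ID, "local_dir": LOCAL_DIR}
    if dry_run:
        return kwargs  # preview only: nothing is downloaded
    # Imported lazily so the dry run works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(**kwargs)
    return kwargs


if __name__ == "__main__":
    print(download_fp8(dry_run=True))
```

Call download_fp8() without arguments from the directory that contains your ComfyUI checkout to fetch the files.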

drbaph/HiDream-O1-Image-FP8 is the FP8-mixed quantization of the canonical HiDream-ai/HiDream-O1-Image weights — the model card links to the source repo and preserves the MIT license.

HiDream-ai also publishes HiDream-O1-Image-Dev under its canonical org, distilled to sample in 28 steps, and separately quantized FP8 builds of it may exist in the community. Check Hugging Face for an FP8 redistribution if you want the faster variant; this recipe targets the full FP8 model.

3. (Optional) Install flash-attn

The upstream HiDream pipeline expects flash-attn by default. If you skip it, the ComfyUI node falls back to standard attention automatically, but the upstream inference.py requires you to edit models/pipeline.py and change use_flash_attn from True to False.
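To decide whether that edit is needed, you can check whether flash-attn is importable from the Python environment you will run inference in. A minimal sketch using only the standard library; note that find_spec only confirms the package is present, not that its compiled kernels actually load on your GPU.

```python
import importlib.util


def flash_attn_available() -> bool:
    """True if the flash_attn package can be found by this interpreter."""
    return importlib.util.find_spec("flash_attn") is not None


if __name__ == "__main__":
    if flash_attn_available():
        print("flash-attn found: the default use_flash_attn=True path should work")
    else:
        print("flash-attn missing: for upstream inference.py, set use_flash_attn "
              "to False in models/pipeline.py")
```

Run it with the same interpreter that launches ComfyUI or inference.py; a different virtual environment can give a misleading answer.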

Running

Launch ComfyUI as usual:

python main.py

In ComfyUI, load the bundled HiDream-O1 example workflow from the custom node's workflows/ folder. The default loader points at HiDream-O1-Image-fp8 (the full model, 50 steps). The Dev variant uses 28 steps per the canonical training; if you find an FP8 redistribution of it, loading it trades some long-text prompt adherence for fewer steps.
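Once the workflow works in the UI, you can also queue it from a script through ComfyUI's HTTP API, which accepts a POST to /prompt with a body of the form {"prompt": <workflow>}. This is a sketch assuming the default 127.0.0.1:8188 address and a workflow you have exported in API format from the ComfyUI menu; the file name you pass in is your own, nothing here assumes a specific file shipped by the custom node.

```python
import json
import sys
import urllib.request
from pathlib import Path

COMFY_URL = "http://127.0.0.1:8188/prompt"  # ComfyUI's default listen address and port


def build_prompt_request(workflow: dict, url: str = COMFY_URL) -> urllib.request.Request:
    """Wrap an API-format workflow in the {"prompt": ...} body that /prompt expects."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def queue_workflow(path: Path) -> dict:
    """Load an exported API-format workflow and queue it on a running ComfyUI server."""
    workflow = json.loads(path.read_text(encoding="utf-8"))
    with urllib.request.urlopen(build_prompt_request(workflow)) as resp:
        return json.loads(resp.read())  # the server's reply, including the queued prompt id


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(queue_workflow(Path(sys.argv[1])))
```

Usage: python queue_hidream.py my_hidream_workflow_api.json, where both file names are placeholders. Edit the prompt text inside the exported JSON to batch different prompts without reopening the UI.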

For a single text-to-image run via the official inference script (CLI, no ComfyUI):

python inference.py \
    --model_path /path/to/HiDream-O1-Image \
    --prompt "A bookstore window at dusk with a hand-lettered sign reading 'OPEN LATE'" \
    --output_image results/t2i.png \
    --height 2048 \
    --width 2048
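To run several prompts through the same CLI, building the argument list in Python sidesteps shell quoting issues with prompts that contain quotes, like the 'OPEN LATE' example. This sketch uses only the flags shown above; run_batch and the output naming scheme are hypothetical helpers, not part of the upstream repo.

```python
import subprocess
import sys
from pathlib import Path


def build_t2i_command(model_path: str, prompt: str, output: Path,
                      height: int = 2048, width: int = 2048) -> list[str]:
    """Assemble the inference.py invocation above as an argv list (no shell involved)."""
    return [
        sys.executable, "inference.py",
        "--model_path", model_path,
        "--prompt", prompt,
        "--output_image", str(output),
        "--height", str(height),
        "--width", str(width),
    ]


def run_batch(model_path: str, prompts: list[str], out_dir: Path) -> None:
    """Hypothetical helper: one inference.py process per prompt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, prompt in enumerate(prompts):
        cmd = build_t2i_command(model_path, prompt, out_dir / f"t2i_{i:03d}.png")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Because each call starts a fresh process, every prompt pays the model-loading cost again; for long batches, keeping the model resident in ComfyUI is the cheaper route.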

The first run downloads sub-modules and loads the FP8 weights into memory, so expect a noticeable cold-start delay before the first sampling step. Subsequent runs reuse the loaded model.

Results

  • Speed: not quoted. Neither the official model card nor the FP8 redistributor publishes generation times on consumer GPUs. Empirical numbers will appear at /check/hidream-o1-image/rtx-5060-ti once a community benchmark is submitted.
  • VRAM usage: ~10 GB peak with the FP8 model. The drbaph/HiDream-O1-Image-FP8 card states verbatim: "By quantizing to 8-bit floats, the model fits comfortably within ~10 GB of VRAM — making it accessible on 12 GB GPUs (RTX 3080/4070/4080, etc.) with minimal quality trade-off." The Saganaki22 ComfyUI node corroborates this with "~10–11 GB" for FP8 and "~18–20 GB" for BF16/FP16.
  • Quality notes: HiDream-O1 specializes in long-text rendering (0.978 on LongText-Bench-EN/ZH per the official card) and prompt adherence (DPG-Bench 89.83, GenEval 0.90). The built-in Reasoning-Driven Prompt Agent rewrites raw user input through layout, subject, and physics reasoning before generation — that's the meaning of the "-O1" suffix.
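If you want to contribute your own VRAM numbers, one low-tech approach is to poll nvidia-smi while a generation runs and keep the maximum. This is a sketch built on nvidia-smi's documented --query-gpu interface, which with nounits reports MiB. Keep in mind that memory.used covers everything on the GPU (other processes, the CUDA context, PyTorch's allocator cache), so it will read higher than the model's own peak allocation.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' lines (MiB, one line per GPU) into integer tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def sample_gpu_memory() -> list[tuple[int, int]]:
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)


def peak_used_mib(duration_s: float, interval_s: float = 0.5, gpu: int = 0) -> int:
    """Poll for duration_s seconds and return the highest memory.used seen on one GPU."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        peak = max(peak, sample_gpu_memory()[gpu][0])
        time.sleep(interval_s)
    return peak
```

Start peak_used_mib(120) in a separate terminal just before queueing a 2048×2048 generation; a polling interval of half a second can miss very short spikes.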

For the full benchmark data, see /check/hidream-o1-image/rtx-5060-ti.

Troubleshooting

PyTorch 2.9.x crashes on first inference

The Saganaki22/HiDream_O1-ComfyUI README and the official HF card both flag this: "PyTorch 2.9.x is not recommended due to the issue" — a Qwen3-VL backbone incompatibility. Pin PyTorch to 2.8.x or 2.10+ until upstream patches land.
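A tiny guard can catch the 2.9.x case before a long model load. This sketch reads the installed version through importlib.metadata (so torch is not imported) and rejects only the 2.9 series, matching the advice above; it makes no claim about other versions.

```python
from importlib import metadata


def torch_version_ok(version: str) -> bool:
    """False for any 2.9.x release (flagged for the Qwen3-VL issue), True otherwise."""
    major_minor = version.split("+")[0].split(".")[:2]  # '2.9.1+cu128' -> ['2', '9']
    return major_minor != ["2", "9"]


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed")
    else:
        verdict = "OK" if torch_version_ok(installed) else "pin to 2.8.x or 2.10+"
        print(f"torch {installed}: {verdict}")
```

Comparing the major and minor components as strings is enough here because the only thing rejected is the exact pair ('2', '9'); '2.10' is correctly treated as a different series.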

Out of memory at 2048×2048

If the FP8 model still spikes past 16 GB on some samplers, drop the resolution to 1536×1536 (the loader snaps to valid resolutions internally) or switch to the Dev variant — same memory footprint, 28 steps instead of 50, and slightly lower peak activations.
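Inside ComfyUI the loader handles resolution snapping for you, but in a scripted pipeline the "drop to 1536×1536" advice can be automated as a fallback loop. This is a sketch in which generate is a hypothetical callable taking (height, width); when wrapping a PyTorch pipeline, pass torch.cuda.OutOfMemoryError as oom_error, and consider calling torch.cuda.empty_cache() between attempts so cached blocks from the failed run are released.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def generate_with_fallback(
    generate: Callable[[int, int], T],
    sizes: Sequence[int] = (2048, 1536),
    oom_error: type[BaseException] = MemoryError,
) -> tuple[int, T]:
    """Try each square resolution in order; return (side, result) for the first that fits."""
    last_exc = None
    for side in sizes:
        try:
            return side, generate(side, side)
        except oom_error as exc:
            last_exc = exc  # remember the failure and try the next, smaller size
    raise RuntimeError(f"out of memory at every size in {list(sizes)}") from last_exc
```

Ordering sizes from largest to smallest means you always get the biggest image that fits, at the cost of one wasted attempt whenever 2048×2048 runs out of memory.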

flash-attn install fails on Windows / older CUDA

Skip it. The ComfyUI custom node falls back to the standard attention path automatically. If you're running the upstream CLI directly, edit models/pipeline.py line 341 and set "use_flash_attn": False as documented on the official model card.
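Line numbers can drift between upstream commits, so matching the setting itself is sturdier than editing line 341 by hand. This is a sketch that assumes the setting appears as a dict entry such as "use_flash_attn": True (as described on the model card); it keeps a .bak copy before writing, and you should still inspect the diff after running it.

```python
import re
from pathlib import Path

# Matches the dict entry "use_flash_attn": True (either quote style, any spacing).
PATTERN = re.compile(r"""(["']use_flash_attn["']\s*:\s*)True\b""")


def disable_flash_attn(source: str) -> tuple[str, int]:
    """Return (patched_source, number_of_replacements)."""
    return PATTERN.subn(r"\1False", source)


def patch_file(path: Path) -> int:
    """Rewrite path in place, saving the original alongside it as <name>.bak."""
    text = path.read_text(encoding="utf-8")
    patched, count = disable_flash_attn(text)
    if count:
        path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return count
```

patch_file(Path("models/pipeline.py")) returns 0 if the pattern is not found, which is a signal that upstream changed the setting's form and a manual edit is needed.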