ERNIE-Image-Turbo on RTX 5080: 8-step text-to-image via GGUF in ComfyUI

What You'll Build

A working ComfyUI text-to-image pipeline that runs Baidu's 8B ERNIE-Image-Turbo on a 16GB RTX 5080. It uses the unsloth Q8_0 GGUF quant (ernie-image-turbo-Q8_0.gguf, 8.69 GB on disk), loaded through city96's ComfyUI-GGUF custom node. Each image takes 8 inference steps at the native 1024×1024 resolution, and the Q8_0 diffusion weights fit in VRAM without CPU offload.

Hardware data: RTX 5080 (16GB VRAM) · 8 inference steps · GGUF Q8_0 · See benchmark data

ℹ️ Why GGUF and not the full BF16 release. Baidu's card states ERNIE-Image-Turbo "can run on consumer GPUs with 24G VRAM" (HF card), and in Issue #4 a user reports OOM during inference on the BF16 path even with a 24 GB RTX 4090. The RTX 5080 carries 16 GB, below that documented floor, so this recipe runs the Q8_0 GGUF quant (8.69 GB weights) through ComfyUI-GGUF rather than the full 16.07 GB BF16 single file. The 32 GB RTX 5090 is the first consumer card that clears the BF16 floor with margin; see the RTX 5090 recipe for the no-quantization BF16 path.

Requirements

Component	Minimum	Tested
GPU	12GB VRAM NVIDIA (per Civitai workflow notes)	RTX 5080 (16GB)
RAM	16GB system RAM	—
Storage	~17 GB for Q8_0 UNet (8.69 GB) + text encoder (7.72 GB) + VAE (0.34 GB)	—
Software	ComfyUI (latest), ComfyUI-Manager, Python 3.10+, PyTorch with CUDA 12.8 (cu128) wheels for Blackwell sm_120	—

The unquantized Baidu release "can run on consumer GPUs with 24G VRAM" per the official ERNIE-Image-Turbo card. The Q8_0 GGUF shrinks the diffusion weights to 8.69 GB, which leaves a 16GB GPU room for the Ministral-3-3B text encoder, the Flux2 VAE, and activation memory. The SarcasticTOFU Civitai workflow (a Base-or-Turbo ERNIE-Image flow) documents a 12 GB minimum for its FP8 path; this recipe adopts the same figure as the floor for the Q8_0 GGUF path until a measured benchmark is available.
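As a quick sanity check on the Storage row, the arithmetic can be sketched as below. The sizes are the ones quoted in this recipe; the dictionary names and the helper are illustrative only.

```python
# Download sizes in GB as quoted in this recipe (names are illustrative).
REQUIRED_GB = {
    "ernie-image-turbo-Q8_0.gguf": 8.69,
    "ministral-3-3b.safetensors": 7.72,
    "flux2-vae.safetensors": 0.34,
}
OPTIONAL_GB = {"ernie-image-prompt-enhancer.safetensors": 6.88}


def total_gb(*groups):
    """Sum the sizes across one or more {filename: size_gb} mappings."""
    return round(sum(size for group in groups for size in group.values()), 2)


print(total_gb(REQUIRED_GB))               # required files only
print(total_gb(REQUIRED_GB, OPTIONAL_GB))  # with the optional prompt enhancer
```

The required set comes to 16.75 GB, which the table rounds to ~17 GB. Adding the optional prompt enhancer brings the total to 23.63 GB, so budget accordingly if you plan to enable it.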

Installation

1. Use the cu128 PyTorch wheels for Blackwell sm_120

The RTX 5080 is Blackwell sm_120. Kernels for this architecture first ship in the CUDA 12.8 (cu128) PyTorch wheels, and the older cu126 default lacks them. The ComfyUI portable Windows build ships cu128 by default. For a manual install, pull from the cu128 index (the command below uses the nightly channel):

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

Verify the runtime sees the device:

python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability())"

Expect 12.8 and (12, 0) in the output: the CUDA version the wheel was built against and the sm_120 compute capability.
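If you would rather get an explicit pass/fail than read the two values by eye, the same check can be sketched as a small script. The function names here are hypothetical; the only PyTorch calls used are torch.version.cuda, torch.cuda.is_available(), and torch.cuda.get_device_capability().

```python
def check_blackwell_runtime(cuda_version, capability):
    """Return a list of problems; an empty list means the runtime looks right.

    cuda_version: the string from torch.version.cuda (None on CPU-only builds).
    capability:   the (major, minor) tuple from torch.cuda.get_device_capability().
    """
    problems = []
    if cuda_version is None:
        problems.append("PyTorch build has no CUDA support (CPU-only wheel?)")
    else:
        major, minor = (int(part) for part in cuda_version.split(".")[:2])
        if (major, minor) < (12, 8):
            problems.append(f"CUDA {cuda_version} wheels predate sm_120; need 12.8+")
    if tuple(capability) != (12, 0):
        problems.append(f"device capability {tuple(capability)} is not sm_120 (12, 0)")
    return problems


def check_installed_torch():
    """Run the check against the PyTorch install in the current environment."""
    import torch  # imported lazily so the pure check above works without torch

    if not torch.cuda.is_available():
        return ["torch.cuda.is_available() is False"]
    return check_blackwell_runtime(
        torch.version.cuda, torch.cuda.get_device_capability()
    )
```

Run check_installed_torch() with the same interpreter ComfyUI uses. On a correctly configured RTX 5080 install it should return an empty list.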

2. Install the ComfyUI-GGUF custom node

From the city96/ComfyUI-GGUF README, clone into ComfyUI's custom_nodes directory:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

On Windows portable ComfyUI, use the embedded interpreter instead:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt

Restart ComfyUI after install — the Unet Loader (GGUF) node appears under the bootleg category.

3. Download the Q8_0 GGUF UNet

Pick the Q8_0 quant from the unsloth/ERNIE-Image-Turbo-GGUF repo — ernie-image-turbo-Q8_0.gguf, 8.69 GB on disk. The repo lists a full quant ladder from Q2_K (3.18 GB) through BF16 (16.07 GB); Q8_0 is the best quality-vs-size trade-off for a 16GB card. The card credits city96's ComfyUI-GGUF as the loader tooling and links back to the canonical Baidu/ERNIE-Image-Turbo upstream.

# from the directory that contains ComfyUI/
huggingface-cli download unsloth/ERNIE-Image-Turbo-GGUF \
  ernie-image-turbo-Q8_0.gguf \
  --local-dir ComfyUI/models/unet

Per the ComfyUI-GGUF README, GGUF UNet files live in ComfyUI/models/unet.
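A truncated or mis-saved download is easy to catch before you start ComfyUI. The sketch below checks a file's size and header, relying on the fact that GGUF files begin with the ASCII magic "GGUF" followed by a little-endian 32-bit format version. The function names are hypothetical, and the check says nothing about whether the tensors inside are intact.

```python
import struct
from pathlib import Path


def read_gguf_header(path):
    """Return (magic_ok, version) read from the first 8 bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if len(head) < 8:
        return False, None
    magic = head[:4]
    version = struct.unpack("<I", head[4:8])[0]  # little-endian uint32
    return magic == b"GGUF", version


def describe_download(path):
    """Summarise a file: existence, size in decimal GB, and GGUF header."""
    path = Path(path)
    if not path.is_file():
        return f"{path}: missing"
    size_gb = path.stat().st_size / 1e9
    magic_ok, version = read_gguf_header(path)
    status = f"GGUF v{version}" if magic_ok else "not a GGUF file"
    return f"{path.name}: {size_gb:.2f} GB, {status}"
```

For example, describe_download("ComfyUI/models/unet/ernie-image-turbo-Q8_0.gguf") should report a size close to the 8.69 GB quoted above; a much smaller size or a "not a GGUF file" status suggests the download was interrupted or saved an error page.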

4. Download the text encoder and VAE

The GGUF UNet still needs the auxiliary files the workflow expects. Pull them from the Comfy-Org/ERNIE-Image repo (the ComfyUI core team's repackaging into ComfyUI's expected layout):

# from the directory that contains ComfyUI/: text encoder (Ministral-3-3B, 7.72 GB)
huggingface-cli download Comfy-Org/ERNIE-Image \
  text_encoders/ministral-3-3b.safetensors \
  --local-dir ComfyUI/models/

# optional prompt enhancer (6.88 GB) — only if you enable use_pe
huggingface-cli download Comfy-Org/ERNIE-Image \
  text_encoders/ernie-image-prompt-enhancer.safetensors \
  --local-dir ComfyUI/models/

# VAE (Flux2 VAE, 0.34 GB)
huggingface-cli download Comfy-Org/ERNIE-Image \
  vae/flux2-vae.safetensors \
  --local-dir ComfyUI/models/

The official ComfyUI ERNIE-Image tutorial lists the same three Turbo auxiliary files — ministral-3-3b.safetensors (text encoder), ernie-image-prompt-enhancer.safetensors (prompt enhancer text encoder), and flux2-vae.safetensors (VAE) — under the expected layout:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── ernie-image-turbo.safetensors
│   ├── 📂 text_encoders/
│   │   ├── ministral-3-3b.safetensors
│   │   └── ernie-image-prompt-enhancer.safetensors
│   └── 📂 vae/
│       └── flux2-vae.safetensors

(You replace the diffusion_models/ernie-image-turbo.safetensors slot with the Q8_0 GGUF in models/unet loaded via the GGUF node — see step 5.)
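Before loading the workflow, it can help to confirm that every file landed where this recipe expects it. The sketch below assumes the layout described above with the Q8_0 GGUF in models/unet; the constant and function names are hypothetical.

```python
from pathlib import Path

# Paths relative to the ComfyUI directory, as used in this recipe.
REQUIRED = [
    "models/unet/ernie-image-turbo-Q8_0.gguf",
    "models/text_encoders/ministral-3-3b.safetensors",
    "models/vae/flux2-vae.safetensors",
]
# Only needed if you enable the prompt enhancer.
OPTIONAL = ["models/text_encoders/ernie-image-prompt-enhancer.safetensors"]


def missing_files(comfy_root, relative_paths):
    """Return the subset of relative_paths that is not a file under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


def report(comfy_root):
    """Print one line per expected file, flagging anything missing."""
    for rel in REQUIRED + OPTIONAL:
        label = "optional" if rel in OPTIONAL else "required"
        state = "MISSING" if missing_files(comfy_root, [rel]) else "ok"
        print(f"[{state:7}] ({label}) {rel}")
```

Call report("ComfyUI") from the directory that contains your ComfyUI folder. A missing required file usually means a download command was run from a different working directory than intended.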

5. Load the Turbo workflow template

The official ComfyUI tutorial documents the base ERNIE-Image Get-started flow as four steps: "1. Update ComfyUI to the latest version or use Comfy Cloud", "2. Go to Template and search for ERNIE-Image", "3. Select the ERNIE-Image workflow", "4. Download any missing models, update the prompt, and click Run." For Turbo specifically, the same tutorial page provides a separate "Download the ERNIE-Image-Turbo text-to-image workflow JSON file" link (the page describes Turbo as "a faster variant optimized with DMD and RL, generating images in just 8 steps compared to the ~50 steps required by the standard model"). Download that Turbo JSON and load it in ComfyUI.

In the loaded Turbo template, swap the default Load Diffusion Model node for the Unet Loader (GGUF) node from ComfyUI-GGUF, pointing it at the Q8_0 file you downloaded in step 3. The text encoder, VAE, and sampler graph stay as the template ships them.
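If you drive ComfyUI from an exported API-format workflow rather than the graph editor, the same swap can be sketched in code. Everything node-specific here is an assumption to verify against your own export: the class_type names "UNETLoader" (stock loader) and "UnetLoaderGGUF" (GGUF loader), and the idea that the GGUF loader takes only a unet_name input. The function name is hypothetical.

```python
import copy


def swap_to_gguf_loader(workflow, gguf_name):
    """Return a copy of an API-format workflow with the diffusion loader swapped.

    ASSUMPTIONS: "UNETLoader" and "UnetLoaderGGUF" are assumed class_type
    names; confirm both in a workflow exported from your own install.
    """
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    for node in patched.values():
        if isinstance(node, dict) and node.get("class_type") == "UNETLoader":
            node["class_type"] = "UnetLoaderGGUF"
            # Assumed: the GGUF loader needs only the file name as input.
            node["inputs"] = {"unet_name": gguf_name}
    return patched
```

Because the node keeps its ID, downstream links that point at the loader's model output stay valid, which mirrors the in-editor swap where the text encoder, VAE, and sampler graph are left alone.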

Running

With the workflow loaded and the GGUF loader wired in:

Set resolution to one of the Baidu-recommended sizes: 1024×1024, 848×1264, 1264×848, 768×1376, 896×1200, 1376×768, or 1200×896.
Set sampler steps to 8 and guidance scale (CFG) to 1.0 — Turbo is step-distilled (DMD + RL per the Baidu HF card) and tuned for 8-step generation. Higher CFG degrades output.
Optionally enable the prompt enhancer (use_pe=True in diffusers terminology; in ComfyUI this is the toggle on the ERNIE prompt-enhancer node in the official template). It loads an extra 6.88 GB text encoder, which raises memory pressure, but improves complex-prompt fidelity.
Click Queue Prompt (the official tutorial's steps call this button Run).

The first run is slow because the weights have to load; subsequent runs reuse the cached UNet.
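The settings above are easy to mistype when you adapt the workflow, so a small pre-flight check can be useful. This sketch encodes the recommended sizes, the 8-step count, and CFG 1.0 from this recipe; the function names are hypothetical.

```python
# Baidu-recommended output sizes listed in this recipe, as (width, height).
RECOMMENDED_SIZES = [
    (1024, 1024), (848, 1264), (1264, 848), (768, 1376),
    (896, 1200), (1376, 768), (1200, 896),
]


def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1_000_000


def check_settings(width, height, steps, cfg):
    """Return warnings for settings that differ from the recipe's values."""
    warnings = []
    if (width, height) not in RECOMMENDED_SIZES:
        warnings.append(f"{width}x{height} is not one of the recommended sizes")
    if steps != 8:
        warnings.append(f"steps={steps}; Turbo is tuned for 8")
    if cfg != 1.0:
        warnings.append(f"cfg={cfg}; the recipe uses 1.0")
    return warnings


for w, h in RECOMMENDED_SIZES:
    print(f"{w}x{h}: {megapixels(w, h):.3f} MP")
```

All seven recommended sizes fall between roughly 1.05 and 1.08 megapixels, so switching between them changes the aspect ratio far more than the pixel count.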

Results

Speed: Not quoted. The sources reviewed cite no community benchmark for ERNIE-Image-Turbo on the RTX 5080, or on any same-config run. A 5060 Ti figure would not transfer, because the RTX 5080's memory bandwidth (~960 GB/s) is roughly 2× the RTX 5060 Ti's (~448 GB/s). The /check/ernie-image-turbo/rtx-5080 page will populate once a benchmark lands; to contribute one, see the submission form.
VRAM usage: Not measured. The lower bound is the Q8_0 weight file at 8.69 GB (unsloth GGUF tree). The Ministral-3-3B text encoder (7.72 GB), Flux2 VAE (0.34 GB), optional prompt enhancer (6.88 GB), and activation memory add to that, though not all at once: the text encoders run once per generation and then offload. The 12 GB recipe minimum is the FP8-path floor documented in the SarcasticTOFU Civitai workflow notes, used here as a conservative safety floor until a measured Q8_0 benchmark lands at /check/.
Quality notes: Output is 8-step distilled (DMD + RL). For the cleanest fidelity, stay at the recommended 1024×1024 or 848×1264 resolutions. The BF16 file (16.07 GB) won't fit a 16 GB card alongside the text encoders without offload, so Q8_0 is the practical ceiling on this tier.

For the full benchmark data once it lands, see /check/ernie-image-turbo/rtx-5080.

Troubleshooting

Out of memory after the first generation

The Q8_0 GGUF weights are 8.69 GB on disk, but the text encoder, VAE, and activations push peak VRAM use meaningfully higher. If you hit OOM at 1264×848 or larger:

Drop one quant tier: the unsloth repo also ships ernie-image-turbo-Q6_K.gguf (6.79 GB), ernie-image-turbo-Q5_K_M.gguf (5.93 GB), ernie-image-turbo-Q4_K_M.gguf (5.02 GB), and ernie-image-turbo-Q4_0.gguf (4.76 GB), all drop-in replacements at the GGUF loader.
Disable the prompt enhancer (use_pe=False) so its 6.88 GB text encoder is never loaded.
Lower output resolution to 1024×1024.
Restart ComfyUI between runs to reset accumulated VRAM if your driver is leaking allocations.
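Choosing which tier to drop to can be sketched as a lookup over the file sizes quoted in this recipe. The weight budget is a number you pick yourself (for example, after watching VRAM use during a failed run); this recipe does not measure one, and the function names are hypothetical.

```python
# Quant file sizes in GB as quoted in this recipe.
QUANTS_GB = {
    "Q8_0": 8.69,
    "Q6_K": 6.79,
    "Q5_K_M": 5.93,
    "Q4_K_M": 5.02,
    "Q4_0": 4.76,
}


def largest_quant_under(budget_gb):
    """Return the largest quant whose file fits budget_gb, or None if none fit."""
    fitting = {name: size for name, size in QUANTS_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def filename_for(quant):
    """Build the file name in the pattern the unsloth repo uses for these quants."""
    return f"ernie-image-turbo-{quant}.gguf"


choice = largest_quant_under(7.0)  # 7.0 GB is an example budget, not a measurement
print(choice, filename_for(choice))
```

With a 7.0 GB weight budget the sketch selects Q6_K, the first tier below Q8_0. Download that file into models/unet and select it in the Unet Loader (GGUF) node.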

Blackwell sm_120 kernel missing / "no kernel image is available for execution on the device"

The RTX 5080 is Blackwell sm_120 — kernels for this architecture first ship in CUDA 12.8 (cu128) PyTorch wheels. If your install uses the older cu126 default you'll see kernel-missing errors at the first inference step. Reinstall PyTorch per step 1. The same gap affects FlashAttention 2 — FA2 wheels for sm_120 are still incomplete as of mid-2026 (see Dao-AILab/flash-attention#2168). ComfyUI's default attention path (PyTorch SDPA) covers sm_120, so the stock GGUF workflow is unaffected; only manual flash_attn_func calls hit the gap.
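To confirm whether an installed wheel was actually compiled with sm_120 kernels, you can inspect the architecture list PyTorch reports. This is a minimal sketch: torch.cuda.get_arch_list() is a real PyTorch call, while the helper names are hypothetical and the check deliberately ignores PTX forward-compatibility entries.

```python
def supports_sm120(arch_list):
    """True if a compiled-architecture list includes Blackwell sm_120 kernels."""
    return "sm_120" in arch_list


def installed_torch_supports_sm120():
    """Check the PyTorch build in the current environment."""
    import torch  # imported lazily so supports_sm120 works without torch

    # Returns the architectures this build was compiled for,
    # e.g. entries such as "sm_90"; an empty list if CUDA is unavailable.
    return supports_sm120(torch.cuda.get_arch_list())
```

If installed_torch_supports_sm120() returns False in the environment ComfyUI runs in, the wheel is the likely cause of the "no kernel image" error, and reinstalling per step 1 is the fix to try first.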

The `Unet Loader (GGUF)` node isn't visible after install

Per the ComfyUI-GGUF README, the node lives under the bootleg category. If it's missing from the node menu entirely:

Confirm the clone landed in ComfyUI/custom_nodes/ComfyUI-GGUF/ (not nested one level deeper).
Verify pip install --upgrade gguf ran in the same Python environment ComfyUI uses (use the embedded interpreter on Windows portable).
Restart ComfyUI fully (not just refresh the browser).
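The first two checks can be scripted. The sketch below assumes the clone should contain an __init__.py at its top level (how ComfyUI custom-node packages are normally laid out; verify against your clone) and uses hypothetical function names.

```python
import importlib.util
from pathlib import Path


def check_node_dir(comfy_root):
    """Return a list of problems with the ComfyUI-GGUF clone location."""
    node_dir = Path(comfy_root) / "custom_nodes" / "ComfyUI-GGUF"
    if not node_dir.is_dir():
        return [f"{node_dir} does not exist"]
    # Assumption: a correctly placed clone has __init__.py at its top level.
    if (node_dir / "__init__.py").is_file():
        return []
    if (node_dir / "ComfyUI-GGUF").is_dir():
        return ["clone is nested one level too deep (ComfyUI-GGUF/ComfyUI-GGUF)"]
    return [f"{node_dir} has no __init__.py; the clone may be incomplete"]


def gguf_importable():
    """True if the gguf package is importable by the interpreter running this."""
    return importlib.util.find_spec("gguf") is not None
```

Run both with the interpreter ComfyUI itself uses (the embedded python.exe on Windows portable). If gguf_importable() is False there, the pip install went into a different environment.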

The `Load Diffusion Model` node throws "unsupported format" on a `.gguf` file

You're using the default loader, not the GGUF one. The stock ComfyUI Load Diffusion Model node does not read GGUF files. Replace it with Unet Loader (GGUF) from the bootleg category, which is the reason step 2 installs the custom node.