
ERNIE-Image-Turbo on RTX 5060 Ti: 8-step text-to-image via GGUF in ComfyUI

image · intermediate · 12GB+ VRAM · May 18, 2026
prerequisites
  • NVIDIA RTX 5060 Ti (16GB VRAM) or any 12GB+ NVIDIA GPU
  • Python 3.10+
  • ComfyUI (latest) with ComfyUI-Manager

What You'll Build

A working ComfyUI text-to-image pipeline that runs Baidu's 8B ERNIE-Image-Turbo on a 16GB RTX 5060 Ti using the unsloth Q8_0 GGUF quant (8.69 GB on disk), loaded through city96's ComfyUI-GGUF custom node. It generates at the full 1024×1024 native resolution in 8 inference steps per image, and Q8_0 is expected to run without CPU offload on this card (not yet benchmarked; see the note on empirical VRAM below).

Hardware data: RTX 5060 Ti (16GB VRAM) · 8 inference steps · See benchmark data

Note on empirical VRAM: the /check/ page currently reports verdict: unknown because no community benchmark has landed yet. The 12 GB minimum below is the floor that the SarcasticTOFU Civitai workflow documents for its FP8 path. It is used here as a conservative safety floor for the GGUF path, pending a measured Q8_0 benchmark.

Requirements

| Component | Minimum | Tested |
|---|---|---|
| GPU | 12GB VRAM NVIDIA (per Civitai workflow notes) | RTX 5060 Ti (16GB) |
| RAM | 16GB system RAM | |
| Storage | ~14 GB for Q8_0 UNet + text encoders + VAE | |
| Software | ComfyUI (latest), ComfyUI-Manager, Python 3.10+ | |

The unquantized Baidu release "can run on consumer GPUs with 24G VRAM" per the official ERNIE-Image-Turbo card — Q8_0 brings that down to where a 16GB GPU has comfortable headroom for the auxiliary text encoders and VAE.
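As a rough illustration of that headroom, the arithmetic can be sketched as below. The sizes are the figures quoted in this recipe, not measurements, and on-disk weight size is only a lower bound on what the UNet occupies in VRAM. The function name `headroom_gb` is a hypothetical helper for this sketch.

```python
# Figures quoted in this recipe; nothing here is measured on real hardware.
Q8_0_WEIGHTS_GB = 8.69   # unsloth Q8_0 GGUF size on disk
CARD_VRAM_GB = 16.0      # RTX 5060 Ti
FLOOR_VRAM_GB = 12.0     # conservative minimum borrowed from the FP8 workflow notes


def headroom_gb(total_vram_gb: float, weights_gb: float) -> float:
    """VRAM left for the text encoder, VAE, and activations once the weights are loaded."""
    return round(total_vram_gb - weights_gb, 2)


print(f"16 GB card: {headroom_gb(CARD_VRAM_GB, Q8_0_WEIGHTS_GB)} GB left after weights")
print(f"12 GB card: {headroom_gb(FLOOR_VRAM_GB, Q8_0_WEIGHTS_GB)} GB left after weights")
```

On these numbers a 16 GB card keeps about 7.3 GB free after the weights, while a 12 GB card keeps about 3.3 GB, which is why the 12 GB floor should be treated as provisional.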

Installation

1. Install the ComfyUI-GGUF custom node

From the city96/ComfyUI-GGUF README, clone into ComfyUI's custom_nodes directory:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

On Windows portable ComfyUI, use the embedded interpreter instead:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt

Restart ComfyUI after install — the Unet Loader (GGUF) node will appear under the bootleg category.

2. Download the Q8_0 GGUF UNet

Pick the Q8_0 quant from the unsloth/ERNIE-Image-Turbo-GGUF repo — 8.69 GB on disk. The repo lists the full quant ladder Q2_K (3.18 GB) through BF16 (16.1 GB); Q8_0 is the best quality-vs-size trade-off for a 16GB card.

# run from the directory that contains the ComfyUI/ folder (same place as the git clone in step 1)
huggingface-cli download unsloth/ERNIE-Image-Turbo-GGUF \
  --include "*Q8_0*" \
  --local-dir ComfyUI/models/unet

Per the ComfyUI-GGUF README: GGUF UNet files live in ComfyUI/models/unet.
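To confirm the download landed where the loader will look, a small check like the following may help. It is a sketch only: `find_gguf_files` is a hypothetical helper, and it searches subfolders as well, on the assumption that the download may preserve the repo's folder layout.

```python
from pathlib import Path


def find_gguf_files(unet_dir: str) -> list[tuple[str, float]]:
    """Return (relative path, size in decimal GB) for each .gguf file under unet_dir."""
    root = Path(unet_dir)
    found = []
    for path in sorted(root.rglob("*.gguf")):
        size_gb = path.stat().st_size / 1e9  # decimal GB, as model repos list sizes
        found.append((path.relative_to(root).as_posix(), round(size_gb, 2)))
    return found
```

Calling `find_gguf_files("ComfyUI/models/unet")` should list a single Q8_0 file of roughly 8.69 GB. A much smaller size would suggest an interrupted download.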

3. Download the text encoders and VAE

Per the official ComfyUI ERNIE-Image tutorial, Turbo needs three auxiliary files in addition to the UNet:

| File | ComfyUI subfolder |
|---|---|
| ministral-3-3b.safetensors (text encoder) | ComfyUI/models/text_encoders/ |
| ernie-image-prompt-enhancer.safetensors (optional, for use_pe=True) | ComfyUI/models/text_encoders/ |
| flux2-vae.safetensors | ComfyUI/models/vae/ |

The fastest path is to let ComfyUI's template-driven flow auto-download these (see next step). Manual download links are surfaced in the ERNIE-Image-Turbo template's missing-model dialog.
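If you place the files manually, a quick presence check can catch a misplaced file before the first render. This is a minimal sketch using the filenames and subfolders from the table above; `missing_files` is a hypothetical helper, and the prompt enhancer is treated as optional.

```python
from pathlib import Path

# Path relative to the ComfyUI folder -> whether the file is required.
EXPECTED = {
    "models/text_encoders/ministral-3-3b.safetensors": True,
    "models/text_encoders/ernie-image-prompt-enhancer.safetensors": False,  # optional
    "models/vae/flux2-vae.safetensors": True,
}


def missing_files(comfy_root: str) -> list[str]:
    """Return the required files that are absent under comfy_root."""
    root = Path(comfy_root)
    return [
        rel for rel, required in EXPECTED.items()
        if required and not (root / rel).is_file()
    ]
```

An empty list from `missing_files("ComfyUI")` means both required files are in place. It does not verify that the files are complete or uncorrupted.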

4. Load the Turbo workflow template

Per the official ComfyUI docs: "Update ComfyUI to the latest version or use Comfy Cloud, go to Template and search for ERNIE-Image, select the ERNIE-Image-Turbo workflow, download any missing models, update the prompt, and click Run."

In the loaded template, swap the default Load Diffusion Model node for the Unet Loader (GGUF) node from ComfyUI-GGUF, pointing it at the Q8_0 file you downloaded in step 2. The text encoder, VAE, and sampler graph stay as the template ships them.

Running

With the workflow loaded and the GGUF loader wired in:

  1. Set resolution to one of the model card's recommended sizes: 1024×1024, 848×1264, 1264×848, 768×1376, or 1376×768.
  2. Set sampler steps to 8 and CFG / guidance scale to 1.0 — Turbo is step-distilled (DMD + RL) and explicitly rejects higher CFG values.
  3. Optionally enable the prompt enhancer (use_pe=True in Diffusers terminology — in ComfyUI, this is the toggle on the ERNIE prompt-enhancer node in the official template).
  4. Hit Queue Prompt.
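The settings in the steps above can be written down as a small checklist. The sketch below is illustrative only: `check_settings` is a hypothetical helper, not part of ComfyUI, and it simply compares values you type in against the numbers this recipe recommends.

```python
# Recommended sizes, step count, and CFG as listed in this recipe.
RECOMMENDED_SIZES = {(1024, 1024), (848, 1264), (1264, 848), (768, 1376), (1376, 768)}
RECIPE_STEPS = 8
RECIPE_CFG = 1.0


def check_settings(width: int, height: int, steps: int, cfg: float) -> list[str]:
    """Return warnings for settings that differ from the recipe; empty means all match."""
    warnings = []
    if (width, height) not in RECOMMENDED_SIZES:
        warnings.append(f"{width}x{height} is not one of the recommended sizes")
    if steps != RECIPE_STEPS:
        warnings.append(f"steps={steps}, recipe uses {RECIPE_STEPS}")
    if cfg != RECIPE_CFG:
        warnings.append(f"cfg={cfg}, recipe uses {RECIPE_CFG}")
    return warnings
```

For example, `check_settings(1024, 1024, 8, 1.0)` returns an empty list, while typical SDXL-style defaults such as 20 steps at CFG 7.0 would produce two warnings.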

First run will be slow due to weight load; subsequent runs reuse the cached UNet.

Results

  • Speed: Not quoted — no community benchmark on a 16GB-class GPU is currently cited in the sources reviewed. The /check/ page will populate once a benchmark report lands.
  • VRAM usage: Lower bound is the Q8_0 weight file at 8.69 GB (unsloth card); the Ministral-3B text encoder, Flux2-VAE, and activation memory add to that. The recipe minimum of 12 GB is the FP8-path floor documented in the SarcasticTOFU Civitai workflow notes, used here as a conservative safety floor for the GGUF path until a measured Q8_0 benchmark lands at /check/.
  • Quality notes: 8-step distilled output; for the cleanest fidelity stay at the recommended 1024×1024 or 848×1264 resolutions. Higher-bit quants (BF16 16.1 GB) won't fit a 16 GB card without offload — Q8_0 is the practical ceiling on this tier.

For the full benchmark data, see /check/ernie-image-turbo/rtx-5060-ti.
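Until a benchmark is published, you can take your own VRAM reading. The sketch below shells out to `nvidia-smi` with its CSV query flags and parses the result. The helper names are hypothetical, and a single call only samples usage at that instant, so you would need to poll during a generation to approximate the peak.

```python
import subprocess


def parse_memory_csv(text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' MiB pairs from nvidia-smi CSV output, one line per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi once and return (used MiB, total MiB) for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` in a loop from a second terminal while ComfyUI renders, and keeping the largest `used` value, gives a rough peak figure for your own card.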

Troubleshooting

Out of memory after the first generation

The Q8_0 GGUF weights are 8.69 GB on disk, but the text encoder, VAE, and activations push the actual peak VRAM meaningfully higher. If you run out of memory at 1264×848 or larger:

  1. Drop one quant tier: unsloth ships Q6_K (6.79 GB), Q5_K_M (5.93 GB), Q4_K_M (5.02 GB), and Q4_0 (4.76 GB) in the same repo — drop-in replacements at the GGUF loader.
  2. Lower output resolution to 1024×1024.
  3. Restart ComfyUI between runs to reset accumulated VRAM if your driver is leaking allocations.
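Choosing which tier to drop to can be sketched as picking the largest quant whose file fits a weight budget. The sizes are the ones listed above for the unsloth repo. `pick_quant` and the idea of a fixed "weight budget" are assumptions of this sketch, since real VRAM use also depends on the encoders, VAE, and resolution.

```python
from typing import Optional

# (quant name, size in GB) as listed in this recipe, largest first.
QUANTS = [
    ("Q8_0", 8.69),
    ("Q6_K", 6.79),
    ("Q5_K_M", 5.93),
    ("Q4_K_M", 5.02),
    ("Q4_0", 4.76),
]


def pick_quant(weight_budget_gb: float) -> Optional[str]:
    """Return the largest listed quant whose file size fits the budget, or None."""
    for name, size_gb in QUANTS:
        if size_gb <= weight_budget_gb:
            return name
    return None
```

With a budget of 7 GB for weights this returns `Q6_K`. A budget below 4.76 GB returns `None`, at which point the smaller quants further down the repo's ladder are the remaining option.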

The Unet Loader (GGUF) node isn't visible after install

Per the ComfyUI-GGUF README, the node lives under the bootleg category. If it's missing from the node menu entirely:

  • Confirm the clone landed in ComfyUI/custom_nodes/ComfyUI-GGUF/ (not nested one level deeper).
  • Verify pip install --upgrade gguf ran in the same Python environment ComfyUI uses (use the embedded interpreter on Windows portable).
  • Restart ComfyUI fully (not just refresh the browser).
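The first two checks above can be automated with a short script, run with the same interpreter that launches ComfyUI. This is a sketch under two assumptions: that the node package has an `__init__.py` at its top level, as ComfyUI custom nodes normally do, and that `diagnose_gguf_node` is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path


def diagnose_gguf_node(comfy_root: str) -> list[str]:
    """Return a list of likely install problems; an empty list means none were found."""
    problems = []
    node_dir = Path(comfy_root) / "custom_nodes" / "ComfyUI-GGUF"
    # A missing __init__.py usually means the clone is absent or nested one level deeper.
    if not (node_dir / "__init__.py").is_file():
        problems.append(f"ComfyUI-GGUF not found at {node_dir} (missing or nested clone?)")
    # find_spec checks the interpreter running this script, so use ComfyUI's own.
    if importlib.util.find_spec("gguf") is None:
        problems.append("the gguf package is not importable in this Python environment")
    return problems
```

On Windows portable, that means running the script via `.\python_embeded\python.exe` rather than a system-wide Python, otherwise the `gguf` check reports on the wrong environment.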

The Load Diffusion Model node throws "unsupported format" on a .gguf file

You're using the default loader, not the GGUF one. The stock ComfyUI Load Diffusion Model node only reads safetensors. Replace it with Unet Loader (GGUF) from the bootleg category — that's the whole point of installing the custom node in step 1.

Auxiliary files (Ministral-3B / Flux2-VAE) didn't auto-download

Open the workflow in the ComfyUI Templates menu first (per the official tutorial) — that flow surfaces the missing-model dialog with the right Hugging Face links. Loading a third-party JSON workflow file directly will skip this dialog and silently fail at render time.