
ERNIE-Image-Turbo on RTX 3060: 8-step text-to-image via GGUF in ComfyUI

image · intermediate · 10GB+ VRAM · Jun 14, 2026

This intermediate recipe sets up ERNIE-Image-Turbo on the RTX 3060, needing about 10 GB of VRAM.

prerequisites
  • NVIDIA RTX 3060 (12GB VRAM) or any 12GB+ NVIDIA GPU
  • Python 3.10+
  • ComfyUI (latest) with ComfyUI-Manager

What You'll Build

A working ComfyUI text-to-image pipeline that runs Baidu's 8B ERNIE-Image-Turbo on a 12GB RTX 3060 using a step-down GGUF quant from the unsloth/ERNIE-Image-Turbo-GGUF repo, loaded through city96's ComfyUI-GGUF custom node. Eight inference steps per image at full 1024×1024 native resolution.

Hardware data: RTX 3060 (12GB VRAM) · 8 inference steps · GGUF Q6_K / Q5_K_M

ℹ️ Why a Q6_K/Q5_K_M GGUF and not Q8_0 or the full BF16 release. Baidu's card states ERNIE-Image-Turbo "can run on consumer GPUs with 24G VRAM" (HF card, "Practical deployment" highlight), and a user reports OOM during inference even on a 24 GB card on the SGLang/Diffusers paths (see Troubleshooting). On a 12 GB card the usable budget after a display is closer to 11 GB, so this recipe leads with the Q6_K (6.79 GB) or Q5_K_M (5.93 GB) GGUF rather than the Q8_0 (8.69 GB) the 16 GB siblings use — the smaller diffusion-model weights leave headroom for the Ministral-3B text encoder and activations. Q8_0 sits right at the 12 GB budget once a display is attached, so it's kept as a headless-only / 16GB note below.

ℹ️ GGUF runs identically on the RTX 3060's Ampere GPU — and that's the whole point. GGUF weights are dequantized by the loader inside ComfyUI, so the path is architecture-independent: the exact same Q6_K file and the same workflow that fit a 12 GB Ada RTX 4070 fit the 12 GB Ampere RTX 3060. There is no FP8 step in this recipe — Ampere (sm_86) lacks FP8 tensor cores, so an FP8 weight path would only dequantize on the fly with no speed benefit. The GGUF Q-quant route avoids that question entirely and is the recommended path on this card.

Requirements

Component | Minimum | Tested
GPU | 12GB VRAM NVIDIA | RTX 3060 (12GB, Ampere GA106 sm_86, 3584 CUDA cores, 360 GB/s, per TechPowerUp)
RAM | 16GB system RAM |
Storage | ~15 GB for Q6_K UNet (6.79 GB) + text encoder (7.72 GB) + VAE (0.34 GB) |
Software | ComfyUI (latest), ComfyUI-Manager, Python 3.10+, PyTorch with stable CUDA 12.x wheels |

The unquantized Baidu release "can run on consumer GPUs with 24G VRAM" per the official ERNIE-Image-Turbo card — the GGUF quant brings that down to where a 12GB card has room for the diffusion-model weights, the Ministral-3B text encoder, the Flux2 VAE, and activation memory. The sarcastictofu Civitai workflow (a Base-or-Turbo ERNIE-Image flow) presents the GGUF path as the lower-resource recommendation and reserves its "12GB or higher VRAM" floor for the optional FP8 path — which this recipe does not use. On the RTX 3060 the GGUF tiers below keep the peak inside the 12 GB envelope.
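Before downloading, the storage row can be checked against free disk space. The following is a minimal sketch that sums the three file sizes quoted in the table; the helper names and the 1 GB safety margin are assumptions made for illustration, not part of any tool mentioned here.

```python
import shutil

# On-disk sizes in GB, as quoted in the Requirements table.
FILES_GB = {
    "ernie-image-turbo-Q6_K.gguf": 6.79,
    "ministral-3-3b.safetensors": 7.72,
    "flux2-vae.safetensors": 0.34,
}


def required_gb(files=FILES_GB):
    """Total download size in GB for the given files."""
    return sum(files.values())


def has_room(free_bytes, needed_gb, margin_gb=1.0):
    """True if free_bytes covers needed_gb plus a margin (1 GB is an assumed default)."""
    return free_bytes / 1e9 >= needed_gb + margin_gb


if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # free space on the current drive
    need = required_gb()
    print(f"need ~{need:.2f} GB, free {free / 1e9:.1f} GB, ok={has_room(free, need)}")
```

Run it from the drive that holds your ComfyUI folder. Adding the optional 6.88 GB prompt enhancer to FILES_GB raises the total accordingly.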

Installation

1. Install PyTorch (RTX 3060 is Ampere sm_86 — stock wheels work)

The RTX 3060 is Ampere (GA106), compute capability sm_86. sm_86 kernels ship in the default stable PyTorch CUDA wheels — no nightly, no --pre, and no special --index-url is required (this is the one place a Blackwell RTX 50-series recipe needs a cu128 nightly wheel; on Ampere the stock CUDA 12.x build already covers your card). The standard ComfyUI install already pulls a working build:

pip install torch torchvision torchaudio

Verify the runtime sees the device:

python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability())"

The output should show a CUDA 12.x version followed by (8, 6).
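If the one-liner prints something unexpected, a slightly longer check can say why. This is a sketch only: describe_runtime is a hypothetical helper, and it relies on torch.version.cuda being None on CPU-only PyTorch builds.

```python
def describe_runtime(cuda_version, capability):
    """Interpret torch.version.cuda and torch.cuda.get_device_capability()."""
    if cuda_version is None:
        return "CPU-only PyTorch build: install a CUDA-enabled wheel"
    if not cuda_version.startswith("12."):
        return f"CUDA {cuda_version}: this recipe expects a 12.x build"
    if capability is None:
        return "CUDA build, but no CUDA device is visible (check the driver)"
    if tuple(capability) != (8, 6):
        return f"compute capability {tuple(capability)}: not sm_86, so the Ampere notes may not apply"
    return "ok: CUDA 12.x build on an sm_86 device"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
        print(describe_runtime(torch.version.cuda, cap))
```

Run it with the same interpreter ComfyUI uses, otherwise the result describes a different environment.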

2. Install the ComfyUI-GGUF custom node

Per the city96/ComfyUI-GGUF README, clone into ComfyUI's custom_nodes directory and install the gguf Python package:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

On Windows portable ComfyUI, use the embedded interpreter instead:

git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt

Restart ComfyUI after install — the GGUF Unet loader node appears under the bootleg category.

Note on the GGUF source. city96 publishes the ComfyUI-GGUF loader node, not the ERNIE quant weights. There is no city96/ERNIE-Image-Turbo-gguf repo (requests to that path return HTTP 401 on Hugging Face). The quant weights come from the unsloth/ERNIE-Image-Turbo-GGUF repo in step 3; city96's node is only the runtime that loads them.

3. Download the GGUF diffusion-model weights

Pick a Q6_K or Q5_K_M quant from the unsloth/ERNIE-Image-Turbo-GGUF repo. The unsloth card is a GGUF quant of the canonical baidu/ERNIE-Image-Turbo upstream (linked via its base_model) and credits city96's ComfyUI-GGUF as the loader tooling. On a 12 GB RTX 3060, lead with one of:

  • ernie-image-turbo-Q6_K.gguf: 6.79 GB on disk (best quality that still leaves comfortable display headroom)
  • ernie-image-turbo-Q5_K_M.gguf: 5.93 GB on disk (extra headroom if you also run the prompt enhancer)

# from your ComfyUI root; Q6_K is the recommended 12 GB tier
huggingface-cli download unsloth/ERNIE-Image-Turbo-GGUF \
  ernie-image-turbo-Q6_K.gguf \
  --local-dir ComfyUI/models/unet

Per the ComfyUI-GGUF README, GGUF diffusion-model files live in ComfyUI/models/unet.
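The same download can be scripted from Python with the huggingface_hub library's hf_hub_download function. The sketch below assumes huggingface_hub is installed and that the repo's file names follow the pattern listed above; gguf_filename and download are hypothetical helper names.

```python
REPO_ID = "unsloth/ERNIE-Image-Turbo-GGUF"
QUANTS = ("Q6_K", "Q5_K_M")  # the two 12 GB tiers this recipe leads with


def gguf_filename(quant):
    """Build the file name following the pattern in the list above."""
    if quant not in QUANTS:
        raise ValueError(f"not a 12 GB tier in this recipe: {quant}")
    return f"ernie-image-turbo-{quant}.gguf"


def download(quant="Q6_K", local_dir="ComfyUI/models/unet"):
    """Fetch one quant into ComfyUI's unet folder and return the local path."""
    # Imported lazily so the name helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID, filename=gguf_filename(quant), local_dir=local_dir
    )
```

Calling download("Q5_K_M") from the ComfyUI parent directory would fetch the smaller tier instead.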

Q8_0 is a 16 GB / headless tier, not a 12 GB tier. The same repo ships ernie-image-turbo-Q8_0.gguf (8.69 GB on disk). On a 12 GB card with a display attached (~11 GB usable), the Q8_0 weights plus the text encoder and activations push the runtime peak right up to the budget. Q8_0 is the right choice on a 16 GB card or a headless 12 GB Linux box, not a 12 GB desktop. Stay at Q6_K / Q5_K_M on the RTX 3060.

4. Download the text encoder and VAE

The GGUF diffusion model still needs the auxiliary files the workflow expects. Pull them from the Comfy-Org/ERNIE-Image repackager (the ComfyUI core team's repackaging into ComfyUI's expected layout):

# from your ComfyUI root — text encoder (Ministral-3-3B, 7.72 GB)
huggingface-cli download Comfy-Org/ERNIE-Image \
  text_encoders/ministral-3-3b.safetensors \
  --local-dir ComfyUI/models/

# VAE (Flux2 VAE, 0.34 GB)
huggingface-cli download Comfy-Org/ERNIE-Image \
  vae/flux2-vae.safetensors \
  --local-dir ComfyUI/models/

# optional prompt enhancer (6.88 GB) — skip on 12 GB unless you disable it per-run (see Running)
huggingface-cli download Comfy-Org/ERNIE-Image \
  text_encoders/ernie-image-prompt-enhancer.safetensors \
  --local-dir ComfyUI/models/

The official ComfyUI ERNIE-Image tutorial lists the same Turbo auxiliary files — ministral-3-3b.safetensors (text encoder), ernie-image-prompt-enhancer.safetensors (prompt enhancer text encoder), and flux2-vae.safetensors (VAE) — under this layout:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 unet/
│   │   └── ernie-image-turbo-Q6_K.gguf      ← the GGUF diffusion model from step 3
│   ├── 📂 text_encoders/
│   │   ├── ministral-3-3b.safetensors
│   │   └── ernie-image-prompt-enhancer.safetensors
│   └── 📂 vae/
│       └── flux2-vae.safetensors

(The tutorial's default layout puts a full ernie-image-turbo.safetensors in diffusion_models/; this recipe replaces that slot with the GGUF in models/unet loaded via the GGUF node — see step 5.)
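A short script can confirm that the files landed where the layout above expects them. This is a minimal sketch; missing_files is a hypothetical helper, and the paths are copied from the tree above with the Q6_K file in models/unet.

```python
from pathlib import Path

# Relative paths from the layout above; the prompt enhancer is optional.
REQUIRED = (
    "models/unet/ernie-image-turbo-Q6_K.gguf",
    "models/text_encoders/ministral-3-3b.safetensors",
    "models/vae/flux2-vae.safetensors",
)
OPTIONAL = ("models/text_encoders/ernie-image-prompt-enhancer.safetensors",)


def missing_files(comfy_root, relative_paths=REQUIRED):
    """Return the relative paths that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files("ComfyUI"):
        print("missing:", rel)
    for rel in missing_files("ComfyUI", OPTIONAL):
        print("optional, not present:", rel)
```

If you chose Q5_K_M, change the first entry of REQUIRED to match the file you downloaded.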

5. Load the Turbo workflow template

The official ComfyUI tutorial documents the base ERNIE-Image get-started flow as: update ComfyUI to the latest version (or use Comfy Cloud), open the Template menu and search for ERNIE-Image, select the ERNIE-Image workflow, then download any missing models, update the prompt, and click Run. For the Turbo variant the same tutorial page is explicit that it is a faster variant optimized with DMD and RL, generating images in just 8 steps compared to the ~50 steps required by the standard model, and it offers a separate "Download the ERNIE-Image-Turbo text-to-image workflow JSON file" link. (Baidu's own card confirms this: the Turbo checkpoint is "optimized by DMD and RL" and produces output "in only 8 inference steps" — see the HF card.) Download that Turbo JSON and load it in ComfyUI.

In the loaded Turbo template, swap the default Load Diffusion Model node for the GGUF Unet loader node (the bootleg category from ComfyUI-GGUF), pointing it at the Q6_K file you downloaded in step 3. The text encoder, VAE, and sampler graph stay as the template ships them.
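If you drive ComfyUI from an API-format workflow export rather than the graph editor, the same swap can be applied to the JSON. The sketch below rests on assumptions you should verify against your own export: that the stock Load Diffusion Model node appears with class_type "UNETLoader", that the ComfyUI-GGUF loader is registered as "UnetLoaderGGUF", and that it takes a single unet_name input. The function name swap_to_gguf_loader is hypothetical.

```python
import copy


def swap_to_gguf_loader(workflow, gguf_name):
    """Return a copy of an API-format workflow with stock loaders replaced.

    Assumed class names: "UNETLoader" (stock) and "UnetLoaderGGUF" (ComfyUI-GGUF).
    """
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    for node in patched.values():
        if isinstance(node, dict) and node.get("class_type") == "UNETLoader":
            node["class_type"] = "UnetLoaderGGUF"
            # The GGUF loader is assumed to need only the file name.
            node["inputs"] = {"unet_name": gguf_name}
    return patched
```

Node IDs are preserved, so links from downstream nodes to the loader's first output keep pointing at the replaced node. Swapping the node by hand in the editor, as described above, remains the reference procedure.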

Running

With the workflow loaded and the GGUF loader wired in:

  1. Set resolution to one of the Baidu-recommended sizes: 1024×1024, 848×1264, 1264×848, 768×1376, 896×1200, 1376×768, or 1200×896.
  2. Set sampler steps to 8 and guidance scale (CFG) to 1.0 — Turbo is step-distilled (DMD + RL per the Baidu HF card) and tuned for 8-step generation. Higher CFG degrades output.
  3. On a 12 GB card, leave the prompt enhancer disabled (use_pe=False in diffusers terms; in ComfyUI this is the toggle on the ERNIE prompt-enhancer node). It loads a second ~6.88 GB text encoder and is the most common way to blow the 12 GB budget. Enable it only if you drop to Q5_K_M and have closed other VRAM consumers.
  4. Hit Queue Prompt.

First run is slow due to weight load; subsequent runs reuse the cached diffusion model.
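The settings in the list above can be expressed as a small pre-flight check. This is an illustrative sketch: check_settings is a hypothetical helper, and the values are the ones this recipe recommends rather than limits enforced by ComfyUI.

```python
# Sizes listed as Baidu-recommended in step 1.
RECOMMENDED_SIZES = {
    (1024, 1024), (848, 1264), (1264, 848), (768, 1376),
    (896, 1200), (1376, 768), (1200, 896),
}


def check_settings(width, height, steps, cfg):
    """Return warnings for settings that differ from the recipe's values."""
    warnings = []
    if (width, height) not in RECOMMENDED_SIZES:
        warnings.append(f"{width}x{height} is not one of the recommended sizes")
    if steps != 8:
        warnings.append(f"steps={steps}; Turbo is tuned for 8")
    if cfg != 1.0:
        warnings.append(f"cfg={cfg}; the recipe uses 1.0")
    return warnings
```

An empty list means the run matches the recipe; anything else is a prompt to double-check the sampler node before queueing.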

Results

  • Speed: Not quoted. No community benchmark on the RTX 3060 for ERNIE-Image-Turbo is currently cited, and /check/ernie-image-turbo/rtx-3060 reports no benchmark for this pair yet (verdict: unknown). The RTX 3060 is the weakest of the 12 GB cards (~360 GB/s memory bandwidth, 3584 CUDA cores per TechPowerUp), so a figure from any faster sibling — the Ada RTX 4070 (~504 GB/s) or the published 16 GB siblings — would be a loose upper bound at best, never an honest transfer. The /check page populates once a benchmark lands — to contribute one, see the submission form.
  • VRAM usage: This is a derived envelope, not a measured peak. The diffusion-model weights are 6.79 GB at Q6_K (or 5.93 GB at Q5_K_M) per the unsloth GGUF tree. The Flux2 VAE adds 0.34 GB per the Comfy-Org repackager, and the Ministral-3B text encoder (7.72 GB) runs once per generation, then frees before the diffusion sampling pass. The sampling-time peak is therefore dominated by the GGUF weights + VAE + activations: about 7.1 GB of weights and VAE at Q6_K plus activation memory, which this recipe budgets at roughly 10 GB on a 12 GB card with the prompt enhancer off. A measured Q6_K peak replaces this estimate once one lands at /check/.
  • Quality notes: 8-step distilled output (DMD + RL). For the cleanest fidelity stay at the recommended 1024×1024 or 848×1264 resolutions. Q6_K is the highest GGUF tier that fits a 12 GB display card with headroom; Q8_0 (8.69 GB) and the BF16 single-file (16.07 GB) are 16 GB / headless tiers.
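The derived envelope above is plain arithmetic, and it can be repeated for each quant tier. The sketch below assumes that the on-disk size approximates the VRAM footprint of the weights, that the text encoder has been freed before sampling, and that about 11 GB is usable with a display attached. These are assumptions from this recipe, not measurements.

```python
VAE_GB = 0.34  # Flux2 VAE, per the Comfy-Org repackager

# On-disk sizes in GB from the unsloth repo, as quoted in this recipe.
QUANT_GB = {"Q8_0": 8.69, "Q6_K": 6.79, "Q5_K_M": 5.93, "Q4_K_M": 5.02, "Q4_0": 4.76}


def activation_headroom_gb(quant, usable_gb=11.0):
    """GB left for activations after weights + VAE (text encoder assumed freed)."""
    return usable_gb - (QUANT_GB[quant] + VAE_GB)


if __name__ == "__main__":
    for quant in QUANT_GB:
        print(f"{quant:7s} {activation_headroom_gb(quant):5.2f} GB left of 11 GB")
```

Under these assumptions Q6_K leaves just under 4 GB for activations and Q8_0 just under 2 GB, which is the gap behind the tier recommendation.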

For the full benchmark data once it lands, see /check/ernie-image-turbo/rtx-3060.

Troubleshooting

Out of memory during inference

ERNIE-Image-Turbo's unquantized paths are heavy: a user reports that on a 24 GB RTX 4090 the model loads but hits an out-of-memory error during inference on both the SGLang and Diffusers paths (baidu/ERNIE-Image Issue #4, reporter animebing); a contributor in that thread suggests pipe.enable_model_cpu_offload() for the diffusers path. On a 12 GB RTX 3060 the GGUF route sidesteps that, but if you still OOM:

  1. Disable the prompt enhancer (use_pe=False) to free the ~6.88 GB second text encoder.
  2. Drop one quant tier: the unsloth repo ships ernie-image-turbo-Q5_K_M.gguf (5.93 GB), ernie-image-turbo-Q4_K_M.gguf (5.02 GB), and ernie-image-turbo-Q4_0.gguf (4.76 GB) — drop-in replacements at the GGUF Unet loader.
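  3. Set output resolution to 1024×1024, the smallest of the recommended sizes by pixel count. The other recommended sizes are within about 3% of it, so expect only a small saving.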
  4. Restart ComfyUI between runs to reset accumulated VRAM if your driver is leaking allocations.
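To judge how much the resolution step can help, compare the pixel counts of the recommended sizes. This sketch assumes activation memory scales roughly with pixel count, which is a simplification; relative_pixels is a hypothetical helper.

```python
# Recommended sizes from the Running section.
SIZES = [(1024, 1024), (848, 1264), (1264, 848), (768, 1376),
         (896, 1200), (1376, 768), (1200, 896)]


def relative_pixels(sizes=SIZES, baseline=(1024, 1024)):
    """Map each size to its pixel count relative to the baseline size."""
    base = baseline[0] * baseline[1]
    return {f"{w}x{h}": (w * h) / base for w, h in sizes}


if __name__ == "__main__":
    for name, ratio in sorted(relative_pixels().items(), key=lambda kv: kv[1]):
        print(f"{name:10s} {ratio:.3f}x")
```

Every recommended size sits within about 3% of 1024×1024, so switching among them is unlikely to recover much memory. Dropping a quant tier or disabling the prompt enhancer frees far more.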

Q8_0 OOMs on this card

The same unsloth repo ships ernie-image-turbo-Q8_0.gguf (8.69 GB), which the 16 GB siblings use as their default. On a 12 GB RTX 3060 with a display attached (~11 GB usable), the Q8_0 weights plus the text encoder and activations push the peak right up to the budget and OOM is likely. Stay at Q6_K (6.79 GB) or Q5_K_M (5.93 GB); reserve Q8_0 for a 16 GB card or a headless 12 GB Linux box with no display claiming VRAM.

Don't reach for an FP8 weight file on this card

The RTX 3060's Ampere GPU (sm_86) has no FP8 tensor cores — FP8 first shipped on Hopper and Ada. An FP8 ERNIE-Image weight file will load, but the runtime dequantizes it to BF16/FP16 at compute time, so you get the memory footprint of FP8 with none of the speed benefit a 40-series card sees. The GGUF Q-quant path in this recipe is the right route on Ampere; there is no reason to chase the FP8 alternative the Civitai workflow lists for 40-series users.
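The hardware distinction comes down to compute capability. A minimal sketch, assuming the threshold is sm_89 (Ada) with Hopper at sm_90 and newer parts above it; has_fp8_tensor_cores is a hypothetical helper, not a PyTorch API.

```python
def has_fp8_tensor_cores(capability):
    """True for Ada (sm_89), Hopper (sm_90), and newer compute capabilities."""
    return tuple(capability) >= (8, 9)


if __name__ == "__main__":
    # The RTX 3060 reports (8, 6), which falls below the threshold.
    print(has_fp8_tensor_cores((8, 6)))
```

Pass it the tuple from torch.cuda.get_device_capability() to check the card you are actually running on.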

The GGUF Unet loader node isn't visible after install

Per the ComfyUI-GGUF README, the node lives under the bootleg category. If it's missing entirely:

  • Confirm the clone landed in ComfyUI/custom_nodes/ComfyUI-GGUF/ (not nested one level deeper).
  • Verify pip install --upgrade gguf ran in the same Python environment ComfyUI uses (use the embedded interpreter on Windows portable).
  • Restart ComfyUI fully (not just a browser refresh).
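The first two checks in the list above can be automated. This is a sketch under assumptions: diagnose_node_install is a hypothetical helper, and it must be run with the same interpreter ComfyUI uses (the embedded one on Windows portable), otherwise the gguf check describes a different environment.

```python
import importlib.util
from pathlib import Path


def diagnose_node_install(comfy_root):
    """Return a list of likely problems with the ComfyUI-GGUF install."""
    problems = []
    node_dir = Path(comfy_root) / "custom_nodes" / "ComfyUI-GGUF"
    if not node_dir.is_dir():
        problems.append(f"{node_dir} does not exist")
    elif (node_dir / "ComfyUI-GGUF").is_dir():
        # A second ComfyUI-GGUF folder inside the first suggests a nested clone.
        problems.append("clone appears nested one level too deep")
    if importlib.util.find_spec("gguf") is None:
        problems.append("the gguf package is not importable in this interpreter")
    return problems


if __name__ == "__main__":
    for problem in diagnose_node_install("ComfyUI") or ["no problems found"]:
        print(problem)
```

An empty result does not replace the full restart in the last bullet; ComfyUI only registers custom nodes at startup.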

The Load Diffusion Model node throws "unsupported format" on a .gguf file

You're using the default loader, not the GGUF one. The stock ComfyUI Load Diffusion Model node only reads safetensors. Replace it with the GGUF Unet loader from the bootleg category — that's the whole point of installing the custom node in step 2.
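If the error persists after switching loaders, it is worth confirming that the file really is GGUF and not a truncated download or a saved error page. The sketch below reads the 4-byte GGUF magic and the little-endian 32-bit version that follows it; read_gguf_version is a hypothetical helper, and the path in the usage line is this recipe's default location.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF version number, or None if the GGUF magic is absent."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    # The version is stored as a little-endian uint32 right after the magic.
    return struct.unpack("<I", header[4:8])[0]


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI/models/unet/ernie-image-turbo-Q6_K.gguf"
    try:
        print(read_gguf_version(target))
    except OSError as err:
        print(f"could not read {target}: {err}")
```

A result of None means the header is wrong and the file should be downloaded again. This checks only the header, so a file truncated later on would still pass.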

common questions
How much VRAM does ERNIE-Image-Turbo need?

About 10 GB — the minimum this recipe targets.

Which GPUs is ERNIE-Image-Turbo tested on?

RTX 3060 (12 GB).

How hard is this setup?

Intermediate — follow the steps above.