How much VRAM does Z-Image Turbo need?

About 16 GB — the minimum this recipe targets.

How hard is this setup?

Intermediate. Follow the installation and running steps below.

Z-Image Turbo on RTX 5060 Ti: 8-Step Text-to-Image with Diffusers or ComfyUI

What You'll Build

A local install of Z-Image-Turbo — Alibaba Tongyi-MAI's 6B-parameter distilled image generation model — running text-to-image at 1024×1024 in 8 inference steps on a 16GB consumer GPU. The recipe covers two paths: a Python script via HuggingFace diffusers, and the official ComfyUI workflow.

Hardware data: RTX 5060 Ti (16GB VRAM) · 8 NFEs at 1024×1024 · Benchmark data: /check/z-image-turbo/rtx-5060-ti

Note on variants: The Tongyi-MAI Z-Image family ships three weight sets — Z-Image (Base), Z-Image-Turbo, and Z-Image (Distilled). This recipe targets Z-Image-Turbo, the consumer-friendly distilled variant. Fine-tunes like Juggernaut-Z (RunDiffusion) are a separate model and have their own recipes.

Requirements

Component	Minimum	Tested
GPU	16GB VRAM consumer card	RTX 5060 Ti (16GB)
RAM	16GB system RAM	—
Storage	~21GB in bf16 (~12GB diffusion model + ~8GB text encoder + small VAE)	—
Software	Python 3.10+, PyTorch with CUDA + bf16 support	ComfyUI nightly / `diffusers` @ main

Z-Image-Turbo "fits comfortably within 16G VRAM consumer devices" per the official Tongyi-MAI model card. The RTX 5060 Ti's 16GB matches that target.
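Because bf16 stores each parameter in 2 bytes, weight sizes can be estimated directly from parameter counts. The sketch below assumes 6B parameters for the diffusion model (the figure quoted above) and roughly 4B for the text encoder (an assumption inferred from the qwen_3_4b filename); the VAE is small and left out.

```python
# Back-of-the-envelope weight sizes from parameter counts (decimal GB).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_gb(params: float, dtype: str = "bf16") -> float:
    """Approximate size of a model's weights in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


components = {
    "diffusion model (6B)": 6e9,         # parameter count from the model description
    "text encoder (~4B, assumed)": 4e9,  # assumption based on the qwen_3_4b filename
}

total = 0.0
for name, n_params in components.items():
    gb = weight_gb(n_params)
    total += gb
    print(f"{name:30s} {gb:5.1f} GB")
print(f"{'total (excluding VAE)':30s} {total:5.1f} GB")
```

This prints 12.0 GB, 8.0 GB, and a 20.0 GB total. Since that exceeds 16GB, the transformer and text encoder likely cannot both stay resident on a 16GB card at once, which is one reason the CPU-offload option under Troubleshooting may be needed.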

Installation

Path A — HuggingFace diffusers (Python script)

Z-Image support is in diffusers main; install from source per the official model card and the Tongyi-MAI/Z-Image GitHub README:

pip install git+https://github.com/huggingface/diffusers
pip install torch transformers accelerate safetensors
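Before downloading weights, a quick environment check can catch the common failure modes. A minimal sketch, assuming a single GPU at index 0: it checks that PyTorch sees CUDA with bf16 support, that the card reports enough memory (the 15 GiB threshold is an assumption, since 16GB cards report slightly less than 16 GiB), that the PyTorch build lists the GPU's architecture (the RTX 5060 Ti is a Blackwell card with compute capability 12.0, which needs a build targeting sm_120, i.e. CUDA 12.8 or newer wheels), and that the installed diffusers exposes ZImagePipeline. The architecture check is a heuristic.

```python
import importlib
import importlib.util


def has_enough_vram(total_bytes: int, min_gib: float = 15.0) -> bool:
    """16GB cards usually report a bit under 16 GiB, so 15 GiB is an assumed cutoff."""
    return total_bytes / 2**30 >= min_gib


def check_environment() -> dict:
    report = {"torch": False, "cuda": False, "bf16": False, "vram_ok": False,
              "arch_supported": False, "zimage_pipeline": False}
    if importlib.util.find_spec("torch") is None:
        return report
    import torch

    report["torch"] = True
    if torch.cuda.is_available():
        report["cuda"] = True
        report["bf16"] = torch.cuda.is_bf16_supported()
        report["vram_ok"] = has_enough_vram(torch.cuda.get_device_properties(0).total_memory)
        major, minor = torch.cuda.get_device_capability(0)
        # Heuristic: an RTX 5060 Ti reports (12, 0), so the build should list "sm_120".
        report["arch_supported"] = f"sm_{major}{minor}" in torch.cuda.get_arch_list()
    if importlib.util.find_spec("diffusers") is not None:
        try:
            diffusers = importlib.import_module("diffusers")
            # Only diffusers builds with Z-Image support expose this class.
            report["zimage_pipeline"] = hasattr(diffusers, "ZImagePipeline")
        except Exception:  # treat a broken install as "not available"
            pass
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key:16s} {'OK' if ok else 'MISSING'}")
```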

Path B — ComfyUI (official workflow)

Per the official ComfyUI tutorial, update ComfyUI to the latest nightly via ComfyUI Manager, then place three files into the standard model directories:

# from your ComfyUI root
cd models/diffusion_models
wget https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors

cd ../text_encoders
wget https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/text_encoders/qwen_3_4b.safetensors

cd ../vae
wget https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/vae/ae.safetensors

These files are Comfy-Org's split-file repackaging of the Tongyi-MAI weights, laid out for ComfyUI's loader nodes. To load the workflow, drag its JSON file from Comfy-Org/workflow_templates into the ComfyUI window.
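A missing or misplaced file is the usual reason a loader node shows an empty dropdown. The sketch below checks that the three files from the wget commands exist under a given ComfyUI root. It only tests presence and non-zero size (catching a download that failed outright), not checksums, so a partially downloaded file can still pass.

```python
from pathlib import Path

# Target locations taken from the wget commands above, relative to the ComfyUI root.
EXPECTED_FILES = [
    Path("models/diffusion_models/z_image_turbo_bf16.safetensors"),
    Path("models/text_encoders/qwen_3_4b.safetensors"),
    Path("models/vae/ae.safetensors"),
]


def missing_model_files(comfy_root: str) -> list:
    """Return the expected files that are absent or zero bytes long."""
    root = Path(comfy_root)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    import sys

    problems = missing_model_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    if problems:
        print("missing or empty: " + ", ".join(problems))
    else:
        print("all three model files present")
```

Run it from the ComfyUI root, or pass the root path as the first argument.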

Running

Path A — diffusers snippet

The inference snippet below is based on the Tongyi-MAI HF model card. Z-Image-Turbo runs in 8 NFEs (network function evaluations). Per the card, num_inference_steps=9 results in 8 DiT forward passes, and guidance_scale=0.0 is the required setting for the Turbo models:

import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

prompt = "A photo of a city at night, neon signs reflecting on wet pavement"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
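    num_inference_steps=9,  # per the model card, 9 steps = 8 DiT forward passes (8 NFEs)
    guidance_scale=0.0,     # per the model card, guidance should be 0 for the Turbo models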
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("z_image_turbo_out.png")

Path B — ComfyUI

After dropping the workflow JSON into ComfyUI, edit the prompt node and hit Queue Prompt. The preconfigured workflow runs with the 8-NFE schedule out of the box.
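To queue runs from a script instead of clicking Queue Prompt, you can POST a workflow to ComfyUI's local /prompt endpoint. A minimal sketch, assuming: the workflow was exported in API format (not the regular UI-format JSON), the server runs at the default http://127.0.0.1:8188, and the positive prompt lives in a CLIPTextEncode node. The file name z_image_turbo_api.json is hypothetical, and if your workflow has more than one CLIPTextEncode node you should pass the positive node's id explicitly.

```python
import copy
import json
import urllib.request
import uuid
from typing import Optional

COMFY_URL = "http://127.0.0.1:8188"  # assumed default local ComfyUI address


def set_prompt_text(workflow: dict, text: str, node_id: Optional[str] = None) -> dict:
    """Return a copy of an API-format workflow with the prompt text replaced.

    With node_id=None, the first CLIPTextEncode node (in file order) is edited;
    that is an assumption about which node holds the positive prompt."""
    wf = copy.deepcopy(workflow)
    if node_id is None:
        candidates = [nid for nid, node in wf.items()
                      if node.get("class_type") == "CLIPTextEncode"]
        if not candidates:
            raise ValueError("no CLIPTextEncode node found; pass node_id explicitly")
        node_id = candidates[0]
    wf[node_id]["inputs"]["text"] = text
    return wf


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> str:
    """POST the workflow to /prompt and return the prompt_id ComfyUI assigns."""
    payload = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode("utf-8")
    req = urllib.request.Request(f"{url}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]


if __name__ == "__main__":
    with open("z_image_turbo_api.json", encoding="utf-8") as f:  # hypothetical file name
        workflow = json.load(f)
    wf = set_prompt_text(workflow, "A photo of a city at night, neon signs reflecting on wet pavement")
    print("queued:", queue_prompt(wf))
```

The prompt is edited on a deep copy, so the same loaded workflow can be reused for a batch of different prompts.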

Results

Speed: No community benchmark on RTX 5060 Ti yet — the official Tongyi-MAI card cites "sub-second inference latency on enterprise-grade H800 GPUs", which is not directly comparable to a consumer 5060 Ti. Once a community-submitted run lands, it will appear on /check/z-image-turbo/rtx-5060-ti. If you run it, please submit your numbers.
VRAM usage: Designed to "fit comfortably within 16G VRAM consumer devices" per the official model card — i.e. the bf16 build is the headline configuration for a 16GB card like the RTX 5060 Ti. Live measurements: /check/z-image-turbo/rtx-5060-ti.
Quality notes: Per the model card, the architecture is a "Scalable Single-Stream DiT (S3-DiT)" that concatenates text, visual semantic, and VAE tokens into one unified sequence, a design the card says is optimized for 8-NFE generation that rivals full-step competitors.
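The single-stream idea can be illustrated with a toy NumPy sketch: each modality is projected to a shared width and concatenated along the sequence axis, so one attention operation spans text, semantic, and image tokens together. This is not the real S3-DiT. All sizes are hypothetical, random matrices stand in for learned projections, and there are no learned Q/K/V weights or positional encodings. The VAE latent is turned into tokens by a standard 2×2 patchify, which is an assumption about how the image tokens are formed.

```python
import numpy as np

rng = np.random.default_rng(0)


def patchify(latent: np.ndarray, p: int = 2) -> np.ndarray:
    """(C, H, W) latent -> (H/p * W/p, C*p*p) patch tokens."""
    c, h, w = latent.shape
    x = latent.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)


def single_stream_sequence(text_emb, semantic_emb, latent, d_model=64, p=2):
    """Project every modality to d_model and concatenate into one sequence."""
    streams = [text_emb, semantic_emb, patchify(latent, p)]
    projected = [s @ rng.standard_normal((s.shape[1], d_model)) / np.sqrt(s.shape[1])
                 for s in streams]  # random stand-ins for learned projections
    return np.concatenate(projected, axis=0)


def self_attention_weights(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax of scaled dot products over the whole joint sequence."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)


text = rng.standard_normal((12, 32))        # hypothetical: 12 text tokens of width 32
semantic = rng.standard_normal((8, 48))     # hypothetical: 8 visual-semantic tokens
latent = rng.standard_normal((16, 16, 16))  # hypothetical: 16-channel 16x16 latent
seq = single_stream_sequence(text, semantic, latent)
attn = self_attention_weights(seq)
print(seq.shape, attn.shape)  # (84, 64) (84, 84)
```

The 84 tokens are 12 text + 8 semantic + 64 image patches; every row of the attention matrix covers all three modalities because they share one sequence.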

For the full benchmark data, see /check/z-image-turbo/rtx-5060-ti.
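If you want to submit numbers, a repeatable measurement helps. A minimal timing sketch, assuming the Path A settings (1024×1024, num_inference_steps=9, guidance_scale=0.0): one untimed warm-up run, then the median of several timed runs, with torch.cuda.synchronize() so GPU work is finished before each timer stops. The warm-up count, run count, and helper names are illustrative choices, not an official benchmark protocol.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], warmup: int = 1, runs: int = 3,
              sync: Callable[[], None] = lambda: None) -> dict:
    """Call fn `warmup` times untimed, then `runs` times timed; sync() waits for the GPU."""
    for _ in range(warmup):
        fn()
        sync()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        sync()
        times.append(time.perf_counter() - start)
    return {"runs": runs, "median_s": statistics.median(times), "min_s": min(times)}


def benchmark_z_image_turbo(prompt: str, runs: int = 3) -> dict:
    import torch
    from diffusers import ZImagePipeline

    pipe = ZImagePipeline.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16, low_cpu_mem_usage=False
    )
    pipe.to("cuda")  # swap for pipe.enable_model_cpu_offload() if this runs out of memory
    torch.cuda.reset_peak_memory_stats()

    def generate():
        return pipe(
            prompt=prompt, height=1024, width=1024,
            num_inference_steps=9, guidance_scale=0.0,
            generator=torch.Generator("cuda").manual_seed(42),
        ).images[0]

    stats = time_runs(generate, warmup=1, runs=runs, sync=torch.cuda.synchronize)
    # Peak memory held by PyTorch tensors; nvidia-smi also counts the CUDA context and cache.
    stats["peak_vram_gib"] = torch.cuda.max_memory_allocated() / 2**30
    return stats


if __name__ == "__main__":
    print(benchmark_z_image_turbo("A photo of a city at night, neon signs reflecting on wet pavement"))
```

Reporting the median rather than the mean keeps one slow outlier (for example, a background process) from skewing the result.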

Troubleshooting

Out of memory at first generation (diffusers path)

If the bf16 pipeline doesn't fit alongside other GPU-resident apps, enable CPU offload. The Tongyi-MAI/Z-Image GitHub repo and community installs recommend calling pipe.enable_model_cpu_offload() after from_pretrained, which keeps idle pipeline components in system RAM and moves each one to the GPU only while it runs:

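import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
# Keep components in system RAM; each is moved to the GPU only while it runs.
pipe.enable_model_cpu_offload()
# Do NOT call pipe.to("cuda") when using offload; call pipe(...) as in Path A.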
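diffusers also offers a slower but leaner pipe.enable_sequential_cpu_offload(), which offloads at the submodule level. The sketch below picks among full GPU placement, model offload, and sequential offload based on currently free VRAM from torch.cuda.mem_get_info(). The 22 GiB and 14 GiB thresholds are rough assumptions derived from the bf16 weight sizes (about 20GB total, about 12GB for the transformer alone), not measured values; adjust them for your setup.

```python
def choose_placement(free_bytes: int, full_gib: float = 22.0, offload_gib: float = 14.0) -> str:
    """Pick a placement strategy from free VRAM; thresholds are assumptions, not measurements."""
    free_gib = free_bytes / 2**30
    if free_gib >= full_gib:
        return "cuda"                # everything resident on the GPU
    if free_gib >= offload_gib:
        return "model_offload"       # whole components swapped in and out
    return "sequential_offload"      # submodules swapped in and out (slowest)


def apply_placement(pipe, strategy: str):
    """Apply the chosen strategy to a diffusers pipeline."""
    if strategy == "cuda":
        pipe.to("cuda")
    elif strategy == "model_offload":
        pipe.enable_model_cpu_offload()
    else:
        pipe.enable_sequential_cpu_offload()
    return pipe


if __name__ == "__main__":
    import torch
    from diffusers import ZImagePipeline

    free, _total = torch.cuda.mem_get_info()
    strategy = choose_placement(free)
    pipe = ZImagePipeline.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16, low_cpu_mem_usage=False
    )
    apply_placement(pipe, strategy)
    print("placement:", strategy)
```

On a 16GB card with a desktop session running, this will typically choose model offload under these assumed thresholds.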

ComfyUI doesn't recognize Z-Image nodes

The Z-Image loader nodes ship in ComfyUI's nightly builds, not the stable release. Update via ComfyUI Manager → "Update ComfyUI", then restart. This path is documented in the official ComfyUI Z-Image tutorial.

Confusion with Juggernaut-Z

Juggernaut-Z is a RunDiffusion fine-tune of Z-Image Base, distributed under RunDiffusion/Juggernaut-Z-Image — a different model with its own slug. If you want the original Tongyi-MAI base or turbo weights, stick to the Tongyi-MAI/Z-Image-* repos linked above.