What You'll Build
A local install of Z-Image-Turbo — Alibaba Tongyi-MAI's 6B-parameter distilled image generation model — running text-to-image at 1024×1024 in 8 inference steps on a 16GB consumer GPU. The recipe covers two paths: a Python script via HuggingFace diffusers, and the official ComfyUI workflow.
Hardware data: RTX 5060 Ti (16GB VRAM) · 8 NFEs at 1024×1024 · See benchmark data
Note on variants: The Tongyi-MAI Z-Image family ships three weight sets: Z-Image (Base), Z-Image-Turbo, and Z-Image (Distilled). This recipe targets Z-Image-Turbo, the consumer-friendly distilled variant. Fine-tunes such as Juggernaut-Z (RunDiffusion) are separate models with their own recipes.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM consumer card | RTX 5060 Ti (16GB) |
| RAM | 16GB system RAM | — |
| Storage | ~13GB (diffusion model + text encoder + VAE) | — |
| Software | Python 3.10+, PyTorch with CUDA + bf16 support | ComfyUI nightly / diffusers @ main |
Z-Image-Turbo "fits comfortably within 16G VRAM consumer devices" per the official Tongyi-MAI model card. The RTX 5060 Ti's 16GB matches that target.
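As a rough sanity check on the 16GB claim, the weights-only footprint of the diffusion model can be estimated from the figures already quoted: 6B parameters stored in bf16 (2 bytes each). The helper below is an illustrative sketch, not a measurement; `weight_footprint_gib` is a hypothetical name, and the estimate ignores the text encoder, the VAE, activations, and CUDA overhead, all of which add to the real number.

```python
def weight_footprint_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of model weights in GiB (weights only, no activations)."""
    return num_params * bytes_per_param / 2**30


# 6B parameters in bf16 (2 bytes per parameter), as stated in this recipe.
dit_gib = weight_footprint_gib(6e9)
print(f"Diffusion model weights: ~{dit_gib:.1f} GiB")  # prints ~11.2 GiB
```

The diffusion model alone therefore takes most of a 16GB card, which is why the CPU offload option in Troubleshooting matters when other applications also hold GPU memory.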
Installation
Path A — HuggingFace diffusers (Python script)
Z-Image support is in diffusers main; install from source per the official model card and the Tongyi-MAI/Z-Image GitHub README:
pip install git+https://github.com/huggingface/diffusers
pip install torch transformers accelerate safetensors
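After installing, a quick check that the interpreter and packages are in place can save a failed first run. The sketch below uses only the standard library; `check_environment` and `REQUIRED_PACKAGES` are hypothetical names introduced for illustration, and the package list simply mirrors the two pip commands above.

```python
import importlib.util
import sys

# Packages installed by the two pip commands in this recipe.
REQUIRED_PACKAGES = ("torch", "diffusers", "transformers", "accelerate", "safetensors")


def check_environment(min_python=(3, 10)):
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in REQUIRED_PACKAGES:
        # find_spec locates the package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item:12s} {'ok' if ok else 'MISSING'}")
```

If torch is present, `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()` confirm the CUDA and bf16 requirements from the table.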
Path B — ComfyUI (official workflow)
Per the official ComfyUI tutorial, update ComfyUI to the latest nightly via ComfyUI Manager, then place three files into the standard model directories:
# from your ComfyUI root
cd models/diffusion_models
wget https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors
cd ../text_encoders
wget https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/text_encoders/qwen_3_4b.safetensors
cd ../vae
wget https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/vae/ae.safetensors
These files are Comfy-Org's split-file mirror of the Tongyi-MAI weights, laid out for the ComfyUI loader graph. Once they are in place, load the workflow JSON from Comfy-Org/workflow_templates by dragging it into ComfyUI.
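A missing or misplaced file is a common cause of loader errors, so it can help to confirm the three downloads landed where the commands above put them. This is a minimal sketch; `missing_model_files` is a hypothetical helper, and it assumes the default `models/` layout used by the wget commands.

```python
from pathlib import Path

# Sub-directory under models/ -> file name, matching the wget commands above.
EXPECTED_FILES = {
    "diffusion_models": "z_image_turbo_bf16.safetensors",
    "text_encoders": "qwen_3_4b.safetensors",
    "vae": "ae.safetensors",
}


def missing_model_files(comfy_root):
    """Return the expected files (relative to models/) that are not present."""
    models_dir = Path(comfy_root) / "models"
    return [
        str(Path(sub) / name)
        for sub, name in EXPECTED_FILES.items()
        if not (models_dir / sub / name).is_file()
    ]


# Run from the ComfyUI root; an empty list means all three files are in place.
print(missing_model_files("."))
```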
Running
Path A — diffusers snippet
The inference snippet below is taken verbatim from the Tongyi-MAI HF model card. It sets num_inference_steps=9, which per the card results in 8 DiT forward passes (the 8 NFEs quoted for Z-Image-Turbo), and guidance_scale=0.0, the value the card specifies for the Turbo model:
import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

prompt = "A photo of a city at night, neon signs reflecting on wet pavement"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("z_image_turbo_out.png")
Path B — ComfyUI
After dropping the workflow JSON into ComfyUI, edit the prompt node and hit Queue Prompt. The preconfigured workflow runs with the 8-NFE schedule out of the box.
Results
- Speed: No community benchmark on RTX 5060 Ti yet — the official Tongyi-MAI card cites "sub-second inference latency on enterprise-grade H800 GPUs", which is not directly comparable to a consumer 5060 Ti. Once a community-submitted run lands, it will appear on /check/z-image/rtx-5060-ti. If you run it, please submit your numbers.
- VRAM usage: Designed to "fit comfortably within 16G VRAM consumer devices" per the official model card — i.e. the bf16 build is the headline configuration for a 16GB card like the RTX 5060 Ti. Live measurements: /check/z-image/rtx-5060-ti.
- Quality notes: The architecture is a "Scalable Single-Stream DiT (S3-DiT)" in which text, visual semantic, and VAE tokens are concatenated into a single unified sequence. Per the model card, the design targets 8-NFE generation with quality that rivals full-step competitors.
For the full benchmark data, see /check/z-image/rtx-5060-ti.
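For anyone collecting their own numbers, a small timing harness keeps runs comparable. The sketch below is one possible approach, not the site's submission format: `time_runs` is a hypothetical helper, and the warmup and run counts are arbitrary defaults. It takes any zero-argument callable, so it works for the diffusers path by wrapping the `pipe(...)` call.

```python
import statistics
import time


def time_runs(generate, warmup=1, runs=3, sync=None):
    """Time a zero-argument callable; returns median and best wall-clock seconds.

    sync is an optional callable (for example torch.cuda.synchronize) invoked
    before starting and before stopping the clock, so queued GPU work is counted.
    """
    for _ in range(warmup):
        generate()  # warmup runs absorb one-time costs such as CUDA initialization
    timings = []
    for _ in range(runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        generate()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return {"median_s": statistics.median(timings), "min_s": min(timings), "runs": runs}


# Hypothetical usage with the diffusers snippet from this recipe:
#   torch.cuda.reset_peak_memory_stats()
#   stats = time_runs(lambda: pipe(prompt=prompt, height=1024, width=1024,
#                                  num_inference_steps=9, guidance_scale=0.0),
#                     sync=torch.cuda.synchronize)
#   peak_gib = torch.cuda.max_memory_allocated() / 2**30
```

Reporting the median alongside the best run makes a single slow outlier easier to spot.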
Troubleshooting
Out of memory at first generation (diffusers path)
If the bf16 pipeline doesn't fit alongside other GPU-resident apps, enable CPU offload — the Tongyi-MAI/Z-Image GitHub repo and community installs recommend pipe.enable_model_cpu_offload() after from_pretrained to move idle parts to system RAM:
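import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
# Keep idle components in system RAM; each one moves to the GPU only while it runs.
pipe.enable_model_cpu_offload()
# do NOT call pipe.to("cuda") when using offload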
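Whether offload is needed depends on how much VRAM is free when the script starts. One way to decide at runtime is to compare free memory against a budget, sketched below. `should_offload` is a hypothetical helper, and the 14 GiB default is a placeholder assumption rather than a measured requirement, so adjust it to what you observe on your own card.

```python
def should_offload(free_bytes, needed_gib=14.0):
    """Return True when free VRAM is below the (assumed) budget for a full GPU load."""
    return free_bytes < needed_gib * 2**30


# Hypothetical usage with the pipeline from this recipe:
#   free_bytes, total_bytes = torch.cuda.mem_get_info()
#   if should_offload(free_bytes):
#       pipe.enable_model_cpu_offload()
#   else:
#       pipe.to("cuda")
print(should_offload(8 * 2**30))   # True: 8 GiB free is under the 14 GiB budget
print(should_offload(16 * 2**30))  # False: 16 GiB free is enough under this assumption
```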
ComfyUI doesn't recognize Z-Image nodes
The Z-Image loader nodes ship in ComfyUI's nightly builds, not the stable release. Update via ComfyUI Manager → "Update ComfyUI", then restart. This update path is the one documented in the official ComfyUI Z-Image tutorial.
Confusion with Juggernaut-Z
Juggernaut-Z is a RunDiffusion fine-tune of Z-Image Base, distributed under RunDiffusion/Juggernaut-Z-Image — a different model with its own slug. If you want the original Tongyi-MAI base or turbo weights, stick to the Tongyi-MAI/Z-Image-* repos linked above.