What You'll Build
A local install of Juggernaut Z V1 — Team Juggernaut's photoreal fine-tune of Tongyi-MAI's 6B Z-Image Base, trained by KandooAI and released through RunDiffusion. The recipe covers two paths: a Python script via HuggingFace diffusers, and a ComfyUI workflow using the official RunDiffusion node graph. Per the HF model card, Juggernaut Z is tuned for "stronger lighting, sharper focus, more refined skin texture, and more cinematic atmosphere" relative to the upstream Base.
Hardware data: RTX 5070 Ti (16GB VRAM) · BF16 / FP8 / GGUF variants available
⚠️ License: CC BY-NC 4.0 (non-commercial). Per the HF model card, Juggernaut Z is licensed for non-commercial use only: you may not use the model or its outputs in a commercial workflow without a separate commercial license. Commercial licensing is via juggernaut@rundiffusion.com. The Civitai release page lists Apache 2.0 in error; the HF canonical card is the source of truth.
Not Z-Image Turbo. Juggernaut Z is built on Z-Image Base, not the distilled Turbo model, so it uses a different step/CFG profile: per the HF model card its default is 35 steps at guidance scale 6, rather than the 8-NFE pattern from the Z-Image-Turbo on RTX 5070 Ti recipe. Use the settings below.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM consumer card (bf16/fp16); ~8GB with FP8 or GGUF Q4–Q5 | RTX 5070 Ti (16GB) |
| RAM | 16GB system RAM | — |
| Storage | ~12.3GB for bf16 / fp16 weights; ~6.15GB for fp8; ~4.83GB for Q4_K_S GGUF | — |
| Software | Python 3.10+, PyTorch with cu128 (CUDA 12.8) + bf16 support, diffusers ≥ 0.37.1 | ComfyUI with RES4LFY node / diffusers ≥ 0.37.1 |
The headline 16 GB tier is anchored on the bf16 weights: per the Juggernaut-Z-Image repo file listing, the bf16 checkpoint is 12.31 GB on disk, which leaves ~3 GB of headroom on a 16 GB card for activations and latents. The diffusers pipeline also loads a text encoder and a VAE, so peak usage can exceed what the checkpoint size alone suggests. The same repo ships an FP8 e4m3fn safetensors variant (6.15 GB) and a full set of GGUF quantizations (Q4_K_S at 4.83 GB through Q8_0 at 7.34 GB) for tighter VRAM budgets. For context, Tongyi-MAI's Z-Image-Turbo card describes that distilled sibling as fitting comfortably within "16G VRAM consumer devices". Juggernaut Z is a fine-tune of Z-Image Base rather than Turbo, but per the Z-Image Base card both share the same Single-Stream Diffusion Transformer architecture, which suggests the same 16 GB tier applies to the bf16 build here.
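The headroom arithmetic above can be checked in a few lines. This is a rough sketch: it treats the card's 16 GB and the HF file sizes as the same decimal unit and ignores the CUDA context, text encoder, VAE, and activations, so its output covers the transformer weights only. The dictionary keys are illustrative labels, not repo file names.

```python
# Back-of-envelope VRAM budget for the Juggernaut Z transformer weights alone.
# Sizes are the on-disk figures from the HF repo listing (decimal GB).
# Not modelled: CUDA context, text encoder, VAE, activations.
CHECKPOINT_GB = {
    "bf16": 12.31,
    "gguf_q8_0": 7.34,
    "fp8_e4m3fn": 6.15,
    "gguf_q4_k_s": 4.83,
}


def implied_params_billion(bf16_gb: float) -> float:
    """bf16 stores 2 bytes per parameter, so GB / 2 approximates billions of params."""
    return round(bf16_gb / 2, 1)


def headroom_gb(card_gb: float, weights_gb: float) -> float:
    """VRAM left on the card after loading only the transformer weights."""
    return round(card_gb - weights_gb, 2)


if __name__ == "__main__":
    print(f"bf16 file implies ~{implied_params_billion(CHECKPOINT_GB['bf16'])}B parameters")
    for name, size in CHECKPOINT_GB.items():
        print(f"{name:>12}: {size:5.2f} GB weights -> {headroom_gb(16.0, size):5.2f} GB left of 16 GB")
```

The implied ~6.2B parameters is consistent with the 6B Z-Image Base the model is fine-tuned from.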
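The RTX 5070 Ti is a Blackwell GB203 card with compute capability sm_120. Install a PyTorch build compiled against CUDA 12.8 (cu128). Earlier cu121/cu126 wheels do not ship sm_120 kernels, so GPU ops on this card typically fail with a "no kernel image is available" error. The nightly cu128 index below works; stable PyTorch 2.7 and later also publish cu128 wheels with sm_120 support: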
```bash
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```
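To confirm the installed wheel actually targets this GPU, a minimal check like the following can help. It is a sketch: the helper names are hypothetical, and it only looks for an exact `sm_120` entry in `torch.cuda.get_arch_list()` (it does not consider PTX JIT fallback).

```python
# Sanity check that the installed PyTorch wheel ships kernels for the local GPU.
# Blackwell consumer cards such as the RTX 5070 Ti report capability (12, 0).


def capability_tag(major: int, minor: int) -> str:
    """Format a CUDA compute capability as the tag used in torch's arch list."""
    return f"sm_{major}{minor}"


def wheel_supports(arch_list: list[str], major: int, minor: int) -> bool:
    """True if the wheel was compiled with SASS kernels for this exact capability."""
    return capability_tag(major, minor) in arch_list


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print(f"torch {torch.__version__}: CUDA not available")
        return
    major, minor = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print(f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
          f"{torch.cuda.get_device_name(0)} ({capability_tag(major, minor)})")
    ok = wheel_supports(arch_list, major, minor)
    print("kernels for this GPU:", "yes" if ok else "NO, reinstall a cu128 build")


if __name__ == "__main__":
    main()
```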
Installation
Path A — HuggingFace diffusers (Python script)
Per the Juggernaut-Z-Image model card, Juggernaut Z loads through the standard DiffusionPipeline once diffusers is recent enough to know about ZImagePipeline:
```bash
pip install -U "diffusers>=0.37.1" transformers accelerate safetensors
```
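A quick way to verify the upgrade took effect is sketched below. It assumes only the 0.37.1 floor stated on the model card; the helper names are hypothetical, and the simple version parser ignores pre-release suffixes, so `0.38.0.dev0` counts as `0.38.0`.

```python
# Confirm diffusers meets the card's 0.37.1 floor and exposes ZImagePipeline.
import re
from importlib import metadata

MIN_DIFFUSERS = (0, 37, 1)


def parse_release(version: str) -> tuple[int, ...]:
    """Keep the leading numeric release segment: '0.38.0.dev0' -> (0, 38, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...] = MIN_DIFFUSERS) -> bool:
    """Tuple comparison: (0, 38, 0) >= (0, 37, 1) is True, (0, 36, 0) is not."""
    return parse_release(version) >= minimum


def main() -> None:
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        print("diffusers is not installed")
        return
    import diffusers

    status = "OK" if meets_minimum(installed) else "too old, upgrade to >=0.37.1"
    print(f"diffusers {installed}: {status}; "
          f"ZImagePipeline present: {hasattr(diffusers, 'ZImagePipeline')}")


if __name__ == "__main__":
    main()
```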
Path B — ComfyUI (RunDiffusion workflow)
The official RunDiffusion ComfyUI guide ships an IMG-JuggernautZ-Txt2Img.json workflow that expects the RES4LFY custom node. Install order:
```bash
# 1. Open ComfyUI Manager → Custom Nodes Manager → install "RES4LFY", then restart ComfyUI.
# 2. Download a Juggernaut Z checkpoint to ComfyUI/models/checkpoints/
#    Pick ONE based on your VRAM budget. URLs from the official RunDiffusion repo:
#    https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/tree/main

# bf16 (12.31 GB on disk; fits 16GB VRAM with room to spare):
wget -P ComfyUI/models/checkpoints/ \
  https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/resolve/main/Juggernaut_Z_V1_by_RunDiffusion.safetensors

# fp8 e4m3fn (6.15 GB on disk; for ≤12 GB cards):
wget -P ComfyUI/models/checkpoints/ \
  https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/resolve/main/Juggernaut_Z_V1_FP8_e4m3fn.safetensors
```
Load the IMG-JuggernautZ-Txt2Img.json workflow (available from the RunDiffusion ComfyUI guide) into ComfyUI by dragging the file onto the canvas.
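If you prefer `huggingface_hub` to `wget`, the download step can be scripted. This is a sketch under stated assumptions: it only chooses between the two single-file checkpoints named in the shell block, the 16 GB cut-off mirrors those comments, and `pick_checkpoint` / `download_checkpoint` are hypothetical helpers. GGUF file names are not listed in this guide, so check the repo listing for those.

```python
# Choose a single-file checkpoint by VRAM budget, then fetch it with huggingface_hub.
from pathlib import Path

REPO_ID = "RunDiffusion/Juggernaut-Z-Image"
BF16_FILE = "Juggernaut_Z_V1_by_RunDiffusion.safetensors"  # 12.31 GB on disk
FP8_FILE = "Juggernaut_Z_V1_FP8_e4m3fn.safetensors"         # 6.15 GB on disk


def pick_checkpoint(vram_gb: float) -> str:
    """bf16 for 16 GB cards and up, FP8 e4m3fn for anything smaller."""
    return BF16_FILE if vram_gb >= 16 else FP8_FILE


def download_checkpoint(vram_gb: float, comfy_root: str = "ComfyUI") -> Path:
    """Download the chosen file into ComfyUI/models/checkpoints/ and return its path."""
    from huggingface_hub import hf_hub_download  # imported lazily: optional dependency

    target_dir = Path(comfy_root) / "models" / "checkpoints"
    target_dir.mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=pick_checkpoint(vram_gb),
        local_dir=target_dir,
    )
    return Path(local_path)
```

For example, `download_checkpoint(16)` fetches the bf16 file on the RTX 5070 Ti, while `download_checkpoint(12)` fetches the FP8 build.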
Running
Path A — diffusers snippet
The inference snippet below is verbatim from the Juggernaut-Z-Image HF model card:
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-Z-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a cinematic portrait, dramatic lighting",
    guidance_scale=6.0,
    num_inference_steps=35,
).images[0]
image.save("output.png")
```
The HF model card's Recommended Settings table lists the defaults as CFG 6 (range 6–9) and Steps 35 (range 25–45). from_pretrained only downloads the files declared in model_index.json, so it will not pull the standalone .safetensors / .gguf variants at the repo root.
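The card snippet can be extended into a reusable helper that enforces the recommended ranges and fixes the seed. This is a sketch, not the card's code. The helper names are hypothetical, and the 832×1216 default is borrowed from the ComfyUI guide's resolution list. The offload switch is also an assumption: because the pipeline loads a text encoder and VAE alongside the transformer, `enable_model_cpu_offload()` (a standard `DiffusionPipeline` method that needs `accelerate`) is one option if `.to("cuda")` runs out of memory on a 16 GB card.

```python
# Card-based defaults with range checks, a fixed seed, and optional CPU offload.
CFG_RANGE = (6.0, 9.0)   # card: CFG 6, range 6-9
STEPS_RANGE = (25, 45)   # card: 35 steps, range 25-45


def sampling_kwargs(guidance_scale: float = 6.0, steps: int = 35) -> dict:
    """Return pipeline kwargs, rejecting values outside the card's recommended ranges."""
    if not CFG_RANGE[0] <= guidance_scale <= CFG_RANGE[1]:
        raise ValueError(f"guidance_scale {guidance_scale} outside card range {CFG_RANGE}")
    if not STEPS_RANGE[0] <= steps <= STEPS_RANGE[1]:
        raise ValueError(f"steps {steps} outside card range {STEPS_RANGE}")
    return {"guidance_scale": guidance_scale, "num_inference_steps": steps}


def generate(prompt: str, *, seed: int = 0, width: int = 832, height: int = 1216,
             offload: bool = False, **sampling):
    """Load the pipeline and render one image; extra kwargs go to sampling_kwargs."""
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "RunDiffusion/Juggernaut-Z-Image", torch_dtype=torch.bfloat16
    )
    if offload:
        # Keeps components on the CPU and moves each to the GPU only while it runs.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    result = pipe(prompt, width=width, height=height, generator=generator,
                  **sampling_kwargs(**sampling))
    return result.images[0]
```

Usage: `generate("a cinematic portrait, dramatic lighting", seed=42, offload=True).save("output.png")`. Offload trades speed for memory, so leave it off if the plain `.to("cuda")` path fits.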
Path B — ComfyUI
After loading the official workflow JSON, edit the prompt node and hit Queue Prompt. The official guide recommends starting at a moderate resolution (1024×1024, 832×1216, or 1216×832) before scaling up. The Civitai release page for Juggernaut Z v1.0 additionally documents a two-pass setup the model author tunes for sharpness:
- First pass: sampler Res_2s, scheduler Beta, 22 steps, denoise 1.00
- Second pass: sampler Res_2s, scheduler Normal, 3 steps, denoise 0.15
- Recommended resolution: 960×1440 (or a similar pixel area); the author notes that low resolutions like 1024×1024 can sometimes look grainy or noisy with this fine-tune
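The "similar pixel area" advice can be applied to other aspect ratios with a small helper that keeps the 960×1440 area. Snapping to multiples of 16 is an assumption (every resolution quoted in this guide happens to be a multiple of 16), not a documented Z-Image constraint, and `dims_for_aspect` is a hypothetical name.

```python
import math

TARGET_AREA = 960 * 1440  # pixel area the author recommends (1,382,400 px)


def snap(value: float, multiple: int) -> int:
    """Round to the nearest multiple, never returning less than one multiple."""
    return max(multiple, round(value / multiple) * multiple)


def dims_for_aspect(aspect_w: int, aspect_h: int, area: int = TARGET_AREA,
                    multiple: int = 16) -> tuple[int, int]:
    """Width and height with roughly `area` pixels at the aspect_w:aspect_h ratio."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(area / ratio)
    return snap(height * ratio, multiple), snap(height, multiple)


if __name__ == "__main__":
    for aspect in [(2, 3), (3, 2), (1, 1), (16, 9)]:
        w, h = dims_for_aspect(*aspect)
        print(f"{aspect[0]}:{aspect[1]} -> {w}x{h} ({w * h:,} px)")
```

A 2:3 portrait recovers the author's 960×1440 exactly; a square lands at 1168×1168, noticeably larger than the 1024×1024 the author flags as grainy.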
Results
- Speed: No Juggernaut Z benchmark on the RTX 5070 Ti has been published yet, and the backend has no measurement for this pair. The 5070 Ti uses the same Blackwell GB203 (sm_120) die and 16 GB GDDR7 tier as the RTX 5080, but its ~896 GB/s of memory bandwidth and 8960 CUDA cores sit modestly below the 5080's. Borrowing a 5080 (or any other card's) per-step time would therefore be a guess rather than a measurement, so no speed figure is quoted here. When a community benchmark lands, it will appear on /check/juggernaut-z/rtx-5070-ti. If you run the model, please submit your numbers.
- VRAM usage: The bf16 Juggernaut Z checkpoint is 12.31 GB on disk per the HF repo listing, leaving ~3 GB of the 5070 Ti's 16 GB for activations and latents. The text encoder and VAE also need memory, so peak usage in a full pipeline can be higher than the checkpoint size alone suggests. Live measurements: /check/juggernaut-z/rtx-5070-ti.
- Quality notes: Per the HF card, Juggernaut Z is licensed CC BY-NC 4.0 (non-commercial; commercial licensing via juggernaut@rundiffusion.com). It is tuned for "stronger lighting, sharper focus, more refined skin texture, and more cinematic atmosphere" relative to Z-Image Base, and the card flags composition as an area with "further work planned for v2".
For the full benchmark data, see /check/juggernaut-z/rtx-5070-ti.
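For readers who want to submit numbers, a minimal measurement sketch is below. It assumes `pipe` is already loaded as in the Path A snippet; `benchmark` and `seconds_per_step` are hypothetical helpers. The per-step figure is whole-run time divided by steps, so it includes text encoding and VAE decode, and with CPU offload enabled it also includes transfer time. `torch.cuda.max_memory_allocated()` reports the PyTorch allocator's peak; `nvidia-smi` will show more because of the CUDA context and the caching allocator.

```python
# Measure seconds per step and peak allocated VRAM for an already-loaded pipeline.
import statistics
import time


def seconds_per_step(run_seconds: list[float], steps: int) -> float:
    """Median whole-run time divided by the step count."""
    return statistics.median(run_seconds) / steps


def benchmark(pipe, prompt: str, steps: int = 35, runs: int = 3) -> dict:
    import torch

    pipe(prompt, num_inference_steps=steps, guidance_scale=6.0)  # warm-up, not timed
    torch.cuda.reset_peak_memory_stats()
    timings = []
    for _ in range(runs):
        torch.cuda.synchronize()
        start = time.perf_counter()
        pipe(prompt, num_inference_steps=steps, guidance_scale=6.0)
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
        timings.append(time.perf_counter() - start)
    return {
        "s_per_step": round(seconds_per_step(timings, steps), 3),
        "peak_alloc_gib": round(torch.cuda.max_memory_allocated() / 2**30, 2),
    }
```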
Troubleshooting
ComfyUI errors out with a missing custom node
The official Juggernaut Z workflow requires the RES4LFY node; install it from ComfyUI Manager → Custom Nodes Manager, then restart ComfyUI. This is documented in the RunDiffusion ComfyUI guide.
DiffusionPipeline raises "Cannot find pipeline class ZImagePipeline"
ZImagePipeline is available in diffusers 0.37.1 and later (the card states it was verified against diffusers 0.37.1 and 0.38.0). Upgrade with pip install -U "diffusers>=0.37.1" per the HF model card. If a new enough release is not available from your package index, install from main instead: pip install git+https://github.com/huggingface/diffusers.
Torch fails to launch or runs slowly on the RTX 5070 Ti
The RTX 5070 Ti is Blackwell GB203 sm_120. Install a PyTorch build compiled against CUDA 12.8 (--index-url https://download.pytorch.org/whl/nightly/cu128) — cu121/cu126 wheels lack sm_120 kernels. If a custom node or sample snippet hardcodes attn_implementation="flash_attention_2", switch it to "sdpa" or "eager": FlashAttention-2 wheels do not yet ship sm_120 kernels (Dao-AILab#2168).
1024×1024 outputs look noisy or grainy
The Juggernaut Z author flags this on the Civitai release notes: use 960×1440 (or a similar pixel area) instead, or apply the documented two-pass schedule (22 steps Res_2s/Beta at denoise 1.00, then 3 steps Res_2s/Normal at denoise 0.15).
Tight on VRAM (≤ 12 GB card)
Download the FP8 e4m3fn safetensors (6.15 GB) or one of the GGUF Q4–Q5 quantizations (4.83–5.68 GB) from the HF repo instead of the bf16 build. Blackwell sm_120 has native FP8 tensor cores, but whether inference actually computes in FP8 or upcasts the weights to bf16 on the fly depends on the loader and its settings; either way, the FP8 file roughly halves the VRAM taken by the weights. GGUF requires a GGUF-aware loader node in ComfyUI.