What You'll Build
A local install of Juggernaut Z V1 - Team Juggernaut / KandooAI's cinematic, photoreal fine-tune of Tongyi-MAI's Z-Image Base, released through RunDiffusion. This recipe covers two paths on an RTX 4090 (Ada Lovelace, sm_89, 24 GB VRAM): a Python script via HuggingFace diffusers, and a ComfyUI workflow using the official RunDiffusion node graph. With 24 GB of VRAM, the BF16 weights fit with substantial headroom - enabling higher native resolutions (960x1440 / 1440x960 are the author-recommended portrait/landscape presets) and num_images_per_prompt >= 2 batches that 16 GB-tier cards cannot absorb. Juggernaut Z is tuned for stronger lighting, sharper focus, refined skin texture, and a more cinematic atmosphere than the upstream Base.
Hardware data: RTX 4090 (24 GB VRAM, 1008 GB/s memory bandwidth, Ada sm_89) - runs at BF16 (12.3 GB single-file checkpoint per the HF repo file listing) with ample headroom for higher-resolution presets and small batches - See benchmark data
Warning - License: CC BY-NC 4.0 (non-commercial). Per the HF model card, Juggernaut Z is licensed for non-commercial use only; commercial licensing is available via juggernaut@rundiffusion.com. The Civitai release page lists Apache 2.0 in error. Where the two pages conflict, the canonical HF model card is the source of truth.
Note - Not Z-Image Turbo, not Juggernaut X / Reborn / vanilla Z-Image Base. Juggernaut Z is built on Z-Image Base (not the distilled Turbo) and is a distinct RunDiffusion fine-tune from prior Juggernaut releases (Juggernaut X / Reborn target SDXL; vanilla Z-Image Base is the unfinetuned Tongyi-MAI upstream). The defaults differ accordingly: the HF model card recommends 35 steps at guidance scale 6 (valid ranges: 25-45 steps, CFG 6-9), not the 8-NFE / CFG 0.0 pattern used in the Z-Image Turbo on RTX 4090 recipe.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16 GB VRAM consumer card (bf16/fp16); ~8 GB with FP8 or GGUF Q4-Q5 | RTX 4090 (24 GB) |
| RAM | 16 GB system RAM (32 GB recommended for batched generation) | - |
| Storage | ~13 GB for BF16 / FP16 weights; ~6 GB for FP8; ~5 GB for Q4_K_S GGUF | - |
| Software | Python 3.10+, PyTorch with CUDA + bf16 support, diffusers >= 0.37.1 | ComfyUI with RES4LYF node / diffusers >= 0.37.1 |
The 16 GB minimum is anchored on the BF16 weights themselves: the Juggernaut-Z-Image repo file listing ships the canonical BF16 single-file checkpoint at 12.3 GB on disk (the FP16 variant is also 12.3 GB; the Diffusers component layout in transformer/, text_encoder/, vae/ resolves the same weight tensors). With 24 GB on an RTX 4090, the BF16 weights leave ~11.7 GB of headroom for the text encoder, VAE, latents, and activations - enough to comfortably run the author-recommended 960x1440 / 1440x960 presets and num_images_per_prompt=2 batches that 16 GB cards cannot. The repo also ships an FP8 e4m3fn safetensors variant (6.15 GB on disk) and a full GGUF ladder (Q4_K_S 4.83 GB through Q8_0 7.34 GB) for memory-constrained setups, but on a 4090 there is no reason to drop precision below BF16.
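The headroom figure above is plain subtraction. The sketch below repeats it for each variant size quoted from the repo file listing. It counts weights only, so the real margin is smaller once the text encoder, VAE, latents, and activations are resident. The names `WEIGHT_SIZES_GB`, `VRAM_GB`, and `headroom_gb` are illustrative, not part of any library.

```python
# On-disk sizes (GB) as quoted from the Juggernaut-Z-Image repo file listing.
WEIGHT_SIZES_GB = {
    "BF16": 12.3,
    "FP8 e4m3fn": 6.15,
    "GGUF Q8_0": 7.34,
    "GGUF Q4_K_S": 4.83,
}

VRAM_GB = 24.0  # RTX 4090


def headroom_gb(weights_gb: float, vram_gb: float = VRAM_GB) -> float:
    """VRAM left after the weights alone are resident (no activations counted)."""
    return round(vram_gb - weights_gb, 2)


for name, size in WEIGHT_SIZES_GB.items():
    print(f"{name:<12} {size:>5.2f} GB weights -> {headroom_gb(size):>5.2f} GB headroom")
```

Passing `vram_gb=16.0` shows why the BF16 variant is tight on 16 GB cards: only about 3.7 GB remains for everything other than the weights.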
This is a derived runtime envelope based on the cited on-disk weight sizes, not a measured peak - no community benchmark for Juggernaut Z on an RTX 4090 has been published as of writing. Live measurements (when they land via /contribute) will appear at /check/juggernaut-z/rtx-4090.
Unlike Blackwell-class cards (sm_120), the RTX 4090 is Ada Lovelace sm_89, which the default pip install torch wheels already cover, including PyTorch's built-in attention backends. FlashAttention-2 and xformers are separate packages and are not required for this recipe. No cu128-specific wheel selection is required.
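To confirm which architecture PyTorch actually sees, a short check along these lines can help. It is a sketch: `sm_name` and `report_gpu` are hypothetical helpers, and it assumes the card of interest is CUDA device 0.

```python
def sm_name(capability: tuple[int, int]) -> str:
    """Format a (major, minor) CUDA compute capability as an sm_XY string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def report_gpu() -> str:
    """Describe CUDA device 0, or explain why it cannot be queried."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device visible to PyTorch"
    name = torch.cuda.get_device_name(0)
    arch = sm_name(torch.cuda.get_device_capability(0))
    bf16 = torch.cuda.is_bf16_supported()
    return f"{name}: {arch}, bf16 supported: {bf16}"


print(report_gpu())
```

On an RTX 4090 the architecture string should read sm_89; a Blackwell card would report sm_120 instead.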
Installation
Path A - HuggingFace diffusers (Python script)
Per the Juggernaut-Z-Image model card, Juggernaut Z loads through the standard DiffusionPipeline once diffusers is recent enough to know about ZImagePipeline:
pip install -U "diffusers>=0.37.1" transformers accelerate safetensors
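After installing, it can be worth checking that the environment really picked up a new enough diffusers. The sketch below compares the installed version against the 0.37.1 minimum quoted in this recipe. `version_tuple` is a deliberately simple, hypothetical parser that ignores pre-release suffixes, not a full PEP 440 implementation.

```python
from importlib import metadata

MIN_DIFFUSERS = (0, 37, 1)  # minimum quoted in this recipe


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '0.38.0.dev0' into (0, 38, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def diffusers_is_recent_enough() -> bool:
    """True when an installed diffusers meets the recipe's minimum version."""
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= MIN_DIFFUSERS


print("diffusers OK:", diffusers_is_recent_enough())
```

If this prints False, rerun the pip command above in the same virtual environment that ComfyUI or your script uses.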
Path B - ComfyUI (RunDiffusion workflow)
The official RunDiffusion ComfyUI guide ships an IMG-JuggernautZ-Txt2Img.json workflow that expects the RES4LYF custom node. Install order:
# 1. Open ComfyUI Manager -> Custom Nodes Manager -> install "RES4LYF", then restart ComfyUI.
# 2. Download the Juggernaut Z BF16 checkpoint to ComfyUI/models/checkpoints/
# On a 24 GB RTX 4090 there is no reason to drop below BF16. URLs are from the
# official RunDiffusion repo: https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/tree/main
wget -P ComfyUI/models/checkpoints/ \
https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/resolve/main/Juggernaut_Z_V1_by_RunDiffusion.safetensors
Load the IMG-JuggernautZ-Txt2Img.json workflow into ComfyUI by dragging the file onto the canvas (download from the RunDiffusion guide linked above).
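Where wget is unavailable (for example on Windows), the same file can be fetched from Python. This is a sketch using `hf_hub_download` from huggingface_hub, which diffusers already depends on. The repo ID and filename are taken from the wget command above; `resolve_url` and `download_checkpoint` are hypothetical helper names, and the target directory assumes ComfyUI sits in the current working directory.

```python
REPO_ID = "RunDiffusion/Juggernaut-Z-Image"
FILENAME = "Juggernaut_Z_V1_by_RunDiffusion.safetensors"
CHECKPOINT_DIR = "ComfyUI/models/checkpoints"  # assumes ComfyUI is in the current directory


def resolve_url(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    """Build the direct-download URL used by the wget command above."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"


def download_checkpoint(target_dir: str = CHECKPOINT_DIR) -> str:
    """Fetch the BF16 checkpoint into target_dir and return its local path."""
    from huggingface_hub import hf_hub_download  # lazy import: only needed when downloading

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=target_dir)


print("Source:", resolve_url())
# Uncomment to start the roughly 12.3 GB download:
# print("Saved to", download_checkpoint())
```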
Running
Path A - diffusers snippet
The inference snippet below is verbatim from the Juggernaut-Z-Image HF model card, with two 4090-specific knobs surfaced as comments:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-Z-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a cinematic portrait, dramatic lighting",
    guidance_scale=6.0,  # default per HF card; valid range 6-9
    num_inference_steps=35,  # default per HF card; valid range 25-45
    # On a 4090 24 GB you can also pass height=1440, width=960 for the
    # author-recommended portrait preset, or num_images_per_prompt=2 for batched runs.
).images[0]
image.save("output.png")
The HF model card lists the default sampler settings as guidance_scale=6 (valid range 6-9) and num_inference_steps=35 (valid range 25-45).
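The two 4090-specific knobs mentioned in the comments can be combined into one call. The following is a sketch, not the model card's own code: `PRESETS`, `call_kwargs`, and `generate` are hypothetical names, and the `generator` argument is an assumption based on the convention most diffusers pipelines follow.

```python
PRESETS = {
    "portrait": {"height": 1440, "width": 960},
    "landscape": {"height": 960, "width": 1440},
}


def call_kwargs(preset: str = "portrait", batch: int = 2) -> dict:
    """Assemble pipeline arguments from the model-card defaults and a preset."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return {
        "guidance_scale": 6.0,       # HF card default; valid range 6-9
        "num_inference_steps": 35,   # HF card default; valid range 25-45
        "num_images_per_prompt": batch,
        **PRESETS[preset],
    }


def generate(prompt: str, preset: str = "portrait", batch: int = 2, seed: int = 0):
    """Load the pipeline and render `batch` images; needs a CUDA GPU."""
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "RunDiffusion/Juggernaut-Z-Image",
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    # Assumption: the pipeline accepts a `generator` for reproducible seeds.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt, generator=generator, **call_kwargs(preset, batch)).images


# Example (not run automatically, since it downloads and loads the model):
# for i, img in enumerate(generate("a cinematic portrait, dramatic lighting")):
#     img.save(f"output_{i}.png")
```

Keeping the argument assembly separate from model loading makes it easy to inspect or log the exact settings before committing to a 35-step run.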
Path B - ComfyUI
After loading the official workflow JSON, edit the prompt node and hit Queue Prompt. The Civitai release page for Juggernaut Z v1.0 documents an alternative two-pass setup that the model author tunes for sharpness:
- First pass: sampler Res_2s, scheduler Beta, 22 steps, denoise 1.00
- Second pass: sampler Res_2s, scheduler Normal, 3 steps, denoise 0.15
- Recommended resolutions: 960x1440 (portrait) or 1440x960 (landscape) - the Civitai notes call out that 1024x1024 "will sometimes look too grainy/noisy" with this fine-tune
The 4090's 24 GB makes both presets straightforward at BF16; you can also bump to a small batch (num_images_per_prompt=2 in diffusers, or duplicate the sampler output node in the ComfyUI graph) without running into the per-step VAE allocation that pushes a 16 GB sibling card to its limit.
Results
- Speed: No community benchmark for Juggernaut Z on RTX 4090 has been published as of writing. The closely-related Z-Image Turbo (same Scalable Single-Stream DiT architecture, distilled to 8 steps) was independently measured at ~2.3 s for a 1024x1024 image on an RTX 4090 per the release-day reporting cited in the Z-Image Turbo on RTX 4090 recipe, but that figure is not transferable to Juggernaut Z: Juggernaut Z runs the un-distilled Base path at 35 steps (vs Turbo's 8), so its per-image wall-clock will be substantially longer at the same resolution. We deliberately omit a specific seconds-per-image number here rather than fabricate one. When a measured 4090 benchmark lands via /contribute, it will appear on /check/juggernaut-z/rtx-4090.
- VRAM usage: Derived envelope of ~13-15 GB at BF16, 1024x1024, batch size 1 - based on the cited 12.3 GB on-disk BF16 weights per the HF repo file listing plus typical Z-Image-class text encoder + VAE + latent overhead. The RTX 4090's 24 GB absorbs this comfortably with ~9-11 GB of headroom for higher resolutions (960x1440, 1440x960, or up to 2048x2048 within the Z-Image Base card spec) or small batches. This is a derived envelope - not a measured peak - and will be replaced by live data once a community measurement is submitted at /check/juggernaut-z/rtx-4090.
- Quality notes: Per the HF card, Juggernaut Z is licensed CC BY-NC 4.0 (non-commercial; commercial licensing via juggernaut@rundiffusion.com). Tuned for "stronger lighting, sharper focus, more refined skin texture, and more cinematic atmosphere" relative to Z-Image Base.
For the full benchmark data, see /check/juggernaut-z/rtx-4090.
Troubleshooting
ComfyUI errors out with a missing custom node
The official Juggernaut Z workflow requires the RES4LYF node; install it from ComfyUI Manager -> Custom Nodes Manager, then restart ComfyUI. Documented in the RunDiffusion ComfyUI guide.
DiffusionPipeline raises "Cannot find pipeline class ZImagePipeline"
ZImagePipeline ships in diffusers 0.37.1 and later. Upgrade with pip install -U "diffusers>=0.37.1" per the HF model card requirements. If your environment is pinned to an older release, install from main: pip install git+https://github.com/huggingface/diffusers.
1024x1024 outputs look noisy or grainy
The Juggernaut Z author flags this on the Civitai release notes: use 960x1440 / 1440x960 instead, or apply the documented two-pass schedule (22 steps Res_2s/Beta at denoise 1.00, then 3 steps Res_2s/Normal at denoise 0.15). The non-square presets contain about 32% more pixels than 1024x1024 (1,382,400 vs 1,048,576), an increase that the derived VRAM envelope suggests the 4090's 24 GB can absorb comfortably.
Want to push higher resolution or batch size
The 4090's 24 GB unlocks two axes the 16 GB sibling cards cannot run:
- Higher resolutions - the Z-Image Base card documents the supported range as 512x512 to 2048x2048 (total pixel area, any aspect ratio). On a 4090 you can request 1536x1536 or 1440x1920 directly through pipe(..., height=H, width=W) without offload.
- Small batches - pass num_images_per_prompt=2 in the diffusers call, or replicate the sampler output in ComfyUI, to render two variants per Queue Prompt. Memory scales roughly linearly with batch size for this architecture; batch=2 at 1024x1024 BF16 stays well within the 24 GB envelope.
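Before requesting an unusual size, a quick check against the documented pixel-area range can save a failed run. This sketch encodes only the 512x512 to 2048x2048 total-area bounds quoted above; `pixel_area_ok` and `relative_area` are hypothetical helpers. The relative-area figure is a rough guide to how much more work a resolution implies, not a measured memory prediction.

```python
MIN_PIXELS = 512 * 512      # lower bound quoted from the Z-Image Base card
MAX_PIXELS = 2048 * 2048    # upper bound quoted from the Z-Image Base card


def pixel_area_ok(width: int, height: int) -> bool:
    """True when width*height falls inside the documented total-pixel range."""
    return MIN_PIXELS <= width * height <= MAX_PIXELS


def relative_area(width: int, height: int, base: int = 1024 * 1024) -> float:
    """Pixel count relative to a 1024x1024 image."""
    return width * height / base


for w, h in [(1024, 1024), (960, 1440), (1536, 1536), (1440, 1920), (2048, 2048)]:
    ok = pixel_area_ok(w, h)
    print(f"{w}x{h}: in range={ok}, {relative_area(w, h):.2f}x the pixels of 1024x1024")
```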
If you do measure peak VRAM at these settings, please submit your numbers so /check/juggernaut-z/rtx-4090 can pick them up.
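One way to capture such a number is PyTorch's allocator statistics. The sketch below wraps any generation call and reports the peak allocated memory; `to_gb` and `measure_peak_vram` are hypothetical helpers. It reports what PyTorch's caching allocator handed out on the current device, so tools such as nvidia-smi will typically show a somewhat higher figure that includes the CUDA context and cached blocks.

```python
def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (10**9), matching the sizes quoted above."""
    return n_bytes / 1e9


def measure_peak_vram(run) -> float:
    """Call run() and return PyTorch's peak allocated VRAM in GB on the current device."""
    import torch

    torch.cuda.empty_cache()
    # The peak restarts from what is currently allocated, so loaded weights still count.
    torch.cuda.reset_peak_memory_stats()
    run()
    torch.cuda.synchronize()
    return to_gb(torch.cuda.max_memory_allocated())


# Example, assuming `pipe` was built as in the Path A diffusers snippet:
# peak = measure_peak_vram(
#     lambda: pipe("a cinematic portrait", num_inference_steps=35, guidance_scale=6.0)
# )
# print(f"peak allocated: {peak:.1f} GB")
```

Report the resolution, batch size, and dtype alongside the peak so the measurement can be compared against the derived envelope.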