How much VRAM does Juggernaut Z need?

About 16 GB at BF16, the minimum this recipe targets; the RTX 4090 used here has 24 GB.

How hard is this setup?

Intermediate: follow the steps below.

Juggernaut Z on RTX 4090: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI

What You'll Build

A local install of Juggernaut Z V1 - Team Juggernaut / KandooAI's cinematic, photoreal fine-tune of Tongyi-MAI's Z-Image Base, released through RunDiffusion. This recipe covers two paths on an RTX 4090 (Ada Lovelace, sm_89, 24 GB VRAM): a Python script via HuggingFace diffusers, and a ComfyUI workflow using the official RunDiffusion node graph. With 24 GB of VRAM, the BF16 weights fit with substantial headroom - enabling higher native resolutions (960x1440 / 1440x960 are the author-recommended portrait/landscape presets) and num_images_per_prompt >= 2 batches that 16 GB-tier cards cannot absorb. Juggernaut Z is tuned for stronger lighting, sharper focus, refined skin texture, and a more cinematic atmosphere than the upstream Base.

Hardware data: RTX 4090 (24 GB VRAM, 1008 GB/s memory bandwidth, Ada sm_89) - runs at BF16 (12.3 GB single-file checkpoint per the HF repo file listing) with ample headroom for higher-resolution presets and small batches - See benchmark data

Warning - License: CC BY-NC 4.0 (non-commercial). Per the HF model card, Juggernaut Z is licensed for non-commercial use only. Commercial licensing is via juggernaut@rundiffusion.com. The Civitai release page lists Apache 2.0 in error; where the two pages conflict, the canonical HF model card is the source of truth.

Note - Not Z-Image Turbo, not Juggernaut X / Reborn / vanilla Z-Image Base. Juggernaut Z is built on Z-Image Base (not the distilled Turbo) and is a distinct RunDiffusion fine-tune from prior Juggernaut releases (Juggernaut X / Reborn target SDXL; vanilla Z-Image Base is the unfinetuned Tongyi-MAI upstream). The defaults differ accordingly: the HF model card recommends 35 steps at guidance scale 6 (valid ranges: 25-45 steps, CFG 6-9), not the 8-NFE / CFG 0.0 pattern used by the Z-Image Turbo on RTX 4090 recipe.

Requirements

Component	Minimum	Tested
GPU	16 GB VRAM consumer card (bf16/fp16); ~8 GB with FP8 or GGUF Q4-Q5	RTX 4090 (24 GB)
RAM	16 GB system RAM (32 GB recommended for batched generation)	-
Storage	~13 GB for BF16 / FP16 weights; ~6 GB for FP8; ~5 GB for Q4_K_S GGUF	-
Software	Python 3.10+, PyTorch with CUDA + bf16 support, `diffusers` >= 0.37.1	ComfyUI with RES4LFY node / `diffusers` >= 0.37.1

The 16 GB minimum is anchored on the BF16 weights themselves: the Juggernaut-Z-Image repo file listing ships the canonical BF16 single-file checkpoint at 12.3 GB on disk (the FP16 variant is also 12.3 GB; the Diffusers component layout in transformer/, text_encoder/, vae/ resolves the same weight tensors). With 24 GB on an RTX 4090, the BF16 weights leave ~11.7 GB of headroom for the text encoder, VAE, latents, and activations - enough to comfortably run the author-recommended 960x1440 / 1440x960 presets and num_images_per_prompt=2 batches that 16 GB cards cannot. The repo also ships an FP8 e4m3fn safetensors variant (6.15 GB on disk) and a full GGUF ladder (Q4_K_S 4.83 GB through Q8_0 7.34 GB) for memory-constrained setups, but on a 4090 there is no reason to drop precision below BF16.
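The headroom figures above are plain subtraction, and a short sketch makes it easy to redo them for other variants or cards. The sizes are the on-disk numbers quoted in this recipe; the helper name `headroom_gb` is illustrative, and the result ignores the text encoder, VAE, latents, and activations, so it is an upper bound on free memory rather than a prediction.

```python
# On-disk weight sizes in GB, as quoted in this recipe from the HF repo file listing.
WEIGHT_SIZES_GB = {
    "bf16": 12.3,
    "fp8_e4m3fn": 6.15,
    "gguf_q8_0": 7.34,
    "gguf_q4_k_s": 4.83,
}


def headroom_gb(total_vram_gb: float, weights_gb: float) -> float:
    """VRAM left once the weights are resident; ignores all runtime overhead."""
    return round(total_vram_gb - weights_gb, 2)


for name, size in WEIGHT_SIZES_GB.items():
    print(
        f"{name:>12}: 24 GB card -> {headroom_gb(24, size):5.2f} GB free, "
        f"16 GB card -> {headroom_gb(16, size):5.2f} GB free"
    )
```

At BF16 this leaves 11.7 GB on a 24 GB card against 3.7 GB on a 16 GB card, which is why the larger presets and small batches are described here as 4090 territory.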

This is a derived runtime envelope based on the cited on-disk weight sizes, not a measured peak - no community benchmark for Juggernaut Z on an RTX 4090 has been published as of writing. Live measurements (when they land via /contribute) will appear at /check/juggernaut-z/rtx-4090.

Unlike Blackwell-class cards (sm_120), the RTX 4090 is Ada Lovelace (sm_89), which the default pip install torch wheels support out of the box, including PyTorch's built-in attention backends. No cu128-specific wheel selection is required. FlashAttention-2 and xformers are optional, separately installed packages; the install commands in this recipe do not pull them in.
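To confirm which architecture your environment actually sees, a small probe along these lines may help. It is a sketch: `describe_capability` is a hypothetical helper that only knows the two architecture names mentioned in this recipe, and the probe prints a fallback message if PyTorch or a CUDA device is missing.

```python
def describe_capability(major: int, minor: int) -> str:
    """Map a CUDA compute capability to the architecture names used in this recipe."""
    if (major, minor) == (8, 9):
        return "Ada Lovelace (sm_89)"
    if major == 12:
        return f"Blackwell-class (sm_{major}{minor})"
    return f"other (sm_{major}{minor})"


try:
    import torch
except ImportError:  # keep the sketch usable where PyTorch is not installed
    torch = None

if torch is not None and torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), "->", describe_capability(major, minor))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
else:
    print("PyTorch with a visible CUDA device was not found.")
```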

Installation

Path A - HuggingFace diffusers (Python script)

Per the Juggernaut-Z-Image model card, Juggernaut Z loads through the standard DiffusionPipeline once diffusers is recent enough to know about ZImagePipeline:

pip install -U "diffusers>=0.37.1" transformers accelerate safetensors
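After installing, it can be worth checking that the environment really picked up a new enough diffusers. The sketch below uses only the standard library; `parse_version` and `is_recent_enough` are illustrative helpers that compare leading integers only, so pre-release suffixes such as `.dev0` or `rc1` are ignored rather than ordered properly.

```python
from importlib import metadata

MIN_DIFFUSERS = (0, 37, 1)  # minimum version quoted in this recipe


def parse_version(text: str) -> tuple:
    """Turn '0.37.1' or '0.38.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def is_recent_enough(installed: str, minimum: tuple = MIN_DIFFUSERS) -> bool:
    """True if the installed version is at or above the minimum."""
    return parse_version(installed) >= minimum


try:
    installed = metadata.version("diffusers")
    status = "OK" if is_recent_enough(installed) else "too old, upgrade"
    print(f"diffusers {installed}: {status}")
except metadata.PackageNotFoundError:
    print("diffusers is not installed in this environment")
```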

Path B - ComfyUI (RunDiffusion workflow)

The official RunDiffusion ComfyUI guide ships an IMG-JuggernautZ-Txt2Img.json workflow that expects the RES4LFY custom node. Install order:

# 1. Open ComfyUI Manager -> Custom Nodes Manager -> install "RES4LFY", then restart ComfyUI.

# 2. Download the Juggernaut Z BF16 checkpoint to ComfyUI/models/checkpoints/
#    On a 24 GB RTX 4090 there is no reason to drop below BF16. URLs are from the
#    official RunDiffusion repo: https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/tree/main

wget -P ComfyUI/models/checkpoints/ \
  https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/resolve/main/Juggernaut_Z_V1_by_RunDiffusion.safetensors

Load the IMG-JuggernautZ-Txt2Img.json workflow into ComfyUI by dragging the file onto the canvas (download from the RunDiffusion guide linked above).
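A truncated download is a common cause of checkpoint load failures, so a quick size check before launching ComfyUI can save a debugging round. This is a sketch under two assumptions: the path matches the wget command above when run from the directory containing ComfyUI/, and the 12.3 GB figure is compared with a loose 10% tolerance because it is unclear whether the listing uses decimal GB or GiB. A size check does not replace a checksum.

```python
from pathlib import Path

EXPECTED_GB = 12.3  # BF16 size quoted from the HF repo file listing
CHECKPOINT = Path(
    "ComfyUI/models/checkpoints/Juggernaut_Z_V1_by_RunDiffusion.safetensors"
)


def size_looks_complete(
    size_bytes: int, expected_gb: float = EXPECTED_GB, tolerance: float = 0.10
) -> bool:
    """True if the file size is within the tolerance of the expected size (1 GB = 1e9 bytes)."""
    return abs(size_bytes / 1e9 - expected_gb) <= expected_gb * tolerance


if CHECKPOINT.is_file():
    size = CHECKPOINT.stat().st_size
    verdict = "looks complete" if size_looks_complete(size) else "size mismatch, re-download"
    print(f"{CHECKPOINT.name}: {size / 1e9:.2f} GB, {verdict}")
else:
    print(f"{CHECKPOINT} not found; run the wget step from the directory containing ComfyUI/")
```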

Running

Path A - diffusers snippet

The inference snippet below is verbatim from the Juggernaut-Z-Image HF model card, with two 4090-specific knobs surfaced as comments:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-Z-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a cinematic portrait, dramatic lighting",
    guidance_scale=6.0,         # default per HF card; valid range 6-9
    num_inference_steps=35,     # default per HF card; valid range 25-45
    # On a 4090 24 GB you can also pass height=1440, width=960 for the
    # author-recommended portrait preset, or num_images_per_prompt=2 for batched runs.
).images[0]
image.save("output.png")

The HF model card lists the default sampler settings as guidance_scale=6 (valid range 6-9) and num_inference_steps=35 (valid range 25-45).
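When experimenting, it is easy to drift outside the card's ranges without noticing. The sketch below wraps the settings in a hypothetical helper, `build_call_kwargs`, that rejects out-of-range values and returns keyword arguments for the `pipe(...)` call above. The ranges are the ones quoted from the HF card; the helper itself is not part of diffusers, and the defaults pick the portrait preset mentioned in the snippet's comments.

```python
STEP_RANGE = (25, 45)        # valid num_inference_steps range per the HF card
GUIDANCE_RANGE = (6.0, 9.0)  # valid guidance_scale range per the HF card


def build_call_kwargs(steps=35, guidance=6.0, height=1440, width=960, images=1):
    """Hypothetical helper: validate settings and return kwargs for pipe(...)."""
    if not STEP_RANGE[0] <= steps <= STEP_RANGE[1]:
        raise ValueError(f"steps={steps} is outside the card's range {STEP_RANGE}")
    if not GUIDANCE_RANGE[0] <= guidance <= GUIDANCE_RANGE[1]:
        raise ValueError(f"guidance={guidance} is outside the card's range {GUIDANCE_RANGE}")
    if images < 1:
        raise ValueError("images must be at least 1")
    return {
        "num_inference_steps": steps,
        "guidance_scale": guidance,
        "height": height,
        "width": width,
        "num_images_per_prompt": images,
    }


kwargs = build_call_kwargs(images=2)
print(kwargs)
# With the pipeline from the snippet above already loaded:
# images = pipe("a cinematic portrait, dramatic lighting", **kwargs).images
```

Passing Turbo-style settings such as `steps=8` raises a `ValueError`, which is the point: the two models do not share defaults.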

Path B - ComfyUI

After loading the official workflow JSON, edit the prompt node and hit Queue Prompt. The Civitai release page for Juggernaut Z v1.0 documents an alternative two-pass setup that the model author tunes for sharpness:

First pass: sampler Res_2s, scheduler Beta, 22 steps, denoise 1.00
Second pass: sampler Res_2s, scheduler Normal, 3 steps, denoise 0.15
Recommended resolutions: 960x1440 (portrait) or 1440x960 (landscape) - the Civitai notes call out that 1024x1024 "will sometimes look too grainy/noisy" with this fine-tune

The 4090's 24 GB makes both presets straightforward at BF16; you can also bump to a small batch (num_images_per_prompt=2 in diffusers, or duplicate the sampler output node in the ComfyUI graph) without hitting the activation and VAE-decode memory pressure that pushes a 16 GB sibling card to its limit.

Results

Speed: No community benchmark for Juggernaut Z on RTX 4090 has been published as of writing. The closely-related Z-Image Turbo (same Single-Stream Diffusion Transformer (DiT) architecture, distilled to 8 steps) was independently measured at ~2.3 s for a 1024x1024 image on an RTX 4090 per the release-day reporting cited in the Z-Image Turbo on RTX 4090 recipe, but that figure is not transferable to Juggernaut Z: Juggernaut Z runs the un-distilled Base path at 35 steps (vs Turbo's 8), so its per-image wall-clock will be substantially longer at the same resolution. We deliberately omit a specific seconds-per-image number here rather than fabricate one. When a measured 4090 benchmark lands via /contribute, it will appear on /check/juggernaut-z/rtx-4090.
VRAM usage: Derived envelope of ~13-15 GB at BF16, 1024x1024, batch size 1 - based on the cited 12.3 GB on-disk BF16 weights per the HF repo file listing plus typical Z-Image-class text encoder + VAE + latent overhead. The RTX 4090's 24 GB absorbs this comfortably with ~9-11 GB of headroom for higher resolutions (960x1440, 1440x960, or up to 2048x2048 within the Z-Image Base card spec) or small batches. This is a derived envelope - not a measured peak - and will be replaced by live data once a community measurement is submitted at /check/juggernaut-z/rtx-4090.
Quality notes: Per the HF card, Juggernaut Z is licensed CC BY-NC 4.0 (non-commercial; commercial licensing via juggernaut@rundiffusion.com). Tuned for "stronger lighting, sharper focus, more refined skin texture, and more cinematic atmosphere" relative to Z-Image Base.

For the full benchmark data, see /check/juggernaut-z/rtx-4090.

Troubleshooting

ComfyUI errors out with a missing custom node

The official Juggernaut Z workflow requires the RES4LFY node; install it from ComfyUI Manager -> Custom Nodes, then restart ComfyUI. Documented in the RunDiffusion ComfyUI guide.

`DiffusionPipeline` raises "Cannot find pipeline class ZImagePipeline"

ZImagePipeline ships in diffusers 0.37.1 and later. Upgrade with pip install -U "diffusers>=0.37.1" per the HF model card requirements. If your environment is pinned to an older release, install from main: pip install git+https://github.com/huggingface/diffusers.

1024x1024 outputs look noisy or grainy

The Juggernaut Z author flags this in the Civitai release notes: use 960x1440 / 1440x960 instead, or apply the documented two-pass schedule (22 steps Res_2s/Beta at denoise 1.00, then 3 steps Res_2s/Normal at denoise 0.15). Both presets render about a third more pixels than 1024x1024, an increase the 4090's 24 GB should absorb without offload.
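The cost of switching presets can be sized up from pixel counts alone. The sketch below compares each preset with the 1024-square default; it assumes latent and activation memory track pixel area, which is a rough rule of thumb and not a measurement for this model.

```python
PRESETS = {
    "square 1024x1024": (1024, 1024),
    "portrait 960x1440": (960, 1440),
    "landscape 1440x960": (1440, 960),
}


def pixel_ratio(width: int, height: int, base=(1024, 1024)) -> float:
    """Pixel area of (width, height) relative to the base resolution."""
    return (width * height) / (base[0] * base[1])


for name, (w, h) in PRESETS.items():
    print(f"{name}: {w * h:,} px, {pixel_ratio(w, h):.2f}x the 1024-square area")
```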

Want to push higher resolution or batch size

The 4090's 24 GB unlocks two axes the 16 GB sibling cards cannot run:

Higher resolutions - the Z-Image Base card documents the supported range as 512x512 to 2048x2048 (total pixel area, any aspect ratio). On a 4090 you should be able to request 1536x1536 or 1440x1920 directly through pipe(..., height=H, width=W) without offload; this has not been measured here, so watch peak VRAM on the first run.
Small batches - pass num_images_per_prompt=2 in the diffusers call, or replicate the sampler output in ComfyUI, to render two variants per Queue Prompt. Weight memory stays fixed while latent and activation memory grow roughly linearly with batch size, so by the derived envelope in Results, batch=2 at 1024x1024 BF16 should stay within 24 GB.
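Because the documented range is expressed as total pixel area, a candidate resolution can be screened before spending a 35-step run on it. This is a minimal sketch: `within_documented_area` is an illustrative helper that checks only the area bounds quoted above, not any divisibility constraint the pipeline may impose on height and width.

```python
MIN_AREA = 512 * 512      # lower bound of the documented pixel-area range
MAX_AREA = 2048 * 2048    # upper bound of the documented pixel-area range


def within_documented_area(width: int, height: int) -> bool:
    """True if width x height falls inside the documented total-pixel-area range."""
    return MIN_AREA <= width * height <= MAX_AREA


for w, h in [(1024, 1024), (1536, 1536), (1440, 1920), (2048, 2048), (2560, 2048)]:
    verdict = "inside" if within_documented_area(w, h) else "outside"
    print(f"{w}x{h}: {w * h:,} px, {verdict} the documented range")
```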

If you do measure peak VRAM at these settings, please submit your numbers so /check/juggernaut-z/rtx-4090 can pick them up.
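One way to capture such a number is PyTorch's own peak-allocation counter, sketched below. The `format_peak` helper is illustrative. Note that `torch.cuda.max_memory_allocated()` reports only memory handed out by PyTorch's allocator, so tools such as nvidia-smi will typically show a somewhat higher figure.

```python
def format_peak(peak_bytes: int, total_bytes: int) -> str:
    """Render a peak-allocation reading as GiB plus a share of total VRAM."""
    gib = 1024 ** 3
    share = 100 * peak_bytes / total_bytes
    return f"peak {peak_bytes / gib:.2f} GiB of {total_bytes / gib:.2f} GiB ({share:.0f}%)"


try:
    import torch
except ImportError:  # keep the sketch usable where PyTorch is not installed
    torch = None

if torch is not None and torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    # Run the pipe(...) call from the Running section here, at the resolution
    # and batch size you want to report.
    peak = torch.cuda.max_memory_allocated()
    total = torch.cuda.get_device_properties(0).total_memory
    print(format_peak(peak, total))
else:
    print("No CUDA device visible; nothing to measure.")
```

Reporting the resolution, batch size, step count, and diffusers version alongside the peak makes the number comparable with other submissions.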