
Juggernaut Z on RTX 4080: Cinematic Photoreal Fine-Tune of Z-Image Base at BF16 via Diffusers or ComfyUI

image · intermediate · 16GB+ VRAM · May 29, 2026
prerequisites
  • NVIDIA RTX 4080 (16GB VRAM) or any consumer GPU with 16GB VRAM and bf16 support
  • Python 3.10+
  • PyTorch with CUDA support and bfloat16 capability
  • ComfyUI (with the RES4LYF custom node) OR HuggingFace diffusers ≥ 0.37.1

What You'll Build

A local install of Juggernaut Z V1 — Team Juggernaut / KandooAI's photoreal fine-tune of Tongyi-MAI's Z-Image Base, released through RunDiffusion. The recipe covers two paths: a Python script via HuggingFace diffusers, and a ComfyUI workflow using the official RunDiffusion node graph. Per the HF model card, Juggernaut Z is tuned for "stronger lighting, sharper focus, more refined skin texture, and more cinematic atmosphere" relative to the upstream Base.

Hardware data: RTX 4080 (16GB VRAM) · BF16 / FP8 / GGUF variants available · See benchmark data

⚠️ License: CC BY-NC 4.0 (non-commercial). Per the HF model card, Juggernaut Z is licensed for non-commercial use only. Commercial licensing is via juggernaut@rundiffusion.com. The Civitai release page lists Apache 2.0 in error — the HF canonical card is the source of truth.

Not Z-Image Turbo. Juggernaut Z is built on Z-Image Base (not the distilled Turbo). That means a different step/CFG profile — Juggernaut Z's default is 35 steps at guidance scale 6 per the HF model card, not the 8-NFE / low-CFG pattern of the distilled Turbo. Use the settings below.

Requirements

Component | Minimum | Tested
GPU | 16GB VRAM consumer card (bf16/fp16); ~8GB with FP8 or GGUF Q4–Q5 | RTX 4080 (16GB)
RAM | 16GB system RAM | -
Storage | ~12.3GB for bf16 / fp16 weights; ~6.2GB for fp8; ~4.8GB for Q4_K_S GGUF | -
Software | Python 3.10+, PyTorch with CUDA + bf16 support, diffusers ≥ 0.37.1 | ComfyUI with RES4LYF node / diffusers ≥ 0.37.1

The headline 16 GB tier is anchored on the BF16 weights themselves: the Juggernaut-Z-Image repo file listing ships the bf16 checkpoint at 12.31 GB on disk, leaving about 3.7 GB of headroom on a 16 GB card for the activations, VAE, and latents. The same repo also ships an FP8 e4m3fn safetensors variant (6.15 GB) and GGUF quantizations (Q4_K_S at 4.83 GB through Q8_0 at 7.34 GB) for tighter VRAM budgets. As context, the upstream Tongyi-MAI Z-Image-Turbo model card describes Z-Image-Turbo specifically as fitting comfortably within "16G VRAM consumer devices". Juggernaut Z is a fine-tune of Z-Image Base, not Turbo, but it shares the Z-Image "Single-Stream Diffusion Transformer" architecture, so the 16 GB tier framing applies to both.
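The headroom arithmetic above can be reproduced for every variant with a few lines of Python. This is only a sketch: it treats the on-disk file size as a stand-in for resident weight memory and takes the card's capacity as a flat 16 GB, both of which are simplifying assumptions, and the function and dictionary names are illustrative.

```python
# File sizes in GB, as quoted above from the Juggernaut-Z-Image repo listing.
VARIANT_SIZES_GB = {
    "bf16": 12.31,
    "fp8_e4m3fn": 6.15,
    "gguf_q8_0": 7.34,
    "gguf_q4_k_s": 4.83,
}


def headroom_gb(vram_gb: float, weights_gb: float) -> float:
    """VRAM left after the weights, ignoring activations, VAE, and latents."""
    return round(vram_gb - weights_gb, 2)


def headroom_table(vram_gb: float = 16.0) -> dict[str, float]:
    """Headroom per variant for a card with `vram_gb` of memory."""
    return {name: headroom_gb(vram_gb, size) for name, size in VARIANT_SIZES_GB.items()}
```

Calling headroom_table() gives the space left for everything other than the weights, which is the budget that activations, the VAE, and latents have to fit into.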

The RTX 4080 is Ada Lovelace (AD103, sm_89). Its 4th-generation tensor cores support E4M3 / E5M2 FP8 natively, so the FP8 e4m3fn checkpoint can run without a dequantization step. Unlike Blackwell-class cards (sm_120), the RTX 4080 needs no special wheel selection: the default pip install torch wheel already includes sm_89 kernels, and FlashAttention-2, xformers, and the standard attention backends all support sm_89. No cu128-specific index URL is required.
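To confirm what the local card reports, a small check like the following may help. It is a sketch rather than an official tool: describe_capability and detect_local_gpu are hypothetical helper names, and the thresholds encode the general rule that bf16 arrives with sm_80 and FP8 tensor cores with sm_89. The PyTorch calls used (torch.cuda.get_device_capability, torch.cuda.get_device_name, torch.cuda.is_bf16_supported) are standard.

```python
def describe_capability(major: int, minor: int) -> dict:
    """Map a CUDA compute capability to the features discussed above."""
    cc = (major, minor)
    return {
        "sm": f"sm_{major}{minor}",
        "bf16": cc >= (8, 0),              # Ampere and newer
        "fp8_tensor_cores": cc >= (8, 9),  # Ada (sm_89), Hopper (sm_90), and newer
    }


def detect_local_gpu(index: int = 0) -> dict:
    """Query PyTorch for the local GPU; requires torch and a visible CUDA device."""
    import torch  # imported lazily so describe_capability works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device visible to PyTorch")
    major, minor = torch.cuda.get_device_capability(index)
    info = describe_capability(major, minor)
    info["name"] = torch.cuda.get_device_name(index)
    info["torch_bf16"] = torch.cuda.is_bf16_supported()
    return info
```

On an RTX 4080, detect_local_gpu() should report sm_89 with both flags set, matching the description above.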

Installation

Path A — HuggingFace diffusers (Python script)

Per the Juggernaut-Z-Image model card, Juggernaut Z loads through the standard DiffusionPipeline once diffusers is recent enough to know about ZImagePipeline — the card states it requires a version of diffusers that includes ZImagePipeline support, verified against diffusers 0.37.1 and 0.38.0:

pip install -U "diffusers>=0.37.1" transformers accelerate safetensors

Path B — ComfyUI (RunDiffusion workflow)

The official RunDiffusion ComfyUI guide ships an IMG-JuggernautZ-Txt2Img.json workflow that expects the RES4LYF custom node. Install order:

# 1. Open ComfyUI Manager → Custom Nodes Manager → install "RES4LYF", then restart ComfyUI.

# 2. Download a Juggernaut Z checkpoint to ComfyUI/models/checkpoints/
#    Pick ONE based on your VRAM budget. URLs from the official RunDiffusion repo:
#    https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/tree/main

# bf16 (12.31 GB on disk — fits 16GB VRAM with room to spare):
wget -P ComfyUI/models/checkpoints/ \
  https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/resolve/main/Juggernaut_Z_V1_by_RunDiffusion.safetensors

# fp8 e4m3fn (6.15 GB on disk — lower footprint, runs natively on Ada FP8 tensor cores):
wget -P ComfyUI/models/checkpoints/ \
  https://huggingface.co/RunDiffusion/Juggernaut-Z-Image/resolve/main/Juggernaut_Z_V1_FP8_e4m3fn.safetensors
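If wget is unavailable (for example on Windows), the same files can be fetched from Python. The sketch below assumes the huggingface_hub package is installed and uses its hf_hub_download function. The repo ID and filenames are the ones in the wget commands above, while FILES, resolve_url, and download are illustrative names.

```python
REPO_ID = "RunDiffusion/Juggernaut-Z-Image"

# Filenames taken from the wget commands above.
FILES = {
    "bf16": "Juggernaut_Z_V1_by_RunDiffusion.safetensors",
    "fp8": "Juggernaut_Z_V1_FP8_e4m3fn.safetensors",
}


def resolve_url(variant: str) -> str:
    """Direct-download URL for a variant, identical to the wget targets."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{FILES[variant]}"


def download(variant: str, target_dir: str = "ComfyUI/models/checkpoints") -> str:
    """Download one checkpoint into target_dir and return its local path."""
    from huggingface_hub import hf_hub_download  # lazy import: only needed here

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=FILES[variant],
        local_dir=target_dir,
    )
```

Calling download("fp8") pulls only the FP8 file, so the 12.31 GB bf16 checkpoint is not downloaded when it will not be used.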

Load the IMG-JuggernautZ-Txt2Img.json workflow into ComfyUI by dragging the file onto the canvas (download from the RunDiffusion guide linked above).

Running

Path A — diffusers snippet

The inference snippet below is verbatim from the Juggernaut-Z-Image HF model card:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-Z-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a cinematic portrait, dramatic lighting",
    guidance_scale=6.0,
    num_inference_steps=35,
).images[0]
image.save("output.png")

The HF model card lists the default sampler settings as guidance_scale=6 (valid range 6–9) and num_inference_steps=35 (valid range 25–45). Note that from_pretrained only downloads the files declared in model_index.json, so it will not pull the standalone .safetensors / .gguf variants at the repo root.
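When experimenting away from the defaults, it can be useful to reject values outside the ranges quoted above before starting a long generation. The following is a minimal sketch. SamplerSettings is a hypothetical helper, not part of diffusers, and the bounds are simply the ranges stated above.

```python
from dataclasses import dataclass

# Ranges as quoted above from the HF model card.
STEP_RANGE = (25, 45)
GUIDANCE_RANGE = (6.0, 9.0)


@dataclass(frozen=True)
class SamplerSettings:
    num_inference_steps: int = 35
    guidance_scale: float = 6.0

    def __post_init__(self):
        lo, hi = STEP_RANGE
        if not lo <= self.num_inference_steps <= hi:
            raise ValueError(f"num_inference_steps must be within {lo}-{hi}")
        lo_g, hi_g = GUIDANCE_RANGE
        if not lo_g <= self.guidance_scale <= hi_g:
            raise ValueError(f"guidance_scale must be within {lo_g}-{hi_g}")

    def as_kwargs(self) -> dict:
        """Keyword arguments in the shape the pipeline call above expects."""
        return {
            "num_inference_steps": self.num_inference_steps,
            "guidance_scale": self.guidance_scale,
        }
```

With the pipe object from the snippet above, a call would look like pipe(prompt, **SamplerSettings(num_inference_steps=40, guidance_scale=7.0).as_kwargs()). A Turbo-style setting such as 8 steps raises ValueError instead of silently producing a poor image.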

Path B — ComfyUI

After loading the official workflow JSON, edit the prompt node and hit Queue Prompt. The Civitai release page for Juggernaut Z v1.0 documents an alternative two-pass setup that the model author tunes for sharpness:

  • First pass: sampler Res_2s, scheduler Beta, 22 steps, denoise 1.00
  • Second pass: sampler Res_2s, scheduler Normal, 3 steps, denoise 0.15
  • Recommended resolutions: 960×1440 (portrait) or 1440×960 (landscape) — the Civitai notes call out that 1024×1024 "will sometimes look too grainy/noisy" with this fine-tune
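For scripting or note-keeping, the two-pass schedule and resolutions above can be written down as plain data. This is an illustrative sketch only: SamplerPass and its field names are hypothetical and do not correspond to ComfyUI node inputs, while the values are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerPass:
    sampler: str
    scheduler: str
    steps: int
    denoise: float


# Values from the Civitai two-pass setup listed above.
TWO_PASS = (
    SamplerPass("Res_2s", "Beta", 22, 1.00),
    SamplerPass("Res_2s", "Normal", 3, 0.15),
)

RESOLUTIONS = {"portrait": (960, 1440), "landscape": (1440, 960)}


def total_steps(passes) -> int:
    """Sum of sampler steps across all passes."""
    return sum(p.steps for p in passes)


def pixel_count(width: int, height: int) -> int:
    """Number of pixels in an image of the given size."""
    return width * height
```

The schedule adds up to 25 sampler steps, and either recommended resolution has 1,382,400 pixels, roughly 32% more than the 1,048,576 pixels of 1024×1024, so per-image cost is expected to be higher than at 1024×1024.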

Results

  • Speed: No community benchmark on RTX 4080 is published yet, and our backend has no ingested measurement for this pair (/check/juggernaut-z/rtx-4080 currently reports verdict: unknown). Generation time on a Z-Image-class DiT is dominated by memory bandwidth at these step counts; the RTX 4080 has ~716.8 GB/s of bandwidth, so per-step times will sit between the slower Ada cards and the RTX 4090 — but no first-party RTX 4080 figure exists to quote, so we omit a number rather than extrapolate one. If you run it, please submit your numbers.
  • VRAM usage: The bf16 Juggernaut Z checkpoint is 12.31 GB on disk per the HF repo listing, which leaves about 3.7 GB of the RTX 4080's 16 GB for activations, VAE, and latents. Live measurements, once contributed, appear at /check/juggernaut-z/rtx-4080.
  • Quality notes: Per the HF card, Juggernaut Z is tuned for "stronger lighting, sharper focus, more refined skin texture, and more cinematic atmosphere" relative to Z-Image Base. Outputs fall under the CC BY-NC 4.0 license (non-commercial; commercial licensing via juggernaut@rundiffusion.com).

For the full benchmark data, see /check/juggernaut-z/rtx-4080.

Troubleshooting

ComfyUI errors out with a missing custom node

The official Juggernaut Z workflow requires the RES4LYF node; install it from ComfyUI Manager → Custom Nodes Manager, then restart ComfyUI. Documented in the RunDiffusion ComfyUI guide.

DiffusionPipeline raises "Cannot find pipeline class ZImagePipeline"

ZImagePipeline ships in diffusers 0.37.1 and later (the HF card verified against 0.37.1 and 0.38.0). Upgrade with pip install -U "diffusers>=0.37.1" per the HF model card requirements. If your environment is pinned to an older release, install from main: pip install git+https://github.com/huggingface/diffusers.

1024×1024 outputs look noisy or grainy

The Juggernaut Z author flags this on the Civitai release notes: use 960×1440 / 1440×960 instead, or apply the documented two-pass schedule (22 steps Res_2s/Beta at denoise 1.00, then 3 steps Res_2s/Normal at denoise 0.15).

You don't need quantization on a 16 GB RTX 4080 — but it's there if you want headroom

The bf16 build (12.31 GB on disk) fits the RTX 4080's 16 GB with about 3.7 GB to spare, so quantization is optional rather than required on this card. If you want extra headroom for higher resolutions or to colocate another model, download the FP8 e4m3fn safetensors (6.15 GB; runs natively on the 4080's FP8 tensor cores) or one of the GGUF Q4–Q5 quantizations (4.83–5.68 GB) from the HF repo instead of the bf16 build. GGUF requires a GGUF-aware loader node in ComfyUI.