How much VRAM does Chroma V48 need?

About 16 GB — the minimum this recipe targets.

How hard is this setup?

Intermediate. Follow the installation steps below.

Chroma V48 on RTX 5060 Ti: Uncensored 8.9B Flux.1-Schnell De-Distillation via GGUF in ComfyUI

What You'll Build

A working ComfyUI setup that runs Chroma V48 — the 8.9B-parameter, Apache-2.0, uncensored re-derivation of Flux.1-Schnell published by Lodestone Rock — on an RTX 5060 Ti (16GB). Because the standalone lodestones/Chroma repo is marked deprecated and points downstream users to lodestones/Chroma1-HD ("Chroma1-HD is not the old Chroma-v.50 it has been retrained from v.48"), this recipe uses the Chroma1-HD GGUF redistribution for current installs.

Hardware data: RTX 5060 Ti (16GB VRAM) · Chroma1 family officially targets a 16GB minimum on consumer hardware · See benchmark data

⚠️ Headroom is tight. The Chroma1 family is explicitly resource-intensive: in the lodestones Chroma1-Radiance ComfyUI thread, a user with a 12 GB RTX 5070 reports that the model "barely completes generation at 98–99% VRAM usage", with quality degradation. 16 GB is the comfortable floor for the V48 lineage at standard resolutions; expect to use the Q8_0 GGUF (9.74 GB on disk) or smaller, plus the fp8 T5 XXL text encoder.

Requirements

Component	Minimum	Tested
GPU	16 GB VRAM (per the Chroma1-Radiance ComfyUI thread)	RTX 5060 Ti (16 GB)
RAM	16 GB system	—
Storage	~15 GB (Q8_0 weights at 9.74 GB, plus roughly 5 GB for T5 XXL fp8 and the FLUX VAE)	—
Software	ComfyUI + ComfyUI-GGUF custom node by city96	—

Installation

1. Update ComfyUI

A recent ComfyUI release is required. Native Chroma1-Radiance support landed in ComfyUI v0.3.60, and the same loader stack covers the Chroma1-HD GGUF flow. See the Chroma1-Radiance ComfyUI support thread for the version note.
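To confirm an install is new enough, a version comparison along these lines may help. This is a minimal sketch: `version_ge` is a hypothetical helper name, it assumes a `sort` that supports `-V` (GNU coreutils does), and you supply both version strings yourself, for example from the version line ComfyUI prints at startup and the minimum named in the support thread.

```shell
# version_ge INSTALLED REQUIRED
# Succeeds (exit 0) when INSTALLED is the same as or newer than REQUIRED.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
  # After a version sort the oldest version comes first; if that line is
  # REQUIRED, then INSTALLED is at least as new.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (both variables are placeholders you fill in):
#   version_ge "$installed" "$minimum" || echo "Update ComfyUI before continuing"
```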

2. Install the GGUF custom node

From ComfyUI/custom_nodes:

git clone https://github.com/city96/ComfyUI-GGUF
cd ComfyUI-GGUF
pip install -r requirements.txt

Restart ComfyUI after installation. ComfyUI-GGUF is the loader that consumes the .gguf Chroma1-HD weights.

3. Download the Chroma V48 (Chroma1-HD) GGUF weights

Pick one quantization from the silveroxides/Chroma1-HD-GGUF repository. File sizes per quantization (verbatim from the model card):

Quant	Size
Q4_K_S	5.43 GB
Q4_0	5.43 GB
Q4_K_M	5.57 GB
Q4_1	5.97 GB
Q5_K_S	6.51 GB
Q5_K_M	6.65 GB
Q6_K	7.65 GB
Q8_0	9.74 GB

Recommendation for the 5060 Ti: Q8_0 (9.74 GB) for the highest in-family quality that still leaves headroom for the text encoder, VAE, and intermediate activations on a 16 GB card. Drop to Q4_K_M (5.57 GB) if you want to stack acceleration LoRAs or push past 1024×1024.
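The recommendation above amounts to picking the largest file that fits whatever share of VRAM you are willing to give the diffusion weights. A minimal sketch of that selection follows; `pick_quant` is a hypothetical helper, the sizes are copied from the table above, and the budget you pass in is your own estimate rather than a measured figure.

```shell
# pick_quant BUDGET_GB
# Prints the largest quantization from the table whose file size fits within
# BUDGET_GB. Returns non-zero and prints nothing if none fits.
pick_quant() {
  awk -v budget="$1" '
    BEGIN { best = 0 }
    # Keep the largest entry that still fits; on a tie the first one listed wins.
    ($2 + 0) <= (budget + 0) && ($2 + 0) > best { best = $2 + 0; name = $1 }
    END { if (name == "") exit 1; print name }
  ' <<'EOF'
Q4_K_S 5.43
Q4_0 5.43
Q4_K_M 5.57
Q4_1 5.97
Q5_K_S 6.51
Q5_K_M 6.65
Q6_K 7.65
Q8_0 9.74
EOF
}
```

With a 10 GB budget, `pick_quant 10` prints `Q8_0`. Tightening the budget to 6 GB, for instance to leave room for LoRAs, prints `Q4_1`.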

Drop the downloaded .gguf into ComfyUI/models/diffusion_models/.

4. Download the T5 XXL text encoder and FLUX VAE

The official lodestones/Chroma README pins the same text-encoder and VAE files the FLUX ecosystem uses:

# T5 XXL (fp8 — use this on 16 GB; the fp16 variant doubles the footprint)
wget -P ComfyUI/models/clip/ \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors

# FLUX VAE (ae.safetensors from the FLUX.1 schnell release)
# Place into ComfyUI/models/vae/

URLs verbatim from the lodestones/Chroma README.
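Before opening the workflow, it can be worth checking that everything from steps 2 through 4 ended up where the loaders look for it. The sketch below assumes the directory layout used in this recipe; `check_layout` is a hypothetical helper name, and it accepts any `.gguf` file because the exact filename depends on the quantization you chose.

```shell
# check_layout COMFY_ROOT
# Prints one line per missing item and returns non-zero if anything is missing.
# The body runs in a subshell so its variables do not leak into the caller.
check_layout() (
  root="$1"
  missing=0
  # Custom node cloned in step 2.
  [ -d "$root/custom_nodes/ComfyUI-GGUF" ] ||
    { echo "missing: custom_nodes/ComfyUI-GGUF"; missing=1; }
  # Any .gguf counts; the filename varies by quantization.
  ls "$root"/models/diffusion_models/*.gguf >/dev/null 2>&1 ||
    { echo "missing: models/diffusion_models/*.gguf"; missing=1; }
  # Text encoder and VAE from step 4.
  [ -f "$root/models/clip/t5xxl_fp8_e4m3fn.safetensors" ] ||
    { echo "missing: models/clip/t5xxl_fp8_e4m3fn.safetensors"; missing=1; }
  [ -f "$root/models/vae/ae.safetensors" ] ||
    { echo "missing: models/vae/ae.safetensors"; missing=1; }
  exit "$missing"
)

# Usage: check_layout ./ComfyUI && echo "layout looks complete"
```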

5. Load the Chroma workflow

The official ComfyUI workflow JSON ships in both the Chroma1-HD repo and the deprecated lodestones/Chroma repo (ComfyUI_Chroma1-HD_T2I-workflow.json). Download it, drag it onto the ComfyUI canvas, and swap the default Load Diffusion Model node for the Unet Loader (GGUF) node from ComfyUI-GGUF, pointing it at your downloaded .gguf.

Running

Set the workflow's text-encoder and VAE nodes to the files placed in step 4. Use a 1024×1024 latent for the first run; the Chroma1-Radiance ComfyUI thread notes the family uses ~30 inference steps as the standard ComfyUI template default — start there and adjust to taste.

Trigger: Queue Prompt
Output: PNG saved to ComfyUI/output/

The first generation pays a cold-load cost (weights → VRAM, text encoder → VRAM). Subsequent generations with the same model reuse the loaded weights.
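Because headroom on a 16 GB card is tight, it may be useful to watch VRAM during that first cold load. The sketch below turns `nvidia-smi` query output into a percentage; `vram_percent` is a hypothetical helper name, and it assumes the "used, total" CSV shape produced by the query shown in the usage comment.

```shell
# vram_percent
# Reads "used, total" lines (in MiB) on stdin and prints the integer
# percentage of VRAM in use, one line per GPU. Fractions are truncated.
vram_percent() {
  awk -F', *' 'NF >= 2 && ($2 + 0) > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Usage, run in a second terminal while a generation is in progress:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#     --format=csv,noheader,nounits | vram_percent
```

A reading that sits in the high 90s matches the near-limit situation described for 12 GB cards and suggests stepping down to a smaller quantization.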

Results

Speed: Omitted. The only first-party generation-time data point in the Chroma1-HD speed thread comes from an RTX 5090 (32 GB), which is too far from the 5060 Ti to quote without misleading readers. Once community benchmarks land, the /check/ endpoint will surface them.
VRAM usage: Plan for ≥ 16 GB. The Chroma1 family's own ComfyUI discussions treat 16 GB as the comfortable minimum and report quality degradation when squeezed onto 12 GB cards.
Quality notes: Chroma V48 is a Flux.1-Schnell de-distillation: it restores the multi-step diffusion behavior that Schnell distilled away, so it runs more like a Flux.1-Dev-class model than a 4-step turbo. Don't expect Schnell-tier speed.

For the full benchmark data, see /check/chroma-v48/rtx-5060-ti.

Troubleshooting

Noise artifacts when using `--fp8_e5m2-unet`

Per the Chroma1-Radiance ComfyUI thread, the --fp8_e5m2-unet ComfyUI flag produces noise artifacts on the Chroma1 family. Stick to the default loader (or --fp8_e4m3fn-unet if you need an fp8 path that isn't GGUF).
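If you start ComfyUI from a wrapper script, a small guard can catch the problematic flag before launch. This is only a sketch: `has_e5m2_flag` is a hypothetical helper, and the wrapper shown in the usage comment is an assumption about how you launch ComfyUI.

```shell
# has_e5m2_flag ARGS...
# Succeeds (exit 0) if the argument list contains --fp8_e5m2-unet.
has_e5m2_flag() {
  for arg in "$@"; do
    if [ "$arg" = "--fp8_e5m2-unet" ]; then
      return 0
    fi
  done
  return 1
}

# Usage inside a hypothetical launch wrapper:
#   if has_e5m2_flag "$@"; then
#     echo "warning: --fp8_e5m2-unet is reported to cause noise artifacts" >&2
#   fi
#   python main.py "$@"
```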

Quality regressions from FP8 weights or acceleration LoRAs

Per the same thread, FP8 weight precision and standard acceleration LoRAs visibly hurt prompt adherence and produce undercooked hands and faces on the Chroma1 family. The GGUF Q8_0 path documented above sidesteps both: Q8_0 GGUF output is generally reported as close to bf16 for FLUX-family models, and it doesn't require the model-weight casts that the fp8 path does.

"v48", "Chroma1-HD", "Chroma1-Base", "Chroma1-Radiance" — which one is V48?

Per lodestones/Chroma1-Base's README, "Chroma1-Base is Chroma-v.48". Chroma1-HD is explicitly "retrained from v.48" as a finetune-ready base. The deprecated lodestones/Chroma repo's chroma-unlocked-v48-detail-calibrated.safetensors is the original V48 weight file. Chroma1-Radiance is a separate output-head variant (no FLUX VAE, different decoder) — close cousin, not the same architecture, so its discussion threads are referenced here as adjacent evidence rather than ground truth for V48 specifically.