Sulphur 2 on RTX 5080: Uncensored LTX-2.3 Video via ComfyUI GGUF

video · advanced · 16GB+ VRAM · May 29, 2026

Prerequisites
  • NVIDIA RTX 5080 (16GB VRAM) or any 16GB consumer GPU
  • 32GB+ system RAM (the Gemma 3 12B text encoder is offloaded to CPU)
  • Python 3.10+ and a cu128 PyTorch wheel (Blackwell sm_120 — see Installation)
  • ComfyUI installed (latest version) with ComfyUI-LTXVideo, ComfyUI-GGUF, ComfyUI-KJNodes custom nodes
  • ~25GB free disk space for the Q4-tier GGUF + Gemma 3 12B QAT encoder + LTX VAE

What You'll Build

Generate uncensored text-to-video and image-to-video clips locally with Sulphur 2 — a 21B-param LTX-2.3 fine-tune from SulphurAI — on an RTX 5080 (16GB). The model card describes it as "An uncensored video generation model based on LTX 2.3 supporting both t2v and i2v natively". The upstream sulphur_dev_fp8mixed.safetensors weighs 27.16 GiB (upstream tree) — far too large to fit on 16GB VRAM, let alone alongside the Gemma 3 12B text encoder. This recipe uses the community Q4_K_S GGUF (12.29 GiB / 13.20 GB-decimal) from vantagewithai/Sulphur-2-Base-GGUF together with a quantized Gemma 3 12B QAT encoder.

Hardware data: RTX 5080 (16GB VRAM) · Q4_K_S GGUF + Gemma 3 12B QAT-Q4 encoder · See benchmark data

⚠️ Tight on 16 GB. The upstream sulphur_dev_bf16.safetensors (42.97 GiB) and sulphur_dev_fp8mixed.safetensors (27.16 GiB) shipped on SulphurAI/Sulphur-2-base do not fit in 16GB VRAM. The GGUF path below is the only one that runs on this card, and even then the encoder + DiT + VAE stack lives within 1–2 GiB of the ceiling — start with low frame counts and resolution, then scale up carefully.

Requirements

| Component | Minimum | Tested |
| --- | --- | --- |
| GPU | 16GB VRAM (Blackwell sm_120 or any 16GB card) | RTX 5080 (16GB) |
| RAM | 32GB | 32GB |
| Storage | ~25GB | Q4_K_S 12.29 GiB + Gemma 3 QAT encoder 6.92 GiB + LTX VAE 2.28 GiB (Kijai tree API) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.10+, cu128 PyTorch wheel |

Sulphur 2 inherits the LTX-2.3 architecture (architecture: ltxv per the vantagewithai GGUF card) and the same Gemma 3 12B text-encoder requirement that ships with LTX-2.3. On 16GB cards, the quantized GGUF path is the only one that fits. The RTX 5080 is Blackwell sm_120: its FP8 tensor cores run native FP8 matmul at hardware speed, but the FlashAttention-2 sm_120 kernel gap still applies — see Installation for the cu128 wheel mandate.

Installation

1. Install ComfyUI and the LTX-Video custom nodes

The RTX 5080 is a Blackwell sm_120 card. Stock pip install torch ships the cu128 build (with sm_120 kernels) by default as of mid-2026; if you pin a CUDA index explicitly, use the cu128 channel — do not fall back to cu126/cu121, which lack sm_120 kernels and will fail at first inference.

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate

# Blackwell sm_120 needs the cu128 PyTorch wheel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

cd custom_nodes
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt

git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt

git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt

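The canonical Sulphur-2 workflow uses LTXV-prefixed nodes (LTXVConcatAVLatent, LTXVCropGuides, LTXVPreprocess) together with SamplerCustomAdvanced, which is a core ComfyUI sampler node rather than part of a custom node pack. The node names were confirmed by inspecting workflows/ltx23_t2v distilled.json on the upstream repo; an up-to-date ComfyUI plus the ComfyUI-LTXVideo pack installed above covers them.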
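Before downloading any weights, it can be worth confirming that the virtual environment really picked up a CUDA 12.8 build. The sketch below is one way to do that and is not part of the upstream instructions: cuda_at_least_12_8 and check_torch_blackwell are hypothetical helper names, and the 12.8 threshold simply mirrors the cu128 requirement stated above.

```shell
# Hypothetical helper: succeeds when a dotted CUDA version such as "12.8"
# is at least 12.8 (the cu128 channel this recipe requires).
cuda_at_least_12_8() {
  awk -v v="$1" 'BEGIN {
    split(v, p, "[.]")
    major = p[1] + 0; minor = p[2] + 0   # non-numeric input (e.g. "None") becomes 0
    exit !(major > 12 || (major == 12 && minor >= 8))
  }'
}

# Hypothetical helper: prints what the installed PyTorch build reports.
# Run it inside the activated venv on the machine with the GPU.
check_torch_blackwell() {
  python - <<'EOF'
import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
if torch.cuda.is_available():
    # A Blackwell sm_120 card should report capability (12, 0).
    print("device:", torch.cuda.get_device_name(0))
    print("capability:", torch.cuda.get_device_capability(0))
    print("compiled archs:", torch.cuda.get_arch_list())
else:
    print("CUDA is not available to this PyTorch build")
EOF
}

# Usage (not run automatically):
#   check_torch_blackwell
#   cuda_at_least_12_8 "12.8" && echo "CUDA build is new enough"
```

If the reported CUDA build is older than 12.8, or the compiled arch list has no sm_120 entry, reinstalling from the cu128 index is the likely fix.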

2. Download the Q4_K_S Sulphur-2 GGUF

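# Run from the directory that contains the ComfyUI checkout (step 1 ends inside ComfyUI/custom_nodes)
# Q4_K_S: 12.29 GiB (13.20 GB-decimal), the sweet spot for 16GB VRAM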
huggingface-cli download vantagewithai/Sulphur-2-Base-GGUF \
  sulphur_dev-Q4_K_S.gguf \
  --local-dir ComfyUI/models/unet/
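A truncated download is an easy way to get confusing loader errors later, so a quick size check after the transfer can help. This is a minimal sketch: bytes_to_gib and file_size_gib are hypothetical helper names, and the only reference value is the 12.29 GiB figure quoted in this recipe.

```shell
# Hypothetical helpers for a post-download size check.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

file_size_gib() {
  # wc -c reports the byte count; tr strips the padding some platforms add.
  bytes_to_gib "$(wc -c < "$1" | tr -d ' ')"
}

# Usage (not run automatically); a complete Q4_K_S download should come out
# at roughly the 12.29 GiB quoted above:
#   file_size_gib ComfyUI/models/unet/sulphur_dev-Q4_K_S.gguf
```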

Quant-tier file-size reference (exact GiB from the HF tree API; GB-decimal matches the vantagewithai card per-tier table — 21B params, architecture: ltxv):

| Quant | File size (GiB) | GB-decimal | Fits 16GB GPU? |
| --- | --- | --- | --- |
| Q3_K_S | 9.63 | 10.34 | yes (headroom for encoder + activations) |
| Q3_K_M | 10.37 | 11.13 | yes (more headroom than Q4_K_S) |
| Q4_0 | 12.09 | 12.98 | yes |
| Q4_K_S | 12.29 | 13.20 | yes (recommended) |
| Q4_1 | 12.95 | 13.90 | yes (tight) |
| Q4_K_M | 13.31 | 14.30 | tight (possible only at low frames / resolution) |
| Q5_K_S | 14.01 | 15.04 | no (no room for encoder + activations) |
| Q5_K_M | 15.03 | 16.14 | no |
| Q6_K | 16.55 | 17.77 | no (weights alone exceed VRAM) |
| Q8_0 | 21.19 | 22.76 | no |
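The two size columns differ only by unit: 1 GiB is 1,073,741,824 bytes and 1 GB is 1,000,000,000 bytes. The sketch below recomputes the GB-decimal figure from the GiB figure for a few tiers. gib_to_gb is a hypothetical helper name, and because it starts from the rounded GiB value rather than the exact byte count, a result can differ from the table by 0.01.

```shell
# Hypothetical helper: convert binary GiB to decimal GB, two decimals.
gib_to_gb() {
  awk -v g="$1" 'BEGIN { printf "%.2f\n", g * 1073741824 / 1000000000 }'
}

# Recompute the GB-decimal column for a few tiers from the table.
for gib in 9.63 12.29 16.55; do
  printf '%s GiB = %s GB\n' "$gib" "$(gib_to_gb "$gib")"
done
```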

The 16GB ceiling is anchored on the closest published consumer-16GB LTX-family measurement: a ComfyUI user on a 16GB card running the architecturally related LTX-2 19B distilled stack reported a peak of 14926 MiB (~14.6 GiB) during sampling, then hit OOM at the upscale step (Comfy-Org/ComfyUI#11726). Sulphur-2's 21B distilled weights at Q4_K_S (12.29 GiB) are similar in size to the LTX-2 distilled weights in that report, so the peak across DiT + encoder + VAE on a 16GB card should land in the 13.5–15 GiB band, leaving only 1–2 GiB of headroom against the 16 GiB ceiling. Q4_K_M (13.31 GiB) is feasible only at very low frame counts; Q5 and above will OOM.
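The headroom reasoning is plain arithmetic on the reported MiB peaks, and it can be reproduced as follows. This is a sketch under one stated assumption: the card is treated as exactly 16 GiB, whereas nvidia-smi usually reports slightly less usable memory. mib_to_gib and headroom_gib are hypothetical helper names.

```shell
# Hypothetical helper: convert a MiB reading (as printed by nvidia-smi) to GiB.
mib_to_gib() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 1024 }'
}

# Hypothetical helper: GiB left under the ceiling for a given MiB peak.
# The second argument is the ceiling in GiB (assumed 16 when omitted).
headroom_gib() {
  awk -v m="$1" -v c="${2:-16}" 'BEGIN { printf "%.2f\n", c - m / 1024 }'
}

# Peaks cited in this recipe: #11726 (sampling) and #303 (unquantized encoder).
for peak in 14926 29068; do
  printf '%s MiB = %s GiB, headroom %s GiB\n' \
    "$peak" "$(mib_to_gib "$peak")" "$(headroom_gib "$peak")"
done
```

A negative headroom means the stack cannot fit without offloading, which is the situation the unquantized encoder report describes.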

3. Download the quantized Gemma 3 12B text encoder

Sulphur 2 inherits LTX-2.3's Gemma 3 12B text encoder. The full unquantized Gemma 3 12B will OOM on 16GB cards when loaded alongside the Sulphur 2 weights — this exact failure has been reported on the RTX 5080 16GB: peak 29068 MiB with the LTX-2 19B-dev-fp8 stack and an unquantized Gemma 3 12B encoder (Lightricks/ComfyUI-LTXVideo#303, GPU: RTX 5080 (16GB VRAM)). Use the QAT-Q4 GGUF instead:

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
  --local-dir ComfyUI/models/text_encoders/

huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
  mmproj-BF16.gguf \
  --local-dir ComfyUI/models/text_encoders/

The QAT-Q4 encoder is 6.92 GiB and the mmproj projector is 0.80 GiB (unsloth tree API); both are loaded by ComfyUI-GGUF's Gemma encoder node.

4. Download the LTX video VAE (Kijai community mirror)

Sulphur 2 reuses the upstream LTX video VAE — SulphurAI/Sulphur-2-base does not expose the VAE as a standalone file (it ships sulphur_dev_bf16, sulphur_dev_fp8mixed, sulphur_distil_bf16, the rank-768 LoRA, the prompt enhancer, and workflows). The simplest path for the GGUF-only flow is the community mirror by Kijai, which exposes a standalone bf16 VAE — architecture: ltxv is shared across the LTX family:

huggingface-cli download Kijai/LTXV2_comfy \
  VAE/LTX2_video_vae_bf16.safetensors \
  --local-dir ComfyUI/models/vae/

The video VAE is 2.28 GiB; file listing confirmed at Kijai/LTXV2_comfy.
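With the GGUF, the encoder pair, and the VAE downloaded, a short check that every file sits where ComfyUI will look for it can save a failed first launch. The sketch below assumes the download commands above were run unchanged and that huggingface-cli kept the repo-relative VAE/ folder under --local-dir, which is its usual behaviour. missing_model_files is a hypothetical helper name.

```shell
# Hypothetical helper: print every expected model file that is absent.
# The argument is the ComfyUI root (defaults to ./ComfyUI).
missing_model_files() {
  root="${1:-ComfyUI}"
  for rel in \
    models/unet/sulphur_dev-Q4_K_S.gguf \
    models/text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
    models/text_encoders/mmproj-BF16.gguf \
    models/vae/VAE/LTX2_video_vae_bf16.safetensors
  do
    # Report the relative path only when the file is not there.
    [ -f "$root/$rel" ] || echo "$rel"
  done
}

# Usage (not run automatically); no output means nothing is missing:
#   missing_model_files ComfyUI
```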

5. Download the canonical Sulphur-2 workflow JSON

The canonical Sulphur 2 ComfyUI workflow lives on the upstream SulphurAI repo:

huggingface-cli download SulphurAI/Sulphur-2-base \
  "workflows/ltx23_t2v distilled.json" \
  --local-dir ComfyUI/user/default/workflows/

The upstream README's quick-start recommends downloading a dev version (fp8mixed or bf16) plus the distill LoRA. If you loaded the GGUF in step 2, you do NOT need the LoRA or the full weights — the distill is already baked into the GGUF. The README explicitly warns against mixing the two: "I'm aware the workflows contain sulphur_final right now, just use the lora or use the full models, don't use both at the same time."

Running

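Launch ComfyUI from inside the ComfyUI directory with the virtual environment active. The --listen flag makes the UI reachable from other machines on the network; omit it for local-only use: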

python main.py --listen

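Open the browser UI, then load the workflow downloaded in step 5. Because huggingface-cli keeps the repo-relative path under --local-dir, the file lands one level deeper than the target directory:

ComfyUI/user/default/workflows/workflows/ltx23_t2v distilled.json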

In the loaded graph, swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF (point it at sulphur_dev-Q4_K_S.gguf), and point the text encoder at the GGUF Gemma 3 loader from the same custom node pack. The canonical workflow defaults are tuned for high-VRAM cards; on a 16GB card, start from the recommended values below:

| Parameter | Canonical default | Recommended on 16GB | Source |
| --- | --- | --- | --- |
| Frame count | 18 (the LTXVPreprocess widget value in the shipped workflow) | start at 65 or fewer | LTXVPreprocess widget in ltx23_t2v distilled.json |
| Resolution (longer edge) | 1536 px | 832 px for the first run | ResizeImagesByLongerEdge widget in the same file |

Once the workflow loads cleanly at 832 px / 65 frames, scale up only while peak VRAM stays comfortably below 16 GiB in nvidia-smi.
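Reading the peak off a live nvidia-smi display is error-prone, so logging the readings and taking the maximum afterwards is usually easier. The sketch below uses nvidia-smi's standard query options to print memory.used in MiB once per second. watch_vram and peak_mib are hypothetical helper names, and vram.log is an arbitrary file name.

```shell
# Hypothetical helper: stream used VRAM in MiB, one reading per second.
watch_vram() {
  nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1
}

# Hypothetical helper: read one MiB value per line on stdin, print the largest.
peak_mib() {
  awk '{ v = $1 + 0; if (v > max) max = v } END { print max + 0 }'
}

# Usage (not run automatically):
#   watch_vram | tee vram.log     # start before queueing the prompt, Ctrl-C when done
#   peak_mib < vram.log           # highest reading seen during the run
```

On a multi-GPU machine the stream interleaves one line per GPU, so the result is the highest reading across all of them.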

Optional: prompt enhancer

The upstream Sulphur 2 ships a Q8_0 prompt enhancer intended for LM Studio. Per the SulphurAI README, the two files live under prompt_enhancer/ on the repo:

huggingface-cli download SulphurAI/Sulphur-2-base \
  prompt_enhancer/sulphur_prompt_enhancer_model-q8_0.gguf \
  prompt_enhancer/mmproj-BF16.gguf \
  --local-dir ComfyUI/models/prompt_enhancer/

To use it: inside your LM Studio model folder, create a folder named Sulphur, then promptenhancer inside that, and drop both files in; load the model from LM Studio's UI. Per the README, "There is no system prompt for it, just send the text (and an image) you'd like to be enhanced."

Results

  • Speed: Omitted — no published Sulphur-2 benchmark on an RTX 5080 (or any 16GB card) at the time of writing, and the closest 16GB LTX-family datapoints (#11726, #303) report VRAM, not wall-time. The RTX 5080's memory bandwidth (~960 GB/s) is roughly 2.1× the RTX 5060 Ti's (~448 GB/s) and its FP16 compute roughly 2.3× higher, so this pair will generate materially faster than its 16GB Blackwell sibling — but quoting a number would mean extrapolating across that bandwidth gap, which we don't do. Empirical RTX 5080 wall-time will land at /check/sulphur-2/rtx-5080 once a benchmark is contributed via /contribute.
  • VRAM usage: The closest cited consumer-16GB peak from the LTX family is 14926 MiB (~14.6 GiB) during sampling, reported by a ComfyUI user running the LTX-2 19B distilled stack on a 16GB card (Comfy-Org/ComfyUI#11726). Sulphur-2 at Q4_K_S (12.29 GiB weights) is similar in size, so plan on a runtime peak in the 13.5–15 GiB band with the QAT-Q4 Gemma encoder: within the 16 GiB ceiling but with very little headroom. With the unquantized Gemma 3 12B encoder instead, the reported peak on this exact card is 29068 MiB (Lightricks/ComfyUI-LTXVideo#303, LTX-2 19B-dev-fp8 stack on an RTX 5080 16GB), which is why the QAT encoder in step 3 is mandatory.
  • Quality notes: The Sulphur 2 GGUF is a quantization of the distilled checkpoint, so expect the same short-step distilled sampling profile as LTX-2.3 distilled. Q3 tiers and below show noticeable quality regression; Q4_K_S is the recommended balance.

For up-to-date benchmark data on this pair, see /check/sulphur-2/rtx-5080.

Troubleshooting

"Can I run this at full bf16 or fp8mixed on 16 GB?" — No

The upstream sulphur_dev_bf16.safetensors is 42.97 GiB and sulphur_dev_fp8mixed.safetensors is 27.16 GiB (upstream tree). Either file alone exceeds 16 GiB by a wide margin before the encoder, VAE, and activations enter VRAM, and the RTX 5080's native FP8 tensor cores do not change that: FP8 compute is a speed and efficiency win, not a memory escape hatch, and the fp8mixed file is still 27 GiB on disk and in VRAM. The Q4_K_S GGUF in step 2 is the only path that runs on this card. The fp8mixed weights are also larger than a 24GB card's VRAM; for the native fp8mixed flow on a 32GB card, see the RTX 5090 recipe.

OOM when loading the text encoder

Same root cause as documented upstream — the default unquantized Gemma 3 12B encoder OOMs on 16GB cards when loaded alongside the Sulphur 2 weights (Lightricks/ComfyUI-LTXVideo#303 reports peak 29068 MiB on RTX 5080 (16GB VRAM) with the LTX-2 19B-dev-fp8 pipeline). Replace it with gemma-3-12b-it-qat-UD-Q4_K_XL.gguf from Unsloth (step 3 above), and enable CPU offload for the Gemma encoder via the KJNodes model-offload nodes — keep the encoder unloaded from VRAM while the DiT is sampling.

flash_attention_2 crash on first inference (Blackwell sm_120)

The RTX 5080 is Blackwell sm_120. FlashAttention-2 wheels still do not ship sm_120 kernels as of mid-2026 (Dao-AILab/flash-attention#2168 tracks the RTX 50-series CUDA error), so any node or snippet that hardcodes attn_implementation="flash_attention_2" will crash at the first attention call. ComfyUI's LTXV nodes default to SDPA and are unaffected, but if you wire in a custom node that forces FA2, switch it to sdpa (or eager). Pinning the cu128 PyTorch wheel (Installation step 1) is also required — older CUDA channels lack sm_120 kernels entirely.
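To find out whether any installed custom node hardcodes FlashAttention-2, a recursive text search over the custom_nodes directory is usually enough. This is a minimal sketch: find_forced_fa2 is a hypothetical helper name, and a plain string match will also flag comments or optional code paths, so treat hits as leads to inspect, not as proof.

```shell
# Hypothetical helper: list Python lines that mention flash_attention_2.
# The argument is the directory to scan (defaults to ComfyUI/custom_nodes).
find_forced_fa2() {
  grep -rn --include='*.py' 'flash_attention_2' "${1:-ComfyUI/custom_nodes}"
}

# Usage (not run automatically); no output means no match was found:
#   find_forced_fa2 ComfyUI/custom_nodes
```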

"sulphur_final" referenced in the workflow but missing locally

The upstream workflow JSON contains a sulphur_final checkpoint reference that does not exist as a published file. Per the SulphurAI README: "I'm aware the workflows contain sulphur_final right now, just use the lora or use the full models, don't use both at the same time." If you used the GGUF in step 2, point the loader at sulphur_dev-Q4_K_S.gguf instead and delete or bypass any LoRA node — the distill is already baked into the GGUF weights.
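To see how many loader entries still point at sulphur_final before opening the graph, counting the occurrences in the workflow file is a quick first step. The sketch below is a plain text count, not a JSON-aware parse. count_refs is a hypothetical helper name, and the path assumes the download location used in step 5.

```shell
# Hypothetical helper: count occurrences of a string in a file.
# Arguments: file path, then the text to look for.
count_refs() {
  grep -o -- "$2" "$1" | wc -l | tr -d ' '
}

# Usage (not run automatically); note the quoted path, since the file name contains a space:
#   count_refs "ComfyUI/user/default/workflows/workflows/ltx23_t2v distilled.json" sulphur_final
```

A count of zero after editing and re-saving the workflow suggests every reference has been repointed.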

Gemma GGUF loader fails or outputs gibberish

The Gemma 3 GGUF loader in ComfyUI-GGUF needed PRs #399 and #402 merged for the LTX-2 family path (Sulphur-2 inherits this via the LTX-2.3 lineage it builds on); pull the latest city96/ComfyUI-GGUF main — both PRs are now merged (Kijai/LTXV2_comfy discussion #7).

"Can I run this on 8GB VRAM?" — No, not realistically

A community walkthrough specifically about Sulphur-2 deployment (knightli.com) addresses 8GB directly: "but it is not realistic to expect high-resolution, long-video, complex workflows on 8GB." The Q3_K_S GGUF (9.63 GiB) leaves no room for the encoder + activations on an 8GB card, and aggressive offloading destroys throughput. 16GB is the practical floor — which is exactly what this recipe targets.

Slow generation

Keep the Gemma encoder offloaded with the KJNodes model-offload nodes; VRAM thrashing on a 16GB card kills wall time even on the 5080's fast GDDR7. Drop frame counts and resolution further if you observe steady-state VRAM oscillation in nvidia-smi. Empirical RTX 5080 wall-time numbers will appear at /check/sulphur-2/rtx-5080 once contributed.