What You'll Build
Generate uncensored text-to-video and image-to-video clips locally with Sulphur 2 — a 21B-param LTX-2.3 fine-tune from SulphurAI — on an RTX 5070 Ti (16GB). The model card describes it as "An uncensored video generation model based on LTX 2.3 supporting both t2v and i2v natively". The upstream sulphur_dev_fp8mixed.safetensors weighs 27.16 GiB (upstream tree) — far too large to fit on 16GB VRAM, let alone alongside the Gemma 3 12B text encoder. This recipe uses the community Q4_K_S GGUF (12.29 GiB / 13.20 GB-decimal) from vantagewithai/Sulphur-2-Base-GGUF together with a quantized Gemma 3 12B QAT encoder.
Hardware data: RTX 5070 Ti (16GB VRAM) · Q4_K_S GGUF + Gemma 3 12B QAT-Q4 encoder · See benchmark data
⚠️ Tight on 16 GB. The upstream sulphur_dev_bf16.safetensors (42.97 GiB) and sulphur_dev_fp8mixed.safetensors (27.16 GiB) shipped on SulphurAI/Sulphur-2-base do not fit in 16GB VRAM. The GGUF path below is the only one that runs on this card, and even then the encoder + DiT + VAE stack lives within 1–2 GiB of the ceiling, so start with low frame counts and resolution, then scale up carefully.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 16GB VRAM (Blackwell sm_120 or any 16GB card) | RTX 5070 Ti (16GB GDDR7, 256-bit, ~896 GB/s) |
| RAM | 32GB | 32GB |
| Storage | ~25GB | Q4_K_S 12.29 GiB + Gemma 3 QAT encoder 6.92 GiB + LTX VAE 2.28 GiB (Kijai tree API) |
| Software | ComfyUI + ComfyUI-LTXVideo + ComfyUI-GGUF + KJNodes | Python 3.10+, cu128 PyTorch wheel |
Sulphur 2 inherits the LTX-2.3 architecture (architecture: ltxv per the vantagewithai GGUF card) and the same Gemma 3 12B text-encoder requirement that ships with LTX-2.3. On 16GB cards, the quantized GGUF path is the only one that fits. The RTX 5070 Ti is Blackwell GB203 sm_120 (8960 CUDA cores, 16 GB GDDR7, PCIe Gen5 x16, 300 W): its FP8 tensor cores run native FP8 matmul at hardware speed, but the FlashAttention-2 sm_120 kernel gap still applies (see Troubleshooting), and the card needs the cu128 PyTorch wheel (see Installation).
Installation
1. Install ComfyUI and the LTX-Video custom nodes
The RTX 5070 Ti is a Blackwell GB203 sm_120 card. Stock pip install torch ships the cu128 build (with sm_120 kernels) by default as of mid-2026; if you pin a CUDA index explicitly, use the cu128 channel — do not fall back to cu126/cu121, which lack sm_120 kernels and will fail at first inference.
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
# Blackwell sm_120 needs the cu128 PyTorch wheel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
cd custom_nodes
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git
pip install -r ComfyUI-LTXVideo/requirements.txt
git clone https://github.com/city96/ComfyUI-GGUF.git
pip install -r ComfyUI-GGUF/requirements.txt
git clone https://github.com/kijai/ComfyUI-KJNodes.git
pip install -r ComfyUI-KJNodes/requirements.txt
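Before going further, it can help to confirm that the installed PyTorch wheel was actually built with sm_120 kernels. The following is a minimal sketch, not part of the upstream instructions: `wheel_supports` and `check_local_gpu` are hypothetical helper names, and the check only compares the GPU's compute capability against the arch list PyTorch reports.

```python
from typing import Iterable, Tuple


def wheel_supports(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if the wheel's compiled arch list names the GPU's capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in set(arch_list)


def check_local_gpu() -> str:
    """Describe whether the installed PyTorch build targets the local GPU."""
    try:
        import torch  # imported lazily so wheel_supports() works without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but no CUDA device is visible"
    capability = torch.cuda.get_device_capability(0)  # (12, 0) on sm_120 cards
    ok = wheel_supports(capability, torch.cuda.get_arch_list())
    return f"capability={capability} supported_by_wheel={ok}"
```

Run `print(check_local_gpu())` inside the activated virtual environment; a `supported_by_wheel=False` result on this card suggests the wheel came from an older CUDA channel.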
The canonical Sulphur-2 workflow uses LTXV-prefixed nodes (LTXVConcatAVLatent, LTXVCropGuides, LTXVPreprocess) provided by ComfyUI-LTXVideo, alongside SamplerCustomAdvanced, which is a core ComfyUI node. This was confirmed by inspecting workflows/ltx23_t2v distilled.json on the upstream repo.
2. Download the Q4_K_S Sulphur-2 GGUF
# Run from the directory that contains ComfyUI/ (step 1 left you in ComfyUI/custom_nodes)
# Q4_K_S: 12.29 GiB (13.20 GB-decimal), the sweet spot for 16GB VRAM
huggingface-cli download vantagewithai/Sulphur-2-Base-GGUF \
sulphur_dev-Q4_K_S.gguf \
--local-dir ComfyUI/models/unet/
Quant-tier file-size reference (exact GiB from the HF tree API; GB-decimal matches the vantagewithai card per-tier table — 21B params, architecture: ltxv):
| Quant | File size (GiB) | GB-decimal | Fits 16GB GPU? |
|---|---|---|---|
| Q3_K_S | 9.63 | 10.34 | yes (headroom for encoder + activations) |
| Q3_K_M | 10.37 | 11.13 | yes (more headroom than Q4_K_S) |
| Q4_0 | 12.09 | 12.98 | yes |
| Q4_K_S | 12.29 | 13.20 | yes — recommended |
| Q4_1 | 12.95 | 13.90 | yes — tight |
| Q4_K_M | 13.31 | 14.30 | tight — possible only at low frames / resolution |
| Q5_K_S | 14.01 | 15.04 | no (no room for encoder + activations) |
| Q5_K_M | 15.03 | 16.14 | no |
| Q6_K | 16.55 | 17.77 | no (weights alone exceed VRAM) |
| Q8_0 | 21.19 | 22.76 | no |
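The two size columns differ only by unit: GiB uses powers of two, GB-decimal uses powers of ten. A short sketch of the conversion follows; `gib_to_gb` is a hypothetical helper name. Because it starts from the already-rounded GiB figures, a recomputed value can differ from the table (which comes from exact byte counts) by 0.01 in the last digit.

```python
GIB = 2**30  # bytes in one gibibyte
GB = 10**9   # bytes in one decimal gigabyte


def gib_to_gb(gib: float) -> float:
    """Convert a size in GiB to decimal GB, rounded to two places."""
    return round(gib * GIB / GB, 2)


for quant, size_gib in [("Q3_K_S", 9.63), ("Q4_K_S", 12.29), ("Q6_K", 16.55)]:
    print(f"{quant}: {size_gib} GiB = {gib_to_gb(size_gib)} GB")
```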
The 16GB ceiling is anchored on the closest published consumer-16GB LTX-family measurement: a 16GB ComfyUI user running the architecturally-related LTX-2 19B distilled stack reported a peak of 14926 MiB (~14.6 GiB) during sampling, then OOM'd at the upscale step (Comfy-Org/ComfyUI#11726). Sulphur-2's 21B distilled weights at Q4_K_S (12.29 GiB) are a similar size to the LTX-2 distilled weights in that report; the peak across DiT + encoder + VAE on a 16GB card lives in the 13.5–15 GiB band, leaving only 1–2 GiB of headroom against the 16 GB ceiling. Q4_K_M (13.31 GiB) is feasible only at very low frame counts; Q5 and above will OOM.
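The headroom arithmetic above can be reproduced in a few lines. This is a sketch under one loud assumption: it treats the card's ceiling as exactly 16 GiB, as the text does, although the usable figure reported by the driver is typically a little lower. `mib_to_gib` and `headroom_gib` are hypothetical helper names.

```python
MIB_PER_GIB = 1024


def mib_to_gib(mib: float) -> float:
    """Convert MiB (the unit in the cited reports) to GiB."""
    return mib / MIB_PER_GIB


def headroom_gib(peak_mib: float, ceiling_gib: float = 16.0) -> float:
    """Remaining VRAM in GiB; a negative result means the peak does not fit."""
    return ceiling_gib - mib_to_gib(peak_mib)


print(round(mib_to_gib(14926), 1))    # sampling peak cited from ComfyUI#11726
print(round(headroom_gib(14926), 2))  # what is left under a 16 GiB ceiling
print(headroom_gib(29068) < 0)        # unquantized-encoder peak from ComfyUI-LTXVideo#303
```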
3. Download the quantized Gemma 3 12B text encoder
Sulphur 2 inherits LTX-2.3's Gemma 3 12B text encoder. The full unquantized Gemma 3 12B will OOM on 16GB cards when loaded alongside the Sulphur 2 weights — this exact failure has been reported on a 16GB Blackwell card: peak 29068 MiB with the LTX-2 19B-dev-fp8 stack and an unquantized Gemma 3 12B encoder (Lightricks/ComfyUI-LTXVideo#303, measured on RTX 5080 (16GB VRAM) — the same 16 GB GDDR7 / Blackwell envelope as the 5070 Ti). Use the QAT-Q4 GGUF instead:
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
gemma-3-12b-it-qat-UD-Q4_K_XL.gguf \
--local-dir ComfyUI/models/text_encoders/
huggingface-cli download unsloth/gemma-3-12b-it-qat-GGUF \
mmproj-BF16.gguf \
--local-dir ComfyUI/models/text_encoders/
The QAT-Q4 encoder is 6.92 GiB and the mmproj projector is 0.80 GiB (unsloth tree API); both are loaded by ComfyUI-GGUF's Gemma encoder node.
4. Download the LTX video VAE (Kijai community mirror)
Sulphur 2 reuses the upstream LTX video VAE — SulphurAI/Sulphur-2-base does not expose the VAE as a standalone file (it ships sulphur_dev_bf16, sulphur_dev_fp8mixed, sulphur_distil_bf16, the rank-768 LoRA, the prompt enhancer, and workflows). The simplest path for the GGUF-only flow is the community mirror by Kijai, which exposes a standalone bf16 VAE — architecture: ltxv is shared across the LTX family:
huggingface-cli download Kijai/LTXV2_comfy \
VAE/LTX2_video_vae_bf16.safetensors \
--local-dir ComfyUI/models/vae/
The video VAE is 2.28 GiB; file listing confirmed at Kijai/LTXV2_comfy.
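As a sanity check on the ~25GB storage figure in Requirements, the download sizes quoted in steps 2 to 4 can be summed. This sketch covers only those four files; the ComfyUI checkout, the Python environment, and the optional prompt enhancer are not included.

```python
GIB = 2**30  # bytes in one gibibyte
GB = 10**9   # bytes in one decimal gigabyte

# Sizes in GiB as quoted in this recipe
DOWNLOADS_GIB = {
    "sulphur_dev-Q4_K_S.gguf": 12.29,
    "gemma-3-12b-it-qat-UD-Q4_K_XL.gguf": 6.92,
    "mmproj-BF16.gguf": 0.80,
    "LTX2_video_vae_bf16.safetensors": 2.28,
}

total_gib = round(sum(DOWNLOADS_GIB.values()), 2)
total_gb = round(total_gib * GIB / GB, 2)
print(f"model files: {total_gib} GiB = {total_gb} GB")
```

The total lands just under 24 GB-decimal, which is consistent with the ~25GB minimum once the remaining files are added.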
5. Download the canonical Sulphur-2 workflow JSON
The canonical Sulphur 2 ComfyUI workflow lives on the upstream SulphurAI repo:
huggingface-cli download SulphurAI/Sulphur-2-base \
"workflows/ltx23_t2v distilled.json" \
--local-dir ComfyUI/user/default/workflows/
The upstream README's quick-start recommends downloading a dev version (fp8mixed or bf16) plus the distill LoRA. If you loaded the GGUF in step 2, you do NOT need the LoRA or the full weights — the distill is already baked into the GGUF. The README explicitly warns against mixing the two: "I'm aware the workflows contain sulphur_final right now, just use the lora or use the full models, don't use both at the same time".
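With all downloads done, a quick script can confirm that each file landed somewhere under the folder it was aimed at. This is a minimal sketch: `missing_files` is a hypothetical helper, and it searches recursively because huggingface-cli keeps the repo's subfolders (for example VAE/ or workflows/) under `--local-dir`.

```python
from pathlib import Path
from typing import List

# (folder under the ComfyUI root, file name) pairs taken from steps 2 to 5
EXPECTED = [
    ("models/unet", "sulphur_dev-Q4_K_S.gguf"),
    ("models/text_encoders", "gemma-3-12b-it-qat-UD-Q4_K_XL.gguf"),
    ("models/text_encoders", "mmproj-BF16.gguf"),
    ("models/vae", "LTX2_video_vae_bf16.safetensors"),
    ("user/default/workflows", "ltx23_t2v distilled.json"),
]


def missing_files(comfy_root: Path) -> List[str]:
    """Return 'folder/name' for every expected file not found under its folder."""
    missing = []
    for subdir, name in EXPECTED:
        base = comfy_root / subdir
        # rglob also matches files nested in repo subfolders such as VAE/
        if not base.is_dir() or next(base.rglob(name), None) is None:
            missing.append(f"{subdir}/{name}")
    return missing
```

Calling `missing_files(Path("ComfyUI"))` from the parent directory returns an empty list when everything is in place.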
Running
Launch ComfyUI from inside the ComfyUI directory, with the virtual environment from step 1 active:
python main.py --listen
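Open the browser UI, then load the workflow downloaded in step 5. Note that huggingface-cli keeps the repo's workflows/ subfolder under --local-dir, so the file lands at:
ComfyUI/user/default/workflows/workflows/ltx23_t2v distilled.json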
In the loaded graph, swap the default UNet loader for the Unet Loader (GGUF) node from ComfyUI-GGUF (point it at sulphur_dev-Q4_K_S.gguf), and point the text encoder at the GGUF Gemma 3 loader from the same custom node pack. The canonical workflow defaults are tuned for high-VRAM cards — drop them on a 16GB card:
| Parameter | Canonical default | Recommended on 16GB | Source |
|---|---|---|---|
| Frame count | 18 (the LTXVPreprocess widget value in the shipped workflow) | start at 65 max | LTXVPreprocess widget in ltx23_t2v distilled.json |
| Resolution (longer edge) | 1536 px | drop to 832 px for first run | ResizeImagesByLongerEdge widget in the same file |
Once the workflow loads cleanly at 832 px / 65 frames, scale up only while peak VRAM stays comfortably below 16 GiB in nvidia-smi.
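Watching nvidia-smi by hand works, but a small poller makes the peak easier to catch. The sketch below shells out to nvidia-smi's CSV query mode; `parse_memory_csv` and `query_vram_mib` are hypothetical helper names, and the output format is assumed to be one `used, total` line per GPU in MiB.

```python
import subprocess
from typing import List, Tuple


def parse_memory_csv(output: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB, no header, no units) into integer pairs."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_vram_mib() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for current and total VRAM of every visible GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_csv(output)
```

Calling `query_vram_mib()` in a loop with a short sleep while a generation runs, and keeping the largest `used` value, gives the peak to compare against the card's total.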
Optional: prompt enhancer
The upstream Sulphur 2 ships a Q8_0 prompt enhancer intended for LM Studio. Per the SulphurAI README, the two files live under prompt_enhancer/ on the repo:
huggingface-cli download SulphurAI/Sulphur-2-base \
prompt_enhancer/sulphur_prompt_enhancer_model-q8_0.gguf \
prompt_enhancer/mmproj-BF16.gguf \
--local-dir ComfyUI/models/prompt_enhancer/
To use it: inside your LM Studio model folder, create a folder named Sulphur, then promptenhancer inside that, and drop both files in; load the model from LM Studio's UI. Per the README, "There is no system prompt for it, just send the text (and an image) you'd like to be enhanced".
Results
- Speed: Omitted — there is no published Sulphur-2 benchmark on an RTX 5070 Ti (or any 16GB card) at the time of writing, and the closest 16GB LTX-family datapoints (#11726, #303) report VRAM, not wall-time. We do not extrapolate wall-time from a different card across the compute/bandwidth gap, so no speed number is quoted here. Empirical RTX 5070 Ti wall-time will land at /check/sulphur-2/rtx-5070-ti once a benchmark is contributed via /contribute.
- VRAM usage: The closest cited consumer-16GB peak from the LTX family is 14926 MiB (~14.6 GiB) during sampling, reported by a ComfyUI user on a 16GB card running the LTX-2 19B distilled stack (Comfy-Org/ComfyUI#11726). Sulphur-2 at Q4_K_S (12.29 GiB weights) is similar in size, so plan on a runtime peak in the 13.5–15 GiB band with the QAT-Q4 Gemma encoder: within the 16 GiB ceiling but with very little headroom. If you load the unquantized Gemma 3 12B encoder instead, the peak jumps to 29068 MiB on a 16GB Blackwell card (Lightricks/ComfyUI-LTXVideo#303, measured on RTX 5080 16GB, the same 16 GB Blackwell envelope as the 5070 Ti), which is why the QAT encoder in step 3 is mandatory.
- Quality notes: The Sulphur 2 GGUF is a quantization of the distilled checkpoint, so expect the same short-step distilled sampling profile as LTX-2.3 distilled. Q3 tier and below shows noticeable quality regression; Q4_K_S is the recommended balance.
For up-to-date benchmark data on this pair, see /check/sulphur-2/rtx-5070-ti.
Troubleshooting
"Can I run this at full bf16 or fp8mixed on 16 GB?" — No
The upstream sulphur_dev_bf16.safetensors is 42.97 GiB and sulphur_dev_fp8mixed.safetensors is 27.16 GiB (upstream tree). Each file alone exceeds 16 GiB by a wide margin before the encoder, VAE, and activations enter VRAM, and the RTX 5070 Ti's native FP8 tensor cores do not change that: FP8 compute is a speed/efficiency win, not a memory escape hatch, and the fp8mixed file is still 27 GiB on disk and in VRAM. The Q4_K_S GGUF in step 2 is the only path that runs on this card. If you have a 32GB-class card, see the RTX 5090 recipe for the native fp8mixed flow.
OOM when loading the text encoder
Same root cause as documented upstream — the default unquantized Gemma 3 12B encoder OOMs on 16GB cards when loaded alongside the Sulphur 2 weights (Lightricks/ComfyUI-LTXVideo#303 reports peak 29068 MiB on RTX 5080 (16GB VRAM) with the LTX-2 19B-dev-fp8 pipeline — the same 16 GB Blackwell envelope as the 5070 Ti). Replace it with gemma-3-12b-it-qat-UD-Q4_K_XL.gguf from Unsloth (step 3 above), and enable CPU offload for the Gemma encoder via the KJNodes model-offload nodes — keep the encoder unloaded from VRAM while the DiT is sampling.
flash_attention_2 crash on first inference (Blackwell sm_120)
The RTX 5070 Ti is Blackwell GB203 sm_120. FlashAttention-2 wheels still do not ship sm_120 kernels as of mid-2026 (Dao-AILab/flash-attention#2168 tracks the RTX 50-series CUDA error), so any node or snippet that hardcodes attn_implementation="flash_attention_2" will crash at the first attention call. ComfyUI's LTXV nodes default to SDPA and are unaffected, but if you wire in a custom node that forces FA2, switch it to sdpa (or eager). Pinning the cu128 PyTorch wheel (Installation step 1) is also required — older CUDA channels lack sm_120 kernels entirely.
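If you maintain a custom node or script that passes `attn_implementation` yourself, a small guard can apply the fallback described above. This is a sketch only: `safe_attn_implementation` is a hypothetical helper, and the set of affected capabilities is an assumption taken from this recipe's claim about sm_120, not from the FlashAttention project.

```python
from typing import Tuple

# Assumption from this recipe: FlashAttention-2 wheels lack kernels for sm_120
FA2_UNSUPPORTED = {(12, 0)}


def safe_attn_implementation(requested: str, capability: Tuple[int, int]) -> str:
    """Swap flash_attention_2 for sdpa on GPUs listed as unsupported."""
    if requested == "flash_attention_2" and capability in FA2_UNSUPPORTED:
        return "sdpa"
    return requested


print(safe_attn_implementation("flash_attention_2", (12, 0)))  # prints "sdpa"
```

The capability tuple would normally come from `torch.cuda.get_device_capability()`; it is passed in here so the helper stays testable without a GPU.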
"sulphur_final" referenced in the workflow but missing locally
The upstream workflow JSON contains a sulphur_final checkpoint reference that does not exist as a published file. Per the SulphurAI README: "I'm aware the workflows contain sulphur_final right now, just use the lora or use the full models, don't use both at the same time". If you used the GGUF in step 2, point the loader at sulphur_dev-Q4_K_S.gguf instead and delete or bypass any LoRA node — the distill is already baked into the GGUF weights.
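To locate the nodes that still point at sulphur_final, the workflow JSON can be scanned directly. The sketch below assumes the ComfyUI UI export layout, a top-level `nodes` list whose entries carry `id`, `type`, and `widgets_values`; it does not descend into nested subgraph definitions. `find_references` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List, Tuple


def find_references(workflow: Dict[str, Any], needle: str) -> List[Tuple[Any, str]]:
    """Return (node id, node type) for nodes whose widget values mention needle."""
    hits = []
    for node in workflow.get("nodes", []):
        # Serialize so both list-style and dict-style widget values are searched
        widgets = json.dumps(node.get("widgets_values", []))
        if needle in widgets:
            hits.append((node.get("id"), node.get("type", "?")))
    return hits


def find_in_file(path: str, needle: str = "sulphur_final") -> List[Tuple[Any, str]]:
    """Load a workflow JSON file from disk and scan it for needle."""
    with open(path, encoding="utf-8") as handle:
        return find_references(json.load(handle), needle)
```

Each hit names a node to repoint at sulphur_dev-Q4_K_S.gguf or to bypass in the graph.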
Gemma GGUF loader fails or outputs gibberish
The Gemma 3 GGUF loader in ComfyUI-GGUF needed PRs #399 and #402 merged for the LTX-2 family path (Sulphur-2 inherits this via the LTX-2.3 lineage it builds on); pull the latest city96/ComfyUI-GGUF main — both PRs are now merged (Kijai/LTXV2_comfy discussion #7).
"Can I run this on 8GB VRAM?" — No, not realistically
A community walkthrough specifically about Sulphur-2 deployment (knightli.com) addresses 8GB directly: "but it is not realistic to expect high-resolution, long-video, complex workflows on 8GB". The Q3_K_S GGUF (9.63 GiB) leaves no room for the encoder + activations on an 8GB card, and aggressive offloading destroys throughput. 16GB is the practical floor — which is exactly what this recipe targets.
Slow generation
Keep the Gemma encoder offloaded with the KJNodes model-offload nodes; VRAM thrashing on a 16GB card kills wall time even on the 5070 Ti's fast GDDR7. Drop frame counts and resolution further if you observe steady-state VRAM oscillation in nvidia-smi. Empirical RTX 5070 Ti wall-time numbers will appear at /check/sulphur-2/rtx-5070-ti once contributed.