What You'll Build
A working diffusers setup for Meituan's LongCat-Image — a 6B-parameter bilingual (Chinese + English) MM-DiT/Single-DiT text-to-image model — running natively on a single 24 GB RTX 4090. This recipe is scoped to the base text-to-image variant; the image-editing siblings are out of scope below.
Hardware data: RTX 4090 (24 GB VRAM) · canonical diffusers + enable_model_cpu_offload() · ~18 GB peak per Meituan team statement · See benchmark data
⚠️ Why this recipe pins the base variant. Meituan publishes four siblings under the LongCat-Image brand and their inference paths and VRAM profiles differ — see "Sibling variants and what fits 24 GB" below before downloading anything.
ℹ️ Why this recipe pins the diffusers runtime. On a 16 GB consumer card the only confirmed path is ComfyUI + GGUF (see the 4060 Ti sibling recipe). The 24 GB tier unlocks the canonical diffusers LongCatImagePipeline directly: the Meituan team's own GitHub statement is that the latest official inference code "consumes approximately 18 GB of VRAM and supports inference on an RTX 4090" (Issue #8 comment by junqiangwu, 2025-12-08). The HF model card's Quick Start confirms the same profile: with pipe.enable_model_cpu_offload() it is "Required ~17 GB" (HF model card). The community ComfyUI integration that was PR'd back to the official repo similarly reports "Standard (CPU offload disabled): ~24GB+" and "Low VRAM (CPU offload enabled): ~17-19GB" (sooxt98/comfyui_longcat_image). Official and community sources name the same tier, so this is the canonical 4090 path.
Sibling variants and what fits 24 GB
| Variant | Purpose | 24 GB fit (cited) |
|---|---|---|
| LongCat-Image (this recipe) | Final-release T2I, 6B params, hybrid MM-DiT/Single-DiT à la Flux1.dev per the arXiv technical report (2512.07584) | Yes — canonical diffusers path with enable_model_cpu_offload(), ~18 GB peak per Meituan team |
| LongCat-Image-Edit | Image-to-image editing variant | Tighter — enable_model_cpu_offload() does not work on the Edit pipeline per user mingyi456 on Issue #8, so the no-offload "~24 GB+" tier is what you get. Borderline on a 24 GB card; not in scope here |
| LongCat-Image-Edit-Turbo | Distilled few-step edit variant | Same memory profile as Edit; if you get a working setup, submit it via /contribute so we can publish one |
| LongCat-Image-Dev | Mid-training checkpoint intended for fine-tuning, not inference | Out of scope for this recipe |
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 24 GB VRAM, CUDA-capable | RTX 4090 (24 GB) |
| RAM | 32 GB system RAM (CPU offload spills the text encoder to host) | — |
| Storage | ~30 GB free for the upstream BF16 repo | — |
| Software | Python 3.10, latest diffusers from main, PyTorch with CUDA | — |
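Before downloading, it can help to confirm the storage requirement programmatically. The following is a minimal sketch using only the standard library; the helper names `free_gb` and `has_room` are illustrative, and the 30 GB figure is the one from the table above.

```python
import shutil

REQUIRED_GB = 30  # "~30 GB free" from the Requirements table


def free_gb(path: str = ".") -> float:
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb


print(f"free: {free_gb():.1f} GB, enough for the BF16 repo: {has_room()}")
```

Point `path` at the drive that will hold the model directory (or your HF cache), since that may not be the drive you run the check from.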
Installation
The default pip install torch wheels run on the RTX 4090's Ada Lovelace architecture (compute capability 8.9, sm_89), so no special wheel selection is required. FlashAttention-2 also supports Ada GPUs, so the standard PyTorch + diffusers install is everything you need.
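To confirm what PyTorch actually sees on your machine, a small check along these lines may be useful. It is a sketch, not part of the official instructions: `sm_tag` and `detect_gpu` are illustrative helper names, and the function returns `None` when PyTorch or a CUDA device is missing.

```python
def sm_tag(major: int, minor: int) -> str:
    """Format a CUDA compute capability as an sm_XY tag, e.g. (8, 9) -> 'sm_89'."""
    return f"sm_{major}{minor}"


def detect_gpu():
    """Return (name, sm_tag, total_vram_gib) for GPU 0, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper degrades gracefully
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    props = torch.cuda.get_device_properties(0)
    return props.name, sm_tag(major, minor), props.total_memory / 2**30


print(detect_gpu())
```

On an RTX 4090 the tag should read sm_89 and the reported total should be just under 24 GiB.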
1. Create the conda environment
The official GitHub README pins Python 3.10:
conda create -n longcat-image python=3.10
conda activate longcat-image
2. Clone the repo and install dependencies
The recommended path for diffusers integration is to install diffusers from main (the LongCatImagePipeline class needs the upstream HF diffusers#12828 integration that the HF model card's Quick Start uses):
git clone https://github.com/meituan-longcat/LongCat-Image
cd LongCat-Image
pip install -r infer_requirements.txt
pip install -U git+https://github.com/huggingface/diffusers
If infer_requirements.txt errors with No module named 'dskernels', see Troubleshooting below — dskernels is not on PyPI, but you can skip it for inference.
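Because the pipeline class only exists in recent diffusers builds, a quick sanity check after installation can save a confusing ImportError later. This is a minimal sketch; `has_longcat_pipeline` is an illustrative name, and any failure while importing is treated as "not available".

```python
import importlib


def has_longcat_pipeline() -> bool:
    """True if the installed diffusers exposes LongCatImagePipeline."""
    try:
        diffusers = importlib.import_module("diffusers")
        # diffusers resolves top-level names lazily, so touch the attribute.
        getattr(diffusers, "LongCatImagePipeline")
    except Exception:
        # Missing package, missing class, or a broken dependency chain.
        return False
    return True


print("LongCatImagePipeline available:", has_longcat_pipeline())
```

If this prints `False`, re-run the `pip install -U git+https://github.com/huggingface/diffusers` step inside the activated environment.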
3. Pre-download the model
The pipeline auto-downloads on first run, but pre-fetching avoids surprises and gives you a clean progress bar:
hf download meituan-longcat/LongCat-Image --local-dir ./longcat-image
The repo is ~29 GB on disk (BF16 transformer + Qwen2.5-VL-7B text encoder + VAE).
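One way to check that the download completed is to compare the directory's size against the ~29 GB figure. The sketch below uses only the standard library; `dir_size_gb` is an illustrative helper, and the path matches the `--local-dir` used above.

```python
from pathlib import Path


def dir_size_gb(root: str) -> float:
    """Sum the sizes of all regular files under `root`, in gigabytes (10**9 bytes)."""
    base = Path(root)
    if not base.is_dir():
        return 0.0  # nothing downloaded yet
    total = sum(p.stat().st_size for p in base.rglob("*") if p.is_file())
    return total / 1e9


print(f"./longcat-image holds {dir_size_gb('./longcat-image'):.1f} GB")
```

A result well below 29 GB suggests an interrupted download; re-running the same `hf download` command should fetch whatever is still missing.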
Running
The HF model card's reference Quick Start works as-is on a 24 GB card. Save the following as run_t2i.py inside the cloned LongCat-Image directory:
import torch
from diffusers import LongCatImagePipeline
if __name__ == '__main__':
    pipe = LongCatImagePipeline.from_pretrained(
        "meituan-longcat/LongCat-Image",
        torch_dtype=torch.bfloat16,
    )
    # On a 24 GB RTX 4090, keep CPU offload enabled: this is the path the
    # Meituan team validated at ~18 GB peak. Disable only if you have ≥32 GB VRAM.
    pipe.enable_model_cpu_offload()
    prompt = (
        "A young Asian woman in a yellow knit sweater with a white necklace, "
        "hands resting on her knees, calm expression. Background is a rough "
        "brick wall, warm afternoon sunlight, medium-distance shot."
    )
    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.0,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True,
    ).images[0]
    image.save("./t2i_example.png")
Run it:
python run_t2i.py
Output lands at ./t2i_example.png. First run downloads any weights not pre-fetched; subsequent runs load straight from the HF cache.
Meituan's repo also includes scripts/inference_t2i.py with the same defaults hardcoded; that script is equivalent to the above and runs with python scripts/inference_t2i.py.
Text-in-image: LongCat-Image renders embedded text, and the HF README is explicit that you must wrap the target text in single or double quotation marks (English '...' / "..." or the Chinese full-width equivalents ‘...’ / “...”). Without quotes, the model treats the words as scene description, not glyphs to render.
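The quoting rule is easy to forget when prompts are assembled from variables. A small helper like the following keeps it explicit. This is an illustrative sketch: `quoted` is a hypothetical name, and the use of U+201C/U+201D as the Chinese double quotation marks is an assumption rather than something taken from the model card.

```python
def quoted(text: str, chinese: bool = False) -> str:
    """Wrap `text` in double quotes so it is treated as glyphs to render."""
    # Assumption: U+201C / U+201D are the Chinese-style double quotation marks.
    left, right = ("\u201c", "\u201d") if chinese else ('"', '"')
    return f"{left}{text}{right}"


prompt = (
    f"A neon shop sign that reads {quoted('OPEN 24 HOURS')}, "
    "rainy street at night."
)
print(prompt)
```

The resulting string can be passed as `prompt` to the pipeline call in `run_t2i.py`.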
Results
- Speed: No RTX-4090-specific inference-time measurement is cited in the official model card, GitHub repo, ComfyUI integration, or arXiv tech report at time of writing. Once a community run lands at /check/longcat-image/rtx-4090, this section gets updated; contribute one via /contribute if you measure it.
- VRAM usage: ~18 GB peak with enable_model_cpu_offload() per the Meituan team comment on Issue #8. The HF model card's Quick Start labels the same path "Required ~17 GB" (model card). The community ComfyUI port lists "Low VRAM (CPU offload enabled): ~17-19GB" and "Standard (CPU offload disabled): ~24GB+" (sooxt98/comfyui_longcat_image), so a 24 GB card with offload disabled is borderline, but with offload enabled it sits comfortably under the VRAM ceiling.
- Quality notes: LongCat-Image is bilingual by design (Chinese + English) and the arXiv technical report (2512.07584) highlights multilingual text rendering as a primary target. The 6B parameter count is "significantly smaller than the nearly 20B or larger Mixture-of-Experts (MoE) architectures common in the field" per the same report, and the architecture is "a hybrid MM-DiT and Single-DiT structure, consistent with Flux1.dev". Quality at native BF16 is the canonical reference, with no quantization tradeoffs to consider on this card.
For the full benchmark data, see /check/longcat-image/rtx-4090.
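If you want to take your own speed and VRAM measurement, a wrapper along these lines is one possible approach. It is a sketch under stated assumptions: `measure` is an illustrative helper, and the peak it reports is PyTorch's allocator peak (`torch.cuda.max_memory_allocated`), which is usually lower than the figure shown by nvidia-smi.

```python
import time


def measure(fn):
    """Run fn() once; return (result, seconds, peak_vram_gib or None)."""
    try:
        import torch
        cuda = torch.cuda.is_available()
    except ImportError:
        torch, cuda = None, False
    if cuda:
        torch.cuda.reset_peak_memory_stats()  # start from a clean peak counter
    start = time.perf_counter()
    result = fn()
    if cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    seconds = time.perf_counter() - start
    peak = torch.cuda.max_memory_allocated() / 2**30 if cuda else None
    return result, seconds, peak


# Usage inside run_t2i.py, after `pipe` and `prompt` are defined:
#   image, seconds, peak = measure(lambda: pipe(prompt, num_inference_steps=50).images[0])
#   print(f"{seconds:.1f} s, peak {peak:.1f} GiB")
```

Discard the first run when reporting a time, since it includes one-off warm-up costs.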
Troubleshooting
pip install -r infer_requirements.txt errors with No module named 'dskernels'
The official Meituan infer_requirements.txt lists dskernels, which is not on PyPI — ghostnyambit reported the same blocker on Issue #8 (2025-12-11). dskernels is only required for training-time DeepSpeed optimizations and not for inference. Comment the line out of infer_requirements.txt and re-run the install — the diffusers Quick Start above does not touch DeepSpeed.
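Commenting the line out by hand is enough, but if you script your setup, something like the following can do it. This is a minimal sketch: `comment_out` is an illustrative helper, and the sample requirements text is made up for the demonstration, not the real file contents.

```python
import re


def comment_out(requirements: str, package: str) -> str:
    """Prefix each line that names `package` with '# '; other lines are untouched."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*($|[=<>!~\[;#@ ])", re.IGNORECASE
    )
    out = []
    for line in requirements.splitlines(keepends=True):
        out.append("# " + line if pattern.match(line) else line)
    return "".join(out)


# Hypothetical file contents, for illustration only.
sample = "torch\ndskernels\ntransformers\n"
print(comment_out(sample, "dskernels"))

# To apply it to the real file:
#   from pathlib import Path
#   req = Path("infer_requirements.txt")
#   req.write_text(comment_out(req.read_text(), "dskernels"))
```

The pattern matches the package name only at the start of a line, so a differently named package that merely begins with the same letters is left alone.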
OOM despite enable_model_cpu_offload()
Confirm you are not also calling pipe.to(device, torch.bfloat16) after enable_model_cpu_offload() — the two are mutually exclusive. The HF model card's Quick Start has the pipe.to(...) line commented out for exactly this reason, with the in-line note "Uncomment for high VRAM devices (Faster inference)". On a 24 GB card, leave it commented; the team's quoted ~18 GB number assumes offload is active.
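One way to make sure only one of the two placements is ever applied is to route the choice through a single function. The sketch below is illustrative: `placement` and `place` are hypothetical helpers, and the 32 GB threshold is taken from the comment in `run_t2i.py` above rather than from any official guidance.

```python
OFFLOAD_BELOW_GIB = 32  # threshold from the comment in run_t2i.py


def placement(total_vram_gib: float) -> str:
    """Return 'cuda' for high-VRAM devices, otherwise 'offload'."""
    return "cuda" if total_vram_gib >= OFFLOAD_BELOW_GIB else "offload"


def place(pipe, total_vram_gib: float):
    """Apply exactly one placement to the pipeline, never both."""
    if placement(total_vram_gib) == "cuda":
        pipe.to("cuda")
    else:
        pipe.enable_model_cpu_offload()
    return pipe


print(placement(24))  # an RTX 4090 stays on the offload path
```

The total can be read with `torch.cuda.get_device_properties(0).total_memory / 2**30` before calling `place(pipe, total)`.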
Out of host RAM, not VRAM
User reckless-huang reported on Issue #8 (2025-12-08) that the old script failed on a system with 32 GB host RAM even though VRAM was fine. The Meituan team's follow-up fix in the next-day commit reduced host-memory pressure as well as VRAM. Make sure you've installed from the latest main branch: if your clone predates that fix, pull again.
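To see how much host RAM the machine actually reports, a standard-library check like this may help. It is a sketch that relies on `os.sysconf`, which is not available on every platform (it returns `None` on Windows, for example); `total_ram_gib` is an illustrative name.

```python
import os


def total_ram_gib():
    """Total physical RAM in GiB, or None where os.sysconf cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 2**30


ram = total_ram_gib()
print("host RAM:", "unknown" if ram is None else f"{ram:.1f} GiB")
```

A machine sold as 32 GB typically reports slightly less than 32 GiB here, so treat the Requirements figure as approximate when comparing.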
enable_model_cpu_offload() doesn't work on the Edit pipeline
User mingyi456 notes on Issue #8 that enable_model_cpu_offload() currently does not work with LongCatImageEditPipeline. This recipe is scoped to the base LongCatImagePipeline for exactly this reason — Edit needs the no-offload "~24GB+" tier, which is borderline on this card. For LongCat-Image-Edit on a 4090, follow the upstream issue thread for the manual sequential-offload patch before assuming a turnkey workflow exists.
LiVeen's FP8 (LongCat-Image-Edit-FP8-e4m3fn) is unverified
A community FP8 quant of the Edit variant exists at LiVeen/LongCat-Image-Edit-FP8-e4m3fn. The author themselves stated on Issue #8 that there is "a fairly high likelihood that this model won't work without the rest of the diffusers stuff, or even at all" and has not tested it. Don't substitute it for the canonical BF16 path until somebody verifies it — report results via /contribute if you try.