What You'll Build
A local anime-focused text-to-image pipeline using Anima, a 2-billion-parameter DiT built on NVIDIA Cosmos-Predict2 with a Qwen3 0.6B text encoder — a collaboration between CircleStone Labs and Comfy Org. The model runs natively in ComfyUI — no custom nodes — and its ~7GB unquantized footprint sits comfortably inside the RTX 3080 Ti's 12GB VRAM budget.
Hardware data: RTX 3080 Ti (12GB VRAM) · ~7GB peak (unquantized, per the lilting.ch overview) · See benchmark data
⚠️ License note: Anima ships under the CircleStone Labs Non-Commercial License and, as a "Derivative Model" of Cosmos-Predict2-2B-Text2Image, is additionally subject to NVIDIA's Open Model License. Commercial use requires writing to tdrussell@circlestone.ai; see the model card.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | ~7GB VRAM unquantized (FP16/BF16); less with a GGUF quant (the Q4_K file is ~1.2GB) | RTX 3080 Ti (12GB) |
| RAM | 16GB | — |
| Storage | ~5.6GB for base + encoder + VAE (FP16); the Q4_K GGUF file alone is ~1.2GB | — |
| Software | ComfyUI (recent build), Python 3.10+ | — |
Installation
1. Update ComfyUI
Anima depends on the Cosmos-Predict2 diffusion model class and the Qwen-Image VAE — both available only in recent ComfyUI builds. From your ComfyUI root:
```shell
git pull
pip install -r requirements.txt
```
ℹ️ PyTorch on the RTX 3080 Ti (Ampere, sm_86): the 3080 Ti needs no special CUDA wheel. sm_86 kernels are included in the standard PyTorch CUDA wheels (e.g. cu121, cu124), so on Linux a plain `pip install torch` (or the `requirements.txt` above) already has everything. On Windows the default PyPI wheel is CPU-only, so install from PyTorch's CUDA index instead (e.g. `--index-url https://download.pytorch.org/whl/cu124`). Anima runs in BF16/FP16, which Ampere supports natively; there is no FP8 in this model's runtime path, so the lack of FP8 tensor cores on Ampere is a non-issue here. An existing recent ComfyUI venv is fine as-is.
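To confirm this on your own install, the script below asks PyTorch for the GPU's compute capability, checks whether the wheel's compiled architecture list includes it, and reports BF16 support. It is a generic PyTorch check built on `torch.cuda.get_device_capability`, `torch.cuda.get_arch_list` and `torch.cuda.is_bf16_supported`. Neither ComfyUI nor the Anima card ships it, and the helper names are our own.

```python
"""Sanity-check that the installed PyTorch build covers the local GPU.

Generic PyTorch check (not part of ComfyUI or the Anima release);
arch_tag and wheel_has_kernels are illustrative helper names.
"""


def arch_tag(capability):
    """Turn a (major, minor) compute capability into a tag like 'sm_86'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def wheel_has_kernels(capability, arch_list):
    """True if the wheel lists SASS for this exact architecture.

    A miss is not necessarily fatal: sm_80 binaries also run on sm_86, and
    PTX can be JIT-compiled, so treat False as "investigate", not "broken".
    """
    return arch_tag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("CUDA is not available to this PyTorch build")
        else:
            cap = torch.cuda.get_device_capability(0)
            print(torch.cuda.get_device_name(0), arch_tag(cap))
            print("kernels in wheel:", wheel_has_kernels(cap, torch.cuda.get_arch_list()))
            print("bf16 supported:", torch.cuda.is_bf16_supported())
```

On a 3080 Ti the capability should read (8, 6), i.e. `sm_86`.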
2. Download the model files
Three files are needed, and they go into three different ComfyUI subfolders. The destination folders are stated verbatim in the model card; the weight files live under the repo's split_files/ tree (the root-level paths from older guides now return 404 — use the split_files/ paths below):
```shell
# from ComfyUI/ root
cd models/diffusion_models
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/diffusion_models/anima-base-v1.0.safetensors
cd ../text_encoders
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/text_encoders/qwen_3_06b_base.safetensors
cd ../vae
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/vae/qwen_image_vae.safetensors
```
Per the model card: anima-base-v1.0.safetensors goes in ComfyUI/models/diffusion_models, qwen_3_06b_base.safetensors in ComfyUI/models/text_encoders, and qwen_image_vae.safetensors in ComfyUI/models/vae (this is the Qwen-Image VAE — you may already have it).
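A small path check is a quick way to confirm that the three files landed where ComfyUI expects them. This is a hypothetical helper and is not part of ComfyUI. The folder and file names are exactly the ones the model card lists, and the script assumes the ComfyUI root is passed as its first argument.

```python
"""Check that the three Anima files sit in the folders the model card names.

Illustrative helper, not part of ComfyUI; paths are relative to the
ComfyUI root given on the command line (defaults to the current folder).
"""
from pathlib import Path
import sys

# (subfolder under ComfyUI/models, filename) as listed in the model card
EXPECTED = [
    ("diffusion_models", "anima-base-v1.0.safetensors"),
    ("text_encoders", "qwen_3_06b_base.safetensors"),
    ("vae", "qwen_image_vae.safetensors"),
]


def missing_files(comfy_root):
    """Return the expected paths that are absent (empty list = all present)."""
    models = Path(comfy_root) / "models"
    return [models / sub / name for sub, name in EXPECTED
            if not (models / sub / name).is_file()]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    gaps = missing_files(root)
    for path in gaps:
        print("missing:", path)
    print("all three files in place" if not gaps else f"{len(gaps)} file(s) missing")
```

Run it as `python check_anima_files.py /path/to/ComfyUI`. The script name is arbitrary.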
3. (Optional) Use a GGUF quant to free up VRAM
The 3080 Ti has comfortable headroom for the unquantized model, but if you want to share VRAM with other workloads, community GGUFs are available at JusteLeo/Anima-GGUF (its model card lists it as a quantization of circlestone-labs/Anima). These quantize the preview variant, not base v1.0, and they run through stable-diffusion.cpp rather than the stock ComfyUI workflow. The Q6_K file is ~1.74GB, Q5_K ~1.46GB, and Q4_K (~1.2GB) is the smallest tier, aimed at low-RAM devices. For the highest-quality output on a 12GB 3080 Ti, stick with the unquantized base weights from step 2.
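To put these file sizes in perspective, you can estimate each tier's effective bits per weight by converting the file size to bits and dividing by the parameter count. This sketch assumes ~2e9 parameters (the intro's "2-billion-parameter" figure) and decimal gigabytes. Real GGUF files keep some tensors at higher precision, so treat the output as a ballpark figure.

```python
"""Rough effective bits-per-weight for the Anima GGUF tiers.

Assumptions: ~2e9 parameters and 1 GB = 1e9 bytes. Ballpark only, since
GGUF files keep some tensors at higher precision.
"""

PARAMS = 2e9  # approximate parameter count from the intro

# File sizes quoted for JusteLeo/Anima-GGUF, in GB
SIZES_GB = {"Q6_K": 1.74, "Q5_K": 1.46, "Q4_K": 1.2}


def bits_per_weight(size_gb, params=PARAMS):
    """Convert a file size in GB to average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name}: ~{bits_per_weight(size):.1f} bits/weight")
```

Under the same assumptions, the 4.18GB FP16 base file works out to ~16.7 bits/weight rather than exactly 16. This suggests the true parameter count is slightly above 2 billion.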
4. Load the official workflow
Comfy Org publishes the canonical workflow at comfy.org/workflows/image_anima_preview. Download the JSON and drag-drop it onto the ComfyUI canvas, or drag in any image generated from the workflow page — the workflow is embedded in the file.
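Because the workflow travels inside the image, you can also inspect it without opening ComfyUI. The sketch below reads a PNG's uncompressed `tEXt` chunks and returns the one named `workflow`. It assumes the image is a PNG written by ComfyUI's standard Save Image node, which stores the UI graph under the `workflow` keyword and the API-format graph under `prompt`. It does not handle compressed `zTXt`/`iTXt` chunks or other image formats.

```python
"""Pull the embedded ComfyUI workflow JSON out of a PNG's tEXt chunks.

Assumes a PNG saved by ComfyUI's Save Image node; zTXt/iTXt chunks are
not handled.
"""
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data):
    """Yield (keyword, text) for each uncompressed tEXt chunk in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt layout: Latin-1 keyword, NUL separator, Latin-1 text
            keyword, _, text = body.partition(b"\x00")
            yield keyword.decode("latin-1"), text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC


def embedded_workflow(data):
    """Return the parsed "workflow" JSON, or None if the PNG has none."""
    for keyword, text in png_text_chunks(data):
        if keyword == "workflow":
            return json.loads(text)
    return None
```

For example, `embedded_workflow(open("anima_example.png", "rb").read())` returns the graph as a dict, or `None` if the file carries no workflow. The filename is hypothetical.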
Running
After the three model files are in place and the workflow is loaded:
- Type a prompt in the positive text node. The model card's recommended positive prefix is `masterpiece, best quality, score_7, safe,` and the documented tag order is `[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]`. Artist tags must be prefixed with `@`.
- Set resolution between 512×512 and 1536×1536 (the range the model card documents); about 1MP, e.g. 1024×1024, is the sweet spot.
- Steps: 30–50. CFG: 4–5. Sampler: `er_sde` is the model author's documented default; `euler_a` (listed as `euler_ancestral` in ComfyUI's sampler menu) softens line work; `dpmpp_2m_sde_gpu` adds variety.
- Click "Queue Prompt".
Outputs land in ComfyUI/output/ per the standard ComfyUI convention.
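These prompt conventions are easy to get subtly wrong by hand. The sketch below assembles tags in the documented order and adds the `@` prefix to artist names. The function and its parameter names are hypothetical; only the prefix, the tag order and the `@` rule come from the model card.

```python
"""Assemble an Anima prompt in the model card's documented tag order.

Hypothetical helper: the prefix, order and "@" artist rule are from the
model card; the function and its parameter names are illustrative.
"""

RECOMMENDED_PREFIX = ("masterpiece", "best quality", "score_7", "safe")


def build_prompt(subject, character="", series="", artists=(), general=(),
                 quality=RECOMMENDED_PREFIX):
    """Join tags as: quality/meta/year/safety, subject (1girl/1boy/...),
    character, series, artists, general tags. Empty fields are skipped."""
    artist_tags = [a if a.startswith("@") else "@" + a for a in artists]
    parts = [*quality, subject, character, series, *artist_tags, *general]
    return ", ".join(p for p in parts if p)


print(build_prompt("1girl", artists=["artist_name"], general=["smile", "outdoors"]))
```

Paste the resulting string into the positive text node.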
Results
- Speed: Omitted. There is no published Anima benchmark on the RTX 3080 Ti, and no comparable community number to anchor to. The only GPU-named report in the lilting.ch overview is a relative comparison ("10x slower than SDXL on Tesla V100"), not an absolute time. Per-image generation time does not transfer across cards, so extrapolating a figure measured on a different GPU would misstate what this one delivers. That holds even though the 3080 Ti's 912 GB/s memory bandwidth and 10240 CUDA cores (TechPowerUp specs) put it near the top of the 12GB cards. Once a community benchmark lands, it will appear at the link below; please contribute yours.
- VRAM usage: ~7GB peak unquantized, per the lilting.ch technical overview (Feb 2026, updated Jun 2026), whose spec table lists VRAM as "~7GB (without quantization)". The on-disk weights are consistent with this figure. The `split_files/` tree totals ~5.6GB at FP16 (base 4.18GB + Qwen3-0.6B encoder 1.19GB + VAE 0.25GB), and runtime activations account for the rest of the ~7GB peak. On a 12GB card, where a desktop display typically leaves ~10.5–11.3GB usable, that still leaves several GB free.
- Quality notes: Anime-first, but the base model is intentionally style-neutral. For stronger stylization, use explicit `@artist` tags or LoRAs (training scripts exist; see discussion #28). Text rendering is weak. The model card notes it "can generally do single words and sometimes short phrases, but lengthy text rendering won't work well."
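The VRAM figures above reduce to simple arithmetic, sketched here with the numbers quoted in this section. None of it is measured. The per-file sizes are the on-disk FP16 weights, the ~7GB peak is the lilting.ch figure, and the usable range assumes a desktop display is attached.

```python
"""Back-of-the-envelope VRAM budget for Anima on a 12GB card.

All inputs are figures quoted in this guide, not measurements.
"""

WEIGHTS_GB = {"diffusion model": 4.18, "Qwen3-0.6B encoder": 1.19, "VAE": 0.25}
REPORTED_PEAK_GB = 7.0   # lilting.ch "~7GB (without quantization)"
USABLE_GB = (10.5, 11.3)  # typical usable range on a 12GB card driving a display


def weights_total(weights=WEIGHTS_GB):
    """Sum of the on-disk FP16 weight files, in GB."""
    return sum(weights.values())


def headroom(peak=REPORTED_PEAK_GB, usable=USABLE_GB):
    """Free VRAM left at the reported peak, as a (worst, best) pair in GB."""
    return tuple(round(u - peak, 2) for u in usable)


print(f"weights on disk: {weights_total():.2f} GB")
print(f"peak minus weights (activations etc.): {REPORTED_PEAK_GB - weights_total():.2f} GB")
print(f"headroom at peak: {headroom()} GB")
```

With these inputs, the card retains roughly 3.5–4.3GB of headroom at the reported peak.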
For the full benchmark data, see /check/anima/rtx-3080-ti.
Troubleshooting
Slow generations despite the headroom
The "Run times?" discussion shows users on 4GB and 6GB GPUs initially hitting multi-minute or multi-hour generation times. The fix that worked for them was launching ComfyUI with the `--fp16-unet` command-line flag, or setting the compute type to fp16 with ComfyUI's ModelComputeType node. On a 12GB 3080 Ti this is essentially never needed, since the model fits resident with room to spare, but it is the documented first stop if generation is unexpectedly slow.
Out-of-memory during VAE decode
If you are colocating Anima with a heavy second model and see an OOM only at the final image-decode stage, swap the standard VAE decode node for ComfyUI's VAE Decode (Tiled) node — it decodes the latent in tiles to keep the decode-stage peak down. This is unlikely to be necessary on a 12GB 3080 Ti running Anima alone.
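To illustrate why tiling lowers the decode-stage peak, the sketch below covers an image with overlapping tiles so that only one tile's activations need to be resident at a time. It is a generic tiling scheme built on our own assumptions, not ComfyUI's VAE Decode (Tiled) implementation. The 512-pixel tile and 64-pixel overlap are hypothetical defaults.

```python
"""Generic overlapping-tile layout, illustrating how tiled decoding bounds
peak memory by processing one tile at a time.

Not ComfyUI's VAE Decode (Tiled) code; tile=512 and overlap=64 are
hypothetical values chosen for the example.
"""


def tile_starts(length, tile, overlap):
    """Start offsets for tiles of size `tile` covering [0, length), with
    neighbouring tiles sharing at least `overlap` pixels."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]  # one tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the edge
    return starts


def tiles(width, height, tile=512, overlap=64):
    """(x, y, w, h) boxes that together cover a width x height image."""
    return [(x, y, min(tile, width), min(tile, height))
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]


boxes = tiles(1024, 1024)
largest = max(w * h for _, _, w, h in boxes)
print(f"{len(boxes)} tiles; peak tile area is {largest / (1024 * 1024):.0%} of the image")
```

For a 1024×1024 image this yields a 3×3 grid, and each tile covers a quarter of the full area. The overlap exists so that seams between tiles can be blended away.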
No benchmarks for this pair yet
The backend reports verdict: unknown for anima × rtx-3080-ti at the time of writing — no community benchmark submitted. If you run this recipe, please contribute your numbers so the live /check/anima/rtx-3080-ti page can replace these estimates with real data.