What You'll Build
A local anime-focused text-to-image pipeline using Anima, a 2B-parameter DiT built on NVIDIA Cosmos-Predict2 with a Qwen-3 0.6B text encoder, developed as a collaboration between CircleStone Labs and Comfy Org. The model runs natively in ComfyUI with no custom nodes and fits comfortably within the RTX 5060 Ti's 16GB VRAM budget.
Hardware data: RTX 5060 Ti (16GB VRAM) · ~7GB peak (unquantized, per the lilting.ch overview)
⚠️ License note: Anima ships under the CircleStone Labs Non-Commercial License and inherits NVIDIA's Open Model License for the Cosmos-Predict2 base. Commercial use requires writing to tdrussell@circlestone.ai; see the model card.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | ~4GB with Q4_K GGUF; ~7GB unquantized FP16 | RTX 5060 Ti (16GB) |
| RAM | 16GB | — |
| Storage | ~6GB for base + encoder + VAE; ~2GB for a GGUF tier | — |
| Software | ComfyUI (recent build), Python 3.10+ | — |
Installation
1. Update ComfyUI
Anima depends on the Cosmos-Predict2 diffusion model class and the Qwen-Image VAE — both available only in recent ComfyUI builds. From your ComfyUI root:
```shell
git pull
pip install -r requirements.txt
```
2. Download the model files
Three files are needed, and they go into three different ComfyUI subfolders. Pull them directly from the official HuggingFace repo:
```shell
# from the ComfyUI/ root
cd models/diffusion_models
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/anima-base-v1.0.safetensors
cd ../text_encoders
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/qwen_3_06b_base.safetensors
cd ../vae
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/qwen_image_vae.safetensors
```
The filenames and destination folders are taken verbatim from the model card.
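Because the three files go into three different folders, a misplaced file is the most common setup mistake. The helper below is a minimal sketch (the function name `check_anima_files` is hypothetical, not part of ComfyUI) that checks each expected path from the model card exists and is non-empty, printing which ones are missing.

```shell
# check_anima_files (hypothetical helper): report whether the three Anima
# files are where the official workflow expects them.
# Argument: the ComfyUI root directory (defaults to the current directory).
# Returns 0 if all three files exist and are non-empty, 1 otherwise.
check_anima_files() {
  local root="${1:-.}" status=0 f
  for f in \
    models/diffusion_models/anima-base-v1.0.safetensors \
    models/text_encoders/qwen_3_06b_base.safetensors \
    models/vae/qwen_image_vae.safetensors
  do
    if [ -s "$root/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, from the ComfyUI root:
#   check_anima_files .
```

A partially finished `wget` can leave a truncated file behind; this check only catches missing or empty files, so re-download anything whose size looks far smaller than the model card lists.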
3. (Optional) Use a GGUF quant for lower VRAM
If you'd rather use a quantized variant (useful if you're sharing VRAM with other workloads), community GGUFs are available at JusteLeo/Anima-GGUF. The Q6_K weight file is around 1.74 GB and runs through stable-diffusion.cpp or through the ComfyUI-GGUF nodes, which, unlike the base workflow, are a custom node pack. Q4_K is the smallest tier and is aimed at low-RAM devices.
4. Load the official workflow
Comfy Org publishes the canonical workflow at comfy.org/workflows/image_anima_preview. Download the JSON and drag and drop it onto the ComfyUI canvas, or drag in any image generated from the workflow page instead: the workflow is embedded in the image file.
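If you prefer to queue generations from a terminal instead of the canvas, ComfyUI's local server accepts jobs on `POST /prompt` with a body of the form `{"prompt": <workflow>}`. The sketch below assumes the workflow has been re-exported in ComfyUI's API format (the "Export (API)" menu entry in recent frontends); the regular drag-and-drop JSON uses a different layout and will be rejected. The function names and the `anima_api.json` filename are hypothetical, and the server address is ComfyUI's default local one.

```shell
# build_prompt_payload (hypothetical): wrap an API-format workflow JSON file
# in the {"prompt": ...} envelope that ComfyUI's POST /prompt expects.
# Assumes the file already contains valid JSON.
build_prompt_payload() {
  printf '{"prompt": %s}\n' "$(cat "$1")"
}

# queue_workflow (hypothetical): send that payload to a running ComfyUI server.
# Second argument overrides the default local address.
queue_workflow() {
  local file="$1"
  local server="${2:-http://127.0.0.1:8188}"
  build_prompt_payload "$file" |
    curl -sS -X POST "$server/prompt" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}

# Usage, with ComfyUI already running:
#   queue_workflow anima_api.json
```

If the workflow validates, the server replies with a JSON object containing a `prompt_id`; edit the prompt text inside the exported file to change what gets generated.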
Running
After the three model files are in place and the workflow is loaded:
- Type a prompt in the positive text node. The model card recommends prefixing it with `masterpiece, best quality, score_7, safe,` and ordering tags as `[quality/meta/year/safety] [character count] [character] [series] [@artist] [general tags]`.
- Set the resolution between 512×512 and 1536×1536 (the model card covers the supported range); around 1MP, e.g. 1024×1024, is the sweet spot.
- Steps: 30–50. CFG: 4–5. Sampler: `er_sde` is the documented default; `euler_a` (listed as `euler_ancestral` in ComfyUI's sampler menu) softens line work; `dpmpp_2m_sde_gpu` adds variety.
- Click "Queue Prompt".
Outputs land in ComfyUI/output/ per the standard ComfyUI convention.
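The tag ordering and the ~1MP target are easy to get slightly wrong by hand. The sketch below uses two hypothetical helpers: `anima_prompt` takes exactly six tag groups in the model card's slot order and joins the non-empty ones with commas, and `anima_size` picks a width and height near 1024×1024 pixels for a given aspect ratio, clamped to the 512–1536 range. Rounding to multiples of 64 is an assumption (a common safe choice for latent diffusion models), not something the model card specifies.

```shell
# anima_prompt (hypothetical): join tag groups in the model card's order:
#   [quality/meta/year/safety] [character count] [character] [series] [@artist] [general tags]
# Exactly six arguments; pass "" for any slot you want to leave empty.
anima_prompt() {
  if [ "$#" -ne 6 ]; then
    echo "usage: anima_prompt QUALITY COUNT CHARACTER SERIES ARTIST GENERAL" >&2
    return 2
  fi
  local out="" part
  for part in "$@"; do
    [ -n "$part" ] || continue          # skip empty slots
    if [ -n "$out" ]; then
      out="$out, $part"
    else
      out="$part"
    fi
  done
  printf '%s\n' "$out"
}

# anima_size (hypothetical): width x height close to 1 megapixel for an
# aspect ratio W:H, rounded to multiples of 64 (assumption) and clamped to 512-1536.
anima_size() {
  awk -v aw="$1" -v ah="$2" 'BEGIN {
    target = 1024 * 1024
    w = sqrt(target * aw / ah)          # solve w * h = target with w / h = aw / ah
    h = w * ah / aw
    w = int(w / 64 + 0.5) * 64          # round to the nearest multiple of 64
    h = int(h / 64 + 0.5) * 64
    if (w < 512) w = 512; if (w > 1536) w = 1536
    if (h < 512) h = 512; if (h > 1536) h = 1536
    printf "%dx%d\n", w, h
  }'
}

# Usage ("@artist_name" is a placeholder):
#   anima_prompt "masterpiece, best quality, score_7, safe" "1girl" "" "" "@artist_name" "silver hair, night"
#   anima_size 3 4    # portrait
```

Paste the resulting string into the positive text node and the two numbers into the workflow's width and height fields; for a 3:4 portrait, `anima_size 3 4` prints `896x1152`.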
Results
- Speed: omitted. There is no public benchmark on the RTX 5060 Ti yet, and the numbers reported in the community (e.g. GTX 970, Tesla V100, M1 Max) are not directly comparable to a 50-series consumer card. Once a community benchmark lands, it will appear at the link below.
- VRAM usage: ~7GB peak unquantized, per the lilting.ch technical overview (Feb 2026, updated Apr 2026), well within the 5060 Ti's 16GB budget.
- Quality notes: anime-first. The base model is intentionally style-neutral; reach for explicit `@artist` tags or LoRAs (training scripts exist; see discussion #28) for stronger stylization. Text rendering is weak (per the model card).
For the full benchmark data, see /check/anima/rtx-5060-ti.
Troubleshooting
Slow generations on older or low-VRAM cards
The "Run times?" discussion shows users on 4GB and 6GB GPUs initially seeing generation times of several minutes to several hours. The fix that worked for them was launching ComfyUI with the `--fp16-unet` command-line flag, or setting ComfyUI's ModelComputeType node to fp16. On a 16GB 5060 Ti this is rarely needed, but it's the documented first stop if performance is unexpectedly bad.
Out-of-memory during VAE decode
The community GGUF maintainer recommends the --vae-tiling flag when running via stable-diffusion.cpp (see the JusteLeo/Anima-GGUF card) to avoid OOM in the VAE step. The equivalent in ComfyUI is the VAE Decode (Tiled) node — swap it in if you see OOMs only at the final decode stage.
No benchmarks for this pair yet
The backend reports verdict: unknown for anima × rtx-5060-ti at the time of writing — no community benchmark submitted. If you run this recipe, please contribute your numbers so the live /check/anima/rtx-5060-ti page can replace these estimates with real data.
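To collect a peak-VRAM figure worth submitting, one option is to sample `nvidia-smi` while the workflow runs and keep the maximum. The sketch below is a rough approach, not an official benchmark procedure: `peak_mib` is a hypothetical helper, `memory.used` is device-wide (it includes the desktop and any other processes, so note your idle baseline first), and one-second sampling can miss short spikes such as the VAE decode.

```shell
# peak_mib (hypothetical): print the largest value from a stream of
# nvidia-smi readings, one integer MiB value per line
# (the shape produced by --format=csv,noheader,nounits).
peak_mib() {
  awk 'BEGIN { max = 0 } ($1 + 0) > max { max = $1 + 0 } END { print max }'
}

# Usage: in a second terminal, sample GPU 0 once per second while a
# generation runs, stop with Ctrl-C, then read off the peak:
#   nvidia-smi -i 0 --query-gpu=memory.used --format=csv,noheader,nounits -l 1 | tee vram.log
#   peak_mib < vram.log
```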