Anima 2B on RTX 5060 Ti: Native ComfyUI Anime Text-to-Image

image · beginner · 7GB+ VRAM · May 18, 2026
prerequisites
  • NVIDIA RTX 5060 Ti (16GB VRAM) or equivalent — ~4GB is the practical floor with a Q4_K GGUF quant; ~7GB unquantized FP16
  • Python 3.10+
  • ComfyUI (recent build with Cosmos-Predict2 / Qwen-Image VAE support)

What You'll Build

A local anime-focused text-to-image pipeline using Anima, a 2B-parameter DiT built on NVIDIA Cosmos-Predict2 with a Qwen-3 0.6B text encoder — a collaboration between CircleStone Labs and Comfy Org. The model runs natively in ComfyUI — no custom nodes — and comfortably fits the RTX 5060 Ti's 16GB VRAM budget.

Hardware data: RTX 5060 Ti (16GB VRAM) · ~7GB peak (unquantized, per the lilting.ch overview) · See benchmark data

⚠️ License note: Anima ships under the CircleStone Labs Non-Commercial License and inherits NVIDIA's Open Model License for the Cosmos-Predict2 base. Commercial use requires writing to tdrussell@circlestone.ai — see the model card.

Requirements

Component | Minimum | Tested
GPU | ~4GB with Q4_K GGUF; ~7GB unquantized FP16 | RTX 5060 Ti (16GB)
RAM | 16GB |
Storage | ~6GB for base + encoder + VAE; ~2GB for a GGUF tier |
Software | ComfyUI (recent build), Python 3.10+ |
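
To see which GPU minimum a given card meets, the figures above can be compared with what nvidia-smi reports. This is a rough sketch only: the 7168 MiB and 4096 MiB cutoffs are simply the ~7GB and ~4GB figures converted to MiB, the function name vram_tier is made up for illustration, and real cards often report slightly less than their nominal size, so treat a near miss as "probably fine".

```shell
# Classify a total-VRAM figure (in MiB) against the approximate minimums above.
# Cutoffs are assumptions: ~7GB -> 7168 MiB (unquantized FP16), ~4GB -> 4096 MiB (Q4_K GGUF).
vram_tier() {
  mib="$1"
  if [ "$mib" -ge 7168 ]; then
    echo "fp16"
  elif [ "$mib" -ge 4096 ]; then
    echo "gguf-q4"
  else
    echo "below-floor"
  fi
}

# Example (needs an NVIDIA driver; reads the first GPU only):
#   vram_tier "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n1)"
```

A result of fp16 means the unquantized files from the Installation section should fit; gguf-q4 points to the optional GGUF step instead.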

Installation

1. Update ComfyUI

Anima depends on the Cosmos-Predict2 diffusion model class and the Qwen-Image VAE — both available only in recent ComfyUI builds. From your ComfyUI root:

git pull
pip install -r requirements.txt

2. Download the model files

Three files are needed, and they go into three different ComfyUI subfolders. Pull them directly from the official HuggingFace repo:

# from ComfyUI/ root
cd models/diffusion_models
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/anima-base-v1.0.safetensors

cd ../text_encoders
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/qwen_3_06b_base.safetensors

cd ../vae
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/qwen_image_vae.safetensors

The filenames and destination folders are taken verbatim from the model card.
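
Before opening ComfyUI, it can help to confirm that all three files landed in the right folders and are not empty (an interrupted wget can leave a zero-byte file). The sketch below uses the same paths and filenames as the download commands above; the function name check_anima_files is hypothetical, and the check only tests for presence and non-zero size, not checksums.

```shell
# Report which of the three Anima files are present under a ComfyUI root.
# Usage: check_anima_files /path/to/ComfyUI   (defaults to the current directory)
check_anima_files() {
  root="${1:-.}"
  missing=0
  for rel in \
    models/diffusion_models/anima-base-v1.0.safetensors \
    models/text_encoders/qwen_3_06b_base.safetensors \
    models/vae/qwen_image_vae.safetensors
  do
    # -s is true only if the file exists and is larger than zero bytes
    if [ -s "$root/$rel" ]; then
      echo "ok       $rel"
    else
      echo "MISSING  $rel"
      missing=1
    fi
  done
  # Exit status 0 when everything is in place, 1 otherwise
  return "$missing"
}
```

Run it from the ComfyUI root as check_anima_files . and re-download anything reported as MISSING.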

3. (Optional) Use a GGUF quant for lower VRAM

If you'd rather use a quantized variant (useful when sharing VRAM with other workloads), community GGUFs are available at JusteLeo/Anima-GGUF. The Q6_K weight file is around 1.74 GB and runs through stable-diffusion.cpp or the ComfyUI-GGUF custom nodes. Q4_K is the smallest tier and is aimed at low-RAM devices.

4. Load the official workflow

Comfy Org publishes the canonical workflow at comfy.org/workflows/image_anima_preview. Download the JSON and drag it onto the ComfyUI canvas, or drop in any image generated from the workflow page: the workflow is embedded in the image file, so ComfyUI loads it the same way.

Running

After the three model files are in place and the workflow is loaded:

  1. Type a prompt in the positive text node. The model card recommends prefixing it with masterpiece, best quality, score_7, safe, and ordering tags as: [quality/meta/year/safety] [character count] [character] [series] [@artist] [general tags].
  2. Set resolution between 512×512 and 1536×1536 (the model card covers the supported range; ~1MP — e.g. 1024×1024 — is the sweet spot).
  3. Steps: 30–50. CFG: 4–5. Sampler: er_sde is the documented default; euler_a softens line work; dpmpp_2m_sde_gpu adds variety.
  4. Click "Queue Prompt".
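
The tag ordering from step 1 is easy to get wrong when prompts are assembled by hand. As a small illustration, the helper below joins whatever tag groups it is given, in the order passed, and skips empty ones. The function name build_anima_prompt is hypothetical, and it does no validation of the tags themselves; the caller is responsible for passing the groups in the model card's order.

```shell
# Join non-empty tag groups with ", " in the order they are passed.
# Intended argument order (per the model card, as quoted in step 1):
#   1 quality/meta/year/safety  2 character count  3 character
#   4 series                    5 @artist          6 general tags
build_anima_prompt() {
  prompt=""
  for group in "$@"; do
    # Skip groups the caller left empty
    [ -n "$group" ] || continue
    if [ -z "$prompt" ]; then
      prompt="$group"
    else
      prompt="$prompt, $group"
    fi
  done
  printf '%s\n' "$prompt"
}
```

For example, build_anima_prompt "masterpiece, best quality, score_7, safe" "1girl" "" "" "" "upper body, smile" leaves out the empty character, series, and artist slots and prints the remaining groups as one comma-separated line ready to paste into the positive text node. The tags in that call are placeholders, not recommendations.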

Outputs land in ComfyUI/output/ per the standard ComfyUI convention.

Results

  • Speed: Omitted — no public benchmark on RTX 5060 Ti yet, and reported numbers in the community (e.g. GTX 970, Tesla V100, M1 Max) are not directly comparable to a 50-series consumer card. Once a community benchmark lands, it will appear at the link below.
  • VRAM usage: ~7GB peak unquantized, per the lilting.ch technical overview (Feb 2026, updated Apr 2026) — well within the 5060 Ti's 16GB budget.
  • Quality notes: Anime-first. The base model is intentionally style-neutral; reach for explicit @artist tags or LoRAs (training scripts exist — see discussion #28) for stronger stylization. Text rendering is weak (per model card).

For the full benchmark data, see /check/anima/rtx-5060-ti.

Troubleshooting

Slow generations on older or low-VRAM cards

The "Run times?" discussion shows users on 4GB and 6GB GPUs initially hitting multi-minute or multi-hour generation times. The fix that worked for them was launching ComfyUI with the --fp16-unet command-line flag, or setting ComfyUI's ModelComputeDtype node to fp16. On a 16GB 5060 Ti this is rarely needed, but it's the documented first stop if performance is unexpectedly bad.
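
A minimal launch sketch for the flag mentioned above, assuming ComfyUI's standard main.py entry point and a python on PATH. The launch_comfyui_fp16 wrapper and the PYTHON override are illustrative names, not part of ComfyUI; the only ComfyUI-specific piece is the --fp16-unet flag itself.

```shell
# Start ComfyUI with the UNet forced to fp16.
# Usage: launch_comfyui_fp16 /path/to/ComfyUI [extra ComfyUI args...]
# Set PYTHON to point at a specific interpreter (e.g. a venv's python).
launch_comfyui_fp16() {
  comfy_root="${1:-.}"
  if [ "$#" -gt 0 ]; then shift; fi
  # Subshell so the caller's working directory is left unchanged
  ( cd "$comfy_root" && "${PYTHON:-python}" main.py --fp16-unet "$@" )
}
```

Calling launch_comfyui_fp16 . from the ComfyUI root is equivalent to running python main.py --fp16-unet there directly; the wrapper just makes it easy to keep the flag while passing other arguments through.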

Out-of-memory during VAE decode

The community GGUF maintainer recommends the --vae-tiling flag when running via stable-diffusion.cpp (see the JusteLeo/Anima-GGUF card) to avoid OOM in the VAE step. The equivalent in ComfyUI is the VAE Decode (Tiled) node — swap it in if you see OOMs only at the final decode stage.

No benchmarks for this pair yet

The backend reports verdict: unknown for anima × rtx-5060-ti at the time of writing — no community benchmark submitted. If you run this recipe, please contribute your numbers so the live /check/anima/rtx-5060-ti page can replace these estimates with real data.