Anima 2B on RTX 5070 Ti: Native ComfyUI Anime Text-to-Image

What You'll Build

A local anime-focused text-to-image pipeline using Anima, a 2B-parameter DiT built on NVIDIA Cosmos-Predict2 with a Qwen3 0.6B text encoder, developed jointly by CircleStone Labs and Comfy Org. The model runs natively in ComfyUI with no custom nodes and peaks at roughly 7GB unquantized, under half of the RTX 5070 Ti's 16GB of VRAM.

Hardware data: RTX 5070 Ti (16GB VRAM) · ~7GB peak (unquantized, per the lilting.ch overview) · See benchmark data

⚠️ License note: Anima ships under the CircleStone Labs Non-Commercial License and, as a "Derivative Model" of Cosmos-Predict2-2B-Text2Image, is additionally subject to NVIDIA's Open Model License. Commercial use requires writing to tdrussell@circlestone.ai — see the model card.

Requirements

Component	Minimum	Tested
GPU	~4GB with Q4_K GGUF; ~7GB unquantized FP16	RTX 5070 Ti (16GB)
RAM	16GB	—
Storage	~5.6GB for base + encoder + VAE; ~1.2GB for a Q4_K GGUF tier	—
Software	ComfyUI (recent build), Python 3.10+	—

Installation

1. Update ComfyUI

Anima depends on the Cosmos-Predict2 diffusion model class and the Qwen-Image VAE — both available only in recent ComfyUI builds. From your ComfyUI root:

git pull
pip install -r requirements.txt

ℹ️ Fresh ComfyUI install on Blackwell: the RTX 5070 Ti is a Blackwell (sm_120) card. If you are installing PyTorch from scratch rather than updating an existing ComfyUI, install the CUDA 12.8 wheels so the sm_120 kernels are present: pip install torch --index-url https://download.pytorch.org/whl/cu128. An existing recent ComfyUI venv already ships these.
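
One way to confirm that the installed PyTorch build actually carries Blackwell kernels is to inspect its compiled architecture list. The sketch below assumes PyTorch is importable from the ComfyUI environment; has_blackwell_kernels is a hypothetical helper name, while torch.cuda.get_arch_list() and torch.cuda.get_device_capability() are standard PyTorch calls.

```python
def has_blackwell_kernels(arch_list):
    """Return True if a PyTorch arch list includes sm_120 kernels."""
    return "sm_120" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # get_arch_list() returns an empty list when CUDA is unavailable
        archs = torch.cuda.get_arch_list()
        print("compiled architectures:", archs)
        print("sm_120 kernels present:", has_blackwell_kernels(archs))
        if torch.cuda.is_available():
            # sm_120 corresponds to compute capability (12, 0)
            print("device capability:", torch.cuda.get_device_capability(0))
```

If sm_120 is missing from the list, reinstalling from the cu128 index mentioned above is the likely fix.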

2. Download the model files

Three files are needed, and they go into three different ComfyUI subfolders. The destination folders are stated verbatim in the model card; the weight files live under the repo's split_files/ tree:

# from ComfyUI/ root
cd models/diffusion_models
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/diffusion_models/anima-base-v1.0.safetensors

cd ../text_encoders
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/text_encoders/qwen_3_06b_base.safetensors

cd ../vae
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/vae/qwen_image_vae.safetensors

Per the model card: anima-base-v1.0.safetensors goes in ComfyUI/models/diffusion_models, qwen_3_06b_base.safetensors in ComfyUI/models/text_encoders, and qwen_image_vae.safetensors in ComfyUI/models/vae (this is the Qwen-Image VAE — you may already have it).
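
Because the three files land in three different folders, a quick placement check can save a confusing "model not found" moment in the workflow. The following is a minimal sketch, assuming it is run from the ComfyUI root and that the default models/ layout is in use; missing_anima_files is an illustrative helper, not a ComfyUI function.

```python
from pathlib import Path

# Relative paths follow the folder layout quoted from the model card
ANIMA_FILES = (
    "models/diffusion_models/anima-base-v1.0.safetensors",
    "models/text_encoders/qwen_3_06b_base.safetensors",
    "models/vae/qwen_image_vae.safetensors",
)


def missing_anima_files(comfy_root):
    """Return the expected Anima files that are absent under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in ANIMA_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_anima_files(".")  # "." assumes the ComfyUI root
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All three Anima files are in place.")
```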

3. (Optional) Use a GGUF quant for lower VRAM

The 5070 Ti has plenty of headroom for the unquantized model, but if you want to share VRAM with other workloads, community GGUFs are available at JusteLeo/Anima-GGUF (its Hugging Face metadata marks it as a quantization of circlestone-labs/Anima). The Q6_K weight file is around 1.74 GB and runs through stable-diffusion.cpp; Q4_K (~1.2 GB) is the smallest tier and is aimed at low-RAM devices.

4. Load the official workflow

Comfy Org publishes the canonical workflow at comfy.org/workflows/image_anima_preview. Download the JSON and drag-drop it onto the ComfyUI canvas, or drag in any image generated from the workflow page — the workflow is embedded in the file.
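
To see what "embedded in the file" means in practice, the sketch below reads a PNG's text chunks using only the standard library. It rests on an assumption not stated in this guide: that ComfyUI stores the graph as JSON in an uncompressed tEXt chunk under the key workflow. The helper names are illustrative, and compressed or international text chunks (zTXt, iTXt) are deliberately ignored.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in raw PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 12 + length
    return chunks


def extract_workflow(png_path):
    """Return the embedded workflow as a dict, or None if absent."""
    with open(png_path, "rb") as f:
        text = read_png_text_chunks(f.read())
    raw = text.get("workflow")  # key name is an assumption
    return json.loads(raw) if raw is not None else None
```

Calling extract_workflow on an image saved by the workflow should return the node graph; a None result suggests the metadata was stripped, which commonly happens when images pass through sites that re-encode uploads.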

Running

After the three model files are in place and the workflow is loaded:

1. Type a prompt in the positive text node. The model card's recommended positive prefix is masterpiece, best quality, score_7, safe, and the documented tag order is [quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]. Artist tags must be prefixed with @.
2. Set resolution between 512×512 and 1536×1536 (the range documented in the model card; ~1MP, e.g. 1024×1024, is the sweet spot).
3. Steps: 30–50. CFG: 4–5. Sampler: er_sde is the model author's documented default; euler_a softens line work; dpmpp_2m_sde_gpu adds variety.
4. Click "Queue Prompt".
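
The tag order and the @ prefix rule above can be sketched as a small prompt builder. This is an illustrative helper, not part of ComfyUI or the model card: the function name, its parameters, and the comma separator are assumptions, and example_artist is a placeholder rather than a real tag.

```python
# Recommended positive prefix, as quoted from the model card
QUALITY_PREFIX = ("masterpiece", "best quality", "score_7", "safe")


def build_anima_prompt(subject, character="", series="",
                       artists=(), general=(), quality=QUALITY_PREFIX):
    """Join tags in the order: quality, subject, character, series, artist, general."""
    parts = list(quality)
    parts.append(subject)  # e.g. 1girl, 1boy, 1other
    if character:
        parts.append(character)
    if series:
        parts.append(series)
    # Artist tags must carry the @ prefix; add it only when missing
    parts.extend(a if a.startswith("@") else "@" + a for a in artists)
    parts.extend(general)
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_anima_prompt(
        "1girl",
        artists=["example_artist"],  # placeholder artist tag
        general=["smile", "outdoors"],
    ))
```

The resulting string can be pasted straight into the positive text node.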

Outputs land in ComfyUI/output/ per the standard ComfyUI convention.

Results

Speed: Omitted. There is no published Anima benchmark on the RTX 5070 Ti, and no comparable 50-series number exists in the community to anchor to (the only GPU-named report in the lilting.ch overview is a relative "10x slower than SDXL on Tesla V100" comparison, not an absolute time). The RTX 5070 Ti's ~896 GB/s memory bandwidth is well above that of the lower-tier 50-series cards Anima has been informally tested on, so extrapolating from any of those figures would be misleading. Once a community benchmark lands it will appear at the link below.
VRAM usage: ~7GB peak unquantized, per the lilting.ch technical overview (Feb 2026, updated Apr 2026), whose spec table lists VRAM as "~7GB (without quantization)". This is corroborated by the on-disk weights: the split_files/ tree totals ~5.6GB at FP16 (base 4.18GB + Qwen3-0.6B encoder 1.19GB + VAE 0.25GB), and runtime activations push the peak to ~7GB. Either way it leaves over half the 5070 Ti's 16GB budget free.
Quality notes: Anime-first. The base model is intentionally style-neutral; reach for explicit @artist tags or LoRAs (training scripts exist — see discussion #28) for stronger stylization. Text rendering is weak — the model card notes it can "generally do single words and sometimes short phrases, but lengthy text rendering won't work well."
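
The VRAM figures above reduce to simple arithmetic, reproduced here as a quick sanity check. All numbers are the ones quoted in this section; the variable names are illustrative, and file sizes in decimal GB are treated as directly comparable to the card's nominal 16GB, which is an approximation.

```python
# On-disk sizes of the three split_files/ weights, in GB (from this section)
FILE_SIZES_GB = {
    "anima-base-v1.0.safetensors": 4.18,
    "qwen_3_06b_base.safetensors": 1.19,
    "qwen_image_vae.safetensors": 0.25,
}
PEAK_VRAM_GB = 7.0    # reported unquantized peak
TOTAL_VRAM_GB = 16.0  # RTX 5070 Ti

weights_gb = sum(FILE_SIZES_GB.values())
activation_gb = PEAK_VRAM_GB - weights_gb  # implied runtime overhead
free_gb = TOTAL_VRAM_GB - PEAK_VRAM_GB
free_fraction = free_gb / TOTAL_VRAM_GB

print(f"weights on disk:      {weights_gb:.2f} GB")
print(f"implied activations:  {activation_gb:.2f} GB")
print(f"free at peak:         {free_gb:.1f} GB ({free_fraction:.0%} of the card)")
```

The implied activation overhead is a difference of two reported numbers, not a measurement, so it is best read as a rough estimate.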

For the full benchmark data, see /check/anima/rtx-5070-ti.

Troubleshooting

Slow generations despite the headroom

The "Run times?" discussion shows users on 4GB and 6GB GPUs initially hitting multi-minute or multi-hour generation times. The fix that worked for them was launching ComfyUI with the --fp16-unet command-line flag, or setting ComfyUI's ModelComputeType node to fp16. On a 16GB 5070 Ti this is essentially never needed, since the model fits resident with room to spare, but it is the documented first stop if generation is unexpectedly slow.

Out-of-memory during VAE decode

The community GGUF maintainer notes that the --vae-tiling flag is recommended when running via stable-diffusion.cpp to prevent out-of-memory errors during the image decoding phase (see the JusteLeo/Anima-GGUF card). The equivalent in ComfyUI is the VAE Decode (Tiled) node — swap it in if you see OOMs only at the final decode stage. This is unlikely on a 16GB 5070 Ti, but applies if you are colocating Anima with a heavy second model.

No benchmarks for this pair yet

The backend reports verdict: unknown for anima × rtx-5070-ti at the time of writing — no community benchmark submitted. If you run this recipe, please contribute your numbers so the live /check/anima/rtx-5070-ti page can replace these estimates with real data.
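
For anyone planning to contribute numbers, peak VRAM can be captured by polling nvidia-smi while a generation runs. The sketch below is one possible approach rather than an official measurement method: the helper names are hypothetical, it assumes nvidia-smi is on the PATH, and the figure it reports is board-wide memory use (including the desktop and any other processes), not ComfyUI's share alone.

```python
import subprocess
import time

# Ask nvidia-smi for used memory only, one bare MiB value per GPU
QUERY = ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"]


def parse_used_mib(output):
    """Parse nvidia-smi output into one integer (MiB) per GPU."""
    return [int(line) for line in output.splitlines() if line.strip()]


def poll_peak_mib(duration_s=60.0, interval_s=0.5, gpu_index=0):
    """Sample used memory for duration_s seconds and return the peak in MiB."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        result = subprocess.run(QUERY, capture_output=True,
                                text=True, check=True)
        peak = max(peak, parse_used_mib(result.stdout)[gpu_index])
        time.sleep(interval_s)
    return peak
```

Running poll_peak_mib(120) from a second terminal, then queueing a prompt, gives a peak that can be reported alongside the wall-clock time shown in the ComfyUI console.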