Anima 2B on RTX 4070: Native ComfyUI Anime Text-to-Image

What You'll Build

A local anime-focused text-to-image pipeline using Anima, a 2-billion-parameter DiT built on NVIDIA Cosmos-Predict2 with a Qwen3 0.6B text encoder — a collaboration between CircleStone Labs and Comfy Org. The model runs natively in ComfyUI — no custom nodes — and its ~7GB unquantized footprint sits comfortably inside the RTX 4070's 12GB VRAM budget.

Hardware data: RTX 4070 (12GB VRAM) · ~7GB peak (unquantized, per the lilting.ch overview) · See benchmark data

⚠️ License note: Anima ships under the CircleStone Labs Non-Commercial License and, as a "Derivative Model" of Cosmos-Predict2-2B-Text2Image, is additionally subject to NVIDIA's Open Model License. Commercial use requires writing to tdrussell@circlestone.ai — see the model card.

Requirements

Component	Minimum	Tested
GPU	~7GB VRAM for the unquantized FP16 model; less with a GGUF quant (the Q4_K diffusion-model file is ~1.2GB and runs via stable-diffusion.cpp)	RTX 4070 (12GB)
RAM	16GB	—
Storage	~5.6GB for base + encoder + VAE; ~1.2GB for the Q4_K GGUF diffusion-model file alone	—
Software	ComfyUI (recent build), Python 3.10+	—

Installation

1. Update ComfyUI

Anima depends on the Cosmos-Predict2 diffusion model class and the Qwen-Image VAE — both available only in recent ComfyUI builds. From your ComfyUI root:

git pull
pip install -r requirements.txt

ℹ️ PyTorch on the RTX 4070 (Ada Lovelace, sm_89): unlike Blackwell (sm_120) cards, the 4070 needs no special CUDA build. Current stable PyTorch CUDA wheels already include sm_89 kernels, so an existing recent ComfyUI venv with a CUDA-enabled torch is fine as-is. On Linux the default PyPI torch wheel is a CUDA build; on Windows the default PyPI wheel is CPU-only, so install torch from PyTorch's CUDA wheel index before running the requirements.txt step above.

2. Download the model files

Three files are needed, each in a different ComfyUI subfolder. The model card states the destination folders verbatim, and the weight files live under the repo's split_files/ tree. Root-level paths from older guides now return 404, so use the split_files/ paths below:

# from ComfyUI/ root
cd models/diffusion_models
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/diffusion_models/anima-base-v1.0.safetensors

cd ../text_encoders
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/text_encoders/qwen_3_06b_base.safetensors

cd ../vae
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/vae/qwen_image_vae.safetensors

Per the model card: anima-base-v1.0.safetensors goes in ComfyUI/models/diffusion_models, qwen_3_06b_base.safetensors in ComfyUI/models/text_encoders, and qwen_image_vae.safetensors in ComfyUI/models/vae (this is the Qwen-Image VAE — you may already have it).
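Before loading the workflow, a quick check can confirm each file landed in the folder the model card names. The helper below is a hypothetical sketch (check_anima_files is not part of ComfyUI); it takes the ComfyUI root as an optional argument and uses `[ -s ]`, so a zero-byte file left by an interrupted download also counts as missing.

```shell
# Hypothetical helper: confirm the three Anima files are where the model card says.
# Usage: check_anima_files [path-to-ComfyUI-root]   (defaults to the current directory)
# Prints one line per file and returns the number of missing or empty files.
check_anima_files() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in \
    models/diffusion_models/anima-base-v1.0.safetensors \
    models/text_encoders/qwen_3_06b_base.safetensors \
    models/vae/qwen_image_vae.safetensors; do
    # -s is true only for files that exist and are non-empty.
    if [ -s "$root/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_anima_files ~/ComfyUI` returns 0 only when all three files are present, so it can gate a launch script.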

3. (Optional) Use a GGUF quant to free up VRAM

The 4070 has ample headroom for the unquantized model, but if you want to share VRAM with other workloads, community GGUFs are available at JusteLeo/Anima-GGUF (listed on Hugging Face as a quantization of circlestone-labs/Anima). Note that these quantize the preview variant and run through stable-diffusion.cpp rather than ComfyUI's native loaders. File sizes: Q6_K ~1.74GB, Q5_K ~1.46GB, and Q4_K ~1.2GB, the smallest tier, aimed at low-RAM devices. For the highest-quality output on a 12GB 4070, stick with the unquantized base weights from step 2.

4. Load the official workflow

Comfy Org publishes the canonical workflow at comfy.org/workflows/image_anima_preview. Download the JSON and drag it onto the ComfyUI canvas, or drag in any image generated from the workflow page; the workflow is embedded in the image file.
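If you prefer scripting to clicking, ComfyUI's local server also accepts jobs over HTTP. The sketch below assumes ComfyUI is listening on its default http://127.0.0.1:8188, that curl is installed, and that you have re-saved the workflow in API format from ComfyUI's workflow menu (the exact menu label varies by frontend version). The file name anima_api.json and both function names are hypothetical.

```shell
# Wrap an API-format workflow graph in the {"prompt": ...} envelope
# that ComfyUI's /prompt endpoint expects.
make_prompt_payload() {
  printf '{"prompt": %s}\n' "$(cat "$1")"
}

# POST the payload to a running ComfyUI server (default address assumed).
# On success the server replies with JSON that includes a prompt_id.
queue_anima_workflow() {
  local api_json="$1" host="${2:-http://127.0.0.1:8188}"
  make_prompt_payload "$api_json" |
    curl -sS -X POST -H 'Content-Type: application/json' --data-binary @- "$host/prompt"
}
```

Usage: `queue_anima_workflow anima_api.json`. Edit the prompt text inside the JSON first; the server runs the graph exactly as saved.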

Running

After the three model files are in place and the workflow is loaded:

1. Type a prompt in the positive text node. The model card's recommended positive prefix is masterpiece, best quality, score_7, safe, and the documented tag order is [quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]. Artist tags must be prefixed with @.
2. Set the resolution between 512×512 and 1536×1536 (the range documented on the model card); around 1MP, e.g. 1024×1024, is the sweet spot.
3. Steps: 30–50. CFG: 4–5. Sampler: er_sde is the model author's documented default; euler_ancestral (A1111's euler_a) softens line work; dpmpp_2m_sde_gpu adds variety.
4. Click "Queue Prompt" (labelled "Run" in newer ComfyUI frontends).
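The prompt and resolution rules above can be sketched as two small shell helpers. Both are hypothetical (not part of ComfyUI or the model card). The prompt builder joins non-empty fields in the documented order and adds the @ prefix to the artist tag if it is missing; the resolution check treats the 512–1536 range as a per-side bound, which is our reading of the model card.

```shell
# Hypothetical helper: assemble tags in the model card's documented order:
# [quality/meta/year/safety] [1girl/1boy/1other] [character] [series] [artist] [general]
# Empty fields are skipped; the artist tag gets an "@" prefix if it lacks one.
build_anima_prompt() {
  local quality="$1" subject="$2" character="$3" series="$4" artist="$5" general="$6"
  if [ -n "$artist" ] && [ "${artist#@}" = "$artist" ]; then
    artist="@$artist"
  fi
  local out="" part
  for part in "$quality" "$subject" "$character" "$series" "$artist" "$general"; do
    if [ -n "$part" ]; then
      # Comma-join: add ", " only once something is already in out.
      out="${out:+$out, }$part"
    fi
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: reject sides outside 512-1536 and report the megapixel count.
check_anima_resolution() {
  local w="$1" h="$2"
  if [ "$w" -lt 512 ] || [ "$w" -gt 1536 ] || [ "$h" -lt 512 ] || [ "$h" -gt 1536 ]; then
    echo "out of range: ${w}x${h}"
    return 1
  fi
  awk -v w="$w" -v h="$h" 'BEGIN { printf "%dx%d = %.2f MP\n", w, h, w * h / 1e6 }'
}
```

For example, `build_anima_prompt "masterpiece, best quality, score_7, safe" "1girl" "" "" "example_artist" "smile, outdoors"` yields `masterpiece, best quality, score_7, safe, 1girl, @example_artist, smile, outdoors` (example_artist is a placeholder), and `check_anima_resolution 1024 1024` reports `1024x1024 = 1.05 MP`.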

Outputs land in ComfyUI/output/ per the standard ComfyUI convention.

Results

Speed: Omitted. There is no published Anima benchmark on the RTX 4070, and no comparable Ada Lovelace 12GB figure exists in the community to anchor to; the only GPU-named report in the lilting.ch overview is a relative "10x slower than SDXL on Tesla V100" comparison, not an absolute time. The 4070 also differs materially from its 16GB Ada siblings, with roughly 30% fewer CUDA cores and 25% less memory bandwidth than the 4070 Ti SUPER, so extrapolating from any 16GB-class figure would be misleading. Once a community benchmark lands, it will appear at the link below; please contribute yours.
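The two percentages follow from NVIDIA's published specs: 5,888 CUDA cores and ~504GB/s of memory bandwidth for the RTX 4070, versus 8,448 cores and 672GB/s for the RTX 4070 Ti SUPER. The awk sketch below (ada_gap is a hypothetical name) just does that arithmetic; it says nothing about actual Anima throughput.

```shell
# Compare RTX 4070 against RTX 4070 Ti SUPER using spec-sheet numbers.
ada_gap() {
  awk 'BEGIN {
    cores_4070 = 5888; cores_ti_super = 8448   # CUDA cores
    bw_4070 = 504;     bw_ti_super = 672       # memory bandwidth, GB/s
    printf "CUDA cores: %.0f%% fewer\n", 100 * (1 - cores_4070 / cores_ti_super)
    printf "bandwidth: %.0f%% less\n", 100 * (1 - bw_4070 / bw_ti_super)
  }'
}
```

Running `ada_gap` prints `CUDA cores: 30% fewer` and `bandwidth: 25% less`.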
VRAM usage: ~7GB peak unquantized, per the lilting.ch technical overview (Feb 2026, updated Jun 2026), whose spec table lists VRAM as "~7GB (without quantization)". The on-disk weights corroborate this: the split_files/ tree totals ~5.6GB at FP16 (base 4.18GB + Qwen3-0.6B encoder 1.19GB + VAE 0.25GB), and runtime activations and overhead account for the remaining ~1.4GB of the peak. On a 12GB card, where a desktop display typically leaves ~10.5–11.3GB usable, that still leaves roughly 3.5–4.3GB free.
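The VRAM budget above can be reproduced with a few lines of arithmetic. This is a sketch using only the figures quoted in this section (file sizes, the ~7GB reported peak, and the 10.5–11.3GB usable range); anima_vram_budget is a hypothetical name, and real usage will shift with resolution and other loaded models.

```shell
# Reproduce the VRAM arithmetic from the quoted figures (all values in GB).
anima_vram_budget() {
  awk 'BEGIN {
    base = 4.18; encoder = 1.19; vae = 0.25   # on-disk FP16 sizes
    peak = 7.0                                # reported unquantized peak
    weights = base + encoder + vae
    printf "weights on disk: %.2f GB\n", weights
    printf "runtime overhead to reach peak: ~%.1f GB\n", peak - weights
    printf "free at peak (10.5-11.3 GB usable): %.1f-%.1f GB\n", 10.5 - peak, 11.3 - peak
  }'
}
```

Running `anima_vram_budget` prints 5.62GB of weights, ~1.4GB of runtime overhead, and 3.5–4.3GB free at peak.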
Quality notes: Anime-first. The base model is intentionally style-neutral; reach for explicit @artist tags or LoRAs (training scripts exist — see discussion #28) for stronger stylization. Text rendering is weak — the model card notes it "can generally do single words and sometimes short phrases, but lengthy text rendering won't work well."

Live benchmark status for this pair is tracked at /check/anima/rtx-4070.

Troubleshooting

Slow generations despite the headroom

The "Run times?" discussion shows users on 4GB and 6GB GPUs initially hitting generation times of several minutes or even hours. The fix that worked for them was launching ComfyUI with the --fp16-unet command-line flag, or setting ComfyUI's ModelComputeType node to fp16. On a 12GB 4070 this should rarely be needed, since the model stays resident with room to spare, but it is the documented first stop if generation is unexpectedly slow.

Out-of-memory during VAE decode

If you are colocating Anima with a heavy second model and see an OOM only at the final image-decode stage, swap the standard VAE decode node for ComfyUI's VAE Decode (Tiled) node — it decodes the latent in tiles to keep the decode-stage peak down. This is unlikely to be necessary on a 12GB 4070 running Anima alone.

No benchmarks for this pair yet

The backend reports verdict: unknown for anima × rtx-4070 at the time of writing — no community benchmark submitted. If you run this recipe, please contribute your numbers so the live /check/anima/rtx-4070 page can replace these estimates with real data.