What You'll Build
A local anime-focused text-to-image pipeline using Anima, a 2-billion-parameter DiT built on NVIDIA Cosmos-Predict2 with a Qwen3 0.6B text encoder — a collaboration between CircleStone Labs and Comfy Org. The model runs natively in ComfyUI — no custom nodes — and its ~7GB unquantized footprint sits comfortably inside the RTX 4070's 12GB VRAM budget.
Hardware data: RTX 4070 (12GB VRAM) · ~7GB peak (unquantized, per the lilting.ch overview) · See benchmark data
⚠️ License note: Anima ships under the CircleStone Labs Non-Commercial License and, as a "Derivative Model" of Cosmos-Predict2-2B-Text2Image, is additionally subject to NVIDIA's Open Model License. Commercial use requires writing to tdrussell@circlestone.ai; see the model card.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | ~7GB VRAM for unquantized FP16; less with a GGUF quant (the Q4_K file is ~1.2GB) | RTX 4070 (12GB) |
| RAM | 16GB | — |
| Storage | ~5.6GB for base + encoder + VAE; ~1.2GB for a Q4_K GGUF tier | — |
| Software | ComfyUI (recent build), Python 3.10+ | — |
Installation
1. Update ComfyUI
Anima depends on the Cosmos-Predict2 diffusion model class and the Qwen-Image VAE — both available only in recent ComfyUI builds. From your ComfyUI root:
```bash
git pull
pip install -r requirements.txt
```
ℹ️ PyTorch on the RTX 4070 (Ada Lovelace, sm_89): unlike Blackwell (sm_120) cards, the 4070 needs no special CUDA wheel. The sm_89 kernels ship in the default stable PyTorch wheels, so a plain `pip install torch` (or the `requirements.txt` above) already has everything. An existing recent ComfyUI venv is fine as-is.
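To confirm which architecture PyTorch actually sees, a small check along these lines may help. It is a sketch only: `describe_capability` and `report_gpu` are hypothetical helper names, and the label table covers just the two architectures named in the note above.

```python
# Hypothetical helper: maps a CUDA compute capability to the labels used in the note above.
def describe_capability(major: int, minor: int) -> str:
    sm = f"sm_{major}{minor}"
    known = {"sm_89": "Ada Lovelace", "sm_120": "Blackwell"}
    return f"{sm} ({known.get(sm, 'not covered in this guide')})"


def report_gpu() -> str:
    """Describe the first CUDA device PyTorch can see, if any."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but sees no CUDA device"
    major, minor = torch.cuda.get_device_capability(0)
    return f"{torch.cuda.get_device_name(0)}: {describe_capability(major, minor)}"


if __name__ == "__main__":
    print(report_gpu())
```

Run it inside the same venv that ComfyUI uses; an RTX 4070 should report compute capability 8.9.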
2. Download the model files
Three files are needed, and they go into three different ComfyUI subfolders. The destination folders are stated verbatim in the model card; the weight files live under the repo's split_files/ tree (the root-level paths from older guides now return 404 — use the split_files/ paths below):
```bash
# from ComfyUI/ root
cd models/diffusion_models
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/diffusion_models/anima-base-v1.0.safetensors
cd ../text_encoders
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/text_encoders/qwen_3_06b_base.safetensors
cd ../vae
wget https://huggingface.co/circlestone-labs/Anima/resolve/main/split_files/vae/qwen_image_vae.safetensors
```
Per the model card: anima-base-v1.0.safetensors goes in ComfyUI/models/diffusion_models, qwen_3_06b_base.safetensors in ComfyUI/models/text_encoders, and qwen_image_vae.safetensors in ComfyUI/models/vae (this is the Qwen-Image VAE — you may already have it).
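Before opening ComfyUI, it can be worth checking that each file landed in the right folder. The following is a minimal sketch using only the standard library; the folder and file names come from the model card as quoted above, while `missing_model_files` is a hypothetical helper name.

```python
from pathlib import Path

# Folder -> file name, as stated in the model card.
EXPECTED = {
    "diffusion_models": "anima-base-v1.0.safetensors",
    "text_encoders": "qwen_3_06b_base.safetensors",
    "vae": "qwen_image_vae.safetensors",
}


def missing_model_files(comfy_root: str) -> list[str]:
    """Return relative paths of expected files that are absent or empty."""
    models = Path(comfy_root) / "models"
    missing = []
    for folder, name in EXPECTED.items():
        path = models / folder / name
        # A zero-byte file usually means an interrupted or failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(f"models/{folder}/{name}")
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the ComfyUI root; adjust the path otherwise.
    gaps = missing_model_files(".")
    print("All three files found" if not gaps else "Missing: " + ", ".join(gaps))
```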
3. (Optional) Use a GGUF quant to free up VRAM
The 4070 has comfortable headroom for the unquantized model, but if you want to share VRAM with other workloads, community GGUFs are at JusteLeo/Anima-GGUF (base_model_relation: quantized of circlestone-labs/Anima). These quantize the preview variant and run through stable-diffusion.cpp: the Q6_K file is ~1.74GB, Q5_K ~1.46GB, and Q4_K (~1.2GB) is the smallest tier, aimed at low-RAM devices. For the highest-quality output on a 12GB 4070, stick with the unquantized base weights from step 2.
4. Load the official workflow
Comfy Org publishes the canonical workflow at comfy.org/workflows/image_anima_preview. Download the JSON and drag-drop it onto the ComfyUI canvas, or drag in any image generated from the workflow page — the workflow is embedded in the file.
Running
After the three model files are in place and the workflow is loaded:
- Type a prompt in the positive text node. The model card's recommended positive prefix is `masterpiece, best quality, score_7, safe,` and the documented tag order is `[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]`. Artist tags must be prefixed with `@`.
- Set resolution between 512×512 and 1536×1536 (the model card documents this range; ~1MP, e.g. 1024×1024, is the sweet spot).
- Steps: 30–50. CFG: 4–5. Sampler: `er_sde` is the model author's documented default; `euler_a` softens line work; `dpmpp_2m_sde_gpu` adds variety.
- Click "Queue Prompt".
Outputs land in ComfyUI/output/ per the standard ComfyUI convention.
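The tag order and the `@` rule from the Running steps above are easy to get wrong by hand, so a tiny prompt builder can help. This is an illustrative sketch, not part of Anima or ComfyUI: `build_prompt` and `megapixels` are hypothetical helpers, the comma separator follows the style of the recommended prefix, and the tag values in the example are placeholders.

```python
# Quality/safety prefix recommended by the model card, as quoted above.
QUALITY_PREFIX = ["masterpiece", "best quality", "score_7", "safe"]


def build_prompt(subject, character=None, series=None, artist=None, general=()):
    """Assemble tags in the documented order and return a comma-separated prompt."""
    tags = list(QUALITY_PREFIX)
    tags.append(subject)  # subject-count tag such as "1girl", "1boy" or "1other"
    if character:
        tags.append(character)
    if series:
        tags.append(series)
    if artist:
        # Artist tags must carry the "@" prefix; add it only if it is missing.
        tags.append(artist if artist.startswith("@") else "@" + artist)
    tags.extend(general)
    return ", ".join(tags)


def megapixels(width, height):
    """Pixel count in millions, for comparing against the ~1MP sweet spot."""
    return width * height / 1_000_000


if __name__ == "__main__":
    # All tag values below are placeholders, not real Anima tags.
    print(build_prompt("1girl", character="example_character",
                       series="example_series", artist="example_artist",
                       general=["smile", "outdoors"]))
    print(round(megapixels(1024, 1024), 2))
```

Paste the resulting string into the positive text node; the negative prompt and sampler settings are still set in the workflow itself.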
Results
- Speed: Omitted — there is no published Anima benchmark on the RTX 4070, and no comparable Ada-Lovelace 12GB number exists in the community to anchor to (the only GPU-named report in the lilting.ch overview is a relative "10x slower than SDXL on Tesla V100" comparison, not an absolute time). The 4070 also differs materially from its 16GB Ada siblings — roughly 30% fewer CUDA cores and 25% less memory bandwidth than the 4070 Ti SUPER — so forward-extrapolating any 16GB-class figure would be misleading. Once a community benchmark lands it will appear at the link below — please contribute yours.
- VRAM usage: ~7GB peak unquantized, per the lilting.ch technical overview (Feb 2026, updated Jun 2026), whose spec table lists VRAM as "~7GB (without quantization)". This is corroborated by the on-disk weights: the `split_files/` tree totals ~5.6GB at FP16 (base 4.18GB + Qwen3-0.6B encoder 1.19GB + VAE 0.25GB), and runtime activations push the peak to ~7GB. On a 12GB card, where a desktop display typically leaves ~10.5–11.3GB usable, that still leaves roughly 3.5–4.3GB free.
- Quality notes: Anime-first. The base model is intentionally style-neutral; reach for explicit `@artist` tags or LoRAs (training scripts exist; see discussion #28) for stronger stylization. Text rendering is weak: the model card notes it "can generally do single words and sometimes short phrases, but lengthy text rendering won't work well."
For the full benchmark data, see /check/anima/rtx-4070.
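The VRAM arithmetic in the Results list can be reproduced in a few lines. Every input below is a figure quoted in this section; the "implied runtime overhead" is simply the gap between the reported peak and the weights on disk, not a separately measured number.

```python
# File sizes in GB, as listed in the VRAM bullet.
weights_gb = {"base": 4.18, "qwen3_0.6b_encoder": 1.19, "vae": 0.25}
on_disk_gb = round(sum(weights_gb.values()), 2)

peak_gb = 7.0              # ~7GB peak reported by the lilting.ch overview
usable_gb = (10.5, 11.3)   # typical usable VRAM on a 12GB card driving a display

# Gap between reported peak and weights on disk (activations and other overhead).
implied_overhead_gb = round(peak_gb - on_disk_gb, 2)
# Free VRAM left over at the low and high end of the usable range.
headroom_gb = tuple(round(u - peak_gb, 1) for u in usable_gb)

if __name__ == "__main__":
    print(f"weights on disk: {on_disk_gb} GB")
    print(f"implied runtime overhead: {implied_overhead_gb} GB")
    print(f"headroom: {headroom_gb[0]} to {headroom_gb[1]} GB")
```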
Troubleshooting
Slow generations despite the headroom
The "Run times?" discussion shows users on 4GB and 6GB GPUs initially hitting multi-minute or multi-hour times. The fix that worked for them was launching ComfyUI with the `--fp16-unet` command-line flag, or setting ComfyUI's ModelComputeDtype node to fp16. On a 12GB 4070 this is essentially never needed, since the model fits resident with room to spare, but it's the documented first stop if generation is unexpectedly slow.
Out-of-memory during VAE decode
If you are colocating Anima with a heavy second model and see an OOM only at the final image-decode stage, swap the standard VAE decode node for ComfyUI's VAE Decode (Tiled) node — it decodes the latent in tiles to keep the decode-stage peak down. This is unlikely to be necessary on a 12GB 4070 running Anima alone.
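To see why tiling lowers the decode-stage peak, it helps to look at the geometry: only one tile's worth of latent is decoded at a time. The sketch below covers that geometry only and is not ComfyUI's implementation; `tile_boxes` is a hypothetical helper, and the tile and overlap sizes are arbitrary example values.

```python
def tile_boxes(width, height, tile=64, overlap=8):
    """Yield (x0, y0, x1, y1) boxes that cover a width x height grid with overlapping tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    step = tile - overlap

    def starts(size):
        # A grid no larger than one tile needs a single box starting at 0.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # final tile is aligned to the far edge
        return positions

    for y0 in starts(height):
        for x0 in starts(width):
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))


if __name__ == "__main__":
    boxes = list(tile_boxes(128, 128))
    print(len(boxes), "tiles; first:", boxes[0], "last:", boxes[-1])
```

Each box would be decoded on its own and the overlapping borders blended, so peak memory scales with the tile area rather than the full image.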
No benchmarks for this pair yet
The backend reports verdict: unknown for anima × rtx-4070 at the time of writing — no community benchmark submitted. If you run this recipe, please contribute your numbers so the live /check/anima/rtx-4070 page can replace these estimates with real data.