What You'll Build
A short looping animation (16 frames, ~512×512) generated from a text prompt by layering the AnimateDiff motion module on top of any Stable Diffusion 1.5 checkpoint inside ComfyUI — the lightweight video option that fits an 8GB card, where the modern 22B native video DiTs (LTX-2, Wan, Hunyuan) do not.
Hardware data: RTX 3060 Ti (8GB VRAM) · SD1.5 path at 512×512 / 16 frames · See benchmark data at /check/animatediff/rtx-3060-ti
ℹ️ SD1.5 path only on 8GB. AnimateDiff also ships an SDXL motion adapter (mm_sdxl_v10_beta), but the canonical AnimateDiff README states SDXL "Inference usually requires ~13GB VRAM" for 1024×1024×16, which does not fit an 8GB card. This recipe is scoped entirely to the SD1.5 motion modules.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8GB VRAM (cross-attention optimization enabled) | RTX 3060 Ti (8GB) |
| RAM | 16GB+ (model weights offload to system RAM under low-VRAM mode) | — |
| Storage | ~4–6GB (motion module ~1.7GB + an SD1.5 checkpoint ~2–4GB) | — |
| Software | ComfyUI, Python 3.10+ | — |
AnimateDiff is a plug-and-play motion module — it adds animation to an existing SD1.5 image checkpoint without retraining the base, so you reuse any SD1.5 model you already have.
Installation
1. Install the ComfyUI-AnimateDiff-Evolved custom node
The dominant ComfyUI runtime for AnimateDiff is ComfyUI-AnimateDiff-Evolved by Kosinkadink. Clone it into your custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
ComfyUI-VideoHelperSuite provides the nodes for combining the generated frames into a video/GIF. Restart ComfyUI after cloning.
2. Download an SD1.5 motion module
Per the ComfyUI-AnimateDiff-Evolved README, the original SD1.5 motion modules — mm_sd_v14, mm_sd_v15, mm_sd_v15_v2, v3_sd15_mm — come from the canonical guoyww/animatediff HuggingFace repo. mm_sd_v15_v2.ckpt (≈1.7GB) is the recommended general-purpose SD1.5 module:
cd ComfyUI/models/animatediff_models
wget https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt
Place motion modules in ComfyUI/models/animatediff_models (the README also accepts ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models).
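As a quick sanity check before opening ComfyUI, the two accepted locations can be searched from a script. This is a minimal sketch: the helper name `find_motion_module` is hypothetical, and it assumes you pass the path to your own ComfyUI root.

```python
from pathlib import Path

# The two folders named above, relative to the ComfyUI root.
MOTION_MODULE_DIRS = (
    Path("models/animatediff_models"),
    Path("custom_nodes/ComfyUI-AnimateDiff-Evolved/models"),
)


def find_motion_module(comfy_root, filename="mm_sd_v15_v2.ckpt"):
    """Return the first existing path to `filename`, or None if it is missing.

    `find_motion_module` is an illustrative helper, not part of ComfyUI
    or AnimateDiff-Evolved.
    """
    root = Path(comfy_root)
    for rel_dir in MOTION_MODULE_DIRS:
        candidate = root / rel_dir / filename
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_motion_module("ComfyUI")` from the directory that contains your ComfyUI checkout returns the module's path if either folder holds it, and `None` otherwise.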
3. Have an SD1.5 base checkpoint ready
AnimateDiff layers on top of a regular SD1.5 checkpoint (the model that draws each frame). Drop any SD1.5 .safetensors checkpoint into ComfyUI/models/checkpoints/ if you don't already have one.
Running
Build a standard SD1.5 ComfyUI text-to-image graph, then insert the AnimateDiff Loader node (from AnimateDiff-Evolved) between the model loader and the sampler, pointing it at mm_sd_v15_v2.ckpt. Set the latent batch size to 16 (the frame count) at 512×512, then route the decoded frames into a VideoHelperSuite Video Combine node to export a GIF/MP4.
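The graph described above can be written out in ComfyUI's API (JSON) workflow format as a Python dict. Treat this as a rough sketch rather than the node pack's official workflow. The core node names (`CheckpointLoaderSimple`, `CLIPTextEncode`, `EmptyLatentImage`, `KSampler`, `VAEDecode`) are standard ComfyUI. The class and input names used here for the AnimateDiff Loader and Video Combine nodes, the checkpoint filename, and the sampler settings (20 steps, CFG 7.5, 8 fps) are assumptions for illustration, so check them against your installed node versions.

```python
def build_workflow(checkpoint_name, prompt, negative="", frames=16, size=512, seed=0):
    """Return an API-format graph: checkpoint -> AnimateDiff -> sampler -> video.

    Links are written as [source_node_id, output_index].
    """
    return {
        # Outputs of the checkpoint loader: 0 = MODEL, 1 = CLIP, 2 = VAE.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": checkpoint_name}},
        # ASSUMPTION: class name and input names of the AnimateDiff Loader
        # node; they can differ between AnimateDiff-Evolved versions.
        "2": {"class_type": "ADE_AnimateDiffLoaderWithContext",
              "inputs": {"model": ["1", 0],
                         "model_name": "mm_sd_v15_v2.ckpt",
                         "beta_schedule": "sqrt_linear (AnimateDiff)"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        # The latent batch size is the frame count.
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": size, "height": size, "batch_size": frames}},
        # The sampler takes the AnimateDiff-patched model, not the raw checkpoint.
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["2", 0], "seed": seed,
                         "steps": 20, "cfg": 7.5,  # illustrative values
                         "sampler_name": "euler", "scheduler": "normal",
                         "positive": ["3", 0], "negative": ["4", 0],
                         "latent_image": ["5", 0], "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        # ASSUMPTION: class name and input names of the VideoHelperSuite
        # Video Combine node.
        "8": {"class_type": "VHS_VideoCombine",
              "inputs": {"images": ["7", 0], "frame_rate": 8, "loop_count": 0,
                         "filename_prefix": "animatediff",
                         "format": "image/gif",
                         "pingpong": False, "save_output": True}},
    }
```

For example, `build_workflow("your_sd15_model.safetensors", "a fox running through snow")` (the filename is a placeholder) yields a dict that can be serialized with `json.dumps` and compared against a workflow exported from the ComfyUI interface.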
To stay inside the 8GB envelope, launch ComfyUI with low-VRAM offloading so model weights spill to system RAM when idle:
python main.py --lowvram
Output frames are combined by VideoHelperSuite and land in ComfyUI/output/.
Results
- VRAM usage: The CivitAI Education Beginner's Guide to AnimateDiff reports, for the default 512×512, 16 frames config (Torch 2.0, measured on an RTX 4090), 5.6 GB with Xformers/SDP cross-attention optimization, versus 10.39 GB sub-quadratic and 12.13 GB with no optimization. The 5.6 GB optimized figure is what makes the SD1.5 / 512×512 / 16-frame path fit an 8GB card — keep cross-attention optimization on and avoid upscalers or higher frame counts in the same pass. See /check/animatediff/rtx-3060-ti.
- Speed: No first-party RTX 3060 Ti benchmark exists yet (the /check verdict is unknown). For a sense of scale on a sibling Ampere card, a firsthand RTX 3060 (12GB) owner in this CivitAI guide reports "16 frames and 8 fps takes about an hour if the image is 512x768, checkpoint, VAE and upscaler". Note that this run uses a larger 512×768 frame with an added upscaler stage, so a plain 512×512 pass is lighter. If you measure the 512×512 path on a 3060 Ti, please contribute it.
- Quality notes: AnimateDiff is a 2023 technique that predates native video DiTs; expect short (16-frame) clips with subtle, looping motion rather than long high-fidelity video. mm_sd_v15_v2 gives the most stable motion of the SD1.5 modules.
For the full benchmark data, see /check/animatediff/rtx-3060-ti.
Troubleshooting
Out of memory on an 8GB card
The same firsthand RTX 3060 account above measured "7.7 GB of dedicated GPU memory" for a 512×768 16-frame run with an upscaler — that configuration exceeds the 8GB RTX 3060 Ti's effective budget (after the desktop compositor, ~7GB is usable). Stay at 512×512, keep the batch at 16 frames, run with --lowvram, ensure cross-attention optimization is active (which the CivitAI Education table credits for the 5.6 GB figure), and drop any upscaler out of the generation pass — upscale as a separate step afterward.
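The budget arithmetic behind this advice can be made explicit. The sketch below only subtracts the figures quoted in this guide from the ~7GB usable estimate. Those figures were measured on other cards (an RTX 4090 and an RTX 3060 12GB), so read the result as a rough indication, not a 3060 Ti measurement. The function name `headroom` is hypothetical.

```python
# VRAM figures quoted in this guide, in GB.
REPORTED_VRAM_GB = {
    "512x512x16, xformers/SDP": 5.6,
    "512x512x16, sub-quadratic": 10.39,
    "512x512x16, no optimization": 12.13,
    "512x768x16 + upscaler (RTX 3060 report)": 7.7,
}

# Usable budget on an 8GB card after the desktop compositor, per the text.
USABLE_GB = 7.0


def headroom(usable_gb=USABLE_GB):
    """Return usable VRAM minus reported usage for each configuration.

    A positive value means the configuration should fit; a negative one
    means it is expected to overflow the card.
    """
    return {name: round(usable_gb - used, 2)
            for name, used in REPORTED_VRAM_GB.items()}


for name, spare in headroom().items():
    verdict = "fits" if spare > 0 else "overflows"
    print(f"{name}: {spare:+.2f} GB ({verdict})")
```

Only the cross-attention-optimized 512×512 configuration leaves positive headroom (about 1.4 GB); the other three rows come out negative, which is why the upscaler and the unoptimized attention paths are ruled out on this card.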
SDXL motion adapter won't fit
Do not use mm_sdxl_v10_beta on this card. The canonical AnimateDiff README quotes ~13GB VRAM for the SDXL path at 1024×1024×16 — it overflows 8GB. Stick to the SD1.5 modules.
Motion module fails to load
If a .ckpt motion module throws an unpickling error, re-download it from the canonical guoyww/animatediff repo — truncated or mirror copies are a common cause. Verify the file is ~1.7GB for mm_sd_v15_v2.ckpt.
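A truncated download can be caught with a simple size check before ComfyUI ever tries to unpickle the file. This is a minimal sketch: the helper name `is_probably_truncated` is hypothetical, the ~1.7GB target comes from the text above, and the 10% tolerance and the decimal-GB unit are assumptions chosen so that only clearly short files are flagged.

```python
from pathlib import Path

# Approximate size of mm_sd_v15_v2.ckpt given in this guide (~1.7GB).
# ASSUMPTION: interpreted as decimal gigabytes (1e9 bytes).
EXPECTED_BYTES = 1.7e9


def is_probably_truncated(path, expected_bytes=EXPECTED_BYTES, tolerance=0.1):
    """Return True if the file is more than `tolerance` smaller than expected.

    Only undersized files are flagged, since an interrupted download
    produces a short file, not a long one.
    """
    size = Path(path).stat().st_size
    return size < expected_bytes * (1 - tolerance)
```

If `is_probably_truncated("ComfyUI/models/animatediff_models/mm_sd_v15_v2.ckpt")` returns `True`, delete the file and download it again from the canonical repo.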