What You'll Build
A local install of Waypoint 1.5, Overworld's 1.2B-parameter "real-time interactive video world model", running 720p generation driven by keyboard and mouse input on an RTX 5060 Ti. Controller input (button presses, mouse deltas) feeds the inference loop on every step; it is not a post-hoc edit.
Hardware data: RTX 5060 Ti (16GB VRAM) · 720p target resolution · See benchmark data
Variant note: The Waypoint 1.5 family ships two tiers: a 720p model for "desktop RTX 30 series through RTX 50 series cards" (this recipe) and a 360P fork for laptop GPUs. The RTX 5060 Ti is a desktop Blackwell card, so the 720p 1B checkpoint is the right target.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | NVIDIA RTX 30-series or later desktop card | RTX 5060 Ti (16GB) — pair not yet benchmarked, see /check/ |
| VRAM | Not stated in the model card; 1.2B parameters in BF16 are roughly 2.4GB of weights, which leaves headroom on a 16GB card for activations and caches | 16GB |
| RAM | 16GB system RAM | — |
| Storage | A few GB for BF16 safetensors + caches | — |
| Software | Python 3.10+, PyTorch with CUDA + BF16, HuggingFace diffusers (or world_engine) | — |
The official Waypoint-1.5-1B model card does not publish a VRAM figure, but it lists "Desktop RTX 30 Series and later" as supported. Its headline performance table shows the 1B model running 720p at 56 FPS on an RTX 5090 (unquantized BF16, 4-step) and 30 FPS on an RTX 3090 (w8a8 quantized). Those two cards belong to different generations (Blackwell and Ampere) and were measured at different precisions, so they are reference points rather than a bracket for the 5060 Ti, which has not been benchmarked.
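The headroom claim can be checked with back-of-envelope arithmetic. The sketch below covers weight storage only, and the bytes-per-parameter values are generic assumptions about each precision, not figures from the model card. Activations, caches, and any auxiliary components add to the total.

```python
# Rough weight footprint for a 1.2B-parameter model at different precisions.
# Covers weights only; runtime memory (activations, caches) is extra.
PARAMS = 1.2e9  # parameter count quoted for Waypoint 1.5

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit weights
    "int8": 1.0,  # 8-bit weights (the w8a8 tier)
    "fp4": 0.5,   # 4-bit weights, ignoring per-block scale factors
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


footprint = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in footprint.items():
    print(f"{name}: {gb:.1f} GB")
# bf16: 2.4 GB
# int8: 1.2 GB
# fp4: 0.6 GB
```

Even at BF16 the weights take up a small fraction of 16GB, so any memory pressure on this card is more likely to come from runtime state than from the checkpoint itself.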
Installation
Two paths cover the canonical entry points the Overworld team documents:
Path A — world_engine (recommended for interactive use)
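world_engine is Overworld's reference inference library, linked from the model card as the "Core Inference Library". The repository is cited here as Wayfarer-Labs/world_engine, while the install command points at Overworldai/world_engine; follow the link on the model card if either path fails to resolve. Per the official world_engine README: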
pip install --upgrade --ignore-installed \
"world_engine @ git+https://github.com/Overworldai/world_engine.git"
export HF_TOKEN=<your_huggingface_access_token>
The README lists three quantization-tier paths by GPU architecture:
- NVIDIA (30xx, 40xx, Ampere+) — INT8 quantization
- NVIDIA Ada Lovelace / Hopper+ (RTX 40xx, H100) — FP8
- NVIDIA Blackwell (B100, B200, RTX 5090) — NVFP4
The RTX 5060 Ti is a 50-series Blackwell card, so NVFP4 should be available on paper, although the README's Blackwell examples name only the B100, B200, and RTX 5090. Unquantized BF16 is the simplest first-run target because the example script passes quant=None.
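The three README tiers can be written as a lookup on CUDA compute capability. This is a minimal sketch: the function name is hypothetical, the thresholds are the standard NVIDIA capability numbers for each architecture, and the returned labels only describe the tier. They are not confirmed values for world_engine's quant argument.

```python
def readme_quant_tier(major: int, minor: int) -> str:
    """Map a CUDA compute capability to the quantization tier the README lists.

    Returns a descriptive label only (hypothetical helper, not a world_engine API).
    """
    if major >= 10:            # Blackwell: 10.x (B100/B200), 12.x (RTX 50-series)
        return "NVFP4"
    if (major, minor) >= (8, 9):  # Ada Lovelace (8.9) and Hopper (9.0)
        return "FP8"
    if major >= 8:             # Ampere (8.0, 8.6)
        return "INT8"
    return "none listed"       # older architectures are not covered by the README


for name, cap in [("RTX 3090", (8, 6)), ("RTX 4090", (8, 9)), ("RTX 5060 Ti", (12, 0))]:
    print(name, readme_quant_tier(*cap))
```

On a live system the capability tuple can be read with torch.cuda.get_device_capability(), which returns (major, minor) for the current device.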
Path B — HuggingFace diffusers (modular pipelines API)
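Per the official model card, Waypoint also ships as a ModularPipeline. Install PyTorch from the cu128 index first, then the latest diffusers:
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install --upgrade diffusers transformers accelerate safetensors
(cu128 wheels are the Blackwell-friendly build for a 5060 Ti. Installing torch first matters: if accelerate pulls in a torch build on its own, a later pip install torch finds the requirement already satisfied and leaves that build in place.)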
Running
Path A — world_engine with a scripted controller sequence
Adapted from examples/gen_sample.py in the world_engine repo:
# uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B
import sys
import cv2
import imageio.v3 as iio
import numpy as np
import torch
from world_engine import WorldEngine, CtrlInput
engine = WorldEngine(sys.argv[1], quant=None, device="cuda")
# Build a small controller programme: mouse look, jump, then walk W/A/D/S
controller_sequence = [
    CtrlInput(mouse=[0.2, 0.2]), CtrlInput(button={32}), CtrlInput(),
    CtrlInput(button={1, 32}), CtrlInput(),
]
controller_sequence += (
    [CtrlInput(button={87})] * 10 +  # W: forward (87, as in the diffusers example)
    [CtrlInput(button={65})] * 10 +  # A: left
    [CtrlInput(button={68})] * 10 +  # D: right
    [CtrlInput(button={83})] * 10    # S: back
)
# Seed frame: any image works, since it is resized to 1280x720 and converted to RGB
seed_frame = cv2.imread("starter.png")
seed_frame = cv2.cvtColor(cv2.resize(seed_frame, (1280, 720)), cv2.COLOR_BGR2RGB)
# Repeat the seed four times so it fills one 4-frame chunk
seed_frame_x4 = torch.from_numpy(np.repeat(seed_frame[None], 4, axis=0))
with iio.imopen("out.mp4", "w", plugin="pyav") as out:
    engine.append_frame(seed_frame_x4)
    out.write(seed_frame_x4, fps=60, codec="libx264")
    # Each gen_frame() call consumes one controller input and yields a chunk of frames
    for ctrl in controller_sequence:
        out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())
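Note the 4-frame chunking: gen_frame() returns four frames per call, which is why the seed frame is repeated four times before append_frame(). The "4-step" label in the model card's performance table appears to describe the sampling schedule rather than the chunk size, so the two numbers should not be read as the same setting.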
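The chunking has two practical consequences that are easy to work out: how long the scripted clip runs, and how much wall-clock time each call may take if generation is to keep up with 60 FPS playback. The sketch below assumes four frames per call, as stated above, and counts the controller inputs from the script.

```python
# Clip length and per-call time budget for the scripted rollout.
FRAMES_PER_CALL = 4       # frames returned by each gen_frame() call (per the note above)
FPS = 60                  # playback rate passed to out.write()
SEED_FRAMES = 4           # the seed frame repeated four times
N_CONTROLS = 5 + 4 * 10   # 5 opening inputs, then 10 each for forward, left, right, back

total_frames = SEED_FRAMES + N_CONTROLS * FRAMES_PER_CALL
clip_seconds = total_frames / FPS
call_budget_ms = 1000 * FRAMES_PER_CALL / FPS  # time per call to sustain real-time output

print(total_frames, round(clip_seconds, 2), round(call_budget_ms, 1))
# 184 3.07 66.7
```

In other words, the example produces about three seconds of video, and a card sustains real-time 60 FPS only if each call finishes in under roughly 67 ms.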
Path B — diffusers ModularPipeline
The model card ships this canonical snippet:
import torch
from diffusers.modular_pipelines import ModularPipeline
from diffusers.utils import load_image, export_to_video
pipe = ModularPipeline.from_pretrained(
"Overworld/Waypoint-1.5-1B", trust_remote_code=True
)
pipe.load_components(
device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True
)
pipe.transformer.apply_inference_patches()
pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False)
image = load_image(
"https://huggingface.co/spaces/Overworld/waypoint-1-small/resolve/main/starter_18.png"
).resize((1024, 512))
state = pipe(image=image, prompt="An explorable world",
button=set(), mouse=(0.0, 0.0), output_type="pil")
state.values["image"] = None
frames = []
for _ in range(150):
    state = pipe(state, button={87}, mouse=(0.0, 0.0), output_type="pil")
    frames.append(state.values["images"])
export_to_video(frames, "waypoint-v1-5.mp4", fps=60)
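button={87} is the W key (walk forward). Replace the fixed value with input from your own event loop for real-time, controllable rollouts. Note that the snippet resizes the starter image to 1024x512 rather than 1280x720; check the model card for the input size the 720p checkpoint expects.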
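One way to feed an input loop into either path is to translate held keys into the integer codes used in the examples above (32, 65, 68, 83, 87). The helper below is a hypothetical sketch, not part of world_engine or diffusers. It assumes the codes follow the usual uppercase-ASCII key numbering, which matches the values that appear in this recipe.

```python
# Hypothetical helper: turn held key names into the button set the examples use.
KEY_CODES = {
    "W": 87,      # forward
    "A": 65,      # left
    "S": 83,      # back
    "D": 68,      # right
    "SPACE": 32,  # jump
}


def buttons_for(held_keys):
    """Return the set of button codes for the given key names.

    Names are matched case-insensitively; unknown keys are ignored.
    """
    return {KEY_CODES[k.upper()] for k in held_keys if k.upper() in KEY_CODES}


print(sorted(buttons_for(["w", "d"])))  # [68, 87]
print(buttons_for([]))                  # set()
```

The resulting set has the same shape as the button={87} argument in the diffusers loop and the button={...} argument to CtrlInput, so it could be passed to either once per step.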
Results
- Speed: No community benchmark on RTX 5060 Ti yet. The model card publishes only two reference data points: 56 FPS at 720p on RTX 5090 (unquantized, 4-step) and 30 FPS at 720p on RTX 3090 (w8a8 quantized, 4-step). They come from different GPU generations and different precisions, so treat them as context rather than as bounds on 5060 Ti throughput. Once a community submission lands it appears at /check/waypoint-1-5/rtx-5060-ti. If you run it, please submit your numbers.
- VRAM usage: The model card does not state a VRAM figure. The 1.2B weights occupy roughly 2.4GB in BF16, which leaves headroom on a 16GB 5060 Ti for activations and caches. Live measurements: /check/waypoint-1-5/rtx-5060-ti.
- Latency target: Family-level target is "up to 720p and 60 FPS" with a 512-frame context window (model card). At 60 FPS that is 512 / 60, or about 8.5 seconds of rollout.
- Quality notes: Waypoint is "a generative world model, not a simulator with guaranteed physical accuracy" — design priorities are "Real-time interaction rather than offline batch generation, Low-latency responsiveness to user inputs, Local execution on consumer hardware, Persistent world rollouts where coherence across time matters as much as single-frame fidelity" (model card).
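Since no 5060 Ti numbers exist yet, a small timing harness makes a submission easier to reproduce. The function below is a hypothetical sketch: it times any zero-argument callable, and the default of four frames per call follows the world_engine note earlier in this recipe. Adjust it to whatever one call returns in your setup.

```python
import time


def measure_fps(step, n_calls=50, frames_per_call=4, warmup=5, sync=None):
    """Time `step` and return generated frames per second.

    `sync` is an optional callable that blocks until queued GPU work is done,
    so that asynchronous kernel launches are not mistaken for finished frames.
    """
    for _ in range(warmup):  # exclude compilation and cache warm-up from the timing
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_calls):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return n_calls * frames_per_call / elapsed


# Stand-in workload so the sketch runs without a GPU: about 1 ms per call.
demo_fps = measure_fps(lambda: time.sleep(0.001), n_calls=10, warmup=1)
print(f"{demo_fps:.0f} frames/s")
```

With world_engine, step could wrap engine.gen_frame(ctrl=CtrlInput()) and sync could be torch.cuda.synchronize. Report the quantization setting alongside the number, since the two model card figures differ on exactly that.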
Troubleshooting
Frame rate is far below the model card's headline 56/30 FPS
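The 5090 figure is unquantized BF16 and the 3090 figure is w8a8 quantized, so neither maps directly onto a 5060 Ti. If unquantized BF16 is too slow, switch to one of the world_engine quantized paths: the README lists INT8 for "NVIDIA (30xx, 40xx, Ampere+)" and NVFP4 for Blackwell. In the diffusers path, confirm that pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False) is enabled. The model card snippet includes it, and without it the transformer runs uncompiled. Expect the first calls after compiling to be slow while autotuning runs, and exclude them from any FPS measurement.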
HF_TOKEN errors / 401 on download
Per the world_engine README, run export HF_TOKEN=<your_huggingface_access_token> before the first run, because Overworld checkpoints are gated. Accept the licence on the model card first, using the same account that issued the token.
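A quick local check can rule out the simplest cause before retrying the download. The helper below is a hypothetical sketch. The hf_ prefix test is a heuristic based on how Hugging Face user access tokens are typically formatted, and passing it does not prove the token is valid or that the licence has been accepted.

```python
import os


def hf_token_problem(env=os.environ):
    """Return a short description of an obvious HF_TOKEN problem, or None."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "HF_TOKEN is not set in this shell"
    if not token.startswith("hf_"):  # heuristic only: access tokens usually start with hf_
        return "HF_TOKEN is set but does not look like a Hugging Face access token"
    return None


problem = hf_token_problem()
print(problem or "HF_TOKEN looks plausible; if downloads still fail, check licence acceptance")
```

If this passes and the download still returns 401, the remaining suspects are an expired or revoked token and a licence that was accepted under a different account.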
Confusion with the 360P variant or other "Waypoint" projects
Only the Overworld/Waypoint-1.5-1B repo (720p) and its Overworld/Waypoint-1.5-1B-360P sibling (laptop tier) are the canonical world-model weights. Unrelated "Waypoint" libraries (e.g. game-dev navigation, robotics path planning) are different projects — don't conflate them.
CUDA-wheel mismatch on Blackwell (5060 Ti)
The 5060 Ti is Blackwell-class, and Blackwell support arrived with CUDA 12.8. Use the cu128 PyTorch wheels (pip install torch --index-url https://download.pytorch.org/whl/cu128) rather than a cu121 or cu126 build; the older builds carry no kernels for this architecture and fail at kernel launch.
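To confirm which build is actually installed, compare the CUDA version PyTorch reports against 12.8. The function name below is hypothetical; torch.version.cuda, torch.cuda.is_available(), and torch.cuda.get_device_capability() are standard PyTorch calls. The torch import is guarded so the sketch also runs where PyTorch is missing.

```python
def cuda_build_supports_blackwell(cuda_version):
    """True if a PyTorch CUDA build string such as '12.8' is at least 12.8.

    CPU-only builds report None, which counts as unsupported.
    """
    if not cuda_version:
        return False
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (12, 8)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("CUDA build:", torch.version.cuda)
        print("Blackwell-capable build:", cuda_build_supports_blackwell(torch.version.cuda))
        if torch.cuda.is_available():
            # RTX 50-series cards are expected to report (12, 0) here.
            print("Compute capability:", torch.cuda.get_device_capability(0))
```

If the build string is below 12.8, uninstall torch and reinstall it from the cu128 index; upgrading the other packages will not replace it.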
For other issues, file a report via the submission form.