Waypoint 1.5 on RTX 5060 Ti: Real-Time Interactive World Model at 720p

What You'll Build

A local install of Waypoint 1.5, Overworld's 1.2B-parameter real-time interactive video world model, running 720p generation driven by keyboard and mouse input on an RTX 5060 Ti. The controller loop (button presses, mouse deltas) is part of the inference loop, not a post-hoc edit.

Hardware data: RTX 5060 Ti (16GB VRAM) · 720p target resolution · See benchmark data

Variant note: The Waypoint 1.5 family ships two tiers: a 720p model for "desktop RTX 30 series through RTX 50 series cards" (this recipe) and a 360p variant for laptop GPUs. The RTX 5060 Ti is a desktop Blackwell card, so the 720p checkpoint (Overworld/Waypoint-1.5-1B) is the right target.

Requirements

Component	Minimum	Tested
GPU	NVIDIA RTX 30-series or later desktop card	RTX 5060 Ti (16GB) — pair not yet benchmarked, see /check/
VRAM	Not stated in the model card; the 1.2B BF16 weights leave generous headroom on a 16GB card	16GB
RAM	16GB system RAM	—
Storage	A few GB for BF16 safetensors + caches	—
Software	Python 3.10+, PyTorch with CUDA + BF16, HuggingFace `diffusers` (or `world_engine`)	—

The official Waypoint-1.5-1B model card does not publish a VRAM figure. It does state support for "Desktop RTX 30 Series and later", and its headline performance table shows the 1B model running 720p at 56 FPS on an RTX 5090 (unquantized BF16, 4-step) and 30 FPS on an RTX 3090 (w8a8 quantized). The 5060 Ti falls within that supported range, but the two reference points come from different GPU generations and quantization settings, so they do not pin down its throughput.
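The "generous headroom" claim can be checked with back-of-envelope arithmetic. The sketch below assumes 1.2B parameters stored at 2 bytes each (BF16) and counts weights only; activations, caches, and any auxiliary components are not included, so real usage will be higher. The function name is illustrative and not part of any library.

```python
def weight_footprint_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Size of the raw weights in GiB (BF16 stores 2 bytes per parameter)."""
    return n_params * bytes_per_param / 2**30


PARAMS = 1.2e9          # parameter count quoted in this recipe
CARD_VRAM_GIB = 16      # RTX 5060 Ti 16GB, treated here as 16 GiB

weights_gib = weight_footprint_gib(PARAMS)
share_of_card = weights_gib / CARD_VRAM_GIB

print(f"BF16 weights: {weights_gib:.2f} GiB "
      f"({share_of_card:.0%} of a {CARD_VRAM_GIB} GiB card)")
```

Under these assumptions the weights alone take roughly 2.2 GiB, which is why the figure above leaves room for everything the estimate ignores.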

Installation

Two paths cover the canonical entry points the Overworld team documents:

Path A — `world_engine` (recommended for interactive use)

world_engine is Overworld's reference inference library, linked from the model card as the "Core Inference Library". Per the official Wayfarer-Labs/world_engine README:

pip install --upgrade --ignore-installed \
  "world_engine @ git+https://github.com/Overworldai/world_engine.git"
export HF_TOKEN=<your_huggingface_access_token>

The README lists three quantization-tier paths by GPU architecture:

NVIDIA (30xx, 40xx, Ampere+) — INT8 quantization
NVIDIA Ada Lovelace / Hopper+ (RTX 40xx, H100) — FP8
NVIDIA Blackwell (B100, B200, RTX 5090) — NVFP4

The RTX 5060 Ti is Blackwell-class (50-series), so on paper NVFP4 is available; unquantized BF16 is the simplest first-run target since quant=None is the example default.
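As a rough illustration of the tier list above, the mapping from CUDA compute capability to the README's tier labels might be sketched as follows. The function name and the returned labels are hypothetical: they mirror the README wording and are not values that world_engine's quant argument is known to accept. The capability numbers (8.x for Ampere, 8.9 for Ada Lovelace, 9.0 for Hopper, 10.x and 12.x for Blackwell) are NVIDIA's published values.

```python
from typing import Optional, Tuple


def readme_quant_tier(capability: Tuple[int, int]) -> Optional[str]:
    """Return the highest README tier label for a (major, minor) capability.

    The labels are descriptive only; check the world_engine README for the
    actual value to pass as `quant`.
    """
    major, minor = capability
    if major >= 10:              # Blackwell: B100/B200 (10.x), RTX 50-series (12.x)
        return "NVFP4"
    if (major, minor) >= (8, 9):  # Ada Lovelace (8.9) and Hopper (9.0)
        return "FP8"
    if major == 8:               # Ampere (8.0, 8.6, 8.7)
        return "INT8"
    return None                  # older than the README's listed tiers


for name, cap in [("RTX 3090", (8, 6)), ("RTX 4090", (8, 9)), ("RTX 5060 Ti", (12, 0))]:
    print(f"{name}: {readme_quant_tier(cap)}")
```

On a machine with PyTorch installed, torch.cuda.get_device_capability(0) returns the tuple this function expects.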

Path B — HuggingFace `diffusers` (modular pipelines API)

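Per the official model card, Waypoint also ships as a ModularPipeline. Install PyTorch from the cu128 index first, so that the second command does not pull in a default torch build as a dependency, then install the latest diffusers:

pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install --upgrade diffusers transformers accelerate safetensors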

(cu128 wheels are the Blackwell-friendly index for a 5060 Ti.)

Running

Path A — `world_engine` with a scripted controller sequence

Adapted from examples/gen_sample.py in the world_engine repo:

# uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B
import sys
import cv2
import numpy as np
import imageio.v3 as iio
import torch
from world_engine import WorldEngine, CtrlInput

engine = WorldEngine(sys.argv[1], quant=None, device="cuda")

# Build a small controller sequence: mouse move, jump (32), then walk W/A/D/S
controller_sequence = [
    CtrlInput(mouse=[0.2, 0.2]), CtrlInput(button={32}), CtrlInput(),
    CtrlInput(button={1, 32}), CtrlInput(),
]
controller_sequence += (
    [CtrlInput(button={87})] * 10 +  # W: forward
    [CtrlInput(button={65})] * 10 +  # A: left
    [CtrlInput(button={68})] * 10 +  # D: right
    [CtrlInput(button={83})] * 10    # S: back
)

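# Seed frame: any image works, since it is resized to 1280x720 and converted to RGB
seed_frame = cv2.imread("starter.png")
if seed_frame is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("starter.png is missing or unreadable")
seed_frame = cv2.cvtColor(cv2.resize(seed_frame, (1280, 720)), cv2.COLOR_BGR2RGB)
# Repeat the frame four times to form the initial 4-frame chunk
seed_frame_x4 = torch.from_numpy(np.repeat(seed_frame[None], 4, axis=0))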

with iio.imopen("out.mp4", "w", plugin="pyav") as out:
    engine.append_frame(seed_frame_x4)
    out.write(seed_frame_x4, fps=60, codec="libx264")
    for ctrl in controller_sequence:
        out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())

Note the 4-frame chunking: the seed is supplied as four copies of the starter image, and gen_frame() returns four frames per call, so each controller input spans four frames of the 60 FPS output.
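The chunking has a practical consequence for how long the scripted clip runs. The arithmetic below is a sketch that takes the four-frames-per-call figure stated above as given (it is not verified against the library) and counts the 45 controller inputs in the example script. It also shows how many calls would fill the 512-frame context window the model card cites.

```python
FRAMES_PER_CALL = 4    # assumption: taken from the note above, not measured
SEED_FRAMES = 4        # the starter image repeated four times
FPS = 60               # frame rate the example passes to the video writer
CONTEXT_FRAMES = 512   # context window quoted from the model card

n_inputs = 5 + 4 * 10  # 5 opening inputs plus 10 each for W, A, D, S

total_frames = SEED_FRAMES + n_inputs * FRAMES_PER_CALL
duration_s = total_frames / FPS
seconds_per_input = FRAMES_PER_CALL / FPS
calls_to_fill_context = CONTEXT_FRAMES // FRAMES_PER_CALL

print(f"{total_frames} frames, {duration_s:.2f} s of video")
print(f"each input covers {seconds_per_input * 1000:.1f} ms")
print(f"{calls_to_fill_context} calls fill the context window")
```

So the example produces about three seconds of footage, and a ten-input key hold lasts roughly two thirds of a second on screen.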

Path B — `diffusers` ModularPipeline

The model card ships this canonical snippet:

import torch
from diffusers.modular_pipelines import ModularPipeline
from diffusers.utils import load_image, export_to_video

pipe = ModularPipeline.from_pretrained(
    "Overworld/Waypoint-1.5-1B", trust_remote_code=True
)
pipe.load_components(
    device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True
)
pipe.transformer.apply_inference_patches()
pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False)

image = load_image(
    "https://huggingface.co/spaces/Overworld/waypoint-1-small/resolve/main/starter_18.png"
).resize((1024, 512))

state = pipe(image=image, prompt="An explorable world",
             button=set(), mouse=(0.0, 0.0), output_type="pil")

state.values["image"] = None
frames = []
for _ in range(150):
    state = pipe(state, button={87}, mouse=(0.0, 0.0), output_type="pil")
    frames.append(state.values["images"])

export_to_video(frames, "waypoint-v1-5.mp4", fps=60)

button={87} is the W key (walk forward). Replace with your input loop for real-time controllable rollouts.
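When wiring up your own input loop, a small helper that turns key names into button codes can keep the call sites readable. This is a minimal sketch, assuming the button IDs follow the uppercase ASCII codes that the examples here suggest (87 for W, 65 for A, 83 for S, 68 for D, 32 for Space). That mapping is inferred from the snippets, not from documented API behaviour, and the helper name is hypothetical. Mouse buttons are not covered.

```python
from typing import Iterable, Set

# Friendly names for keys that are awkward to type as a single character.
KEY_ALIASES = {"space": " "}


def buttons_from_keys(keys: Iterable[str]) -> Set[int]:
    """Map key names such as "w" or "space" to integer button codes."""
    codes = set()
    for key in keys:
        char = KEY_ALIASES.get(key.lower(), key)
        if len(char) != 1:
            raise ValueError(f"unsupported key name: {key!r}")
        codes.add(ord(char.upper()))  # assumed: code == uppercase ASCII value
    return codes


print(buttons_from_keys(["w"]))           # {87}
print(sorted(buttons_from_keys(["space", "a"])))  # [32, 65]
```

The resulting set can be passed as the button argument in either path, for example button=buttons_from_keys(held_keys), where held_keys is whatever your input library reports as currently pressed.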

Results

Speed: No community benchmark on RTX 5060 Ti yet. The model card publishes only two reference data points: 56 FPS at 720p on RTX 5090 (unquantized, 4-step) and 30 FPS at 720p on RTX 3090 (w8a8 quantized, 4-step). Those two figures use different quantization settings, so they are not a reliable bracket for the 5060 Ti; treat its throughput as unmeasured. Once a community submission lands it appears at /check/waypoint-1-5/rtx-5060-ti. If you run it, please submit your numbers.
VRAM usage: The model card does not state a VRAM figure; 1.2B BF16 weights leave generous headroom on a 16GB 5060 Ti. Live measurements: /check/waypoint-1-5/rtx-5060-ti.
Latency target: Family-level target is "up to 720p and 60 FPS" with a 512-frame context window, which is about 8.5 seconds of rollout at 60 FPS (model card).
Quality notes: Waypoint is "a generative world model, not a simulator with guaranteed physical accuracy" — design priorities are "Real-time interaction rather than offline batch generation, Low-latency responsiveness to user inputs, Local execution on consumer hardware, Persistent world rollouts where coherence across time matters as much as single-frame fidelity" (model card).
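To collect a throughput number worth submitting, a small timing harness along these lines may help. It is a sketch, not an official benchmark: measure_fps and its parameters are hypothetical names, the warm-up count is arbitrary, and the optional sync callback exists because CUDA work is asynchronous and the clock should only stop once the GPU has finished.

```python
import time
from typing import Callable, Optional


def measure_fps(
    step: Callable[[], object],
    n_calls: int = 50,
    frames_per_call: int = 1,
    warmup: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Time n_calls invocations of step() and return frames per second."""
    for _ in range(warmup):      # let compilation and caches settle first
        step()
    if sync is not None:
        sync()                   # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(n_calls):
        step()
    if sync is not None:
        sync()                   # wait for the last step to actually finish
    elapsed = time.perf_counter() - start
    return n_calls * frames_per_call / elapsed


# Stand-in workload so the sketch runs without a GPU.
demo_fps = measure_fps(lambda: time.sleep(0.01), n_calls=10, warmup=2)
print(f"demo workload: {demo_fps:.1f} FPS")
```

With the world_engine example, the call might look like measure_fps(lambda: engine.gen_frame(ctrl=CtrlInput()), frames_per_call=4, sync=torch.cuda.synchronize), where frames_per_call=4 follows the chunking described earlier.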

Troubleshooting

Frame rate is far below the model card's headline 56/30 FPS

The 5090 number is unquantized BF16; the 3090 number is w8a8 quantized. If unquantized BF16 is too slow on a 5060 Ti, try the world_engine INT8 path (the repo lists INT8 quantization for "NVIDIA (30xx, 40xx, Ampere+)"). In the diffusers path, also confirm that pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False) is enabled. The first call is slow while kernels are compiled and tuned, so judge frame rate only after warm-up.

`HF_TOKEN` errors / 401 on download

Per the world_engine README, you must run export HF_TOKEN=<your_huggingface_access_token> before the first run because Overworld checkpoints are gated. Accept the licence on the model card first, then set the token in the same shell that launches the script.
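A quick pre-flight check can catch the two most common local mistakes before any download starts: the variable is unset, or it still holds the placeholder text copied from the instructions. The helper below is a hypothetical sketch that only inspects the environment; it does not contact the Hub, so it cannot tell whether the token is valid or whether the licence has been accepted.

```python
import os
from typing import Mapping, Optional


def hf_token_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of the problem, or None if HF_TOKEN looks usable."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "HF_TOKEN is not set; export it before the first run."
    if token.startswith("<") and token.endswith(">"):
        return "HF_TOKEN still holds the placeholder text, not a real token."
    return None


problem = hf_token_problem()
print(problem or "HF_TOKEN is present.")
```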

Confusion with the 360P variant or other "Waypoint" projects

Only the Overworld/Waypoint-1.5-1B repo (720p) and its Overworld/Waypoint-1.5-1B-360P sibling (laptop tier) are the canonical world-model weights. Unrelated "Waypoint" libraries (e.g. game-dev navigation, robotics path planning) are different projects — don't conflate them.

CUDA-wheel mismatch on Blackwell (5060 Ti)

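The 5060 Ti is Blackwell-class. Use the cu128 PyTorch wheels (pip install torch --index-url https://download.pytorch.org/whl/cu128) rather than an older cu121 / cu126 build. Wheels built against older CUDA versions do not include kernels for this architecture, which typically surfaces as a "no kernel image is available for execution on the device" error at the first kernel launch.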
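To confirm which build is actually installed, the following sketch compares the GPU's compute capability with the architectures the installed wheel was compiled for. It uses torch.cuda.get_device_capability, torch.cuda.get_arch_list, and torch.version.cuda, all of which exist in PyTorch; the helper names are illustrative, and the exact-match test is a simplification that ignores PTX forward compatibility.

```python
from typing import Iterable, Tuple


def arch_supported(arch_list: Iterable[str], capability: Tuple[int, int]) -> bool:
    """True if the wheel ships kernels for exactly this compute capability."""
    target = f"sm_{capability[0]}{capability[1]}"
    return target in set(arch_list)


def describe_install() -> str:
    """Summarise the local PyTorch build; torch is imported lazily."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} sees no CUDA device"
    cap = torch.cuda.get_device_capability(0)
    ok = arch_supported(torch.cuda.get_arch_list(), cap)
    return (f"torch {torch.__version__} (CUDA {torch.version.cuda}), "
            f"device sm_{cap[0]}{cap[1]}, matching kernels in wheel: {ok}")


# An RTX 50-series card reports capability (12, 0), i.e. sm_120.
print(arch_supported(["sm_80", "sm_86", "sm_90"], (12, 0)))   # False
print(arch_supported(["sm_90", "sm_100", "sm_120"], (12, 0)))  # True
```

Calling describe_install() on the target machine prints a one-line summary; if the last field is False, reinstall torch from the cu128 index.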

For other issues, file a report via the submission form.