What You'll Build
A local install of Waypoint 1.5 — Overworld's 1.2B-parameter real-time interactive video world model — running 720p generation driven by keyboard and mouse input on an RTX 4090. Waypoint is a "real-time interactive video world model" — the controller loop (button presses, mouse deltas) is part of the inference loop, not a post-hoc edit.
Hardware data: RTX 4090 (24GB VRAM) · 720p target resolution · See benchmark data
ℹ️ Not an image-to-3D-mesh model. Despite being grouped in our
3dvertical (which spans both classical mesh-output models like TRELLIS / Hunyuan3D and the newer generative-worlds class), Waypoint 1.5 does not produce.obj/.glb/.plymesh files. It produces an interactive video stream conditioned on live controller inputs, with success measured in frames-per-second and rollout coherence rather than mesh topology. If you need a mesh output, see TRELLIS or Hunyuan3D-2 on this GPU instead.
Variant note: The Waypoint 1.5 family ships two tiers: a 720p model for "desktop RTX 30 series through RTX 50 series cards" (this recipe) and a 360P fork for laptop GPUs and Apple Silicon. The RTX 4090 is a desktop Ada Lovelace card, so the 720p 1B checkpoint is the right target.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | NVIDIA RTX 30-series or later desktop card | RTX 4090 (24GB) — pair not yet benchmarked, see /check/ |
| VRAM | Not stated in the model card; the 1.2B BF16 weights are ~11.18 GB on disk per the HF tree API (3.72 GB fused model.safetensors + 7.44 GB modular transformer/diffusion_pytorch_model.safetensors + 22.76 MB VAE), so the 24 GB envelope of a 4090 has ample headroom even before quantization | 24GB |
| RAM | 16GB system RAM | — |
| Storage | ~12 GB for BF16 safetensors + caches | — |
| Software | Python 3.10+, PyTorch with CUDA + BF16, HuggingFace diffusers (or world_engine) | — |
The official Waypoint-1.5-1B model card does not publish an explicit VRAM number. It does explicitly support "Desktop RTX 30 Series and later" and publishes a headline performance table with the 1B BF16 model running 720p at 56 FPS on an RTX 5090 (unquantized, 4-step) and 30 FPS on an RTX 3090 (w8a8 quantized, 4-step). The RTX 4090 — Ada Lovelace (sm_89), 24 GB, ~1008 GB/s memory bandwidth — sits squarely inside the supported envelope between those two reference points. The 24 GB VRAM matches a 5090 and is well above the 11.18 GB on-disk BF16 footprint.
Installation
Two paths cover the canonical entry points the Overworld team documents:
Path A — world_engine (recommended for interactive use)
world_engine is Overworld's reference inference library, linked from the model card as the "Core Inference Library". Per the official Wayfarer-Labs/world_engine README:
python3 -m venv .env
source .env/bin/activate
pip install --upgrade --ignore-installed \
"world_engine @ git+https://github.com/Overworldai/world_engine.git"
export HF_TOKEN=<your_huggingface_access_token>
The README lists three inference quantization paths by GPU architecture:
| Config | Description | Supported GPUs (per the README) |
|---|---|---|
intw8a8 | INT8 weights + INT8 dynamic per-token activations | NVIDIA (30xx, 40xx, Ampere+) |
fp8w8a8 | FP8 (e4m3) weights + FP8 per-tensor activations via torch._scaled_mm | NVIDIA Ada Lovelace / Hopper+ (RTX 40xx, H100) |
nvfp4 | NVFP4 weights + FP4 activations via FlashInfer/CUTLASS | NVIDIA Blackwell (B100, B200, RTX 5090) |
The RTX 4090 is Ada Lovelace (sm_89), so the recommended fast paths are fp8w8a8 (Ada has 4th-gen Tensor Cores with native FP8 support) or intw8a8 (broadly compatible). The Blackwell-only nvfp4 path does not apply on a 4090 — it requires sm_120. Start with quant=None (unquantized BF16) since the 24 GB envelope has enough headroom, then drop to fp8w8a8 for throughput.
Path B — HuggingFace diffusers (modular pipelines API)
Per the official model card, Waypoint also ships as a ModularPipeline. Install the latest diffusers against a standard CUDA wheel — Ada is fully covered by the default pip install torch (no special index needed, unlike Blackwell which requires cu128):
pip install --upgrade diffusers transformers accelerate safetensors
pip install torch # default cu124/cu126 index is fine on Ada sm_89
Running
Path A — world_engine with a scripted controller sequence
Adapted verbatim from examples/gen_sample.py in the world_engine repo:
# uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B
import cv2, sys, json, random, urllib.request
import numpy as np
import imageio.v3 as iio
import torch
from world_engine import WorldEngine, CtrlInput
engine = WorldEngine(sys.argv[1], quant=None, device="cuda")
# Build a small controller programme: mouse, jump, walk W/A/S/D
controller_sequence = [
CtrlInput(mouse=[0.2, 0.2]), CtrlInput(button={32}), CtrlInput(),
CtrlInput(button={1, 32}), CtrlInput(),
]
controller_sequence += (
[CtrlInput(button={32})] * 10 + # forward
[CtrlInput(button={65})] * 10 + # A — left
[CtrlInput(button={68})] * 10 + # D — right
[CtrlInput(button={83})] * 10 # S — back
)
# Seed frame (any 1280x720 RGB image works)
seed_frame = cv2.imread("starter.png")
seed_frame = cv2.cvtColor(cv2.resize(seed_frame, (1280, 720)), cv2.COLOR_BGR2RGB)
seed_frame_x4 = torch.from_numpy(np.repeat(seed_frame[None], 4, axis=0))
with iio.imopen("out.mp4", "w", plugin="pyav") as out:
engine.append_frame(seed_frame_x4)
out.write(seed_frame_x4, fps=60, codec="libx264")
for ctrl in controller_sequence:
out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())
Note the 4-frame chunking — gen_frame() returns four frames per call (the world_engine README explicitly documents this: Waypoint-1.5 "applies temporal compression and generates 4 frames for every controller input" of shape [4, 720, 1280, 3]), matching the 60 FPS / 4-step schedule the model card cites.
Path B — diffusers ModularPipeline
The model card ships this canonical snippet:
import torch
from diffusers.modular_pipelines import ModularPipeline
from diffusers.utils import load_image, export_to_video
pipe = ModularPipeline.from_pretrained(
"Overworld/Waypoint-1.5-1B", trust_remote_code=True
)
pipe.load_components(
device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True
)
pipe.transformer.apply_inference_patches()
pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False)
image = load_image(
"https://huggingface.co/spaces/Overworld/waypoint-1-small/resolve/main/starter_18.png"
).resize((1024, 512))
state = pipe(image=image, prompt="An explorable world",
button=set(), mouse=(0.0, 0.0), output_type="pil")
state.values["image"] = None
frames = []
for _ in range(150):
state = pipe(state, button={87}, mouse=(0.0, 0.0), output_type="pil")
frames.append(state.values["images"])
export_to_video(frames, "waypoint-v1-5.mp4", fps=60)
button={87} is the W key (walk forward). Replace with your input loop for real-time controllable rollouts.
Results
- Speed: No community benchmark exists on RTX 4090 yet. The model card publishes two reference data points — 56 FPS at 720p on RTX 5090 (unquantized BF16, 4-step) and 30 FPS at 720p on RTX 3090 (w8a8 quantized, 4-step). Neither figure can be relabeled as a 4090 measurement: the 5090 is Blackwell (sm_120, GDDR7) and the 3090 is Ampere (sm_86, GDDR6X), both cross-generation from Ada Lovelace's sm_89. The 4090's actual throughput on this model is unknown until a community submission lands at /check/waypoint-1-5/rtx-4090. If you run it, please submit your numbers.
- VRAM usage: The model card does not state a VRAM figure. As a derived envelope: the BF16 weights are ~11.18 GB on disk per the HF tree API (3.72 GB fused + 7.44 GB modular transformer + 22.76 MB VAE), so the 24 GB envelope of a 4090 has ample headroom for activations, the 512-frame context window, and the autoencoder pipeline even at unquantized BF16. Live measurements: /check/waypoint-1-5/rtx-4090.
- Latency target: Family-level target is "up to 720p and 60 FPS" with a 512-frame context window — about 10 seconds of rollout at 60 FPS (model card).
- Quality notes: Waypoint is "a generative world model, not a simulator with guaranteed physical accuracy" — design priorities are "Real-time interaction rather than offline batch generation, Low-latency responsiveness to user inputs, Local execution on consumer hardware, Persistent world rollouts where coherence across time matters as much as single-frame fidelity" (model card).
For the full benchmark data, see /check/waypoint-1-5/rtx-4090.
Troubleshooting
Frame rate is below your expectation
The model card's two reference data points (56 FPS 5090 BF16 / 30 FPS 3090 w8a8) cover the boundaries of Overworld's "RTX 30 Series and later" support envelope. On a 4090 you have two levers per the world_engine README: try quant="fp8w8a8" (Ada has 4th-gen Tensor Cores with native FP8 support, distinct from the Blackwell-only nvfp4 path) or quant="intw8a8" (the broadly-compatible Ampere+ path). Also confirm pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False) is enabled in the diffusers path — the model card snippet includes it for a reason.
HF_TOKEN errors / 401 on download
Per the world_engine README, world_engine requires export HF_TOKEN=<your_huggingface_access_token> before the first run. Accept the licence on the model card first, then re-export your token.
Confusion with the 360P variant, the SPar3D successor rumour, or other "Waypoint" projects
Only the Overworld/Waypoint-1.5-1B repo (720p) and its Overworld/Waypoint-1.5-1B-360P sibling (laptop tier) are the canonical world-model weights. Despite the name overlap, this is not an image-to-3D mesh model and not a SPar3D successor — spar3d/Waypoint-1.5 does not resolve on the Hub. Unrelated "Waypoint" libraries (e.g. game-dev navigation, robotics path planning) are different projects — don't conflate them.
Picking a quant path on Ada (RTX 4090)
The world_engine README maps nvfp4 to Blackwell only (B100, B200, RTX 5090). On a 4090, do not attempt the nvfp4 path — it requires sm_120 kernels via FlashInfer/CUTLASS that the Ada Tensor Cores do not implement. Use unquantized BF16 as the default (24 GB has the headroom), fp8w8a8 for a throughput uplift (Ada's native FP8), or intw8a8 if you want the most broadly-compatible quantized path.
For other issues, file a report via the submission form.