Waypoint 1.5 on RTX 4080: Real-Time Interactive World Model at 720p

What You'll Build

A local install of Waypoint 1.5, Overworld's 1.2B-parameter real-time interactive video world model, running 720p generation driven by keyboard and mouse input on an RTX 4080 (16GB). Because Waypoint is an interactive world model, the controller loop (button presses, mouse deltas) is part of the inference loop rather than a post-hoc edit: every generation step is conditioned on the current input.

Hardware data: RTX 4080 (16GB VRAM) · 720p target resolution · See benchmark data

ℹ️ Not an image-to-3D-mesh model. Despite being grouped in our 3D vertical (which spans both classical mesh-output models like TRELLIS / Hunyuan3D and the newer generative-worlds class), Waypoint 1.5 does not produce .obj / .glb / .ply mesh files. It produces an interactive video stream conditioned on live controller inputs, with success measured in frames-per-second and rollout coherence rather than mesh topology. If you need a mesh output, see TRELLIS or Hunyuan3D-2.1 on this GPU instead.

Variant note: The Waypoint 1.5 family ships two tiers: a 720p model for higher-performance systems (this recipe) and a 360p fork for "a broader range of gaming PCs and Apple Silicon Macs." The RTX 4080 is a desktop Ada Lovelace card squarely inside the 720p model's "Desktop RTX 30 Series and later" support window, so the 720p 1B checkpoint is the right target.

Requirements

Component	Minimum	Tested
GPU	NVIDIA RTX 30-series or later desktop card	RTX 4080 (16GB) — pair not yet benchmarked, see /check/
VRAM	Not stated on the model card. The checkpoint files total ~11.19 GB on disk per the HF tree API (3.72 GB fused `model.safetensors` + 7.44 GB modular `transformer/diffusion_pytorch_model.safetensors` + 22.76 MB VAE). The fused and modular files appear to back the two loading paths below, so a single run should load less than this total; the 4080's 16 GB leaves room for activations and the 512-frame context, and the w8a8 quant paths (below) reduce the footprint further.	16GB
RAM	16GB system RAM	—
Storage	~12 GB for BF16 safetensors + caches	—
Software	Python 3.10+, PyTorch with CUDA + BF16, HuggingFace `diffusers` (or `world_engine`)	—

The official Waypoint-1.5-1B model card does not publish an explicit VRAM number. It explicitly supports "Desktop RTX 30 Series and later," and its published performance table cites reference data points on an RTX 3090 and an RTX 5090. The RTX 4080 (Ada Lovelace, sm_89, 16 GB GDDR6X, ~716.8 GB/s memory bandwidth) sits inside that supported envelope, generationally between those two reference cards. With the checkpoint at ~11.19 GB on disk and w8a8 quantization available as a memory-and-throughput lever, the 16 GB envelope is comfortable rather than vast; see the quant guidance in Installation.
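The VRAM row mixes two different quantities: on-disk checkpoint size and in-memory weight size. A quick back-of-envelope separates them, assuming the 1.2B parameter count quoted in this recipe, 2 bytes per parameter for BF16 and 1 byte per parameter for the w8a8 paths. Activations, the 512-frame context and CUDA/compile overheads are deliberately not modelled, and the helper name is illustrative.

```python
# Back-of-envelope weight footprint for Waypoint-1.5-1B on a 16 GB RTX 4080.
# Assumptions: 1.2B parameters (as quoted in this recipe), 2 bytes/param for
# BF16, 1 byte/param for the w8a8 paths. Activations and the 512-frame
# context are NOT modelled; they must fit in the remaining headroom.

PARAMS = 1.2e9   # parameter count quoted above
CARD_GB = 16.0   # nominal RTX 4080 VRAM, treated as decimal GB for simplicity

BYTES_PER_PARAM = {"bf16": 2, "fp8w8a8": 1, "intw8a8": 1}

# On-disk checkpoint files from the HF tree API figures (decimal GB):
# fused model.safetensors + modular transformer + VAE (22.76 MB)
ON_DISK_GB = 3.72 + 7.44 + 22.76 / 1000


def weights_gb(dtype: str, params: float = PARAMS) -> float:
    """Raw weight storage in decimal GB for one copy of the weights."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    w = weights_gb(dtype)
    print(f"{dtype:8s} weights ~{w:.1f} GB, headroom ~{CARD_GB - w:.1f} GB")
print(f"on-disk checkpoint total ~{ON_DISK_GB:.2f} GB")
```

One copy of 1.2B BF16 parameters comes to about 2.4 GB, well below the on-disk total (the script prints 11.18 GB from the rounded per-file sizes). Treat the on-disk figure as an upper bound rather than the resident weight size, and confirm real usage with nvidia-smi.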

Installation

Two paths cover the canonical entry points the Overworld team documents.

Path A — `world_engine` (recommended for interactive use)

world_engine is Overworld's reference inference library, linked from the model card as the "Core Inference Library." Per the official Wayfarer-Labs/world_engine README:

python3 -m venv .env
source .env/bin/activate
pip install --upgrade --ignore-installed \
  "world_engine @ git+https://github.com/Overworldai/world_engine.git"
export HF_TOKEN=<your_huggingface_access_token>

The README lists three inference quantization paths by GPU architecture:

Config	Description (per the README)	Supported GPUs (per the README)
`intw8a8`	INT8 weights + INT8 dynamic per-token activations	NVIDIA (30xx, 40xx, Ampere+)
`fp8w8a8`	FP8 (e4m3) weights + FP8 per-tensor activations via `torch._scaled_mm`	NVIDIA Ada Lovelace / Hopper+ (RTX 40xx, H100)
`nvfp4`	NVFP4 weights + FP4 activations via FlashInfer/CUTLASS	NVIDIA Blackwell (B100, B200, RTX 5090)

The RTX 4080 is Ada Lovelace (sm_89), so the relevant fast paths are fp8w8a8 (Ada's 4th-gen Tensor Cores support FP8 natively) and intw8a8 (the broadly compatible Ampere+ path). The nvfp4 path does not apply: it needs the FP4 Tensor Core support that only Blackwell parts (sm_100 / sm_120) provide. Start with quant=None (unquantized BF16), which the 16 GB card has room for, then switch to fp8w8a8 or intw8a8 for throughput and memory headroom.
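The architecture-to-quant mapping in the README table can be expressed as a small lookup on CUDA compute capability. This is a sketch, not a world_engine API: `supported_quants` is a hypothetical helper, and its thresholds (sm_8x for INT8, sm_89 and up for FP8, major version 10+ for Blackwell FP4) are one reading of that table. `torch.cuda.get_device_capability` is the standard PyTorch call for reading the capability.

```python
from typing import List, Optional, Tuple


def supported_quants(capability: Tuple[int, int]) -> List[Optional[str]]:
    """Hypothetical helper: quant= values worth trying for a CUDA capability.

    Thresholds are an interpretation of the world_engine README table:
    Ampere+ (sm_8x) -> intw8a8, Ada/Hopper+ (sm_89, sm_90+) -> fp8w8a8,
    Blackwell (sm_100 / sm_120) -> nvfp4. None means unquantized BF16.
    """
    major, minor = capability
    if major < 8:
        return []  # pre-Ampere: outside "Desktop RTX 30 Series and later"
    quants: List[Optional[str]] = [None]  # BF16 baseline, as this recipe suggests
    if (major, minor) >= (8, 9):
        quants.append("fp8w8a8")  # FP8 via torch._scaled_mm (Ada, Hopper, Blackwell)
    quants.append("intw8a8")  # INT8 path for all Ampere+ cards
    if major >= 10:
        quants.append("nvfp4")  # Blackwell-only FP4 kernels
    return quants


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(torch.cuda.get_device_name(0), cap, supported_quants(cap))
        else:
            print("CUDA is not available to PyTorch")
```

On an RTX 4080 the capability is (8, 9), so the helper returns [None, "fp8w8a8", "intw8a8"]. Pass each value as `quant=` to `WorldEngine(...)` in turn and keep whichever is fastest for your scene.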

Path B — HuggingFace `diffusers` (Modular Pipeline API)

Per the official model card, Waypoint also ships as a diffusers ModularPipeline. Install a recent diffusers against a standard CUDA build of PyTorch: Ada sm_89 is covered by the default pip install torch, so no special index URL is needed (unlike Blackwell, which needs cu128 or newer builds).

pip install --upgrade diffusers transformers accelerate safetensors
pip install torch  # the default PyPI CUDA wheel already supports Ada sm_89

Running

Path A — `world_engine` with a scripted controller sequence

Adapted from examples/gen_sample.py in the world_engine repo:

# uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B
import cv2, sys
import numpy as np
import imageio.v3 as iio
import torch
from world_engine import WorldEngine, CtrlInput

# Ada (RTX 4080): start unquantized; switch to "fp8w8a8" or "intw8a8" for throughput
engine = WorldEngine(sys.argv[1], quant=None, device="cuda")

# Build a small controller programme: mouse, jump, walk W/A/S/D
controller_sequence = [
    CtrlInput(mouse=[0.2, 0.2]), CtrlInput(button={32}), CtrlInput(),
    CtrlInput(button={1, 32}), CtrlInput(),
]
controller_sequence += (
    [CtrlInput(button={87})] * 10 +  # W: forward (32 is Space, i.e. jump)
    [CtrlInput(button={65})] * 10 +  # A: left
    [CtrlInput(button={68})] * 10 +  # D: right
    [CtrlInput(button={83})] * 10    # S: back
)

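# Seed frame: any image works; it is resized to 1280x720 and converted to RGB
seed_frame = cv2.imread("starter.png")  # OpenCV loads BGR; returns None on failure
if seed_frame is None:
    sys.exit("could not read starter.png")
seed_frame = cv2.cvtColor(cv2.resize(seed_frame, (1280, 720)), cv2.COLOR_BGR2RGB)
# Repeat the seed into a 4-frame chunk, shape [4, 720, 1280, 3], like gen_frame()'s output
seed_frame_x4 = torch.from_numpy(np.repeat(seed_frame[None], 4, axis=0))

with iio.imopen("out.mp4", "w", plugin="pyav") as out:
    engine.append_frame(seed_frame_x4)
    # codec and fps are fixed by the first write; later writes reuse the stream
    out.write(seed_frame_x4.numpy(), fps=60, codec="libx264")
    for ctrl in controller_sequence:
        out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())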
    for ctrl in controller_sequence:
        out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())

Note the 4-frame chunking — gen_frame() returns four frames per call. The world_engine README documents that the model "generates 4 frames for every controller input" via temporal compression, producing an output of shape [4, 720, 1280, 3].
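The 4-frame chunking determines how long the scripted clip comes out. A minimal sketch of the accounting, assuming exactly 4 frames per `gen_frame()` call, the 4-frame seed chunk written first, and the fps=60 set on the first write; `clip_stats` is a hypothetical helper.

```python
FRAMES_PER_CALL = 4  # world_engine README: 4 frames per controller input
OUTPUT_FPS = 60      # fps passed on the first out.write(...) call
SEED_FRAMES = 4      # the repeated seed chunk written before the loop


def clip_stats(num_inputs: int):
    """Return (total frames, seconds of video, controller inputs per second)."""
    total_frames = SEED_FRAMES + num_inputs * FRAMES_PER_CALL
    seconds = total_frames / OUTPUT_FPS
    inputs_per_second = OUTPUT_FPS / FRAMES_PER_CALL
    return total_frames, seconds, inputs_per_second


# 5 hand-written inputs + four blocks of 10 (W, A, D, S) = 45 inputs
frames, secs, ips = clip_stats(5 + 4 * 10)
print(f"{frames} frames, {secs:.2f} s of video, {ips:.0f} inputs per second")
```

For the 45 controller inputs in the script this gives 184 frames, about 3.07 s of video. Each input steers 4 frames, so the output carries 15 controller updates per second.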

Path B — `diffusers` ModularPipeline

The model card ships this canonical snippet:

import torch
from diffusers.modular_pipelines import ModularPipeline
from diffusers.utils import load_image, export_to_video

pipe = ModularPipeline.from_pretrained(
    "Overworld/Waypoint-1.5-1B", trust_remote_code=True
)
pipe.load_components(
    device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True
)
pipe.transformer.apply_inference_patches()
pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False)

image = load_image(
    "https://huggingface.co/spaces/Overworld/waypoint-1-small/resolve/main/starter_18.png"
).resize((1024, 512))

state = pipe(image=image, prompt="An explorable world",
             button=set(), mouse=(0.0, 0.0), output_type="pil")

state.values["image"] = None
frames = []
for _ in range(150):
    state = pipe(state, button={87}, mouse=(0.0, 0.0), output_type="pil")
    frames.append(state.values["images"])

export_to_video(frames, "waypoint-v1-5.mp4", fps=60)

button={87} is the W key (walk forward). Replace it with your input loop for real-time controllable rollouts. Output lands in waypoint-v1-5.mp4.
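A real input loop has to turn whatever keys are held into the `button` set. The sketch below assumes the key encoding implied by the codes in this recipe (W=87, A=65, S=83, D=68, Space=32, i.e. uppercase ASCII, which coincides with Windows virtual-key codes). `KEY_CODES` and `keys_to_button` are hypothetical names, and any key beyond these five should be checked against the model card before use.

```python
# Hypothetical mapping from held key names to the `button` codes used in
# this recipe (uppercase ASCII, which matches Windows virtual-key codes).
KEY_CODES = {"w": 87, "a": 65, "s": 83, "d": 68, "space": 32}


def keys_to_button(held: set[str]) -> set[int]:
    """Map held key names (case-insensitive) to button codes; unknown keys are dropped."""
    return {KEY_CODES[k.lower()] for k in held if k.lower() in KEY_CODES}


# In the Path B loop, replace the fixed button={87} with the live key state:
#   state = pipe(state, button=keys_to_button(held_keys),
#                mouse=(dx, dy), output_type="pil")
# where held_keys and (dx, dy) come from your own input library.
print(keys_to_button({"W", "Shift"}))
```

Dropping unknown keys rather than raising keeps the loop alive when the player presses something the model was not trained on.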

Results

Speed: No community benchmark exists on RTX 4080 yet, and the model card does not name the 4080. The model card performance table publishes two desktop-GPU reference points at 720p, 4-step: 30 FPS on an RTX 3090 (w8a8 quantized) and 72 FPS on an RTX 5090 (w8a8 quantized; 56 FPS unquantized). Neither can be relabeled as a 4080 measurement. The RTX 3090 is Ampere (sm_86, 24 GB GDDR6X, ~936 GB/s) and the RTX 5090 is Blackwell (sm_120, 32 GB GDDR7, ~1.79 TB/s), both cross-generation from the 4080's Ada Lovelace sm_89, and the 4080's 16 GB / ~716.8 GB/s envelope differs materially from both; its memory bandwidth is lower than even the 3090's. The 4080 sits between the two reference cards generationally, but its actual throughput is unknown until a community submission lands at /check/waypoint-1-5/rtx-4080. If you run it, please submit your numbers.
VRAM usage: The model card does not state a VRAM figure. As a derived envelope, the checkpoint files total ~11.19 GB on disk per the HF tree API (3.72 GB fused + 7.44 GB modular transformer + 22.76 MB VAE), and a single loading path should need less than that for its weights. The 4080's 16 GB is therefore expected to hold the weights plus activations and the 512-frame context at unquantized BF16, with the w8a8 quant paths (fp8w8a8 / intw8a8) reducing the footprint further. This is an estimate, not a measurement; live measurements will appear at /check/waypoint-1-5/rtx-4080.
Latency target: The family-level target is "up to 720p and 60 FPS" with a 512-frame context window (model card). At 60 FPS that is roughly 16.7 ms per frame, and the 512-frame context covers 512 / 60 ≈ 8.5 seconds of rollout.
Quality notes: Waypoint is "a generative world model, not a simulator with guaranteed physical accuracy"; the model card warns that "long interactive rollouts may drift, collapse, or become inconsistent" (model card).
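The target numbers translate directly into wall-clock terms. This sketch converts the 512-frame context into seconds of rollout and computes the time budget for one generation call at the reference frame rates, assuming the 4-frames-per-input chunking from the world_engine README; the helper names are illustrative.

```python
CONTEXT_FRAMES = 512  # model card context window
FRAMES_PER_CALL = 4   # world_engine README: 4 frames per controller input


def context_seconds(fps: float) -> float:
    """Seconds of rollout covered by the 512-frame context at a given FPS."""
    return CONTEXT_FRAMES / fps


def per_call_budget_ms(fps: float) -> float:
    """Wall-clock time one 4-frame generation call may take to sustain `fps`."""
    return 1000.0 * FRAMES_PER_CALL / fps


for fps in (30, 60, 72):  # 3090 reference, family target, 5090 reference
    print(f"{fps:>3} FPS: context ~{context_seconds(fps):.1f} s, "
          f"~{1000.0 / fps:.1f} ms/frame, ~{per_call_budget_ms(fps):.1f} ms per call")
```

At 60 FPS a call that emits 4 frames has to finish in about 66.7 ms, and the context covers roughly 8.5 s; at the 3090's 30 FPS reference point the same 512 frames span about 17 s of generation.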

For the full benchmark data, see /check/waypoint-1-5/rtx-4080.

Troubleshooting

Frame rate is below your expectation

The model card's two reference data points (30 FPS on an RTX 3090 with w8a8, 72 FPS on an RTX 5090 with w8a8) bracket Overworld's "RTX 30 Series and later" support envelope, and the 4080 is not separately benchmarked. Per the world_engine README, a 4080 has two architecture-appropriate levers: quant="fp8w8a8" (Ada's 4th-gen Tensor Cores support FP8 natively; this is distinct from the Blackwell-only nvfp4 path) or quant="intw8a8" (the broadly compatible Ampere+ path). In the diffusers path, also confirm pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False) is enabled, as in the model card snippet. Compilation and autotuning run during the first calls, so discard those warm-up frames before measuring FPS.
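When comparing quant settings or submitting numbers, time only steady-state calls. A minimal harness, assuming `step` is a callable you supply that performs one generation call and returns the number of frames it produced. `measure_fps` is a hypothetical helper, and the commented usage (which presumes an engine already seeded with `append_frame`, as in Path A) is illustrative glue rather than documented world_engine usage.

```python
import time
from typing import Callable, Optional


def measure_fps(step: Callable[[], int], warmup: int = 10, iters: int = 60,
                sync: Optional[Callable[[], None]] = None) -> float:
    """Frames per second over `iters` calls, after `warmup` untimed calls.

    `step` runs one generation call and returns how many frames it produced.
    `sync` (e.g. torch.cuda.synchronize) makes sure queued GPU work is
    finished before the clock is read.
    """
    for _ in range(warmup):  # compile / autotune happens here, untimed
        step()
    if sync is not None:
        sync()
    frames = 0
    start = time.perf_counter()
    for _ in range(iters):
        frames += step()
    if sync is not None:
        sync()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return frames / elapsed


# Illustrative usage with Path A (engine already seeded via append_frame):
#   fps = measure_fps(lambda: engine.gen_frame(ctrl=CtrlInput()).shape[0],
#                     sync=torch.cuda.synchronize)
```

Synchronizing before reading the clock matters because CUDA launches are asynchronous; without it the loop can finish before the GPU does and overstate throughput.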

`HF_TOKEN` errors / 401 on download

Per the world_engine README, world_engine requires export HF_TOKEN=<your_huggingface_access_token> before the first run, because it uses the token to download the weights from HuggingFace. The model is released under Apache-2.0 and is not gated, so no licence click-through is needed; create a token under Settings → Access Tokens on huggingface.co and export it.
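A fail-fast check before the first download avoids a confusing 401 partway through model loading. A sketch, assuming the token is passed through the HF_TOKEN environment variable as the README describes. `require_hf_token` is a hypothetical helper, the `hf_` prefix check reflects the current format of Hugging Face user access tokens, and the optional `whoami` call comes from `huggingface_hub`, which diffusers and transformers depend on.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN from the environment, or raise with the fix spelled out."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; run: export HF_TOKEN=<your_huggingface_access_token>"
        )
    if not token.startswith("hf_"):
        print("warning: HF_TOKEN lacks the usual 'hf_' prefix; check you copied the whole token")
    return token


if __name__ == "__main__":
    try:
        token = require_hf_token()
    except RuntimeError as err:
        print(err)
    else:
        try:
            from huggingface_hub import whoami
            print("authenticated as", whoami(token=token)["name"])
        except Exception as err:  # missing package, network error, or rejected token
            print("could not verify token:", err)
```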

Do not use the `nvfp4` quant path on a 4080

The world_engine README maps nvfp4 to Blackwell only (B100, B200, RTX 5090). On an Ada Lovelace 4080, do not attempt the nvfp4 path: it relies on FP4 kernels (via FlashInfer/CUTLASS) for the Blackwell sm_100 / sm_120 Tensor Cores, which Ada does not implement. Use unquantized BF16 as the default, fp8w8a8 for a throughput uplift (Ada's native FP8), or intw8a8 for the most broadly compatible quantized path.

Confusion with the 360P variant or other "Waypoint" projects

Only the Overworld/Waypoint-1.5-1B repo (720p) and its Overworld/Waypoint-1.5-1B-360P sibling (laptop / Apple Silicon tier) are the canonical world-model weights. Despite the name overlap, this is not an image-to-3D mesh model and not a SPar3D successor. Unrelated "Waypoint" libraries (game-dev navigation, robotics path planning) are different projects — don't conflate them.

For other issues, file a report via the submission form.