▶ A reset-free harness for embodied agents

CONTINUAL HARNESS

Online Adaptation for Self-Improving Foundation Agents

  • BLUE — cleared
  • YELLOW LEGACY (hard) — cleared
  • CRYSTAL — 0 KO
  1. Seth Karten*¹
  2. Joel Zhang*²
  3. Tersoo Upaa Jr¹
  4. Ruirong Feng¹
  5. Wenzhe Li¹
  6. Chengshuai Shi¹
  7. Chi Jin¹
  8. Kiran Vodrahalli³

* Equal contribution. ¹ Princeton University  ·  ² ARISE Foundation  ·  ³ Google DeepMind

README

What is Continual Harness?

Coding harnesses such as Claude Code and OpenHands wrap foundation models with tools, memory, and planning, but no equivalent exists for embodied agents, which must make long-horizon decisions under partial observability. We first report our Gemini Plays Pokémon (GPP) experiments. With iterative human-in-the-loop harness refinement, GPP became the first AI system to complete Pokémon Blue, to complete Yellow Legacy on hard mode, and to complete Crystal without losing a battle. In the hardest stages, the agent itself began iterating on its strategy through long-context memory, an emergent self-improvement signal that surfaced while humans were still refining the harness.

Continual Harness removes the human from this loop. It is a reset-free, self-improving harness for embodied agents that formalizes and automates what we observed. Starting from only a minimal environment interface, the agent alternates between acting and refining its own prompt, sub-agents, skills, and memory, drawing on any past trajectory data. Whereas prompt-optimization methods require episode resets, Continual Harness adapts online within a single run.
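To make the act/refine alternation concrete, here is a minimal Python sketch on a toy one-dimensional world. It is an illustration under stated assumptions, not the Continual Harness implementation: `Harness`, `LineWorld`, `act`, `refine`, and `run` are hypothetical names, and the hand-written `refine` rule stands in for what would be a model call that edits the harness from trajectory data. The point it shows is that the environment is never reset; only the harness changes mid-run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Harness:
    """Mutable harness state the agent may rewrite (all fields hypothetical)."""
    prompt: str = "Reach the goal."
    skills: Dict[str, Callable[[int, int], int]] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)


class LineWorld:
    """Toy stand-in for a game: move along a line until `goal` is reached."""
    def __init__(self, goal: int = 5):
        self.pos, self.goal = 0, goal

    def step(self, action: int) -> Tuple[int, bool]:
        self.pos += action
        return self.pos, self.pos >= self.goal


def act(harness: Harness, pos: int, goal: int) -> int:
    """Use the navigation skill if one exists; otherwise make no progress."""
    skill = harness.skills.get("navigate")
    return skill(pos, goal) if skill else 0


def refine(harness: Harness, recent: List[int]) -> None:
    """Stand-in for a model call that edits the harness from past trajectory data."""
    if len(recent) >= 2 and recent[-1] == recent[0]:  # no net progress in the window
        harness.memory.append(f"stuck at position {recent[-1]}")
        harness.skills["navigate"] = lambda pos, goal: 1 if pos < goal else 0
        harness.prompt += " Use the navigate skill when stuck."


def run(env: LineWorld, harness: Harness, max_steps: int = 50, refine_every: int = 5) -> int:
    """Alternate acting and refining in one continuous run; return steps used."""
    trajectory = [env.pos]
    for t in range(1, max_steps + 1):
        pos, done = env.step(act(harness, env.pos, env.goal))
        trajectory.append(pos)
        if done:
            return t
        if t % refine_every == 0:
            # Only the harness is edited here; the environment state is untouched.
            refine(harness, trajectory[-refine_every:])
    return max_steps


env, harness = LineWorld(goal=5), Harness()
steps = run(env, harness)
print(steps, sorted(harness.skills), harness.memory)
# prints: 10 ['navigate'] ['stuck at position 0']
```

In this toy run the agent wastes its first five steps, the refinement pass writes a memory entry and a navigation skill, and the same run then finishes in five more steps from the state it was already in.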

On Pokémon Red and Emerald, across frontier models, Continual Harness starting from scratch substantially reduces button-press cost relative to the minimalist baseline and recovers a majority of the gap to a hand-engineered expert harness; the size of the gain depends on model capability. We then close the loop with the model itself through an online process-reward co-learning loop: an open-source agent's rollouts through the refining harness are relabeled by a frontier teacher and used to update the model. This loop drives sustained in-game milestone progress on Pokémon Red without resetting the environment between training iterations.
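The shape of the co-learning loop can be sketched in a few lines of Python. This is a toy under loud assumptions, not the training procedure used in the paper: the "student" is a two-entry weight table rather than a language model, `teacher_relabel` is a hand-written rule standing in for a frontier teacher, the harness itself is omitted, and every name (`LineWorld`, `student_act`, `co_learn`, the learning rate) is hypothetical. What it does illustrate is the ordering: roll out from the current state, relabel each step with a process reward, update the student, and continue from where the environment already is.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, int, int]  # (position before, action, position after)


class LineWorld:
    """Toy persistent environment; `pos` carries over between training iterations."""
    def __init__(self, goal: int):
        self.pos, self.goal = 0, goal

    def step(self, action: int) -> int:
        self.pos += action
        return self.pos


def student_act(weights: Dict[int, float]) -> int:
    """Greedy toy student: pick the action with the largest weight."""
    return max(weights, key=weights.get)


def teacher_relabel(rollout: List[Step], goal: int) -> List[float]:
    """Stand-in teacher: +1 if a step moved closer to the goal, else -1."""
    return [1.0 if abs(goal - after) < abs(goal - before) else -1.0
            for before, _, after in rollout]


def co_learn(env: LineWorld, weights: Dict[int, float],
             iterations: int = 10, horizon: int = 3, lr: float = 0.2) -> int:
    """Alternate rollouts and updates; return the iteration that reaches the goal, or -1."""
    for it in range(1, iterations + 1):
        rollout: List[Step] = []
        for _ in range(horizon):  # the rollout starts from the current state, no reset
            before = env.pos
            action = student_act(weights)
            rollout.append((before, action, env.step(action)))
        rewards = teacher_relabel(rollout, env.goal)
        for (_, action, _), reward in zip(rollout, rewards):
            weights[action] += lr * reward  # per-step (process) reward update
        if env.pos >= env.goal:
            return it
    return -1


env = LineWorld(goal=6)
weights = {-1: 0.5, 1: 0.0}  # the student initially prefers the wrong direction
print(co_learn(env, weights), env.pos)
# prints: 4 6
```

The student walks the wrong way during the first iteration, is corrected by the relabeled rewards, and then has to recover the lost ground in the same environment, which is the cost and the point of not resetting between iterations.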

SELECTED RUNS

The harness in motion.

A walking tour across Pokémon Red and Emerald — sub-agents, skills, online prompt optimization, long-context memory, gym battles, and bootstrapped auto-evolution. Clips are sped up for readability.

01

Sub-agent creation & delegation

The harness spawns specialized sub-agents on the fly and delegates sub-tasks to them.

02

Skill creation & revision

The agent writes a new skill, uses it, then revises it after observing the outcome.

03

Online prompt optimization

The harness rewrites its own prompt mid-run, with no episode reset between iterations.

04

Memory unsticks a blocked route

Long-context memory recognizes a previously failed path and routes around it.

05

Route 102 — battling with sub-agents & replanning

Combat sub-agents handle wild encounters while the planner replans on partial observations.

06

Pewter Gym — Brock

Red. The harness battles its way through the first gym leader.

07

Cerulean Gym — Misty

Red. Type-aware combat sub-agents take down the Cascade Badge fight.

08

Vermilion Gym — Lt. Surge

Red. Clearing the switch puzzle and the Thunder Badge battle.

09

Fixing & using a navigation skill

Emerald. The agent repairs its own navigation skill, then uses the fixed version in the field.

10

Objective planning

Emerald. Decomposing a long-horizon goal into sub-objectives the harness can act on.

11

Online refinement — first pass

Emerald. An early self-improvement loop: one refinement cycle applied to the harness in flight.

12

Refining a battle sub-agent

Emerald. The harness refines a dedicated battling sub-agent over a long-horizon run.

13

Continual refinement after a defeat

Emerald. The agent fixes its navigation skill, loses to the rival, writes a memory of the defeat, then changes policy to switch-train the whole team.

14

Bootstrapped continual run

Emerald. End-to-end: navigating to the gym, the Wally battle, entering Mauville and recalling memory from a previous run, solving switches, and a double battle.

📡 LIVE

Gemini Plays Pokémon — live stream.

The human-in-the-loop precursor that motivated Continual Harness. Iterative harness refinement, long-context memory, and emergent strategy in real time.


CITE

Citation

@misc{karten2026continualharness,
  title  = {Continual Harness: Online Adaptation for Self-Improving Foundation Agents},
  author = {Karten, Seth and Zhang, Joel and Upaa Jr, Tersoo and
            Feng, Ruirong and Li, Wenzhe and Shi, Chengshuai and
            Jin, Chi and Vodrahalli, Kiran},
  year   = {2026},
  note   = {Preprint}
}