Research Areas

Multi-agent systems, agent harnesses, and economic alignment

This page is the index of my papers. My research develops methods and evaluations for multi-agent systems (zero-sum and general-sum), generalized harnesses for embodied agents, and economic alignment and safety.

Economic Alignment

Large populations of language agents, mechanism design, and policy evaluation in synthetic economic systems, with a focus on alignment and safety.

Agent Harnesses

Strategic language agents and reasoning frameworks for adversarial decision making, with expert-level performance in competitive games.

Robotics

Earlier work on improving kinodynamic planning for autonomous vehicles with learned controllers.

Selected Papers

Agent harnesses

PokéChamp: An Expert-level Minimax Language Agent

Seth Karten, Andy Nguyen, Chi Jin. ICML Spotlight, 2025.

The first Pokémon battling paper at ICML, ICLR, or NeurIPS. It establishes competitive Pokémon battling as a top-tier machine learning setting for reasoning agents and strategic language agents.

Evaluations & RL environments

The PokeAgent Challenge: Competitive and Long Context Learning at Scale

Seth Karten, Jake Grigsby, Stephanie Milani, Kiran Vodrahalli, Amy Zhang, Fei Fang, Yuke Zhu, Chi Jin. NeurIPS Competition Track, 2025.

A competition benchmark and evaluation harness that turns Pokémon into a durable machine learning testbed for long-context learning, reasoning agents, embodied agents, and strategic decision making.

Evaluations & RL environments

GameDevBench: Evaluating Agentic Capabilities Through Game Development

Wayne Chi, Yixiong Fang, Arnav Yayavaram, Siddharth Yayavaram, Seth Karten, Qiuhong Anna Wei, Runkun Chen, Alexander Wang, Valerie Chen, Ameet Talwalkar, Chris Donahue. arXiv preprint, 2026.

A benchmark of 132 real game development tasks evaluating agentic coding, multimodal reasoning, and graphics-aware capabilities, with image and video feedback substantially improving performance.

Evaluations & RL environments

Automatic Generation of High-Performance RL Environments

Seth Karten, Rahul Dev Appapogu, Chi Jin. arXiv preprint, 2026.

An agent-assisted method that translates RL environments into semantically equivalent high-performance implementations, achieving speedups of up to 22,320x and validating cross-backend policy transfer.