Research | Seth Karten

Economic alignment

Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces

Seth Karten, Cameron Crow, Chi Jin. COLM, 2026.

A paper on enabling economic alignment in multi-agent marketplaces of language agents.

arXiv

Economic alignment

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

Seth Karten, Wenzhe Li, Zihan Ding, Samuel Kleiner, Yu Bai, Chi Jin. NeurIPS Algorithmic Collective Action Workshop, 2025.

This paper studies large populations of language agents and uses them to analyze policy and mechanism design questions in multi-agent generative simulacra, with a focus on economic alignment and safety.

arXiv Code

Agents

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin. COLM, 2026.

A reinforcement learning framework for training VLMs on game-playing tasks of 100+ turns, using a turn-level critic for training stability and achieving 3x the game progress of frontier models.

arXiv

Agents

PokéChamp: An Expert-level Minimax Language Agent

Seth Karten, Andy Nguyen, Chi Jin. ICML Spotlight, 2025.

An ICML Spotlight paper, and the first Pokémon battling paper at ICML, ICLR, or NeurIPS. It establishes competitive Pokémon battling as a top-tier machine learning setting for reasoning agents and strategic language agents.

arXiv Code Website

Evaluations & RL environments

The PokeAgent Challenge: Competitive and Long Context Learning at Scale

Seth Karten, Jake Grigsby, Stephanie Milani, Kiran Vodrahalli, Amy Zhang, Fei Fang, Yuke Zhu, Chi Jin. NeurIPS Competition Track, 2025.

A competition benchmark and evaluation harness that turns Pokémon into a durable machine learning testbed for long-context learning, reasoning agents, embodied agents, and strategic decision making.

arXiv PDF Website

Evaluations & RL environments

GameDevBench: Evaluating Agentic Capabilities Through Game Development

Wayne Chi, Yixiong Fang, Arnav Yayavaram, Siddharth Yayavaram, Seth Karten, Qiuhong Anna Wei, Runkun Chen, Alexander Wang, Valerie Chen, Ameet Talwalkar, Chris Donahue. ICML, 2026.

A benchmark of 132 real game development tasks evaluating agentic coding, multimodal reasoning, and graphics-aware capabilities, with image and video feedback substantially improving performance.

arXiv

Evaluations & RL environments

Automatic Generation of High-Performance RL Environments

Seth Karten, Rahul Dev Appapogu, Chi Jin. COLM, 2026.

An agent-assisted method that translates RL environments into high-performance implementations with semantic equivalence, achieving speedups up to 22,320x and validating cross-backend policy transfer.

arXiv

Evaluations & RL environments

FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Wenzhe Li, Zihan Ding, Seth Karten, Chi Jin. ICML, 2024.

A benchmark and empirical test harness for evaluating competitive multi-agent reinforcement learning in structured adversarial environments.

arXiv Code

Emergent communication

On the Role of Emergent Communication for Social Learning in Multi-Agent Reinforcement Learning

Seth Karten, Siva Kailas, Huao Li, Katia Sycara. AAMAS, 2023.

This paper examines how emergent communication shapes social learning dynamics in multi-agent reinforcement learning.

arXiv Code

Emergent communication

Towards True Lossless Sparse Communication in Multi-Agent Systems

Seth Karten, Mycal Tucker, Siva Kailas, Katia Sycara. ICRA, 2023.

This paper studies how to learn sparse communication protocols without discarding information needed for coordination.

arXiv Code

Emergent communication

Interpretable Learned Emergent Communication for Human-Agent Teams

Seth Karten, Mycal Tucker, Huao Li, Siva Kailas, Michael Lewis, Katia Sycara. IEEE Transactions on Cognitive and Developmental Systems, 2023.

This paper focuses on making learned emergent communication more interpretable for human-agent teams.

Publisher Code

Robotics

Improving Kinodynamic Planners for Vehicular Navigation with Learned Goal-Reaching Controllers

Aravind Sivaramakrishnan, Edgar Granados, Seth Karten, Troy McMahon, Kostas E. Bekris. IROS, 2021.

This paper improves kinodynamic planning for vehicles by combining classical planning with learned goal-reaching controllers.

Publisher

Multi-agent systems, agent harnesses, and economic alignment
