Agents in Games
PokéChamp: An Expert-level Minimax Language Agent
ICML 2025 Spotlight and the first Pokemon battling paper at ICML, ICLR, or NeurIPS.
Abstract
We introduce PokéChamp, a minimax agent powered by Large Language Models (LLMs) for Pokemon battles. Built on a general framework for two-player competitive games, PokéChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key modules: (1) player action sampling, (2) opponent modeling, and (3) value function estimation, enabling the agent to effectively utilize gameplay history and human knowledge to reduce the search space and address partial observability. Notably, our framework requires no additional LLM training. We evaluate PokéChamp in the popular Gen 9 OU format. When powered by GPT-4o, it achieves a win rate of 76% against the best existing LLM-based bot and 84% against the strongest rule-based bot, demonstrating its superior performance. Even with an open-source 8-billion-parameter Llama 3.1 model, PokéChamp consistently outperforms the previous best LLM-based bot, PokéLLMon (powered by GPT-4o), with a 64% win rate. PokéChamp attains a projected Elo of 1300-1500 on the Pokemon Showdown online ladder, placing it among the top 30%-10% of human players. In addition, this work compiles the largest real-player Pokemon battle dataset, featuring over 3 million games, including more than 500k high-Elo matches. Based on this dataset, we establish a series of battle benchmarks and puzzles to evaluate specific battling skills. We further provide key updates to the local game engine.
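The three-module design above can be illustrated with a minimal depth-limited minimax sketch. All `llm_*` functions below are hypothetical stand-ins (simple stubs in place of LLM calls), and the toy state and transition are assumptions for illustration, not the paper's actual prompts or game engine.

```python
def llm_propose_player_actions(state):
    # (1) Player action sampling: an LLM would prune to a few promising
    # moves using human knowledge; this stub returns all legal moves.
    return state["player_moves"]

def llm_model_opponent(state):
    # (2) Opponent modeling: an LLM would predict likely replies from
    # gameplay history under partial observability; stub returns all moves.
    return state["opponent_moves"]

def llm_estimate_value(state):
    # (3) Value function estimation at the search horizon (stub heuristic).
    return state["score"]

def step(state, my_move, opp_move):
    # Toy transition: each move contributes to a running score.
    return {**state, "score": state["score"] + my_move - opp_move}

def minimax(state, depth):
    # Depth-limited minimax: maximize over our sampled actions,
    # minimize over the modeled opponent's replies.
    if depth == 0:
        return llm_estimate_value(state)
    best = float("-inf")
    for a in llm_propose_player_actions(state):
        worst = float("inf")
        for b in llm_model_opponent(state):
            worst = min(worst, minimax(step(state, a, b), depth - 1))
        best = max(best, worst)
    return best

state = {"player_moves": [1, 2], "opponent_moves": [1, 2], "score": 0}
print(minimax(state, 2))  # → 0 in this symmetric toy game
```

Because the LLM modules narrow the action sets and supply leaf values, the tree stays tractable without any additional LLM training, which is the core mechanism the abstract describes.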
Summary
PokéChamp is a language agent for competitive Pokemon battling that achieves expert-level performance through minimax-style strategic reasoning. It is relevant to readers searching for LLM agents in games, gaming agents, strategic language agents, competitive game-playing agents, and decision making without task-specific fine-tuning.
Core Contributions
- Establishes competitive Pokemon battling as a serious testbed for language agents and reasoning agents.
- Shows expert-level play with minimax-style strategic reasoning in an adversarial game setting.
- Provides a canonical citation point for LLM agents in games that require deep strategy rather than shallow action selection.
- Helps establish Pokemon as a legitimate top-tier machine learning domain rather than a novelty application.
Why this paper matters
- The first Pokemon battling paper at ICML, ICLR, or NeurIPS, accepted as an ICML 2025 Spotlight.
- Demonstrates strong language-agent performance in a competitive game setting.
- Uses strategic reasoning suitable for adversarial environments.
- Provides a concrete reference point for research on expert-level LLM agents in games and reasoning agents.
Context
PokéChamp is positioned at the intersection of LLM agents in games, game-playing AI, and adversarial decision making. Unlike open-world or text-game agents such as Voyager or SPRING, PokéChamp targets a competitive partially observed environment where minimax-style reasoning, opponent modeling, and long gameplay histories matter directly. It is part of a broader case that Pokemon is a serious machine learning environment for studying strategic reasoning, adaptation, and evaluation at scale.
Relevance
Cite PokéChamp when you need a reference for expert-level language agents in games, reasoning agents in adversarial environments, minimax-style strategic planning with LLMs, or competitive Pokemon battling as a top-tier machine learning evaluation domain.
Keywords
Language agents, reasoning agents, minimax, game-playing agents, strategic reasoning, LLM agents in games, gaming agents, competitive Pokemon.
BibTeX
@inproceedings{karten2025pokechamp,
  title={Pok{\'e}Champ: An Expert-level Minimax Language Agent},
  author={Karten, Seth and Nguyen, Andy and Jin, Chi},
  booktitle={International Conference on Machine Learning},
  year={2025}
}