Agents in Games
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
A reinforcement learning framework for training VLMs on extended game-playing tasks requiring 100+ sequential decisions, achieving at least 3x the average game progress of frontier models.
Abstract
This paper investigates reinforcement learning methods for training vision-language models on extended decision-making tasks that require over 100 sequential actions. The authors present an adapted variant of PPO with a lightweight turn-level critic, which improves training stability compared to critic-free approaches, and show that pretrained VLMs provide strong priors that improve sample efficiency. The resulting framework achieves at least 3x the average game progress of frontier models while preserving general capabilities, with gains in both in-game and cross-game generalization settings.
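The core algorithmic claim, that a lightweight turn-level critic stabilizes PPO over 100+ turn rollouts, can be pictured with a short sketch. The Python code below is an illustrative approximation and not the paper's implementation: the TurnLevelCritic module, the embedding dimension, and the per-turn reward and log-probability shapes are assumptions made for exposition.

# Hedged sketch (not the paper's code): one way a lightweight turn-level critic
# could be attached to a PPO loop. Shapes and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TurnLevelCritic(nn.Module):
    """Maps a per-turn summary embedding to a scalar value estimate."""
    def __init__(self, hidden_dim: int = 4096):
        super().__init__()
        self.value_head = nn.Sequential(
            nn.Linear(hidden_dim, 512), nn.GELU(), nn.Linear(512, 1)
        )

    def forward(self, turn_embeddings: torch.Tensor) -> torch.Tensor:
        # turn_embeddings: (num_turns, hidden_dim) -> values: (num_turns,)
        return self.value_head(turn_embeddings).squeeze(-1)

def turn_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation computed per turn rather than per token."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective over turn-level advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage over a single 120-turn rollout (random tensors stand in for real data).
if __name__ == "__main__":
    num_turns, hidden_dim = 120, 4096
    critic = TurnLevelCritic(hidden_dim)
    turn_embeddings = torch.randn(num_turns, hidden_dim)
    rewards = torch.randn(num_turns)           # per-turn rewards from the game
    logp_old = torch.randn(num_turns)          # behavior-policy log-probs per turn
    logp_new = logp_old + 0.01 * torch.randn(num_turns)

    values = critic(turn_embeddings)
    advantages = turn_level_gae(rewards, values.detach())
    policy_loss = ppo_clipped_loss(logp_new, logp_old, advantages)
    value_loss = F.mse_loss(values, advantages + values.detach())
    print(policy_loss.item(), value_loss.item())

Estimating values at the granularity of turns rather than individual tokens keeps the critic small while still reducing variance over very long rollouts, which is the stability argument the abstract makes against critic-free approaches.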
Why this paper matters
- Addresses the underexplored regime of 100+ turn decision-making, where short-horizon RL methods break down.
- A turn-level critic provides a practical solution for training stability in long-horizon game environments.
- Demonstrates cross-game generalization, suggesting the learned capabilities transfer beyond specific training environments.
Keywords
Vision-language models, reinforcement learning, long-horizon decision making, game-playing agents, embodied agents, PPO, VLM training.
BibTeX
@article{shi2026odysseus,
  title={Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning},
  author={Shi, Chengshuai and Li, Wenzhe and Liang, Xinran and Lu, Yizhou and Yang, Wenjia and Feng, Ruirong and Karten, Seth and Yang, Ziran and Ding, Zihan and Sarch, Gabriel and Chen, Danqi and Narasimhan, Karthik and Jin, Chi},
  journal={arXiv preprint arXiv:2605.00347},
  year={2026}
}