Policy Gradient Methods Outperform Specialized Game Theory Algorithms in Imperfect-Information Games

A new MIT-led study challenges long-held assumptions in game theory, demonstrating that general-purpose policy gradient methods can outperform specialized game-theoretic algorithms in imperfect-information, zero-sum games. The research, presented at the International Conference on Learning Representations, provides a benchmark for evaluating algorithms that train neural networks to compete in strategic interactions where players have hidden information.

The team, including MIT PhD student Sobhan Mohammadpour and Assistant Professor Gabriele Farina, found that policy gradient methods, originally developed in the 1990s for decision-making, achieved lower exploitability scores than game-theory-based approaches in games like Phantom Tic-Tac-Toe, imperfect-information Hex, and Liar’s Dice. Exploitability measures how much a worst-case adversary can gain against a player's strategy; lower is better, and a score of zero means the strategy cannot be exploited at all.
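
To make the exploitability measure concrete, here is a minimal Python sketch for a two-player zero-sum matrix game. It is an illustration under stated assumptions, not the study's benchmark code: the rock-paper-scissors payoff matrix and the function names are hypothetical, and the score is computed as the average of what each player could gain by switching to a best response, which is one common convention.

```python
from typing import List

Matrix = List[List[float]]
Strategy = List[float]

# Hypothetical example game: rock-paper-scissors payoffs for the row player.
# The column player receives the negative of each entry (zero-sum).
RPS: Matrix = [
    [0.0, -1.0, 1.0],   # rock   vs rock, paper, scissors
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
]


def exploitability(payoff: Matrix, row: Strategy, col: Strategy) -> float:
    """Average best-response gain against the strategy pair (row, col)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best payoff the row player can get against the column strategy.
    best_row = max(
        sum(payoff[i][j] * col[j] for j in range(n_cols))
        for i in range(n_rows)
    )
    # Best payoff the column player can get against the row strategy.
    best_col = max(
        -sum(row[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    # In a zero-sum game the current payoffs cancel, so the two
    # best-response values sum to the total gain from deviating.
    return (best_row + best_col) / 2.0


uniform = [1.0 / 3.0] * 3
always_rock = [1.0, 0.0, 0.0]

print(exploitability(RPS, uniform, uniform))      # uniform play is unexploitable
print(exploitability(RPS, always_rock, uniform))  # always-rock loses to paper
```

In this toy game, uniform random play scores zero, while a player who always picks rock scores above zero because an adversary can respond with paper every time. The games in the study are far too large for this kind of direct enumeration, which is why a shared benchmark matters.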

“It had been pretty much taken for granted that specialized game-theoretic algorithms were the right approach,” said co-author Samuel Sokota. “Our study showed that policy gradient methods can work better than these specialized algorithms.” The researchers attribute the oversight to a lack of rigorous benchmarking, which they have now addressed by releasing a freely available benchmark tool that runs on ordinary laptops.

The benchmark, built on OpenSpiel, allows researchers to train and compare algorithms on games with up to 30 billion states. Farina emphasized that the term “game” applies broadly to multi-agent strategic interactions, including military operations, trading, and negotiations, all of which involve hidden information. “The idea that we can improve on these games suggests that we can also do better in these other settings,” said co-author Eugene Vinitsky.

Ian Gemp of Google DeepMind praised the work: “This work serves as a compelling reminder that modernizing classical tools like policy gradient methods remains a highly productive path for solving complex strategic problems.”
