Policy Gradient Methods Outperform Specialized Game Theory Algorithms in Imperfect-Information Games

A new MIT-led study challenges long-held assumptions in game theory, demonstrating that general-purpose policy gradient methods can outperform specialized game-theoretic algorithms in imperfect-information, zero-sum games. The research, presented at the International Conference on Learning Representations, provides a benchmark for evaluating algorithms that train neural networks to compete in strategic interactions where players have hidden information.

The team, including MIT PhD student Sobhan Mohammadpour and Assistant Professor Gabriele Farina, found that policy gradient methods, originally developed in the 1990s for decision-making, achieved lower exploitability scores than game-theory-based approaches in games like Phantom Tic-Tac-Toe, imperfect-information Hex, and Liar’s Dice. Exploitability measures how much a worst-case adversary can gain against a player's strategy, so lower is better; a score of zero indicates perfect play.
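To make the exploitability metric concrete, here is a minimal sketch for a tiny two-player zero-sum matrix game (rock-paper-scissors), not for the large imperfect-information games used in the study. The function name `exploitability` and the convention of averaging the two players' best-response gains are assumptions for illustration; papers and libraries differ in how they scale this quantity.

```python
import numpy as np

def exploitability(payoff: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Exploitability of the strategy profile (x, y) in a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player receives -payoff[i, j].
    Assumed convention: average of each player's best-response gain, which is
    zero exactly when (x, y) is a Nash equilibrium.
    """
    value = x @ payoff @ y              # row player's expected payoff under (x, y)
    best_row = np.max(payoff @ y)       # best payoff the row player can get against y
    best_col = -np.min(x @ payoff)      # best payoff the column player can get against x
    gain_row = best_row - value         # how much the row player could improve by deviating
    gain_col = best_col - (-value)      # how much the column player could improve by deviating
    return float((gain_row + gain_col) / 2)

# Rock-paper-scissors, rows/columns ordered rock, paper, scissors (row player's payoffs).
RPS = np.array([[0, -1, 1],
                [1, 0, -1],
                [-1, 1, 0]], dtype=float)
uniform = np.full(3, 1 / 3)
always_rock = np.array([1.0, 0.0, 0.0])

print(exploitability(RPS, uniform, uniform))
print(exploitability(RPS, always_rock, uniform))
```

Uniform play, the equilibrium of rock-paper-scissors, scores 0.0. Always playing rock against a uniform opponent scores 0.5 under this convention, because the opponent could switch to paper and win one unit per round.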

“It had been pretty much taken for granted that specialized game-theoretic algorithms were the right approach,” said co-author Samuel Sokota. “Our study showed that policy gradient methods can work better than these specialized algorithms.” The researchers attribute the oversight to a lack of rigorous benchmarking, which they have now addressed by releasing a freely available benchmark tool that runs on ordinary laptops.

The benchmark, built on OpenSpiel, allows researchers to train and compare algorithms on games with up to 30 billion states. Farina emphasized that the term “game” applies broadly to multi-agent strategic interactions, including military operations, trading, and negotiations—all of which involve hidden information. “The idea that we can improve on these games suggests that we can also do better in these other settings,” said co-author Eugene Vinitsky.

Ian Gemp of Google DeepMind praised the work: “This work serves as a compelling reminder that modernizing classical tools like policy gradient methods remains a highly productive path for solving complex strategic problems.”
