In a surprising twist that challenges long-held assumptions in game theory, a new MIT study shows that general-purpose algorithms called policy gradient methods can outperform specialized game-theoretic algorithms in certain imperfect-information games. The findings, presented at the International Conference on Learning Representations in Rio de Janeiro, could reshape how artificial intelligence agents are trained to make decisions in competitive, real-world scenarios.
Imperfect-information games, in which players don’t know everything about their opponents, are common in life, from poker and bidding wars to military operations and financial negotiations. For decades, the prevailing belief was that algorithms specifically designed for these games, grounded in game theory, would always outshine general-purpose alternatives. However, the MIT-led team discovered that policy gradient methods, originally developed in the 1990s for single-agent decision-making, often perform better and more efficiently.
The researchers created a benchmark to fairly evaluate different algorithms, measuring performance through a concept called exploitability: how much a worst-case adversary can gain against a player’s strategy, so lower scores are better. In experiments involving five games, including Phantom Tic-Tac-Toe, imperfect-information Hex, and Liar’s Dice, neural networks trained with policy gradient algorithms consistently achieved lower exploitability scores than those trained with game-theoretic algorithms.
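To make exploitability concrete, here is a minimal Python sketch on a toy game. It is not the researchers' benchmark and does not use OpenSpiel. It assumes a two-player zero-sum matrix game (rock-paper-scissors) rather than one of the five imperfect-information games in the study, and it assumes the common convention of averaging the two players' best-response gains. The function names are hypothetical, chosen only for illustration.

```python
# Illustrative only: exploitability in a tiny zero-sum matrix game.
# Assumption: payoff[i][j] is the row player's payoff; the column player gets the negative.

def row_action_values(payoff, col_strategy):
    """Expected payoff of each pure row action against a mixed column strategy."""
    return [sum(a * q for a, q in zip(row, col_strategy)) for row in payoff]


def col_action_values(payoff, row_strategy):
    """Expected payoff of each pure column action against a mixed row strategy."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    return [
        -sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


def exploitability(payoff, row_strategy, col_strategy):
    """Average amount a best-responding (worst-case) opponent can win.

    In a zero-sum game the players' payoffs under the profile cancel, so the
    sum of the two best-response values equals the total gain available from
    deviating. Dividing by two is an assumed averaging convention.
    """
    best_vs_col = max(row_action_values(payoff, col_strategy))
    best_vs_row = max(col_action_values(payoff, row_strategy))
    return (best_vs_col + best_vs_row) / 2.0


# Rock-paper-scissors from the row player's point of view.
RPS = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

balanced_score = exploitability(RPS, uniform, uniform)
predictable_score = exploitability(RPS, always_rock, always_rock)
```

In this toy setting, uniformly random play cannot be exploited, so its score is zero, while always playing rock loses a full point per round to an opponent who always answers with paper. The games in the study are far larger and involve hidden information, so computing a best response there is much more expensive, but the quantity being measured has the same meaning.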
“Our study showed that policy gradient methods can work better than these specialized algorithms, and that the specialized algorithms may not work as well as people thought,” said Samuel Sokota, a co-author from Carnegie Mellon University. The team’s benchmarking software, which they have made freely available, allows others to test and compare algorithms with just a single line of code added to the OpenSpiel library.
The implications extend far beyond board games. “Hidden information is a very important property of the world,” said Eugene Vinitsky of New York University, another co-author. “It pervades military operations, trading scenarios, and negotiations—all of which are carried out under conditions of hidden information. The idea that we can improve on these games suggests that we can also do better in these other settings as well.”
Ian Gemp, a computer scientist and game theory expert at Google DeepMind not involved in the study, called the results encouraging: “This work serves as a compelling reminder that modernizing classical tools remains a highly productive path for solving complex strategic problems.”

