A new paper from MIT researchers substantially extends random utility models (RUMs), a nearly century-old approach to predicting people's choices. It shows that asking people to rank three options instead of comparing just two can reveal hidden correlations between preferences, making predictions considerably more accurate.
In 1927, psychologist L. L. Thurstone laid the foundation for random utility models. These models assume that when people choose among alternatives, they pick the one with the highest subjective value, or "utility," even if they could not put a number on that value. The models are random because these values vary from person to person, and even for the same person from one moment to the next.
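To make the idea concrete, here is a minimal Python sketch of a random utility model. The assumptions are ours: each option has a fixed average utility, each simulated person adds independent random noise to it, and the person picks the option with the highest total. For convenience the sketch uses Gumbel-distributed noise, which turns this RUM into the well-known multinomial logit model with closed-form choice probabilities exp(u_i) / sum_j exp(u_j). Thurstone's original formulation used normally distributed noise instead. The option names and utility values are hypothetical.

```python
import math
import random

# Hypothetical average ("systematic") utilities for three commuting options.
MEAN_UTILITY = {"bus": 0.0, "train": 0.5, "car": 1.0}


def gumbel(rng: random.Random) -> float:
    """Draw a standard Gumbel sample by inverting its CDF exp(-exp(-x))."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choices(n_people: int, seed: int = 0) -> dict[str, float]:
    """Each simulated person picks the option with the highest noisy utility."""
    rng = random.Random(seed)
    counts = {option: 0 for option in MEAN_UTILITY}
    for _ in range(n_people):
        utilities = {o: mu + gumbel(rng) for o, mu in MEAN_UTILITY.items()}
        best = max(utilities, key=utilities.get)
        counts[best] += 1
    return {o: c / n_people for o, c in counts.items()}


def logit_probabilities() -> dict[str, float]:
    """Closed-form choice shares when the noise is Gumbel (multinomial logit)."""
    total = sum(math.exp(mu) for mu in MEAN_UTILITY.values())
    return {o: math.exp(mu) / total for o, mu in MEAN_UTILITY.items()}


if __name__ == "__main__":
    simulated = simulate_choices(50_000)
    exact = logit_probabilities()
    for option in MEAN_UTILITY:
        print(f"{option:>5}: simulated {simulated[option]:.3f}, formula {exact[option]:.3f}")
```

The simulated shares should land close to the formula. The option with the highest average utility wins most often, but not always, because the noise lets other options come out on top for some people.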
Governments and companies use RUMs widely to predict behavior in counterfactual scenarios, such as how commuters would respond to a road closure, and to decide how to allocate a budget for the greatest public good. Even after nearly a century of refinement, the standard approach still relies heavily on pairwise comparisons (“Do you prefer A or B?”), because people find it easier to compare two items than to give each one a numerical rating.
However, the MIT team (Yeshwanth Cherapanamjeri, Gabriele Farina, Constantinos Daskalakis, and Sobhan Mohammadpour) proved that pairwise comparisons alone cannot capture correlations between preferences. For example, someone who favors gun control is also likely to support government-funded child care, and a fan of independent movies may also enjoy foreign films but dislike blockbusters. Models that ignore these correlations make inaccurate predictions.
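A toy example shows the problem in miniature. The two populations below are made up for illustration and are not the paper's construction. In the first, tastes are strongly linked: half the people rank indie > foreign > blockbuster and the other half hold the exact reverse order. In the second, every one of the six possible rankings is equally common. Any such distribution over rankings can be generated by some RUM. The sketch computes, with exact fractions, what a pairwise survey would record for each population, and what a survey asking for full three-way rankings would record.

```python
from fractions import Fraction
from itertools import combinations, permutations

ITEMS = ("indie", "foreign", "blockbuster")

# Hypothetical population 1: linked tastes. Half rank indie > foreign > blockbuster,
# the other half hold the reverse order (rankings are listed best first).
correlated = {
    ("indie", "foreign", "blockbuster"): Fraction(1, 2),
    ("blockbuster", "foreign", "indie"): Fraction(1, 2),
}

# Hypothetical population 2: no link between tastes; all six rankings equally common.
independent = {r: Fraction(1, 6) for r in permutations(ITEMS)}


def pairwise(pop):
    """Share preferring x to y for every pair, as a pairwise survey would measure it."""
    return {
        (x, y): sum(p for r, p in pop.items() if r.index(x) < r.index(y))
        for x, y in combinations(ITEMS, 2)
    }


def ranking_prob(pop, ranking):
    """Share of people who report this full three-way ranking."""
    return pop.get(ranking, Fraction(0))


if __name__ == "__main__":
    # Pairwise data cannot tell the two populations apart: every pair is 50/50.
    print(pairwise(correlated) == pairwise(independent))  # True
    # Three-way rankings can: this ranking is 1/2 of one population, 1/6 of the other.
    top = ("indie", "foreign", "blockbuster")
    print(ranking_prob(correlated, top), ranking_prob(independent, top))  # 1/2 1/6
```

A model fitted only to the pairwise numbers would have no way of knowing that, in the first population, liking indie films goes hand in hand with disliking blockbusters.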
The key result, presented at the International Conference on Learning Representations (ICLR) in Rio de Janeiro, is that these correlations become detectable when large numbers of people each rank three alternatives in order of preference. The same information can also be obtained by combining best-of-three choices with best-of-two choices. The researchers also developed an efficient algorithm that merges many individual triplet rankings into a single model that accounts for those correlations.
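Why best-of-three plus best-of-two choices suffice can be seen with a short counting argument. This is our own illustration, not necessarily the method used in the paper. It assumes that pairwise and three-way answers are governed by the same distribution over rankings, as in a RUM. People who pick y over z in a pairwise question either rank y first overall or hold the ranking x > y > z, so P(x > y > z) = P(y beats z) - P(y is best of three). The sketch below computes the survey data from a hypothetical population with made-up shares, then rebuilds every ranking's share from that data alone.

```python
from fractions import Fraction
from itertools import permutations

ITEMS = ("a", "b", "c")

# Hypothetical shares of people holding each full ranking (best first); they sum to 1.
population = {
    ("a", "b", "c"): Fraction(3, 10),
    ("a", "c", "b"): Fraction(1, 10),
    ("b", "a", "c"): Fraction(1, 5),
    ("b", "c", "a"): Fraction(1, 20),
    ("c", "a", "b"): Fraction(1, 4),
    ("c", "b", "a"): Fraction(1, 10),
}


def choice_data(pop):
    """What a survey would record: best-of-three shares and best-of-two shares."""
    top = {i: sum(p for r, p in pop.items() if r[0] == i) for i in ITEMS}
    pair = {
        (x, y): sum(p for r, p in pop.items() if r.index(x) < r.index(y))
        for x, y in permutations(ITEMS, 2)
    }
    return top, pair


def rankings_from_choices(top, pair):
    """Rebuild each ranking's share without ever seeing a full ranking.

    P(x > y > z) = P(y beats z in a pair) - P(y is best of all three).
    """
    return {(x, y, z): pair[(y, z)] - top[y] for x, y, z in permutations(ITEMS)}


if __name__ == "__main__":
    top, pair = choice_data(population)
    print(rankings_from_choices(top, pair) == population)  # True
```

With only three items the reconstruction is exact. Scaling the idea to a large catalog, where each person sees just a few triplets, is the harder problem that the researchers' algorithm addresses.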
“This paper provides a crucial breakthrough,” says Emma Frejinger, a computer scientist at the University of Montreal. “It mathematically proves why traditional data collection fails and demonstrates that simply asking users for their best-of-three choices unlocks the ability to accurately train these powerful models.”
The work has direct implications for AI alignment. Large language models (LLMs) are often trained in part by having human annotators rank candidate outputs, a process that the findings suggest could be made considerably more effective by using triplet comparisons. As Daskalakis notes, “RUMs play a central role in the commercial viability and usefulness of LLMs.”
The team’s findings also show that the number of experiments needed does not grow exponentially with the number of items in a catalog, making the approach practical for real-world applications like streaming services, e-commerce, and political polling.
“This finding provides a highly practical roadmap for collecting better data to drive more accurate optimizations,” adds Frejinger.
Looking ahead, the MIT researchers believe that building and refining utility models will remain a vibrant area of research, critical to aligning AI systems with human preferences and to sustaining the internet economy.

