Two LLMs Team Up to Help Robots Interpret Vague Instructions and Prioritize What Matters

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a method that uses two large language models (LLMs) to help robots understand ambiguous instructions and focus on key details. The approach, called Masked Inverse Reinforcement Learning (Masked IRL), cuts the demonstration data needed to teach a robot by a factor of nearly five, while improving the robot’s ability to infer unspoken user preferences.

Traditional robot training often requires either extensive physical demonstrations or detailed written instructions. Masked IRL automates the process: first, one LLM clarifies ambiguous prompts (e.g., turning “stay close” into “stay close to the surface of the table”) by comparing a user’s demonstration trajectory to the shortest possible path. Then a second LLM evaluates the environment and “masks” irrelevant details – such as a person leaning on a table – while highlighting critical ones like obstacles to avoid. The robot then uses these prioritized details to generate a safe motion plan.
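
To make the two steps concrete, here is a minimal Python sketch of the general idea. It is not the researchers' implementation. The feature names, weights, waypoints, and helper functions are all hypothetical, and the mask is hard-coded where the described system would obtain it from an LLM. The first part zeroes out features marked irrelevant before scoring a state. The second compares a demonstration to a shortest path, the kind of signal that could help turn "stay close" into "stay close to the surface of the table".

```python
# Illustrative sketch only: names, weights, and the mask are hypothetical,
# and the mask is hard-coded here instead of being produced by an LLM.

def apply_mask(features, relevant):
    """Zero out every feature the mask marks as irrelevant."""
    return {name: (value if name in relevant else 0.0)
            for name, value in features.items()}


def linear_reward(features, weights):
    """Score a state as a weighted sum of its (masked) features."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def mean_height_gap(demo, shortest):
    """Average height difference between a demonstration and a shortest path.

    Both paths are equal-length lists of (x, y, z) waypoints. A negative
    result means the demonstration ran lower than the shortest path.
    """
    return sum(d[2] - s[2] for d, s in zip(demo, shortest)) / len(demo)


# Hypothetical state for a "stay close to the table" task.
state = {
    "height_above_table": 0.30,
    "distance_to_laptop": 0.10,
    "distance_to_person": 0.80,  # distractor for this task
}
weights = {
    "height_above_table": -1.0,
    "distance_to_laptop": 0.5,
    "distance_to_person": 0.5,
}
relevant = {"height_above_table", "distance_to_laptop"}

masked_state = apply_mask(state, relevant)
reward = linear_reward(masked_state, weights)

# The demonstration dips toward the table midway; the shortest path does not.
shortest = [(0.0, 0.0, 0.30), (0.5, 0.0, 0.30), (1.0, 0.0, 0.30)]
demo = [(0.0, 0.0, 0.30), (0.5, 0.0, 0.05), (1.0, 0.0, 0.30)]
gap = mean_height_gap(demo, shortest)
```

Because the person's distance is masked out, changing it leaves the reward unchanged, which is the property that lets a learner ignore distractors instead of needing extra demonstrations to rule them out.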

In experiments, the system correctly identified unstated user preferences up to 15 percent more often than comparable baselines. Real-world tests showed a robotic arm successfully moving a coffee mug around a laptop, wiping a table while “staying close” to it, and handing a user a bag of chips while “staying away” from both the person and the table – all after fewer than 50 kinesthetic demonstrations.

The team plans to enhance Masked IRL with camera input, allowing robots to visually focus on relevant objects in dynamic environments. The work was supported by the Tata Group via the MIT Generative AI Impact Consortium Award and the Department of Defense, and will be presented at the 2026 IEEE International Conference on Robotics and Automation.
