MIT Develops Long-Term Memory System That Lets Robots Answer Where You Left Your Keys

Imagine asking a robot, “Where did I leave my keys?” and getting an accurate answer in real time. MIT researchers have created a spatial memory framework called DAAAM (Describe Anything, Anywhere, at Any Moment) that lets robots build and recall detailed models of large-scale environments. The work could change how robots assist humans in factories, homes, and beyond.

DAAAM combines advanced map representations with rich, language-based descriptions of the objects a robot encounters as it explores. The system runs fast enough for mobile robots to use in real time, and it answers complex plain-English queries with 21% to 53% higher accuracy than existing methods.

“If we want robots to work side-by-side with humans and interact better with humans, they must speak the same language,” says Luca Carlone, associate professor in MIT’s Department of Aeronautics and Astronautics and lead researcher on the project. “The robot must be able to reason about time and space the same way humans do.”

The framework bridges computer vision and robotic mapping. As a robot moves through an environment, DAAAM attaches a detailed description to each object it sees, noting, for example, that a red bicycle with a flat tire is in the bike rack outside the Stata Center. It stores these descriptions in a 3D map in which objects are placed spatially and grouped into regions for efficient retrieval.
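As a rough illustration of the idea of a map that groups described objects into regions, here is a minimal Python sketch. It is not DAAAM's actual data structure: the names `MapObject`, `Region`, `SceneMap`, and `find` are hypothetical, and the keyword match stands in for whatever retrieval the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class MapObject:
    """One described object at a 3D position (hypothetical structure)."""
    object_id: int
    description: str
    position: tuple[float, float, float]


@dataclass
class Region:
    """A named group of nearby objects."""
    name: str
    objects: list[MapObject] = field(default_factory=list)


class SceneMap:
    """Toy map that stores objects grouped by region."""

    def __init__(self) -> None:
        self.regions: dict[str, Region] = {}

    def add_object(self, region_name: str, obj: MapObject) -> None:
        # Create the region on first use, then file the object under it.
        region = self.regions.setdefault(region_name, Region(region_name))
        region.objects.append(obj)

    def find(self, keyword: str) -> list[tuple[str, MapObject]]:
        # Return (region name, object) pairs whose description mentions the keyword.
        keyword = keyword.lower()
        return [
            (region.name, obj)
            for region in self.regions.values()
            for obj in region.objects
            if keyword in obj.description.lower()
        ]


scene = SceneMap()
scene.add_object("bike rack", MapObject(1, "red bicycle with a flat tire", (2.0, 5.0, 0.0)))
scene.add_object("lobby", MapObject(2, "blue sofa", (0.0, 0.0, 0.0)))
matches = scene.find("bicycle")
```

Because each object is filed under a region, a question such as "where is the bicycle?" can be answered with the region name rather than raw coordinates.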

To overcome the speed limitations of existing annotation techniques, DAAAM aggregates nearby objects and uses an optimization method to select key frames, the images with the clearest view of multiple objects, so that the system can describe several items in parallel. This speeds up computation tenfold, making real-time performance possible.
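The article does not say which optimization method is used. To show why picking a few frames that each see many objects cuts the annotation work, the sketch below uses a simple greedy set-cover heuristic as a stand-in. The function name `select_key_frames` and the toy visibility data are assumptions, and the sketch ignores view quality, which the real method takes into account.

```python
def select_key_frames(visibility: dict[str, set[int]]) -> list[str]:
    """Greedy stand-in: repeatedly pick the frame that shows the most
    not-yet-covered objects until every object appears in a chosen frame.

    `visibility` maps a frame name to the set of object ids visible in it.
    """
    uncovered: set[int] = set().union(*visibility.values())
    chosen: list[str] = []
    while uncovered:
        # Sorting the frame names makes tie-breaking deterministic.
        best = max(sorted(visibility), key=lambda f: len(visibility[f] & uncovered))
        chosen.append(best)
        uncovered -= visibility[best]
    return chosen


# Toy data (invented for illustration): four frames, four objects.
frames = {
    "frame_a": {1, 2, 3},
    "frame_b": {3, 4},
    "frame_c": {4},
    "frame_d": {1},
}
key_frames = select_key_frames(frames)
```

In this toy case two frames are enough to show all four objects, so only two images would need to be passed to the description model instead of four.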

“We annotate every object only once, so our framework can run in very large-scale environments in real time,” explains lead author Nicolas Gorlo, an MIT graduate student. “And by clustering objects into regions, it can answer a wide range of queries about objects and locations.”

The researchers used a large language model (LLM) that calls on various tools to retrieve specific information quickly, reducing hallucinations. For example, if asked about a sculpture near an MIT campus building, DAAAM can use a semantic search tool to retrieve information based on the word “sculpture” or a location-based tool to find the building.
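The tool-calling pattern described above can be sketched as a small dispatcher. This is an illustration only: no LLM is involved, the tool names `semantic_search` and `location_search` and the `run_tool_call` helper are hypothetical, the object list is invented toy data, and plain substring matching stands in for real semantic search.

```python
from typing import Callable

# Invented toy data standing in for the robot's stored object descriptions.
OBJECTS = [
    {"description": "large metal sculpture", "region": "courtyard of building A"},
    {"description": "wooden bench", "region": "courtyard of building A"},
    {"description": "vending machine", "region": "lobby of building B"},
]


def semantic_search(term: str) -> list[dict]:
    """Stand-in for semantic search: match on the object description."""
    return [o for o in OBJECTS if term.lower() in o["description"].lower()]


def location_search(place: str) -> list[dict]:
    """Return the objects stored under a matching region name."""
    return [o for o in OBJECTS if place.lower() in o["region"].lower()]


TOOLS: dict[str, Callable[[str], list[dict]]] = {
    "semantic_search": semantic_search,
    "location_search": location_search,
}


def run_tool_call(call: dict) -> list[dict]:
    """Execute a tool request of the form {"tool": name, "argument": text},
    the kind of structured request a language model could be asked to emit."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    return tool(call["argument"])


by_word = run_tool_call({"tool": "semantic_search", "argument": "sculpture"})
by_place = run_tool_call({"tool": "location_search", "argument": "building A"})
```

Because the answer is assembled from records the tools return, the model has less room to invent objects that are not in the map.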

Future work aims to expand DAAAM to capture significant events and incorporate confidence levels into responses. “Ultimately, we want to have robots that can help with any sort of tasks,” Gorlo says. “With this framework, we are trying to create the foundations to enable a generalist agent that can do anything you ask.”

The research was presented at the Conference on Computer Vision and Pattern Recognition (CVPR) and funded by the U.S. Army Research Laboratory and the Office of Naval Research.
