Scientists develop new method for robots to map scenes and complete tasks intuitively

MIT's Clio runs in real time to map task-relevant objects in a robot's surroundings, allowing the bot (Boston Dynamics' quadruped robot Spot, pictured) to carry out a natural language task ("pick up orange backpack"). Credit: MIT.

Imagine walking into a messy kitchen with the goal of cleaning the counter.

You could quickly sweep up everything or carefully sort items by type, depending on your task.

Now, robots can do something similar thanks to a new method developed by engineers at MIT. This method, called Clio, allows robots to analyze their surroundings and focus on the objects relevant to their tasks.

The Clio system enables robots to process natural-language instructions, like “move the stack of books” or “find the green book,” and then determine which parts of a scene are important for completing those tasks.

By combining neural networks with mapping tools, Clio helps robots identify and remember only what they need to finish the job.

MIT researchers tested Clio in various environments, from cluttered offices to an entire five-story building on campus.

In one test, a quadruped robot was tasked with navigating an office building and retrieving specific items, like a dog toy. Clio allowed the robot to recognize the important objects and map out the scene while ignoring unnecessary items.

Clio’s name is inspired by the Greek muse of history, reflecting its ability to remember only what’s needed for a task.

The system has many potential uses, including search and rescue operations, domestic robots, and robots working on factory floors.

“It’s about helping robots understand their environment and what to remember to carry out a mission,” says Luca Carlone, an MIT professor and lead researcher.

The results of this project were published in IEEE Robotics and Automation Letters by Carlone and his team. The project involved collaboration between MIT’s Laboratory for Information and Decision Systems (LIDS) and the MIT SPARK Laboratory, with contributions from researchers at MIT Lincoln Laboratory.

Traditionally, robots could only recognize objects in controlled environments where they had been programmed to work with a limited number of known objects. But recent advances in computer vision and natural language processing now allow robots to identify objects in more realistic, “open-set” environments, where they encounter new, unfamiliar items.

Clio improves on this by adjusting how a robot interprets a scene based on the task at hand. For example, if the robot’s task is to move a stack of books, Clio recognizes the stack as a single object. However, if the task is to move only the green book, Clio will focus on that specific book while ignoring the others.

Clio uses a combination of neural networks and mapping tools to break an image into small segments. These segments are then processed by a neural network to determine if they are semantically related to the task. The system also incorporates an idea from information theory called the “information bottleneck,” which helps the robot focus only on relevant information while disregarding the rest.
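To make the idea concrete, here is a minimal sketch of task-driven filtering. It is an illustration under stated assumptions, not Clio's actual code: the three-number "embeddings" are hand-made toy vectors standing in for the output of a neural network, the names `cosine` and `build_task_map` are hypothetical, and a simple similarity threshold stands in for the information-bottleneck step, which in the real system decides how segments are grouped and what is discarded.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_task_map(segments, tasks, threshold=0.8):
    """For each task, keep only the segments that look relevant to it.

    The segments kept for one task are treated as a single object for that
    task. The threshold is an assumed, hand-picked value.
    """
    task_map = {}
    for task, task_vec in tasks.items():
        kept = [name for name, vec in segments.items()
                if cosine(vec, task_vec) >= threshold]
        if kept:
            task_map[task] = kept
    return task_map

# Toy embeddings (assumption): axes loosely mean "book", "green", "mug".
segments = {
    "red book": [1.0, 0.0, 0.0],
    "green book": [1.0, 1.0, 0.0],
    "blue book": [1.0, 0.0, 0.0],
    "coffee mug": [0.0, 0.0, 1.0],
}
tasks = {
    "move the stack of books": [1.0, 0.3, 0.0],
    "find the green book": [1.0, 1.0, 0.0],
}

for task, objects in build_task_map(segments, tasks).items():
    print(task, "->", objects)
```

With these toy numbers, the "stack of books" task keeps all three books as one group, the "green book" task keeps only the green book, and the coffee mug is dropped from both. That mirrors the behavior described above: the same scene is remembered at a different level of detail depending on the task.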

Clio also held up in real-world tests. In one experiment, the team ran the system in a cluttered apartment, where it segmented the scene and identified task-relevant objects such as a pile of clothes. The team also ran Clio on Boston Dynamics' quadruped robot, Spot, which was tasked with exploring an office building. As Spot moved through the building, Clio identified and mapped the objects needed to complete its tasks in real time.

Running Clio in real time was a significant achievement, as similar methods can take hours to process information.

Looking ahead, the team aims to develop Clio for more complex tasks, such as search and rescue missions where robots might be instructed to find survivors or restore power.

This advancement could pave the way for robots that are more intelligent and capable of handling real-world challenges with greater precision.