Researchers have developed a new technique that helps artificial intelligence (AI) programs create better maps of three-dimensional (3D) spaces using two-dimensional (2D) images from multiple cameras.
This advancement could significantly improve the navigation of autonomous vehicles, especially since it works well even with limited computing power.
“Most self-driving cars use powerful AI programs called vision transformers.
These programs take 2D images from several cameras and build a 3D representation of the car’s surroundings,” explains Tianfu Wu, an associate professor of electrical and computer engineering at North Carolina State University and the lead author of the study.
“Although these AI programs use different methods, there is still much room for improvement.”
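The pipeline Wu describes follows a common pattern in this line of work: points on a 3D grid around the vehicle are projected into each camera's image, and the 2D features found there are pooled into a single 3D representation. The sketch below illustrates that lifting step in simplified form; the grid size, feature dimensions, camera model, and the `lift_to_bev` function itself are illustrative assumptions, not the implementation from the paper.

```python
# A minimal, illustrative sketch of how a multi-camera detector can "lift"
# 2D image features into a 3D bird's-eye-view (BEV) grid. This is NOT the
# authors' method; the grid size, feature dimension, and camera matrices
# are hypothetical placeholders chosen to keep the example self-contained.
import numpy as np

def lift_to_bev(feats, proj_mats, grid_xy=(32, 32),
                x_range=(-10, 10), y_range=(-10, 10), z=0.0):
    """Average per-camera 2D features at the pixels each ground-plane BEV
    cell projects to. feats: list of (H, W, C) arrays, one per camera.
    proj_mats: list of 3x4 projection matrices (world -> pixel)."""
    H, W, C = feats[0].shape
    nx, ny = grid_xy
    xs = np.linspace(*x_range, nx)
    ys = np.linspace(*y_range, ny)
    bev = np.zeros((nx, ny, C))
    counts = np.zeros((nx, ny, 1))
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            p_world = np.array([x, y, z, 1.0])       # homogeneous 3D point
            for feat, P in zip(feats, proj_mats):
                uvw = P @ p_world                    # project into this camera
                if uvw[2] <= 0:                      # point is behind the camera
                    continue
                u, v = int(uvw[0] / uvw[2]), int(uvw[1] / uvw[2])
                if 0 <= v < H and 0 <= u < W:        # lands inside the image
                    bev[i, j] += feat[v, u]
                    counts[i, j] += 1
    return bev / np.maximum(counts, 1)               # average over cameras

# Toy data: six cameras with random features and random 3x4 projections.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((48, 64, 16)) for _ in range(6)]
proj_mats = [np.hstack([np.eye(3), rng.standard_normal((3, 1))]) for _ in range(6)]
print(lift_to_bev(feats, proj_mats).shape)  # (32, 32, 16)
```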
Wu and his team have developed a technique called Multi-View Attentive Contextualization (MvACon).
This technique can be added to existing vision transformer AIs to enhance their ability to map 3D spaces without requiring extra data from the cameras.
MvACon is based on an earlier approach called Patch-to-Cluster attention (PaCa), which Wu and his colleagues introduced last year. PaCa helps transformer AIs identify objects in images more efficiently.
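PaCa's efficiency gain comes from shrinking the attention computation: rather than comparing every image patch with every other patch, which scales quadratically with the number of patches, patches attend to a small set of cluster summaries. Below is a minimal sketch of that idea; the soft-assignment step, cluster count, and random weight matrix are simplified stand-ins for the learned components of the actual model.

```python
# A minimal sketch of the patch-to-cluster attention idea behind PaCa:
# instead of every patch attending to every other patch (O(N^2) cost),
# patches attend to a small set of M cluster summaries (O(N*M) cost).
# The clustering step and all dimensions here are simplified assumptions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_to_cluster_attention(patches, n_clusters=8):
    """patches: (N, C) array of patch features; returns (N, C)."""
    N, C = patches.shape
    # Soft-assign patches to clusters; a random matrix stands in for the
    # learned assignment weights of the real model.
    rng = np.random.default_rng(0)
    W_assign = rng.standard_normal((C, n_clusters)) / np.sqrt(C)
    assign = softmax(patches @ W_assign, axis=0)     # (N, M) soft assignment
    clusters = assign.T @ patches                    # (M, C) cluster summaries
    # Standard attention, but keys/values are the M clusters, not N patches:
    scores = patches @ clusters.T / np.sqrt(C)       # (N, M) instead of (N, N)
    return softmax(scores, axis=-1) @ clusters       # (N, C)

x = np.random.default_rng(1).standard_normal((196, 64))  # 14x14 patches, 64-dim
print(patch_to_cluster_attention(x).shape)                # (196, 64)
```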
“The main advancement here is applying what we learned from PaCa to the challenge of mapping 3D space using multiple cameras,” says Wu.
To test MvACon, the researchers used it with three top vision transformers—BEVFormer, the BEVFormer DFA3D variant, and PETR. In each case, the transformers processed 2D images collected from six different cameras.
In all cases, MvACon significantly improved the performance of the vision transformers.
“Performance improved especially in locating objects and determining their speed and direction,” says Wu. “Adding MvACon to the vision transformers also had a minimal impact on computing power.”
The next steps for Wu and his team include testing MvACon with additional benchmark datasets and using it with real video input from autonomous vehicles. “If MvACon continues to outperform the existing vision transformers, we are hopeful that it will be widely adopted,” Wu says.
The research paper, “Multi-View Attentive Contextualization for Multi-View 3D Object Detection,” will be presented on June 20 at the IEEE/CVF Conference on Computer Vision and Pattern Recognition in Seattle, Washington.
This new technique represents a significant step forward in making autonomous vehicles safer and more reliable by enabling them to better understand and navigate their surroundings.