
Imagine walking through a bustling museum in a foreign country and understanding everything the tour guide and people around you are saying—even if you don’t speak their language.
That’s the promise of a new invention created by researchers from the University of Washington (UW).
Led by doctoral student Tuochao Chen, the team has developed an innovative headphone system called Spatial Speech Translation.
The device can translate several people speaking at the same time, while preserving the sound of each voice and the direction it comes from.
Chen got the idea during a visit to a museum in Mexico, where he struggled to understand the tour guide despite using a translation app.
The app picked up too much background noise, making the translation useless.
Existing technologies like Meta’s translation glasses only work with one speaker at a time. Chen and his team wanted something that worked in real-life environments with many people talking.
Spatial Speech Translation solves this problem by using noise-canceling headphones with built-in microphones.
When turned on, the system instantly detects how many people are speaking and where they are located. Lead author Chen describes it as working “like radar,” scanning the space in all directions and constantly updating the number and location of speakers.
The system translates each person’s speech separately and maintains the unique sound of their voice, including its volume and direction.
This allows listeners to tell who is speaking, even as the speakers move around. Impressively, it does all of this without cloud computing, protecting users’ privacy by keeping everything local to the device.
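For readers curious how such a pipeline might be organized, here is a minimal conceptual sketch in Python. It is not the team’s released code: the stage names (separate_speakers, translate, render_spatial) and their behavior are placeholders assumed purely for illustration of the flow described above.

```python
# Conceptual sketch of the pipeline described above -- NOT the UW team's code.
# Every stage below is a simplified placeholder assumed for illustration.

from dataclasses import dataclass
from typing import List


@dataclass
class SpeakerStream:
    audio: List[float]    # audio samples isolated for one speaker
    direction_deg: float  # estimated direction of arrival, in degrees


def separate_speakers(mic_channels: List[List[float]]) -> List[SpeakerStream]:
    """Hypothetical stage: split the multi-microphone mix into one stream
    per speaker and estimate where each speaker is ("like radar")."""
    # Placeholder: a real system would use source separation + localization.
    return [SpeakerStream(audio=ch, direction_deg=i * 45.0)
            for i, ch in enumerate(mic_channels)]


def translate(audio: List[float], target_lang: str) -> List[float]:
    """Hypothetical stage: speech-to-speech translation that keeps the
    speaker's voice characteristics."""
    return audio  # placeholder: pass audio through unchanged


def render_spatial(stream: SpeakerStream, translated: List[float]) -> List[float]:
    """Hypothetical stage: place the translated voice back at the speaker's
    original direction so the listener knows who said what."""
    return translated  # placeholder: a real system would render binaurally


def process_frame(mic_channels: List[List[float]], target_lang: str = "en") -> List[List[float]]:
    """One pass over a short audio frame, running entirely on-device."""
    outputs = []
    for stream in separate_speakers(mic_channels):
        translated = translate(stream.audio, target_lang)
        outputs.append(render_spatial(stream, translated))
    return outputs


if __name__ == "__main__":
    # Two fake microphone channels of silence, just to show the data flow.
    fake_mics = [[0.0] * 16000, [0.0] * 16000]
    rendered = process_frame(fake_mics)
    print(f"Rendered {len(rendered)} translated speaker streams")
```

The sketch only mirrors the high-level steps reported by the researchers: find and locate the speakers, translate each one separately, and play the result back from the speaker’s original position.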
The prototype currently works on Apple M2 chip devices like laptops and the Apple Vision Pro.
The team tested their device in ten different indoor and outdoor locations.
In trials with 29 participants, users preferred Spatial Speech Translation over traditional translation technologies because it preserved the sound and position of each speaker’s voice. Users found a 3–4 second delay ideal, as shorter delays led to more translation errors.
The researchers are now working to speed up the translation process. So far, the system supports Spanish, German, and French, but the team believes it can be expanded to over 100 languages.
Chen is optimistic about the future, envisioning a world where language barriers are no longer obstacles. “If I’m walking down the street in Mexico, even though I don’t speak Spanish, I can translate all the people’s voices and know who said what,” he said.
The researchers presented their work at the ACM CHI Conference on Human Factors in Computing Systems in Yokohama, Japan, and made their code available for others to build upon.
With this breakthrough, understanding different languages in real-time might soon be as simple as putting on a pair of headphones.
Source: University of Washington.