Revolutionary AI headphones: Tune into one voice in a crowd

A prototype of the headphone system: binaural microphones attached to off-the-shelf noise-canceling headphones. Credit: Kiyomi Taguchi/University of Washington.

Imagine being in a noisy place and only hearing the person you’re looking at, loud and clear.

That is the promise of a new system developed by a team at the University of Washington.

Their system, called “Target Speech Hearing” (TSH), uses artificial intelligence (AI) to allow headphone wearers to focus on a single person’s voice, even in the middle of a bustling crowd.

Traditional noise-canceling headphones can block out unwanted noise, but they struggle to selectively let certain sounds through.

For example, the latest Apple AirPods Pro can adjust sound levels based on your environment but don’t give you control over whom to listen to. This is where the University of Washington’s innovation stands out.

Here’s how it works: When you want to focus on a specific person, you look at them for three to five seconds while wearing the headphones, which have microphones mounted on both sides. You tap a button, and the system “enrolls” the person’s voice by capturing a short sample of it.

The AI software learns the speaker’s vocal patterns, allowing it to isolate their voice and suppress other voices and background noise. The best part? It continues to focus on that voice even if you move around or look away.

This breakthrough was presented on May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems.

Although it’s not available for purchase yet, the team has made the code for this proof-of-concept device available for other researchers to build on.

“We often think of AI as something in web-based chatbots,” said Shyam Gollakota, a professor at the Paul G. Allen School of Computer Science & Engineering. “But with this project, we are using AI to enhance how we hear. Now, you can clearly hear a single speaker even in a noisy place full of chatter.”

To use the TSH system, you wear regular headphones fitted with microphones. When you face someone and tap the button, the sound waves from that person’s voice reach the microphones on both sides of the headset at roughly the same time, because the speaker is directly in front of you. That timing tells the system which voice to enroll.
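The timing idea can be made concrete. A voice from straight ahead reaches the left and right microphones with essentially no delay, while a voice off to one side reaches the nearer microphone first. The Python sketch below estimates that delay by cross-correlating the two channels and converts it into an arrival angle. It illustrates the general technique only and is not the team’s code: the 16 kHz sample rate, 0.18 m microphone spacing, and 15-degree tolerance are assumed values, and all function names are hypothetical.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def cross_correlation_lag(left, right, max_lag):
    """Return the lag (in samples) that maximizes sum(left[n] * right[n + lag])."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += sample * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(left, right, sample_rate=16_000, mic_spacing_m=0.18):
    """Estimate the direction of a sound; 0 degrees means straight ahead.

    A positive angle means the right channel lags, i.e. the source is to the left.
    """
    # The delay can never exceed the time sound needs to cross the head.
    max_lag = math.ceil(mic_spacing_m / SPEED_OF_SOUND_M_S * sample_rate)
    lag = cross_correlation_lag(left, right, max_lag)
    ratio = (lag / sample_rate) * SPEED_OF_SOUND_M_S / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp so asin stays defined
    return math.degrees(math.asin(ratio))


def is_facing_speaker(left, right, tolerance_deg=15.0, **kwargs):
    """True if the dominant sound arrives within tolerance_deg of straight ahead."""
    return abs(arrival_angle_deg(left, right, **kwargs)) <= tolerance_deg


if __name__ == "__main__":
    rng = random.Random(0)
    voice = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # stand-in for speech
    from_the_left = [0.0] * 5 + voice[:-5]  # right mic hears it 5 samples later
    print(is_facing_speaker(voice, voice))          # True
    print(is_facing_speaker(voice, from_the_left))  # False
```

Plain cross-correlation is the textbook way to estimate a time difference of arrival; practical systems often use more robust variants such as GCC-PHAT, which hold up better in echoey, noisy rooms.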

The headphones send that signal to a small computer that uses machine learning to recognize and focus on the speaker’s voice. The more the person talks, the better the system gets at isolating their voice.
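The article does not describe the team’s model, so what follows is only a deliberately simplified, hypothetical illustration of the idea that the system keeps learning a voice as the person talks. It swaps in a much simpler technique than a neural network: each short audio frame is assumed to come with a feature vector describing the voice in it (how those vectors are computed is left out), frames whose features are close enough to the enrolled profile by cosine similarity are kept, and each kept frame is folded into a running average so the profile sharpens over time. The class, the function names, and the 0.8 threshold are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerProfile:
    """Hypothetical voice profile: a running mean of accepted feature vectors."""

    def __init__(self, enrollment_features, threshold=0.8):
        self.mean = list(enrollment_features)  # starts from the enrollment clip
        self.count = 1
        self.threshold = threshold

    def matches(self, features):
        return cosine_similarity(self.mean, features) >= self.threshold

    def update(self, features):
        # Incremental mean: every accepted frame nudges the profile toward it,
        # so the profile rests on more speech the longer the person talks.
        self.count += 1
        self.mean = [m + (f - m) / self.count for m, f in zip(self.mean, features)]


def filter_frames(profile, frames):
    """Pass frames that match the enrolled speaker and silence the others.

    `frames` is a list of (features, samples) pairs.
    """
    output = []
    for features, samples in frames:
        if profile.matches(features):
            profile.update(features)
            output.append(list(samples))
        else:
            output.append([0.0] * len(samples))
    return output


if __name__ == "__main__":
    profile = SpeakerProfile([1.0, 0.2, 0.0])
    frames = [
        ([0.9, 0.3, 0.1], [0.5, 0.4]),  # sounds like the enrolled speaker
        ([0.0, 0.1, 1.0], [0.7, 0.7]),  # someone else
    ]
    print(filter_frames(profile, frames))  # [[0.5, 0.4], [0.0, 0.0]]
    print(profile.count)                   # 2
```

Switching whole frames on and off is crude; the point is only the feedback loop, in which accepted speech refines the profile used to judge the next frame.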

In tests with 21 people, users rated the clarity of the focused voice nearly twice as high as regular, unfiltered audio.

Currently, the TSH system can only enroll one speaker at a time, and it works best if there aren’t other loud voices coming from the same direction. If the sound quality isn’t perfect, users can re-enroll the speaker to improve it.

The team is now working to adapt this technology for use in earbuds and hearing aids, which could greatly benefit people in noisy environments.

If it reaches consumers, this technology could change the way we listen, making it easier to communicate in noisy places.