New AI technology from UW researchers lets noise-canceling headphone users choose which sounds they hear

A team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. In this provided photo, co-author Malek Itani is seen demonstrating the system.
*University of Washington*

We typically wear noise-canceling headphones to drown out unpleasant sounds, such as cars honking or construction machines drilling. But what if you still wanted to hear someone knocking on your door or birds chirping on your walk? New artificial intelligence technology from the University of Washington could soon make that possible. Researchers developed an algorithm that allows users to pick which sounds can filter through their headphones in real time.

Shyam Gollakota is a professor of computer science and engineering at UW. He joins us with more details on the new technology and the ethical implications of choosing your own audio environment.

The following transcript was created by a computer and edited by a volunteer:

Dave Miller: This is Think Out Loud on OPB. I’m Dave Miller. People typically wear noise-canceling headphones to drown out unpleasant sounds, like the drone of an airplane or construction noise. But what if you still wanted to hear someone knocking on your door, or birds chirping as you go for a walk? Shyam Gollakota is a professor of computer science and engineering at the University of Washington. He is using artificial intelligence to help users pick which sounds they want to filter in or out of their headphones in real time. He joins us now. Welcome back to the show.

Shyam Gollakota: Thank you, Dave.

Miller: So why let some sounds in while blocking others? What was the basic idea that propelled this work?

Gollakota: Sound is such a fundamental medium through which we perceive our environment, but today, we are surrounded by a cacophony of sounds that can end up overwhelming our senses. So we started this project to explore if we can get back some of this choice in terms of what sounds we hear in real world environments. For example, say you’re in a park and you want to admire the sounds of the chirping birds and relax, but then you have a loud chatter of a nearby group of people who just can’t stop talking. Now, imagine if your headphones could grant you the ability to focus on the sounds of the birds while the rest of the noise just goes away. That’s exactly what we set out to achieve in this project.

Miller: Can you remind us just the basics here, of how existing noise cancellation technology works?

Gollakota: Yeah, so today’s noise cancellation technology cancels out all the sounds. It does not have any intelligence or understanding of which sound is which. To achieve what I just talked about, we need high-level intelligence to identify all the sounds in an environment, just like a human does, and then we need the ability to separate the target sounds from the interfering sounds. As if this were not hard enough, we need to make sure that the sound we extract is synced with the user’s visual senses, so that we don’t hear a sound a couple of seconds after it happens. So whatever algorithms we are running should process the sound in real time, in less than one-hundredth of a second, which is pretty challenging to do.
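To make the one-hundredth-of-a-second figure concrete, here is a minimal arithmetic sketch of what such a latency budget implies for audio buffering. The 44,100 Hz sample rate, the chunk sizes and the function names are illustrative assumptions and do not come from the interview or from the researchers' system.

```python
# Assumption: a common audio sample rate; the interview does not state one.
SAMPLE_RATE_HZ = 44_100
# From the interview: processing should finish in about one-hundredth of a second.
LATENCY_BUDGET_S = 0.01


def samples_in_budget(sample_rate_hz: int, budget_s: float) -> int:
    """Number of audio samples that arrive during the latency budget."""
    return int(round(sample_rate_hz * budget_s))


def fits_budget(chunk_samples: int, processing_s: float,
                sample_rate_hz: int = SAMPLE_RATE_HZ,
                budget_s: float = LATENCY_BUDGET_S) -> bool:
    """Check whether buffering one chunk plus processing it stays within budget.

    Hypothetical helper: total delay is modeled as the time needed to collect
    the chunk plus the time needed to process it.
    """
    buffering_s = chunk_samples / sample_rate_hz
    return buffering_s + processing_s <= budget_s


if __name__ == "__main__":
    print(samples_in_budget(SAMPLE_RATE_HZ, LATENCY_BUDGET_S))
    print(fits_budget(chunk_samples=352, processing_s=0.001))
    print(fits_budget(chunk_samples=441, processing_s=0.001))
```

Under these assumptions the whole budget covers only a few hundred samples, so any model has to work on very short chunks of audio and finish each one quickly.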

Miller: You just brought up a couple of different things. Let’s take them one by one, starting with how you got the system to identify which sounds are which. How does it know what a jackhammer is? What a tweeting bird is? What a crying baby is? What a truck is? And on and on.

Gollakota: We designed something called a neural network. These are networks which imitate how a brain works, and we train the neural network with lots and lots of examples of different kinds of sounds, for example, birds chirping or a jackhammer. After a lot of training, these neural networks were able to learn and identify what each of the sounds sounds like.

Miller: How good is the system at distinguishing all those different sounds? And what proved to be the most challenging things for it to learn?

Gollakota: We use the existing noise-canceling headsets to suppress all the sounds, and the neural networks to introduce the sounds back. The neural networks are pretty good at extracting the sounds of interest and, in real time, playing them back into the ear, but how well it works depends on how good your noise-canceling headset really is.
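The two-stage idea described here (the headset suppresses everything, then the wanted sounds are played back) can be illustrated with a toy sketch. It assumes the sound classes are already separated into individual signals, which in the real system is the hard part the neural network has to solve. The `heard_signal` function, the `leakage` parameter and the sample values are hypothetical.

```python
def heard_signal(sources, wanted, leakage):
    """Toy model of what reaches the ear.

    sources: dict mapping a sound class name to a list of samples
             (assumed already separated; purely illustrative).
    wanted:  set of class names the user chose to let through.
    leakage: fraction of unwanted sound the headset fails to cancel
             (0.0 would be a perfect headset).
    """
    length = len(next(iter(sources.values())))
    out = [0.0] * length
    for name, samples in sources.items():
        # Wanted classes are played back at full level; everything else
        # is only heard to the extent the headset lets it leak through.
        gain = 1.0 if name in wanted else leakage
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out


if __name__ == "__main__":
    ambient = {"birds": [1.0, 2.0], "speech": [10.0, 10.0]}
    print(heard_signal(ambient, wanted={"birds"}, leakage=0.0))
    print(heard_signal(ambient, wanted={"birds"}, leakage=0.5))
```

With a leakage of zero only the chosen class is heard. A larger leakage lets more of the unwanted sound through, which mirrors the point that the result depends on how good the noise-canceling headset is.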

There are some sounds which are actually pretty challenging. For example, music and human speech can share pretty similar characteristics, like vocal sounds and harmonics, so it’s difficult for our system to perform tasks such as separating the speech of a person in the presence of background music that also has vocals. Similarly, it’s challenging to separate music from other classes like alarm clocks or a bird chirping, because they all have similar characteristics.

Miller: Would the worst-case scenario there be that you get rid of something that you wanted to let in, or the opposite?

Gollakota: I think it’s more the opposite, where you let in something which you wanted to get rid of.

Miller: Is there something that you would be most excited to use these headphones for?

Gollakota: I go hiking a lot. I live in Seattle, so one of the things I’ve been noticing these days is that people talk a lot on the hike, which is good, but they also start playing music out loud. I would love to be able to hike by listening to just the sounds of nature while I can block out the sounds which I don’t want to hear.

Miller: I totally understand what you’re saying. I agree that it seems like there has been a real huge uptick in people with Bluetooth speakers just hanging on a carabiner on their backpacks as they walk in pretty places, forcing everybody to be exposed to their music choices, which is annoying, I grant you that. But I guess it also makes me wonder, with technological solutions to technological or human problems, what is the end point there? Do you actually envision walking around in the woods, truly, with noise-canceling headphones on so that if people around you are playing music, you won’t hear it? Do you honestly imagine doing that?

Gollakota: You know, we are right now surrounded by very noisy environments. Living in cities, we have way too many noises, and this can really affect people’s mental health and well-being. Having some control over what you can listen to can be pretty helpful. In fact, people these days use noise-canceling headsets a lot. You see people walking around with noise-canceling headsets because it can be pretty distracting and overwhelming, and what we’re doing right now is giving people an option to opt in to sounds so that they can hear some classes of sound, so they have more control in terms of what they want to hear.

Miller: Do you imagine the way users would use this eventually is that the default would be to block everything out except for these few things that I’m saying I want in, or that the default would be to block these particular things but let everything else in? I guess I’m wondering what the default option would be, in terms of most users’ experiences as you imagine it.

Gollakota: That’s a good question. I suspect it’s going to be the latter, which is block certain kinds of sounds. Like, for example, early in the morning, I don’t want the garbage collector sound waking me up.

Miller: Or leaf blowers.

Gollakota: Exactly. These are the kind of sounds which you would typically want to block out. I think that’s probably going to be the more common use case for this kind of technology.

Miller: You mentioned the question of time lag, and you can’t have that if people are going to be safely navigating with this thing, or say, communicating with others. What is the lag that you’ve engineered so far?

Gollakota: We were able to achieve around one-hundredth of a second of lag, which is really small, but what’s really interesting here is that when people typically talk about neural networks and artificial intelligence these days, they’re familiar with large language models like ChatGPT, which require huge data centers that really are not possible in our application. We can’t send the data to a cloud and process it, so we need to run the whole thing on the smartphone itself, and extract the sounds in one-hundredth of a second. What we are showing in this work is that it’s indeed possible to achieve intelligence that can run on a device with a limited compute platform, and the intuition is that unlike language, the ability to distinguish between sounds is something you see even in small animals like insects. What we are showing here is that we don’t really need very large neural models to be able to achieve this task, and we can do everything on the device itself.

Miller: There are tons of examples of all of us, even without technology, figuring out ways to craft our own experience of the world: where we live, who we spend time with, what we read, what we watch, and on and on. This just seems like it’s the next step in that evolution. But I’m just curious in the minute we have left, how do you think about the ethical implications of everybody literally choosing their own soundscape?

Gollakota: That’s a pretty good question. In fact, we do choose what media outlets we listen to, what radio programs we listen to, so that’s a choice. I do think that sound is different, because sound is such a fundamental way we perceive our senses, and as with everything else, I think it’s important for people to have the ability to have some choice in terms of what they are perceiving in their senses.

Miller: Shyam Gollakota, thanks very much.

Gollakota: Thank you.

Miller: Shyam Gollakota is a professor of computer science and engineering at the University of Washington, where he is working on a new version of noise cancellation where we can let in the sounds that we want and keep out the sounds that we don’t.

Contact “Think Out Loud®”

If you’d like to comment on any of the topics in this show or suggest a topic of your own, please get in touch with us on Facebook, send an email to thinkoutloud@opb.org, or you can leave a voicemail for us at 503-293-1983. The call-in phone number during the noon hour is 888-665-5865.

Think Out Loud

Broadcast: Tuesday, Nov. 21