Listening to Noise and Nature With Smart Ears
Filing a noise complaint is a bit of a gamble. By the time an inspector arrives, the stream of trucks that thundered by the night before may be long gone, or the construction tools bedeviling the dinner hour switched off. In a dense soundscape, even pinpointing the worst offender can be a challenge. Was it a jackhammer or a tamping machine making that repetitive racket?
Where logistics, human perceptual capabilities or simple manpower may fall short, however, smart acoustic sensors are being trained to succeed. Endowed with machine listening, the auditory sibling to computer vision, these cutting-edge computing devices can distinguish individual sounds, record how often they occur and measure how loud each one is in order to provide the evidence needed to help enforce the city’s noise code.
“What we’re doing is enabling machines to listen, extracting information from audio at a scale impossible for humans to hear and reporting it in real time. If you’ve deployed 60 sensors, for example, you can’t have someone listening to them 24 hours a day,” explains Mark Cartwright, an informatics professor and one of the lead machine listening researchers for an NYU-based project called Sounds of New York City (SONYC), which uses a smart acoustic sensor network to monitor, analyze and mitigate urban noise pollution.
Backed by the National Science Foundation, the team is building tools to measure the impact of different types of sounds on urban neighborhoods, so that city agencies can respond more effectively. Cartwright, a former research professor at NYU and a continuing collaborator, trained models to detect the presence of different sources, such as jackhammers, trucks and honking; developed methods to estimate their loudness; and came up with the protocols for labeling data, launching a citizen-science campaign to help do it. He worked on yet another feature that will be key to the sensors’ success: compressing machine listening models to run on low-resource equipment, such as solar-powered devices or single-board computers.
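To give a concrete, if simplified, sense of what that detection step involves, the sketch below tags one-second windows of audio and logs a rough level for each. It is not the SONYC code: the class list, the hypothetical pre-trained multi-label classifier passed in as `model`, the sample rate and the threshold are all illustrative assumptions.

```python
# Minimal sketch of a sound-event tagging loop (illustrative only, not SONYC's pipeline).
# Assumes `audio` is mono floating-point samples in [-1, 1] and `model` is a
# hypothetical pre-trained classifier mapping a log-mel patch to per-class probabilities.
import numpy as np
import librosa

CLASSES = ["jackhammer", "truck", "car_horn"]   # illustrative subset of noise-code classes
SR = 16000          # sample rate the hypothetical model expects
WINDOW = SR         # analyze audio in one-second windows
THRESHOLD = 0.5     # per-class presence threshold

def tag_stream(audio, model):
    """Yield (time in seconds, detected classes, approximate level in dBFS) per window."""
    for start in range(0, len(audio) - WINDOW + 1, WINDOW):
        chunk = audio[start:start + WINDOW]

        # Log-mel spectrogram: a standard front end for urban sound tagging.
        mel = librosa.feature.melspectrogram(y=chunk, sr=SR, n_mels=64)
        logmel = librosa.power_to_db(mel, ref=np.max)

        probs = model.predict(logmel[np.newaxis])[0]     # hypothetical model call
        detected = [c for c, p in zip(CLASSES, probs) if p >= THRESHOLD]

        # Rough loudness proxy: RMS level of the window relative to full scale.
        rms = np.sqrt(np.mean(chunk ** 2))
        level_db = 20 * np.log10(rms + 1e-12)

        yield start / SR, detected, level_db
```

On a deployed sensor, the same idea would run continuously on incoming microphone audio, with each window's detections and levels streamed to a dashboard rather than returned to the caller.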
Dashboards display the output of the SONYC sensors, which were placed in Manhattan, Brooklyn and Queens, enabling city noise inspectors to see when and where sounds are occurring. The team is currently testing a more need-driven approach, in which a new generation of sensors is deployed short-term at specific locations to measure and document the patterns and impact of particular nuisance sounds.
“Let’s say a warehouse moves into a community and the amount of trucking increases, much of it directed down a particular street. We’d want to quantify how disruptive it is and let regulators decide if the trucks should be routed differently,” he says. “The aim is to provide evidence that can be used to push for accountability and changes in policy or plans. Should quieter electric jackhammers be used at a site, for example? Should backup beeper regulations change? Should emergency vehicles use different types of sirens?”
In another collaboration with NYU, called Spatial Sound Scene Description, he is working on new capabilities, including models that count particular sounds, such as vehicle noises, and that localize and track them in space, identifying their direction and distance from the sensor. Deploying such sensors over long periods would allow urban ecologists, for example, to better understand migration patterns.
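Localization itself rests on cues such as tiny timing differences between microphones. The toy example below is meant only to illustrate that principle, not the project's actual spatial models: it estimates a bearing angle from the delay between two channels using the classic GCC-PHAT method, with an assumed microphone spacing.

```python
# Toy direction-of-arrival estimate for a two-microphone array (illustration only).
import numpy as np

SPEED_OF_SOUND = 343.0   # metres per second
MIC_SPACING = 0.1        # assumed distance between the two microphones, in metres

def gcc_phat(sig, ref, sr):
    """Estimate the time delay between two channels with GCC-PHAT."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12           # phase transform: keep timing, discard level
    corr = np.fft.irfft(cross, n=n)
    corr = np.concatenate((corr[-(n // 2):], corr[:n // 2 + 1]))  # center zero lag
    shift = np.argmax(np.abs(corr)) - n // 2
    return shift / sr                        # delay in seconds

def direction_of_arrival(ch_left, ch_right, sr):
    """Convert the inter-channel delay into a bearing angle in degrees."""
    tau = gcc_phat(ch_left, ch_right, sr)
    # Clamp to the physically possible range before taking the arcsine.
    arg = np.clip(tau * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return np.degrees(np.arcsin(arg))
```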
“There are important real-world tasks where the count and location of sources are important, especially when visibility is limited or sources are occluded. In addition to wildlife monitoring, these include perception for autonomous agents, such as self-driving cars; machine condition monitoring, such as sensing when and where factory equipment may fail; and sound awareness sensing for people who are deaf or hard of hearing,” Cartwright says.
He adds, “Our ability to localize specific sound sources is still very rudimentary. We’re also still having to teach the machine each different class we want it to detect or separate. In contrast, when we humans encounter a new sound, we may not know what it is, but we can still recognize it as a distinct sound. Machines can’t do that well yet.”
In his Sound Interaction and Computing Lab, Cartwright also develops tools for sound design and music production that align interfaces with users’ goals and abilities, enabling novices to express themselves creatively with complex audio tools that typically require significant knowledge and experience to use effectively.
Software synthesizers, for example, have scores of parameters; like a pilot in training stepping into a cockpit for the first time, a newcomer would need a long time to learn what all the controls are and how to use them. His tools instead let users communicate the sound they want in descriptive language, the way they might describe it to a friend, or by imitating it with their own voice.
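One simplified way such a "query by vocal imitation" interface could work, sketched here with illustrative feature choices rather than Cartwright's actual systems, is to rank a synthesizer's presets by how closely their rendered audio matches the user's hummed or spoken example.

```python
# Bare-bones retrieval by vocal imitation (illustration only): rank presets by
# how closely their audio matches a hummed query. The mean-MFCC summary and the
# preset dictionary are assumed stand-ins for a real embedding and sound library.
import numpy as np
import librosa

def embed(audio, sr):
    """Summarize a clip as the mean of its MFCC frames."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def rank_presets(imitation, presets, sr):
    """Return preset names sorted from closest to farthest from the imitation.

    `presets` maps a preset name to a rendered audio clip (mono numpy array).
    """
    query = embed(imitation, sr)
    dists = {name: np.linalg.norm(embed(clip, sr) - query)
             for name, clip in presets.items()}
    return sorted(dists, key=dists.get)
```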
“In audio mixing, as with other problems, there is not one right solution but many possible ones that express a variety of artistic goals. People should not simply accept what’s provided by the software,” Cartwright says, adding, “In all my work, I strive to amplify human abilities rather than automate them away.”