Researchers wired New York with acoustic sensors and asked citizen scientists to label every jackhammer, siren, and barking dog. The result is the largest urban noise dataset ever assembled.
New York City receives over 300,000 noise complaints a year through its 311 hotline, more than any other category of grievance. But complaints are blunt instruments. They tell you someone is angry; they do not tell you what is actually happening in the acoustic environment. The SONYC (Sounds of New York City) project changed that. Beginning in 2016, researchers at New York University deployed a network of low-cost acoustic sensors across Manhattan, Brooklyn, and Queens, recording the city's soundscape in 10-second clips. The question was deceptively simple: could you build a machine that listens to a city the way a resident does?
The answer required human ears first. Over 18,000 recordings were uploaded to Zooniverse, where citizen scientists tagged each clip with fine-grained labels: engine idling, jackhammer, music from a bar, dog bark, ice cream truck, air conditioner hum. The taxonomy grew to encompass eight coarse categories and dozens of fine-grained sound sources, capturing the layered reality of urban noise — where a construction site, a passing ambulance, and a street musician can occupy the same 10-second window. A subset was then verified by trained acoustic experts, creating a gold-standard annotation layer for training machine learning models.
With 62,000 downloads, SONYC-UST has become the benchmark dataset for urban sound classification research. But its impact extends beyond academia. New York's Department of Environmental Protection has used SONYC data to target noise enforcement, and the project's sensor-to-classification pipeline is being adapted by cities from Barcelona to Singapore. The recordings are a time capsule of a specific urban moment — and simultaneously, a template for how any city might learn to hear itself more clearly.
Percentage of recordings containing each dominant sound category across 4-hour time blocks
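For readers who want to reproduce a chart like this one, the underlying calculation can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the `clips` records, the label names, and the helper functions are hypothetical stand-ins for whatever the real annotation files contain.

```python
from collections import defaultdict

# Hypothetical sample: (hour of day the clip was recorded, set of sound
# categories tagged in it). Real annotation files will look different.
clips = [
    (1, {"engine"}),
    (2, {"engine", "music"}),
    (9, {"machinery-impact", "engine"}),
    (10, {"machinery-impact"}),
    (10, {"alert-signal"}),
    (22, {"music", "human-voice"}),
]


def block_label(hour):
    """Map an hour (0-23) to its 4-hour block, e.g. 9 -> '08:00-12:00'."""
    start = (hour // 4) * 4
    return f"{start:02d}:00-{start + 4:02d}:00"


def category_share_by_block(clips):
    """Return {block: {category: percent of that block's clips containing it}}."""
    totals = defaultdict(int)                     # clips recorded in each block
    hits = defaultdict(lambda: defaultdict(int))  # clips per block containing each category
    for hour, labels in clips:
        block = block_label(hour)
        totals[block] += 1
        for label in labels:
            hits[block][label] += 1
    return {
        block: {label: 100.0 * n / totals[block] for label, n in counts.items()}
        for block, counts in hits.items()
    }


shares = category_share_by_block(clips)
```

Because a single clip can carry several labels at once, the percentages within one time block can add up to more than 100.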
Noise complaints tell you someone is angry. Acoustic sensors tell you what is actually happening.
dataset · 2019 · CC BY 4.0
Continuous acoustic monitoring provides evidence-based inputs for noise zoning, construction permitting, and nightlife regulations — replacing anecdotal complaints with measured data.
SONYC-UST's multilabel annotations and expert-verified subset have established the benchmark for urban audio classification, accelerating progress in environmental sound recognition.
Chronic noise exposure is linked to cardiovascular disease, sleep disruption, and cognitive impairment. This dataset enables the spatial mapping of noise burden in communities with the least political power to complain.