197K Downloads: What 28 Mental Health Subreddits Reveal About Distress During COVID-19

This dataset has been downloaded more than 197,000 times by people trying to understand what millions were saying about their mental health when they had nowhere else to say it.

Low, D. M., Rumker, L., Talkar, T., Torous, J., Cecchi, G., Ghosh, S. S. | 2020 | 197,264 downloads | View on Zenodo →
28 subreddits analyzed
15 mental health communities tracked
34.1K researcher views (+62% year over year)
2018–2020 collection period spanning pre- and post-COVID

The posts people wrote when no one else was listening

In March 2020, therapy offices closed. Hotlines were overwhelmed. Support groups stopped meeting. But the subreddits never went dark. Communities like r/depression, r/anxiety, r/SuicideWatch, and r/PTSD kept running, day and night, absorbing the weight of a world in crisis. People who had never posted before began writing — sometimes paragraphs, sometimes just a sentence — about what they were feeling.

A team of researchers from MIT, Harvard, and IBM saw what was happening and recognized an urgent opportunity. They had already been collecting posts from mental health subreddits since 2018, building a baseline of how these communities functioned in normal times. When COVID-19 hit, they had something invaluable: a before-and-after picture of mental health discourse at a scale no clinical study could match.

The Reddit Mental Health Dataset spans 28 subreddits — 15 focused on specific conditions like depression, bipolar disorder, eating disorders, and schizophrenia, alongside 13 non-mental-health communities for comparison. The dataset captures not just what people said, but how the language of distress changed. Post lengths grew. First-person pronoun use increased. References to isolation, hopelessness, and substance use spiked in ways that tracked — and sometimes preceded — clinical trends.
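Two of the shifts described above, longer posts and more first-person pronouns, are simple enough to measure directly. The following is a minimal sketch of how such features might be computed, not the authors' actual pipeline. The pronoun list, the function names, and the two sample posts are all assumptions made up for illustration.

```python
import re

# Illustrative pronoun list. This is an assumption for the sketch,
# not the lexicon used by the dataset's authors.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and return word tokens, keeping contractions whole."""
    normalized = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    return re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)


def post_features(text):
    """Return the word count and first-person pronoun rate for a single post."""
    tokens = tokenize(text)
    n_words = len(tokens)
    # Count "i'm" or "i've" as "i" by looking only at the part before the apostrophe.
    first_person = sum(1 for t in tokens if t.split("'")[0] in FIRST_PERSON)
    rate = first_person / n_words if n_words else 0.0
    return {"n_words": n_words, "first_person_rate": rate}


def mean_features(posts):
    """Average each feature over a list of posts."""
    feats = [post_features(p) for p in posts]
    if not feats:
        return {"n_words": 0.0, "first_person_rate": 0.0}
    return {key: sum(f[key] for f in feats) / len(feats) for key in feats[0]}


# Hypothetical posts, invented for this example (not taken from the dataset).
before = ["Work was fine today."]
after = ["I can't sleep and I'm scared. My family is far away and I feel alone."]

print("before:", mean_features(before))
print("after: ", mean_features(after))
```

Comparing the averages for posts written before and after a chosen date gives a rough version of the before-and-after picture. A real analysis would need many posts per period and a validated word list.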

With nearly 200,000 downloads, the dataset has become a cornerstone of computational psychiatry. NLP researchers use it to train models that detect mental health crises in text. Public health officials use the findings to understand where traditional systems failed. And the dataset itself stands as a quiet monument to the people who posted into the void, not knowing that their words would help reshape how we understand mental health at population scale.

Figure: Monthly post volume in mental health subreddits (2018–2020). The sharp inflection after the March 2020 lockdowns reveals the scale of unmet need.

Series shown: Depression & mood disorders; Anxiety & panic; Suicide & self-harm; PTSD & trauma; Addiction & substance use; Eating disorders; Bipolar disorder; Schizophrenia & psychosis; Non-mental-health controls.

These weren't research subjects. They were people reaching out in the dark. The least we could do was listen carefully enough to learn something.

01. Post volume in mental health subreddits nearly doubled within six weeks of the first U.S. lockdowns in March 2020
02. Language analysis revealed a measurable shift toward hopelessness and isolation that preceded clinical reporting by 2–3 weeks
03. The dataset enabled NLP models that can detect crisis-level distress in text with accuracy comparable to trained clinicians

🧠

Clinical Early Warning

NLP models trained on this data can flag emerging mental health crises in online communities weeks before they appear in emergency department data — enabling proactive intervention rather than reactive treatment.

📱

Digital Mental Health Tools

The linguistic patterns identified in this dataset are being built into apps and chatbots that screen for depression and anxiety, bringing low-cost mental health triage to populations with no access to therapists.

🏛️

Policy & Pandemic Preparedness

This dataset showed that social media data can serve as a real-time mental health barometer during crises. Future lockdown decisions could factor in psychological impact using tools built on this research.
