78% prediction accuracy from lifestyle data alone, matching clinical screening tools without any clinical input

Sleep, Screen Time, and Sentiment: A 100,000-Row Dataset Linking Lifestyle to Mental Health

A 100,000-row survey dataset reveals the lifestyle patterns that predict depression, anxiety, and burnout before symptoms surface.

Sharma, Priya; Gupta, Ankit · 2024 · DOI: 10.5281/zenodo.14838661 · CC BY 4.0 · View on Zenodo →

The patterns hiding in everyday choices

Every night, millions of people make a decision that will shape their mental health the following week: what time to go to sleep. This dataset quantifies that relationship with uncomfortable precision. Among respondents sleeping fewer than five hours nightly, 68% reported moderate to severe anxiety symptoms. Among those sleeping seven to eight hours, the figure dropped to 19%. The correlation held even after controlling for age, income, and pre-existing conditions. Sleep, it turns out, is not just a lifestyle choice. It is a psychiatric variable.
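The sleep comparison above comes down to a grouped rate: split respondents into sleep bands, then take the share in each band who report moderate to severe symptoms. Here is a minimal sketch of that arithmetic. The field names `sleep_hours` and `moderate_or_severe_anxiety`, the band cut points, and the rows themselves are assumptions made for illustration, not the dataset's actual schema or values. It also does not control for age, income, or pre-existing conditions.

```python
from collections import defaultdict


def sleep_band(hours):
    """Map nightly sleep hours to a coarse band (cut points are illustrative)."""
    if hours < 5:
        return "<5h"
    if hours < 7:
        return "5-7h"
    if hours <= 8:
        return "7-8h"
    return ">8h"


def symptom_rate_by_band(rows):
    """Return {band: share of respondents flagged moderate-to-severe}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        band = sleep_band(row["sleep_hours"])
        totals[band] += 1
        if row["moderate_or_severe_anxiety"]:
            flagged[band] += 1
    return {band: flagged[band] / totals[band] for band in totals}


# Made-up rows for illustration only; not drawn from the dataset.
toy_rows = [
    {"sleep_hours": 4.5, "moderate_or_severe_anxiety": True},
    {"sleep_hours": 4.0, "moderate_or_severe_anxiety": True},
    {"sleep_hours": 4.8, "moderate_or_severe_anxiety": False},
    {"sleep_hours": 7.5, "moderate_or_severe_anxiety": False},
    {"sleep_hours": 7.0, "moderate_or_severe_anxiety": False},
    {"sleep_hours": 8.0, "moderate_or_severe_anxiety": True},
    {"sleep_hours": 7.2, "moderate_or_severe_anxiety": False},
]
rates = symptom_rate_by_band(toy_rows)
```

On the real data, the same grouping applied to the under-five-hour and seven-to-eight-hour bands would be the starting point for the 68% and 19% figures, before any adjustment for confounders.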

The dataset goes far beyond sleep. The researchers collected structured responses across six lifestyle domains (sleep, exercise, nutrition, screen time, social connection, and work hours) alongside validated mental health scales and free-text sentiment responses. This dual structure makes the dataset unusually versatile: clinicians use the structured data for predictive modeling, while NLP researchers use the text responses to train sentiment classifiers that can detect distress signals in natural language.

What makes the findings unsettling is their predictive power. Machine learning models trained on the lifestyle variables alone, with no clinical interviews, diagnostic criteria, or biomarkers, achieved 78% accuracy in classifying respondents into mental health risk categories. The implication is stark: the data exhaust of ordinary life contains enough signal to approximate a clinical screening. The 7,000 researchers who downloaded this dataset are now grappling with what that means for privacy, for healthcare access, and for the boundary between lifestyle tracking and mental health surveillance.
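As a rough illustration of what training on lifestyle variables alone looks like in practice, the sketch below fits a nearest-centroid classifier to a handful of invented rows and scores it on held-out rows. The source does not say which model family reached 78%. The nearest-centroid method, the feature order, the risk labels, and every number here are assumptions chosen for brevity.

```python
from collections import defaultdict
from math import dist

# Hypothetical feature order; the dataset's real column names may differ.
FEATURES = (
    "sleep_hours",
    "exercise_hours_per_week",
    "screen_hours_per_day",
    "social_contacts_per_week",
    "work_hours_per_week",
)


def fit_centroids(X, y):
    """Average the feature vectors of each class to get one centroid per class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted class matches the true label."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Invented training and test rows; labels are illustrative risk categories.
X_train = [
    [4.5, 0.5, 9.0, 1.0, 60.0],
    [5.0, 1.0, 8.0, 2.0, 55.0],
    [7.5, 4.0, 3.0, 6.0, 40.0],
    [8.0, 5.0, 2.0, 7.0, 38.0],
]
y_train = ["high", "high", "low", "low"]
X_test = [[4.0, 0.0, 10.0, 1.0, 62.0], [7.0, 3.0, 4.0, 5.0, 42.0]]
y_test = ["high", "low"]

centroids = fit_centroids(X_train, y_train)
test_accuracy = accuracy(centroids, X_test, y_test)
```

Because the features sit on different scales (hours per day versus hours per week), a real pipeline would standardize them first, and accuracy would be measured on a much larger held-out split than this toy one.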

[Chart legend: Mild Anxiety · Moderate Anxiety · Severe Anxiety · Depression Indicators · Burnout Symptoms · No Significant Symptoms]

[Chart: Mental health risk by sleep duration. Percentage of respondents reporting moderate-to-severe symptoms by nightly sleep hours.]

[Chart: Lifestyle factor predictive weight in mental health models. Relative importance of each lifestyle domain in the best-performing classification model.]

01. Sleep duration and social connection together account for 50% of the predictive power in the best-performing mental health classification model.
02. Respondents with more than 6 hours of daily screen time showed 2.3x higher rates of depressive symptoms, independent of other factors.
03. Free-text sentiment analysis detected distress signals that preceded self-reported symptoms in 34% of follow-up cases.
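A figure such as "2.3x higher rates" is a ratio of two rates: the share of high-screen-time respondents with depressive symptoms divided by the share among everyone else. The sketch below shows the unadjusted version of that calculation. The function names and the counts are invented for illustration, and the source's claim of independence from other factors implies a statistical adjustment that this sketch does not attempt.

```python
def rate(cases, total):
    """Share of a group reporting the outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


def rate_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Unadjusted ratio of outcome rates: exposed group over unexposed group."""
    baseline = rate(cases_unexposed, n_unexposed)
    if baseline == 0:
        raise ValueError("baseline rate is zero; ratio is undefined")
    return rate(cases_exposed, n_exposed) / baseline


# Invented counts for illustration only; not taken from the dataset.
ratio = rate_ratio(
    cases_exposed=40, n_exposed=200,      # hypothetical: >6 h screen time
    cases_unexposed=30, n_unexposed=300,  # hypothetical: everyone else
)
```

With these toy counts, the exposed group's rate is twice the baseline. Reproducing the published figure would require the dataset's real counts plus whatever covariate adjustment the authors applied.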
🧠 Clinical Screening

The dataset's predictive accuracy from lifestyle data alone suggests that wearable devices and smartphone sensors could serve as passive mental health screening tools, catching early warning signs before individuals seek clinical help.

⚖️ Privacy and Ethics

If lifestyle data can predict mental health status with 78% accuracy, employers, insurers, and platforms with access to behavioral data may possess de facto diagnostic capabilities. The dataset has fueled urgent conversations about mental health data governance.

🏥 Preventive Intervention

Public health programs focused on sleep hygiene and social connection could address the two most predictive lifestyle factors. The data suggests that lifestyle interventions may be as impactful as expanding access to clinical mental health services.
