Sleep Quality Prediction

AIM

To predict sleep quality based on lifestyle and health factors.


DATASET LINK

Sleep Health and Lifestyle Dataset


DESCRIPTION

What is the requirement of the project?

  • This project aims to predict the quality of sleep using various health and lifestyle metrics. Predicting sleep quality helps individuals and healthcare professionals address potential sleep-related health issues early.

Why is it necessary?

  • Sleep quality significantly impacts physical and mental health. Early predictions can prevent chronic conditions linked to poor sleep, such as obesity, heart disease, and cognitive impairment.

How is it beneficial and used?

  • Individuals: Assess their sleep health and make lifestyle changes to improve sleep quality.
  • Healthcare Professionals: Use the model as an auxiliary diagnostic tool to recommend personalized interventions.

How did you start approaching this project? (Initial thoughts and planning)

  • Researching sleep health factors and existing literature.
  • Exploring and analyzing the dataset to understand feature distributions.
  • Preprocessing data for effective feature representation.
  • Iterating over machine learning models to find the optimal balance between accuracy and interpretability.


LIBRARIES USED

  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn
  • joblib
  • flask

EXPLANATION

DETAILS OF THE DIFFERENT FEATURES

| Feature Name             | Description                        | Type        | Values/Range                             |
|--------------------------|------------------------------------|-------------|------------------------------------------|
| Gender                   | Respondent's gender                | Categorical | [Male, Female]                           |
| Age                      | Respondent's age                   | Numerical   | Measured in years                        |
| Sleep Duration (hours)   | Hours of sleep per day             | Numerical   | Measured in hours                        |
| Physical Activity Level  | Daily physical activity in minutes | Numerical   | Measured in minutes                      |
| Stress Level             | Stress level on a scale            | Numerical   | 1 to 5 (low to high)                     |
| BMI Category             | Body Mass Index category           | Categorical | [Underweight, Normal, Overweight, Obese] |
| Systolic Blood Pressure  | Systolic blood pressure            | Numerical   | Measured in mmHg                         |
| Diastolic Blood Pressure | Diastolic blood pressure           | Numerical   | Measured in mmHg                         |
| Heart Rate (bpm)         | Resting heart rate                 | Numerical   | Beats per minute                         |
| Daily Steps              | Average number of steps per day    | Numerical   | Measured in steps                        |
| Sleep Disorder           | Reported sleep disorder            | Categorical | [Yes, No]                                |

WHAT I HAVE DONE

Step 1: Exploratory Data Analysis

  • Summary statistics
  • Data visualization for numerical feature distributions
  • Target class distribution broken down by each categorical feature
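
The exploratory steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up sample, not the project's actual notebook; the target column name "Sleep Quality" and its "Good"/"Poor" labels are assumptions, since the write-up does not name them.

```python
import pandas as pd

# Tiny made-up sample; the real project would load the full dataset from a file.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Sleep Duration (hours)": [6.1, 7.8, 8.0, 5.9, 7.2, 6.5],
    "Stress Level": [4, 2, 1, 5, 2, 3],
    # Hypothetical target column and labels (not named in the write-up).
    "Sleep Quality": ["Poor", "Good", "Good", "Poor", "Good", "Poor"],
})

# Summary statistics for the numerical columns (count, mean, std, quartiles).
summary = df.describe()

# Target split for one categorical feature: target class counts per gender.
target_by_gender = pd.crosstab(df["Gender"], df["Sleep Quality"])

print(summary)
print(target_by_gender)
```

Distribution plots for the numerical columns could then be drawn with `df.hist()`, which relies on matplotlib.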

Step 2: Data Cleaning and Preprocessing

  • Handling missing values
  • Label encoding categorical features
  • Standardizing numerical features
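
A possible shape for these three preprocessing steps is shown below. It is a sketch under assumptions: the sample rows are invented, and median imputation is only one plausible way to handle missing values, since the write-up does not say which strategy was used.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Invented sample rows with a few missing values.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "BMI Category": ["Normal", "Overweight", "Normal", "Obese"],
    "Age": [32.0, 45.0, None, 51.0],
    "Heart Rate (bpm)": [70.0, 68.0, 75.0, None],
})

categorical_cols = ["Gender", "BMI Category"]
numerical_cols = ["Age", "Heart Rate (bpm)"]

# Handle missing values (assumed strategy: fill with the column median).
df[numerical_cols] = df[numerical_cols].fillna(df[numerical_cols].median())

# Label-encode each categorical column; keep the encoders so the same
# mapping can be reused at prediction time.
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Standardize numerical columns to zero mean and unit variance.
scaler = StandardScaler()
df[numerical_cols] = scaler.fit_transform(df[numerical_cols])

print(df)
```

`LabelEncoder` assigns integer codes in sorted label order, so the codes carry no meaningful ranking. A decision tree can still split on them, but the order should not be read as meaningful.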

Step 3: Feature Engineering and Selection

  • Merging features based on domain knowledge
  • Creating derived features such as "Activity-to-Sleep Ratio"
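
As an illustration of a derived feature, the snippet below computes an "Activity-to-Sleep Ratio". The exact formula is not given in this write-up, so dividing daily activity minutes by hours of sleep is an assumption, and the sample values are made up.

```python
import pandas as pd

# Made-up values using two columns from the feature table.
df = pd.DataFrame({
    "Physical Activity Level": [30, 60, 45],   # minutes per day
    "Sleep Duration (hours)": [6.0, 8.0, 7.5],
})

# Assumed definition: activity minutes per hour of sleep.
df["Activity-to-Sleep Ratio"] = (
    df["Physical Activity Level"] / df["Sleep Duration (hours)"]
)

print(df)
```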

Step 4: Modeling

  • Model trained: Decision Tree
  • Class imbalance handled using SMOTE
  • Metric for optimization: F1-score
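
The modeling step could look roughly like the sketch below. It runs on synthetic data and assumes a binary target. The helper `smote_like_oversample` is a hypothetical, simplified re-implementation of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), written here only to keep the example self-contained; it is not the imbalanced-learn API and may differ from what the project actually used.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Simplified SMOTE-style oversampling for a binary target (illustrative).

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0:
        return X, y
    synthetic = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # position 0 is the point itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new


# Synthetic, imbalanced data: 80 majority samples and 20 minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(80, 3)),
               rng.normal(2.0, 1.0, size=(20, 3))])
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Oversample only the training split so the test set stays untouched.
X_res, y_res = smote_like_oversample(X_train, y_train, minority_label=1)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_res, y_res)

f1 = f1_score(y_test, model.predict(X_test))
print(f"F1-score on held-out data: {f1:.3f}")
```

Resampling after the train/test split keeps synthetic points out of the evaluation data, so the F1-score reflects performance on real samples only.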

Step 5: Result Analysis

  • Visualized results using confusion matrices and classification reports
  • Interpreted the feature importances of the trained Decision Tree
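
The analysis above maps onto a few scikit-learn calls, sketched here on synthetic data. The three feature names are borrowed from the feature table, but the values and the labeling rule are invented for the example.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

feature_names = ["Sleep Duration (hours)", "Stress Level", "Daily Steps"]

# Synthetic data: the label depends only on the first two columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Train on the first 150 rows, evaluate on the remaining 50.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X[:150], y[:150])
y_pred = model.predict(X[150:])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y[150:], y_pred, labels=[0, 1])
report = classification_report(y[150:], y_pred, labels=[0, 1])

# Impurity-based importances from the fitted tree; they sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))

print(cm)
print(report)
print(importances)
```

The matrix `cm` can be passed to `seaborn.heatmap` for a visual version of the same result.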

MODELS USED AND THEIR ACCURACIES

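| Model         | Accuracy (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---------------|--------------|--------------|---------------|------------|
| Decision Tree | 74.50        | 75.20        | 73.00         | 77.50      |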
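
As a quick consistency check on the table, the F1-score is the harmonic mean of precision and recall. The short calculation below recomputes it from the reported precision and recall values.

```python
# Reported values from the table above, in percent.
precision = 73.00
recall = 77.50

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # about 75.18
```

The result is close to the reported 75.20. The small gap may come from rounding or from the averaging mode used across classes, which the table does not state.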

CONCLUSION

WHAT I HAVE LEARNED

Insights gained from the data

  • Sleep Duration, Stress Level, and Physical Activity are the most indicative features for predicting sleep quality.

Improvements in understanding machine learning concepts

  • Learned and implemented preprocessing techniques like encoding categorical variables and handling imbalanced datasets.
  • Gained practical experience deploying a machine learning model with Flask for real-world use.
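
A Flask deployment of such a model might be shaped like the sketch below. Everything specific here is an assumption: the `/predict` route, the JSON field names, the two input features, and the 0/1 label meaning are hypothetical. A small stand-in model is trained inline so the example runs on its own; the project would presumably load its saved model with joblib instead.

```python
from flask import Flask, jsonify, request
from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on made-up data. In the real project this would be
# replaced by loading the saved model, e.g. joblib.load(<path to model file>).
model = DecisionTreeClassifier(random_state=0).fit(
    [[6.0, 4], [8.0, 1], [5.5, 5], [7.5, 2]],  # [sleep duration, stress level]
    [0, 1, 0, 1],                              # assumed labels: 0 = poor, 1 = good
)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical endpoint name
def predict():
    payload = request.get_json(silent=True) or {}
    try:
        features = [[float(payload["sleep_duration"]),
                     float(payload["stress_level"])]]
    except (KeyError, TypeError, ValueError):
        return jsonify({"error": "sleep_duration and stress_level must be numbers"}), 400
    prediction = int(model.predict(features)[0])
    return jsonify({"sleep_quality": prediction})
```

Calling `app.run()` or using the `flask run` command starts a local development server. A client would then POST JSON such as `{"sleep_duration": 8.0, "stress_level": 1}` to `/predict`.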

Challenges faced and how they were overcome

  • Managing imbalanced classes: Overcame this by using SMOTE for oversampling the minority class.
  • Choosing a simple yet effective model: Selected Decision Tree for its interpretability and ease of deployment.

USE CASES OF THIS MODEL

Application 1

A health tracker app can integrate this model to assess and suggest improvements in sleep quality based on user inputs.

Application 2

Healthcare providers can use this tool to make preliminary assessments of patients' sleep health, enabling timely interventions.


FEATURES PLANNED BUT NOT IMPLEMENTED

Feature 1

Advanced models such as Random Forest, AdaBoost, and Gradient Boosting were not implemented due to the project's focus on simplicity and interpretability.

Feature 2

Integration with wearable device data for real-time predictions was not explored but remains a potential enhancement for future work.