
Classifier to predict the presence of a sleep disorder based on the other columns in the dataset.


grascya/Sleep-Health_-Lifestyle-Dataset


Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics and lifestyle factors for close to 400 fictional people.

The workspace is set up with one CSV file, data.csv, with the following columns:

  • Person ID
  • Gender
  • Age
  • Occupation
  • Sleep Duration: Average number of hours of sleep per day
  • Quality of Sleep: A subjective rating on a 1-10 scale
  • Physical Activity Level: Average number of minutes the person engages in physical activity daily
  • Stress Level: A subjective rating on a 1-10 scale
  • BMI Category
  • Blood Pressure: Indicated as systolic pressure over diastolic pressure
  • Heart Rate: In beats per minute
  • Daily Steps
  • Sleep Disorder: One of None, Insomnia or Sleep Apnea
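
As a minimal sketch of how these columns might be loaded, the snippet below reads a few made-up rows (not taken from data.csv) and splits Blood Pressure into two numeric columns. The column names `Systolic` and `Diastolic` are assumptions introduced for illustration. Passing `keep_default_na=False` is one way to make sure the literal category `None` in Sleep Disorder is kept as a string rather than parsed as a missing value.

```python
import io

import pandas as pd

# Illustrative rows only; the real project would read data.csv instead.
sample_csv = io.StringIO(
    "Person ID,Blood Pressure,Sleep Disorder\n"
    "1,126/83,None\n"
    "2,140/90,Sleep Apnea\n"
    "3,130/85,Insomnia\n"
)

# keep_default_na=False keeps the string "None" as a category label.
df = pd.read_csv(sample_csv, keep_default_na=False)

# Split "systolic/diastolic" into two integer columns (hypothetical names).
bp = df["Blood Pressure"].str.split("/", expand=True).astype(int)
df["Systolic"] = bp[0]
df["Diastolic"] = bp[1]
df = df.drop(columns="Blood Pressure")

print(df)
```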

Background: You work for a health insurance company and are tasked with identifying whether a potential client is likely to have a sleep disorder. The company wants to use this information to set the premium the client will pay.

Objective: Construct a classifier to predict the presence of a sleep disorder based on the other columns in the dataset.
Methods Used: Exploratory Data Analysis, Inferential Statistics, Data Visualization, Machine Learning, Predictive Modeling.
Type of Problem: Multi-class Classification Task.
Language, libraries, and technologies used: Python, Pandas, Matplotlib, Seaborn, NumPy, SciPy, Scikit-learn, joblib

KEY INSIGHTS:

To start this project, I checked that the data was clean and matched the description in the data dictionary, cleaned the entries that were not, and then validated the result.
Once the data was clean, I carried out an exploratory data analysis, followed by statistical tests, which revealed that:

  • Accountants, Doctors, Engineers, and Lawyers are less likely to have a sleep disorder; Nurses have a high chance of sleep apnea; and Salespersons and Teachers are more likely to have insomnia.
  • Overweight people have a high chance of suffering from a sleep disorder, while people with ideal or normal blood pressure are less likely to have one.
  • People between the ages of 50 and 60 have low stress levels and a sleep quality of around 9, but are susceptible to sleep apnea.
  • Men and women aged between 42 and 45 are very likely to have insomnia, and women aged 50 and women above 55 have a very high chance of having sleep apnea.
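
The README does not name the statistical tests that were used. One common choice for checking whether two categorical columns such as Occupation and Sleep Disorder are associated is a chi-square test of independence; the sketch below shows that approach on made-up rows, not on the project's data or results.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up rows for illustration; the real analysis would use data.csv.
df = pd.DataFrame({
    "Occupation": ["Nurse"] * 6 + ["Engineer"] * 6 + ["Teacher"] * 6,
    "Sleep Disorder": (
        ["Sleep Apnea"] * 5 + ["None"]      # nurses
        + ["None"] * 5 + ["Insomnia"]       # engineers
        + ["Insomnia"] * 4 + ["None"] * 2   # teachers
    ),
})

# Contingency table: one row per occupation, one column per disorder.
table = pd.crosstab(df["Occupation"], df["Sleep Disorder"])

# Chi-square test of independence on the observed counts.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value would suggest that the distribution of sleep disorders differs across occupations. With counts this small the chi-square approximation is rough, so the output here only illustrates the mechanics.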

After that, I preprocessed the data and created a baseline model (a LogisticRegression) and a comparison model (a DecisionTree). I fitted and evaluated both models. With an accuracy of 89%, the baseline model performs better.
I plotted the importance of each variable to see which variables contributed the most to the model's predictions, then saved the model as a pickle file using joblib.
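
The modeling steps above can be sketched roughly as follows. This is not the project's actual code: the data is a synthetic three-class stand-in generated with `make_classification`, and the scaling step, train/test split, hyperparameters, and output file name are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features and three-class target.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline and comparison models (settings are illustrative assumptions).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Persist the better-scoring model with joblib (hypothetical file name).
best_name = max(scores, key=scores.get)
model_path = os.path.join(tempfile.gettempdir(), "sleep_disorder_model.pkl")
joblib.dump(models[best_name], model_path)

# Reload to confirm the saved model can still make predictions.
reloaded = joblib.load(model_path)
print(scores, best_name)
```

Accuracy on this synthetic data says nothing about the 89% reported above; the sketch only shows the fit, compare, and save workflow.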

Dataset Source: Kaggle
