Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictitious people.

The workspace is set up with one CSV file, data.csv, with the following columns:

Person ID
Gender
Age
Occupation
Sleep Duration: Average number of hours of sleep per day
Quality of Sleep: A subjective rating on a 1-10 scale
Physical Activity Level: Average number of minutes the person engages in physical activity daily
Stress Level: A subjective rating on a 1-10 scale
BMI Category
Blood Pressure: Indicated as systolic pressure over diastolic pressure
Heart Rate: In beats per minute
Daily Steps
Sleep Disorder: One of None, Insomnia or Sleep Apnea
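Some of these columns need light parsing before modeling. The following is a minimal sketch, assuming the file is read with pandas. It uses a few made-up rows in place of data.csv and splits Blood Pressure ("systolic over diastolic") into two numeric columns. The helper `load_sleep_data` and the column names `Systolic BP` / `Diastolic BP` are hypothetical. Depending on the pandas version, the literal label None in Sleep Disorder may be read as a missing value, so the sketch restores it as an explicit class.

```python
import io

import pandas as pd

# Made-up rows in the data.csv layout (illustrative only, not from the dataset).
SAMPLE_CSV = """Person ID,Gender,Age,Blood Pressure,Sleep Disorder
1,Male,27,126/83,None
2,Female,52,140/95,Sleep Apnea
3,Male,43,130/86,Insomnia
"""


def load_sleep_data(source) -> pd.DataFrame:
    """Hypothetical loader: read the CSV and split 'Blood Pressure' (e.g. '126/83')."""
    df = pd.read_csv(source)
    # Some pandas versions treat the string "None" as NaN by default;
    # put it back as an explicit class label.
    df["Sleep Disorder"] = df["Sleep Disorder"].fillna("None")
    # "126/83" -> two integer columns (hypothetical names).
    bp = df["Blood Pressure"].str.split("/", expand=True).astype(int)
    df["Systolic BP"] = bp[0]
    df["Diastolic BP"] = bp[1]
    return df.drop(columns=["Blood Pressure"])


df = load_sleep_data(io.StringIO(SAMPLE_CSV))
print(df)
```

With the real file you would pass the path instead, e.g. `load_sleep_data("data.csv")`.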

Background: You work for a health insurance company and are tasked with identifying whether a potential client is likely to have a sleep disorder. The company wants to use this information to determine the premium the client should pay.

Objective: Construct a classifier to predict the presence of a sleep disorder based on the other columns in the dataset.
Methods Used: Exploratory Data Analysis, Inferential Statistics, Data Visualization, Machine Learning, Predictive Modeling.
Type of Problem: Multi-class Classification Task.
Languages, libraries and technologies used: Python, pandas, Matplotlib, Seaborn, NumPy, SciPy, scikit-learn, joblib

KEY INSIGHTS:

To start this project, I checked that the data was clean and matched the description in the data dictionary, cleaned up the data that wasn't, and then validated the full dataset.
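The README does not list the specific checks used. The following is one possible sketch of validating data against the column descriptions at the top of this README: unique IDs, 1-10 ratings, plausible sleep hours and known disorder labels. The function name `validate` and the 0-24 hour bound on Sleep Duration are assumptions, not the author's code.

```python
import pandas as pd

# Allowed values taken from the column descriptions in this README.
VALID_DISORDERS = {"None", "Insomnia", "Sleep Apnea"}
RATING_COLUMNS = ["Quality of Sleep", "Stress Level"]  # both on a 1-10 scale


def validate(df: pd.DataFrame) -> list[str]:
    """Hypothetical checker: return readable problems; an empty list means all checks passed."""
    problems = []
    if df["Person ID"].duplicated().any():
        problems.append("duplicate Person ID values")
    for col in RATING_COLUMNS:
        # between() is inclusive; NaN also fails the check.
        if not df[col].between(1, 10).all():
            problems.append(f"{col} outside the 1-10 scale")
    # Assumed plausibility bound: more than 0 and at most 24 hours per day.
    if (df["Sleep Duration"] <= 0).any() or (df["Sleep Duration"] > 24).any():
        problems.append("Sleep Duration is not a plausible number of hours per day")
    unknown = set(df["Sleep Disorder"]) - VALID_DISORDERS
    if unknown:
        problems.append(f"unexpected Sleep Disorder labels: {sorted(map(str, unknown))}")
    return problems


# Two made-up rows that satisfy every check.
good = pd.DataFrame({
    "Person ID": [1, 2],
    "Sleep Duration": [6.1, 7.8],
    "Quality of Sleep": [6, 8],
    "Stress Level": [6, 3],
    "Sleep Disorder": ["None", "Insomnia"],
})
print(validate(good))  # []
```

Returning a list of messages, rather than raising on the first failure, reports every problem in one pass, which is convenient while cleaning.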
Once the data was clean, I carried out an exploratory data analysis, followed by statistical tests, which revealed that:

Those whose occupation is Accountant, Doctor, Engineer or Lawyer are less likely to have a sleep disorder; nurses have a high chance of sleep apnea; and salespersons and teachers are more likely to have insomnia.
Overweight people have a high chance of suffering from a sleep disorder, and people with ideal or normal blood pressure are less likely to have one.
People between the ages of 50 and 60 have low stress levels and a sleep quality of around 9, but are susceptible to sleep apnea.
Men and women aged between 42 and 45 are very likely to have insomnia, and women aged 50 and those over 55 have a very high chance of having sleep apnea.
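The README does not name the statistical tests it used. For a categorical feature such as Occupation against Sleep Disorder, one common choice is a chi-square test of independence on a contingency table. The sketch below runs that test with `scipy.stats.chi2_contingency` on made-up counts. The helper name `chi_square_association` is hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_association(df: pd.DataFrame, feature: str,
                           target: str = "Sleep Disorder", alpha: float = 0.05) -> dict:
    """Hypothetical helper: chi-square test of independence between feature and target."""
    table = pd.crosstab(df[feature], df[target])  # rows: feature levels, cols: classes
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return {
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        "reject_independence": bool(p_value < alpha),
    }


# Made-up counts for illustration, not the real dataset.
toy = pd.DataFrame({
    "Occupation": ["Nurse"] * 20 + ["Engineer"] * 20,
    "Sleep Disorder": ["Sleep Apnea"] * 15 + ["None"] * 5 + ["None"] * 18 + ["Insomnia"] * 2,
})
result = chi_square_association(toy, "Occupation")
print(result)
```

A significant result only shows that the two variables are associated. Statements like "nurses have a high chance of sleep apnea" come from inspecting the crosstab itself. The test's approximation also weakens when expected cell counts fall below about 5, as some do in this toy table.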

After that, I preprocessed the data and built two models: a LogisticRegression baseline and a DecisionTree for comparison. I fitted and evaluated both; with an accuracy of 89%, the baseline model performs better.
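The following sketches the baseline-versus-comparison setup with scikit-learn pipelines. It assumes that categorical columns are one-hot encoded, numeric columns are standardized and Blood Pressure has already been split into `Systolic BP` / `Diastolic BP`. It also assumes a single stratified 80/20 split. These preprocessing choices, the split and the helper names `build_pipeline` / `compare_models` are assumptions; the README does not specify them. The demo frame is random, so its scores say nothing about the 89% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed feature lists: every column except Person ID and the target.
CATEGORICAL = ["Gender", "Occupation", "BMI Category"]
NUMERIC = ["Age", "Sleep Duration", "Quality of Sleep", "Physical Activity Level",
           "Stress Level", "Heart Rate", "Daily Steps", "Systolic BP", "Diastolic BP"]


def build_pipeline(estimator) -> Pipeline:
    """One-hot encode categorical columns, standardize numeric ones, then fit the estimator."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return Pipeline([("preprocess", preprocess), ("model", estimator)])


def compare_models(df: pd.DataFrame, target: str = "Sleep Disorder", seed: int = 42) -> dict:
    """Fit baseline and comparison models on one stratified split; return test accuracy."""
    X, y = df[CATEGORICAL + NUMERIC], df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    candidates = {
        "LogisticRegression (baseline)": LogisticRegression(max_iter=1000),
        "DecisionTree (comparison)": DecisionTreeClassifier(random_state=seed),
    }
    return {
        name: accuracy_score(y_test, build_pipeline(est).fit(X_train, y_train).predict(X_test))
        for name, est in candidates.items()
    }


# Random toy data (illustrative only) so the sketch runs end to end.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], n),
    "Occupation": rng.choice(["Nurse", "Engineer", "Teacher"], n),
    "BMI Category": rng.choice(["Normal", "Overweight"], n),
    **{col: rng.normal(size=n) for col in NUMERIC},
    "Sleep Disorder": rng.choice(["None", "Insomnia", "Sleep Apnea"], n),
})
print(compare_models(demo))
```

Wrapping preprocessing inside the pipeline fits the encoder and scaler on the training split only. This keeps test rows from leaking into the preprocessing.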
I then plotted the importance of each variable to see which ones contributed most to the model's predictions, and saved the model as a pickle file (model.pkl) using joblib.

Dataset Source: Kaggle

Repository: grascya/Sleep-Health_-Lifestyle-Dataset

Repository files:

README.md
Sleep_Health.ipynb
banner_image.png
data.csv
model.pkl