- author: Yun Zhou, Zanan Pech and Sepehr Heydarian
DSCI 522 Data Workflows Project
In this project we attempt to build a model to classify different levels of obesity. In the dataset we used, the target variable is categorized as Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II, and Obesity Type III. We trained and evaluated three machine learning models: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and a Decision Tree enhanced with AdaBoost. Our evaluation showed that the SVM and the Decision Tree with AdaBoost achieved high predictive accuracies of 97.1% and 97.9% respectively, while the accuracy of our KNN model was relatively lower at 88.0%. These scores were calculated on unseen test data that were split off before the models were created. The high scores reflect the quality of the features in the dataset and the models' ability to generalize well. With these promising results, this model could potentially act as a useful tool in the healthcare industry to better help patients and healthcare professionals.
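The snippet below is a minimal sketch of this workflow, not the project's exact pipeline (the real steps live in the scripts listed under the usage section). It assumes the processed CSV path and a target column named "NObeyesdad", and uses default hyperparameters, so its scores will differ from those reported above.

```python
# Sketch only: train the three classifiers described above on a held-out split.
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("data/processed/ObesityDataSet_processed_data.csv")
X, y = df.drop(columns=["NObeyesdad"]), df["NObeyesdad"]  # assumed target column name

# Hold out a test set before any model is fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=522, stratify=y
)

# Scale numeric features and one-hot encode categorical ones
preprocessor = make_column_transformer(
    (StandardScaler(), X.select_dtypes("number").columns),
    (OneHotEncoder(handle_unknown="ignore"), X.select_dtypes("object").columns),
)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "AdaBoost + Decision Tree": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=3), random_state=522
    ),
}

for name, model in models.items():
    pipe = make_pipeline(preprocessor, model)
    pipe.fit(X_train, y_train)
    print(f"{name} test accuracy: {pipe.score(X_test, y_test):.3f}")
```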
The dataset used is obtained from the UC Irvine Machine Learning Repository - link here. This dataset was used in work by Fabio Mendoza Palechor and Alexis de la Hoz Manotas (Palechor, F. M., & De La Hoz Manotas, A., 2019); find that work here. The dataset contains 2111 observations with 16 features (and one target - obesity level) from individuals from Mexico, Peru, and Colombia (Estimation of Obesity Levels Based On Eating Habits and Physical Condition, 2019). It contains 24 duplicate rows, which were dropped after the data validation process. Despite its limitations, this dataset was chosen because it offers a rich set of features relevant to obesity. However, it is important to note that the data come from only three countries, limiting its ability to capture global trends, and that a significant portion of the data was synthesized, which may introduce some bias.
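As a rough illustration of the duplicate handling mentioned above (a sketch only; the full validation and cleaning happens in scripts/clean_data.py), duplicates can be dropped with pandas, assuming the raw file path from the download step below:

```python
# Sketch of dropping duplicate rows from the raw dataset
import pandas as pd

raw = pd.read_csv("data/raw/ObesityDataSet_raw_data_sinthetic.csv")
deduped = raw.drop_duplicates()
print(f"Dropped {len(raw) - len(deduped)} duplicate rows")  # expected: 24
```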
The final report can be found here.
Ensure Docker is installed and running - install from here.
Clone the main branch of this repository: Repository link
git clone https://github.com/UBC-MDS/obesity-classifier-group17
Once in the root directory of the repository on your local machine, run the following command in the terminal to start the container.
docker compose up
From the output of the above command in the terminal, find the link to the container. See the image as a reference for finding the URL.
For further work on the environment and updating dependencies, use the environment.yml file (found here). Once the file is updated with the new dependencies, run:
conda-lock -k explicit --file environment.yml -p linux-64
Push the changes to the main repository, then on GitHub Actions run the Publish Docker Image workflow. Find the Docker tag of the newly published image and update the tag in docker-compose.yml.
- conda (version 23.9.0 or higher)
- conda-lock (version 2.5.7 or higher)
- Open Terminal and set working directory to the root of the repository.
- To reset the project to a clean state (remove all generated files), run the following command from the root directory:
make clean
- To run the entire analysis, run the following command:
make all
- Open Terminal, set the working directory to the root of the repository, and run the following commands.
- Download the dataset
python scripts/download_data.py --write_to="data/raw" --name="ObesityDataSet_raw_data_sinthetic.csv"
- Clean data and do validation
python scripts/clean_data.py --raw-data='data/raw/ObesityDataSet_raw_data_sinthetic.csv' --name='ObesityDataSet_processed_data.csv' --data-to="data/processed/" --plot-to="results/figures"
- Split and preprocess data
python scripts/split_n_preprocess.py --clean-data=data/processed/ObesityDataSet_processed_data.csv --data-to=data/processed --preprocessor-to=results/models --seed=522 --html-to="results/htmls"
- Exploratory Data Analysis
python scripts/eda.py --training_data_split=data/processed/obesity_train.csv --plot_path=results/figures/
- Fit the models
python scripts/fit_obesity_classifier.py --encoded-train-data=data/processed/obesity_train_target_encoding.csv --data-to=results/tables --preprocessor=results/models/obesity_preprocessor.pickle --seed=522 --pipeline-to=results/models
- Evaluate the models
python scripts/evaluate_models.py --test-data=data/processed/obesity_test_target_encoding.csv --pipeline-path=results/models/trained_pipelines.pkl --data-to=results/tables --plot-to=results/figures
- Render report files
quarto render report/obesity_level_predictor_report.qmd --to html
For this project, a series of functions were created and used in the Python scripts. The test files can be found in the tests folder in the root of this repository (link to the test folder here). Instructions for running the tests to validate the functions are provided in the README.md file in the tests folder.
The Obesity Level Predictor project report is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-ND 4.0) license. For additional information, visit the license link. Follow the guidelines highlighted in the license file when using and sharing this work.
Palechor, F. M., & De La Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 25, 104344. Retrieved from https://doi.org/10.1016/j.dib.2019.104344
Estimation of Obesity Levels Based On Eating Habits and Physical Condition. (2019). UCI Machine Learning Repository. Retrieved from https://doi.org/10.24432/C5H31Z.