CrossFit Performance Predictor Using Gradient Boosting Models from Scratch (Only Numpy)

Welcome to my project! I have developed a CrossFit Performance Predictor using various Gradient Boosting Models, built from scratch using only numpy. I have worked with a dataset from the CrossFit Games, implementing models including Simple Gradient Boost, XGBoost, CatBoost, and LightGBM. Furthermore, I compared these custom implementations with their respective library models to analyze their accuracy and training time.

The project begins with implementing gradient boost models and testing them with diabetes and California housing datasets. I then perform an exploratory data analysis (EDA) and preprocessing on the CrossFit Games dataset and adapt my custom models to this data. Finally, I conduct an evaluation of my models in comparison to the equivalent library models.

For an in-depth understanding and detailed walkthrough of the project, please refer to my Jupyter notebook, report.ipynb. This report covers every stage of the project, providing comprehensive insights into the development and evaluation of my models.

This project is a demonstration of the power of Gradient Boosting Models in predictive tasks and the effectiveness of implementing these complex models from scratch. I hope you find it informative and insightful.

Package versions

Installation

To use this repository, first clone it using the following command:

git clone https://github.com/SeungjaeLim/Crossfit-GBM_from_Scratch.git

Then, navigate to the directory of the cloned repository:

cd Crossfit-GBM_from_Scratch

Next, install the necessary dependencies using conda:

conda env create -f environment.yaml

Usage

To run the training script, use the following command:

python ./train/train.py

After the model has been trained, you can run the inference script with the following command:

python ./inference/infer.py

Evaluation

Fran	Helen	Grace	Filthy50	FGoneBad

For a more comprehensive evaluation and detailed results of the implemented models, please refer to our Jupyter notebook report.ipynb. In this notebook, you'll find in-depth analysis, additional visualizations, and explanations for each step of the project. It serves as a report that covers the entire process of our project, from data exploration and preprocessing to model training, evaluation, and conclusion.

You can view report.ipynb directly on GitHub or download it to run it on your local Jupyter notebook environment. Remember to ensure that you have all the necessary packages installed to avoid any run-time issues.

Reviews

Grade	Summary of the Project	Pros	Cons
100.00	Based on the theoretical background, gradient boost models were actually implemented using numpy. Performed comparisons of each model on various datasets, from the diabetes, california housing dataset to the crossfit dataset, and obtained quite successful accuracy.	1. There are many visualizations from the introduction to the main code, so it was easy to understand the results and distribution. 2. Data analysis was conducted from various angles using various datasets and models to help understanding. 3. It is impressive that the models were actually implemented using numpy using a theoretical background.	1. It was unfortunate that the number of data in diabetes or crossfit set was slightly insufficient. 2. As an advantage, but also as a disadvantage, too many distribution visualizations, such as some correlation plots, were not considered for context understanding. 3. What function each code has or what role it plays is written less. If you write down those points, I think it will help me to read more.
100.00	Introduction on ensemble methods, and their applications on crossfit performance prediction	1. Describing raw data with graphs, and analyzing its feature via EDA. 2. Introducing and comparing various ML methods. 3. Well-organized structure as a real paper.	I'm not such an expert to figure out its weaknesses, but I hope there would be some more explanations on how each algorithm works and what each of the graph implies for us ML beginners.
100.00	This notebook goes through crossfit performance predictor with gradient boost. At first, notebook describe about background knowledge to understand how this algorithm works. Then, implement some models based on gradient boost, train, and get result. Finally, comparing those result and conclude the notebook with analysis.	1. Background knowledge is very high quality with a lot of images that help understanding the concept. 2. A lot of analysis about the predicted result. 3. Topic is very impressive that gets machine learning inspiration from his own crossfit gym.	1. There are no mathematical notations about introducing the algorithm. Using LaTex in colab to express mathematical notation can make a better notebook. 2. For the diabetes dataset, XGBoost shows the best performance but is worst in the California housing dataset implemented from the library. I want to know the reason why XGBoost's performance is bad in the California housing dataset, but there is nothing about it. 3. In LightGBM classifier, there is background knowledge about how a lot of categorical features makes LightGBM's performance stronger, so I think notating the number of categorical features of each dataset can make a better notebook to understand the feature of LightGBM classifier.
100.00	I am highly impressed with the student's project on gradient boosting models and their implementation in the context of CrossFit workout outcome prediction. The project showcased a deep understanding of the models, advanced implementation skills using Numpy, and an ability to compare and analyze the performance of different implementations.	With its exemplary performance and insightful conclusions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CrossFit Performance Predictor Using Gradient Boosting Models from Scratch (Only Numpy)

Package versions

Installation

Usage

Evaluation

Reviews

About

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
data		data
inference		inference
models		models
preprocess		preprocess
train		train
README.md		README.md
Report.ipynb		Report.ipynb
environment.yaml		environment.yaml

SeungjaeLim/Crossfit-GBM_from_Scratch

Folders and files

Latest commit

History

Repository files navigation

CrossFit Performance Predictor Using Gradient Boosting Models from Scratch (Only Numpy)

Package versions

Installation

Usage

Evaluation

Reviews

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages