March Madness

This project aims to predict the outcome of every game in the March Madness basketball tournament.

Beyond predicting each game with machine learning, the project builds a complete pipeline that ingests open-source data, cleans and processes it, trains a machine learning model, and evaluates the model. The repository is self-contained: the whole pipeline can be run and tested after cloning the repo.

March Madness

March Madness is the annual Division I college basketball tournament in the United States of America. After a qualification round known as the First Four, which reduces eight low-seeded teams to four, 64 college basketball teams compete in a straight knockout tournament, with the winning team declared the national champion. The men's competition has been played every year since 1939 except 2020, and an equivalent women's competition was introduced in 1982.

March Madness is one of the largest annual sporting events in America and tens of millions of Americans take part in bracket pool contests, attempting to predict the outcome of every game in the tournament.

Set Up

Create the Python environment by running the following command:

conda env create --name march-madness-2023 --file ./env/environment.yaml

Download API keys for Kaggle (required to ingest some datasets). Follow the instructions on the Kaggle website.
Run the ./run_pipeline.py Python file.

See the Project Set Up page on the Wiki for more details.
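
Before running the pipeline, it can help to confirm that the Kaggle API keys are in place. The following is a minimal sketch, not part of this repository: the function name kaggle_credentials_available is hypothetical, and it assumes the usual Kaggle API conventions (a kaggle.json token file in ~/.kaggle or in the folder named by KAGGLE_CONFIG_DIR, or the KAGGLE_USERNAME and KAGGLE_KEY environment variables).

```python
import json
import os
from pathlib import Path


def kaggle_credentials_available(env=None, config_dir=None):
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper: the locations checked here are assumptions based on
    the standard Kaggle API conventions, not on this repository's code.
    """
    env = os.environ if env is None else env

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: the kaggle.json token file downloaded from the Kaggle website.
    if config_dir is None:
        config_dir = env.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle"
    token_file = Path(config_dir) / "kaggle.json"
    if not token_file.is_file():
        return False

    try:
        token = json.loads(token_file.read_text())
    except (OSError, json.JSONDecodeError):
        return False

    # A usable token needs both a username and a key.
    return (
        isinstance(token, dict)
        and bool(token.get("username"))
        and bool(token.get("key"))
    )
```

Calling kaggle_credentials_available() with no arguments checks the current environment and home directory. The env and config_dir parameters exist only so the check can be exercised without touching real credentials.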

Pipeline

The pipeline ingests data from two sources: Kaggle and FiveThirtyEight. This data is then processed and converted into a training set and a test set. The training set is used to train multiple machine learning models, and predictions for the current tournament are then made on the test set. These predictions can be submitted back to a Kaggle competition or viewed on a local dashboard.

The basic flow of the pipeline is given below.

See the Pipeline Processes page on the Wiki for more details.
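
The split, train, and predict stages described above can be illustrated with a small sketch. This is not the repository's implementation: the function names, the row fields (season, seed_a, seed_b, a_won), and the seed-difference baseline are all assumptions chosen for illustration, and the rows are made-up values rather than real results. The baseline stands in for the actual machine learning models.

```python
from collections import defaultdict


def split_by_season(games, current_season):
    """Past seasons form the training set; the current season is the test set."""
    train = [g for g in games if g["season"] < current_season]
    test = [g for g in games if g["season"] == current_season]
    return train, test


def train_seed_baseline(train):
    """Toy stand-in for a model: team A's win rate for each seed difference."""
    wins, totals = defaultdict(int), defaultdict(int)
    for g in train:
        diff = g["seed_a"] - g["seed_b"]
        totals[diff] += 1
        wins[diff] += g["a_won"]
    return {diff: wins[diff] / totals[diff] for diff in totals}


def predict(model, test, default=0.5):
    """Return P(team A wins) per test game; unseen seed differences get 0.5."""
    return [model.get(g["seed_a"] - g["seed_b"], default) for g in test]


# Hypothetical, hand-made rows standing in for the processed data.
games = [
    {"season": 2021, "seed_a": 1, "seed_b": 16, "a_won": 1},
    {"season": 2021, "seed_a": 8, "seed_b": 9, "a_won": 0},
    {"season": 2022, "seed_a": 1, "seed_b": 16, "a_won": 1},
    {"season": 2022, "seed_a": 8, "seed_b": 9, "a_won": 1},
    {"season": 2023, "seed_a": 1, "seed_b": 16},
    {"season": 2023, "seed_a": 8, "seed_b": 9},
    {"season": 2023, "seed_a": 5, "seed_b": 12},
]

train, test = split_by_season(games, current_season=2023)
model = train_seed_baseline(train)
predictions = predict(model, test)
print(predictions)  # [1.0, 0.5, 0.5]
```

Splitting by season rather than at random keeps games from the tournament being predicted out of the training data, which matches the description of the test set as the current tournament.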

Dashboard

After running the main pipeline, it is possible to run a Dash dashboard on a local server (i.e. localhost). This is enabled through the run_component/run_server parameter in the config file.
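
As a rough illustration of how a config flag can gate the dashboard, here is a minimal sketch. It assumes, without confirmation from the repository, that run_component/run_server denotes a nested key (run_server inside run_component) and that the config has been loaded into a dictionary. The names get_setting and maybe_start_dashboard are hypothetical.

```python
def get_setting(config, path, default=None):
    """Look up a slash-separated key path such as 'run_component/run_server'."""
    node = config
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def maybe_start_dashboard(config, start_server):
    """Call start_server() only when the config enables the dashboard."""
    if get_setting(config, "run_component/run_server", default=False):
        start_server()
        return True
    return False


# Usage with a stand-in for the function that would launch the Dash app.
config = {"run_component": {"run_server": True}}
started = []
maybe_start_dashboard(config, lambda: started.append("dashboard"))
print(started)  # ['dashboard']
```

In a real run, start_server would be whatever function launches the Dash application. Passing it in as an argument keeps the config check testable without starting a server.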

The dashboard displays model predictions to the user alongside key feature values and SHAP values. See the image below for a basic example.

Wiki

The Wiki is the main source of information on this project. It has pages that explain each main source code module, each main data science technique used, and background information. Please check out the Wiki first if unsure about any part of the project.
