The NBA playoffs are the postseason tournament of the National Basketball Association (NBA), held to determine the league's Champion. An annual best-of-seven elimination tournament, the playoffs take place after the league's regular season and its preliminary postseason tournament, the NBA Play-In Tournament.
I use historical regular-season data for every team to predict the last 3 Champions (2020-2022).
My regression model, combined with adjusted ranking metrics, correctly predicted ALL 3 Champions!
But what are the stats (features) that have allowed my model to perform so well?
The following picture shows every step of the workflow. I usually combine these steps into a fully automated pipeline, but since this is a side project and my free time is limited, the pipeline is split into 3 files that are executed sequentially.
- Parse selected Basketball-Reference (website) pages and save all relevant pages in HTML format.
- Basketball-Reference
- Aggregate the data from the HTML pages and upload it to my MongoDB Cloud account.
- Predict the last 3 (2020-2022) NBA Champions with Machine Learning.
- PowerBI file with three charts, all of which are also featured in the 'nba_ml.ipynb' file.
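
To give a rough idea of the aggregation step, here is a minimal sketch of how a saved HTML stats table could be turned into one document per team, ready for upload to MongoDB. It is not the code from this project: the names `StatsTableParser`, `table_to_documents` and `to_number`, the `season` field, and the sample table with placeholder teams are all assumptions made for illustration, and it uses only Python's standard library.

```python
from html.parser import HTMLParser


class StatsTableParser(HTMLParser):
    """Hypothetical helper: collects every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def to_number(text):
    """Return the cell as a float when possible, otherwise as the original string."""
    try:
        return float(text)
    except ValueError:
        return text


def table_to_documents(html, season):
    """Turn the header row plus data rows into a list of dicts (one per team)."""
    parser = StatsTableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    docs = []
    for row in body:
        if len(row) != len(header):
            continue  # skip malformed or repeated header rows
        doc = {"season": season}
        for key, value in zip(header, row):
            doc[key] = to_number(value)
        docs.append(doc)
    return docs


# Placeholder data, not real standings.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>50</td><td>32</td></tr>
  <tr><td>Team B</td><td>41</td><td>41</td></tr>
</table>
"""

docs = table_to_documents(SAMPLE_HTML, season=2022)
print(docs)
```

A list of dicts like `docs` is the shape that pymongo's `Collection.insert_many` accepts, so the upload step could pass it on directly once a connection to the cloud cluster is set up.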