Please download the entire data file from here and place it outside the notebooks' working directory, like this:
- data
  - dataset1
  - dataset2
- main
  - notebook1
  - notebook2
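
With this layout, a notebook inside `main` can reach the data through a path relative to its own folder. The snippet below is a minimal sketch: the folder names `data` and `dataset1` are the placeholders from the layout above, and the helper `data_dir` is a hypothetical name, not something defined in this repository.

```python
from pathlib import Path


def data_dir(notebook_dir: Path) -> Path:
    """Return the `data` folder that sits next to the notebook folder."""
    # resolve() makes the path absolute so that .parent is meaningful
    return notebook_dir.resolve().parent / "data"


# Jupyter normally starts a notebook with its own folder as the working directory.
DATA_DIR = data_dir(Path.cwd())
print(DATA_DIR / "dataset1")
```

If the notebooks are started from a different directory, pass that directory to `data_dir` explicitly instead of relying on `Path.cwd()`.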

The notebooks and scripts:
- cluster_analysis: exploratory data analysis on each cluster of pitchers.
- cluster_modeling: run clustering algorithms on pitcher data and measure prediction results. CAUTION: this notebook takes several hours to run in full.
- functions: helper functions used in the modeling process.
- get_statcast: retrieve Statcast data.
- pitcher_data_cleaning: clean the data used in the cluster modeling process.
- RV_calculation: calculate RE24 and run values on plate appearance (PA) data.

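
For readers unfamiliar with the quantity handled in RV_calculation, the usual definition of a plate appearance's run value is the run expectancy after the event, minus the run expectancy before it, plus any runs that scored; RE24 is the sum of these values. The sketch below illustrates that definition only. The function names are hypothetical, the numbers are placeholders rather than real run expectancies, and the notebook's actual implementation may differ.

```python
def run_value(re_before: float, re_after: float, runs_scored: int) -> float:
    """Run value of one plate appearance: change in run expectancy plus runs scored."""
    return re_after - re_before + runs_scored


def re24(plate_appearances) -> float:
    """Sum run values over (re_before, re_after, runs_scored) tuples."""
    return sum(run_value(*pa) for pa in plate_appearances)


# Placeholder numbers for illustration only, not real run expectancies.
toy_pas = [
    (0.50, 0.90, 0),  # batter reaches base, no run scores
    (0.90, 0.00, 0),  # inning ends, so the run expectancy afterwards is 0
]
print(re24(toy_pas))
```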
The following libraries must be installed to run the scripts in full:
- scikit-learn
- SciPy
- NumPy
- pandas
- tqdm
- Matplotlib
- Seaborn
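
To check the environment before starting a long run, something like the following can report which of these libraries are missing. It is a small sketch, not part of the repository: the helper name `missing_packages` is made up here, and the mapping pairs each pip package name with the module name it is imported under.

```python
import importlib.util

# pip package name -> importable module name
REQUIRED = {
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "numpy": "numpy",
    "pandas": "pandas",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```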