MLB Pitchers Clustering and Expected Run Value Prediction Model

Please download the entire data set from here and place it outside the notebooks' working directory, so that the two folders sit side by side like this:

data/
    dataset1
    dataset2
main/
    notebook1
    notebook2
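Because the notebooks expect the data to sit beside the notebook folder rather than inside it, a small path helper can fail early with a clear message when the download is missing. This is a minimal sketch, assuming the notebooks are launched from inside main/; the DATA_DIR constant and the data_path function are hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumption: notebooks are launched from inside main/, so data/ is a sibling
# of the current working directory.
DATA_DIR = Path.cwd().parent / "data"


def data_path(filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Return data_dir / filename, raising early if the file has not been downloaded."""
    path = data_dir / filename
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; download the data and place it in {data_dir}"
        )
    return path
```

For example, pd.read_csv(data_path("dataset1.csv")) would load a file from the sibling data folder; "dataset1.csv" is only a placeholder name.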

cluster_analysis: performs exploratory data analysis on each cluster of pitchers.

cluster_modeling: runs clustering algorithms on the pitcher data and measures the prediction results. CAUTION: running this notebook end to end takes several hours.

functions: helper functions used in the modeling process.

get_statcast: retrieves Statcast data.

pitcher_data_cleaning: cleans the data used in the cluster modeling process.

RV_calculation: calculates RE24 (the run expectancy for each of the 24 base-out states) and run values from plate appearance (PA) data.
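RE24 assigns each of the 24 base-out states (8 base configurations times 0, 1, or 2 outs) the average number of runs scored from that state to the end of the half-inning, and the run value of a plate appearance is the change in run expectancy plus the runs scored on the play. The pandas sketch below illustrates that calculation. It is not the repository's RV_calculation code: the column names (game_id, inning, half, bases_before, bases_after, outs_before, outs_after, runs_on_play) and the three-digit base encoding are hypothetical, and rows are assumed to be in chronological order within each half-inning.

```python
import math

import pandas as pd

# Hypothetical PA-level columns: game_id, inning, half, bases_before/bases_after
# (assumed encoding, e.g. "101" = runners on first and third),
# outs_before/outs_after, runs_on_play.
HALF_INNING = ["game_id", "inning", "half"]


def add_runs_to_end_of_inning(pa: pd.DataFrame) -> pd.DataFrame:
    """Add runs_to_end: runs scored from the start of each PA to the end of its half-inning."""
    pa = pa.copy()
    runs = pa.groupby(HALF_INNING)["runs_on_play"]
    scored_before = runs.cumsum() - pa["runs_on_play"]  # runs scored earlier in the half-inning
    pa["runs_to_end"] = runs.transform("sum") - scored_before
    return pa


def re24_matrix(pa: pd.DataFrame) -> pd.Series:
    """Mean runs_to_end for each base-out state, indexed by (bases, outs)."""
    return pa.groupby(["bases_before", "outs_before"])["runs_to_end"].mean()


def add_run_values(pa: pd.DataFrame, re24: pd.Series) -> pd.DataFrame:
    """Add run_value = RE(end state) - RE(start state) + runs scored on the play."""
    lookup = re24.to_dict()  # {(bases, outs): run expectancy}
    pa = pa.copy()
    start = [lookup.get((b, o), math.nan) for b, o in zip(pa["bases_before"], pa["outs_before"])]
    # A third out ends the half-inning, so the end state's expectancy is zero.
    end = [
        0.0 if o == 3 else lookup.get((b, o), math.nan)
        for b, o in zip(pa["bases_after"], pa["outs_after"])
    ]
    pa["run_value"] = [e - s + r for s, e, r in zip(start, end, pa["runs_on_play"])]
    return pa
```

Calling re24_matrix(pa).unstack() reshapes the result into the familiar table with base states as rows and outs as columns.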

The following libraries must be installed to run the notebooks in full:

scikit-learn
SciPy
NumPy
pandas
tqdm
Matplotlib
seaborn
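With these libraries installed, the clustering step performed in cluster_modeling can be illustrated with scikit-learn. The sketch below standardizes per-pitcher features, fits k-means for several cluster counts, and keeps the count with the highest silhouette score. K-means with silhouette-based selection of k is an assumption chosen here as one common approach; the notebook may use other algorithms and features, and the cluster_pitchers function is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_pitchers(features: pd.DataFrame, k_values=range(2, 9), seed: int = 0):
    """Cluster pitchers (one row each) with k-means, choosing k by silhouette score.

    Each k must satisfy 2 <= k <= number of pitchers - 1 for silhouette_score.
    Returns the chosen k and a Series of cluster labels aligned to features.index.
    """
    # Put every feature on a comparable scale before computing distances.
    X = StandardScaler().fit_transform(features.to_numpy())
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)  # in [-1, 1]; higher means better separated
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, pd.Series(best_labels, index=features.index, name="cluster")
```

Standardizing first matters because features on large scales, such as spin rate in rpm, would otherwise dominate the Euclidean distances that k-means relies on.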
