This project clusters MLB pitchers and predicts future performance (expected run value) based on historical cluster results.

cj0121/pitchers-clustering

MLB Pitchers Clustering and Expected Run Value Prediction Model

Download the entire data file from here and place it outside the notebooks' working directory, like this:

  1. data
    1. dataset1
    2. dataset2
  2. main
    1. notebook1
    2. notebook2
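
With the layout above, a notebook in `main` reaches the data through its parent directory. A minimal sketch of resolving such a path, assuming the notebook's working directory is `main`; the helper `data_path` is hypothetical, and `dataset1` is only the placeholder name from the layout, not a real file name:

```python
from pathlib import Path


def data_path(name, working_dir=None):
    """Return the path to `name` inside the sibling `data` directory.

    Assumes the layout shown above: `data` and `main` share a parent,
    and notebooks run with `main` as their working directory.
    """
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    return base.parent / "data" / name


# `dataset1` is the placeholder name from the layout, not a real file name.
example = data_path("dataset1", working_dir="project/main")
print(example.as_posix())  # project/data/dataset1
```

Called without `working_dir`, the helper falls back to the current working directory, which is what a notebook started from `main` would use.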


cluster_analysis: exploratory data analysis on each cluster of pitchers.

cluster_modeling: run clustering algorithms on pitcher data and measure prediction results. CAUTION: this notebook takes several hours to run in full.

functions: some functions used in the modeling process.

get_statcast: retrieve Statcast data.

pitcher_data_cleaning: clean the data used in the cluster modeling process.

RV_calculation: calculate RE24 and run values on plate appearance (PA) data.
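
For readers unfamiliar with RE24: the run value of a plate appearance is conventionally the change in run expectancy across the 24 base-out states, plus any runs that scored on the play. The following is a minimal sketch of that arithmetic, not the notebook's actual code. The function name `run_value` and the run expectancy numbers are made-up placeholders; a real table would be estimated from play-by-play data.

```python
# Hypothetical run expectancy values keyed by (bases, outs).
# `bases` is a string of three flags for first, second, and third base.
# Only a few of the 24 states are listed; the numbers are illustrative only.
RUN_EXPECTANCY = {
    ("000", 0): 0.50,
    ("100", 0): 0.90,
    ("000", 1): 0.25,
    ("100", 1): 0.50,
}


def run_value(start_state, end_state, runs_scored, table=RUN_EXPECTANCY):
    """Run value of one plate appearance: RE(end) - RE(start) + runs scored.

    `end_state` is None when the plate appearance ends the inning,
    in which case the ending run expectancy is zero.
    """
    re_start = table[start_state]
    re_end = 0.0 if end_state is None else table[end_state]
    return re_end - re_start + runs_scored


# Leadoff single: bases empty, 0 outs -> runner on first, 0 outs.
single = run_value(("000", 0), ("100", 0), runs_scored=0)
# Leadoff out: bases empty, 0 outs -> bases empty, 1 out.
out = run_value(("000", 0), ("000", 1), runs_scored=0)
# Solo home run with nobody out: the state is unchanged and one run scores.
home_run = run_value(("000", 0), ("000", 0), runs_scored=1)
print(round(single, 2), round(out, 2), round(home_run, 2))
```

Summing the per-plate-appearance values gives the RE24 total for a set of plate appearances.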


The following libraries must also be installed to fully run the scripts:

  1. scikit-learn
  2. SciPy
  3. NumPy
  4. pandas
  5. tqdm
  6. Matplotlib
  7. seaborn
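
Since the clustering notebook takes hours to run, it can help to confirm the libraries are present first. A small sketch using only the standard library; the helper `missing_packages` is hypothetical and not part of this repository:

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
# Note that scikit-learn is imported as `sklearn`.
REQUIRED = ["sklearn", "scipy", "numpy", "pandas", "tqdm", "matplotlib", "seaborn"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

The check only looks for each package and does not import it, so it runs quickly even when some of the libraries are missing.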
