GitHub - gmjohns/bookie

# bookie
To run KNN-CV.py:
1) Ensure the libraries identified in the import statements are available to
the program.
2) Select KNN-CV.py from the bookie/src/msfapi directory and run the program.
The KNN-CV.py file depends on two data files located in the
bookie/src/msfapi/data directory ("2017-regularPP.csv" and
"2018-regularPP.csv") and is preconfigured to go there for the data.
That is all that is required to run the code.
Notes:
a) This program performs KNN for 4 scenarios using cross-validation of 2017
and 2018 season data. The 4 scenarios are: 1) Normalized data,
2) Standardized data, 3) Normalized, mean-shifted data with PCA, and
4) Standardized data with PCA.
b) The code is set up to run with the following global parameters: max
number of K-nearest neighbors = 300, random state = 30,
number of CV splits = 10, max number of PCA components to consider = 10.
c) With these parameters, it takes 3-4 hours to run and the output is
4 graphs, so you can start it and go out for a nice dinner.
d) If you only want to verify that the code runs, without reproducing the
results in the report, you can change the global variables as long as you
don't make selections that cause errors (i.e., parameters that are
inconsistent with the code).
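
As a rough guide to what "inconsistent parameters" can mean in note d), the
sketch below checks a set of global values before starting a long run. The
names (MAX_K, N_SPLITS, MAX_PCA_COMPONENTS, check_parameters) and the row and
feature counts are hypothetical and are not taken from KNN-CV.py; the
constraints are the usual ones for K-fold cross-validation, KNN, and PCA.

```python
# Hypothetical names; the values mirror the defaults listed in note b).
MAX_K = 300                # max number of K-nearest neighbors
RANDOM_STATE = 30
N_SPLITS = 10              # number of CV splits
MAX_PCA_COMPONENTS = 10    # max number of PCA components to consider


def check_parameters(n_rows, n_features, max_k, n_splits, max_components):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if n_splits < 2 or n_splits > n_rows:
        problems.append("number of CV splits must be between 2 and the row count")
        return problems
    # The smallest training fold is the row count minus the largest validation fold.
    largest_validation_fold = -(-n_rows // n_splits)  # ceiling division
    smallest_training_fold = n_rows - largest_validation_fold
    if max_k < 1 or max_k > smallest_training_fold:
        problems.append("max K must be between 1 and the smallest training fold size")
    if max_components < 1 or max_components > min(n_features, smallest_training_fold):
        problems.append("PCA components cannot exceed the feature or sample count")
    return problems


# Assumed data shapes, for illustration only.
print(check_parameters(2000, 20, MAX_K, N_SPLITS, MAX_PCA_COMPONENTS))
print(check_parameters(200, 20, MAX_K, N_SPLITS, MAX_PCA_COMPONENTS))
```

With 2000 rows the defaults pass; with only 200 rows, a maximum K of 300 is
larger than any training fold and is reported as a problem.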

To run KNN_Train_Test.py:
1) Ensure the libraries identified in the import statements are available to
the program.
2) Select KNN_Train_Test.py from the bookie/src/msfapi directory and run
the program. The KNN_Train_Test.py file depends on three data files
located in the bookie/src/msfapi/data directory ("2017-regularPP.csv",
"2018-regularPP.csv", and "2019-regularPP.csv") and is preconfigured to go
there for the data. That is all that is required to run the code.
Notes:
a) This program performs KNN for 2 scenarios using 2017 and 2018 season data
for training and 2019 season data for testing. The 2 scenarios are:
1) Standardized data, and 2) Standardized with PCA data.
b) The code is set up to run with the following global parameters:
max number of K-nearest neighbors = 300, random state = 30, max number
of PCA components to consider = 10.
c) KNN_Train_Test.py takes much less time than the KNN-CV.py file
to run, but it still takes over an hour. So, you don't have time
for a nice dinner but can go out for a quick lunch.
d) If you only want to verify that the code runs, without reproducing the
results in the report, you can change the global variables as long as you
don't make selections that cause errors (i.e., parameters that are
inconsistent with the code).
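
The "Standardized data" scenario in note a) rescales each feature to zero mean
and unit variance. One common way to do this for a train/test split, sketched
below with the standard library only, is to compute the mean and standard
deviation on the training seasons and reuse them on the test season. Whether
KNN_Train_Test.py does exactly this is an assumption; the function names and
the toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(train_rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply previously fitted statistics to any set of rows."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy stand-ins for the 2017/2018 (training) and 2019 (test) feature rows.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

means, stds = fit_standardizer(train)
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)
```

Fitting on the training rows only keeps information from the test season out
of the scaling step.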

To run decisionTree.R:
1) Ensure the packages identified in the install statements are available to
the program.
2) Select decisionTree.R from the bookie/src/msfapi directory and run
the program. The decisionTree.R file depends on three data files
located in the bookie/src/msfapi/data directory ("2017-regularPP.csv",
"2018-regularPP.csv", and "2019-regularPP.csv") and is preconfigured to go
there for the data. That is all that is required to run the code.
Notes:
a) This program performs Decision Tree classification for 2 scenarios, using 2017 data for training and 2018 data for
testing (Midway Results) and 2017 and 2018 season data for training and 2019 season data for testing (Final Results).
The 2 scenarios are: 1) Cross-Validation, and 2) Without Cross-Validation.
b) This code produces separate decision tree outputs for 2 approaches. The 2 approaches are: 1) Gini Index and
2) Information Gain.
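
For reference, the two split criteria named in note b) can be computed as
follows. This is a small Python sketch of the textbook definitions, not code
from decisionTree.R; the function names and the example labels are
hypothetical.

```python
from collections import Counter
from math import log2


def gini_index(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical outcome labels before and after a candidate split.
parent = ["W", "W", "L", "L"]
split = [["W", "W"], ["L", "L"]]
print(gini_index(parent))               # 0.5
print(information_gain(parent, split))  # 1.0
```

A split that separates the classes perfectly, as in this toy case, removes all
of the parent's entropy, so the information gain equals the parent entropy.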

To run svm.py:
1) Ensure the packages identified in the import statements are available to
the program.
2) To run: python svm.py
Notes:
- This code tunes the number of PCA components on 2017/2018 data with cross-validation, using the optimal
hyperparameters for an SVM with a sigmoid kernel on the standardized data set, and writes the results to a CSV file
sorted by descending accuracy.
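
For readers unfamiliar with the sigmoid kernel mentioned above, it is commonly
defined as tanh(gamma * <x, y> + coef0). The sketch below evaluates it with
the standard library; the parameter names follow the usual convention, and the
values shown are placeholders, not the tuned hyperparameters used by svm.py.

```python
from math import tanh


def sigmoid_kernel(x, y, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * dot(x, y) + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return tanh(gamma * dot + coef0)


# Placeholder inputs and hyperparameters, for illustration only.
value = sigmoid_kernel([1.0, 2.0], [0.5, -1.0], gamma=0.1)
```

Because the kernel depends on the dot product of the raw feature values, it is
sensitive to feature scale, which is one reason to standardize the data first.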

To run svm_final.py:
1) Ensure the packages identified in the import statements are available to
the program.
2) To run: python svm_final.py
Notes:
- This code fits the optimal SVM model on the 2017/2018 training data and tests it on the unseen 2019 test set.
The code outputs the accuracy on the test set as well as several plots to visualize the results.
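
Test-set accuracy is the fraction of predictions that match the true labels. A
minimal sketch, with a hypothetical function name and made-up labels (this is
not code from svm_final.py):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 and 0 stand in for the two outcome classes.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.75
```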

To run nn_keras.py:
1) Ensure all necessary packages are installed.
2) To run: python nn_keras.py
Notes:
- Hyperparameter tuning has been commented out in the current version. The script is currently set to train on 2017/2018 data and test on 2019 with the optimal hyperparameters chosen.