This project is a submission for the Nordeus JobFair Data Science Challenge, dedicated to predicting league ranks for clubs in Top Eleven. Using machine learning models, the aim is to forecast the league position of each club at the end of a season. Presumably, the goal is to create a balanced and competitive experience for players.
The project was realised in Google Colab, using its resources because of the limited computing performance of the local machine.
The project utilizes two datasets:
- jobfair_train.csv: Contains features such as user activity and player statistics, together with the target variable league_rank.
- jobfair_test.csv: Similar to the training dataset but without the target variable; used for model prediction.
Features include user engagement metrics, player quality indicators, and other relevant game activity data.
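A minimal sketch of loading the two files with pandas and separating the target. It assumes both CSVs sit in the working directory and that league_rank is the only column missing from the test file; the helper names load_datasets and split_features_target are hypothetical and not taken from the notebook.

```python
import pandas as pd

TARGET = "league_rank"  # target column named in the dataset description


def load_datasets(train_path="jobfair_train.csv", test_path="jobfair_test.csv"):
    """Read the training and test CSVs (default paths are assumptions)."""
    return pd.read_csv(train_path), pd.read_csv(test_path)


def split_features_target(train, target=TARGET):
    """Separate the feature columns from the league_rank target."""
    X = train.drop(columns=[target])
    y = train[target]
    return X, y
```

With the files in place, `train, test = load_datasets()` followed by `X, y = split_features_target(train)` yields the feature matrix and target used for training.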
We explore several models:
- RandomForestClassifier
- XGBClassifier (XGBoost)
- LGBMClassifier
- DecisionTreeClassifier
- StackingClassifier (with Logistic Regression as the final estimator), achieving the best MAE of 2.59
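The stacking setup can be sketched with scikit-learn alone. This is an illustration under stated assumptions, not the notebook's configuration: it uses synthetic data with four "ranks" labelled 1 to 4, stacks only RandomForestClassifier and DecisionTreeClassifier (XGBClassifier and LGBMClassifier could be appended to the estimator list if xgboost and lightgbm are installed), and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def build_stacking_model(random_state=0):
    """Stack tree-based classifiers under a logistic-regression meta-learner."""
    base_estimators = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=random_state)),
        # XGBClassifier / LGBMClassifier could be added here if installed.
    ]
    return StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base predictions feed the final estimator
    )


# Synthetic stand-in data: 4 "ranks" labelled 1..4 (not the challenge data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=4, random_state=0
)
y = y + 1
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = build_stacking_model()
model.fit(X_train, y_train)
pred = model.predict(X_val)
# MAE treats ranks as numbers: predicting 3 for a true 1 costs 2.
mae = mean_absolute_error(y_val, pred)
print(f"Validation MAE: {mae:.2f}")
```

Scoring a classifier with MAE rewards predictions that land near the true rank, even when they are not exactly right.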
Make sure Python is installed on your system. Dependencies include:
- pandas
- scikit-learn
- xgboost
- lightgbm
- matplotlib
Install these using pip:
pip install pandas scikit-learn xgboost lightgbm matplotlib
To run the models and evaluate their performance, follow these steps:
- Load the datasets jobfair_train.csv and jobfair_test.csv.
- Preprocess the data following the preprocessing steps outlined in the code.
- Train the machine learning models on the preprocessed training data.
- Evaluate the models using cross-validation.
- Use the trained models to make predictions on the preprocessed test data.
- Analyze the results and adjust the models or preprocessing steps as needed.
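The steps above can be sketched end to end as follows. This is a hedged example rather than the notebook's code: the median imputation stands in for the actual preprocessing, a single RandomForestClassifier stands in for the full model set, and the column names (activity, player_quality) and synthetic frames are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def evaluate_and_predict(train, test, target="league_rank", cv=5):
    """Cross-validate a pipeline on train, then refit and predict on test."""
    X = train.drop(columns=[target])
    y = train[target]
    # Assumed preprocessing: median imputation of numeric features.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    # scikit-learn maximises scores, so MAE is exposed as its negative.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    cv_mae = -scores.mean()
    model.fit(X, y)
    predictions = model.predict(test[X.columns])
    return cv_mae, predictions


# Synthetic stand-in frames with hypothetical column names.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "activity": rng.normal(size=200),
    "player_quality": rng.normal(size=200),
    "league_rank": rng.integers(1, 6, size=200),
})
train.loc[::10, "activity"] = np.nan  # inject gaps to exercise imputation
test = train.drop(columns=["league_rank"]).head(20)

cv_mae, predictions = evaluate_and_predict(train, test)
print(f"CV MAE: {cv_mae:.2f}")
```

Wrapping preprocessing and model in one pipeline keeps the imputation statistics inside each cross-validation fold, so the CV estimate is not leaked from the held-out rows.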
There are several areas where this project can be further enhanced:
- Further Data Exploration: Implementing more sophisticated data exploration techniques (e.g. additional plots) to reduce the risk of a large bias.
- Advanced Modeling Techniques: Experimenting with more sophisticated machine learning algorithms or deep learning models.
- Feature Engineering: Exploring additional features or transformations that could improve model performance.
- Hyperparameter Tuning: More extensive tuning of model parameters to optimize performance.
- Data Augmentation: Increasing the dataset size or variety, possibly by incorporating additional relevant data sources.
- Model Interpretability: Implementing tools and techniques for better understanding and interpreting the model's decisions.
- Deployment Strategy: Developing a plan for deploying the model in a real-world environment, ensuring scalability and maintainability.
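As one way to approach the hyperparameter tuning item, a small grid search scored by MAE, the metric quoted for the stacking model, might look like the sketch below. The grid values and the synthetic data are illustrative assumptions, not settings from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid only; the values are assumptions, not tuned settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}

# Synthetic stand-in data with three classes.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # negated MAE, so higher is better
    cv=3,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

For larger grids, RandomizedSearchCV samples a fixed number of combinations instead of trying all of them.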
- README.md: This file, providing an overview and instructions.
- NordeusChallenge.ipynb: Contains the code for the challenge.
- league_rank_predictions.csv: Includes all the predictions.
This project was realised as part of the JobFair 2023 Nordeus challenge.