Football matches result predictions


The results page is available at: https://msoczi.github.io/football_predictions/web/index.html


The aim of the project was to create a tool for predicting the results of league matches in the leading European leagues, based on data I prepared myself.

The project was implemented from scratch, i.e. it included:

  • collecting the raw data from which features can be derived and a model trained
  • creating variables based on, among other things, time aggregates (last n matches), position in the table, team form, etc.
  • calculating the historical data used for modeling
  • building the target solution: an XGBoost model with 3 classes. Then, based on the estimated probabilities, a decision tree was created, which predicts in a simple, rule-based way which team will win the match (or whether it will end in a draw)
  • creating a script that downloads data about upcoming matches, creates the model variables for the given teams and predicts the match result.
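
The "last n matches" aggregates mentioned in the list above can be illustrated with a minimal sketch. It is not the repository's code: the function name `add_form_features`, the dictionary keys and the choice of "points collected in the previous n matches" as the form measure are all assumptions made for illustration. The key point it shows is that a team's history is updated only after the features for the current match are computed, so no information from the match being predicted leaks into its own features.

```python
from collections import defaultdict, deque


def add_form_features(matches, n=5):
    """Add points-based form features to a list of matches.

    `matches` is assumed to be sorted by date; each item is a dict with the
    hypothetical keys "home", "away" and "result" ("H", "D" or "A").
    Returns new dicts with "home_form" and "away_form": the points each team
    collected in its previous `n` matches (None if it has no history yet).
    """
    # One bounded queue of points per team; old matches fall out automatically.
    history = defaultdict(lambda: deque(maxlen=n))
    rows = []
    for match in matches:
        home, away = match["home"], match["away"]
        row = dict(match)
        # Features use only matches played BEFORE the current one.
        row["home_form"] = sum(history[home]) if history[home] else None
        row["away_form"] = sum(history[away]) if history[away] else None
        rows.append(row)

        # Update the histories after the features have been computed.
        if match["result"] == "H":
            home_pts, away_pts = 3, 0
        elif match["result"] == "A":
            home_pts, away_pts = 0, 3
        else:
            home_pts, away_pts = 1, 1
        history[home].append(home_pts)
        history[away].append(away_pts)
    return rows
```

The same pattern extends to other aggregates (goals scored, goals conceded, shots) by storing a different quantity in the per-team queue.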

Raw data with match results are downloaded from https://www.football-data.co.uk.
The advantage of this approach is that it can be applied to any league. So far, however, predictions are available only for the top division of the following countries:

  • England
  • Italy
  • Germany
  • Spain
  • France
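
As a rough illustration of the data collection step, the sketch below builds the URL of a season file on football-data.co.uk and extracts the basic result columns from its CSV content. It is not taken from the repository: the helper names and the country-keyed `DIVISION_CODES` mapping are hypothetical, and the URL pattern and column names (`Date`, `HomeTeam`, `AwayTeam`, `FTR`) reflect the site's published CSV layout as I understand it, so they should be checked against the site before use.

```python
import csv
import io

# Top-division codes as used in football-data.co.uk file names (assumed mapping).
DIVISION_CODES = {
    "England": "E0",
    "Italy": "I1",
    "Germany": "D1",
    "Spain": "SP1",
    "France": "F1",
}


def season_csv_url(country, season_start_year):
    """Return the assumed CSV URL for one season, e.g. 2023 -> '2324'."""
    code = DIVISION_CODES[country]
    season = f"{season_start_year % 100:02d}{(season_start_year + 1) % 100:02d}"
    return f"https://www.football-data.co.uk/mmz4281/{season}/{code}.csv"


def parse_results(csv_text):
    """Extract date, teams and full-time result (H/D/A) from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "date": row["Date"],
            "home": row["HomeTeam"],
            "away": row["AwayTeam"],
            "result": row["FTR"],
        }
        for row in reader
    ]
```

The CSV text itself can be fetched with any HTTP client, for example `urllib.request.urlopen(season_csv_url("England", 2023))`, and then passed to `parse_results` after decoding.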

Based on the raw data, I created the model features myself. The full list of variables is available in the file: variables

The XGBoost model was built on a hand-prepared historical sample containing 7210 rows and 354 columns. As the objective function, multi:softprob was used so that the model's output was the probability of assigning observations to each of the 3 classes of match result - H (Home), A (Away), D (Draw).
These probabilities were then used to build a simple decision tree (max_depth = 3) that assigns each observation to a class in a rule-based manner, i.e. it predicts the final result with a few simple rules. This step adjusts the final decision so that draws are not predicted too rarely. The scheme of the decision tree is shown below.
[Figure: decision tree scheme]

Forecasts do not use bookmaker odds.

You can view the results on the site: https://msoczi.github.io/football_predictions/web/index.html

You can also clone the repository and run the predictions locally with Python.
How to use?

  1. Clone the repository.
git clone https://github.com/msoczi/football_predictions
  2. Create and activate a virtual environment for Python.
# LINUX:
python3 -m venv football_preds
source football_preds/bin/activate

# WINDOWS:
python -m venv football_preds
football_preds\Scripts\activate
  3. Install the required packages (inside the virtual environment!).
pip install -r requirements.txt
  4. Run main_script.py from the console.
python scripts/main_script.py <LEAGUE_NAME>

The results for the league passed as the argument are then saved to \output_tables.