feat : figure, fixed directive
mathisdrn committed Apr 8, 2024
1 parent 0dd772f commit 59cdb76
Showing 3 changed files with 232 additions and 73 deletions.
14 changes: 12 additions & 2 deletions Paper.md
@@ -90,7 +90,7 @@ La création d'un tableau de bord interactif a été réalisé à l'aide de la l

This paper was written in a Markdown file.

[MyST](https://mystmd.org/) is part of an ecosystem of tools that aims to improve scientific communication by fostering reproducible and indexable science. This tool was used to publish this research paper as a [static site](https://mathisdrn.github.io/head_coach_dismissal/) and a scientific-quality [PDF](https://raw.githubusercontent.com/mathisdrn/head_coach_dismissal/master/exports/head_coach_dismissal.pdf).
[MyST](https://mystmd.org/) is part of an ecosystem of tools that aims to improve scientific communication by fostering reproducible and indexable science. This tool was used to publish this research paper as a [static site](https://mathisdrn.github.io/head_coach_dismissal/) and a [PDF](https://raw.githubusercontent.com/mathisdrn/head_coach_dismissal/master/exports/head_coach_dismissal.pdf) that meets the standards of scientific quality.

MyST makes it possible to reuse the inputs and outputs of Jupyter Notebooks. As a result, all figures, tables, and variables in this paper come directly from the Jupyter Notebooks. For example, the entire study can be rerun on other leagues or other periods simply by changing the parameters of the functions used in the Jupyter Notebooks:

@@ -131,6 +131,11 @@ Les données sur les matchs sont extraites de Transfermakt. Elles contiennent de

## Pré-traitement des données

% continue the presentation from the corresponding cell in the notebook

```{embed} #inconsistent_team_names
```
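
The mismatch described in the embedded cell can be illustrated with plain set operations. This is only a sketch: the variable names mirror those referenced in the notebook (`match_team`, `coach_team`, `coach_team_not_in_match`, `match_team_not_in_coach`), but how the notebook actually builds them is not shown in this diff, and the sample names below are assumptions chosen for illustration.

```python
# Hypothetical team-name sets standing in for the notebook's match_team and coach_team.
match_team = {"Liverpool", "Chelsea", "Arsenal", "Watford"}
coach_team = {"Liverpool FC", "Chelsea FC", "Arsenal"}

# Names that appear in one dataset only (set difference).
coach_team_not_in_match = coach_team - match_team
match_team_not_in_coach = match_team - coach_team

# Names spelled identically in both datasets (set intersection).
shared = match_team & coach_team

print(sorted(coach_team_not_in_match))
print(sorted(match_team_not_in_coach))
print(sorted(shared))
```

A name landing in either difference set does not by itself mean the club is missing from the other dataset: it may simply be spelled differently, which is what the name-matching step is meant to resolve.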

The Levenshtein distance algorithm [@Levenshtein1965BinaryCC] is used to match club names between the two datasets.
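
As a rough illustration of the idea, the following pure-Python sketch computes the Levenshtein distance with the standard dynamic-programming recurrence and picks the closest candidate name. It is not the paper's implementation, which is not visible in this diff; the helper names `levenshtein` and `closest_team` and the sample team lists are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_team(name: str, candidates: list[str]) -> tuple[str, int]:
    """Return the candidate closest to name (case-insensitive) and its distance."""
    best = min(candidates, key=lambda c: levenshtein(name.lower(), c.lower()))
    return best, levenshtein(name.lower(), best.lower())


# Hypothetical sample names, based on the table previews in the notebook.
match_teams = ["Liverpool", "Chelsea", "Arsenal", "Watford"]
coach_teams = ["Liverpool FC", "Chelsea FC"]

# Map each head_coach team name to its nearest match_results team name.
mapping = {team: closest_team(team, match_teams) for team in coach_teams}
print(mapping)
```

In practice a maximum distance threshold is usually worth adding, so that a club with no real counterpart in the other dataset is not silently paired with an unrelated name.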

```{code} python
@@ -197,13 +202,18 @@ Lorsque l'on s'intéresse au nombre de coach employés par les clubs durant la p
Proportion of Clubs by Number of Head Coaches Appointed (2017 - 2022)
```

[](#hc_tenure_per_league1) and [](#hc_per_club_per_league1) look at head coach tenure and head coach turnover across the leagues of interest.
[](#hc_tenure_per_league1), [](#hc_per_club_per_league1), and [](#hc_tenure_per_league_kde1) look at head coach tenure and head coach turnover across the leagues of interest.

```{figure} #hc_tenure_per_league
:name: hc_tenure_per_league1
Average Head Coach Tenure for Completed Appointments per League
```

```{figure} #hc_tenure_per_league_kde
:name: hc_tenure_per_league_kde1
Kernel Density Estimation of Head Coach Tenure for Completed Appointments per League (2017 - 2022)
```

```{figure} #hc_per_club_per_league
:name: hc_per_club_per_league1
Average Number of Head Coaches Appointed per Club versus League (2017 - 2022)
179 changes: 132 additions & 47 deletions src/01 Preprocessing.ipynb
@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -29,7 +29,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"metadata": {},
"outputs": [
{
@@ -53,10 +53,8 @@
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>league</th>\n",
" <th>country</th>\n",
" <th>season_year</th>\n",
" <th>date</th>\n",
" <th>league</th>\n",
" <th>home_team</th>\n",
" <th>home_goals</th>\n",
" <th>away_team</th>\n",
@@ -66,54 +64,44 @@
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Premier League</td>\n",
" <td>England</td>\n",
" <td>2018</td>\n",
" <td>2017-08-11</td>\n",
" <td>Premier League</td>\n",
" <td>Arsenal</td>\n",
" <td>4.0</td>\n",
" <td>Leicester City</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Premier League</td>\n",
" <td>England</td>\n",
" <td>2018</td>\n",
" <td>2017-08-12</td>\n",
" <td>Premier League</td>\n",
" <td>Watford</td>\n",
" <td>3.0</td>\n",
" <td>Liverpool</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Premier League</td>\n",
" <td>England</td>\n",
" <td>2018</td>\n",
" <td>2017-08-12</td>\n",
" <td>Premier League</td>\n",
" <td>Crystal Palace</td>\n",
" <td>0.0</td>\n",
" <td>Huddersfield</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Premier League</td>\n",
" <td>England</td>\n",
" <td>2018</td>\n",
" <td>2017-08-12</td>\n",
" <td>Premier League</td>\n",
" <td>West Brom</td>\n",
" <td>1.0</td>\n",
" <td>Bournemouth</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Premier League</td>\n",
" <td>England</td>\n",
" <td>2018</td>\n",
" <td>2017-08-12</td>\n",
" <td>Premier League</td>\n",
" <td>Chelsea</td>\n",
" <td>2.0</td>\n",
" <td>Burnley</td>\n",
@@ -124,33 +112,36 @@
"</div>"
],
"text/plain": [
" league country season_year date home_team \\\n",
"0 Premier League England 2018 2017-08-11 Arsenal \n",
"1 Premier League England 2018 2017-08-12 Watford \n",
"2 Premier League England 2018 2017-08-12 Crystal Palace \n",
"3 Premier League England 2018 2017-08-12 West Brom \n",
"4 Premier League England 2018 2017-08-12 Chelsea \n",
" date league home_team home_goals away_team \\\n",
"0 2017-08-11 Premier League Arsenal 4.0 Leicester City \n",
"1 2017-08-12 Premier League Watford 3.0 Liverpool \n",
"2 2017-08-12 Premier League Crystal Palace 0.0 Huddersfield \n",
"3 2017-08-12 Premier League West Brom 1.0 Bournemouth \n",
"4 2017-08-12 Premier League Chelsea 2.0 Burnley \n",
"\n",
" home_goals away_team away_goals \n",
"0 4.0 Leicester City 3.0 \n",
"1 3.0 Liverpool 3.0 \n",
"2 0.0 Huddersfield 3.0 \n",
"3 1.0 Bournemouth 0.0 \n",
"4 2.0 Burnley 3.0 "
" away_goals \n",
"0 3.0 \n",
"1 3.0 \n",
"2 3.0 \n",
"3 0.0 \n",
"4 3.0 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "display_data"
"output_type": "execute_result"
}
],
"source": [
"#| label: match_results\n",
"display(match_results.head())"
"match_results.rename(columns = {'season_year': 'season'}, inplace = True)\n",
"# Select all match_results columns except 'country' and 'season'\n",
"match_results[['date', 'league', 'home_team', 'home_goals', 'away_team', 'away_goals']].head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -305,11 +296,112 @@
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>league</th>\n",
" <th>team</th>\n",
" <th>coach_name</th>\n",
" <th>appointed</th>\n",
" <th>end_date</th>\n",
" <th>days_in_post</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Premier League</td>\n",
" <td>Manchester City</td>\n",
" <td>Pep Guardiola</td>\n",
" <td>2016-07-01</td>\n",
" <td>NaT</td>\n",
" <td>2784</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Premier League</td>\n",
" <td>Liverpool FC</td>\n",
" <td>Jürgen Klopp</td>\n",
" <td>2015-10-08</td>\n",
" <td>2024-06-30</td>\n",
" <td>3188</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Premier League</td>\n",
" <td>Chelsea FC</td>\n",
" <td>Graham Potter</td>\n",
" <td>2022-09-08</td>\n",
" <td>2023-04-02</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Premier League</td>\n",
" <td>Chelsea FC</td>\n",
" <td>Thomas Tuchel</td>\n",
" <td>2021-01-26</td>\n",
" <td>2022-09-07</td>\n",
" <td>589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Premier League</td>\n",
" <td>Chelsea FC</td>\n",
" <td>Frank Lampard</td>\n",
" <td>2019-07-04</td>\n",
" <td>2021-01-25</td>\n",
" <td>571</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" league team coach_name appointed end_date \\\n",
"0 Premier League Manchester City Pep Guardiola 2016-07-01 NaT \n",
"1 Premier League Liverpool FC Jürgen Klopp 2015-10-08 2024-06-30 \n",
"2 Premier League Chelsea FC Graham Potter 2022-09-08 2023-04-02 \n",
"3 Premier League Chelsea FC Thomas Tuchel 2021-01-26 2022-09-07 \n",
"4 Premier League Chelsea FC Frank Lampard 2019-07-04 2021-01-25 \n",
"\n",
" days_in_post \n",
"0 2784 \n",
"1 3188 \n",
"2 206 \n",
"3 589 \n",
"4 571 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#| label: head_coach\n",
"display(head_coach.head())"
"display(head_coach.head())\n",
"\n",
"head_coach[['league', 'team', 'coach_name', 'appointed', 'end_date', 'days_in_post']].head()"
]
},
{
@@ -347,20 +439,13 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"label": "inconsistent_team_names"
},
"source": [
"In total, the match_results dataset contains {eval}`len(match_team)` teams and the head_coach dataset contains {eval}`len(coach_team)` teams. However, some team names differ between the two datasets. For example, 'Liverpool' in match_results is 'Liverpool FC' in head_coach. This is problematic because we need to join the data on the team columns.\n",
"\n",
"In total, {eval}`len(coach_team_not_in_match)` teams are present in match_results but not in head_coach, and {eval}`len(match_team_not_in_coach)` teams are present in head_coach but not in match_results. This indicates that, even allowing for mismatched names, several teams in match_results have no coach record. (This needs more explanation in Data Extraction about the data and why this is surprising, given that we filter head_coach to include at least the latest head coach.)\n",
"\n",
"Addressing this surprise ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To address mismatched team names, we will use the Levenshtein distance (add reference to paper) to match the head_coach team names that are missing from match_results with the match_results team names."
"In total, {eval}`len(coach_team_not_in_match)` teams are present in match_results but not in head_coach, and {eval}`len(match_team_not_in_coach)` teams are present in head_coach but not in match_results. This indicates that, even allowing for mismatched names, several teams in match_results have no coach record. (This needs more explanation in Data Extraction about the data and why this is surprising, given that we filter head_coach to include at least the latest head coach.)"
]
},
{
112 changes: 88 additions & 24 deletions src/02 Headcoach analysis.ipynb

Large diffs are not rendered by default.
