-
Notifications
You must be signed in to change notification settings - Fork 0
/
rwc.Rmd
122 lines (99 loc) · 3.66 KB
/
rwc.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
---
title: "Rugby World Cup"
output: html_notebook
---
# Analysing data on England players at the Rugby World Cup
```{r}
#Use this line if you don't have rio installed
# install.packages("rio")
#Import data
rwcdata <- rio::import("England Rugby.xlsx", sheet = 2)
#Import specific data for chart
regionsandpositions <- rio::import("rugby-world-cup.csv")
colnames(regionsandpositions) <- c("Region","Position","Players","regtotal")
```
## The stacked bar chart
We've adapted some code from [the BBC R Cookbook](https://bbc.github.io/rcookbook/#make_a_stacked_bar_chart).
First we install the packages needed:
```{r}
install.packages('devtools')
devtools::install_github('bbc/bbplot')
```
```{r}
#This line of code installs the pacman page if you do not have it installed - if you do, it simply loads the package
if(!require(pacman))install.packages("pacman")
pacman::p_load('dplyr', 'tidyr', 'gapminder',
'ggplot2', 'ggalt',
'forcats', 'R.utils', 'png',
'grid', 'ggpubr', 'scales',
'bbplot')
```
This section of code we don't use because it generates the data in the right format - we've already done that.
```{r}
'
#prepare data
stacked_df <- gapminder %>%
filter(year == 2007) %>%
mutate(lifeExpGrouped = cut(lifeExp,
breaks = c(0, 50, 65, 80, 90),
labels = c("Under 50", "50-65", "65-80", "80+"))) %>%
group_by(continent, lifeExpGrouped) %>%
summarise(continentPop = sum(as.numeric(pop)))
#set order of stacks by changing factor levels
stacked_df$lifeExpGrouped = factor(stacked_df$lifeExpGrouped, levels = rev(levels(stacked_df$lifeExpGrouped)))
'
```
Here we generate the stacked bar chart
```{r}
#create plot - specify data frame and x and y axes
stackedbarforrob <- ggplot(data = regionsandpositions,
aes(x = reorder(Region,regtotal), y = Players))+
#Specify fill colour is based on position
geom_col(aes(fill = Position), width = 0.7) +
#Add the BBC styling
bbc_style() +
#Add colour scheme
scale_fill_viridis_d(direction = -1) +
#Add a black line at the bottom of the bars
geom_hline(yintercept = 0, size = 1, colour = "#333333") +
#Add the legend (subtitle can be filled too if you want)
labs(title = "Where England's rugby players come from",
subtitle = "") +
#Position the legend
theme(legend.position = "top",
legend.justification = "left") +
guides(fill = guide_legend(reverse = TRUE))
#Flip from vertical to horizontal bars
stackedbarforrob <- stackedbarforrob + coord_flip()
#Show the results
stackedbarforrob
#Save because we can
ggsave("stackedbarforrob.png", width = 10, height = 7, units = "in")
```
## A sankey diagram
```{r}
#Code adapted from https://www.r-graph-gallery.com/321-introduction-to-interactive-sankey-diagram-2.html
# Library
if(!require(networkD3)){
install.packages("networkD3")
}
library(networkD3)
library(dplyr)
# A connection data frame is a list of flows with intensity for each flow
links <- regionsandpositions
colnames(links) <- c("source","target","value")
# From these flows we need to create a node data frame: it lists every entities involved in the flow
nodes <- data.frame(
name=c(as.character(links$source),
as.character(links$target)) %>% unique()
)
# With networkD3, connection must be provided using id, not using real name like in the links dataframe.. So we need to reformat it.
links$IDsource <- match(links$source, nodes$name)-1
links$IDtarget <- match(links$target, nodes$name)-1
# Make the Network
p <- sankeyNetwork(Links = links, Nodes = nodes,
Source = "IDsource", Target = "IDtarget",
Value = "value", NodeID = "name",
sinksRight=FALSE)
p
```