-
Notifications
You must be signed in to change notification settings - Fork 0
/
05_exercises.Rmd
358 lines (285 loc) · 17.3 KB
/
05_exercises.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
---
title: 'Weekly Exercises #5'
author: "Bella Ding"
output:
html_document:
keep_md: TRUE
toc: TRUE
toc_float: TRUE
df_print: paged
code_download: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error=TRUE, message=FALSE, warning=FALSE)
```
```{r libraries}
library(tidyverse) # for data cleaning and plotting
library(gardenR) # for Lisa's garden data
library(lubridate) # for date manipulation
library(openintro) # for the abbr2state() function
library(palmerpenguins)# for Palmer penguin data
library(maps) # for map data
library(ggmap) # for mapping points on maps
library(gplots) # for col2hex() function
library(RColorBrewer) # for color palettes
library(sf) # for working with spatial data
library(leaflet) # for highly customizable mapping
library(ggthemes) # for more themes (including theme_map())
library(plotly) # for the ggplotly() - basic interactivity
library(gganimate) # for adding animation layers to ggplots
library(transformr) # for "tweening" (gganimate)
library(gifski) # need the library for creating gifs but don't need to load each time
library(shiny) # for creating interactive apps
library(ggimage)
theme_set(theme_minimal())
```
```{r data}
# SNCF Train data
small_trains <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/small_trains.csv")
# Lisa's garden data
data("garden_harvest")
# Lisa's Mallorca cycling data
mallorca_bike_day7 <- read_csv("https://www.dropbox.com/s/zc6jan4ltmjtvy0/mallorca_bike_day7.csv?dl=1") %>%
select(1:4, speed)
# Heather Lendway's Ironman 70.3 Pan Am championships Panama data
panama_swim <- read_csv("https://raw.githubusercontent.com/llendway/gps-data/master/data/panama_swim_20160131.csv")
panama_bike <- read_csv("https://raw.githubusercontent.com/llendway/gps-data/master/data/panama_bike_20160131.csv")
panama_run <- read_csv("https://raw.githubusercontent.com/llendway/gps-data/master/data/panama_run_20160131.csv")
#COVID-19 data from the New York Times
covid19 <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
```
## Put your homework on GitHub!
Go [here](https://github.com/llendway/github_for_collaboration/blob/master/github_for_collaboration.md) or to previous homework to remind yourself how to get set up.
Once your repository is created, you should always open your **project** rather than just opening an .Rmd file. You can do that by either clicking on the .Rproj file in your repository folder on your computer. Or, by going to the upper right hand corner in R Studio and clicking the arrow next to where it says Project: (None). You should see your project come up in that list if you've used it recently. You could also go to File --> Open Project and navigate to your .Rproj file.
## Instructions
* Put your name at the top of the document.
* **For ALL graphs, you should include appropriate labels.**
* Feel free to change the default theme, which I currently have set to `theme_minimal()`.
* Use good coding practice. Read the short sections on good code with [pipes](https://style.tidyverse.org/pipes.html) and [ggplot2](https://style.tidyverse.org/ggplot2.html). **This is part of your grade!**
* **NEW!!** With animated graphs, add `eval=FALSE` to the code chunk that creates the animation and saves it using `anim_save()`. Add another code chunk to reread the gif back into the file. See the [tutorial](https://animation-and-interactivity-in-r.netlify.app/) for help.
* When you are finished with ALL the exercises, uncomment the options at the top so your document looks nicer. Don't do it before then, or else you might miss some important warnings and messages.
## Warm-up exercises from tutorial
1. Choose 2 graphs you have created for ANY assignment in this class and add interactivity using the `ggplotly()` function.
```{r}
# total harvest in pounds for each tomato variety
tomato_varieity_graph <- garden_harvest %>%
filter(vegetable=="tomatoes") %>% # subseting tomatoes
group_by(vegetable, variety) %>%
summarise(first_harvest_date=min(date), total_weight_lbs = sum(weight)*0.00220462) %>% # get the first harvest date for each variety, and get the sum of weight for each variety in pounds
arrange(first_harvest_date) %>% # sort by first_harvest_date => from earliest to latest
ggplot(aes(y=reorder(variety, first_harvest_date),x=total_weight_lbs))+ # plot
geom_col()+
labs(y="variety (from smallest to largest by first harvest date)") # label
ggplotly(tomato_varieity_graph)
# Change of Total Weight of Harvested Overtime
weight_harvested_graph <- garden_harvest %>%
mutate(month = month(date, label = TRUE)) %>%
group_by(month) %>%
summarize(total_weight = sum(weight* 0.00220462)) %>%
ggplot(aes(x = month, y = total_weight, fill=total_weight))+
labs(x = "Month", y ="Weight of Vegetables Harvested (lbs)", title = "Change of Total Weight of Harvested Overtime")+
theme_igray()+
geom_bar(stat = "identity")
ggplotly(weight_harvested_graph)
```
2. Use animation to tell an interesting story with the `small_trains` dataset that contains data from the SNCF (National Society of French Railways). These are Tidy Tuesday data! Read more about it [here](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-26).
```{r,eval=FALSE}
small_trains %>%
mutate(date=date(paste(year,month,"01",sep = "-"))) %>%
group_by(date) %>%
summarise(total_trips_by_month = sum(total_num_trips)) %>%
ggplot(aes(x=date, y= total_trips_by_month))+
geom_line(color="pink", size=1)+
labs(y = "Total number of trips by time")+
transition_reveal(date)
```
```{r, eval=FALSE, echo=FALSE}
anim_save("small_train_plot.gif")
```
```{r, echo=FALSE}
knitr::include_graphics("small_train_plot.gif")
```
## Garden data
3. In this exercise, you will create a stacked area plot that reveals itself over time (see the `geom_area()` examples [here](https://ggplot2.tidyverse.org/reference/position_stack.html)). You will look at cumulative harvest of tomato varieties over time. You should do the following:
* From the `garden_harvest` data, filter the data to the tomatoes and find the *daily* harvest in pounds for each variety.
* Then, for each variety, find the cumulative harvest in pounds.
* Use the data you just made to create a static cumulative harvest area plot, with the areas filled with different colors for each vegetable and arranged (HINT: `fct_reorder()`) from most to least harvested (most on the bottom).
* Add animation to reveal the plot over date.
I have started the code for you below. The `complete()` function creates a row for all unique `date`/`variety` combinations. If a variety is not harvested on one of the harvest dates in the dataset, it is filled with a value of 0.
```{r, eval=FALSE}
garden_harvest %>%
filter(vegetable == "tomatoes") %>%
group_by(date, variety) %>%
summarize(daily_harvest_lb = sum(weight)*0.00220462) %>%
ungroup() %>%
complete(variety, date, fill = list(daily_harvest_lb = 0)) %>%
mutate(cum_harvest_lb = cumsum(daily_harvest_lb)) %>%
ggplot(aes(x=date, y= cum_harvest_lb, fill=fct_reorder(variety, cum_harvest_lb)))+
geom_area()+
labs(y=" cumulative harvest in pounds")+
scale_fill_discrete(name = "variety")+
transition_reveal(date)
```
```{r, eval=FALSE, echo=FALSE}
anim_save("garden_harvest_plot.gif")
```
```{r, echo=FALSE}
knitr::include_graphics("garden_harvest_plot.gif")
```
## Maps, animation, and movement!
4. Map my `mallorca_bike_day7` bike ride using animation!
Requirements:
* Plot on a map using `ggmap`.
* Show "current" location with a red point.
* Show path up until the current point.
* Color the path according to elevation.
* Show the time in the subtitle.
* CHALLENGE: use the `ggimage` package and `geom_image` to add a bike image instead of a red point. You can use [this](https://raw.githubusercontent.com/llendway/animation_and_interactivity/master/bike.png) image. See [here](https://goodekat.github.io/presentations/2019-isugg-gganimate-spooky/slides.html#35) for an example.
* Add something of your own! And comment on if you prefer this to the static map and why or why not.
```{r, eval= FALSE}
bike_image = "https://raw.githubusercontent.com/llendway/animation_and_interactivity/master/bike.png"
mallorca_bike_day7_with_image <- mallorca_bike_day7 %>%
mutate(image = bike_image)
mallorca_map <- get_stamenmap(
bbox = c(left = 2.38, bottom = 39.55, right = 2.62, top = 39.7),
maptype = "terrain",
zoom = 11
)
ggmap(mallorca_map) +
geom_path(data = mallorca_bike_day7_with_image,
aes(x=lon, y=lat, color=ele),size=2) +
geom_image(data = mallorca_bike_day7_with_image,
aes(image = image),
size = 0.1)+
labs(subtitle = "Time: {frame_along}")+
annotate(geom = "point", x = 2.586255, y = 39.66033, color = "red",size=4)+
scale_color_gradient2( # added the color
midpoint = 275,
low = "black",
mid = "purple",
high = "orange")+
transition_reveal(time)
```
```{r, eval=FALSE, echo=FALSE}
anim_save("bike_plot.gif")
```
```{r, echo=FALSE}
knitr::include_graphics("bike_plot.gif")
```
I feel like this map is better as it can directly visualize the location at each specific time.
5. In this exercise, you get to meet my sister, Heather! She is a proud Mac grad, currently works as a Data Scientist at 3M where she uses R everyday, and for a few years (while still holding a full-time job) she was a pro triathlete. You are going to map one of her races. The data from each discipline of the Ironman 70.3 Pan Am championships, Panama is in a separate file - `panama_swim`, `panama_bike`, and `panama_run`. Create a similar map to the one you created with my cycling data. You will need to make some small changes: 1. combine the files (HINT: `bind_rows()`, 2. make the leading dot a different color depending on the event (for an extra challenge, make it a different image using `geom_image()!), 3. CHALLENGE (optional): color by speed, which you will need to compute on your own from the data. You can read Heather's race report [here](https://heatherlendway.com/2016/02/10/ironman-70-3-pan-american-championships-panama-race-report/). She is also in the Macalester Athletics [Hall of Fame](https://athletics.macalester.edu/honors/hall-of-fame/heather-lendway/184) and still has records at the pool.
```{r}
panama <- bind_rows(panama_swim,panama_bike,panama_run)
panama_swim
panama_bike
panama_run
```
```{r, eval= FALSE}
panama_map <- get_stamenmap(
bbox = c(left = -79.5745, bottom = 8.8924, right = -79.4948, top = 8.9969),
maptype = "terrain",
zoom = 13
)
ggmap(panama_map)+
geom_path(data = panama,
aes(x=lon, y=lat, group=event),size=1) +
labs(subtitle = "Time: {frame_along}")+
annotate(geom = "point", x = -79.54469, y = 8.928600, color = "red",size=4)+
annotate(geom = "point", x = -79.51783, y = 8.975294, color = "green",size=4)+
annotate(geom = "point", x = -79.55428, y = 8.939356, color = "blue",size=4)+
scale_color_gradient2( # added the color
midpoint = 45,
low = "black",
mid = "purple",
high = "orange")+
transition_reveal(time)
```
```{r, eval=FALSE, echo=FALSE}
anim_save("panama_plot.gif")
```
```{r, echo=FALSE}
knitr::include_graphics("panama_plot.gif")
```
## COVID-19 data
6. In this exercise, you are going to replicate many of the features in [this](https://aatishb.com/covidtrends/?region=US) visualization by Aitish Bhatia but include all US states. Requirements:
* Create a new variable that computes the number of new cases in the past week (HINT: use the `lag()` function you've used in a previous set of exercises). Replace missing values with 0's using `replace_na()`.
* Filter the data to omit rows where the cumulative case counts are less than 20.
* Create a static plot with cumulative cases on the x-axis and new cases in the past 7 days on the y-axis. Connect the points for each state over time. HINTS: use `geom_path()` and add a `group` aesthetic. Put the x and y axis on the log scale and make the tick labels look nice - `scales::comma` is one option. This plot will look pretty ugly as is.
* Animate the plot to reveal the pattern by date. Display the date as the subtitle. Add a leading point to each state's line (`geom_point()`) and add the state name as a label (`geom_text()` - you should look at the `check_overlap` argument).
* Use the `animate()` function to have 200 frames in your animation and make it 30 seconds long.
* Comment on what you observe.
```{r eval=FALSE}
covid19_plot <- covid19 %>%
group_by(state) %>%
mutate(seven_day = replace_na(lag(cases, 7, order_by = date), 0)) %>%
mutate(cases_in_the_past_week = (cases - seven_day)) %>%
filter(cases>=20) %>%
ggplot(aes(x=cases,y=cases_in_the_past_week,group=state))+
geom_path()+
geom_point(aes(label = state)) +
geom_text(aes(label = state), check_overlap = TRUE) +
scale_x_continuous(trans = "log10",labels = scales::comma)+
scale_y_continuous(trans = "log10",labels = scales::comma)+
labs(title = "Trajectory of US COVID-19 Confirmed Cases",
subtitle = "Date: {frame_along}",
x="Total COnfirmed Cases",
y="New Confirmed cases (in the Past Week)")+
transition_reveal(date)
animate(covid19_plot, nframes = 200, duration=30)
```
```{r, eval=FALSE, echo=FALSE}
anim_save("covid19_plot.gif")
```
```{r, echo=FALSE}
knitr::include_graphics("covid19_plot.gif")
```
From the graph we see that the rate of increase of cases increases very fast at first by slow down later.
7. In this exercise you will animate a map of the US, showing how cumulative COVID-19 cases per 10,000 residents has changed over time. This is similar to exercises 11 & 12 from the previous exercises, with the added animation! So, in the end, you should have something like the static map you made there, but animated over all the days. The code below gives the population estimates for each state and loads the `states_map` data. Here is a list of details you should include in the plot:
* Put date in the subtitle.
* Because there are so many dates, you are going to only do the animation for all Fridays. So, use `wday()` to create a day of week variable and filter to all the Fridays.
* Use the `animate()` function to make the animation 200 frames instead of the default 100 and to pause for 10 frames on the end frame.
* Use `group = date` in `aes()`.
* Comment on what you see.
%>%
ggplot(aes(fill = cases_per_10000)) +
scale_fill_gradient2() +
geom_map(aes(group=date, map_id = state), map = states_map) +
expand_limits(x = states_map$long, y = states_map$lat) +
theme_map()+
labs(subtitle = "Date: {frame_along}")+
transition_reveal(date)
animate(covid19_map, nframes = 200, duration=30)
```{r eval=FALSE}
census_pop_est_2018 <- read_csv("https://www.dropbox.com/s/6txwv3b4ng7pepe/us_census_2018_state_pop_est.csv?dl=1") %>%
separate(state, into = c("dot","state"), extra = "merge") %>%
select(-dot) %>%
mutate(state = str_to_lower(state))
states_map <- map_data("state")
covid19_map <- covid19 %>%
mutate(state= tolower(state)) %>%
left_join(census_pop_est_2018) %>%
mutate(cases_per_10000 = (cases/est_pop_2018)*10000) %>%
mutate(day_of_week=wday(date)) %>%
filter(day_of_week==5) %>%
ggplot() +
scale_fill_viridis_c(option = "C")+
geom_map(aes(group=date, map_id = state, fill=cases_per_10000), map = states_map, color="white") +
expand_limits(x = states_map$long, y = states_map$lat) +
theme_map()+
labs(subtitle = "Date: {frame_time}")+
transition_time(date)
animate(covid19_map, nframes = 200,end_pause=10)
```
```{r, eval=FALSE, echo=FALSE}
anim_save("covid19_map.gif")
```
```{r, echo=FALSE}
knitr::include_graphics("covid19_map.gif")
```
From the map we can see that before October the cases per 10000 increase pretty slow, but the cases per 10000 increase dramatically afterwards.
## Your first `shiny` app (for next week!)
NOT DUE THIS WEEK! If any of you want to work ahead, this will be on next week's exercises.
8. This app will also use the COVID data. Make sure you load that data and all the libraries you need in the `app.R` file you create. Below, you will post a link to the app that you publish on shinyapps.io. You will create an app to compare states' cumulative number of COVID cases over time. The x-axis will be number of days since 20+ cases and the y-axis will be cumulative cases on the log scale (`scale_y_log10()`). We use number of days since 20+ cases on the x-axis so we can make better comparisons of the curve trajectories. You will have an input box where the user can choose which states to compare (`selectInput()`) and have a submit button to click once the user has chosen all states they're interested in comparing. The graph should display a different line for each state, with labels either on the graph or in a legend. Color can be used if needed.
## GitHub link
9. Below, provide a link to your GitHub page with this set of Weekly Exercises. Specifically, if the name of the file is 05_exercises.Rmd, provide a link to the 05_exercises.md file, which is the one that will be most readable on GitHub. If that file isn't very readable, then provide a link to your main GitHub page.
https://github.com/siwending/05_weekly_exercise/blob/master/05_exercises.md