---
title: "Transform Data"
output: html_document
---
<!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
```{r setup}
library(tidyverse)
library(gapminder)
# Toy dataset to use
pollution <- tribble(
~city, ~size, ~amount,
"New York", "large", 23,
"New York", "small", 14,
"London", "large", 22,
"London", "small", 16,
"Beijing", "large", 121,
"Beijing", "small", 56
)
```
## gapminder
```{r}
gapminder
```
## Your Turn 1
* `filter()` selects rows
* Logical tests decide which rows are kept
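
As a brief illustration of what a logical test does inside `filter()`, the sketch below runs a few comparisons on `gapminder`. The countries, years, and object names are arbitrary choices for illustration and are not part of the exercises.

```{r}
library(dplyr)
library(gapminder)

# A logical test returns TRUE or FALSE for every row;
# filter() keeps only the rows where the test is TRUE.

# == tests for equality
japan <- filter(gapminder, country == "Japan")

# <, <=, >, >= compare numbers
early <- filter(gapminder, year < 1960)

# %in% tests membership in a set of values
nordic <- filter(gapminder, country %in% c("Norway", "Sweden"))

japan
```

Note that equality is tested with `==`; a single `=` inside `filter()` is treated as a named argument and raises an error.
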
Use logical operators to change the code below so that it shows:
The data for Canada
```{r}
filter(gapminder, country == "New Zealand")
```
All data for countries in Oceania
```{r}
filter(gapminder, country == "New Zealand")
```
Rows where the life expectancy is greater than 82
```{r}
filter(gapminder, country == "New Zealand")
```
## Your Turn 2
Use Boolean operators to alter the code below to return only the rows that contain:
* Canada before 1970
```{r}
filter(gapminder, country == "New Zealand", year > 2000)
```
* Countries where life expectancy in 2007 is below 50
```{r}
filter(gapminder, country == "New Zealand", year > 2000)
```
* Countries where life expectancy in 2007 is below 50 and that are not in Africa.
```{r}
filter(gapminder, country == "New Zealand", year > 2000)
```
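
For reference, the Boolean operators that `filter()` understands can be sketched as follows. This is a minimal illustration on `gapminder` with conditions chosen arbitrarily (they are not the answers to the tasks above); the object names are made up for this example.

```{r}
library(dplyr)
library(gapminder)

# Comma-separated conditions are combined with "and"; & does the same
both <- filter(gapminder, continent == "Oceania" & year == 1952)

# | means "or": rows from either year are kept
either <- filter(gapminder, year == 1952 | year == 2007)

# ! negates a test: everything except Oceania
not_oceania <- filter(gapminder, !(continent == "Oceania"))

both
```
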
## Your Turn 3
Alter the code to:
* Add an `africa` column, which contains `TRUE` if the country is on the African continent.
* Add a rank_pop column to rank each row in gapminder from largest pop to smallest pop.
```{r}
mutate(gapminder)
```
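
A minimal sketch of how `mutate()` adds columns, using different columns from the ones asked for above. The new column names (`gdp`, `recent`, `rank_life`) are hypothetical and chosen only for this illustration.

```{r}
library(dplyr)
library(gapminder)

with_new_cols <- mutate(
  gapminder,
  # arithmetic on existing columns: total GDP
  gdp = gdpPercap * pop,
  # a logical test gives a TRUE/FALSE column
  recent = year >= 2000,
  # min_rank() ranks from smallest to largest; desc() reverses the order
  rank_life = min_rank(desc(lifeExp))
)

select(with_new_cols, country, year, gdp, recent, rank_life)
```

`mutate()` returns a new table and leaves `gapminder` unchanged, so the result has to be assigned to a name if it is needed later.
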
## Your Turn 4
Use `summarise()` to compute three statistics about the data:
* The first (minimum) year in the dataset
* The last (maximum) year in the dataset
* The number of countries represented in the data (Hint: use the cheatsheet)
```{r}
gapminder
```
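
The general shape of a `summarise()` call can be sketched as below. The statistics here are deliberately different from the three requested above, and the output column names are made up for the example.

```{r}
library(dplyr)
library(gapminder)

# summarise() collapses the whole table into a single row of statistics
overall <- summarise(
  gapminder,
  n_rows = n(),               # number of rows
  mean_life = mean(lifeExp),  # average life expectancy over all rows
  max_gdp = max(gdpPercap)    # largest GDP per capita in the data
)

overall
```
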
## Your Turn 5
Extract the rows where continent == "Africa" and year == 2007.
Then use `summarise()` and summary functions to find:
1. The number of unique countries
2. The median life expectancy
```{r}
gapminder
```
```{r}
gapminder
```
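
Extracting rows and then summarising them is a two-step sequence, which is where `%>%` becomes useful. The sketch below shows the same computation written with the pipe and as nested calls; the continent, year, and summary columns are arbitrary choices, not the answer to this exercise.

```{r}
library(dplyr)
library(gapminder)

# x %>% f(y) is the same as f(x, y), so each step receives
# the result of the previous step as its first argument
asia_1952 <- gapminder %>%
  filter(continent == "Asia", year == 1952) %>%
  summarise(n_rows = n(), mean_pop = mean(pop))

# The same computation written as nested calls
nested <- summarise(
  filter(gapminder, continent == "Asia", year == 1952),
  n_rows = n(), mean_pop = mean(pop)
)

asia_1952
```

The piped version reads top to bottom in the order the steps happen, while the nested version has to be read from the inside out.
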
## Your Turn 6
Find the median life expectancy by continent
```{r}
gapminder
```
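
A minimal sketch of a groupwise summary, grouping by `year` rather than by continent so that the exercise above stays open. The output column names are hypothetical.

```{r}
library(dplyr)
library(gapminder)

# group_by() marks the groups; summarise() then returns one row per group
by_year <- gapminder %>%
  group_by(year) %>%
  summarise(mean_life = mean(lifeExp), n_rows = n())

by_year
```
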
## Your Turn 7
Brainstorm with your neighbor the sequence of operations to find: the country with the biggest jump in life expectancy (between any two consecutive records) for each continent.
## Your Turn 8
Find the country with the biggest jump in life expectancy (between any two consecutive records) for each continent.
```{r}
```
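
One possible building block for "change between consecutive records" is `lag()` inside a grouped `mutate()`. The sketch below uses a made-up toy table (`toy`, with hypothetical columns `group`, `step`, and `value`) rather than `gapminder`, so it shows the technique without solving the exercise; other approaches would work too.

```{r}
library(dplyr)

# Hypothetical toy data: two groups with three ordered records each
toy <- tibble(
  group = c("a", "a", "a", "b", "b", "b"),
  step  = c(1, 2, 3, 1, 2, 3),
  value = c(10, 12, 11, 5, 9, 20)
)

jumps <- toy %>%
  group_by(group) %>%
  arrange(step, .by_group = TRUE) %>%
  # lag() shifts values down by one within each group,
  # so value - lag(value) is the change since the previous record
  mutate(change = value - lag(value)) %>%
  # keep the row with the largest change in each group
  filter(change == max(change, na.rm = TRUE))

jumps
```

The first record in each group has no previous record, so its `change` is `NA`; `na.rm = TRUE` keeps those values from turning the group maximum into `NA`.
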
***
# Takeaways
* Extract cases with `filter()`
* Make new variables with `mutate()`
* Make tables of summaries with `summarise()`
* Do groupwise operations with `group_by()`
* Connect operations with `%>%`
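
The verbs listed above can be chained into a single pipeline. The sketch below is one illustrative combination on `gapminder`; the question it answers (total GDP per continent in 2007) and the names `gdp` and `total_gdp` are assumptions made for this example.

```{r}
library(dplyr)
library(gapminder)

gdp_2007 <- gapminder %>%
  filter(year == 2007) %>%               # extract cases
  mutate(gdp = gdpPercap * pop) %>%      # make a new variable
  group_by(continent) %>%                # set up groupwise operations
  summarise(total_gdp = sum(gdp))        # one summary row per continent

gdp_2007
```
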