-
Notifications
You must be signed in to change notification settings - Fork 5
/
02_add_data.Rmd
138 lines (104 loc) · 3.99 KB
/
02_add_data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "Adding data"
output:
html_document:
df_print: tibble
toc: yes
toc_float: yes
---
We have a (spatial) dataframe with the blank map-information, now we want to add our data-of-interest. Two steps:
1. Read in data-of-interst into R.
2. Join data-of-interest with the spatial dataframe.
```{r}
library(BelgiumMaps.StatBel)
library(tmap)
library(tmaptools)
library(sf)
library(dplyr)
library(readr)
library(readxl)
library(haven)
```
# Read in data-of-interest
Crucial: have a variable/column in your that contains the appropriate spatial identifier. E.g. NIS-code, NUTS-code, etc.
How to read in your data depends on the format, three recommended R-packages should cover mosts posibilities:
* [readr](https://readr.tidyverse.org/): read/write plain text formats such as CSV, TXT, etc.
* [readxl](https://readxl.tidyverse.org/): read in Excel-files (.xlsx, .xls).
* [haven](https://haven.tidyverse.org/): read/write datasets from SAS, SPSS, Stata.
```{r}
# read in Belfius socio-economic municipality typology
data_muni <- read_excel('data/muni_typology.xlsx') # Excel
data_muni <- read_sas('data/muni_typology.sas7bdat') # SAS
data_muni <- read_dta('data/muni_typology.dta') # Stata
data_muni <- read_sav('data/muni_typology.sav') # SPSS
# CSV
data_muni <- read_csv(
file = 'data/muni_typology.csv',
col_types = cols(.default = col_character())) # explicit: all strings
```
# Join spatial data and data-of interest
Recommended options:
1. general dataframe-[join functions from dplyr](https://stat545.com/bit001_dplyr-cheatsheet.html): `left_join()`.
2. map-specific [helper function from tmaptools](https://rdrr.io/cran/tmaptools/man/append_data.html): `append_data()`.
```{r, eval=FALSE}
# option 1 (dplyr):
library(dplyr)
data <- left_join(map_data, data_of_interest, by = "identifier")
data <- left_join(map_data, data_of_interest, by = c("map_identifier" = "data_identifier"))
```
```{r, eval = FALSE}
# option 2 (maptools):
library(tmaptools)
data <- append_data(map_data, data_of_interest,
key.shp = "map_identifier", key.data = "data_identifier")
```
# Examples
## Ex. Municipal socio-economic typology
```{r}
# load map data
data("BE_ADMIN_MUNTY")
map_muni <- st_as_sf(BE_ADMIN_MUNTY)
# load data-of-interest
data_muni <- read_csv('data/muni_typology.csv', col_types = cols(.default = col_character()))
# join with left_join()
muni <- left_join(map_muni, data_muni, by = c('CD_MUNTY_REFNIS' = 'gemeente_nis_code'))
```
```{r}
qtm(muni, fill = 'hoofdcluster_lbl', fill.title = 'Socio-economic cluster')
```
## Ex. Part-time workers in the EU
```{r}
# Read Eurostat data on percentage of part-time employment
worktime_data <- read_excel('data/eurostat_workingtime_2017.xlsx')
```
```{r, message=FALSE, warning=FALSE}
# alternatively, fetch this data directly:
library(eurostat)
worktime_data <- get_eurostat('lfsi_pt_a') %>%
filter(age == 'Y20-64',
worktime == 'TEMP',
sex == 'T',
time == '2017-01-01',
unit == 'PC_EMP')
```
```{r, message=FALSE, warning=FALSE}
# load EU NUTS0 (country) map data directly from Eurostat
map_data <- get_eurostat_geospatial(
resolution = "60", # detail
nuts_level = "0") # NUTS 0-3
# crop map data to "mainland" EU
map_data <- st_crop(map_data, c(xmin=-10, xmax=45, ymin=36, ymax=71))
```
```{r, message=FALSE, warning=FALSE}
# join map and workingtime data in one dataframe
worktime <- left_join(map_data, worktime_data, by = c('CNTR_CODE' = 'geo'))
```
```{r}
qtm(worktime, fill = 'values', fill.title = 'Percentage part-time')
```
**Tip**: Use the R [countrycode package](https://github.com/vincentarelbundock/countrycode#r-countrycode) to convert names, codes, etc. before merging. Contains 30+ different country coding schemes, and to 600+ variants of country names in different languages and formats.
```{r}
library(countrycode)
countrycode(worktime$NUTS_ID, 'eurostat', 'ecb') # official ECB code
countrycode(worktime$NUTS_ID, 'eurostat', 'un.name.fr') # full UN name in FR
```