-
Notifications
You must be signed in to change notification settings - Fork 0
/
tidycensus_tutorial.Rmd
620 lines (495 loc) · 30.2 KB
/
tidycensus_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
---
title: "tidycensus Tutorial"
author: "Alex Brasch"
date: "10/11/2020"
output:
html_document:
includes:
before_body: ./header.html
after_body: ./footer.html
code_folding: show
highlight: zenburn
self_contained: yes
theme: darkly
toc: yes
toc_depth: '2'
toc_float:
collapsed: yes
toc_float: yes
pdf_document:
toc: yes
toc_depth: '2'
word_document:
toc: yes
toc_depth: '2'
editor_options:
chunk_output_type: console
always_allow_html: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
```
```{r echo=FALSE, warning=FALSE, error=FALSE, results='hide', message=FALSE}
# Require the pacman package to easily load all necessary packages
if(!require(pacman)){install.packages("pacman");library(pacman)}
suppressPackageStartupMessages(p_load(
tidyverse,
tidycensus,
tigris,
sf,
mapview,
leaflet,
kableExtra,
extrafont))
```
# Reference & Startup
The documentation and examples within this tutorial were gleaned from the following resources:
[tidycensus](https://walkerke.github.io/tidycensus/ "tidycensus")
[tigris](https://www.rdocumentation.org/packages/tigris/versions/1.0 "tigris")
[tidyverse](https://www.tidyverse.org "tidyverse")
[Census Developers](https://www.census.gov/developers/ "Census Developers")
[Census Geography Program](https://www.census.gov/programs-surveys/geography.html "Census Geography Program")
[Leaflet for R](https://rstudio.github.io/leaflet/ "Leaflet for R")
`tidycensus` is an R package that allows users to interface with the U.S. Census Bureau’s decennial census and American Community Survey (ACS) APIs, in order to retrieve demographic and economic data for specified geographies. As noted by its author, Kyle Walker, `tidycensus` "returns tidyverse-ready data frames of selected variables, and the option to include simple feature (sf) geometries".
The main function of the decennial census is to provide counts of people for the purpose of congressional apportionment and federal funding allocation, while the primary purpose of the ACS is to measure the changing social and economic characteristics of the U.S. population, including education, housing, jobs, and more. Due to their differing purposes, variables that exist in the decennial census may not be included in the ACS and vice versa. The same is true between ACS years, due to the evolving nature of the surveys (i.e., questions may be added, removed, or revised).
The decennial census is an enumeration, meaning it aims to count the entire population of the country (at the location where each person usually lives). The decennial census asks a relatively small set of questions of people in homes and group living situations, including how many people live or stay in each home, as well as the sex, age, and race of each person.
ACS data differ from decennial census data in that ACS data are based on an annual sample of households, rather than a complete enumeration. In turn, ACS data points are estimates characterized by a margin of error (MOE). `tidycensus` will always return the estimate and MOE for any requested variables. When requesting ACS data with `tidycensus`, it is not necessary to specify the "E" or "M" suffix for a variable name. Available survey types include ACS 5-year estimates (`acs5`) or ACS 1-year estimates (`acs1`). Note that the latter is only available for geographies with populations of 65,000 and greater.
## Install and load packages
To get started, load the `tidycensus` and `tidyverse` packages. Additional packages used within this RMarkdown file include tigris, readxl, writexl, arcgisbinding, leaflet, kableExtra, janitor, and extrafont.
A Census API key is also required, which can be obtained from http://api.census.gov/data/key_signup.html. Entry of the key using the `census_api_key` function only needs to occur once (i.e., it is tied to RStudio, rather than a single R script or Markdown file).
```{r Install}
# Uncomment as-needed
# install.packages("tidyverse")
# install.packages("tidycensus")
# library(tidyverse)
# library(tidycensus)
# census_api_key("YOUR API KEY GOES HERE")
```
# Usage
The following section contains examples of using the `tidycensus` package and Census API to retrieve, prepare, reshape, and blend demographic data for analysis and visualization. To save on processing time and avoid local memory limitations, data-intensive code chunks have been commented out (e.g., retrieving large amounts of census blocks using the `tidycensus` or `tigris` packages). Readers can view the underlying code while the input data is read-in as part of the R Project's local data.
## Review Variables
```{r Load variables}
# Load the 2010 decennial census variables
DC2010_sf1 <- load_variables(year = 2010, dataset = "sf1", cache = TRUE)
# Load the 2014-2018 5-year ACS variables
ACS2018_acs5 <- load_variables(year = 2018, dataset = "acs5", cache = TRUE)
```
```{r View variable tables, class.source='fold-hide'}
# View a portion of the 2010 decennial census table
DC2010_sf1 %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
# View a portion of the 2018 ACS table
ACS2018_acs5 %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Example 1 - Decennial Census
Retrieve the 2010 decennial census total population for all U.S. states.
- Define a geography (e.g., states, counties, tracts)
- Define a single variable
```{r Example 1 - Decennial Census, results='hide', message = FALSE, warning = FALSE}
# Retrieve the 2010 decennial census total population for all U.S. states.
st_2010_dc <- tidycensus::get_decennial( # API call
geography = 'state', # Specify geography
variables = 'P001001', # Specify variable
year = 2010, # Specify year
geometry = FALSE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
```{r View Example 1 - Decennial Census, class.source='fold-hide'}
# View data
st_2010_dc %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Example 2 - Decennial Census
Retrieve 2010 decennial census variables and geometries for a select set of a specified geography.
- Define a geography and specify a subset
- Define and name multiple variables
- Retrieve geometries
```{r Example 2 - Decennial Census, message=FALSE, warning=FALSE, results='hide'}
# Retrieve 2010 decennial census variables and geometries for a select set of a specified geography.
co_2010_dc <- tidycensus::get_decennial( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
variables = c(pop_tot = 'P001001', # Specify and name multiple variables
pop_sex_m_tot = 'P012002',
pop_sex_f_tot = 'P012026'),
year = 2010, # Specify year
geometry = TRUE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
```{r View Example 2 - Decennial Census, class.source='fold-hide'}
# View data
co_2010_dc %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Example 3 - ACS
Retrieve 2018 ACS 5-year variables and geometries for a select set of a specified geography.
- Define a geography and specify a subset
- Define and name multiple variables
- Retrieve geometries, specifically the TIGER/Line shapefiles
Concerning geometries, `tidycensus` used the geographic coordinate system NAD 1983 (EPSG: 4269), which is the default for Census spatial data files. `tidycensus` uses the Census cartographic boundary shapefiles for faster processing; if you prefer the TIGER/Line shapefiles (i.e., Topologically Integrated Geographic Encoding and Referencing), set `cb = FALSE` in the function call. Per Census documentation, the cartographic boundary files are simplified representations of selected geographic areas from the Census Bureau’s Master Address File (MAF)/TIGER geographic database. These boundary files are specifically designed for small scale thematic mapping. When possible, generalization is performed with intent to maintain the hierarchical relationships among geographies and to maintain the alignment of geographies within a file set for a given year. To improve the appearance of shapes, areas are represented with fewer vertices than detailed TIGER/Line equivalents. Some small holes or discontiguous parts of areas are not included in generalized files. Generalized boundary files are clipped to a simplified version of the U.S. outline. As a result, some off-shore areas may be excluded from the generalized files. Consult this [TIGER Data Products Guide](https://www.census.gov/programs-surveys/geography/guidance/tiger-data-products-guide.html "TIGER Data Products Guide") to determine which file type is best for your purposes.
```{r Example 3 - ACS, results='hide', message = FALSE, warning = FALSE}
# Retrieve 2018 ACS 5-year variables and geometries for a select set of a specified geography.
co_2018_acs5 <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
hh_foodst_total = 'B22003_001',
hh_foodst_rec = 'B22003_002'),
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = TRUE, # Include or omit spatial geometry
cb = FALSE, # Specify cartographic boundary files or TIGER/Line shapefiles
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
```{r View Example 3 - ACS, class.source='fold-hide'}
# View data
co_2018_acs5 %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
Concerning the structure of the data frame, a wide format contains a single row for each observation with many columns representing all variables (human-readable), while a long/tidy format contains many rows per observation (assuming more than one variable) with name-value pairs for each variable and associated value (machine-readable. For more details, see Hadley Wickham's seminal paper [Tidy Data](https://vita.had.co.nz/papers/tidy-data.pdf "Tidy Data").
## Example 4 - ACS
Retrieve 2015 ACS 1-year variables and geometries for a select set of a specified geography.
- Define a geography and specify a subset
- Define and name multiple variables
- Retrieve geometries, specifically the cartographic boundaries
- Output in long/tidy format
```{r Example 4 - ACS, results='hide', message = FALSE, warning = FALSE}
# Retrieve 2015 ACS 1-year variables and geometries for a select set of a specified geography.
co_2015_acs1 <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
hh_foodst_total = 'B22003_001',
hh_foodst_rec = 'B22003_002'),
year = 2015, # Specify year
survey = 'acs1', # Specify survey type
geometry = T, # Include or omit spatial geometry
# By default, cartographic boundary files are used
output = 'tidy', # Set to wide or tidy/long format
cache_table = T # Cache the table so it can be called quicker in the future, or not to save memory
)
```
Compare the structure of the data sets.
```{r View Example 4 - ACS, class.source='fold-hide'}
# View the 2018 ACS 5-year table in wide format
co_2018_acs5 %>% st_set_geometry(NULL) %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
# View the 2015 ACS 1-year table in long/tidy format
co_2015_acs1 %>% st_set_geometry(NULL) %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Example 5 - ACS
Retrieve 2018 ACS 5-year variables and geometries for a geography within a larger geography (e.g., all counties within a state).
- Create a vector of all members of a specified geography (e.g., all counties in Oregon)
- Define and name multiple variables
- Retrieve geometries, specifically the cartographic boundaries
```{r Example 5 - ACS, results='hide', message = FALSE, warning = FALSE}
# Retrieve 2018 ACS 5-year variables and geometries for all members of a specified geography within a larger geography (e.g., all counties within a state).
county_vector <- tidycensus::fips_codes %>% # Retrieve table of all state/county names and associated FIPS codes
filter(state_name == "Oregon") %>% # Specify state
select(county_code, county) # Maintain only the county code and name variables
# Retrieve variables and geometries for all counties within the vector
co_OR_2018_acs5 <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = county_vector$county_code, # Vector of all counties
variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
hh_foodst_total = 'B22003_001',
hh_foodst_rec = 'B22003_002'),
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = TRUE, # Include or omit spatial geometry
cb = TRUE, # Specify cartographic boundary files or TIGER/Line shapefiles
output = 'tidy', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
```{r View Example 5 - ACS, class.source='fold-hide'}
# View data
co_OR_2018_acs5 %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Example 6 - ACS
Retrieve a complete table of 2018 ACS 5-year variables and geometries for a select set of a specified geography.
- Define a geography and specify a subset
- Define a table of variables
- Retrieve geometries, specifically the TIGER/Line shapefiles
```{r Example 6 - ACS, results='hide', message = FALSE, warning = FALSE}
# Retrieve a complete table of 2018 ACS 5-year variables and geometries for a select set of a specified geography.
co_2018_acs5_income <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
table = "B19001",
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = TRUE, # Include or omit spatial geometry
cb = FALSE, # Specify cartographic boundary files or TIGER/Line shapefiles
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
## Example 7 - ACS
At a time in 2019, retrieving all decennial census block group data for a specified state or county generates an [error](https://github.com/walkerke/tidycensus/issues/193 "error"). This has since been resolved, but it provides a good example of how smaller nested geographies can be aggregated to larger geographies (e.g., blocks to block groups).
**This works now...**
Retrieve block group data for an entire county.
```{r Example 7 - ACS chunk 1, results='hide', message = FALSE, warning = FALSE}
bg_2010_dc <- tidycensus::get_decennial( # API call
geography = 'block group', # Specify geography for which to retrieve data
state = 'WA', # Specify state abbreviation
county = 'King', # Specify county
variables = 'P001001', # Specify variable
year = 2010, # Specify year
geometry = FALSE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
**But if it didn't...**
Retrieve block data for an entire county.
```{r Example 7 - ACS chunk 2, results='hide', message = FALSE, warning = FALSE}
# Retrieve decennial census variables for all blocks
bl_2010_dc <- tidycensus::get_decennial(
geography = 'block', # Specify statistical area for which to retrieve data
state = 'WA', # Specify state abbreviation
county = 'King', # Specify county
variables = 'P001001',
year = 2010, # Specify year
geometry = FALSE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
```
Aggregate data to block groups via creation of the block group GEOID by removing the last 3 characters in the block GEOID and grouping by/summarizing to block groups.
```{r Example 7 - ACS chunk 3, results='hide', message = FALSE, warning = FALSE}
bl_2010_dc_to_bg <- bl_2010_dc %>%
mutate(GEOID_BG = stringr::str_sub(GEOID, 1, -4)) %>% # Create BG GEOID
select(GEOID_BG, 1:length(.), -GEOID, -NAME) %>% # Reorder/remove columns
rename(GEOID = GEOID_BG) %>% # Rename GEOID
gather(variable, est, 2:length(.)) %>% # Reshape from wide to long form by creating name-value pairs for all variables
group_by(GEOID, variable) %>% # Group by GEOID and variable names
summarize(est = sum(est, na.rm = T)) %>% # Sum values per group
ungroup() %>% # Ungroup
spread(variable, est) # Reshape from long to wide form by extending values into their own columns
```
Compare the data sets.
```{r Example 7 - ACS chunk 4, }
bgbl_2010_dc <- bg_2010_dc %>%
inner_join(bl_2010_dc_to_bg, by = 'GEOID')
```
```{r View Example 7, class.source='fold-hide'}
# View the comparison data set
bgbl_2010_dc %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Example 8 - ACS and tigris
In some cases, you may want retrieve the tabular and spatial data separately (to avoid very large data sets during analysis) and join the data sets after analysis. In those cases, `tidycensus` can be used in combination with `tigris`, which is an R package that allows users to directly download and use TIGER/Line shapefiles from the US Census Bureau.
Retrieve geometries for a specified geography for a geography within a larger geography (e.g., all tracts within a county).
- Create a vector of variables
- Create a vector of geographies
- Use `tidyensus` to retrieve tabular data
- Use `tigris` to retrieve spatial geometries
- Join the tabular and spatial data
Retrieve variables for a specified geography via the `tidycensus` package.
```{r Example 8 - ACS and tigris, results='hide', message = FALSE, warning = FALSE}
# Create a vector of variables for use in the `tidycensus` call.
DC2010_sf1_var <- c(pop_sex_total = 'P012001',
pop_sex_m_tot = 'P012002',
pop_sex_m_00_04 = 'P012003',
pop_sex_m_05_09 = 'P012004',
pop_sex_m_10_14 = 'P012005',
pop_sex_m_15_17 = 'P012006',
pop_sex_m_18_19 = 'P012007',
pop_sex_f_tot = 'P012026',
pop_sex_f_00_04 = 'P012027',
pop_sex_f_05_09 = 'P012028',
pop_sex_f_10_14 = 'P012029',
pop_sex_f_15_17 = 'P012030',
pop_sex_f_18_19 = 'P012031')
# Retrieve variables for specified geography
tr_2010_dc <- tidycensus::get_decennial( # API call
geography = 'tract', # Specify geography
variables = DC2010_sf1_var, # Vector of variables
state = 'OR', # Specify state abbreviation
county = county_vector$county_code, # Vector of all counties
year = 2010, # Specify year
geometry = F, # Include or omit spatial geometry
output = 'wide', # Set to wide or long/tidy format
cache_table = T # Cache the table so it can be called quicker in the future
)
```
Retrieve geometries for a specified geography via the `tigris` package.
By default `tigris` retrieves the most recent vintage of a data set, so specify a different year if-needed. The default coordinate reference system (CRS) is NAD83 (EPSG 4269 https://spatialreference.org/ref/epsg/nad83/). The default type of geometry is TIGER/Line file. If cb is set to TRUE, `tigris` will download a generalized (1:500k) set of geometries.
```{r tigris call, results='hide', message = FALSE, warning = FALSE}
# A call of gc causes a garbage collection to take place. The primary purpose of calling gc is for the report on memory usage. Use this before and after a data-heavy processing task.
gc()
tr_2010_dc_shp <- tigris::tracts("OR", year = 2010, cb = FALSE) # Specify state abbreviation, year, and geometry type
gc()
```
Join the variables to the geometries.
```{r tigris tidycensus join, results='hide', message = FALSE, warning = FALSE}
tr_2010_dc_geo <- tr_2010_dc_shp %>%
inner_join(tr_2010_dc, by = c("GEOID10" = "GEOID")) # Join based on GEOID
class(tr_2010_dc_geo) # Note the object's class
st_crs(tr_2010_dc_geo) # Note the object's coordinate system
```
Note that the resulting object's class is dependent on the join order. The left side's class takes priority; therefore, in the above, the attributes (right side) are being joined to the geometries (left side), so the resulting object class is `sf` (simple feature). If the order is flipped (below) and the geometries (right side) are joined to the attributes (let side), the object class is not `sf`. To make it so, add `%>% st_as_sf()`
```{r tigris tidycensus join reverse, results='hide', message = FALSE, warning = FALSE}
tr_2010_dc_geo_REVERSE1 <- tr_2010_dc %>%
inner_join(tr_2010_dc_shp, by = c("GEOID" = "GEOID10"))
class(tr_2010_dc_geo_REVERSE1)
st_crs(tr_2010_dc_geo_REVERSE1)
tr_2010_dc_geo_REVERSE2 <- tr_2010_dc %>%
inner_join(tr_2010_dc_shp, by = c("GEOID" = "GEOID10")) %>%
st_as_sf()
class(tr_2010_dc_geo_REVERSE2)
st_crs(tr_2010_dc_geo_REVERSE2)
```
# Visualization
The following visualizations are created using the `ggplot2` package (which is part of the `tidyverse`), `mapview` package, and `leaflet` for R package.
## Dot Plot
Create a plot of 2010 decennial census state populations.
```{r Dot Plot, results='hide', message = FALSE, warning = FALSE}
st_2010_dc %>%
ggplot(aes(x = P001001, y = reorder(NAME, P001001,))) + # Set the x-axis to the variable and the y-axis to the observation (sort in descending order)
ggtitle("2010 State Population") + # Add a title
xlab("Total Population") + # Name the x-axis
ylab("State Name") + # Name the y-axis
geom_point() # Add point geometry
```
## Dot Plot MOEs
Create a plot of 2014-2018 ACS 5-year estimates and MOEs for all Oregon counties.
```{r Dot Plot MOEs, results='hide', message = FALSE, warning = FALSE}
co_OR_2018_acs5 %>%
mutate(NAME = gsub(" County, Oregon", "", NAME)) %>% # Rename unnecessary portion of county name
filter(variable == 'hh_medinc_total') %>% # Query a single variable
ggplot(aes(x = estimate, y = reorder(NAME, estimate))) + # Set the x-axis to the variable and the y-axis to the observation (and sort)
geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe)) + # Create error bars
geom_point(color = "red", size = 3) + # Symbolize the points
labs(title = "Median Household Income by County", # Add title
subtitle = "2014-2018 (5-year) American Community Survey", # Add subtitle
y = "County Name", # Label y-axis
x = "ACS estimate (bars represent margin of error)") # Label x-axis
```
## Choropleth Map
Create a choropleth map of a single variable for a geography within a single county.
```{r Choropleth Map, results='hide', message = FALSE, warning = FALSE}
# Retrieve variables and geographies for census block groups in Multnomah County, Oregon
bg_Mult_2018_acs5 <- tidycensus::get_acs( # API call
geography = 'block group', # Specify geography
state = 'OR', # Specify state name
county = c('Multnomah'), # Specify county name
variables = c(hh_medinc_total = 'B19013_001'),
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = T, # Include or omit spatial geometry
cb = FALSE, # Use TIGER
output = 'tidy', # Set to wide or tidy/long format
cache_table = T # Cache the table so it can be called quicker in the future, or not to save memory
)
# Create choropleth map
bg_Mult_2018_acs5 %>%
ggplot(aes(fill = estimate)) + # Specify the aesthetic as fill based on the estimate values
geom_sf(color = NA) + # Specify outline color
coord_sf(crs = 26910) + # Specify coordinate system/projection (e.g., UTM10N)
ggtitle("Multnomah County Median Household Income", subtitle = "2014-2018 (5-year) ACS Data") + # Specify title and subtitle
scale_fill_viridis_c(option = "magma") # Specify fill symbology
```
## Faceted Choropleth Map
Created faceted choropleths maps of multiple variables across geographies within a single county.
As mentioned by Kyle Walker in his `tidycensus` tutorial, "one of the most powerful features of `ggplot2` is its support for small multiples, which works very well with the tidy data format returned by `tidycensus`. Many Census and ACS variables return _counts_, which are generally inappropriate for choropleth mapping. In turn, `get_decennial` and `get_acs` have an optional argument, `summary_var`, that can work as a multi-group denominator when appropriate." For example, view the racial/ethnic population distribution within a given county.
```{r Faceted Choropleth Map, results='hide', message = FALSE, warning = FALSE}
# Create a vector of variables
DC2010_sf1_race_vars <- c(White = "P005003",
Black = "P005004",
Asian = "P005006",
Hispanic = "P004003")
# Retrieve variables and geographies for census block groups in Multnomah County, Oregon
bg_Mult_2010_dc <- tidycensus::get_decennial( # API call
geography = 'tract', # Specify geography
state = 'OR', # Specify state name
county = c('Multnomah'), # Specify county name
variables = DC2010_sf1_race_vars, # Specify the vector of variables
year = 2010, # Specify year
geometry = T, # Include or omit spatial geometry
output = 'tidy', # Set to wide or tidy/long format
summary_var = "P001001" # Specify summary variable (e.g., total population)
)
bg_Mult_2010_dc %>%
mutate(pct = 100 * (value / summary_value)) %>% # Calculate the race/ethnicity proportion of the total population
ggplot(aes(fill = pct)) + # Specify the aesthetic as fill based on the proportion values
facet_wrap(~variable) + # Facet wrap on race/ethnicity
geom_sf(color = NA) + # Specify outline color
coord_sf(crs = 26910) + # Specify coordinate system/projection (e.g., UTM10N)
ggtitle("Multnomah County", subtitle = "Population by Race/Ethnicity") + # Specify title and subtitle
scale_fill_viridis_c() # Specify fill symbology
```
## mapview
Create an unformatted, interactive map using the `mapview` package.
```{r mapview, message = FALSE, warning = FALSE}
mapview(co_2018_acs5)
```
## leaflet
Create an interactive map using `leaflet`.
```{r leaflet, message = FALSE, warning = FALSE}
# Use leaflet's color ramp creator
medinc_colors <- leaflet::colorBin(
palette = 'viridis', # This will accept any ColorBrewer color pallette
domain = bg_Mult_2018_acs5$estimate, # Specify the variable
)
leaflet(bg_Mult_2018_acs5) %>%
addProviderTiles(providers$OpenStreetMap, group = 'Open Street Map') %>%
addProviderTiles(providers$Stamen.Toner, group = 'Stamen Toner') %>%
addProviderTiles(providers$CartoDB.Positron, group = 'Carto DB') %>%
addProviderTiles(providers$Esri.NatGeoWorldMap, group = 'Esri NatGeo') %>%
addPolygons(
weight = 1, # Specify outline width
color = 'black', # Specify outline color
fillColor = medinc_colors(bg_Mult_2018_acs5$estimate), # Use created color ramp
fillOpacity = 0.7, # Specify fill opacity
group = "Multnomah Co. Block Groups (2018 ACS 5-year)", # Specify legend item name
popup = paste0( # Create pop-up
"<b>2018 ACS 5-year Block Group</b> ",
paste0("<br><b>Median HH Income Estimate</b>: ", prettyNum(bg_Mult_2018_acs5$estimate, ','), "<br><b>Median HH Income MOE</b>: ", prettyNum(bg_Mult_2018_acs5$moe, ','))
)
) %>%
addLegend("bottomright",
pal = medinc_colors,
values = bg_Mult_2018_acs5$estimate,
title = "Median Household Income",
opacity = 1) %>%
# Layers control
addLayersControl(
overlayGroups = c("Multnomah Co. Block Groups (2018 ACS 5-year)"),
baseGroups = c('Open Street Map', 'Stamen Toner', 'Carto DB', 'Esri NatGeo'),
options = layersControlOptions(collapsed = FALSE)
)
```
# Output & Input
Write-out data to various file formats.
```{r Output, warning=FALSE, error=FALSE, message=FALSE, results='hide'}
# readr::write_csv(DC2010_sf1, "./Data/DC2010_sf1.csv")
# library(writexl)
# write_xlsx(st_2010_dc, "./Data/st_2010_dc.xlsx")
# saveRDS(bg_Mult_2018_acs5,"./Data/bg_Mult_2018_acs5.rds")
# sf::st_write(bg_Mult_2010_dc,"./Data/bg_Mult_2010_dc.shp", delete_dsn = TRUE)
# library(arcgisbinding)
# arc.check_product() # Initialize connection to ArcGIS
# arc.write(path = "./Data/Data.gdb/co_OR_2018_acs5", data = co_OR_2018_acs5, shape_info = list(type = 'Polygon', WKT = 'GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'), overwrite = TRUE)
```
Read-in previously retrieved data in various file formats to avoid duplicative calls to Census APIs.
```{r Input, warning=FALSE, error=FALSE, message=FALSE, results='hide'}
# readin_csv <- readr::read_csv("./Data/DC2010_sf1.csv")
# library(readxl)
# readin_xlsx <- read_excel("./Data/st_2010_dc.xlsx", sheet = "sheet_name")
# readin_rds <- readRDS("./Data/bg_Mult_2018_acs5.rds")
# readin_shp <- sf::st_read("./Data/bg_Mult_2010_dc.shp")
# library(arcgisbinding)
# arc.check_product() # Initialize connection to ArcGIS
# arc.open reads in the feature class as an object with "formal class arc.feature_impl"
# arc.select converts the data set into a "arc.data" / "data.frame" object
# arc.data2sf converts the data set into an equivalent sf object
# readin_fc <- arc.open("./Data/Data.gdb/co_OR_2018_acs5") %>% arc.select() %>% arc.data2sf()
```