---
title: "R Notebook - Module 1 Homework 1"
output:
  html_document:
    df_print: paged
---
Please follow the instructions on the website linked below to install R first and THEN RStudio. You need both. You do not need to install the SDSFoundations package.
There are instructions for both Mac and Windows users.
[Instructions](https://courses.edx.org/courses/UTAustinX/UT.7.01x/3T2014/56c5437b88fa43cf828bff5371c6a924/)
When you are done, open RStudio and you can start your homework.
# Module 1 Unit 1 Homework
Note: because you are running this on your personal computer and NOT in the ADRF,
you will need to install the packages beforehand. This applies to Module 1.
```{r, eval=FALSE}
install.packages(c("tidyverse", "car"))
```
When prompted with the following question, answer `Yes`:
`Do you want to install from sources the package which needs compilation? (Yes/no/cancel) : Yes`
```{r}
library(tidyverse)
library(car)
```
In this workbook, we will be using the 2019 American Community Survey (ACS) Public Use Microdata Sample (PUMS). These are public-use data sets containing information about responses to the Census Bureau's American Community Survey.
Information about the ACS project can be found at [https://www.census.gov/programs-surveys/acs](https://www.census.gov/programs-surveys/acs).
We will be using the ACS PUMS dataset for Texas and California in our examples in this workbook.
You can find more information about the ACS datasets [here](https://www.census.gov/programs-surveys/acs/microdata/access.2019.html) and the Codebook for the data [here](https://raw.githubusercontent.com/coreysparks/data/master/pums_vars.csv).
## Import census data
```{r}
census <- read_csv(url("https://raw.githubusercontent.com/coreysparks/r_courses/master/pums_tx_ca_2019.csv"), show_col_types = FALSE)
```
Please review the website with [videos on module 1](https://ada.coleridgeinitiative.org/r-1).
In R Unit 1: Introduction to R and Data Frames, the functions `glimpse`, `head`, and `spec` are used to characterize the features and dimensions of a dataframe.
Use the `glimpse`, `head`, and `spec` functions to answer the following questions about the census dataframe we generated above:
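To see what each of these three functions reports before applying them to the census data, here is a minimal sketch on a small made-up table. The data and the column names (`city`, `state`, `population`) are hypothetical and are not part of the census file. The sketch assumes readr 2.0 or later, where `I()` lets `read_csv` parse a literal string; `spec` is used on a table read by readr because that is where the column specification comes from.

```r
library(readr)
library(dplyr)

# Hypothetical toy data parsed from a literal string (NOT the census file)
toy <- read_csv(
  I("city,state,population\nAustin,TX,950000\nDallas,TX,1300000\nFresno,CA,540000\n"),
  show_col_types = FALSE
)

glimpse(toy)  # number of rows and columns, plus each column's type
head(toy, 2)  # the first 2 rows
spec(toy)     # the column specification readr used when parsing
```

In the `spec` output, columns parsed as text are listed with `col_character()` and columns parsed as numbers with `col_double()`.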
## Module 1 Checkpoint 1 Questions
### QUESTION 1
Use the `glimpse` function to determine the number of columns and rows in the census dataframe.
```{r}
```
### QUESTION 2
Use the `spec` function to identify 3 character and 3 numeric variables in the census dataframe. In the output, `col_character` marks a character variable and `col_double` marks a numeric variable.
```{r}
```
### QUESTION 3
Use the `head` function to display the top 6 rows of the census dataframe.
You will likely use this feature the most as you proceed with the analysis of your own data.
```{r}
```
## Module 1, Checkpoint 2 Questions -- SUBSETTING DATA
Subsetting a dataset is very important. For your project, it would let you create a dataset that contains only HBCU graduates, rather than graduates from all types of institutions. This is useful because smaller datasets take less time to analyze, and the datasets you will be working with are VERY large.
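The general pattern for subsetting can be sketched on a small made-up table. This is only an illustration, assuming dplyr 1.0 or later: the data and the column names (`id`, `group`, `score`, `note`) are hypothetical and do not come from the census file.

```r
library(dplyr)

# Hypothetical toy data; these column names are NOT from the census file
toy <- tibble(
  id    = 1:6,
  group = c("a", "b", "a", "b", "a", "b"),
  score = c(10, 25, 31, 47, 52, 68),
  note  = c("u", "v", "w", "x", "y", "z")
)

# Keep columns by position, then keep the first few rows
first_cols_rows <- toy %>%
  select(1:2) %>%
  slice_head(n = 3)

# Keep rows that satisfy two conditions at once
group_a_low <- toy %>%
  filter(group == "a", score < 50)

# Keep rows whose value falls in a closed range (both ends included)
mid_scores <- toy %>%
  filter(between(score, 25, 52))

dim(group_a_low)  # rows and columns of the subset
```

`select` works on columns and `filter` works on rows, so the two can be combined in either order. `dim` is one quick way to check how much of the original table is left.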
### QUESTION 4
Indicate the line of code you would use to display ONLY the first three (3) variables and ten (10) rows of the census dataframe.
```{r}
```
### QUESTION 5
Indicate the line of code you would use to create a dataframe that contains only females (variable = SEX_label: Female = 2 and Male = 1) under the age (variable = AGEP) of 25.
How many rows and columns are in this dataframe?
```{r}
```
### QUESTION 6
Show the line of code you would use to create a dataframe that contains people between 18 and 65 years old.
```{r}
```
## Module 1, Checkpoint 3A Questions -- DESCRIBING DATA: SUMMARY STATISTICS
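The questions in this checkpoint rely on `summary` for a single variable and on `group_by` with `summarize` for counts per group. A minimal sketch on a small made-up table follows; the data and the column names (`group`, `score`) are hypothetical and are not from the census file.

```r
library(dplyr)

# Hypothetical toy data; these column names are NOT from the census file
toy <- tibble(
  group = c("a", "b", "a", "c", "b", "a"),
  score = c(10, 20, 30, 40, 50, 60)
)

# summary() reports the minimum, quartiles, median, mean, and maximum
score_summary <- summary(toy$score)
score_summary

# One row per group, with the number of rows in that group
group_counts <- toy %>%
  group_by(group) %>%
  summarize(n = n())
group_counts
```

The mean appears in the `Mean` entry of the `summary` output, and `n()` counts the rows inside each group created by `group_by`.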
### QUESTION 7
Use the `summary` function to determine the mean age of people in the census dataset.
```{r}
```
### QUESTION 8
Use the `group_by` and `summarize` functions to determine the number of individuals in the census dataframe by race. How many people in the census dataframe were of two (2) or more races?
```{r}
```
## Module 1, Checkpoint 3B Questions -- Binning & Grouping Data
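Binning turns a numeric variable into a small set of labeled groups that can then be counted. One possible approach, sketched here with `case_when` on a small made-up table, is shown below. The data, the column names (`score`, `band`), and the cut points are hypothetical and are not the ones asked for in the question.

```r
library(dplyr)

# Hypothetical toy data; column names and cut points are illustrative only
toy <- tibble(score = c(5, 12, 20, 33, 41, 58))

# case_when() checks conditions in order and uses the first one that is TRUE
binned <- toy %>%
  mutate(band = case_when(
    score <= 15 ~ "low",
    score <= 40 ~ "mid",
    TRUE        ~ "high"
  ))

# Count how many rows fall in each band
band_counts <- binned %>%
  count(band)
band_counts
```

Because the conditions are evaluated top to bottom, the second condition only needs an upper limit: any row that reaches it has already failed the first one.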
### QUESTION 9
Create a table that groups individuals in the census dataframe by age group: young (<= 29), middle (30-60), and old (> 60).
How many individuals are in the young age group?
```{r}
```
### Create an HTML file of your output and answers to the questions above using the RStudio Knit function
![](C:/Users/ozd504/OneDrive - University of Texas at San Antonio/Pictures/Screenshot 2022-01-17 100514.png)
Choose Knit to HTML. R will create an HTML document with your answers to the questions above.