-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
268 lines (219 loc) · 11.1 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
---
title: "Association between COVID-19 and Races in The United States"
author: "Nicky Nie nq990124@gmail.com"
output:
html_document:
toc: TRUE
toc_float: TRUE
---
<br>
This is my PM566 Final Project website. I will showcase major finding in report here. Full report can be downloaded from the website.
<br>
```{r setup, message=FALSE, echo=FALSE, warning=FALSE, results='hide'}
options(scipen = 1, digits = 3)
library(data.table)
library(dplyr)
library(tidyverse)
library(ggforce)
library(leaflet)
library(plotly)
library(usmap)
# Initialize code chunk options
opts_chunk$set(
warning = FALSE,
message = FALSE,
eval=TRUE,
echo = TRUE,
cache = FALSE,
fig.width = 7,
fig.align = 'center',
fig.asp = 0.618,
out.width = "700px",
class.source = "code-r")
```
```{css, echo = FALSE}
.code-r { /* Code block */
font-size: 15px;
}
.code-r-small { /* Code block */
font-size: 10px;
}
```
<br>
# **Introduction**
America is a smelting furnace of different races. The racial diversity brings in communication between different cultures, but also results in conflicts between them. Racism has been a great problem in America for a long time. With the pandemic, the racism violence increased a lot. For example, COVID-19 was called "Chinese Virus" from some racists, and more people were hostile to all the Asians they saw in the street. I even saw some Asian YouTubers being splashed water and even being beaten when they were doing live streaming. This project aims to study in association between COVID-19 and races.
This dataset is a collaboration between the COVID Tracking Project and the Boston University Center for Antiracist Research, which shows the counts of cases/deaths of COVID-19 categorized by races for each state in United States. The data was collected from April 12, 2020 to March 07, 2021.
# **Method**
### Download data
I download data in csv form from The COVID Tracking Project https://covidtracking.com/race
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
race <- fread("data/CRDT.csv")
```
### Preprocessing data
```{r message=FALSE, echo=FALSE, warning=FALSE}
tab <- race[, .(
Total_objects = nrow(race),
NA_total = sum(is.na(race$Cases_Total)),
NA_white = sum(is.na(race$Cases_White)),
NA_black = sum(is.na(race$Cases_Black)),
NA_latinx = sum(is.na(race$Cases_Latinx)),
NA_asian = sum(is.na(race$Cases_Asian)),
NA_AIAN = sum(is.na(race$Cases_AIAN)),
NA_NHPI = sum(is.na(race$Cases_NHPI)),
NA_multi = sum(is.na(race$Cases_Multiracial))
)]
knitr::kable(tab)
```
Based on this table, I decided to take a close look at data of Total, White, Black and Asian since races like Latinx, AIAN, NHPI and multiracial have almost half of missing data, which is not suitable for further analysis. And for the three races categories left, I dropped all the NAs for further analysis.
I think racial inequality may contribute to the huge number of missing data in minority groups. Some states may think that few people care about data from minority groups and are not willing to devote much attention and labor force to collect data from them. Or the population of those minority groups in those states are too small, hence they ignored it.
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
# pick up interested variables
race_state <- unique(race[,.(Date,State,Cases_Total,Cases_White,Cases_Black,Cases_Asian,Deaths_Total,Deaths_White,Deaths_Black,Deaths_Asian)])
# since date here shows to be integer, I changed them into str
race_state$Date = as.Date(as.character(race$Date),"%Y%m%d")
# drop NAs
race_state <- race_state[!is.na(Cases_Total) & !is.na(Cases_White) & !is.na(Cases_Black) & !is.na(Cases_Asian) & !is.na(Deaths_Total) & !is.na(Deaths_White) & !is.na(Deaths_Black) & !is.na(Deaths_Asian)]
sum(is.na(race_state))
# Calculate total cases of different races based on date
race_country <- race_state[,.(
Case_total = sum(Cases_Total),
Case_white = sum(Cases_White),
Case_black = sum(Cases_Black),
Case_asian = sum(Cases_Asian),
Death_total = sum(Deaths_Total),
Death_white = sum(Deaths_White),
Death_black = sum(Deaths_Black),
Death_asian = sum(Deaths_Asian)
), by = Date]
# Calculate death rate
race_state [, DR_total := Deaths_Total/Cases_Total]
race_state [, DR_white := Deaths_White/Cases_White]
race_state [, DR_black := Deaths_Black/Cases_Black]
race_state [, DR_asian := Deaths_Asian/Cases_Asian]
race_country[, DR_total := Death_total/Case_total]
race_country[, DR_white := Death_white/Case_white]
race_country[, DR_black := Death_black/Case_black]
race_country[, DR_asian := Death_asian/Case_asian]
# Stick to CA data
CA <- filter(race_state, State == "CA")
# Race with highest DR in each state.
latest <- filter(race_state, Date == "2021-03-07")
latest <- latest[, highest_DR :=
fifelse(DR_white == max(DR_white,DR_black,DR_asian),"White",
fifelse(DR_black == max(DR_white,DR_black,DR_asian),"Black","Asian"
)), by = State]
```
Because of different population base in each state, the original counts of deaths and cases may be not very meaningful to analyze, so I choose to calculate death rate instead.
# **Preliminary Results**
```{r message=FALSE, echo=FALSE, warning=FALSE}
ggplot(data = race_country) +
geom_smooth(mapping = aes(x = Date, y = DR_white, color = "DR_white"))+
geom_smooth(mapping = aes(x = Date, y = DR_black, color = "DR_black"))+
geom_smooth(mapping = aes(x = Date, y = DR_asian, color = "DR_asian"))+
geom_smooth(mapping = aes(x = Date, y = DR_total,color = "DR_total"))+
labs(x = "Date", y = "Death Rate")+
labs(title = "COVID-19 Death rate of different races in United States")+
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=1))+
scale_x_date(date_breaks = "months" , date_labels = "%b-%y")
```
From this graph, we can see that the trend of all DR is gradually decreasing, the total death rate is always the lowest one since there are a lot of cases of other or unknown races which largely increase the denominator. And although death rate of white is higher than the other two races initially, they approach to the same level finally with a little bit of difference.
## Death rate in each state
```{r message=FALSE, echo=FALSE, warning=FALSE}
tab <- latest[, .(
State = State,
DR_total = DR_total,
DR_white = DR_white,
DR_black = DR_black,
DR_asian = DR_asian
)]
knitr::kable(tab)
```
Looking back to the table of death rate in each state, I find that Texas has a ridiculous result in death rate of white(0.7144365) death rate of black(0.3355429) and death rate of asian (0.9624866), which is abnormally high. So I think that there must be some problems in this data and hence I delete the data of Texas when I draw the map.
## Death rate map{.tabset}
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
set_map <- list(
scope = 'usa',
projection = list(type = 'conic')
)
latest$hover <- with(latest, paste('<br>', "Total death rate in ", State, '<br>', "is ", DR_total))
DR_total <- plot_geo(latest, locationmode = 'USA-states') %>%
add_trace(
z = ~DR_total, text=~hover, locations = ~State,
color=~DR_total, colors='Greens'
) %>%
colorbar(title = "Death Rate of COVID-19")%>%
layout(title="Total Death Rate of COVID-19 in Each State",
geo=set_map)
```
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
latest2 <- latest[!State=="TX"]
```
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
set_map <- list(
scope = 'usa',
projection = list(type = 'conic')
)
latest2$hover <- with(latest2, paste('<br>', "Death rate of White in ", State, '<br>', "is ", DR_white))
p1 <- plot_geo(latest2, locationmode = 'USA-states') %>%
add_trace(
z = ~DR_white, text=~hover, locations = ~State,
color=~DR_white, colors='Reds'
) %>%
colorbar(title = "Death Rate in Black of COVID-19")%>%
layout(title="Death Rate in White of COVID-19 in Each State",
geo=set_map)
```
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
set_map <- list(
scope = 'usa',
projection = list(type = 'conic')
)
latest2$hover <- with(latest2, paste('<br>', "Death rate of Black in ", State, '<br>', "is ", DR_black))
p2 <- plot_geo(latest2, locationmode = 'USA-states') %>%
add_trace(
z = ~DR_black, text=~hover, locations = ~State,
color=~DR_black, colors='Blues'
) %>%
colorbar(title = "Death Rate in Black of COVID-19")%>%
layout(title="Death Rate in Black of COVID-19 in Each State",
geo=set_map)
```
```{r message=FALSE, echo=FALSE, warning=FALSE, results='hide',fig.show='hide'}
set_map <- list(
scope = 'usa',
projection = list(type = 'conic')
)
latest2$hover <- with(latest2, paste('<br>', "Death rate of Asian in ", State, '<br>', "is ", DR_asian))
p3 <- plot_geo(latest2, locationmode = 'USA-states') %>%
add_trace(
z = ~DR_asian, text=~hover, locations = ~State,
color=~DR_asian, colors='Purples'
) %>%
colorbar(title = "Death Rate in Asian of COVID-19")%>%
layout(title="Death Rate in Asian of COVID-19 in Each State",
geo=set_map)
```
### Total Death Rate
```{r message=FALSE, echo=FALSE, warning=FALSE}
DR_total
```
### Death Rate White
```{r, message=FALSE, echo=FALSE, warning=FALSE}
p1
```
### Death Rate Black
```{r, message=FALSE, echo=FALSE, warning=FALSE}
p2
```
### Death Rate Asian
```{r, message=FALSE, echo=FALSE, warning=FALSE}
p3
```
## {-}
Generally it shows a radial pattern: the middle part has a lowest death rate and gradually become higher while approching to the coastline. And total death rate of the eastern and north-eastern area is highest. Relatively higher population and more samples of those states along the coastline may lead to greater burden of medical system, and finally lead to a higher death rate. Also, there are more international transportation for those states along the coastline, which may bring in more cases and deaths of COVID-19.
The patterns of death rate in those three maps with different races are similar to the pattern of total death rate. The difference such as high death rate of black in Michigan, high death rate of asian in California and Hawaii is caused by relatively higher population of such races in those states, which provides more data.
# **Conclusion**
Just from these data, I think there is no clear association between races and death rate of COVID-19. The death rate of all three races approach to the same level finally. In my opinion, the death rate is highly depend on the action of the State rather than races, for example, the burden of medical system, the policy to restrict COVID-19 etc.
I saw a lot of data showed a different result from these data, which indicated that death rate of black is highest among all races. That might be caused by inequality of medical distribution at the stage of onset of COVID-19. As the burden of medical resource relieved and government took action to restrict COVID-19 and gave supports to citizens, I believe that death rate that approaches to the same level is a right trend.
<br>
<br>