-
Notifications
You must be signed in to change notification settings - Fork 0
/
Lab4.rmd
198 lines (143 loc) · 6.07 KB
/
Lab4.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
---
title: "Lab 04 - Data Visualization"
output: html_document
author: Hanke Zheng
link-citations: yes
---
```{r setup, message=FALSE, echo=FALSE, warning=FALSE}
#install.packages(c("data.table","leaflet"))
library(data.table)
library(leaflet)
library(tidyverse)
```
# Learning Goals
- Read in and prepare the meteorological dataset
- Create several graphs with different `geoms()` in `ggplot2`
- Create a facet graph
- Conduct some customizations to the graphs
- Create a more detailed map using `leaflet()`
# Lab Description
We will again work with the meteorological data presented in lecture.
**The objective of the lab is to examine the association between monthly average dew point temperature and relative humidity in four regions of the US and by elevation.**
# Steps
### 1. Read in the data
First download and then read in with data.table:fread()
```{r, echo=TRUE, message=FALSE}
download.file("https://raw.githubusercontent.com/USCbiostats/data-science-data/master/02_met/met_all.gz", "met_all.gz", method="libcurl", timeout = 60)
met <- data.table::fread("met_all.gz")
```
### 2. Prepare the data
- Remove temperatures less than -17C
- Make sure there are no missing data in the key variables coded as 9999, 999, etc
- Take monthly averages by weather station
- Create a region variable for NW, SW, NE, SE based on lon = -98.00 and lat = 39.71 degrees
- Create a categorical variable for elevation as in the lecture slides
```{r}
met<-met[met$temp > -17]
met[met$elev==9999.0] <- NA
#na.rm=TRUE: remove the NA for calculation
met_avg<-met[,.(temp=mean(temp,na.rm=TRUE), rh=mean(rh,na.rm=TRUE),wind.sp=mean(wind.sp,na.rm=TRUE), dew.point=mean(dew.point, na.rm=TRUE), vis.dist=mean(vis.dist,na.rm=TRUE), lat=mean(lat), lon=mean(lon),
elev=mean(elev,na.rm=TRUE)), by=c("USAFID")]
met_avg$elev_cat <- ifelse(met_avg$elev> 252, "high", "low")
met_avg$region <- ifelse(met_avg$lon> -98 & met_avg$lat > 39.71, "north east",
ifelse(met_avg$lon> -98 & met_avg$lat < 39.71, "south east",
ifelse(met_avg$lon< -98 & met_avg$lat > 39.71, "north west", "south west")))
table(met_avg$region)
# met_avg$region <- ifelse(met_avg$lon> -98, "east", "west")
```
### 3. Use `geom_boxplot` to examine the dew point temperature and relative humidity by region
- Use facets
- Make sure to deal with `NA` categoryan
- Describe what you observe in the graph
```{r}
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot()+
geom_boxplot(mapping=aes(x=region, y=dew.point))
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot() +
geom_boxplot(mapping=aes(y=dew.point, fill=region)) +
facet_wrap(~region,nrow=2)
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot() +
geom_boxplot(mapping=aes(y=rh, fill=region)) +
facet_wrap(~region,nrow=2)
# 2 by 2
```
- Dew point tem is higher in the south east region
- Relative humidity is higher in south east region
### 4. Use `geom_point` with `stat_smooth` to examine the association between dew point temperature and relative humidity by region
- Colour points by region
- Make sure to deal with `NA` category
- Fit a linear regression line by region
- Describe what you observe in the graph
```{r}
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot(mapping=aes(x=dew.point, y=rh, color=region))+
geom_point()+
stat_smooth(method=lm)
# stat_smooth: linear regression line
```
- The association between dew point temp and relative humidity is positive in all region.
- The slope is greater in north west.
### 5. Use `geom_bar` to create barplots of the weather stations by elevation category coloured by region
- Bars by elevation category
- Change colours from the default. Colour by region using `scale_fill_brewer` see [this](http://rstudio-pubs-static.s3.amazonaws.com/5312_98fc1aba2d5740dd849a5ab797cc2c8d.html)
- Create nice labels on axes and add a title
- Describe what you observe in the graph
- Make sure to deal with NAs
```{r warning=FALSE, message=FALSE}
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot()+
geom_bar(mapping=aes(x=elev_cat,fill=region))+
# change the default palette
scale_fill_brewer(palette = "PuOr")+
labs(title="Number of weather stations by levation category and region", x="Elevation Category", y="Count")
theme()
```
### 6. Use `stat_summary` to examine mean dew point and relative humidity by region with standard deviation error bars
- Make sure to remove `NA`
- Use fun.data="mean_sdl" in `stat_summary`
- Describe the graph and what you observe
```{r}
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot(mapping = aes(x=region,y=dew.point))+
stat_summary(fun.data = "mean_sdl")
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot(mapping = aes(x=region,y=rh))+
stat_summary(fun.data = "mean_sdl")
```
- Dew point temperature is the highest and concentrated in south east whereas it's the lowest and more spread-out in north west.
- Relative humidity is higher and more concentrated in north east and south east while it's dryer in south west and north west.
### 7. Make a map showing the spatial trend in relative humidity in the US
- Make sure to remove `NA`
- Use leaflet()
- Make a colour palette with custom colours
- Add a legend
```{r}
met_avg2 <- met_avg [!is.na(rh)]
rh_pal=colorNumeric(c('blue', 'purple', 'red'), domain=met_avg2$rh)
leaflet(met_avg2) %>%
addProviderTiles('OpenStreetMap') %>%
addCircles(lat=~lat, lng=~lon, color=~rh_pal(rh),label=~paste0(round(rh,2),'rh'), opacity = 1,fillOpacity = 1, radius=500) %>%
addLegend('bottomleft', pal=rh_pal, values = met_avg$rh, title="Relative Humidity", opacity = 1)
```
- The relative humidity is increasing from west to east.
### 8. Use a ggplot extension
- Pick and extension (except cowplot) from [here](https://exts.ggplot2.tidyverse.org/gallery/) and make a plot of your choice using the met data (or met_avg)
- Might want to try examples that come with the extension first (e.g. ggtech, gganimate, ggforce)
```{r}
library(ggstance)
met_avg %>%
filter(!(region %in% NA)) %>%
ggplot(mapping = aes(y=region,x=dew.point, fill=elev_cat))+
geom_boxploth()
###write.csv(met_avg,"path/.csv")
###saveRDS(met_avg,"path/.rds")
```