-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathregimes.Rmd
210 lines (139 loc) · 7.04 KB
/
regimes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
---
title: "Regimes"
---
# Overview
In this notebook, we will analyze data that answers the question "What regime is associated with the most pollution?". In this case, when we say regime, we mean varieties in time and location that are associated with different airport activity. For instance, SN45 is situated close to the northern runway at Logan. That runway has periods when it is being used, and when it is not being used; thus, the sensor data during those two different regimes could result in different sensor readings.
Another way of phrasing this analysis is "how does wind speed and direction affect sensor readings?" and "how do LTOs affect sensor readings?" We'll be answering this question using the regimes mentioned above, and we'll analyze different regimes by generating boxplots.
# Importing
## Import Necessary Libraries
```{r}
library(lubridate)
library(openair)
library(data.table)
library(dplyr)
library(baseline)
library(Rmisc)
library(ggplot2)
```
## Import Data
Since the data's dates will need to be formatted when imported, we'll call the format_date function below:
```{r}
format_dates <- function(sensor){
sensor$date <- ymd_hms(sensor$date, tz = "UTC")
sensor$date_local <- ymd_hms(sensor$date_local, tz = "America/New_York")
return(sensor)
}
```
```{r}
#importing snfiles
temp2 = list.files(path = "./data/final/", pattern = "*before.csv", full.names = TRUE) #specify files we're looking for
snfiles = sapply(temp2, function(x) fread(file=x, data.table = TRUE) , simplify = FALSE,USE.NAMES = TRUE) #import them in
names(snfiles) <- lapply(names(snfiles), function(x) substr(x, 3, 6)) #change the names
#reformatting the dates
snfiles <- lapply(snfiles, function(x) format_dates(x))
```
# Wind Speed and Direction
To test how wind speed and direction affect sensor readings, we'll create a set of functions that allow the user to easily input which wind directions and wind speeds they want to check, then plot them on a box plot.
We'll walk through these functions and what they do by using SN45 as an example.
## Create a regimes vector
First, we're going to create an additional column in the dataset, that states which regime each data row is in, given some conditions. The user chooses what the regimes are, and how they are defined.
In order to do this, first we'll start by just creating a column of NAs in SN45.
```{r}
sn45 <- snfiles$sn45 # create sn45 vector
sn45$regime <- NA # create column of NAs
```
Next, we'll populate that column based on some set of conditions. In this case, I'm using one of the regime definitions defined in the Regimes Description spreadhseet, on the Eastie Google Drive.
```{r}
sn45$regime[which(sn45$ws >= 2.5 & sn45$wd> 140 & sn45$wd < 270)] <- "140 - 270 and higher speeds"
sn45$regime[which(sn45$ws >= 2.5 & sn45$wd> 140 & sn45$wd < 180)] <- "140 - 180 downwind and higher speeds"
sn45$regime[which(sn45$ws < 2.5)] <- "downwind and lower speeds"
sn45$regime[which(is.na(sn45$regime))] <- "upwind"
```
We can see what this looks like:
```{r}
View(sn45$regime)
```
Now, we'll create a generalized function out of the above code. This code assumes that there are two regimes which have different wind directions but wind speeds higher than a certain value, one regime that has a lower wind speed, and one other regime that encompasses the rest.
```{r}
create_regime_vector <- function(dataset, ws_val, wdlist, stringlist){
dataset$regime <- NA
dataset$regime[which(dataset$ws >= ws_val & dataset$wd> wdlist[[1]] & dataset$wd < wdlist[[2]])] <- stringlist[[1]]
dataset$regime[which(dataset$ws >= ws_val & dataset$wd> wdlist[[3]] & dataset$wd < wdlist[[4]])] <- stringlist[[2]]
dataset$regime[which(dataset$ws < ws_val)] <- stringlist[[3]]
dataset$regime[which(is.na(dataset$regime))] <- stringlist[[4]]
return(dataset)
}
```
We can apply this function to all our datasets at once by creating nested lists. The order in the list corresponds to the order that snfiles are in (ie sn45, sn46, sn49.. etc)
```{r}
wd = list(list(140, 270, 140, 180), list(140, 270, 140, 180))
stringlistoflists = list(list( "140 - 270 and higher speeds", "140 - 180 downwind and higher speeds", "downwind and lower speeds", "upwind"), list( "140 - 270 and higher speeds", "140 - 180 downwind and higher speeds", "downwind and lower speeds", "upwind"))
wslist = list(2.5, 2.5)
trial <- mapply(create_regime_vector, snfiles, wslist, wd, stringlistoflists)
snfiles2 <- snfiles
for(i in 1:length(snfiles)){
snfiles2[[i]] <- create_regime_vector(snfiles[[i]], wslist[[i]], wd[[i]], stringlistoflists[[i]])
}
```
## Transform Data
In order to use our boxplots function, the snfiles data has to be in a much different format. We're going to put it in the right format in this section.
First, we're going to make a function that creates a column in the data, that just states the name of the dataset it belongs to.
```{r}
map_sensor <- function(dataset, sensorname){
dataset$sn <- sensorname
return(dataset)
}
```
And apply it:
```{r}
names = c("sn45", "sn45copy")
snfiles3 <- mapply("map_sensor", dataset = snfiles2, sensorname = names, SIMPLIFY = FALSE)
```
Next, we're going to convert the data from short format into long format. We can read more about this here [](). We're only going to extract the variables we need, using the measure.vars parameter. The id variables identify where the data is coming from.
```{r}
melt_dataframes <- function(dataset){
dataset.melted <- melt.data.table(dataset, id.vars = c("regime", "sn"), measure.vars = c("no", "no2", "pm1") )
return(dataset.melted)
}
```
```{r}
snfiles.melted <- lapply(snfiles3, function(x) melt_dataframes(x))
```
Now, we're going to bind all of the data together, so that we can plot it on the same boxplot.
```{r}
merged_df <- do.call("rbind", snfiles.melted) #combine into one dataset
```
To make it easier to identify the differences between data, we're going to make sensor name and regime into one column
```{r}
merged_df$group <- paste(merged_df$sn, merged_df$regime)
```
## Generating Boxplots
```{r}
calc_stat <- function(x) {
coef <- 1.5
n <- sum(!is.na(x))
# calculate quantiles
stats <- quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
return(stats)
}
g <- ggplot(data = transform(merged_df, x= factor(group)), aes(x=group, y=value, fill=regime, group = group), position = position_dodge(width = 0.9)) +
stat_summary(fun.data = calc_stat, geom="boxplot") +
facet_wrap(vars(variable,sn), scales="free", ncol = 2)
g+ theme( axis.text.x = element_blank())
```
# Wind and LTOs
We're going to do a similar process here compared to the wind speed section above.
## Preparing LTO Data
One of the components that will be different is preparing a dataset of flight times to be used for LTO regimes.
A lot of the code in this section will be done in separate helper functions that will get called in.
We'll import and format the flight.
```{r}
#import the data
#View what it looks like
#head()
```
```{r}
#import the formatting function
#run the formatting function
```