---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
[![Travis-CI Build Status](https://travis-ci.org/geanders/countyweather.svg?branch=master)](https://travis-ci.org/geanders/countyweather)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/countyweather)](https://cran.r-project.org/package=countyweather)
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-",
error = TRUE
)
```
```{r echo = FALSE, message = FALSE}
library(countyweather)
options("noaakey" = Sys.getenv("noaakey"))
```
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
error = TRUE
)
```
While weather data is measured at the specific location of each weather station, it is often useful to have estimates of daily or hourly weather aggregated to a larger spatial unit. For U.S.-based studies, it can be particularly useful to pull time series of weather by county. For example, the health data used in U.S. environmental epidemiology studies is often aggregated at the county level, which makes county-level weather datasets especially useful for environmental epidemiology applications.
This package builds on functions from the `rnoaa` package to identify weather stations within a county based on its FIPS code and then pull weather data for a specified date range from those weather stations. It then does some additional cleaning and aggregating to produce a single, county-level weather dataset. Further, it maps the weather stations used for that county and date range and allows you to create and write datasets for many different counties using a single function call.
If you are pulling weather data from a single weather station, you should use `rnoaa` directly. However, `countyweather` allows you to pull and aggregate data from weather stations more easily at the county level for the U.S.
## Required set-up for this package
Currently, this package exists in a development version on GitHub. To use the package, you need to install it directly from GitHub using the `install_github` function from `devtools`.
You can use the following code to install the development version of `countyweather`:
```{r eval = FALSE}
library(devtools)
install_github("leighseverson/countyweather")
library(countyweather)
```
As a note, many of the dependencies of `countyweather` have their own dependencies (e.g., `lawn`, `acs`, `geojsonio`, `RCurl`). You may be prompted to install additional packages to be able to install `countyweather`.
You will also need an API key from NOAA to be able to access the weather data. This API key is input with some of your data requests to NOAA within functions in this package. You can request an API key from NOAA here: http://www.ncdc.noaa.gov/cdo-web/token. You should keep this key private.
Once you have this NOAA API key, you'll need to pass it through to some of the functions in this package that pull data from NOAA. The most secure way to use this API key is to store it in your `.Renviron` configuration file, and then you can set it up as the value of an object in R code or R markdown documents without having to include the key itself in the script. To store the NOAA API key in your `.Renviron` configuration file, first check and see if you already have an `.Renviron` file in your home directory. You can check this by running the following from your R command line:
```{r eval = FALSE}
any(grepl("^\\.Renviron", list.files("~", all.files = TRUE)))
```
If this call returns `TRUE`, then you already have an `.Renviron` file.
If you already have one, open that file (for example, with `system("open ~/.Renviron")`). If you do not yet have an `.Renviron` file, open a new text file (in RStudio, navigate to *File* > *New File* > *Text File*) and save it as `.Renviron` in your home directory. If you are warned about using a filename that begins with a dot (`.`), you do want to save it under that name anyway.
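If you prefer to do this from the R console, one option (a sketch; in RStudio this opens the file in the editor and creates it if it does not yet exist) is:
```{r eval = FALSE}
# Open ~/.Renviron for editing; in RStudio, this creates the file if needed
file.edit("~/.Renviron")
```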
Once you have opened or created an `.Renviron` file, type the following into the file, replacing "your_emailed_key" with the actual string that NOAA emails you:
```{r eval = FALSE}
noaakey=your_emailed_key
```
Do not put quotation marks or anything else around your key, and make sure to add a blank line as the last line of the file. If you have problems getting this to work, go back and confirm that your `.Renviron` file ends with a blank line; a missing final blank line is the most common reason for this step not working.
Next, you'll need to restart R. Once you restart R, you can get the value of this NOAA API key from `.Renviron` anytime with the call `Sys.getenv("noaakey")`. Before using functions that require the API key, set the `noaakey` option to your NOAA API key by running:
```{r}
options("noaakey" = Sys.getenv("noaakey"))
```
This will pull your NOAA API key from the `.Renviron` file and save it as the `noaakey` option, which functions in this package need to pull weather data from NOAA's web services. You will want to put this line of code near the top of any R script or R Markdown file you write that uses functions from this package.
## Basic examples of using the package
Weather data is collected at weather stations, and there are often multiple weather stations within a county. The `countyweather` package allows you to pull weather data from all stations in a specified county over a specified date range. The two main functions in the `countyweather` package are `daily_fips` and `hourly_fips`, which pull daily and hourly weather data, respectively. By default, the weather data pulled from all weather stations in a county is averaged at each time point to create a county-level time series of daily or hourly measurements. There is also an option that allows the user to opt out of this default aggregation across weather stations and instead pull separate time series for each weather station in the county. This option is explained in more detail later in this document; it can be useful if you would like to aggregate across a county's weather stations with a method other than a simple average.
Throughout, functions in this package identify a county using the county's Federal Information Processing Standard (FIPS) code. FIPS codes are 5-digit codes that uniquely identify every U.S. county. The first two digits of a county FIPS code specify the state and the last three specify the county within the state. This package pulls data based on FIPS designations as of the 2010 Census. Users will not be able to pull data for the few FIPS codes that have undergone substantial changes since 2010; for a list of those codes, see the Census Bureau's [summary](https://www.census.gov/geo/reference/county-changes.html) of county changes for the 2010s.
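As a quick illustration of this structure, the Miami-Dade County FIPS code used in the examples below splits into a state part and a county part:
```{r eval = FALSE}
fips <- "12086"      # Miami-Dade County, Florida
substr(fips, 1, 2)   # "12"  -- state portion of the FIPS code (Florida)
substr(fips, 3, 5)   # "086" -- county portion within the state
```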
Currently, this package can pull daily and hourly weather data for variables like temperature and precipitation. For resources with complete lists of weather variables available through this package, as well as sources of this weather data, see the section later in this document titled "More on the weather data".
### Pulling daily data
The `daily_fips` function can be used to pull daily weather data for all weather stations within the geographic boundaries of a county. This daily weather data comes from NOAA's Global Historical Climatology Network. When pulling data for a county, the user can specify date ranges (`date_min`, `date_max`), which weather variables to include in the output dataset (`var`), and restrictions on how much non-missing data a weather station must have over the time period to be included when generating daily county average values (`coverage`). This function will pull any available data for weather stations in the county under the specified restrictions and output both a dataset of average daily observations across all county weather stations, as well as a map plotting the stations used in the county-wide averaged data.
Here is an example of creating a dataset with daily precipitation for Miami-Dade county (FIPS code = 12086) for August 1992, when Hurricane Andrew struck:
```{r warning = FALSE, message = FALSE}
andrew_precip <- daily_fips(fips = "12086", date_min = "1992-08-01",
date_max = "1992-08-31", var = "prcp")
```
```{r}
names(andrew_precip)
```
The output from this function call is a list that includes three elements: a daily timeseries of weather data for the county (`andrew_precip$daily_data`), a dataframe with meta-data about the weather stations used to create the timeseries data, as well as statistical information about the weather values pulled from these stations (`andrew_precip$station_metadata`), and a map showing the locations of weather stations included in the county-averaged dataset (`andrew_precip$station_map`).
Here are the first few rows of the dataset:
```{r}
head(andrew_precip$daily_data)
```
The dataset includes columns for date (`date`), precipitation (in mm, `prcp`), and also the number of stations used to calculate each daily average precipitation observation (`prcp_reporting`).
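Because the output is a regular dataframe, you can work with it directly. For example, a quick sketch (not evaluated here) of pulling out the wettest day in the series:
```{r eval = FALSE}
library(dplyr)
# Identify the day with the highest county-average precipitation
andrew_precip$daily_data %>%
  filter(prcp == max(prcp, na.rm = TRUE))
```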
This function performs some simple data cleaning and quality control on the weather data originally pulled from NOAA's web services; see the "More on the weather data" section later in this document for more details, including the units for the weather observations collected by this function.
Here is a plot of this data, with colors used to show the number of weather stations included in each daily observation:
```{r fig.width = 7, fig.height = 3}
library(ggplot2)
ggplot(andrew_precip$daily_data, aes(x = date, y = prcp, color = prcp_reporting)) +
geom_line() + geom_point() + theme_minimal() +
xlab("Date in 1992") + ylab("Daily rainfall (mm)") +
scale_color_continuous(name = "# stations\nreporting")
```
From this plot, you can see both the extreme precipitation associated with Hurricane Andrew (Aug. 24) and that the storm knocked out quite a few of the weather stations normally available.
A map is also included in the output of `daily_fips` with the weather stations used for the county average, as the `station_map` element:
```{r warning = FALSE}
andrew_precip$station_map
```
This map uses U.S. Census TIGER/Line shapefiles (vintage 2011) and functions from the `sf` package to overlay weather station locations on a map of the county's boundaries.
The `station_metadata` dataframe gives information about all of the stations contributing data to the `daily_data` dataframe, as well as information about how the values reported by each station vary for each weather variable. If a weather station contributes data for multiple variables, it will show up in this dataframe multiple times. Here's what the `station_metadata` dataframe looks like for the `andrew_precip` list:
```{r}
andrew_precip$station_metadata
```
For each station, the dataframe gives an `id` and `name`, as well as `latitude` and `longitude`. `var` indicates the weather variable for which the station contributed data; a station contributing data for multiple variables shows up in the dataframe once for each of those variables. For each variable and station combination, the dataframe also shows `calc_coverage`, the calculated percent of non-missing values, which you can restrict with the `coverage` argument of `daily_fips`. `standard_dev` gives the standard deviation of the weather values from each station and weather variable, `min` and `max` give the minimum and maximum values, and `range` gives the range of these values. These last four statistical summaries (standard deviation, minimum, maximum, and range) are only included for the seven core hourly weather variables (`wind_direction`, `wind_speed`, `ceiling_height`, `visibility_distance`, `temperature`, `temperature_dewpoint`, and `air_pressure` -- for more details on these variables, see the "More on the weather data" section below). The values of these columns are set to `NA` for other variables, such as quality flag data.
If you are interested in looking at the weather values for certain stations, you can use the `average_data = FALSE` option in `daily_fips`. For more on this option and a few others, see the "Further options available in the package" section below.
### Pulling hourly data
You can use the `hourly_fips` function to pull hourly weather data by county from NOAA's Integrated Surface Data (ISD) weather dataset. In this case, NOAA's web services will not identify weather stations by FIPS, so instead this function will pull all stations within a certain radius of the county's population mean center to represent weather within that county. While there are seven main weather variables that are possible to pull (listed below in the "More on the weather data" section), `temperature` and `wind_speed` tend to be non-missing most often.
An estimated radius is calculated for each county using 2010 U.S. Census Land Area data, with each county assumed to be roughly circular. The calculated radius (in km), as well as the latitude and longitude of each county's geographic center, are included as elements in the list returned by `hourly_fips`.
Here is an example of pulling hourly data for Miami-Dade, for the year of Hurricane Andrew. While daily weather data can be pulled using a date range specified to the day, hourly data can only be pulled by year (for one or multiple years) using the `year` argument:
```{r}
andrew_hourly <- hourly_fips(fips = "12086", year = 1992,
var = c("wind_speed", "temperature"))
```
The output from this call is a list object that includes six elements. `andrew_hourly$hourly_data` is an hourly timeseries of weather data for the county. The other five elements, `station_metadata`, `station_map`, `radius`, `lat_center`, and `lon_center`, are explained in more detail below.
Here are the first few rows of the `hourly_data` dataset:
```{r}
head(andrew_hourly$hourly_data)
```
If you need to get the timestamp for each observation in local time, you can use the `add_local_time` function from the `countytimezones` package to do that:
```{r}
andrew_hourly_data <- as.data.frame(andrew_hourly$hourly_data)
# install.packages("countytimezones") # Install package if necessary
library(countytimezones)
```
```{r}
andrew_hourly_data <- add_local_time(df = andrew_hourly_data, fips = "12086",
datetime_colname = "date_time")
head(andrew_hourly_data)
```
Here is a plot of hourly wind speeds for Miami-Dade County, FL, for the month of Hurricane Andrew:
```{r fig.width = 7, fig.height = 3, message = FALSE, warning = FALSE}
library(dplyr)
library(lubridate)
to_plot <- andrew_hourly$hourly_data %>%
filter(months(date_time) == "August")
ggplot(to_plot, aes(x = date_time, y = wind_speed,
color = wind_speed_reporting)) +
geom_line() + theme_minimal() +
xlab("Date in August 1992") +
ylab("Wind speed (m / s)") +
scale_color_continuous(name = "# stations\nreporting")
```
Again, the intensity of conditions during Hurricane Andrew is clear, as is the reduction in the number of reporting weather stations during the storm.
The list object returned by `hourly_fips` also includes a map of weather station locations (`station_map`):
```{r}
andrew_hourly$station_map
```
Because hourly data is pulled by radius from each county's geographic center, this plot includes the calculated radius within which stations are pulled. This radius is calculated for each county using 2010 U.S. Census Land Area data. U.S. Census TIGER/Line shapefiles are used to draw the county outlines included on this plot. Because stations are pulled within a radius of the county's center, stations outside the county's boundaries may sometimes contribute data for that county.
Other list elements returned by `hourly_fips` include `station_metadata`, `radius`, `lat_center`, and `lon_center`. `radius` is the estimated radius (in km) for the county, calculated using 2010 U.S. Census Land Area data under the assumption that the county is roughly circular. `lat_center` and `lon_center` are the latitude and longitude of the county's geographic center, respectively.
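These elements can be pulled directly from the returned list; for example:
```{r eval = FALSE}
# Radius (in km) used to select stations, plus the county center coordinates
andrew_hourly$radius
andrew_hourly$lat_center
andrew_hourly$lon_center
```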
The `station_metadata` dataframe gives information about all of the stations contributing data to the `hourly_data` dataframe, as well as information about how the values reported by each station vary for each weather variable. If a weather station contributes data for multiple variables, it will show up in this dataframe multiple times. Here's what the `station_metadata` dataframe looks like for the `andrew_hourly` list:
```{r}
andrew_hourly$station_metadata
```
`usaf` and `wban` are station ids. `station` is a unique identifier for each station, created by pasting the usaf and wban ids together, separated by "-". (Note: values for `wban` or `usaf` are sometimes missing (originally indicated by "99999" or "999999"), which can result in a `station` value like `722024-NA`.) `station_name` is the name of each station, and `var` indicates the weather variable for which the station contributed data. A station contributing data for multiple variables shows up in the dataframe once for each of those variables. For each variable and station combination, the dataframe also shows `calc_coverage`, the calculated percent of non-missing values, which you can restrict with the `hourly_fips` option `coverage`. `standard_dev` gives the standard deviation for each sample of weather data from each station and weather variable, and `range` gives the range of these values. Here, we can see in row 7 that the OPA LOCKA station has a very low percent coverage for temperature (0.0018) and a correspondingly high standard deviation (17.94). If you are interested in looking at the weather values for certain stations, you can use the `average_data = FALSE` option in `hourly_fips`; for more on this option and a few others, see the "Further options available in the package" section below. The dataframe also gives each station's country, state, elevation (in meters), the earliest and latest dates for which the station has available data (`begin` and `end`, respectively), longitude, and latitude.
## Writing out time series files
There are a few functions that allow the user to write out daily or hourly timeseries datasets for many different counties to a specified local directory, as well as plots of this data. For daily weather data, see the functions `write_daily_timeseries` and `plot_daily_timeseries`. For hourly, see `write_hourly_timeseries` and `plot_hourly_timeseries`.
For example, if we wanted to compare daily weather in the month of August for three counties in southern Florida, we could run:
```{r eval = FALSE}
fl_counties <- c("12086", "12087", "12011")
write_daily_timeseries(fips = fl_counties, date_min = "1992-08-01",
date_max = "1992-08-31", var = "prcp",
out_directory = "~/Documents/andrew_data")
```
The `write_daily_timeseries` function saves each county's timeseries as a separate file in a subdirectory called "data" of the directory specified in the `out_directory` option. The `data_type` argument allows the user to specify either .rds or .csv files (the default is to write .rds files). Each file is a time series dataframe of daily weather data. A dataframe of station metadata is saved in a second subdirectory called "metadata", and maps showing locations of weather stations contributing to each time series are saved in a subdirectory called "maps." At this stage, if you were to include a county in the `fips` argument without available data, a file would not be created for that county.
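As a sketch of how you might read these files back into R later (the exact file names will depend on the FIPS codes you requested and on `data_type`):
```{r eval = FALSE}
# List the time series files written to the "data" subdirectory
ts_files <- list.files("~/Documents/andrew_data/data", full.names = TRUE)
ts_files

# Read the first county's daily time series back into R (for .rds files)
county_daily <- readRDS(ts_files[1])
head(county_daily)
```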
The function `plot_daily_timeseries` creates and saves plots for each of these files. (Note: the `data_type` argument for this function also defaults to reading .rds files, so if you chose to write .csv files, make sure to set `data_type = "csv"` in this function as well.)
```{r eval = FALSE}
plot_daily_timeseries("prcp", data_directory = "~/Documents/andrew_data/data",
plot_directory = "~/Documents/andrew_plots",
date_min = "1992-08-01", date_max = "1992-08-31")
```
Here's an example of what the time series plots for the three Florida counties would look like:
```{r warning = FALSE, message = FALSE, echo = FALSE, eval = FALSE}
two <- daily_fips(fips = "12087", date_min = "1992-08-01",
date_max = "1992-08-31", var = "prcp")
save(two, file = "vignettes/data/two.RData")
```
```{r echo = FALSE}
load("vignettes/data/two.RData")
```
```{r warning = FALSE, message = FALSE, echo = FALSE, eval = FALSE}
three <- daily_fips(fips = "12011", date_min = "1992-08-01",
date_max = "1992-08-31", var = "prcp")
save(three, file = "vignettes/data/three.RData")
```
```{r echo = FALSE}
load("vignettes/data/three.RData")
```
```{r echo = FALSE, fig.width = 8, fig.height = 2}
oldpar <- par(mfrow = c(1, 3))
df <- andrew_precip$daily_data
plot(df$date, df$prcp, type = "l", col = "red", main = "12086", xlab = "date",
ylab = "prcp", xlim = c(as.Date("1992-08-01"), as.Date("1992-08-31")))
df2 <- two$daily_data
plot(df2$date, df2$prcp, type = "l", col = "red", main = "12087", xlab = "date",
ylab = "prcp", xlim = c(as.Date("1992-08-01"), as.Date("1992-08-31")))
df3 <- three$daily_data
plot(df3$date, df3$prcp, type = "l", col = "red", main = "12011", xlab = "date",
ylab = "prcp", xlim = c(as.Date("1992-08-01"), as.Date("1992-08-31")))
par(oldpar)
```
## Further options available in the package
### `coverage`
For `daily_fips`, `hourly_fips`, and the time series functions, the user can choose to filter out any weather stations that report a variable for less than a certain fraction of the time period (`coverage`). For example, if you set `coverage` to 0.90, only weather stations that report non-missing values for at least 90% of the time period will be included in your data.
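For example, to keep only stations reporting precipitation on at least 90% of days when building the Hurricane Andrew dataset, you could run a call like the following (a sketch of the earlier `daily_fips` call with the `coverage` argument added):
```{r eval = FALSE}
andrew_precip_cov <- daily_fips(fips = "12086", date_min = "1992-08-01",
                                date_max = "1992-08-31", var = "prcp",
                                coverage = 0.90)
```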
### `average_data`
In both `daily_fips` and `hourly_fips`, the default is to return a single daily average for the county for each day in the time series, giving the value averaged across all available weather stations on that day. However, there is also an option called `average_data` which allows the user to specify whether they would like the weather data returned before it has been averaged across weather stations. If this argument is set to `FALSE`, the functions will return separate daily data for each weather station in the county. For our Hurricane Andrew example, if we specify `average_data = FALSE`:
```{r}
not_averaged <- daily_fips(fips = "12086",
date_min = "1992-08-01",
date_max = "1992-08-31",
var = "prcp", average_data = FALSE,
station_label = TRUE)
not_averaged_data <- not_averaged$daily_data
head(not_averaged_data)
unique(not_averaged_data$id)
```
In this example, there are six weather stations contributing weather data to the time series. We can plot the data by station to get a sense for how values from each weather station compare, and which weather stations were presumably knocked out by the storm, with different colors used to show values for different weather stations:
```{r fig.width = 7, fig.height = 3, warning = FALSE, message = FALSE}
library(ggplot2)
ggplot(not_averaged_data, aes(x = date, y = prcp,
colour = id)) +
geom_line() +
theme_minimal()
```
It might be interesting here to compare this plot with the station map, this time with station labels included (done using `station_label = TRUE` when we pulled this data using `daily_fips`):
```{r warning = FALSE, message = FALSE, fig.width = 7}
not_averaged$station_map
```
### Quality Flags
The hourly Integrated Surface Data includes quality codes for each of the main weather variables. For more information about the hourly weather variables, see the "More on the weather data" section below. We can use these codes to remove suspect or erroneous values from our data. The quality codes in `wind_speed_quality`, for example, take the following values (pulled from the [ISD documentation file](ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf)):
```{r echo = FALSE}
qual <- data.frame(code = c(0, 1, 2, 3, 4, 5, 6, 7, 9),
                   definition = c("Passed gross limits check",
                                  "Passed all quality control checks",
                                  "Suspect",
                                  "Erroneous",
                                  "Passed gross limits check, data originate from an NCEI data source",
                                  "Passed all quality control checks, data originate from an NCEI data source",
                                  "Suspect, data originate from an NCEI data source",
                                  "Erroneous, data originate from an NCEI data source",
                                  "Passed gross limits check if element is present"))
library(knitr)
kable(qual, format = "markdown")
```
Because it doesn't make sense to average these codes across stations, the codes should only be pulled when using the option to pull station-specific values (`average_data = FALSE`).
```{r}
ex <- hourly_fips("12086", 1992, var = c("wind_speed", "wind_speed_quality"),
average_data = FALSE)
ex_data <- ex$hourly_data
head(ex_data)
```
We can replace all wind speed observations with quality codes of 2, 3, 6, or 7 with `NA`s.
```{r}
ex_data$wind_speed_quality <- as.numeric(ex_data$wind_speed_quality)
ex_data$wind_speed[ex_data$wind_speed_quality %in% c(2, 3, 6, 7)] <- NA
```
## More on the weather data
### Daily weather data
Functions in this package that pull daily weather values (`daily_fips()`, for example) are pulling data from the Daily Global Historical Climatology Network (GHCN-Daily) through NOAA's FTP server. The data is archived at the National Centers for Environmental Information (NCEI) (formerly the National Climatic Data Center (NCDC)), and spans from the 1800s to the current year.
Users can specify which weather variables they would like to pull. The five core daily weather variables are precipitation (`prcp`), snowfall (`snow`), snow depth (`snwd`), maximum temperature (`tmax`) and minimum temperature (`tmin`). The daily weather data is filtered so that included weather variables fall within a range of possible values. These ranges were chosen to include national maximum recorded values.
```{r echo = FALSE, warning = FALSE, message = FALSE}
daily_vars <- data.frame(variables = c("prcp", "snow", "snwd", "tmax", "tmin"),
                         description = c("precipitation", "snowfall", "snow depth", "maximum temperature", "minimum temperature"),
units = c("mm", "mm", "mm", "degrees Celsius", "degrees Celsius"),
most_extreme_value = c("1100 mm", "1600 mm", "11500 mm", "57 degrees C", "-62 degrees C"))
library(knitr)
kable(daily_vars, format = "markdown", col.names = c("Variable", "Description",
"Units",
"Most extreme value"))
```
`tmax`, `tmin`, and `prcp` are originally recorded in tenths of units (tenths of degrees Celsius for the temperature variables and tenths of millimeters for precipitation) and are listed as such in NOAA documentation. These values are converted to standard units (degrees Celsius and mm) in `countyweather` output.
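You can request any combination of these variables through the `var` argument. For example, here is a sketch of a call pulling precipitation along with maximum and minimum temperature (this assumes `daily_fips` accepts a vector of variable names in `var`, as `hourly_fips` does):
```{r eval = FALSE}
andrew_temp <- daily_fips(fips = "12086", date_min = "1992-08-01",
                          date_max = "1992-08-31",
                          var = c("prcp", "tmax", "tmin"))
head(andrew_temp$daily_data)
```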
There are several additional, non-core variables available. For example, `acmc` gives the "average cloudiness midnight to midnight from 30-second ceilometer data (percent)." The complete list of available weather variables can be found under 'element' from the GHCND's [readme file](http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt).
While functions in this package return cleaned and aggregated datasets, Menne et al. (2012) give more information about the raw data in the GHCN-Daily database.
### Hourly weather data
Hourly weather data in this package is pulled from NOAA's Integrated Surface Data (ISD), and is available from 1901 to the current year. The data is archived at the National Centers for Environmental Information (NCEI) (formerly the National Climatic Data Center (NCDC)), and is also pulled through NOAA's FTP server.
The seven core hourly weather variables are `wind_direction`, `wind_speed`, `ceiling_height`, `visibility_distance`, `temperature`, `temperature_dewpoint`, and `air_pressure`. Values in this table were pulled from the [ISD documentation file](ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf).
```{r echo = FALSE}
hourly_vars <- data.frame(Variable = c("wind_direction", "wind_speed",
"ceiling_height", "visibility_distance",
"temperature", "temperature_dewpoint",
"air_pressure"),
Description = c("The angle, measured in a clockwise direction, between true north and the direction from which the wind is blowing",
"The rate of horizontal travel of air past a fixed point",
"The height above ground level of the lowest cloud or obscuring phenomena layer aloft with 5/8 or more summation total sky cover, which may be predominately opaque, or the vertical visibility into a surface-based obstruction",
"The horizontal distance at which an object can be seen and identified",
"The temperature of the air",
"The temperature to which a given parcel of air must be cooled at constant pressure and water vapor content in order for saturation to occur",
"The air pressure relative to Mean Sea Level"),
Units = c("Angular Degrees", "Meters per Second",
"Meters", "Meters", "Degrees Celsuis",
"Degrees Celsius", "Hectopascals"),
Minimum = c("1", "0", "0", "0", "-93.2", "-98.2", "860"),
Maximum = c("360", "90", "22000 (indicates 'Unlimited')", "160000", "61.8",
"36.8", "1090"))
```
```{r echo = FALSE, warning = FALSE, message = FALSE}
library(knitr)
kable(hourly_vars, format = "markdown")
```
There are other columns available in addition to these weather variables, such as quality codes (e.g., `wind_direction_quality` --- each of the main weather variables has a corresponding quality code that can be pulled by adding `_quality` to the end of the variable name).
For more information about the weather variables described in the above table and other available columns, see the [ISD documentation file](ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf).
## Error and warning messages you may get
### Not able to pull data from a weather station
The following warning message will come up after running functions that pull daily data if data is not available (for your specified date range, coverage, and weather variables) for a particular weather station or stations:
```{r eval = FALSE}
In rnoaa::meteo_pull_monitors(monitors = stations, keep_flags = FALSE, :
The following stations could not be pulled from the GHCN ftp:
USR0000FTEN
Any other monitors were successfully pulled from GHCN.
```
The following error message will come up after running functions pulling hourly data (`hourly_fips()`) if there isn't available data for any of the weather stations in your specified county. Note: some weather variables tend to be missing more often than others.
```{r eval = FALSE}
Error in isd_monitors_data(fips = fips, year = x, var = var, radius = radius) :
None of the stations had available data.
```
The following error message will come up after running `write_daily_timeseries` or `write_hourly_timeseries` if the function is unable to pull data for a particular FIPS code in your `fips` vector:
```{r eval = FALSE}
Unable to pull weather data for FIPS code "12086" for the specified percent coverage, year(s), and/or weather variables.
```
### Need an API key for NOAA data
If you run functions that use NOAA API calls without first requesting an API key from NOAA and setting up the key in your R session, you will see the following error message:
```{r eval = FALSE}
Error in getOption("noaakey", stop("need an API key for NOAA data")) :
need an API key for NOAA data
```
If you get this error message, run the code:
```{r}
options("noaakey" = Sys.getenv("noaakey"))
```
and then try again. If you still get an error, you may not have set up your NOAA API key correctly in your `.Renviron` file. See the "Required set-up" section of this document for more details on doing that correctly.
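One quick check is to confirm that the key is actually visible in your R session; if the following call returns an empty string, R is not finding the key in your `.Renviron` file:
```{r eval = FALSE}
Sys.getenv("noaakey")
```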
### NOAA web services down
Sometimes, some of NOAA's web services will be off-line. In this case, you may get an error message like the following when you try to pull data:
```
Error in gzfile(file, mode) : cannot open the connection
```
In this case, try again in a new R session. If you're still getting this error, wait a few hours and then try again.
### Other errors
If you get other error messages or run into problems with this package, please submit a reproducible example on this repository's Issues page.
## References
Menne, Matthew J, Imke Durre, Russell S Vose, Byron E Gleason, and Tamara G Houston. 2012. “An Overview of the Global Historical Climatology Network-Daily Database.” Journal of Atmospheric and Oceanic Technology 29 (7): 897–910.