weather uses two timezones - not clear which matches flights #19
The weather data is in UTC, which I think is standard. In the code that generates "time_hour" in weather.r, the time zone is not specified, which makes me think it picks up whatever local time zone offset was in place when the package was built (from ?ISOdatetime: "‘""’ is the current time zone").
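A minimal base R sketch of the behaviour described above, assuming only that `ISOdatetime()` follows its documented default of `tz = ""`. The variable names are illustrative and are not taken from weather.r.

```r
# With no tz argument, ISOdatetime() interprets the components in the
# session's current time zone, so the result depends on the machine.
local_time <- ISOdatetime(2013, 1, 1, 5, 0, 0)

# With tz given explicitly, the result is the same on every machine.
utc_time <- ISOdatetime(2013, 1, 1, 5, 0, 0, tz = "UTC")
ny_time  <- ISOdatetime(2013, 1, 1, 5, 0, 0, tz = "America/New_York")

# The tzone attribute records which zone was used.
print(attr(utc_time, "tzone"))
print(attr(ny_time, "tzone"))

# The same wall-clock components are different instants in different zones.
offset_hours <- as.numeric(difftime(ny_time, utc_time, units = "hours"))
print(offset_hours)
#> [1] 5
```

Passing `tz` explicitly is what removes the dependence on the build machine's locale.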
It looks like
flights <- mutate(flights, time_hour = make_datetime(year, month, day, hour, tz = "America/New_York"))
weather <- mutate(weather, time_hour = make_datetime(year, month, day, hour, tz = "UTC"))
produces values that seem to match up properly on a join.
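Why the two different `tz` values can line up on a join may be easier to see with single values. The sketch below assumes lubridate is installed. The weather hour of 10 is a hypothetical value chosen to coincide with a 5 a.m. New York flight hour, not a row taken from the data.

```r
library(lubridate)

# Flight scheduled in hour 5, interpreted as New York local time.
flight_hour <- make_datetime(2013, 1, 1, 5, tz = "America/New_York")

# Hypothetical weather observation in hour 10, interpreted as UTC.
weather_hour <- make_datetime(2013, 1, 1, 10, tz = "UTC")

# POSIXct values compare by the underlying instant, not the printed clock
# time, so 05:00 EST and 10:00 UTC are the same key.
same_instant <- flight_hour == weather_hour
print(same_instant)
#> [1] TRUE
```

A join on `time_hour` relies on this equality of instants, which is why each column must be built with the zone its components were actually recorded in.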
Let's put everything on the table with the following reprex:
# devtools::install_github("hadley/nycflights13")
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(nycflights13))
flights %>%
select(year, month, day, hour, time_hour) %>%
slice(1)
#> # A tibble: 1 x 5
#> year month day hour time_hour
#> <int> <int> <int> <dbl> <dttm>
#> 1 2013 1 1 5 2013-01-01 05:00:00
flights$time_hour[1]
#> [1] "2013-01-01 05:00:00 UTC"
weather %>%
select(year, month, day, hour, time_hour) %>%
slice(1)
#> # A tibble: 1 x 5
#> year month day hour time_hour
#> <dbl> <dbl> <int> <int> <dttm>
#> 1 2013 1 1 0 2012-12-31 19:00:00
weather$time_hour[1]
#> [1] "2012-12-31 19:00:00 EST"
weather %>%
filter(origin == "EWR", month == 6) %>%
ggplot(aes(x = hour, y = temp)) +
geom_point() +
geom_smooth() +
labs(title = "June 2013 hourly temperatures at EWR") +
geom_vline(xintercept = 15, col = "red", size = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2018-06-06 by the reprex package (v0.2.0).
Question for @hadley: What do you think the timezone/output of
I think both the time zones for both datasets should be
@rudeboybert do you want to apply @ltierney's fix in your PR? Or should we close that one and start anew?
I'll close #23 and start anew sometime next week.
library(nycflights13)
flights[1, c("year", "month", "day", "hour", "time_hour")]
#> # A tibble: 1 x 5
#> year month day hour time_hour
#> <int> <int> <int> <dbl> <dttm>
#> 1 2013 1 1 5 2013-01-01 05:00:00
flights$time_hour[1]
#> [1] "2013-01-01 05:00:00 EST"
attr(flights$time_hour, "tzone")
#> [1] "America/New_York"
weather[1, c("year", "month", "day", "hour", "time_hour")]
#> # A tibble: 1 x 5
#> year month day hour time_hour
#> <dbl> <dbl> <int> <int> <dttm>
#> 1 2013 1 1 1 2013-01-01 01:00:00
weather$time_hour[1]
#> [1] "2013-01-01 01:00:00 EST"
attr(weather$time_hour, "tzone")
#> [1] "America/New_York"
Created on 2018-06-20 by the reprex package (v0.2.0).
I plan to submit to CRAN in one week (July 27), so I'd really appreciate it if someone could double-check my work and let me know if I've missed anything.
Looks good on my end. Thanks!
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(nycflights13))
# Correct EST Standard vs EDT Daylight Savings (2013-03-10 thru 2013-11-03) for weather
weather$time_hour[1]
#> [1] "2013-01-01 01:00:00 EST"
weather$time_hour[13000]
#> [1] "2013-06-29 06:00:00 EDT"
# Correct EST Standard vs EDT Daylight Savings (2013-03-10 thru 2013-11-03) for flights
flights$time_hour[1]
#> [1] "2013-01-01 05:00:00 EST"
flights$time_hour[150000]
#> [1] "2013-03-15 17:00:00 EDT"
# Roughly hottest point of the day corresponds to 3pm
weather %>%
filter(origin == "EWR", month == 6) %>%
ggplot(aes(x = hour, y = temp)) +
geom_point() +
geom_smooth() +
labs(title = "June 2013 hourly temperatures at EWR") +
geom_vline(xintercept = 15, col = "red", size = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2018-06-20 by the reprex package (v0.2.0).
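The EST/EDT check in the reprex above can also be reproduced without the package. This is a small base R sketch that assumes the system time zone database knows "America/New_York". The two timestamps are copied from the printed output, and the variable names are illustrative.

```r
# Two timestamps from the reprex output, both in America/New_York.
winter <- as.POSIXct("2013-01-01 01:00:00", tz = "America/New_York")
summer <- as.POSIXct("2013-06-29 06:00:00", tz = "America/New_York")

# %z prints the UTC offset in effect at each instant.
winter_offset <- format(winter, "%z")
summer_offset <- format(summer, "%z")
print(winter_offset)
#> [1] "-0500"
print(summer_offset)
#> [1] "-0400"

# %Z prints the zone abbreviation, which switches with daylight saving time.
print(format(winter, "%Z"))
#> [1] "EST"
print(format(summer, "%Z"))
#> [1] "EDT"
```

A fixed-offset zone would print the same offset for both dates, so this is a quick way to confirm that a column carries a named zone with daylight saving rules.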
This looks good to me, thanks for revising this.
In weather, the time_hour variable is offset by five hours from the time displayed across the year, month, day, and hour variables. It is not clear which time matches the times in flights (where year, month, day, hour, and time_hour all agree). Given the offset, it is possible that time_hour is in the America/New_York timezone and the other variables are in UTC.
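The five-hour offset described above is consistent with that hypothesis, as the following base R sketch suggests. It assumes the component columns hold UTC wall-clock values and uses the first weather row from the earlier reprex (2013-01-01, hour 0). The variable names are illustrative.

```r
# Components of the first weather row, read as UTC (an assumption).
utc_instant <- ISOdatetime(2013, 1, 1, 0, 0, 0, tz = "UTC")

# The same instant displayed in New York local time.
ny_display <- format(utc_instant, "%Y-%m-%d %H:%M:%S %Z",
                     tz = "America/New_York")
print(ny_display)
#> [1] "2012-12-31 19:00:00 EST"

# Reading the same components as New York wall-clock time instead
# yields an instant five hours later.
ny_instant <- ISOdatetime(2013, 1, 1, 0, 0, 0, tz = "America/New_York")
offset_hours <- as.numeric(difftime(ny_instant, utc_instant, units = "hours"))
print(offset_hours)
#> [1] 5
```

The displayed value matches the `2012-12-31 19:00:00` that the first reprex printed for `weather$time_hour[1]`.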