Update the storms dataset #6320
Hi Steve! Quick suggestion: add the other status codes to the line below, so that if someone does want to keep the filter commented out, they won't have to go back and clean the status for those. Thanks again!
status = factor(recode(status, "HU" = "hurricane", "TS" = "tropical storm", "TD" = "tropical depression"))
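A rough sketch of that suggestion, written in base R so it runs without dplyr. The nine codes are the status options listed in the HURDAT2 format description linked further down in this thread. The labels for the non-tropical codes and the names `status_labels` and `raw_status` are illustrative assumptions, not necessarily what the dataset uses.

```r
# Map every HURDAT2 status code to a readable label.
# Labels for the non-tropical codes are illustrative choices.
status_labels <- c(
  TD = "tropical depression",
  TS = "tropical storm",
  HU = "hurricane",
  EX = "extratropical",
  SD = "subtropical depression",
  SS = "subtropical storm",
  LO = "other low",
  WV = "tropical wave",
  DB = "disturbance"
)

# Hypothetical raw codes, e.g. as parsed from a HURDAT2 file.
raw_status <- c("HU", "SS", "LO", "TS", "EX", "DB")

# Named-vector lookup; unname() drops the codes carried over as names.
# Explicit levels keep all nine categories even if some are absent.
status <- factor(unname(status_labels[raw_status]),
                 levels = unname(status_labels))
print(status)
```

With dplyr loaded, the same lookup vector can be spliced into the original line as `recode(status, !!!status_labels)`, which keeps the mapping in one place.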
Trying to figure out what's going on with storm categorization. Is it a bug in my parser, or an inconsistency in NOAA's categorization? These records should be classified as hurricanes (winds > 64 knots), but they are subtropical storms, tropical storms, or lows:
> storms %>%
+ filter(category > 0, !(status %in% c("hurricane", "EX"))) %>%
+ select(name, year, month, day, hour, lat, long, status, wind, pressure)
# A tibble: 6 × 10
name year month day hour lat long status wind pressure
<chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <int> <int>
1 AL091968 1968 9 20 12 35.5 -49.5 SS 75 976
2 AL091968 1968 9 21 12 39.6 -44.7 SS 65 982
3 AL181979 1979 10 24 18 40.5 -62 SS 65 985
4 EMILY 2005 7 20 18 25 -98.7 tropical storm 70 975
5 DORIAN 2019 9 7 18 42.8 -64.6 LO 80 954
6 DORIAN 2019 9 8 0 45.2 -62.9 LO 80 956
I don't think it's either one; I think the data is likely correct. Subtropical storms, extratropical storms, lows, and disturbances can all have high wind intensity, but that doesn't make them hurricanes. A storm must first be determined to be a tropical cyclone before it can rise to the level of a hurricane (based on wind speed/intensity).

Definitions of the storm types are here:
https://www.nhc.noaa.gov/aboutgloss.shtml
https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-atl-1851-2021.pdf

The HURDAT2 format description says:
HU (Spaces 20-21, before 4th comma) – Status of system. Options are:

So if a storm does not meet the requirements to be classified as a tropical cyclone, it will never have a status of hurricane, regardless of wind speed. For example, you've probably experienced winds of 34-47 knots; that's a gale, unless the wind was associated with a storm that had already been determined to be a tropical cyclone (based on criteria besides wind). Or you may have been in a winter nor'easter (an extratropical storm that can have winds over 65 knots but isn't a hurricane). I don't know the measurements behind it, but from the definitions and the link below, it appears that multiple characteristics/metrics are used to determine whether a storm is a tropical cyclone.

Category > 0 stumped me at first too, but then I realized the categories are based on a wind scale. Storms that aren't tropical cyclones also get categories assigned from wind in the data. At first glance the categories appear to coincide with the Saffir-Simpson Hurricane Wind Scale, but I don't know that for sure. Having followed weather reports closely as a sailor, though, I've never heard NOAA refer to a category 1 gale or a category 1 nor'easter, so I think NOAA only uses categories to describe hurricanes. https://www.nhc.noaa.gov/aboutsshws.php

(See my second comment below; I just realized the category data did not come from the original file.)
luis.df <- storms %>%
47 | 1995 | Luis | 2 | hurricane | 95
53 | 1995 | Luis | 2 | EX | 85

Hurricane Luis, for example, was an extratropical storm at some point, with wind categories 1, 2, and 3, but the storm's other characteristics at that time no longer met the criteria for a tropical cyclone, even though its winds were often higher than when it was one. So the takeaway is: don't filter the data by category expecting to get only hurricanes.
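To illustrate that takeaway, here is a small base R sketch built from rows copied out of the outputs quoted in this thread (statuses as printed there). It uses a wind threshold of 64 knots, the Saffir-Simpson category 1 cutoff, as a stand-in for `category > 0`; that equivalence is an assumption about how the column was derived. The data frame `obs` is hypothetical test data, not the full storms dataset.

```r
# A few rows copied from the outputs quoted in this thread.
obs <- data.frame(
  name   = c("AL091968", "AL181979", "EMILY", "DORIAN", "Luis", "Luis"),
  year   = c(1968, 1979, 2005, 2019, 1995, 1995),
  status = c("SS", "SS", "tropical storm", "LO", "hurricane", "EX"),
  wind   = c(75, 65, 70, 80, 95, 85),
  stringsAsFactors = FALSE
)

# Wind-based selection (stand-in for "category > 0"):
# keeps every row, whatever NOAA classified it as.
by_wind <- subset(obs, wind >= 64)

# Status-based selection: only rows NOAA classified as hurricanes.
by_status <- subset(obs, status == "hurricane")

print(by_wind$status)
print(by_status$status)
```

Filtering on `status` rather than on a wind-derived category is what actually restricts the data to hurricanes.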
Actually, looking at the original HURDAT2 format file linked above, I just realized that category is not a column coming from NOAA. Since it is calculated as part of dplyr's data preparation, maybe only add the category for tropical depressions, tropical storms, and hurricanes, and leave the other storm types as NA? I'm not sure what the purpose of the -1 and 0 categories for tropical depressions and tropical storms is, since the Saffir-Simpson Hurricane Wind Scale starts at 1, for hurricanes. If you decide to use category only for hurricanes, all the other storm statuses could be 0.
Yes, category is calculated from wind speed. I've made that clearer in the docs and set it to NA for everything that's not a hurricane.
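A minimal base R sketch of that rule, assuming the Saffir-Simpson thresholds in knots published by NHC (category 1 from 64 kt, 2 from 83, 3 from 96, 4 from 113, 5 from 137). It is not necessarily the exact code in the dataset script, and the function name `hurricane_category` is hypothetical.

```r
# Saffir-Simpson category from wind speed (knots); NA unless the
# status is "hurricane".
hurricane_category <- function(wind, status) {
  # Lower bounds of categories 1-5. right = FALSE closes each interval
  # on the left, e.g. [64, 83) is category 1. Winds below 64 fall
  # outside the breaks, so cut() returns NA for them.
  cat_num <- cut(wind,
                 breaks = c(64, 83, 96, 113, 137, Inf),
                 labels = FALSE,
                 right = FALSE)
  # Non-hurricane rows get NA regardless of wind speed.
  cat_num[status != "hurricane"] <- NA
  cat_num
}

res <- hurricane_category(
  wind   = c(95, 85, 65, 150, 40),
  status = c("hurricane", "EX", "SS", "hurricane", "tropical storm")
)
print(res)
```

Here the extratropical (85 kt) and subtropical (65 kt) rows get NA even though their winds would map to categories 2 and 1, which is exactly the distinction discussed above.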
R/data-storms.R
#' }
#' @examples
#'
#' # show a plot of the storm paths
#' # show a plot of the storm paths in 1975 or later
Why 1975?
The facets get too squished in the figure if too many years are included. Is there a way to make the figure bigger in the docs? https://dplyr.tidyverse.org/reference/storms.html
Added a little comment explaining that in the example
R/data-storms.R
#' ggplot(storms) +
#' storms %>%
#' filter(year >= 1975) %>%
#' ggplot() +
#' ggplot() +
#' ggplot() +
And maybe put the aes() in the ggplot() call?
The aes on its own line is just my one-thing-per-line code style. Feel free to change it though. Commit coming soon...
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
@steveharoz do you want to finish this off?
@hadley Yeah. I'll finish it later this week.
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
Co-authored-by: Davis Vaughan <davis@rstudio.com>
Thanks for the update! I think the last question to resolve is whether it's worthwhile to include the rows prior to 1975. I'm worried that this has a high likelihood of breaking existing graphics for little additional gain, so it's probably safer not to include the historical data here.
@hadley Yeah, I see the benefit of only having the clean and more complete data.
Thanks! I did a couple more docs tweaks because I realised that this is the perfect place to use inline R code.
Good call on the inline R!
(closes #6319)
Point 3 might be worth discussing. Whoever originally added the dataset to dplyr dropped storms before 1975. I've been doing the same since I've been updating it, but I haven't seen a clear rationale. Considerations for adding the early data: