-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add in english names to jpnprefs dataset #21
Conversation
data-raw/jpnprefs.R
Outdated
@@ -14,6 +14,10 @@ library(tidyverse) | |||
# dplyr # 0.7.6 | |||
# tidyr # 0.8.1 | |||
# purrr # 0.2.5 | |||
# (stringr) # 1.3.1 | |||
|
|||
library(polite) # 0.0.0.9004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we introduce the polite package?
This package is certainly useful, but has not yet been registered with CRAN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I suppose it's not entirely necessary to use this package for now. We're only scraping from Wikipedia anyways. I use it as part of my workflow but I understand from a package development/maintenance point of view that it's not necessary.
We can just replace it with the regular rvest code instead:
url <- "https://en.wikipedia.org/wiki/Prefectures_of_Japan"
jpn_pref_raw <- read_html(url) %>%
html_nodes("table.wikitable:nth-child(49)") %>%
html_table() %>%
purrr::flatten_df()
url2 <- "https://en.wikipedia.org/wiki/List_of_Japanese_prefectures_by_population"
jpn_pref2_raw <- read_html(url) %>%
html_nodes("table.wikitable:nth-child(7)") %>%
html_table() %>%
purrr::flatten_df()
data-raw/jpnprefs.R
Outdated
purrr::flatten_df() | ||
|
||
jpn_pref_df <- jpn_pref_raw %>% | ||
janitor::clean_names() %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not feel motivated to use janitor for this process.
Is it possible to change to a method using dplyr::select()
which explicitly selects and renames a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sure! I'll do it similar to how you set the names for the Japanese table using set_colnames()
:
jpn_pref_df <- jpn_pref_raw %>%
select(2, 4, 5) %>%
set_colnames(c("kanji", "region_en", "major_island_en")) %>%
mutate(region_en = region_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
data-raw/jpnprefs.R
Outdated
purrr::flatten_df() | ||
|
||
jpn_pref2_df <- jpn_pref2_raw %>% | ||
janitor::clean_names() %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jpn_pref2_df <- jpn_pref2_raw %>%
select(3, 2, 4) %>%
set_colnames(c("kanji", "prefecture_en", "capital_en")) %>%
mutate(prefecture_en = prefecture_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"),
capital_en = capital_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
Thanks your contribution :) 🎉 |
When working with the jpndistrict package I often find myself needing the English names of the prefectures, capitals, regions, etc. For the convenience of other non-Japanese users of this package I thought I would add in the English names for the prefectures, prefecture capital, region, and major island taken from Wikipedia to the "jpnprefs" dataset. I have also updated the tests as well.
Please review the changes at your convenience, thanks!