Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add in english names to jpnprefs dataset #21

Merged
merged 2 commits into from
Sep 20, 2018
Merged

Conversation

Ryo-N7
Copy link
Contributor

@Ryo-N7 Ryo-N7 commented Sep 20, 2018

When working with the jpndistrict package I often find myself needing the English names of the prefectures, capitals, regions, etc. For the convenience of other non-Japanese users of this package I thought I would add in the English names for the prefectures, prefecture capital, region, and major island taken from Wikipedia to the "jpnprefs" dataset. I have also updated the tests as well.

Please review the changes at your convenience, thanks!

@uribo uribo self-requested a review September 20, 2018 10:03
@@ -14,6 +14,10 @@ library(tidyverse)
# dplyr # 0.7.6
# tidyr # 0.8.1
# purrr # 0.2.5
# (stringr) # 1.3.1

library(polite) # 0.0.0.9004
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we introduce the polite package?
This package is certainly useful, but has not yet been registered with CRAN.

Copy link
Contributor Author

@Ryo-N7 Ryo-N7 Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I suppose it's not entirely necessary to use this package for now. We're only scraping from Wikipedia anyways. I use it as part of my workflow but I understand from a package development/maintenance point of view that it's not necessary.

We can just replace it with the regular rvest code instead:

url <- "https://en.wikipedia.org/wiki/Prefectures_of_Japan"
jpn_pref_raw <- read_html(url) %>% 
    html_nodes("table.wikitable:nth-child(49)") %>% 
    html_table() %>% 
    purrr::flatten_df()

url2 <- "https://en.wikipedia.org/wiki/List_of_Japanese_prefectures_by_population"
jpn_pref2_raw <- read_html(url) %>% 
  html_nodes("table.wikitable:nth-child(7)") %>% 
  html_table() %>% 
  purrr::flatten_df()

purrr::flatten_df()

jpn_pref_df <- jpn_pref_raw %>%
janitor::clean_names() %>%
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel motivated to use janitor for this process.
Is it possible to change to a method using dplyr::select() which explicitly selects and renames a variable?

Copy link
Contributor Author

@Ryo-N7 Ryo-N7 Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure! I'll do it similar to how you set the names for the Japanese table using set_colnames():

jpn_pref_df <- jpn_pref_raw %>% 
  select(2, 4, 5) %>% 
  set_colnames(c("kanji", "region_en", "major_island_en")) %>% 
  mutate(region_en = region_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT")) 

purrr::flatten_df()

jpn_pref2_df <- jpn_pref2_raw %>%
janitor::clean_names() %>%
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jpn_pref2_df <- jpn_pref2_raw %>% 
  select(3, 2, 4) %>% 
  set_colnames(c("kanji", "prefecture_en", "capital_en")) %>% 
  mutate(prefecture_en = prefecture_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"),
         capital_en = capital_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))

@uribo uribo merged commit 72f0ab1 into uribo:master Sep 20, 2018
@uribo
Copy link
Owner

uribo commented Sep 20, 2018

Thanks your contribution :) 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants