Add English names to jpnprefs dataset #21
Changes from 1 commit
@@ -14,6 +14,10 @@ library(tidyverse)
 # dplyr # 0.7.6
 # tidyr # 0.8.1
 # purrr # 0.2.5
 # (stringr) # 1.3.1

+library(polite) # 0.0.0.9004

 # Japanese ----------------------------------------------------------------
@@ -22,7 +26,7 @@ x <-

 df <-
 x %>%
-html_nodes(css = "#mw-content-text > div > table.wikitable.sortable") %>%
+html_nodes(css = "table.wikitable:nth-child(104)") %>% # css to correct table as wiki page was edited
 html_table(fill = TRUE) %>%
 purrr::flatten_df() %>%
 select(2, 4, 6, 11) %>%
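The `table.wikitable:nth-child(104)` selector matches a wikitable only when it is the 104th child of its parent element, so it will break again the next time an earlier section of the page is edited. One possibly sturdier option, sketched below and assuming rvest (0.3.x or 1.x) is installed, is to pick the table whose header cells contain known column names. The helper `find_table_by_header()` and the two-table page are hypothetical stand-ins, not part of this PR.

```r
library(rvest)

# Return the first table matched by `css` whose <th> cells include every
# name in `required`. Matching on header text instead of element position
# keeps working when unrelated content is added to or removed from the page.
find_table_by_header <- function(page, required, css = "table.wikitable") {
  tables <- html_nodes(page, css)
  matches <- vapply(seq_along(tables), function(i) {
    headers <- html_text(html_nodes(tables[[i]], "th"), trim = TRUE)
    all(required %in% headers)
  }, logical(1))
  if (!any(matches)) {
    stop("No table has all of these headers: ", paste(required, collapse = ", "))
  }
  html_table(tables[[which(matches)[1]]])
}

# Hypothetical two-table page standing in for the real Wikipedia article;
# read_html() treats a string containing markup as literal HTML.
page <- read_html('
  <table class="wikitable"><tr><th>Year</th><th>Event</th></tr>
    <tr><td>1868</td><td>Meiji Restoration</td></tr></table>
  <table class="wikitable"><tr><th>Prefecture</th><th>Capital</th></tr>
    <tr><td>Hokkaido</td><td>Sapporo</td></tr></table>')

prefs <- find_table_by_header(page, c("Prefecture", "Capital"))
```

Against the live page, `page` would be the parsed article and `required` whichever header names the Japanese table actually uses.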
@@ -92,10 +96,54 @@ jpnprefs %<>%
 select(jis_code, prefecture, capital, region, major_island, capital_latitude = latitude, capital_longitude = longitude) %>%
 as_tibble()

+# ---- English region and island names
+url <- "https://en.wikipedia.org/wiki/Prefectures_of_Japan"
+
+session <- bow(url)
+
+jpn_pref_raw <- scrape(session) %>%
+html_nodes("table.wikitable:nth-child(49)") %>%
+#.[[1]] %>%
+html_table() %>%
+purrr::flatten_df()
+
+jpn_pref_df <- jpn_pref_raw %>%
+janitor::clean_names() %>%
Review comment: I do not feel motivated to use janitor for this process.

Reply: Yeah, sure! I'll do it similarly to how you set the names for the Japanese table, using:

    jpn_pref_df <- jpn_pref_raw %>%
      select(2, 4, 5) %>%
      set_colnames(c("kanji", "region_en", "major_island_en")) %>%
      mutate(region_en = region_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
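Selecting columns by position, as in the reply, drops the janitor dependency, but nothing fails if Wikipedia later reorders the columns. A minimal sketch of one alternative to `set_colnames()`, assuming dplyr >= 0.7: `select()` accepts `new_name = position`, so selecting and renaming happen in a single step, and a `stopifnot()` guard on the source headers catches a reordered table early. The stand-in tibble and its header names are hypothetical, not the real Wikipedia headers.

```r
library(dplyr)

# Hypothetical stand-in for jpn_pref_raw; headers and values are illustrative.
jpn_pref_raw <- tibble::tibble(
  Prefecture = "Hokkaido",
  Kanji = "\u5317\u6d77\u9053",   # 北海道
  Capital = "Sapporo",
  Region = "Hokkaid\u014d",       # Hokkaidō
  `Major island` = "Hokkaido"
)

# Guard: stop early if the columns at the chosen positions are not the
# expected ones (assumed header names).
stopifnot(identical(names(jpn_pref_raw)[c(2, 4, 5)],
                    c("Kanji", "Region", "Major island")))

# Select and rename by position in one call, without janitor or magrittr.
jpn_pref_df <- jpn_pref_raw %>%
  select(kanji = 2, region_en = 4, major_island_en = 5)
```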
+select(kanji, region_en = region, major_island_en = major_island) %>%
+mutate(region_en = region_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
+
+# ---- English prefecture and capital names
+url2 <- "https://en.wikipedia.org/wiki/List_of_Japanese_prefectures_by_population"
+
+session2 <- bow(url2)
+
+jpn_pref2_raw <- scrape(session2) %>%
+html_nodes("table.wikitable:nth-child(7)") %>%
+#.[[1]] %>%
+html_table() %>%
+purrr::flatten_df()
+
+jpn_pref2_df <- jpn_pref2_raw %>%
+janitor::clean_names() %>%
Review comment: Same as above.

Reply:

    jpn_pref2_df <- jpn_pref2_raw %>%
      select(3, 2, 4) %>%
      set_colnames(c("kanji", "prefecture_en", "capital_en")) %>%
      mutate(prefecture_en = prefecture_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"),
             capital_en = capital_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
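The reply repeats the same `iconv()` call for each column. A small sketch, assuming dplyr >= 0.7 where `mutate_at()` is available: wrap the conversion in a helper and apply it to several columns at once. `to_ascii()` and the one-row tibble are hypothetical. The exact transliteration produced by `"ASCII//TRANSLIT"` depends on the platform's iconv (glibc in a UTF-8 locale typically turns "Tōkyō" into "Tokyo", other implementations may emit "?"), so the sketch only checks that no value came back `NA`.

```r
library(dplyr)

# Convert UTF-8 text to ASCII, transliterating where the platform's iconv can.
# iconv() returns NA for elements it cannot convert.
to_ascii <- function(x) iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")

# Hypothetical stand-in for the renamed jpn_pref2 table.
jpn_pref2_df <- tibble::tibble(
  kanji = "\u6771\u4eac\u90fd",        # 東京都
  prefecture_en = "T\u014dky\u014d",   # Tōkyō
  capital_en = "Shinjuku"
)

# One mutate_at() call instead of one iconv() per column.
jpn_pref2_df <- jpn_pref2_df %>%
  mutate_at(vars(prefecture_en, capital_en), to_ascii)

# Fail loudly if any value could not be converted.
stopifnot(!anyNA(jpn_pref2_df$prefecture_en), !anyNA(jpn_pref2_df$capital_en))
```

Adding another English column later then only means extending the `vars()` list.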
+select(kanji = japanese, prefecture_en = prefectures, capital_en = capital) %>%
+mutate(prefecture_en = prefecture_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"),
+capital_en = capital_en %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT"))
+
+# ---- Join with jpnprefs
+jpnprefs <- jpnprefs %>%
+left_join(jpn_pref_df, by = c("prefecture" = "kanji")) %>%
+left_join(jpn_pref2_df, by = c("prefecture" = "kanji")) %>%
+select(jis_code, prefecture, capital, region, major_island,
+prefecture_en, capital_en, region_en, major_island_en,
+capital_latitude, capital_longitude) %>%
+as_tibble()

 expect_named(jpnprefs,
-c("jis_code", "prefecture", "capital", "region", "major_island", "capital_latitude", "capital_longitude"))
+c("jis_code", "prefecture", "capital", "region", "major_island",
+"prefecture_en", "capital_en", "region_en", "major_island_en",
+"capital_latitude", "capital_longitude"))
 expect_equal(dim(jpnprefs),
-c(47, 7))
+c(47, 11))
 expect_s3_class(jpnprefs,
 c("data.frame", "tbl_df"))
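`left_join()` keeps every row of `jpnprefs` and fills the English columns with `NA` whenever a kanji key finds no match (for instance, if the two Wikipedia tables spell a prefecture differently), so the `dim()` expectation would still pass. One way to catch that, sketched here with hypothetical two-row tables, is an `anti_join()` on the same key; in the real script the corresponding check would be that it returns zero rows.

```r
library(dplyr)

# Hypothetical miniature versions of the tables being joined.
jpnprefs <- tibble::tibble(
  prefecture = c("\u5317\u6d77\u9053", "\u6771\u4eac\u90fd")  # 北海道, 東京都
)
english <- tibble::tibble(
  kanji = "\u5317\u6d77\u9053",   # 北海道 only; 東京都 is deliberately missing
  prefecture_en = "Hokkaido"
)

# Rows of jpnprefs with no English counterpart under the same join key.
unmatched <- anti_join(jpnprefs, english, by = c("prefecture" = "kanji"))

# The join itself silently yields NA for those rows.
joined <- left_join(jpnprefs, english, by = c("prefecture" = "kanji"))

if (nrow(unmatched) > 0) {
  message("No English names for: ", paste(unmatched$prefecture, collapse = ", "))
}
```

The opposite failure, a duplicated kanji key in an English table, would add rows instead and is already caught by the `expect_equal(dim(jpnprefs), c(47, 11))` check.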
Review comment: Why should we introduce the polite package? This package is certainly useful, but it has not yet been registered with CRAN.

Reply: Yeah, I suppose it's not entirely necessary to use this package for now; we're only scraping from Wikipedia anyway. I use it as part of my workflow, but I understand that from a package development and maintenance point of view it's not necessary.
We can just replace it with the regular rvest code instead:
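One possible shape for the plain-rvest version is sketched below; `read_wiki_table()` is a hypothetical helper, not code from this PR. It reads the page with `read_html()` in place of `bow()` and `scrape()`, and checks that the selector matched exactly one table instead of relying on `purrr::flatten_df()`. What it gives up is polite's robots.txt check and request delay, which matter little for a couple of Wikipedia requests.

```r
library(rvest)

# Read a page (URL or literal HTML) and return the single table matched by
# `css`. Stops if the selector matches zero or several tables, which is what
# happens when a positional selector drifts after a page edit.
read_wiki_table <- function(x, css) {
  tables <- html_nodes(read_html(x), css)
  if (length(tables) != 1) {
    stop("Expected exactly one table for '", css, "', found ", length(tables))
  }
  html_table(tables[[1]])
}

# Usage against the live page (requires network access):
# jpn_pref_raw <- read_wiki_table(
#   "https://en.wikipedia.org/wiki/Prefectures_of_Japan",
#   "table.wikitable:nth-child(49)"
# )
```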