This repository has been archived by the owner on May 10, 2022. It is now read-only.

downloads data but gets columns wrong #6

Open
ellisp opened this issue Oct 26, 2017 · 4 comments

@ellisp
Collaborator

ellisp commented Oct 26, 2017

For example, the import mixes id with date and cause with location; it doesn't recognise them as separate columns:

library(datagovau)
library(dplyr)
res <- search_data("name:fire", limit = 20)
res %>% filter(can_use == "yes") %>% slice(3) %>% get_data() %>% View()
@ellisp
Collaborator Author

ellisp commented Oct 26, 2017

This looks tricky. Basically rio::import() doesn't parse everything as well as it should. To help, I've let the user add arguments to show_data that go through to import(), but in the end, not all data is easy to import...

@HughParsonage
Collaborator

HughParsonage commented Oct 26, 2017

Yes, that particular file is quite strange. Every second row is blank and the download process has to try it multiple times.

Also, data.table complains, not unreasonably, about this line:

856647,2009-05-01 04:34:00,20 ADELAIDE,FIP - NORMAL ON ARRIVAL, LINE FAULT/OPEN LINE

The comma after NORMAL ON ARRIVAL is part of the field, not a separator. But the field isn't quoted. readr::read_csv doesn't error, but also discards information (with stern warnings). base::read.csv gets the closest, but would require manual work.
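
To make the manual work concrete, here is a minimal base R sketch, not anything datagovau does. It assumes the file has four columns and that any extra commas belong to the last one; the second input line and the column names are made up for illustration. `count.fields()` flags the line with too many separators, and a small hypothetical helper, `split_line()`, glues the surplus pieces back into the final field.

```r
# The problem line quoted above, plus a hypothetical well-formed line for contrast
lines <- c(
  "856647,2009-05-01 04:34:00,20 ADELAIDE,FIP - NORMAL ON ARRIVAL, LINE FAULT/OPEN LINE",
  "856648,2009-05-01 05:10:00,20 ADELAIDE,ALARM SYSTEM FAULT"
)

# Count comma-separated fields per line; quoting is switched off because
# the file does not quote its fields
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)

n_cols <- 4  # assumption: id, date, location, cause

# Hypothetical helper: keep the first n_cols - 1 pieces and paste everything
# after them back together, restoring the commas inside the last field.
# It assumes every line has at least n_cols pieces.
split_line <- function(x, n_cols) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  c(parts[seq_len(n_cols - 1)],
    paste(parts[n_cols:length(parts)], collapse = ","))
}

fixed <- as.data.frame(
  do.call(rbind, lapply(lines, split_line, n_cols = n_cols)),
  stringsAsFactors = FALSE
)
names(fixed) <- c("id", "date", "location", "cause")  # hypothetical names
```

This only works when the stray commas are known to sit in the last column; a file with unquoted commas in a middle column would still need a case-by-case fix.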

@HughParsonage
Collaborator

Perhaps the best we can do is gracefully error in cases like this, with a prayer to the end-user wishing the best of luck manually parsing it.
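
A graceful error could look roughly like the sketch below. `import_or_stop()` is a hypothetical wrapper, not a function in datagovau or rio; it takes the import function as an argument (so `rio::import` could presumably be swapped in), forwards extra arguments through `...`, and turns any parse failure into one message that points the user at manual parsing.

```r
# Hypothetical wrapper: run an import function and, if it fails,
# re-raise the failure as a single friendlier error
import_or_stop <- function(path, import_fun = utils::read.csv, ...) {
  tryCatch(
    import_fun(path, ...),
    error = function(e) {
      stop(
        "Could not parse '", path, "' automatically (",
        conditionMessage(e), "). ",
        "Try downloading the file and importing it manually.",
        call. = FALSE
      )
    }
  )
}

# Usage: a ragged file, read with an importer that refuses uneven rows
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "856647,2009-05-01 04:34:00,20 ADELAIDE,FIP - NORMAL ON ARRIVAL, LINE FAULT/OPEN LINE",
  "856648,2009-05-01 05:10:00,20 ADELAIDE,ALARM SYSTEM FAULT"
), tmp)

# Capture the message instead of halting the script
msg <- tryCatch(
  import_or_stop(tmp, import_fun = utils::read.table, sep = ","),
  error = function(e) conditionMessage(e)
)
```

Importers that warn rather than error (as readr does here) would slip through this wrapper, so a fuller version might need a `warning` handler as well.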

ellisp added the wontfix label Oct 26, 2017
@ellisp
Collaborator Author

ellisp commented Oct 27, 2017

This happens quite often (unsurprisingly) when the data are in Excel format. For example:

library(datagovau)
library(dplyr)
#----------------------business income------------------------
# an example of a dataset that doesn't import well - probably because it is in Excel


income_md <- search_data("name:income", limit = 1000)

business <- income_md %>%
  filter(name == "Business income by entity, state, industry and size for 2013-14 income year") %>%
  get_data()

business

#----------------------queensland and australian income----
qld <- income_md %>%
  filter(name == "Income of Qld and Aust 1999-2000 to 2013-14") %>%
  get_data()

qld
