get_stats19() sometime fails when reading the same data twice #205

Closed
agila5 opened this issue Jul 20, 2021 · 8 comments · Fixed by #206

Comments

@agila5
Collaborator

agila5 commented Jul 20, 2021

Not a big deal, but maybe worth exploring what's going on sooner or later:

library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

a17 <- get_stats19(2017, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a17 <- get_stats19(2017, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/dftRoadSafetyData_Accidents_2017/Acc.csv': Invalid argument

a18 <- get_stats19(2018, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a18 <- get_stats19(2018, silent = TRUE)
#> date and time columns present, creating formatted datetime column

a19 <- get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a19 <- get_stats19(2019, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

a1719 <- get_stats19(2017:2019, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/dftRoadSafetyData_Accidents_2017/Acc.csv': Invalid argument

Created on 2021-07-20 by the reprex package (v2.0.0)

@Robinlovelace
Member

Good catch! I think this may be responsible for some of the errors I'm seeing on CRAN submissions. Idea: do something like

if (file.exists(file_that_is_unzipped)) {
  # the CSV has already been extracted: don't try to unzip the file again
} else {
  utils::unzip(destfile, exdir = file.path(data_dir, exdir))
}
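
The idea above could be packaged as a small helper. This is a minimal sketch: `unzip_unless_present()` and its arguments are hypothetical names (not the stats19 internals), and it assumes the caller already knows the name of the CSV inside the archive (e.g. `Acc.csv` for 2017). It only skips re-extraction; it does not by itself explain why the second extraction fails.

```r
# Hypothetical helper (illustrative names, not stats19's actual code):
# extract `expected_file` from `zipfile` into `exdir` only if it is not
# already there.
unzip_unless_present <- function(zipfile, exdir, expected_file) {
  target <- file.path(exdir, expected_file)
  if (file.exists(target)) {
    # Already extracted: skip unzip() so the existing copy is not overwritten
    return(invisible(target))
  }
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  utils::unzip(zipfile, files = expected_file, exdir = exdir)
  if (!file.exists(target)) {
    stop("Expected file not found after unzipping: ", target)
  }
  invisible(target)
}
```

When the CSV is already present, the zip file is never opened, so calling the function twice in a row only extracts once.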

@Robinlovelace
Member

Error I see on CRAN checks:

 Quitting from lines 291-295 (stats19.Rmd)
  Error: processing vignette 'stats19.Rmd' failed with diagnostics:
  cannot open file 'D:/temp/RtmpoBoYdg/working_dir/RtmpUNi5SU/dftRoadSafetyData_Casualties_2017/Cas.csv': Invalid argument
  --- failed re-building 'stats19.Rmd'

@Robinlovelace
Member

Not 100% sure it's the same thing. Here's the message on CRAN: https://win-builder.r-project.org/incoming_pretest/stats19_1.4.2_20210719_224923/Windows/00check.log

@agila5
Collaborator Author

agila5 commented Jul 20, 2021

Might be, I'll check right now.

@agila5
Collaborator Author

agila5 commented Jul 20, 2021

I really don't understand what's going on 😅 Any help is greatly appreciated. The only finding I can add:

# packages
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

# works
dl_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Data saved at C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
dl_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Data saved at C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv

# fails
a2019 <- get_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Data saved at C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
#> Reading in:
#> C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
#> date and time columns present, creating formatted datetime column
a2019 <- get_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

# fails
dl_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

Created on 2021-07-20 by the reprex package (v2.0.0)

I think it might be a bug in R + unzip + Windows: for whatever reason, the problem goes away when I restart R, but I cannot reproduce it outside of get_stats19(). Will check again in a few days.

@agila5
Collaborator Author

agila5 commented Jul 20, 2021

Never mind, it looks like a problematic interaction with the recent readr upgrade (i.e. the current CRAN version):

# current packages
remotes::install_github("ropensci/stats19")
#> Skipping install of 'stats19' from a github remote, the SHA1 (c1c8fde2) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_cran("readr", quiet = TRUE)

# test
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

Created on 2021-07-20 by the reprex package (v2.0.0)

whereas with readr 1.4.0 both calls work:

# current stats19, older readr (1.4.0)
remotes::install_github("ropensci/stats19", quiet = TRUE)
remotes::install_version("readr", "1.4.0", quiet = TRUE)

# test
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column

Created on 2021-07-20 by the reprex package (v2.0.0)

@Robinlovelace
Member

Thanks for testing it, Andrea; that would explain why it has only just appeared as an issue. Could you try running this line of code before the tests?

readr::local_edition(1) 

If that fixes it, I guess we can do

readr::with_edition(1, readr::read_csv("my_file.csv")) 

to solve the problem. Source: https://www.tidyverse.org/blog/2021/07/readr-2-0-0/#readr-2nd-edition

@Robinlovelace
Member

Or just add

readr::local_edition(1)

at the beginning of each function that uses readr functions: https://readr.tidyverse.org/reference/with_edition.html
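
A minimal sketch of that second option, following the linked reference page: `read_stats19_csv()` is a hypothetical, illustrative name (not part of stats19) for a reader that calls `readr::local_edition(1)` at the top, so edition 1 applies only until the function returns. It assumes readr >= 2.0.0, where editions were introduced; whether edition 1 actually avoids the unzip error still depends on the test requested above.

```r
# Hypothetical reader (illustrative name, not stats19's actual code).
# Requires readr >= 2.0.0, which introduced local_edition().
read_stats19_csv <- function(path) {
  # Use the first edition of readr until this function exits;
  # the previously active edition is restored automatically afterwards.
  readr::local_edition(1)
  readr::read_csv(path, col_types = readr::cols())
}

# Usage, guarded so the example is skipped when readr 2.x is unavailable
if (requireNamespace("readr", quietly = TRUE) &&
    utils::packageVersion("readr") >= "2.0.0") {
  tmp <- tempfile(fileext = ".csv")
  utils::write.csv(data.frame(x = 1:3), tmp, row.names = FALSE)
  df <- read_stats19_csv(tmp)
  print(nrow(df))
}
```

Compared with wrapping every call in `readr::with_edition()`, this keeps the edition switch in one place per function and leaves the user's global readr settings untouched.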


2 participants