
Explore even more efficient CSV reading #198

Closed
rok-cesnovar opened this issue Jun 15, 2020 · 0 comments · Fixed by #318
Labels
help wanted (Extra attention is needed), I/O, internal-code (Tests, code cleanup, refactoring, or other things not user facing), performance
Milestone

Comments

@rok-cesnovar
Member

CSV reading is now quite fast since we replaced `utils::read.csv()` with vroom. However, we could use vroom even more efficiently: our `vroom()` call currently sets the `altrep` argument to `FALSE`.

The default is `TRUE`, which indexes the CSV once and then reads the data lazily, only as it is needed. We currently index and read all of the data immediately, and we repeat the whole process every time we read the same CSV.
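To make the difference between the two settings concrete, here is a minimal sketch, assuming vroom is installed. The CSV file, its column names (`lp__`, `theta`) and its values are made up for illustration and do not come from our actual reader code. With `altrep = FALSE` every column is parsed up front; with `altrep = TRUE` vroom indexes the file and returns lazy ALTREP-backed columns whose values are parsed when they are used.

```r
library(vroom)

# Hypothetical example data: a tiny two-column CSV written to a temp file.
path <- tempfile(fileext = ".csv")
writeLines(c("lp__,theta", "-7.1,0.25", "-6.9,0.31"), path)

# altrep = FALSE (our current setting): every column is parsed into an
# ordinary R vector right away.
eager <- vroom(path, delim = ",", col_types = "dd", altrep = FALSE)

# altrep = TRUE (vroom's default): the file is indexed once and columns are
# returned as lazy ALTREP vectors that are parsed when their values are used.
# The result presumably keeps depending on the file at `path`.
lazy <- vroom(path, delim = ",", col_types = "dd", altrep = TRUE)

# Accessing a column forces its values to be read from the indexed file.
mean(lazy$theta)
```

Both calls return the same values; the lazy version only defers the parsing work, which is what would let us avoid re-reading the same CSV eagerly each time.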

I tried this in the original vroom PR, but the problem was that I could not delete the CSVs in the same R session, which meant that `fit$save_output_files()` failed.

There is a related issue in the vroom repository: tidyverse/vroom#177.
It was closed, though I don't think it was resolved. There might be other workarounds worth exploring.

I might try playing with this again at a later point, but if someone else wants to take a look, you are more than welcome to.

@rok-cesnovar added the help wanted, internal-code, I/O, and performance labels Jun 15, 2020
@rok-cesnovar added this to the future milestone Jun 15, 2020