Explore even more efficient CSV reading #198
Labels: help wanted, I/O, internal-code, performance
The CSV reading is now quite fast since we replaced utils::read.csv() with vroom. However, we could use vroom even more efficiently. Currently we set its altrep argument to FALSE. The default is TRUE, which indexes the CSV once and then reads the data lazily as it is needed. We instead index and read all of the data immediately, and repeat the whole process every time we read the same CSV.
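For reference, the two modes can be compared on a toy file. This is a minimal sketch assuming the vroom package is installed; the temporary file and its columns a and b are made up for illustration, and col_types = "id" just declares an integer and a double column so vroom does not have to guess.

```r
# Minimal sketch, assuming the vroom package is installed.
# The temporary file and its columns (a, b) are made up for illustration.
csv_file <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5)),
                 csv_file, row.names = FALSE)

# altrep = FALSE (what we do now): index the file and parse every value right away.
eager <- vroom::vroom(csv_file, delim = ",", col_types = "id", altrep = FALSE)

# altrep = TRUE: index the file once; values are parsed lazily on first access,
# so the returned columns may still refer to the file on disk.
lazy <- vroom::vroom(csv_file, delim = ",", col_types = "id", altrep = TRUE)

# Same values either way; only the timing of the parsing differs.
all.equal(eager$b, lazy$b)
```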
I tried this in the original vroom PR, but with lazy reading I was not able to delete the CSVs in the same session, which meant that fit$save_output_files() failed. There is a related issue in the vroom repository: tidyverse/vroom#177
It was closed, though I don't think it was resolved. There might be other workarounds worth exploring.
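One possible workaround, sketched here under our own assumptions, is to keep altrep = FALSE but cache the parsed result per file, so each CSV is parsed at most once per session while nothing keeps the file open. read_csv_cached() and csv_cache are hypothetical names, not part of cmdstanr or vroom, and utils::read.csv() only stands in for whichever eager reader is actually used.

```r
# Hypothetical helper: cache eagerly parsed CSVs so the same file is parsed
# only once per session. Because the data is fully read into memory, no file
# handle or memory map is kept open and the file can still be deleted.
csv_cache <- new.env(parent = emptyenv())

read_csv_cached <- function(path, reader = utils::read.csv) {
  path <- normalizePath(path, mustWork = TRUE)
  # Include modification time and size in the key so a changed file is re-read
  info <- file.info(path)
  key <- paste(path, as.numeric(info$mtime), info$size, sep = "|")
  if (!exists(key, envir = csv_cache, inherits = FALSE)) {
    assign(key, reader(path), envir = csv_cache)
  }
  get(key, envir = csv_cache, inherits = FALSE)
}

# Usage on a throwaway file
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(x = 1:3), tmp, row.names = FALSE)
first <- read_csv_cached(tmp)   # parses the file
second <- read_csv_cached(tmp)  # returned from csv_cache, no second parse
unlink(tmp)                     # deleting the file still works
```

The trade-off is memory: every cached result stays in RAM until its entry is removed from csv_cache, whereas lazy altrep reading only materializes the values that are actually accessed.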
I might try this again later, but if anyone else wants to experiment with it, you are more than welcome to.