Explore even more efficient CSV reading #198

rok-cesnovar · 2020-06-15T12:02:10Z

The CSV reading is now quite fast after we replaced utils::read.csv() with vroom. We could, however, use vroom even more efficiently. Currently the altrep argument is set to FALSE here.

The default is altrep = TRUE, which indexes the CSV once and then reads the data lazily as you need it. We currently index and read all the data immediately, and repeat the process every time we read the same CSV.
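To make the difference concrete, here is a minimal sketch of the two modes. It is not the cmdstanr code: the temporary file and the column names `lp__` and `theta` are made up for illustration, and the column types are passed explicitly only to keep the output quiet.

```r
library(vroom)

# Hypothetical stand-in for a CmdStan output CSV (illustration only).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(lp__ = c(-1.2, -0.8, -1.0), theta = c(0.1, 0.3, 0.2)),
  csv_path,
  row.names = FALSE
)

# altrep = FALSE: the file is indexed and every column is read right away.
eager <- vroom(csv_path, delim = ",", col_types = "dd", altrep = FALSE)

# altrep = TRUE: the file is indexed once; column values are only
# materialized when they are accessed.
lazy <- vroom(csv_path, delim = ",", col_types = "dd", altrep = TRUE)

# Accessing a column triggers the actual read for that column.
mean_theta <- mean(lazy$theta)
```

With the lazy variant the data frame stays backed by the file on disk, which is presumably why the file cannot be deleted while the object is still in use.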

I tried using this in the original vroom PR, but I was not able to delete the CSVs in the same session, which meant that fit$save_output_files() failed.

There is an issue on the vroom package: tidyverse/vroom#177
It was closed, though I don't think it was resolved. There might be other workarounds worth exploring.

I might try playing with this again at a later point, but if someone else wants to play with it, you are more than welcome to.


rok-cesnovar added the labels help wanted (Extra attention is needed), internal-code (Tests, code cleanup, refactoring, or other things not user facing), I/O performance Jun 15, 2020

rok-cesnovar added this to the future milestone Jun 15, 2020

This was referenced Oct 12, 2020

read_cmdstan_csv efficiency issue for large number of parameters #299 (Closed)

Replace vroom with data.table::fread #318 (Merged)

jgabry closed this as completed in #318 Nov 12, 2020
