Revert default in-memory for small datasets #2458

albertvillanova · 2021-06-08T15:51:41Z

Users are reporting issues and confusion caused by in-memory loading being enabled by default for small datasets.

We see 2 clear use cases of Datasets:

- the "canonical" way, for working with very large datasets, which are memory-mapped and cached after every transformation
- some edge cases (speed benchmarks, interactive/exploratory analysis, ...), where in-memory loading can be explicitly enabled and no caching is done

After discussing with @lhoestq we have agreed to:

- revert this feature (implemented in "Set default in-memory value depending on the dataset size", #2182)
- explain in the docs how to optimize speed/performance by explicitly enabling in-memory loading
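
To make the intended behavior concrete, here is a minimal sketch of the decision after the revert. It is not the library's code: `resolve_in_memory` and `max_in_memory_size` are hypothetical names made up for this example, and only `keep_in_memory` is meant to mirror the argument of `load_dataset`. The assumption is that an explicit user choice always wins, and that without one the dataset stays memory-mapped unless a size threshold has been configured.

```python
from typing import Optional


def resolve_in_memory(
    dataset_size: Optional[int],
    keep_in_memory: Optional[bool] = None,
    max_in_memory_size: int = 0,
) -> bool:
    """Decide whether a dataset should be copied into RAM.

    Hypothetical helper for illustration only, not the library's implementation.
    Sizes are in bytes.
    """
    # An explicit user choice (True or False) always takes precedence.
    if keep_in_memory is not None:
        return keep_in_memory
    # Assumed default after the revert: a threshold of 0 disables size-based
    # switching, so the dataset stays memory-mapped and cached on disk.
    if not max_in_memory_size or dataset_size is None:
        return False
    # Opt-in edge case: only datasets below the configured threshold go in RAM.
    return dataset_size < max_in_memory_size


# Default (canonical) use: memory-mapped, regardless of size.
print(resolve_in_memory(dataset_size=10_000))
# Edge case (benchmarks, exploration): the user opts in explicitly.
print(resolve_in_memory(dataset_size=10_000, keep_in_memory=True))
```

With this shape, small datasets no longer change behavior silently: in-memory loading only happens when the user asks for it.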

albertvillanova · 2021-06-08T18:57:11Z

cc: @krandiash (pinged in reverted PR).

albertvillanova added the enhancement label Jun 8, 2021

albertvillanova self-assigned this Jun 8, 2021

albertvillanova added this to the 1.8 milestone Jun 8, 2021

albertvillanova mentioned this issue Jun 8, 2021

Revert default in-memory for small datasets #2460

Merged

lhoestq closed this as completed in #2460 Jun 8, 2021

This was referenced Jun 8, 2021

Add cache dir for in-memory datasets #2329

Closed

Calls to map are not cached. #2322

Closed
