Revert default in-memory for small datasets #2458

Closed
albertvillanova opened this issue Jun 8, 2021 · 1 comment · Fixed by #2460
Comments

@albertvillanova
Member

albertvillanova commented Jun 8, 2021

Users are reporting issues and confusion about setting default in-memory to True for small datasets.

We see 2 clear use cases of Datasets:

  • the "canonical" way, where you can work with very large datasets, as they are memory-mapped and cached (after every transformation)
  • some edge cases (speed benchmarks, interactive/exploratory analysis, ...), where in-memory loading can be explicitly enabled as the default, and no caching will be done
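
To make the two use cases concrete, here is a minimal sketch of how an opt-in default could be resolved. It is not the library's implementation: the helper `resolve_keep_in_memory` and its `in_memory_max_size` argument are hypothetical names chosen for illustration, and only the `keep_in_memory` name is borrowed from the library's loading parameter. The assumption is that a size limit of 0 disables the size-based default, so datasets stay memory-mapped and cached unless the user asks otherwise.

```python
from typing import Optional


def resolve_keep_in_memory(
    keep_in_memory: Optional[bool],
    dataset_size: Optional[int],
    in_memory_max_size: int = 0,
) -> bool:
    """Decide whether a dataset should be copied into RAM.

    Hypothetical helper for illustration only; sizes are in bytes.
    """
    # An explicit user choice always wins over any default.
    if keep_in_memory is not None:
        return keep_in_memory
    # With no limit configured (0) or an unknown size, fall back to the
    # "canonical" path: memory-mapped and cached on disk.
    if not in_memory_max_size or dataset_size is None:
        return False
    # Opt-in default: only datasets below the configured limit go to RAM.
    return dataset_size < in_memory_max_size


# Canonical use: nothing configured, so even a tiny dataset is memory-mapped.
canonical = resolve_keep_in_memory(None, dataset_size=1_000)

# Edge case: the user raised the limit, so small datasets load in memory.
opted_in = resolve_keep_in_memory(None, dataset_size=1_000, in_memory_max_size=10_000)
```

Under this sketch the size-based behavior still exists, but it only takes effect once the user sets a non-zero limit or passes `keep_in_memory=True` directly.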

After discussing with @lhoestq we have agreed to:

cc: @stas00 #2409 (comment)

@albertvillanova albertvillanova added the enhancement New feature or request label Jun 8, 2021
@albertvillanova albertvillanova self-assigned this Jun 8, 2021
@albertvillanova albertvillanova added this to the 1.8 milestone Jun 8, 2021
@albertvillanova
Member Author

cc: @krandiash (pinged in reverted PR).
