DataFrame.divisions are lost on repartition when npartitions == 1 #975
Comments
Yes, the npartitions=1 case is a shortcut that avoids computing the quantiles, which is a huge performance pain in most cases. Is there a scenario where you need the divisions there?
@luxcem can you explain why you are interested in divisions in this example? Dask itself won't use them internally once we are on a single-partition DataFrame, since the algorithms for single partitions don't require divisions. Therefore, with query planning we do not calculate them. The legacy DataFrame performs a possibly expensive computation to get them. If you are interested in the min/max values, instead, I recommend doing
Typically, I employ this approach with a variable For instance, the
This is a bug, and we would appreciate it if you could share a reproducer. We certainly don't want to trigger any exceptions just because divisions are not set. The optimizer must deal with this automatically. Regarding the availability of divisions themselves, I would rather consider this a best-effort attribute. We will not always guarantee that it is set to meaningful values, and the single-partition case is one where we choose not to set them. I recommend not relying on divisions being set.
OK, I'll work on a reproducer.
Thanks for working on a reproducer. I am curious to see where things go wrong. @fjetter is correct that we normally don't need divisions for single-partition DataFrames, since we can work with them independently of divisions.
DataFrame.divisions are lost when using repartition or set_index with npartitions == 1
Environment:
Dask version: 2024.3.0
Python version: 3.12.2