change meta estimation #409
str(partition_info["number"])
+ "_"
+ (dataframe.id.cumsum()).astype(str)
We set the id as `str` here. Was there no problem with inferring divisions with this?
I think it's mainly an issue if you cast an index that is already an integer to a string (or vice-versa) because you then change the order/sorting from numerical to lexicographical.
This shouldn't be the case here since we're creating a string index from the start. Dask will infer the divisions based on the lexicographical order of the index from the start (and then it should be preserved).
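To illustrate the ordering point above, here is a minimal sketch in plain Python (no Dask involved; the id values are made up for illustration). It shows why casting an already sorted integer index to strings breaks the sort order, since strings compare character by character:

```python
# Integer ids in numerical order, as a sorted integer index would hold them.
int_ids = [1, 2, 10, 11, 20]
assert int_ids == sorted(int_ids)

# Casting the same ids to strings keeps their positions but changes the
# comparison rule: strings are compared character by character.
cast_ids = [str(i) for i in int_ids]
cast_is_sorted = cast_ids == sorted(cast_ids)  # False, because "10" < "2"

# The order that a string index would actually be sorted in.
lexicographic_order = sorted(cast_ids)
print(lexicographic_order)  # ['1', '10', '11', '2', '20']
```

An index that is built as strings from the start never goes through this switch, so whatever order is inferred for it is lexicographical throughout.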
makes sense, thx!
This PR fixes an issue with the load from hub component when setting an index dynamically. The previous implementation estimated the `meta/schema` dynamically, which led to errors because the wrong type was sometimes inferred. This PR instead reads the schema from the component spec.
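As a rough sketch of the idea, and not the code in this PR, the snippet below derives a column-to-dtype mapping from declared types instead of inferring it from data. The spec layout, the field names, the `SPEC_TO_DTYPE` table, and the `meta_from_spec` helper are all hypothetical and only stand in for the real component spec:

```python
# Hypothetical spec fragment: field name -> declared type.
# This is NOT the project's real spec format, only an illustration.
spec_fields = {"id": "string", "width": "int32", "caption": "string"}

# Assumed mapping from spec type names to pandas/NumPy dtype strings.
SPEC_TO_DTYPE = {"string": "object", "int32": "int32", "float32": "float32"}


def meta_from_spec(fields):
    """Build a {column: dtype} mapping from declared types, not sampled data."""
    return {name: SPEC_TO_DTYPE[type_name] for name, type_name in fields.items()}


meta = meta_from_spec(spec_fields)
print(meta)
```

Dask's `map_partitions` accepts a `{name: dtype}` dict through its `meta` argument, so a mapping like this could be passed there to avoid type inference on sample data.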