
[FEATURE] Support even partitioning on Dask: Phase 1 #303

Closed
goodwanghan opened this issue Feb 5, 2022 · 0 comments · Fixed by #301

Labels: dask, enhancement (New feature or request)
Milestone: 0.6.6

Comments

@goodwanghan (Collaborator)

goodwanghan commented Feb 5, 2022

Currently, even partitioning has no effect on Dask dataframes.

Dask itself has not resolved this either.

However, when even partitioning is needed, the input dataframe is usually very small, because even partitioning is most commonly used for "small data, large compute" workloads. So, as a trade-off, we can compute the Dask dataframe into a pandas dataframe and then reconstruct a Dask dataframe from it with evenly sized partitions.

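As phase one, we only support even partitioning on small data. On large data, this approach may have scalability issues, because the whole dataframe has to be collected into the memory of a single process.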
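The compute-then-rebuild trade-off can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked pull request: `even_chunks` and `to_even_dask` are hypothetical helpers. `even_chunks` splits a pandas dataframe into `num` contiguous chunks whose row counts differ by at most one, and `to_even_dask` collects a Dask dataframe with `compute()` and rebuilds it with `dask.dataframe.from_delayed`, so that each chunk becomes exactly one partition.

```python
from typing import List

import pandas as pd


def even_chunks(pdf: pd.DataFrame, num: int) -> List[pd.DataFrame]:
    """Split pdf into `num` contiguous row chunks whose sizes differ by at most one.

    Hypothetical helper, for illustration only.
    """
    if num <= 0:
        raise ValueError("num must be a positive integer")
    base, extra = divmod(len(pdf), num)
    chunks, start = [], 0
    for i in range(num):
        # The first `extra` chunks each take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(pdf.iloc[start : start + size])
        start += size
    return chunks


def to_even_dask(ddf, num: int):
    """Collect a (small) Dask dataframe and rebuild it with `num` even partitions.

    Hypothetical helper; assumes the whole dataset fits in local memory.
    """
    # Imported lazily so that even_chunks works with pandas alone.
    import dask
    import dask.dataframe as dd

    pdf = ddf.compute()  # bring every row into one pandas dataframe
    parts = [dask.delayed(chunk) for chunk in even_chunks(pdf, num)]
    # Each delayed chunk becomes one partition; meta is an empty frame
    # carrying the same columns and dtypes.
    return dd.from_delayed(parts, meta=pdf.iloc[:0])
```

For example, a 10-row dataframe split into 4 partitions gets row counts 3, 3, 2, 2. The `compute()` call is where the small-data limitation of phase one comes from: every row passes through a single pandas dataframe before being redistributed.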

goodwanghan added the enhancement and dask labels Feb 5, 2022
goodwanghan added this to the 0.6.6 milestone Feb 5, 2022
goodwanghan linked a pull request Feb 5, 2022 that will close this issue
goodwanghan changed the title from "[FEATURE] Support even partitioning on Dask" to "[FEATURE] Support even partitioning on Dask: Phase 1" Feb 9, 2022