[FEATURE] Support even partitioning on Dask: Phase 1 #303

goodwanghan · 2022-02-05T19:50:34Z

Currently, even partitioning does not take effect on Dask dataframes.

And this issue has not been resolved on the Dask side.

But when even partitioning is needed, the input dataframe is very small in most cases, because even partitioning is commonly used for small-data, large-compute workloads. So as a trade-off, we can compute the dataframe into a pandas dataframe, and then reconstruct a Dask dataframe from it that is partitioned evenly.

As phase one, we only support even partitioning on small data. On large data, this approach may have scalability issues, because the whole dataframe has to be computed as a single pandas dataframe first.
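To illustrate the "partition evenly" step of the trade-off above, here is a minimal sketch of how row ranges could be chosen once the data is available as a single in-memory pandas dataframe. The helper name `even_slices` is hypothetical and is not taken from the linked pull request; it only computes boundaries so that partition sizes differ by at most one row.

```python
from typing import List, Tuple


def even_slices(n_rows: int, n_partitions: int) -> List[Tuple[int, int]]:
    """Return (start, stop) row ranges whose sizes differ by at most one.

    Hypothetical helper for illustration only; not the actual implementation.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_partitions <= 0:
        raise ValueError("n_partitions must be positive")
    # Never create more partitions than rows, but always keep at least one.
    n_partitions = max(1, min(n_partitions, n_rows))
    base, extra = divmod(n_rows, n_partitions)
    slices = []
    start = 0
    for i in range(n_partitions):
        # The first `extra` partitions each take one additional row.
        size = base + (1 if i < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


print(even_slices(10, 3))
```

Assuming the small Dask dataframe has already been materialized with `compute()`, each range could be applied with `pdf.iloc[start:stop]`, and the resulting pieces turned back into a Dask dataframe with one partition per piece. That reconstruction step is an assumption about how this could be done, not a description of what #301 does.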

goodwanghan added the enhancement and dask labels Feb 5, 2022

goodwanghan added this to the 0.6.6 milestone Feb 5, 2022

goodwanghan linked a pull request Feb 5, 2022 that will close this issue

Create DuckDaskExecutionEngine with Dask Improvements #301

Merged

goodwanghan changed the title ~~[FEATURE] Support even partitioning on Dask~~ [FEATURE] Support even partitioning on Dask: Phase 1 Feb 9, 2022

goodwanghan closed this as completed in #301 Feb 22, 2022
