Support broadcast exchange #342
Comments
One quick question regarding this: after those datasets are copied to each executor, should they be kept in memory or spilled to disk? If we keep them in memory for a while, memory usage might be a concern.
@mingmwang I think the same thing applies to the broadcast exchange as to normal exchanges: they are spilled to disk by default and may be kept in memory if the memory budget allows. I believe Ballista doesn't support the latter yet(?). A limit may be chosen, like 100MB, so it will fit in memory most often, but even when written to disk it might give impressive speedups for joins, as the other side of the join could be much larger.
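A minimal sketch of the keep-in-memory-or-spill decision described above. The `MemoryBudget` type, `Placement` enum, and the 100MB figure are illustrative assumptions, not part of the current Ballista API:

```rust
// Hypothetical memory budget tracked per executor (not a Ballista type).
struct MemoryBudget {
    available_bytes: usize,
}

enum Placement {
    InMemory,
    SpilledToDisk,
}

/// Keep the broadcast copy in memory only when it fits the remaining budget;
/// otherwise write it to disk, which is the safe default.
fn place_broadcast(dataset_bytes: usize, budget: &MemoryBudget) -> Placement {
    if dataset_bytes <= budget.available_bytes {
        Placement::InMemory
    } else {
        Placement::SpilledToDisk
    }
}

fn main() {
    let budget = MemoryBudget { available_bytes: 100 * 1024 * 1024 }; // 100MB budget
    assert!(matches!(place_broadcast(50 * 1024 * 1024, &budget), Placement::InMemory));
    assert!(matches!(place_broadcast(500 * 1024 * 1024, &budget), Placement::SpilledToDisk));
}
```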
@Dandandan Sounds nice.

Today, for partitioned hash joins, DataFusion already supports the CollectLeft mode, which I think is similar to a broadcast hash join. I haven't had a chance to test it on Ballista yet, but I think it should work in the distributed model. The downside is
That's a good observation @mingmwang! Indeed, I think the trade-off is that by doing a bit more work on the left side (i.e. building the hash table in each worker), we save the work on the right side (the shuffle).
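To make the trade-off concrete, here is a rough, purely illustrative cost comparison: broadcasting replicates the small (build) side to every worker, while a partitioned join has to shuffle both sides. The byte counts and worker count below are made-up numbers, not measurements:

```rust
// Bytes moved over the network when the build side is copied to every worker.
fn broadcast_cost(build_bytes: u64, num_workers: u64) -> u64 {
    build_bytes * num_workers
}

// Bytes moved when both sides are shuffled once for a partitioned join.
fn shuffle_cost(build_bytes: u64, probe_bytes: u64) -> u64 {
    build_bytes + probe_bytes
}

fn main() {
    let (build, probe, workers) = (50_000_000u64, 100_000_000_000u64, 16u64);
    // 50MB * 16 workers = 800MB broadcast vs ~100GB shuffled: broadcast wins
    // when the probe side dwarfs the build side.
    assert!(broadcast_cost(build, workers) < shuffle_cost(build, probe));
}
```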
I added some details for implementing a broadcast join optimization rule here: #348.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Broadcasting helps when the build side of a join is small. In that case we can transform partitioned joins into broadcast joins.
Describe the solution you'd like
We should support broadcasts in the physical plan.
Broadcasting means copying the entire dataset to each worker.
This could be used in broadcast joins, i.e. by broadcasting smaller dataframes to every worker, which can provide big speedups as the other (big) side doesn't have to be shuffled.
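A minimal sketch of what "broadcasting" means at the data level, assuming the dataset is a slice of record batches and workers are identified by plain ids; the `RecordBatch` alias and `WorkerId` type are placeholders, not Ballista types:

```rust
use std::collections::HashMap;

type RecordBatch = Vec<u8>; // placeholder for an Arrow RecordBatch
type WorkerId = usize;

/// Copy the entire (small) dataset to every worker, instead of splitting it
/// into partitions the way a shuffle exchange would.
fn broadcast(dataset: &[RecordBatch], workers: &[WorkerId]) -> HashMap<WorkerId, Vec<RecordBatch>> {
    workers.iter().map(|&w| (w, dataset.to_vec())).collect()
}

fn main() {
    let small_side: Vec<RecordBatch> = vec![vec![1, 2, 3], vec![4, 5]];
    let workers = vec![0, 1, 2];
    let placed = broadcast(&small_side, &workers);
    // Every worker receives a full copy of the build side.
    assert!(placed.values().all(|copy| copy == &small_side));
}
```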
Describe alternatives you've considered
Additional context
We can probably reuse some of Spark's heuristics for deciding when to broadcast a join input; a sketch of such a rule is below.
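A sketch of a Spark-style heuristic, assuming the optimizer can get an estimated size in bytes from statistics; Spark's `spark.sql.autoBroadcastJoinThreshold` defaults to 10MB, but the constant and function below are hypothetical, not an existing DataFusion or Ballista API:

```rust
// Assumed threshold, mirroring Spark's 10MB default.
const AUTO_BROADCAST_JOIN_THRESHOLD: usize = 10 * 1024 * 1024;

/// Broadcast only when the optimizer has a size estimate below the threshold;
/// without statistics, keep the partitioned (shuffle) join.
fn can_broadcast(estimated_bytes: Option<usize>) -> bool {
    matches!(estimated_bytes, Some(bytes) if bytes <= AUTO_BROADCAST_JOIN_THRESHOLD)
}

fn main() {
    assert!(can_broadcast(Some(1024 * 1024)));         // 1MB table: broadcast
    assert!(!can_broadcast(Some(1024 * 1024 * 1024))); // 1GB table: shuffle
    assert!(!can_broadcast(None));                     // unknown size: shuffle
}
```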