Other Spark accelerators, such as Spark RAPIDS and Apache Gluten, replace SortMergeJoin with ShuffledHashJoin for improved performance. We should evaluate this approach for Comet.
Spark RAPIDS
val ENABLE_REPLACE_SORTMERGEJOIN = conf("spark.rapids.sql.replaceSortMergeJoin.enabled")
  .doc("Allow replacing sortMergeJoin with HashJoin")
  .booleanConf
  .createWithDefault(true)
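Since the RAPIDS option defaults to `true`, one way to measure what the replacement buys is to run the same query with it switched off. The snippet below is a minimal sketch of that toggle, assuming the RAPIDS Accelerator jar is already on the classpath; the application name is a placeholder, and only the `spark.rapids.sql.replaceSortMergeJoin.enabled` key comes from the definition above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the RAPIDS Accelerator jar is on the driver and executor classpath.
val spark = SparkSession.builder()
  .appName("smj-vs-shj-comparison") // placeholder name
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  // Defaults to true; set to false to keep SortMergeJoin for a baseline run.
  .config("spark.rapids.sql.replaceSortMergeJoin.enabled", "false")
  .getOrCreate()
```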
Apache Gluten
/**
 * If force ShuffledHashJoin, convert [[SortMergeJoinExec]] to [[ShuffledHashJoinExec]]. There is no
 * need to select a smaller table as buildSide here, it will be reselected when offloading.
 */
object RewriteJoin extends RewriteSingleNode with JoinSelectionHelper {
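To make the idea concrete, here is a rough, self-contained sketch of the rewrite on a toy plan model. None of these types are Spark, Gluten, or Comet classes: `Plan`, `Scan`, `Sort`, `SortMergeJoin`, `ShuffledHashJoin`, and `ReplaceSortMergeJoinSketch` are hypothetical names used only for illustration. The sketch swaps each sort-merge join for a hash join and drops the child sorts that existed only to feed the merge.

```scala
// Toy physical-plan model. These are illustrative types, NOT Spark or Comet classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Sort(keys: Seq[String], child: Plan) extends Plan
case class SortMergeJoin(leftKeys: Seq[String], rightKeys: Seq[String],
                         left: Plan, right: Plan) extends Plan
case class ShuffledHashJoin(leftKeys: Seq[String], rightKeys: Seq[String],
                            left: Plan, right: Plan) extends Plan

object ReplaceSortMergeJoinSketch {
  // Drop a sort on exactly the join keys; a hash join does not need sorted input.
  private def stripSort(plan: Plan, joinKeys: Seq[String]): Plan = plan match {
    case Sort(keys, child) if keys == joinKeys => child
    case other => other
  }

  // Recursively replace every SortMergeJoin with a ShuffledHashJoin.
  // `enabled` mimics a boolean config flag guarding the rewrite.
  def rewrite(plan: Plan, enabled: Boolean = true): Plan =
    if (!enabled) plan
    else plan match {
      case SortMergeJoin(lk, rk, l, r) =>
        ShuffledHashJoin(lk, rk, stripSort(rewrite(l), lk), stripSort(rewrite(r), rk))
      case ShuffledHashJoin(lk, rk, l, r) =>
        ShuffledHashJoin(lk, rk, rewrite(l), rewrite(r))
      case Sort(keys, child) => Sort(keys, rewrite(child))
      case s: Scan => s
    }
}
```

For example, `ReplaceSortMergeJoinSketch.rewrite(SortMergeJoin(Seq("a"), Seq("b"), Sort(Seq("a"), Scan("t1")), Sort(Seq("b"), Scan("t2"))))` yields a `ShuffledHashJoin` directly over the two scans. A real rule would have more to handle than this toy does: a hash join does not produce sorted output, so any parent operator that relied on the merge join's ordering would need its requirements re-checked, and the build side would still have to be chosen.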
Describe the potential solution
No response
Additional context
No response
It sounds reasonable. The vectorized implementation of SMJ looks inefficient in DataFusion. I'm not sure if there is any optimized algorithm for SMJ in vectorized execution. If not, using SHJ to replace SMJ will be good for performance.