[SPARK-11675][SQL] Remove shuffle hash joins. #9645
Closed
val runFunc = (sqlContext: SQLContext) => {
  logWarning(
    s"Property ${SQLConf.Deprecated.SORTMERGE_JOIN} is deprecated and " +
    s"will be ignored. Unsafe mode will continue to be used.")
typo
LGTM
Test build #45691 has finished for PR 9645 at commit
Test build #2047 has finished for PR 9645 at commit
I'm going to merge this.
asfgit pushed a commit that referenced this pull request Nov 12, 2015
Author: Reynold Xin <rxin@databricks.com> Closes #9645 from rxin/SPARK-11675. (cherry picked from commit e49e723) Signed-off-by: Reynold Xin <rxin@databricks.com>
dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
Author: Reynold Xin <rxin@databricks.com> Closes apache#9645 from rxin/SPARK-11675.
ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 18, 2016
## What changes were proposed in this pull request?

ShuffledHashJoin (including its outer-join variant) was removed in 1.6 in favor of SortMergeJoin, which is more robust and also fast.

ShuffledHashJoin is still useful in the following case:
1) one table is much smaller than the other, so building a hash table on the smaller table costs less than sorting the larger table;
2) any partition of the small table can fit in memory.

This PR brings back ShuffledHashJoin, essentially reverting apache#9645 and fixing the conflicts. It also merges outer join and left-semi join into the same class.

This PR does not implement full outer join, because it could not be implemented efficiently (it would require building a hash table on both sides).

A simple benchmark (one table 5x smaller than the other) shows that ShuffledHashJoin can be 2x faster than SortMergeJoin.

## How was this patch tested?

Added new unit tests for ShuffledHashJoin.

Author: Davies Liu <davies@databricks.com>

Closes apache#11788 from davies/shuffle_join.
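The trade-off described in the commit message above can be illustrated with a minimal sketch on plain Scala collections. This is not Spark's ShuffledHashJoin or SortMergeJoin code; `JoinSketch`, `hashJoin`, and `sortMergeJoin` are hypothetical names chosen for illustration, and each input `Seq` stands in for a single partition after the shuffle. The hash join only builds a table on the small side, while the sort-merge join has to sort both sides first.

```scala
// Illustrative sketch only: plain Scala collections, not Spark's internal join operators.
object JoinSketch {

  // Hash join: build an in-memory hash table on the smaller (build) side,
  // then stream the larger side and probe the table. Neither side is sorted.
  def hashJoin[K, A, B](build: Seq[(K, A)], stream: Seq[(K, B)]): Seq[(K, A, B)] = {
    val table: Map[K, Seq[A]] =
      build.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    stream.flatMap { case (k, b) =>
      table.getOrElse(k, Seq.empty[A]).map(a => (k, a, b))
    }
  }

  // Sort-merge join: sort both sides by key, then advance two cursors in step.
  def sortMergeJoin[K: Ordering, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, A, B)] = {
    val ord = implicitly[Ordering[K]]
    val l = left.sortBy(_._1).toVector
    val r = right.sortBy(_._1).toVector
    val out = Vector.newBuilder[(K, A, B)]
    var i = 0
    var j = 0
    while (i < l.length && j < r.length) {
      val c = ord.compare(l(i)._1, r(j)._1)
      if (c < 0) i += 1
      else if (c > 0) j += 1
      else {
        // Find the run of equal keys on each side and emit their cross product.
        val key = l(i)._1
        var iEnd = i
        while (iEnd < l.length && ord.equiv(l(iEnd)._1, key)) iEnd += 1
        var jEnd = j
        while (jEnd < r.length && ord.equiv(r(jEnd)._1, key)) jEnd += 1
        for (x <- i until iEnd; y <- j until jEnd) out += ((key, l(x)._2, r(y)._2))
        i = iEnd
        j = jEnd
      }
    }
    out.result()
  }
}
```

Both functions return the same set of joined rows for an inner equi-join. The difference is where the work goes: the hash variant must hold the whole build side in memory, which matches the condition that any partition of the small table fits in memory.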
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016