[SPARK-32201][SQL] More general skew join pattern matching #29021

LantaoJin · 2020-07-07T07:25:22Z

What changes were proposed in this pull request?

Currently, the AQE skew join handling logic is very narrow.
It can only handle the following pattern (two tables):

  SMJ
     Sort
       Shuffle
     Sort
       Shuffle
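To make the narrow pattern concrete, here is a minimal script-style Scala sketch (runnable in the Scala REPL or scala-cli) over a hypothetical toy plan model. `Plan`, `Scan`, `Shuffle`, `Sort`, `HashAgg` and `SMJ` are illustrative stand-ins, not Spark's `SparkPlan` classes. The matcher accepts only an SMJ whose two children are exactly Sort over Shuffle.

```scala
// Toy physical-plan model for illustration only; these are NOT Spark's SparkPlan classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Shuffle(child: Plan) extends Plan
case class Sort(child: Plan) extends Plan
case class HashAgg(child: Plan) extends Plan
case class SMJ(left: Plan, right: Plan) extends Plan

// The narrow two-table pattern: each SMJ child must be exactly Sort over Shuffle.
def matchesStrictPattern(plan: Plan): Boolean = plan match {
  case SMJ(Sort(Shuffle(_)), Sort(Shuffle(_))) => true
  case _                                        => false
}

val simple = SMJ(Sort(Shuffle(Scan("a"))), Sort(Shuffle(Scan("b"))))
// Any node between the Sort and the Shuffle (here an aggregation) breaks the strict match.
val withAgg = SMJ(Sort(Shuffle(Scan("a"))), Sort(HashAgg(Shuffle(Scan("b")))))

println(matchesStrictPattern(simple))  // true
println(matchesStrictPattern(withAgg)) // false
```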

We propose a more general skew join pattern matching patch with fewer code changes.
With this patch, we can handle N-table joins, joins with aggregation, and so on.
PS: Here, an N-table SMJ will be optimized by AQE into N-1 BCJs plus 1 SMJ. This PR does not handle the case where N SMJs remain after AQE; I will handle that in another PR.

Why are the changes needed?

In our production use cases, we found many slow jobs caused by data skew even though AQE skew join was enabled. After investigating their patterns, we found that the current skew join handling logic is so narrow that it matches few production queries. Production queries are much more complicated than this pattern:

  SMJ
     Sort
       Shuffle
     Sort
       Shuffle

Here is a straightforward example:

The plan above is a five-table join. It cannot be matched by the simple pattern above, but it becomes very similar to that pattern once all the red-boxed nodes are removed.

The stage graph shows the structure much more clearly:

The pattern in the green boxes is what we want to handle, whether or not the red-boxed nodes are present.
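The idea of looking through the red-boxed nodes to find each side's shuffle can be sketched with a similar hypothetical toy model. Which intermediate nodes are allowed (here `Sort`, `Project`, and the streamed side of a broadcast join) is an assumption for illustration, not the PR's actual rule, and the sketch only checks the plan shape, not whether it is safe to split through those nodes.

```scala
// Hypothetical toy plan model (not Spark's API).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Shuffle(child: Plan) extends Plan
case class Sort(child: Plan) extends Plan
case class Project(child: Plan) extends Plan
case class BroadcastJoin(streamed: Plan, broadcast: Plan) extends Plan
case class SMJ(left: Plan, right: Plan) extends Plan

// Walks down from a join child until it reaches a Shuffle, following only nodes
// assumed to keep the shuffle's partitioning. Returns the shuffle plus the
// intermediate ("red box") nodes passed on the way, in top-down order.
def findShuffle(plan: Plan, passed: List[Plan] = Nil): Option[(Shuffle, List[Plan])] =
  plan match {
    case s: Shuffle                     => Some((s, passed.reverse))
    case n @ Sort(child)                => findShuffle(child, n :: passed)
    case n @ Project(child)             => findShuffle(child, n :: passed)
    // For a broadcast join, only the streamed side keeps the shuffle partitioning.
    case n @ BroadcastJoin(streamed, _) => findShuffle(streamed, n :: passed)
    case _                              => None
  }

def matchesGeneralPattern(plan: Plan): Boolean = plan match {
  case SMJ(left, right) => findShuffle(left).isDefined && findShuffle(right).isDefined
  case _                => false
}

// Left side: Sort -> Project -> BroadcastJoin(streamed = Shuffle(a), broadcast = b).
val left = Sort(Project(BroadcastJoin(Shuffle(Scan("a")), Scan("b"))))
val right = Sort(Shuffle(Scan("c")))
println(matchesGeneralPattern(SMJ(left, right))) // true
println(findShuffle(left).map(_._2.length))      // Some(3)
```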

Does this PR introduce any user-facing change?

No

How was this patch tested?

We added two unit tests.

Two-table SMJ with aggregation: we can handle skew on the join side that has no aggregation.

Before:

After:

N-table join with BCJ (AQE changed SMJs to BCJs): we can handle skew on both sides of the join.

Before:

After:

SparkQA · 2020-07-07T07:33:56Z

Test build #125187 has finished for PR 29021 at commit 479d56b.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

LantaoJin · 2020-07-07T07:40:04Z

This PR can also work together with #28947 to match more patterns.

cloud-fan · 2020-07-07T08:39:18Z

SMJ
     Sort
       Shuffle
     Sort
       HashAggregate
         Shuffle

This is an interesting use case. We must be careful when dealing with it. The key idea of skew join handling is to split the skewed partition into smaller parts. For HashAggregate, I'm not sure this works, since the values of the same key may then end up in different post-split partitions. This makes HashAggregate incorrect, as it requires all values of the same key to stay in one partition so that it can group by the key.
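A toy illustration of this point with plain Scala collections (not Spark code): summing per key over one skewed partition yields one row per key, but splitting the partition first and aggregating each piece separately produces several partial rows for the skewed key. The data and piece size are made up; Spark splits by map-output blocks rather than individual rows, but values of one key still land in different pieces.

```scala
// Rows are (key, value) pairs that landed in one reducer partition; "k1" is heavily skewed.
val skewedPartition: Seq[(String, Int)] = Seq.fill(6)(("k1", 1)) ++ Seq(("k2", 1))

// What a HashAggregate computes over one partition: SUM(value) GROUP BY key.
def sumByKey(rows: Seq[(String, Int)]): Map[String, Int] =
  rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// Skew handling splits the partition into pieces; aggregating each piece on its
// own no longer sees all values of a key together.
def splitThenAggregate(rows: Seq[(String, Int)], pieceSize: Int): List[(String, Int)] =
  rows.grouped(pieceSize).toList.flatMap(piece => sumByKey(piece).toList)

println(sumByKey(skewedPartition)("k1"))         // 6
println(splitThenAggregate(skewedPartition, 3))  // List((k1,3), (k1,3), (k2,1))
```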

LantaoJin · 2020-07-07T08:45:24Z

Yes, you are correct. I see this case now. I should skip aggregation :(

SparkQA · 2020-07-07T11:33:12Z

Test build #125189 has finished for PR 29021 at commit 947927a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-07T16:55:12Z

Test build #125215 has finished for PR 29021 at commit 607eb08.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-08T05:43:41Z

Test build #125293 has finished for PR 29021 at commit 80bef0d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

SparkQA · 2020-07-10T13:06:59Z

Test build #125582 has finished for PR 29021 at commit dd09e70.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class CoalescedHashPartitioning(

SparkQA · 2020-07-11T07:05:01Z

Test build #125667 has finished for PR 29021 at commit 76dc3ea.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-15T20:32:42Z

Test build #125893 has finished for PR 29021 at commit 7465bfa.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2020-07-16T12:39:18Z

Test build #125950 has finished for PR 29021 at commit 42a52a1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-16T15:56:49Z

Test build #125961 has finished for PR 29021 at commit 973d87e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-16T17:38:48Z

Test build #125968 has finished for PR 29021 at commit 6f7dbc2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-16T18:39:06Z

Test build #125967 has finished for PR 29021 at commit d65a210.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

LantaoJin · 2020-07-17T02:53:37Z

Gentle ping @cloud-fan

JkSelf · 2020-07-17T07:37:42Z

...core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala

@@ -142,6 +142,14 @@ object OptimizeLocalShuffleReader {
  def canUseLocalShuffleReader(plan: SparkPlan): Boolean = plan match {
    case s: ShuffleQueryStageExec =>
      s.shuffle.canChangeNumPartitions
+    // This CustomShuffleReaderExec used in skew side, its numPartitions increased.


Does this mean the OptimizeLocalShuffleReader rule is disabled when the OptimizeSkewedJoin rule is enabled?

Not exactly. With this more general skew join handling, we can match more patterns. For example, we can handle a skew join like https://user-images.githubusercontent.com/1853780/87743215-01e9e780-c81b-11ea-97d9-f274b379912e.png. After OptimizeLocalShuffleReader, the number of partitions of the CustomShuffleReader in the BCJ (converted from an SMJ by AQE) is not equal to that of the other side. So, to keep it simple, I disable createLocalReader.

JkSelf · 2020-07-17T07:38:28Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala

@@ -340,3 +340,28 @@ case class BroadcastPartitioning(mode: BroadcastMode) extends Partitioning {
    case _ => false
  }
 }
+
+/**


Why do we need to add a new Partitioning?

A skew join may match a plan that contains an aggregation on the non-skewed side, so that side's partitioning must still satisfy ClusteredDistribution. CoalescedHashPartitioning can satisfy ClusteredDistribution; UnknownPartitioning cannot, which would cause an additional shuffle to be added.
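A rough illustration of why an aggregation on the non-skewed side is tolerable, using plain Scala collections. This does not reproduce `CoalescedHashPartitioning`; it only assumes, as AQE skew join handling does, that the skewed side's partition is split while the matching partition on the other side is replicated whole into every split task. Under that assumption, the aggregation on the non-skewed side still sees every value of each key in each task, so the skew-split join gives the same rows as the unsplit one.

```scala
// Toy model of one reducer partition in a skew join. Rows are (key, value).
type Row = (String, Int)

def sumByKey(rows: Seq[Row]): Map[String, Int] =
  rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// Reference: aggregate the right (non-skewed) side, then join with the left side.
def joinWithAggOnRight(left: Seq[Row], right: Seq[Row]): List[(String, Int, Int)] = {
  val aggRight = sumByKey(right)
  left.toList.flatMap { case (k, lv) => aggRight.get(k).toList.map(rv => (k, lv, rv)) }
}

// Skew-join style: split the left side into pieces; every task receives the FULL
// right partition, so aggregating it per task still sees all values of each key.
def skewJoinWithAggOnRight(left: Seq[Row], right: Seq[Row], pieceSize: Int): List[(String, Int, Int)] =
  left.grouped(pieceSize).toList.flatMap(piece => joinWithAggOnRight(piece, right))

val left = Seq.fill(4)(("k1", 1)) ++ Seq(("k2", 2))
val right = Seq(("k1", 10), ("k1", 5), ("k2", 7))
println(skewJoinWithAggOnRight(left, right, 2) == joinWithAggOnRight(left, right)) // true
```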

Hi @JkSelf, I will provide another approach that removes CoalescedHashPartitioning and simplifies the code. However, the current implementation with CoalescedHashPartitioning might cover more cases.

SparkQA · 2020-07-20T12:21:39Z

Test build #126165 has finished for PR 29021 at commit 3cd411f.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-21T12:51:15Z

Test build #126238 has finished for PR 29021 at commit 5bed68c.

This patch fails PySpark pip packaging tests.
This patch merges cleanly.
This patch adds no public classes.

LantaoJin · 2020-07-22T01:49:54Z

retest this please

SparkQA · 2020-07-22T07:05:01Z

Test build #126291 has finished for PR 29021 at commit 5bed68c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

LantaoJin · 2020-07-22T07:18:04Z

retest this please

SparkQA · 2020-07-22T13:12:11Z

Test build #126315 has finished for PR 29021 at commit 5bed68c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

LantaoJin · 2020-07-24T03:08:47Z

Hi @cloud-fan @JkSelf, please help review this PR. I am going to file a new PR to handle skew in three-table SMJs, which is beyond the scope of this PR.

cloud-fan · 2020-07-24T03:27:17Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

@@ -263,18 +299,31 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
    val shuffleStages = collectShuffleStages(plan)

    if (shuffleStages.length == 2) {


Why don't we break this limitation first?

Because this PR does not address cases with multiple SMJs. We have another PR that changes this limitation:

optimizeSingleStageSkewJoin: handles the case where one table is a bucketed table and the SMJ is a bucketed join in which only one side is shuffled and that side is skewed.

optimizeThreeShuffleStageSkewJoin: handles a three-table SMJ (two SMJs in one stage, neither of which can be converted to a BCJ by AQE).

cloud-fan · 2020-07-24T03:29:57Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala

+        val right = rightOpt.get
+        assert(left.partitionsWithSizes.length == right.partitionsWithSizes.length)
+        val numPartitions = left.partitionsWithSizes.length
+        // We use the median size of the original shuffle partitions to detect skewed partitions.
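The median-based detection mentioned in this hunk can be sketched as a standalone approximation (not the PR's code). The defaults are meant to mirror Spark's `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` settings: a partition counts as skewed only when it exceeds both the factor times the median and the absolute threshold. Flooring the median at 1 byte is our own choice to avoid a zero median.

```scala
// Median of the original shuffle partition sizes (floored at 1 byte).
def medianSize(sizes: Seq[Long]): Long =
  if (sizes.isEmpty) 1L
  else {
    val sorted = sizes.sorted
    val n = sorted.length
    val median =
      if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      else sorted(n / 2)
    math.max(median, 1L)
  }

// Skewed = large relative to the median AND large in absolute terms.
def skewedPartitionIds(
    sizes: Seq[Long],
    factor: Double = 5.0,
    thresholdBytes: Long = 256L * 1024 * 1024): Seq[Int] = {
  val median = medianSize(sizes)
  sizes.zipWithIndex.collect {
    case (size, id) if size > median * factor && size > thresholdBytes => id
  }
}

val MB = 1024L * 1024
println(skewedPartitionIds(Seq(100, 120, 110, 2000, 90).map(_ * MB))) // List(3)
```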


This PR is very hard to reason about. We need to clearly define:

What nodes can appear between the shuffle stage and the SMJ? As we discussed before, Agg can't appear on the skewed side.

How do we estimate the size? Since there are nodes in between, the stats of the shuffle stage may not be accurate for the final join child (e.g. a Filter in between).

> What nodes can appear between the shuffle stage and the SMJ? As we discussed before, Agg can't appear on the skewed side.

In canSplitLeftSide and canSplitRightSide, I added an allUnspecifiedDistribution(plan) check. Currently we only support nodes that require UnspecifiedDistribution.
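The body of `allUnspecifiedDistribution(plan)` is not shown in this thread, so here is a hypothetical toy version of the idea: walk the nodes between the shuffle stage and the SMJ and allow a split only if none of them requires a particular distribution from its child. In Spark the corresponding information is `SparkPlan.requiredChildDistribution` and `UnspecifiedDistribution`; the `RequiredDistribution`, `Clustered` and `Node` types below are illustrative stand-ins only.

```scala
// Toy stand-ins for Spark's Distribution hierarchy.
sealed trait RequiredDistribution
case object Unspecified extends RequiredDistribution
case class Clustered(keys: Seq[String]) extends RequiredDistribution

// A node between the shuffle stage and the SMJ, with what it requires from its child.
case class Node(name: String, requiredChildDistribution: RequiredDistribution)

// A side is splittable only if no node on the path needs rows of the same key
// to stay together (e.g. an aggregation requiring Clustered(keys)).
def allUnspecifiedDistribution(path: Seq[Node]): Boolean =
  path.forall(_.requiredChildDistribution == Unspecified)

val projectOnly = Seq(Node("Project", Unspecified), Node("Filter", Unspecified))
val withAgg = Seq(Node("Project", Unspecified), Node("HashAggregate", Clustered(Seq("key"))))
println(allUnspecifiedDistribution(projectOnly)) // true
println(allUnspecifiedDistribution(withAgg))     // false
```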

> How do we estimate the size? Since there are nodes in between, the stats of the shuffle stage may not be accurate for the final join child (e.g. a Filter in between).

A Filter should be pushed down to the leaf, and I haven't seen this case in practice. A Project in the middle may be a common case, though. Yes, the input size from the shuffle stage may not be accurate, but the downside is launching more tasks. I think the benefit of handling the skew outweighs this disadvantage.
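To illustrate the trade-off described here, the sketch below splits a skewed partition by a target size using a simple greedy rule (an assumption; Spark's own splitting logic differs in details). If the sizes come from the shuffle stage but a Project in between actually halves the data (a hypothetical ratio), the partition is split into more pieces than needed: more, smaller tasks, with no effect on the join result.

```scala
import scala.collection.mutable.ArrayBuffer

// Greedily packs consecutive map-output block sizes into splits of at most
// targetSize bytes; a single oversized block still forms its own split.
def splitByTargetSize(blockSizes: Seq[Long], targetSize: Long): List[Vector[Long]] = {
  val splits = ArrayBuffer.empty[Vector[Long]]
  var current = Vector.empty[Long]
  var currentSize = 0L
  for (size <- blockSizes) {
    if (current.nonEmpty && currentSize + size > targetSize) {
      splits += current
      current = Vector.empty[Long]
      currentSize = 0L
    }
    current = current :+ size
    currentSize += size
  }
  if (current.nonEmpty) splits += current
  splits.toList
}

val MB = 1024L * 1024
// Sizes as reported by the shuffle stage: 8 map outputs of 64MB each.
val statsFromShuffle = Seq.fill(8)(64 * MB)
// Hypothetical sizes actually seen by the join if a Project halves the data.
val actualAtJoin = Seq.fill(8)(32 * MB)

println(splitByTargetSize(statsFromShuffle, 128 * MB).length) // 4
println(splitByTargetSize(actualAtJoin, 128 * MB).length)     // 2
```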

github-actions · 2020-11-02T00:34:57Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

[SPARK-32201][SQL] More general skew join pattern matching

479d56b

probot-autolabeler bot added the SQL label Jul 7, 2020

LantaoJin commented Jul 7, 2020

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

remove the println

947927a

LantaoJin mentioned this pull request Jul 7, 2020

[SPARK-32129][SQL] Support AQE skew join with Union #28947

Closed

LantaoJin changed the title ~~[SPARK-32201][SQL] More general skew join pattern matching~~ [WIP][SPARK-32201][SQL] More general skew join pattern matching Jul 7, 2020

cannot split if AggExec in one side

607eb08

add more agg exec

80bef0d

LantaoJin commented Jul 8, 2020

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala (outdated)

Add a new parititioning CoalescedHashPartitioning

dd09e70

fix ut

76dc3ea

more general canSplitXSide

7465bfa

LantaoJin added 3 commits July 16, 2020 13:31

fix ut

0950e9a

Merge branch 'master' into SPARK-32201

42a52a1

add another skew test case

973d87e

LantaoJin changed the title ~~[WIP][SPARK-32201][SQL] More general skew join pattern matching~~ [SPARK-32201][SQL] More general skew join pattern matching Jul 16, 2020

canUseLocalShuffleReader should consider skew optimization

d65a210

LantaoJin changed the title ~~[SPARK-32201][SQL] More general skew join pattern matching~~ [WIP][SPARK-32201][SQL] More general skew join pattern matching Jul 16, 2020

fix ut

6f7dbc2

LantaoJin changed the title ~~[WIP][SPARK-32201][SQL] More general skew join pattern matching~~ [SPARK-32201][SQL] More general skew join pattern matching Jul 17, 2020

JkSelf reviewed Jul 17, 2020

LantaoJin added 2 commits July 20, 2020 15:48

approach 2: remove 'CoalescedHashPartitioning' and refine code

aa3af93

remove dead code

3cd411f

Merge remote-tracking branch 'upstream/master' into SPARK-32201

5bed68c

cloud-fan reviewed Jul 24, 2020

github-actions bot added the Stale label Nov 2, 2020

github-actions bot closed this Nov 3, 2020
