
[SPARK-30036][SQL] Fix: REPARTITION hint does not work with order by #26946

Closed
wants to merge 8 commits into from

Conversation

jackylee-ch
Contributor

@jackylee-ch jackylee-ch commented Dec 19, 2019

Why are the changes needed?

`EnsureRequirements` adds a `ShuffleExchangeExec` (RangePartitioning) below Sort when a `RoundRobinPartitioning` exchange sits beneath it. This causes two shuffles, and the number of partitions in the final stage is not the number specified by `RoundRobinPartitioning`.

Example SQL

SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a

BEFORE

== Physical Plan ==
*(1) Sort [a#0 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(a#0 ASC NULLS FIRST, 200), true, [id=#11]
   +- Exchange RoundRobinPartitioning(5), false, [id=#9]
      +- Scan hive default.test [a#0, b#1], HiveTableRelation `default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#0, b#1]

AFTER

== Physical Plan ==
*(1) Sort [a#0 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(a#0 ASC NULLS FIRST, 5), true, [id=#11]
   +- Scan hive default.test [a#0, b#1], HiveTableRelation `default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#0, b#1]
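
For reference, a quick way to reproduce the plans above in spark-shell is to register a temp view standing in for the Hive table; the view name and columns below are illustrative only.

```scala
// Illustrative temp view in place of the Hive table `default`.`test`.
spark.range(10).selectExpr("id AS a", "id AS b").createOrReplaceTempView("test")
// Show the physical plan for the hinted query.
spark.sql("SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a").explain()
```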

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran the existing test suites and added a new test for this.

jackylee-ch and others added 3 commits December 19, 2019 12:44
Change-Id: I9ec887eece29abed048192b559f7d69a9e67afe3
Change-Id: If6b4c1f818c38b1862f69acc63f79feea127bbee
Change-Id: Ieb757a218588e2f35efd1b0eac4d076fb75eb1c8
@jackylee-ch
Contributor Author

@ulysses-you
Contributor

I think we should add the check in CollapseRepartition instead of EnsureRequirements if we want to prune repartition with sort.

@cloud-fan
Contributor

ok to test

@jackylee-ch
Contributor Author

jackylee-ch commented Dec 19, 2019

I think we should add the check in CollapseRepartition instead of EnsureRequirements if we want to prune repartition with sort.

@ulysses-you Adding RangePartitioning doesn't happen in the optimizer, so we can't check this in CollapseRepartition. Besides, it is not easy to keep the partition number after pruning repartition with sort.

@SparkQA

SparkQA commented Dec 19, 2019

Test build #115563 has finished for PR 26946 at commit 02267db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case (ShuffleExchangeExec(partitioning: RoundRobinPartitioning, child, _),
    distribution: OrderedDistribution) =>
  ShuffleExchangeExec(
    distribution.createPartitioning(partitioning.numPartitions), child)
Member

Could you update it like the following, @stczwd?

-      case (ShuffleExchangeExec(partitioning: RoundRobinPartitioning, child, _),
-          distribution: OrderedDistribution) =>
-        ShuffleExchangeExec(
-          distribution.createPartitioning(partitioning.numPartitions), child)
+      case (ShuffleExchangeExec(partitioning, child, _), distribution) =>
+        ShuffleExchangeExec(distribution.createPartitioning(partitioning.numPartitions), child)

Contributor Author

You mean this should work for other Partitionings as well? Let me run some tests for it.

Contributor Author

Thanks, I have changed my code.

DummySparkPlan(outputPartitioning = partitioning)))
val outputPlan = EnsureRequirements(spark.sessionState.conf).apply(inputPlan)
assert(outputPlan.find{
case e: ShuffleExchangeExec => e.outputPartitioning.isInstanceOf[RoundRobinPartitioning]
Member

- case e: ShuffleExchangeExec => e.outputPartitioning.isInstanceOf[RoundRobinPartitioning]
+ case ShuffleExchangeExec(_: RoundRobinPartitioning, _, _) => true

Contributor Author

good

partitioning,
DummySparkPlan(outputPartitioning = partitioning)))
val outputPlan = EnsureRequirements(spark.sessionState.conf).apply(inputPlan)
assert(outputPlan.find{
Member

`.find{` -> `.find {`.

Contributor Author

done

@dongjoon-hyun
Member

I updated the PR description a little, @stczwd.

@@ -55,6 +55,10 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
child
case (child, BroadcastDistribution(mode)) =>
BroadcastExchangeExec(mode, child)
case (ShuffleExchangeExec(partitioning: RoundRobinPartitioning, child, _),
Contributor

How about using Partitioning instead of RoundRobinPartitioning, since we already support SELECT /*+ REPARTITION(5, a) */ * FROM test ORDER BY a?

Contributor Author

Thanks, I will change it.

@@ -421,6 +421,24 @@ class PlannerSuite extends SharedSparkSession {
}
}

test("SPARK-30036: EnsureRequirements replace Exchange " +
"if child has SortExec and RoundRobinPartitioning") {
Member

How about just saying Remove unnecessary RoundRobinPartitioning in the test title? Also, can you make the PR title obvious, too?

Contributor Author

Good for the test title, thanks.
But it is not suitable for the PR title; there are other situations covered by this title.

Member

How about Avoid RoundRobinPartitioning that EnsureRequirements Redundantly adds?

Contributor Author

Because HashPartitioning also needs to be considered.

@jackylee-ch
Contributor Author

I updated the PR description a little, @stczwd.

@dongjoon-hyun thanks

Change-Id: I6b102f32b4084625875b395990e8ac4673c56bac
@SparkQA

SparkQA commented Dec 20, 2019

Test build #115598 has finished for PR 26946 at commit 56a1101.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val outputPlan = EnsureRequirements(spark.sessionState.conf).apply(inputPlan)
assert(outputPlan.find {
case ShuffleExchangeExec(_: RoundRobinPartitioning, _, _) => true
case _ => false}.isEmpty,
Contributor

nit:

...find {
  case ...
  case ...
}.isEmpty

val outputPlan = EnsureRequirements(spark.sessionState.conf).apply(inputPlan)
assert(outputPlan.find {
case ShuffleExchangeExec(_: HashPartitioning, _, _) => true
case _ => false}.isEmpty,
Contributor

ditto

@cloud-fan
Contributor

shall we add an end-to-end test for SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a?

@jackylee-ch
Contributor Author

shall we add an end-to-end test for SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a?
@cloud-fan This patch only changes the number of final partitions and does not affect the overall result. Is it enough to check the rule? Or should we maybe check the number of partitions?

Change-Id: I0b55a61e1a9ac3555177322ac44d2b216d45bd24
@SparkQA

SparkQA commented Dec 20, 2019

Test build #115630 has finished for PR 26946 at commit 5915a12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines +58 to +59
case (ShuffleExchangeExec(partitioning, child, _), distribution: OrderedDistribution) =>
ShuffleExchangeExec(distribution.createPartitioning(partitioning.numPartitions), child)
Member

This handles a special case for OrderedDistribution. Generally, if a ShuffleExchangeExec is followed by any unsatisfied distribution, we should always trim the ShuffleExchangeExec and apply the partitioning of that distribution. Shouldn't we?

Contributor Author

Sounds reasonable. Any suitable cases?

Member

@viirya viirya Dec 23, 2019

I just tried a few possible cases but could not find a concrete case like this. Maybe this is the only possible case, so I think this should be fine.

@cloud-fan
Contributor

We can add an end-to-end test, check the physical plan of a query, and count shuffles.

@jackylee-ch
Contributor Author

jackylee-ch commented Dec 23, 2019

We can add an end-to-end test, check the physical plan of a query, and count shuffles.

Sure, I will add some tests for these cases.
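
A rough sketch of what such an end-to-end check could look like, assuming a temp view named "test" as in the PR description; this is illustrative only and not the exact test added in this PR.

```scala
import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

val df = spark.sql("SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a")
// Collect all shuffle exchanges from the executed physical plan.
val shuffles = df.queryExecution.executedPlan.collect {
  case e: ShuffleExchangeExec => e
}
// With the fix there should be a single range shuffle producing 5 partitions.
assert(shuffles.size == 1)
assert(shuffles.head.outputPartitioning.numPartitions == 5)
```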

// Range has range partitioning in its output now. To have a range shuffle, we
// need to run a repartition first.
val data = spark.range(0, n, 1, 1).repartition(10).sort($"id".desc)
// Range has range partitioning in its output now.
Contributor

@cloud-fan cloud-fan Dec 23, 2019

Shall we remove this comment now? It's not useful since we do add a shuffle, so the range output partitioning doesn't matter.

Contributor Author

Okay.

@@ -55,12 +54,12 @@ class ConfigBehaviorSuite extends QueryTest with SharedSparkSession {

withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> numPartitions.toString) {
// The default chi-sq value should be low
assert(computeChiSquareTest() < 100)
assert(computeChiSquareTest() < 10)
Contributor

The physical plan is the same as before; what caused this change?

Contributor Author

They are not the same; we had two shuffles before, one with RoundRobinPartitioning and the other with RangePartitioning.

Contributor

Ah, I see.

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115689 has finished for PR 26946 at commit 52ce660.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@@ -421,6 +421,52 @@ class PlannerSuite extends SharedSparkSession {
}
}

test("SPARK-30036: Romove unnecessary RoundRobinPartitioning " +
Member

nit Romove -> Remove

Contributor Author

done

Change-Id: I175b3824ba9ce46fba0ebba6ebf0b220d64de42c
@HyukjinKwon
Member

Looks fine to me

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115719 has finished for PR 26946 at commit d2615b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115718 has finished for PR 26946 at commit 52ce660.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115729 has finished for PR 26946 at commit d2615b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Dec 25, 2019

Test build #115766 has finished for PR 26946 at commit d2615b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in a2de20c Dec 27, 2019
@HyukjinKwon
Member

Merged to master, I guess :-).

@cloud-fan
Contributor

yea merged to master!

@ulysses-you
Contributor

ulysses-you commented Jan 3, 2020

Wait, another point.
First, this scenario also exists with join or window operators; as @viirya said, other Distributions exist.
For example, join:

val df = spark.range(1, 10, 2)
df.join(df.repartition(10), Seq("id"), "left").explain(true)

// The physical plan looks like this:
== Physical Plan ==
*(5) Project [id#0L]
+- SortMergeJoin [id#0L], [id#83L], LeftOuter
   :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#0L, 200), true, [id=#378]
   :     +- *(1) Range (1, 10, step=1, splits=40)
   +- *(4) Sort [id#83L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(id#83L, 200), true, [id=#384]
         +- Exchange RoundRobinPartitioning(10), false, [id=#383]
            +- *(3) Range (1, 10, step=1, splits=40)

Also, there is a difference between 2 -> 10 -> 200 and 2 -> 10 because of operator complexity. Repartition may be a lightweight operator compared with sort, join, or other algorithms, so it is not certain that 2 -> 10 always runs faster than 2 -> 10 -> 200.

Lastly, if the end user really wants the result to have 10 partitions, they should use df.sort("id").repartition(10) instead of df.repartition(10).sort("id"). Pruning the shuffle may mislead users.

cc @HyukjinKwon @cloud-fan @maropu

@cloud-fan
Contributor

for join, it doesn't require OrderedDistribution, but HashClusteredDistribution.

This PR only affects sort.

@jackylee-ch
Contributor Author

If the end user really wants the result to have 10 partitions, they should use df.sort("id").repartition(10) instead of df.repartition(10).sort("id"). Pruning the shuffle may mislead users.

df.sort("id").repartition(10) returns wrong result. Global sort result would be repartitioned with disordered.

@ulysses-you
Contributor

This PR only affects sort.

Yes, it is. But it is similar to an outer join: df.join(df.repartition(10), Seq("id"), "left") results in 200 partitions, and now df.repartition(10).sort("id") results in 10 partitions. Should they be the same?

@ulysses-you
Contributor

df.sort("id").repartition(10) returns wrong result. Global sort result would be repartitioned with disordered.

Sorry for the wrong example. I mean users should use the right way to change the partitioning. Obviously df.repartition(10).sort("id") should return spark.sql.shuffle.partitions partitions.

@jackylee-ch jackylee-ch deleted the RoundRobinPartitioning branch January 6, 2020 06:21
@jackylee-ch
Contributor Author

Yes, it is. But it is similar to an outer join: df.join(df.repartition(10), Seq("id"), "left") results in 200 partitions, and now df.repartition(10).sort("id") results in 10 partitions. Should they be the same?
Sorry for the wrong example. I mean users should use the right way to change the partitioning. Obviously df.repartition(10).sort("id") should return spark.sql.shuffle.partitions partitions.

Thanks for paying attention to this. The main problem you described is whether we should change the partition number for OrderedDistribution.
Hm, you added the REPARTITION hint in SPARK-28746, so you know what it means to users. In other cases, the REPARTITION hint changes the result partition number via shuffles, but it didn't work with ORDER BY, which confused users. REPARTITION is a great way for users to control the final partition number, and we should keep it working for every query.
Besides, sort("id") is a global OrderedDistribution, which usually produces the final result. It is not easy to control its partition number through the default shuffle partitions, especially in large queries with multiple shuffles.
Finally, changing the partition number with df.repartition(10).sort("id") is a good way for users to control the shuffle and the final result. Users probably won't write df.repartition(10).sort("id") unless they want to change the final partition number; it is not a normal pattern in other scenarios.

Correct me if I'm wrong. Thanks

@ulysses-you
Contributor

ulysses-you commented Jan 6, 2020

I see you want a way to change the partition number easily after a sort.

Only one thing I'm not sure about: if df.repartition(10).sort("id") results in 10 partitions, users will think df.repartition(10).(any other operator that needs a shuffle) results in 10 partitions too, but actually it does not. It's special handling for sort.

I don't know how the committers think about it; maybe it's just fine.

@cloud-fan
Contributor

What .sort should guarantee is that the output is ordered; users shouldn't care about the number of partitions.

It's more efficient to shuffle only once for the query df.repartition(10).sort("id"). This is just an optimization and has nothing to do with semantics.

For df.join(df.repartition(10), Seq("id"), "left"), again, we don't care about the number of result partitions. If there is a way to save shuffles, please propose it.

@maryannxue
Contributor

I think we should revert this PR. The change in the ConfigBehaviorSuite test cannot be fully justified. Plus, this is not the right approach to fix this kind of issue.

@cloud-fan
Contributor

After more thought, I think it's wrong to use an optimization to fix a bug.

Looking into the bug, the issue is: the Repartition operator added by the hint is under the Sort operator, not above it. This is because our parser treats ORDER BY as the last clause, while the hint is associated with the SELECT clause. The parser rule is like SELECT ... UNION/INTERSECT SELECT ... ORDER BY. That's why we add the Sort operator at the end.

I think #27096 is the right way to optimize redundant shuffles, but we still need to fix the bug in how hints are handled in the parser.

I'm reverting this. Let's fix the bug in the parser.
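
A hedged way to observe this in spark-shell (the view name is illustrative): the analyzed logical plan shows the repartition introduced by the hint sitting below the Sort that ORDER BY adds.

```scala
// Print the analyzed logical plan; expect Sort on top with the hint's
// repartition underneath it (assumes a temp view named "test").
println(
  spark.sql("SELECT /*+ REPARTITION(5) */ * FROM test ORDER BY a")
    .queryExecution.analyzed)
```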

fqaiser94 pushed a commit to fqaiser94/spark that referenced this pull request Mar 30, 2020
Closes apache#26946 from stczwd/RoundRobinPartitioning.

Lead-authored-by: lijunqing <lijunqing@baidu.com>
Co-authored-by: stczwd <qcsd2011@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>