[SPARK-34637][SQL] Support DPP + AQE when the broadcast exchange can be reused #31756
Conversation
@cloud-fan Please help review when you have time. Thanks for your help.
Test build #135798 has finished for PR 31756 at commit
@@ -1409,4 +1409,17 @@ class DynamicPartitionPruningSuiteAEOff extends DynamicPartitionPruningSuiteBase
   with DisableAdaptiveExecutionSuite

 class DynamicPartitionPruningSuiteAEOn extends DynamicPartitionPruningSuiteBase
-  with EnableAdaptiveExecutionSuite
+  with EnableAdaptiveExecutionSuite {
+  test("simple inner join triggers DPP with mock-up tables test") {
What does this test?
It is only for debugging; I will remove this test later.
Can you briefly introduce your approach?
@cloud-fan
SubqueryBroadcastExec(name, index, buildKeys, reuseQueryStage)

val canReuseExchange = conf.exchangeReuseEnabled && buildKeys.nonEmpty &&
  plan.find {
`PlanAdaptiveDynamicPruningFilters` is a stage optimization rule, and the input plan is only a small piece of the plan tree (for one stage). I think we should pass the entire plan as a parameter of this rule when creating it in `AdaptiveSparkPlanExec`.
...c/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala
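A minimal sketch of the suggested refactor, assuming the rule is constructed inside `AdaptiveSparkPlanExec` and handed the whole query plan (the parameter name is taken from the diff quoted later in this thread; the rule body is elided):

```scala
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

case class PlanAdaptiveDynamicPruningFilters(
    originalPlan: SparkPlan) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = {
    // `plan` is only the sub-tree of a single query stage; reusable
    // broadcast exchanges are searched for in `originalPlan`, the root
    // of the entire plan tree.
    plan
  }
}
```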
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #136696 has finished for PR 31756 at commit
Test build #751342356 for PR 31756 at commit
Test build #137409 has finished for PR 31756 at commit
Test build #751377372 for PR 31756 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
@@ -310,6 +310,11 @@ case class AdaptiveSparkPlanExec(
     rdd
   }

+  override def doExecuteBroadcast[T](): broadcast.Broadcast[T] = {
+    val broadcastPlan = getFinalPhysicalPlan()
+    broadcastPlan.doExecuteBroadcast()
nit: `getFinalPhysicalPlan().doExecuteBroadcast()`
Updated.
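For reference, the method after the nit is applied would collapse to the following (a sketch combining the diff above with the suggestion, not the merged code verbatim):

```scala
override def doExecuteBroadcast[T](): broadcast.Broadcast[T] = {
  // Resolve the final physical plan and execute its broadcast directly.
  getFinalPhysicalPlan().doExecuteBroadcast()
}
```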
 /**
  * A rule to insert dynamic pruning predicates in order to reuse the results of broadcast.
  */
 case class PlanAdaptiveDynamicPruningFilters(
-    stageCache: TrieMap[SparkPlan, QueryStageExec]) extends Rule[SparkPlan] {
+    originalPlan: SparkPlan) extends Rule[SparkPlan] {
rootPlan
Updated.
    case _ => false
  }.isDefined
if(canReuseExchange) { |
nit: `if (canReuseExchange)`
Updated.
@@ -41,15 +40,26 @@ case class PlanAdaptiveDynamicPruningFilters(
        adaptivePlan: AdaptiveSparkPlanExec), exprId, _)) =>
      val packedKeys = BindReferences.bindReferences(
We can move this into `if (canReuseExchange)`.
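A sketch of the suggested move, assuming the surrounding match case binds `buildKeys` and `adaptivePlan` as in the diff above (`HashJoin.rewriteKeyExpr` is borrowed from the non-adaptive `PlanDynamicPruningFilters` rule and is an assumption here):

```scala
if (canReuseExchange) {
  // Bind the pruning keys against the build side's output only when the
  // broadcast exchange is actually reusable, so no work is wasted on the
  // fallback path.
  val packedKeys = BindReferences.bindReferences(
    HashJoin.rewriteKeyExpr(buildKeys), adaptivePlan.executedPlan.output)
  // ... use packedKeys to construct the broadcast-reusing subquery
}
```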
In general LGTM, can we add some tests?
Test build #751980907 for PR 31756 at commit
Test build #137410 has finished for PR 31756 at commit
// +- HashAggregate
// +- Filter
// +- FileScan
// +- SubqueryBroadcast
Subqueries have different symbols in the tree string format. Please try explaining some plans locally and update this comment.
// +- FileScan
// +- SubqueryBroadcast
// +- AdaptiveSparkPlan
// +- BroadcastQueryStage
There is no other place to reuse this broadcast, right?
This broadcast is only reused on the build side.
// +- FileScan
// +- SubqueryBroadcast
// +- AdaptiveSparkPlan
// +- BroadcastQueryStage
How is the broadcast before the FileScan? What is being broadcast?
This broadcast is in the DPP subquery of the FileScan. It broadcasts the results of the build side and then prunes the dataset.
Can you make it more clear that this is the subquery in the file scan node, not the child of it?
@JkSelf will you have time to look at the questions and comments?
@tgravescs Sorry for the delayed response.
Kubernetes integration test starting
Kubernetes integration test status failure
I made this comment in one of the threads, but it got collapsed, so I'm making it again.
Test build #138203 has finished for PR 31756 at commit
@tgravescs This approach mainly contains two steps.
if (canReuseExchange) {
  exchange.setLogicalLink(adaptivePlan.executedPlan.logicalLink.get)
  val newAdaptivePlan = AdaptiveSparkPlanExec(
    exchange, adaptivePlan.context, adaptivePlan.preprocessingRules, true)
ditto: `adaptivePlan.copy(inputPlan = exchange)`
Updated.
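Putting the pieces from this thread together, the two steps mentioned above possibly look roughly like this (a condensed sketch assembled from the diffs in this PR; `name` and `index` come from the earlier `SubqueryBroadcastExec` snippet, and the trailing wiring is elided):

```scala
if (canReuseExchange) {
  // Step 1: wrap the reusable broadcast exchange in its own adaptive plan,
  // so the DPP subquery can create or reuse the broadcast query stage.
  exchange.setLogicalLink(adaptivePlan.executedPlan.logicalLink.get)
  val newAdaptivePlan = adaptivePlan.copy(inputPlan = exchange)

  // Step 2: plan the pruning filter as a SubqueryBroadcastExec over that
  // adaptive plan; at runtime it executes via doExecuteBroadcast().
  val values = SubqueryBroadcastExec(name, index, buildKeys, newAdaptivePlan)
  // ... wire `values` into the pruning expression of the file scan
}
```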
@@ -1463,6 +1474,37 @@ abstract class DynamicPartitionPruningSuiteBase
       }
     }
   }

+  test("SPARK-34637: test DPP side broadcast query stage is created firstly") {
nit: `SPARK-34637: DPP ...` (remove the word `test` from the name)
Updated.
test("SPARK-34637: test DPP side broadcast query stage is created firstly") { | ||
withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST_ONLY.key -> "true") { | ||
val df = sql( | ||
""" WITH view1 as ( |
view1 -> v?
Updated.
// +- HashAggregate
// +- Filter
// +- FileScan
// Dynamicpruning Subquery
Did you try to explain some queries locally? If you did, you should see how subqueries are displayed. For example, `select 1, (select 2)`:

Project [1 AS 1#7, scalar-subquery#6 [] AS scalarsubquery()#9]
:  +- Project [2 AS 2#8]
:     +- OneRowRelation
+- OneRowRelation
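For anyone reproducing this locally, the tree above can be printed with a plain explain (a sketch; `spark` is the usual `SparkSession`):

```scala
// Prints the parsed/analyzed/optimized/physical plans, including the
// scalar-subquery branch marked with ':' in the tree string.
spark.sql("select 1, (select 2)").explain(true)
```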
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #138367 has finished for PR 31756 at commit
Test build #138376 has finished for PR 31756 at commit
Thanks, merging to master!
Closes apache#31756 from JkSelf/supportDPP2.

Authored-by: jiake <ke.a.jia@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
We have supported DPP in AQE when the join is a broadcast hash join, planned before applying the AQE rules, in SPARK-34168. That approach has a limitation: it applies DPP only when the small table side executes first, so that the big table side can reuse the small table side's broadcast exchange. This PR addresses that limitation and applies DPP whenever the broadcast exchange can be reused.
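For context, both features are gated by existing SQL configs; a sketch of enabling them together (these config keys exist in Spark, though defaults vary across versions):

```scala
// Enable adaptive query execution and dynamic partition pruning.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
```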
Why are the changes needed?
Resolve the limitations when both DPP and AQE are enabled.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a new unit test.
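A rough sketch of the shape of the new test, based on the snippet reviewed above (the table names, the final SQL, and the assertion helper are assumptions, not the merged code):

```scala
test("SPARK-34637: DPP side broadcast query stage is created firstly") {
  withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST_ONLY.key -> "true") {
    val df = sql(
      """
        |WITH v AS (SELECT store_id FROM dim_stats WHERE country = 'US')
        |SELECT f.date_id FROM fact_stats f JOIN v ON f.store_id = v.store_id
      """.stripMargin)
    // Expect the DPP filter to reuse the broadcast from the join's build
    // side instead of planning a separate subquery.
    checkPartitionPruningPredicate(df, withSubquery = false, withBroadcast = true)
  }
}
```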