
[SPARK-34637][SQL] Improve the performance of AQE and DPP through logical optimization. #31941

Closed

Conversation

@weixiuli (Contributor) commented Mar 23, 2021

What changes were proposed in this pull request?

Since https://issues.apache.org/jira/browse/SPARK-34168, AQE and DPP can be applied at the same time, but only when the join is a broadcast hash join at the beginning.

This PR supports applying Dynamic Partition Pruning together with Adaptive Query Execution in more scenarios. The processing idea is as follows (a sketch follows the list):

  1. First, before the adaptive dynamic pruning optimizer rule runs, check whether the input plan contains a BroadcastHashJoinExec with a matching build side.
  2. If such a BroadcastHashJoinExec exists but the broadcast query stage for its build side has not been created yet, create one for the DPP optimizer and cache it so that AQE can reuse it.
  3. If such a BroadcastHashJoinExec exists and the broadcast query stage for its build side has already been created, reuse it for the DPP optimizer.
  4. If no such BroadcastHashJoinExec exists, fall back and skip the DPP optimization.
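
A minimal, self-contained sketch of this three-way decision (the types are simplified stand-ins, not Spark's real classes; only the control flow mirrors the proposal):

sealed trait Plan
case class BroadcastHashJoin(buildSide: Plan) extends Plan
case class Other(children: Seq[Plan]) extends Plan
case class Scan(table: String) extends Plan
case class QueryStage(build: Plan)

class DppPlannerSketch(stageCache: scala.collection.mutable.Map[Plan, QueryStage]) {

  // Step 1: look for a broadcast hash join anywhere in the input plan.
  private def findBroadcastJoin(plan: Plan): Option[BroadcastHashJoin] = plan match {
    case j: BroadcastHashJoin => Some(j)
    case Other(children) =>
      children.iterator.map(findBroadcastJoin).collectFirst { case Some(j) => j }
    case _ => None
  }

  def planDppFilter(inputPlan: Plan): Option[QueryStage] =
    findBroadcastJoin(inputPlan) match {
      case Some(join) =>
        stageCache.get(join.buildSide) match {
          case Some(stage) => Some(stage)            // step 3: reuse AQE's stage
          case None =>
            val stage = QueryStage(join.buildSide)   // step 2: create it for DPP
            stageCache.put(join.buildSide, stage)    // ...and cache it for AQE reuse
            Some(stage)
        }
      case None => None                              // step 4: no join, fall back (no DPP)
    }
}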

Why are the changes needed?

To support applying Dynamic Partition Pruning together with Adaptive Query Execution in more scenarios.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Added unit tests.

@weixiuli (Contributor, Author)

@cloud-fan @maryannxue @JkSelf PTAL.

github-actions bot added the SQL label Mar 23, 2021
@cloud-fan (Contributor)

ok to test

@SparkQA commented Mar 23, 2021

Test build #136407 has started for PR 31941 at commit 0208465.

val reuseQueryStage = adaptivePlan.reuseQueryStage(existingStage.get, exchange)
logDebug(s"PlanAdaptiveDynamicPruningFilters: reuseQueryStage => $reuseQueryStage")
Option(reuseQueryStage)
} else if (conf.dynamicPartitionPruningCreateBroadcastEnabled) {
Contributor

Seems we don't need this config. It's always beneficial to do so. The AQE + DPP integration is only in master and not released yet, so we don't need to worry about regressions.
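
For reference, a flag like the one under discussion would typically be declared in SQLConf along these lines; the key name, doc text, and default below are assumptions inferred from conf.dynamicPartitionPruningCreateBroadcastEnabled in the diff, not the actual entry:

// Hypothetical SQLConf entry; the key and default are assumptions.
val DYNAMIC_PARTITION_PRUNING_CREATE_BROADCAST_ENABLED =
  buildConf("spark.sql.optimizer.dynamicPartitionPruning.createBroadcast.enabled")
    .doc("When true, the adaptive DPP rule may create (and cache) a broadcast " +
      "query stage for the build side so that AQE can reuse it later.")
    .booleanConf
    .createWithDefault(true)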

Contributor Author

Thank you for your time. Yes, we can keep it for now during review and remove it in the end.

@cloud-fan (Contributor)

This looks similar to #31756. @JkSelf, can you take a look?

@SparkQA commented Mar 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40991/

@SparkQA commented Mar 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40991/

@JkSelf (Contributor) commented Mar 24, 2021

@cloud-fan @weixiuli
Yes, PR #31756 is addressing the DPP + AQE use-case limitations.

@JkSelf (Contributor) commented Mar 24, 2021

When we find that the buildPlan in SubqueryAdaptiveBroadcastExec can be reused, we can apply the DPP filter first (the same as in the PlanDynamicPruningFilters rule) and then go back to the AQE framework to reuse the broadcast exchange on the build side. And if the build-side exchange is running but not finished, we can use the wait-or-cancel-stage mechanism to avoid the broadcast exchange being executed twice.

@cloud-fan @weixiuli Please correct me if my understanding is wrong. Thanks.
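
One self-contained way to picture the "execute the broadcast at most once" part of this idea (plain Scala, not Spark code; the wait-or-cancel mechanics are reduced to waiting on a shared future):

import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BroadcastOnceSketch {
  // One shared future per build-side plan, keyed by a canonical string.
  private val running = new ConcurrentHashMap[String, Future[Array[Byte]]]()

  // computeIfAbsent submits the body at most once per key, even if the DPP
  // planner and the AQE framework ask for the same broadcast concurrently;
  // the later caller simply waits on the already-running future.
  def getOrSubmit(key: String)(body: => Array[Byte]): Array[Byte] = {
    val future = running.computeIfAbsent(key, _ => Future(body))
    Await.result(future, 10.minutes)
  }
}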

@weixiuli (Contributor, Author) commented Mar 25, 2021

When we find that the buildPlan in SubqueryAdaptiveBroadcastExec can be reused, we can apply the DPP filter first (the same as in the PlanDynamicPruningFilters rule) and then go back to the AQE framework to reuse the broadcast exchange on the build side. And if the build-side exchange is running but not finished, we can use the wait-or-cancel-stage mechanism to avoid the broadcast exchange being executed twice.

@cloud-fan @weixiuli Please correct me if my understanding is wrong. Thanks.

Yes, when the input plan has a build-side BroadcastHashJoinExec before the adaptive dynamic pruning optimizer rule runs,
we should ensure that the build plan can be created by DPP when AQE has not created it, so that AQE can reuse it later, or that DPP can reuse it when AQE has already created it. Not only does this avoid the broadcast exchange being executed twice, it also guarantees that the build plan can be created and used by DPP even when AQE has not created it, which benefits both DPP and AQE.

@JkSelf (Contributor) commented Mar 25, 2021

I think it is better to make use of the AQE framework to reuse the broadcast exchange or newQueryStage.
@cloud-fan What is your point of view?

@cloud-fan (Contributor)

I think it is better to make use of the AQE framework to reuse the broadcast exchange or newQueryStage.

I agree, and I think this PR does it?

When planning the DPP filter, the broadcast plan may be in 2 different states:

  1. It's already submitted as a query stage, which means it's available in the stage cache. Whether it's running or completed, we will create a ReusedQueryStage for the DPP filter.
  2. It's not submitted yet and not available in the stage cache. We should create a fresh QueryStage for the DPP filter and put it in the stage cache, so that the AQE framework can reuse it later.

Case 2 is a bit tricky due to race conditions: the DPP filter planning and the AQE framework may be creating a fresh query stage at the same time. We should double-check it.
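
A minimal sketch of the double-check this calls for (a simplified stand-in for the stage cache, not Spark's implementation): the lookup and the insertion must happen atomically, otherwise both sides can miss the cache and each create a stage.

import scala.collection.mutable

// Simplified stand-in for the AQE stage cache, not Spark's implementation.
class StageCacheSketch[K, V] {
  private val cache = mutable.Map.empty[K, V]

  // Checking and inserting under one lock closes the race window in which
  // the DPP planner and the AQE framework both see a miss and then create
  // duplicate query stages for the same exchange.
  def getOrCreate(key: K)(create: => V): V = cache.synchronized {
    cache.getOrElseUpdate(key, create)
  }
}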

@JkSelf (Contributor) commented Mar 26, 2021

I think it is better to make use of the AQE framework to reuse the broadcast exchange or newQueryStage.

@cloud-fan
I may need to explain a little bit more about this.

  1. In my understanding, the PlanDynamicPruningFilters rule simply judges whether there is an exchange that can be reused, in order to decide whether to insert the DPP filter or not. The actual reuse happens in the ReuseExchange rule. I think this way of thinking is clearer.
  2. When AQE is enabled, we implement the ReuseExchange logic in the AQE framework: when an exchange is created, we look in the stageCache for an exchange that can be reused, and reuse it if one exists (sketched after this list).
  3. In the PlanAdaptiveDynamicPruningFilters rule, I am more inclined to the idea of the PlanDynamicPruningFilters rule: just add the DPP filter by judging whether there is an exchange that can be reused. The actual reuse is left to the AQE framework, instead of looking in the stageCache to create the reused exchange or calling the newQueryStage method to create a new query stage inside the PlanAdaptiveDynamicPruningFilters rule. Of course, we did this in PR #31258, but I think we may need to make some improvements in subsequent implementations.
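
To make the separation in point 2 concrete, here is a self-contained sketch of a ReuseExchange-style pass (simplified stand-in types, not Spark's): duplicate exchanges are rewritten into references to the first occurrence, and the DPP rule itself never has to touch the cache:

sealed trait Node
case class Exchange(canonicalChild: String) extends Node
case class ReusedExchange(original: Exchange) extends Node
case class Leaf(name: String) extends Node

def reuseExchanges(nodes: Seq[Node]): Seq[Node] = {
  val seen = scala.collection.mutable.Map.empty[String, Exchange]
  nodes.map {
    case e: Exchange =>
      seen.get(e.canonicalChild) match {
        case Some(first) => ReusedExchange(first)         // duplicate: point at the first
        case None        => seen += e.canonicalChild -> e; e
      }
    case other => other
  }
}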

@cloud-fan (Contributor)

@JkSelf are you saying that PlanAdaptiveDynamicPruningFilters should simply create a BroadcastExchange and let the AQE framework create/reuse the query stage?

@weixiuli (Contributor, Author)

3. In the PlanAdaptiveDynamicPruningFilters rule, I am more inclined to the idea of the PlanDynamicPruningFilters rule: just add the DPP filter by judging whether there is an exchange that can be reused. The actual reuse is left to the AQE framework, instead of looking in the stageCache to create the reused exchange or calling the newQueryStage method to create a new query stage inside the PlanAdaptiveDynamicPruningFilters rule.

@JkSelf I don't think so; the 'case 2' described by @cloud-fan would be ignored if we followed your approach. This PR solves 'case 2'. In addition, I think this modification is relatively concise.

@JkSelf (Contributor) commented Mar 26, 2021

are you saying that PlanAdaptiveDynamicPruningFilters should simply create a BroadcastExchange and let the AQE framework create/reuse the query stage?

@cloud-fan
Yes. I think this implementation is clearer.

@cloud-fan (Contributor) commented Mar 26, 2021

@JkSelf your idea looks concise, but unfortunately PlanAdaptiveDynamicPruningFilters is a stage optimization rule, and I'm not sure the AQE framework can still kick in at that point to reuse the stage. @JkSelf can you try this idea and see if we can make it work?

@JkSelf (Contributor) commented Mar 26, 2021

the 'case 2' described by @cloud-fan would be ignored if we followed your approach.

@weixiuli
Case 2 will be handled in the createQueryStages method of AdaptiveSparkPlanExec after control returns to the AQE framework; it will call the newQueryStage method if there is no exchange to reuse.
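
Sketched in the same spirit (simplified stand-ins, not the real AdaptiveSparkPlanExec), the framework-side path described here is just reuse-if-cached, otherwise newQueryStage:

import scala.collection.mutable

sealed trait Stage
case class NewStage(exchangeKey: String) extends Stage
case class ReusedStage(original: Stage) extends Stage

// createQueryStages-style handling of an exchange: reuse the cached stage
// when one exists, otherwise create a new stage and record it for reuse.
def createQueryStage(exchangeKey: String,
                     stageCache: mutable.Map[String, Stage]): Stage =
  stageCache.get(exchangeKey) match {
    case Some(existing) => ReusedStage(existing)
    case None =>
      val stage = NewStage(exchangeKey)
      stageCache.put(exchangeKey, stage)
      stage
  }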

@JkSelf (Contributor) commented Mar 26, 2021

@cloud-fan I will try this idea in PR #31756 later.

@SparkQA commented Apr 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41383/

@SparkQA commented Apr 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41383/

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label Jul 11, 2021
github-actions bot closed this Jul 12, 2021