[SPARK-34637][SQL] Improve the performance of AQE and DPP through logical optimization. #31941
Conversation
…ical optimization.

To support Dynamic Partition Pruning (DPP) in Adaptive Query Execution:
1. Before the adaptive dynamic pruning optimizer rule runs, check whether the spark plan contains a BroadcastHashJoinExec on the build side.
2. When such a BroadcastHashJoinExec exists but the broadcast query stage of the join's build side hasn't been created yet, create a broadcast query stage of the build side for the DPP optimizer, and cache it for AQE reuse.
3. When such a BroadcastHashJoinExec exists and the broadcast query stage of the join's build side has already been created, reuse it for the DPP optimizer.
4. When no such BroadcastHashJoinExec exists, fall back in the DPP optimizer.
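The four steps above can be sketched as a minimal, Spark-free Scala model of the reuse-or-create-or-fall-back decision. Note that BroadcastHashJoin, QueryStage, stageCache, and planDynamicPruning here are simplified stand-ins chosen for illustration, not Spark's real AQE classes or APIs:

```scala
import scala.collection.mutable

// Toy plan nodes standing in for Spark's physical plan tree.
sealed trait PlanNode
case class Scan(name: String) extends PlanNode
case class BroadcastHashJoin(buildSide: PlanNode) extends PlanNode

// Stand-in for a materialized broadcast query stage.
case class QueryStage(id: Int, plan: PlanNode)

object DppPlanner {
  // Cache shared with the (modeled) AQE framework so stages can be reused.
  val stageCache: mutable.Map[PlanNode, QueryStage] = mutable.Map()
  private var nextId = 0

  // Returns Some(stage) when DPP can proceed, None when it must fall back.
  def planDynamicPruning(plan: PlanNode): Option[QueryStage] = plan match {
    case BroadcastHashJoin(buildSide) =>
      stageCache.get(buildSide) match {
        case Some(existing) =>            // step 3: stage already created, reuse it
          Some(existing)
        case None =>                      // step 2: create the stage and cache it
          nextId += 1
          val stage = QueryStage(nextId, buildSide)
          stageCache(buildSide) = stage
          Some(stage)
      }
    case _ =>                             // step 4: no broadcast join, fall back
      None
  }
}
```

Calling `planDynamicPruning` twice with an equal build side returns the same cached stage, mirroring the reuse in step 3.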
@cloud-fan @maryannxue @JkSelf PTAL.
ok to test
Test build #136407 has started for PR 31941 at commit
val reuseQueryStage = adaptivePlan.reuseQueryStage(existingStage.get, exchange)
logDebug(s"PlanAdaptiveDynamicPruningFilters: reuseQueryStage => $reuseQueryStage")
Option(reuseQueryStage)
} else if (conf.dynamicPartitionPruningCreateBroadcastEnabled) {
Seems we don't need this config. It's always beneficial to do so. The AQE + DPP integration is only in master and not released yet, so we don't need to worry about regressions.
Thank you for your time. Yes, we can keep it for now for reviewing and remove it at the end.
Kubernetes integration test starting
Kubernetes integration test status failure
@cloud-fan @weixiuli
@cloud-fan @weixiuli Please correct me if I have any wrong understandings. Thanks.
Yes, when the input plan has a build-side BroadcastHashJoinExec before the adaptive dynamic pruning optimizer rule runs,
I think it is better to make use of the AQE framework to reuse the broadcast exchange or newQueryStage.
I agree, and I think this PR does it? When planning the DPP filter, the broadcast plan may be in 2 different states:
Case 2 is a bit tricky due to race conditions: the DPP filter and the AQE framework may be creating a fresh query stage at the same time. We should double-check it.
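A hypothetical sketch of one way to guard against the race described above: route every stage creation through an atomic get-or-create on a shared cache, so whichever of the DPP filter and the AQE loop arrives second observes and reuses the stage the first one created. StageRegistry, Stage, and planKey are illustrative names for this sketch, not Spark APIs:

```scala
import java.util.concurrent.ConcurrentHashMap

// Stand-in for a materialized broadcast query stage.
final case class Stage(key: String)

object StageRegistry {
  private val cache = new ConcurrentHashMap[String, Stage]()

  // computeIfAbsent is atomic per key: exactly one caller runs the factory,
  // every concurrent caller for the same key gets the same Stage instance.
  def getOrCreate(planKey: String): Stage =
    cache.computeIfAbsent(planKey, k => Stage(k))
}
```

With this shape there is no window in which two fresh stages for the same build side can both be created and separately materialized.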
@cloud-fan
@JkSelf are you saying that the
@JkSelf I don't think so; the 'case 2' mentioned by @cloud-fan would be ignored if we followed your opinion. This PR can solve 'case 2'. In addition, I think this modification is relatively concise.
@cloud-fan
@weixiuli
@cloud-fan I will try this idea in PR#31756 later.
Kubernetes integration test starting
Kubernetes integration test status failure
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Since https://issues.apache.org/jira/browse/SPARK-34168, AQE and DPP can be applied at the same time, but only when the join is a broadcast hash join at the beginning.
This PR supports applying Dynamic Partition Pruning and Adaptive Execution together in more scenarios; the processing idea follows the four steps described above.
Why are the changes needed?
To support applying Dynamic Partition Pruning and Adaptive Execution at the same time in more scenarios.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
Added unit tests.