[SPARK-35093] [SQL] AQE now uses newQueryStage plan as key for looking up cached exchanges for re-use #32195

andygrove · 2021-04-15T19:18:15Z

What changes were proposed in this pull request?

AQE has an optimization where it attempts to reuse compatible exchanges but it does not take into account whether the exchanges are columnar or not, resulting in incorrect reuse under some circumstances.

This PR simply changes the key used to look up cached stages. It now uses the canonicalized form of the new query stage (potentially created by a plugin) rather than using the canonicalized form of the original exchange.
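A minimal sketch of why the choice of key matters, assuming a toy model rather than Spark's real classes: `Plan`, `Exchange`, `RowStage`, `ColumnarStage`, and the two lookup helpers are hypothetical names introduced only for illustration. The only real API used is Scala's `mutable.Map.getOrElseUpdate`.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for plan nodes; Spark's real classes are richer.
sealed trait Plan { def canonicalized: Plan = this }
final case class Exchange(child: String) extends Plan
final case class RowStage(exchange: Exchange) extends Plan
final case class ColumnarStage(exchange: Exchange) extends Plan

object StageCacheSketch {
  // Keyed on the original exchange: row and columnar stages built from the
  // same exchange collide on one cache entry.
  def lookupByExchange(cache: mutable.Map[Plan, Plan], e: Exchange, newStage: Plan): Plan =
    cache.getOrElseUpdate(e.canonicalized, newStage)

  // Keyed on the new stage's plan: stages that differ in format get
  // different cache entries.
  def lookupByStagePlan(cache: mutable.Map[Plan, Plan], newStage: Plan): Plan =
    cache.getOrElseUpdate(newStage.canonicalized, newStage)

  def main(args: Array[String]): Unit = {
    val e = Exchange("scan t")

    val byExchange = mutable.Map.empty[Plan, Plan]
    lookupByExchange(byExchange, e, ColumnarStage(e))
    // The row-based request is answered with the cached columnar stage.
    println(lookupByExchange(byExchange, e, RowStage(e)))

    val byStagePlan = mutable.Map.empty[Plan, Plan]
    lookupByStagePlan(byStagePlan, ColumnarStage(e))
    // The row-based request keeps its own row-based stage.
    println(lookupByStagePlan(byStagePlan, RowStage(e)))
  }
}
```

In this sketch the two stages only differ by class, which is enough for case-class equality to separate them; in Spark the difference would come from the plan that the stage wraps.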

Why are the changes needed?

When using the RAPIDS Accelerator for Apache Spark we sometimes see a new query stage correctly create a row-based exchange and then Spark replaces it with a cached columnar exchange, which is not compatible, and this causes queries to fail.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The patch has been tested with the query that highlighted this issue. I looked at writing unit tests for this but it would involve implementing a mock columnar exchange in the tests so would be quite a bit of work. If anyone has ideas on other ways to test this I am happy to hear them.

github-actions · 2021-04-15T19:18:49Z

Test build #753174945 for PR 32195 at commit e735c1b.

andygrove · 2021-04-15T19:20:06Z

@tgravescs fyi

tgravescs · 2021-04-15T20:04:27Z

ok to test

SparkQA · 2021-04-15T20:50:41Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42020/

SparkQA · 2021-04-15T20:51:00Z

Test build #137445 has finished for PR 32195 at commit e735c1b.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

github-actions · 2021-04-15T22:28:55Z

Test build #753637969 for PR 32195 at commit 074f091.

SparkQA · 2021-04-15T23:26:01Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42022/

SparkQA · 2021-04-15T23:26:02Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42022/

HyukjinKwon · 2021-04-16T02:02:33Z

cc @maryannxue FYI

dongjoon-hyun · 2021-04-16T02:31:13Z

Hi, @andygrove . Could you clarify that this is not a correctness issue in the PR description? It only causes job failures, doesn't it?

> resulting in incorrect reuse under some circumstances.

SparkQA · 2021-04-16T02:53:16Z

Test build #137447 has finished for PR 32195 at commit 074f091.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya

Does it mean, if Spark converts reused query stage from columnar to row-based, it will be compatible? Just out of curiosity.

andygrove · 2021-04-16T03:54:33Z

> Hi, @andygrove . Could you clarify that this is not a correctness issue in the PR description? It only causes job failures, doesn't it?

Yes @dongjoon-hyun that is correct. It would result in an invalid plan that would fail to execute.

andygrove · 2021-04-16T03:58:20Z

> Does it mean, if Spark converts reused query stage from columnar to row-based, it will be compatible? Just out of curiosity.

@viirya Spark plugins can create either columnar or row-based exchanges as required for compatibility with other parts of the query plan.

For example, a columnar BroadcastHashJoin would expect both of its child plans to also be columnar, so if Spark replaces one of the child plans with a row-based plan it would not be compatible at execution time.
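To make that concrete, here is a small sketch under stated assumptions: `Node`, `ExchangeNode`, `ColumnarJoin`, and `incompatibleChildren` are hypothetical names for a simplified plan tree, not Spark's API. The `supportsColumnar` flag only mirrors the name of the real `SparkPlan` member.

```scala
// Hypothetical, simplified plan nodes; not Spark's SparkPlan hierarchy.
sealed trait Node {
  def supportsColumnar: Boolean
  def children: Seq[Node]
}

final case class ExchangeNode(supportsColumnar: Boolean) extends Node {
  val children: Seq[Node] = Seq.empty
}

// A join that produces and consumes columnar batches.
final case class ColumnarJoin(left: Node, right: Node) extends Node {
  val supportsColumnar: Boolean = true
  val children: Seq[Node] = Seq(left, right)
}

object ColumnarCheck {
  // Children whose format does not match what a columnar parent expects.
  def incompatibleChildren(parent: Node): Seq[Node] =
    if (parent.supportsColumnar) parent.children.filterNot(_.supportsColumnar)
    else Seq.empty

  def main(args: Array[String]): Unit = {
    val ok = ColumnarJoin(ExchangeNode(true), ExchangeNode(true))
    // One child was swapped for a row-based exchange after planning.
    val broken = ColumnarJoin(ExchangeNode(true), ExchangeNode(false))
    println(incompatibleChildren(ok).isEmpty)
    println(incompatibleChildren(broken).size)
  }
}
```

A plugin that sees the plan before execution could use a check like this to insert a transition; the problem described here is that the swap happens after the plugin has already handed over the stage.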

viirya · 2021-04-16T04:08:04Z

Does ReuseExchange rule also possibly suffer from the issue? In the rule, Spark also only checks canonicalized plans of exchanges to choose reused exchange.

andygrove · 2021-04-16T04:23:41Z

> Does ReuseExchange rule also possibly suffer from the issue? In the rule, Spark also only checks canonicalized plans of exchanges to choose reused exchange.

This hasn't been a problem for us. We only ran into issues with the AQE exchange reuse logic because of the adaptive nature where query stages are being created during execution, and with Spark making changes after we provide the new query stage it is too late for our plugin to correct the issue.

Without AQE, where the ReuseExchange rule would be applied (as far as I can see), our plugin would operate on the physical plan after Spark has applied the exchange reuse logic and we would be able to modify the plan as required after that, and before execution, to insert any necessary transitions.

andygrove · 2021-04-16T04:31:46Z

I'll take another look at ReuseExchange tomorrow. It might be worth considering adding a similar check here, perhaps as a separate PR.

tgravescs · 2021-04-16T18:12:41Z

If I'm understanding this properly, it seems like the root cause is actually that AQE looks at the plan before the columnar changes are applied and the physical plan gets executed, so it thinks they are canonically the same when really, after preparations (specifically here ApplyColumnarRulesAndInsertTransitions) and execution, they aren't. I wonder if there is a better way to do that part. It might be nice to investigate that a bit further to see if that is possible.

dongjoon-hyun · 2021-04-17T22:17:06Z

cc @cloud-fan

cloud-fan · 2021-04-19T07:01:53Z

> it seems like the root cause is actually that AQE looks at the plan before the columnar changes are applied

This is a good point. Without AQE, ApplyColumnarRulesAndInsertTransitions is executed before ReuseExchange.
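The effect of rule ordering can be sketched with a toy model. Everything here is an assumption for illustration: `Ex`, `columnarAt`, `reuse`, and `run` are hypothetical and only loosely stand in for the columnar rules and for exchange reuse.

```scala
// Toy model (hypothetical, not Spark's classes): an exchange has a canonical
// key, a data format, and a flag saying it was replaced by a reuse node.
final case class Ex(key: String, columnar: Boolean = false, reused: Boolean = false)

object RuleOrderSketch {
  type Rule = Seq[Ex] => Seq[Ex]

  // Stand-in for a plugin's columnar rules: make the chosen positions columnar.
  def columnarAt(positions: Set[Int]): Rule =
    plan => plan.zipWithIndex.map { case (ex, i) =>
      if (positions(i)) ex.copy(columnar = true) else ex
    }

  // Stand-in for exchange reuse: a later exchange equal to an earlier one
  // (same key and same format at the time the rule runs) is marked as reused.
  val reuse: Rule = plan => {
    val seen = scala.collection.mutable.Set.empty[(String, Boolean)]
    plan.map(ex => if (seen.add((ex.key, ex.columnar))) ex else ex.copy(reused = true))
  }

  def run(rules: Seq[Rule], plan: Seq[Ex]): Seq[Ex] =
    rules.foldLeft(plan)((p, rule) => rule(p))

  def main(args: Array[String]): Unit = {
    val plan = Seq(Ex("a"), Ex("a"))
    // Columnar rules first: the formats already differ, so nothing is reused.
    println(run(Seq(columnarAt(Set(0)), reuse), plan))
    // Reuse first: the row-based exchange is marked reused, and only then
    // does the exchange it points at become columnar.
    println(run(Seq(reuse, columnarAt(Set(0))), plan))
  }
}
```

The second ordering is the analogue of looking up the cache with a key taken before the columnar changes are applied.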

tgravescs · 2021-04-20T21:55:26Z

So I was looking at this today, and I think it's a fairly easy fix: just change it to store the plan after it's been through all the rules.

val queryStage = context.stageCache.getOrElseUpdate(newStage.plan.canonicalized, newStage)
instead of the

val queryStage = context.stageCache.getOrElseUpdate(e.canonicalized, newStage)

The only downside I can think of is the case where something matches on the original plan but, after running the rules, is no longer canonically the same even though it really is equivalent; it would also take a bit longer, because we have to traverse down the plan and apply all the rules. But that kind of seems like a bug to me anyway.
If anyone sees an issue with that let me know, otherwise either @andygrove can update this or I can put up a PR.

andygrove · 2021-04-21T17:48:24Z

Thanks, @tgravescs. That is a much simpler change! I have updated the PR.

tgravescs

change looks good to me, but would be great for others to take a look as well.

SparkQA · 2021-04-21T18:09:39Z

Test build #137742 has finished for PR 32195 at commit 1035842.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

SparkQA · 2021-04-21T18:53:48Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42269/

SparkQA · 2021-04-21T18:53:49Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42269/

SparkQA · 2021-04-21T19:54:50Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42270/

SparkQA · 2021-04-21T19:54:51Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42270/

cloud-fan · 2021-05-17T16:22:50Z

I'm OK with that. The fix is simple and the supportsColumnar API is in 3.0/3.1.

tgravescs · 2021-05-18T13:16:51Z

@andygrove could you update the description and then test on the 3.1.2 and 3.0.3 branches when you get a chance?

andygrove · 2021-05-18T16:07:00Z

@cloud-fan @tgravescs I updated the title and description. Let me know if this is still not clear.

dongjoon-hyun · 2021-05-18T16:43:03Z

Yes, please backport this to the old branches.

andygrove · 2021-05-18T20:35:57Z

I applied this patch to the latest in branch-3.0 and branch-3.1 and ran manual tests to confirm that this fixes the issue for us in those releases.

tgravescs · 2021-05-18T20:40:57Z

I'm not sure why the label pull requests check isn't running here. Anyone know?

viirya · 2021-05-18T21:22:27Z

Not only here; I also saw other PRs whose label pull requests checks are queued too.

… up cached exchanges for re-use

Closes #32195 from andygrove/SPARK-35093. Authored-by: Andy Grove <andygrove73@gmail.com> Signed-off-by: Thomas Graves <tgraves@apache.org> (cherry picked from commit 52e3cf9) Signed-off-by: Thomas Graves <tgraves@apache.org>

tgravescs · 2021-05-19T12:49:53Z

thanks @andygrove, thanks everyone for the reviews.
I've merged to master, branch-3.1 and branch-3.0

dongjoon-hyun · 2021-05-19T15:46:36Z

Hi, All. This seems to break branch-3.0.

https://github.com/apache/spark/runs/2621206825

dongjoon-hyun · 2021-05-19T15:53:01Z

Could you check whether that is flakiness or not?

[info] AdaptiveQueryExecSuite:
[info] - Change merge join to broadcast join (521 milliseconds)
[info] - Reuse the parallelism of CoalescedShuffleReaderExec in LocalShuffleReaderExec (265 milliseconds)
[info] - Reuse the default parallelism in LocalShuffleReaderExec (260 milliseconds)
[info] - Scalar subquery (422 milliseconds)
[info] - Scalar subquery in later stages (567 milliseconds)
[info] - multiple joins *** FAILED *** (1 second, 25 milliseconds)
[info]   ArrayBuffer(BroadcastHashJoin [b#147752], [a#147761], Inner, BuildLeft

andygrove · 2021-05-19T15:55:41Z

I'm looking now

andygrove · 2021-05-19T17:21:37Z

The tests worked fine for me locally at commit 706f91e. AQE is non-deterministic because query stages can change depending on the order in which other stages complete, so this is likely a flaky test. I will investigate further and see how we can make it more robust.

andygrove · 2021-05-19T18:28:46Z

It looks like this is not a new issue, although potentially the recent change made it more likely to happen. We should probably re-open https://issues.apache.org/jira/browse/SPARK-32304

HyukjinKwon · 2021-05-20T11:22:50Z

Late LGTM2

github-actions bot added the SQL label Apr 15, 2021

viirya reviewed Apr 16, 2021

tgravescs approved these changes Apr 21, 2021

tgravescs reviewed Apr 21, 2021

andygrove changed the title ~~[SPARK-35093] [SQL] AQE now respects supportsColumnar when attempting to reuse exchanges~~ [SPARK-35093] [SQL] AQE now uses newQueryStage plan as key for looking up cached exchanges for reiuse May 18, 2021

andygrove changed the title ~~[SPARK-35093] [SQL] AQE now uses newQueryStage plan as key for looking up cached exchanges for reiuse~~ [SPARK-35093] [SQL] AQE now uses newQueryStage plan as key for looking up cached exchanges for re-use May 18, 2021

tgravescs approved these changes May 18, 2021

cloud-fan approved these changes May 18, 2021

dongjoon-hyun approved these changes May 18, 2021

viirya approved these changes May 18, 2021

asfgit closed this in 52e3cf9 May 19, 2021

tgravescs mentioned this pull request Jun 7, 2021

[FEA] Consider contributing Spark changes to support columnar adaptive plans NVIDIA/spark-rapids#2067

Closed

tgravescs mentioned this pull request Jul 6, 2021

As a result of SPARK-34637 there is a change to AdaptiveSparkPlanExec when employing DPP and AQE NVIDIA/spark-rapids#2857

Closed
