[SPARK-45584][SQL] Fix subquery execution failure with TakeOrderedAndProjectExec #43419
cc @cloud-fan
@@ -299,6 +301,11 @@ case class TakeOrderedAndProjectExec(
    }
  }

  private def prepareAndWaitForSubqueries(): Unit = {
Instead of adding this method, shall we just call `executeQuery` in `executeCollect`?
Sg. Just want to make sure `RDDOperationScope.withScope` does not have any side effects, right?
It's the same with this one, plus some tracking logic
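For readers following this exchange, here is a toy model of the wrapper under discussion. It assumes `executeQuery` runs `prepare()` and `waitForSubqueries()` inside a scope that is set for the duration of the body and restored afterwards. `ScopeSketch`, `events`, and the string scope label are hypothetical stand-ins for illustration, not Spark's actual classes.

```scala
import scala.collection.mutable.ArrayBuffer

object ScopeSketch {
  // Hypothetical stand-in for the scope label a context would track.
  private var currentScope: Option[String] = None

  // Records what ran, so the ordering can be inspected.
  val events: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def activeScope: Option[String] = currentScope

  // Assumed behaviour: set a scope around the body, then restore the old one,
  // even if the body throws.
  def withScope[T](name: String)(body: => T): T = {
    val previous = currentScope
    currentScope = Some(name)
    try body finally currentScope = previous
  }

  // Toy executeQuery: preparation and subquery waiting happen before the
  // by-name `query` argument is evaluated.
  def executeQuery[T](nodeName: String)(query: => T): T =
    withScope(nodeName) {
      events += s"prepare in ${currentScope.getOrElse("none")}"
      events += s"waitForSubqueries in ${currentScope.getOrElse("none")}"
      query
    }

  def main(args: Array[String]): Unit = {
    val result = executeQuery("TakeOrderedAndProject") { 1 + 1 }
    println(result)
    println(events.mkString(", "))
    println(activeScope)
  }
}
```

In this sketch the only state the wrapper touches is the scope label, and it is restored once the body returns, which is the property the question above is asking about.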
@@ -283,6 +283,8 @@ case class TakeOrderedAndProjectExec(
  }

  override def executeCollect(): Array[InternalRow] = {
- override def executeCollect(): Array[InternalRow] = {
+ override def executeCollect(): Array[InternalRow] = executeQuery {
thanks, merging to master/3.5!
…ProjectExec

What changes were proposed in this pull request?
This PR fixes a bug when there are subqueries in `TakeOrderedAndProjectExec`. The executeCollect method does not wait for subqueries to finish and it can result in IllegalArgumentException when executing a simple query. For example this query:
```
WITH t2 AS (
  SELECT * FROM t1 ORDER BY id
)
SELECT *, (SELECT COUNT(*) FROM t2) FROM t2 LIMIT 10
```
will fail with this error
```
java.lang.IllegalArgumentException: requirement failed: Subquery subquery#242, [id=#109] has not finished
```

Why are the changes needed?
To fix a bug.

Does this PR introduce any user-facing change?
No

How was this patch tested?
New unit test

Was this patch authored or co-authored using generative AI tooling?
No

Closes #43419 from allisonwang-db/spark-45584-subquery-failure.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8fd915f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
LGTM later.
What changes were proposed in this pull request?
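This PR fixes a bug that occurs when `TakeOrderedAndProjectExec` contains subqueries. Its `executeCollect` method does not wait for subqueries to finish, which can result in an `IllegalArgumentException` when executing a simple query. For example, the query `WITH t2 AS (SELECT * FROM t1 ORDER BY id) SELECT *, (SELECT COUNT(*) FROM t2) FROM t2 LIMIT 10` fails with `java.lang.IllegalArgumentException: requirement failed: Subquery subquery#242, [id=#109] has not finished`.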
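The failure mode can be illustrated with a small model that does not depend on Spark. This is a sketch under stated assumptions: `ToySubquery`, `ToyTakeOrdered`, `executeCollectBuggy`, and `executeCollectFixed` are hypothetical names, and the subquery is modelled as a value that becomes readable only after `updateResult()` has run. It is not the actual patch.

```scala
import scala.util.Try

object SubqueryWaitSketch {
  // Hypothetical stand-in for a scalar subquery expression: its value is
  // readable only after updateResult() has run.
  final class ToySubquery(compute: () => Long) {
    private var result: Long = 0L
    private var updated: Boolean = false

    def updateResult(): Unit = {
      result = compute()
      updated = true
    }

    def eval(): Long = {
      // require throws IllegalArgumentException when the condition is false.
      require(updated, "Subquery toy-subquery has not finished")
      result
    }
  }

  // Hypothetical stand-in for a top-N operator whose projection reads a subquery.
  final class ToyTakeOrdered(ids: Seq[Long], subquery: ToySubquery) {
    private def waitForSubqueries(): Unit = subquery.updateResult()

    // Runs the subquery wait before evaluating the by-name `query` argument.
    private def executeQuery[T](query: => T): T = {
      waitForSubqueries()
      query
    }

    private def takeOrdered(limit: Int): Seq[(Long, Long)] =
      ids.sorted.take(limit).map(id => (id, subquery.eval()))

    // Reads the subquery without waiting for it first.
    def executeCollectBuggy(limit: Int): Seq[(Long, Long)] = takeOrdered(limit)

    // Same work, wrapped so the subquery is finished before it is read.
    def executeCollectFixed(limit: Int): Seq[(Long, Long)] =
      executeQuery { takeOrdered(limit) }
  }

  // Builds a fresh plan each time so earlier calls cannot mask the bug.
  def newPlan(): ToyTakeOrdered = {
    val ids = Seq(3L, 1L, 2L)
    new ToyTakeOrdered(ids, new ToySubquery(() => ids.size.toLong))
  }

  def main(args: Array[String]): Unit = {
    println(Try(newPlan().executeCollectBuggy(2)))
    println(newPlan().executeCollectFixed(2))
  }
}
```

The buggy path fails because nothing triggers `updateResult()` before `eval()` is called. The fixed path differs only in wrapping the same body in `executeQuery`.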
Why are the changes needed?
To fix a bug.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit test
Was this patch authored or co-authored using generative AI tooling?
No