-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][SPARK-24497][SQL] Support recursive SQL query with adaptive replanning #23531
Conversation
ok to test |
cc @maryannxue |
Test build #101149 has finished for PR 23531 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
Test build #101190 has started for PR 23531 at commit |
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Outdated
Show resolved
Hide resolved
Test build #101191 has started for PR 23531 at commit |
test this please |
1 similar comment
test this please |
Test build #101197 has finished for PR 23531 at commit
|
4ea3a4a
to
5f69d60
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
Test build #101306 has finished for PR 23531 at commit
|
Test build #101308 has finished for PR 23531 at commit
|
Test build #101304 has finished for PR 23531 at commit
|
Test build #101310 has finished for PR 23531 at commit
|
Test build #101327 has finished for PR 23531 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
Test build #101546 has finished for PR 23531 at commit
|
Test build #101555 has finished for PR 23531 at commit
|
Test build #101559 has finished for PR 23531 at commit
|
Test build #101585 has finished for PR 23531 at commit
|
retest this please |
Test build #120411 has finished for PR 23531 at commit
|
retest this please |
Test build #120774 has finished for PR 23531 at commit
|
seems like the PR addressed the feedback, can someone review? |
…497-recursive-sql
Test build #121283 has finished for PR 23531 at commit
|
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
Outdated
Show resolved
Hide resolved
|
||
override def children: Seq[SparkPlan] = anchorTerm :: Nil | ||
|
||
override def innerChildren: Seq[QueryPlan[_]] = logicalRecursiveTerm +: super.innerChildren |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason we need to include logicalResursiveTerm as innerChildren
other than for display?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No specific reason, just for display. I can find a better way to do it.
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
Outdated
Show resolved
Hide resolved
* Notify the listeners of the physical plan change. | ||
*/ | ||
private def onUpdatePlan(executionId: Long): Unit = { | ||
val queryExecution = SQLExecution.getQueryExecution(executionId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should introduce the idea of iterations in metrics here. Or maybe we need a better way of expressing recursion in UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the idea that we can see what happened in each iteration, but I agree that it can be too much information to present on UI, especially when recursion level is high.
Do you have any idea how to present recursion on the UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance and UI readability reasons, we don't wanna really update all those physical nodes to the UI. AQE has metric-only UI updates, and I assume we can do sth. similar here.
Would you mind doing a little research on how other SQL engines or tools present recursive queries in GUIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
Sure, I'm happy to look into it how other engines show recursion on their UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I didn't have time for this yet, but it is still on my list.
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
Outdated
Show resolved
Hide resolved
Thanks @maryannxue for your comments, really appreciated. I will try to address them soon. |
…497-recursive-sql
Test build #124213 has finished for PR 23531 at commit
|
Test build #124214 has finished for PR 23531 at commit
|
Test build #124233 has finished for PR 23531 at commit
|
Test build #124299 has finished for PR 23531 at commit
|
I've opened a simplified PR (without adaptive replanning in each iteration) here: #29210. That implementation is bit simpler and hopefully will get merged sooner than this PR. |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This PR adds recursive query feature to Spark SQL.
A recursive query is defined using the
WITH RECURSIVE
keywords and referring the name of the common table expression within the query.The implementation complies with SQL standard and follows similar rules to other relational databases:
UNION
orUNION ALL
operators.Please see
cte-recursive.sql
andwith.sql
for some examples.Please note that this PR focuses on the minimal working implementation which means:
Why are the changes needed?
Recursive query is an ANSI SQL feature that is useful to process hierarchical data.
Does this PR introduce any user-facing change?
Yes, adds recursive query feature.
How was this patch tested?
Added new UTs and tests in
cte-recursion.sql
andwith.sql
.