[SPARK-48358][SQL] Support for REPEAT statement #47756
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala
    body: CompoundBodyExec,
    session: SparkSession) extends NonLeafStatementExec {

  private object RepeatState extends Enumeration {
Is there a way to unify parts of the while and repeat? The only difference is the starting state and the reset() method. Does it make sense to do it for 2 loops only?
- State enum is exactly the same
- treeIterator is exactly the same
- Constructor parameters are identical
Just a note, treeIterator is not exactly the same, because for while it iterates while the condition is true, but for repeat it iterates while the condition is false.
Yes, I see it. We can always override some new abstraction like continueIteration(session, condition). From my side this is OK, I'm just wondering whether there is a need to avoid code duplication in cases like this. Let's wait to hear from other folks.
This is a good point, but let's aim to solve this in a follow-up PR to keep this PR simple, since the logic is correct.
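For readers following the thread, here is a minimal, Spark-free sketch of the kind of unification being discussed. It is not the code from this PR or from any follow-up: `LoopState`, `LoopSketch`, `WhileSketch`, `RepeatSketch` and `run` are hypothetical names, the condition is modelled as a plain `() => Boolean`, and only the `continueIteration` hook name comes from the comment above (with a simplified signature). The two loops differ only in their starting state and in how they interpret the condition result.

```scala
// Hypothetical stand-in for the per-loop state enum mentioned in the review.
object LoopState extends Enumeration {
  val Condition, Body = Value
}

// Shared skeleton: alternates between evaluating the condition and running the body.
abstract class LoopSketch(
    condition: () => Boolean,
    body: Seq[String],
    startState: LoopState.Value) {

  // The single point where WHILE and REPEAT disagree.
  protected def continueIteration(conditionResult: Boolean): Boolean

  // Runs the loop eagerly and returns the body statements in execution order.
  def run(): Seq[String] = {
    val trace = Seq.newBuilder[String]
    var state = startState
    var done = false
    while (!done) {
      state match {
        case LoopState.Condition =>
          if (continueIteration(condition())) state = LoopState.Body
          else done = true
        case LoopState.Body =>
          trace ++= body
          state = LoopState.Condition
      }
    }
    trace.result()
  }
}

// WHILE checks the condition first and keeps going while it is true.
class WhileSketch(condition: () => Boolean, body: Seq[String])
    extends LoopSketch(condition, body, LoopState.Condition) {
  override protected def continueIteration(conditionResult: Boolean): Boolean =
    conditionResult
}

// REPEAT runs the body first and keeps going while the condition is false.
class RepeatSketch(condition: () => Boolean, body: Seq[String])
    extends LoopSketch(condition, body, LoopState.Body) {
  override protected def continueIteration(conditionResult: Boolean): Boolean =
    !conditionResult
}
```

Under these assumptions a REPEAT body always runs at least once, while a WHILE body may not run at all. Whether such a base class is worth introducing for only two loops is exactly the open question in this thread.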
LGTM, minor comments.
585dc0c to c3ef7a0
case WhileStatement(condition, body, label) =>
  val conditionExec =
    new SingleStatementExec(condition.parsedPlan, condition.origin, isInternal = false)
  val bodyExec =
    transformTreeIntoExecutable(body, session).asInstanceOf[CompoundBodyExec]
  new WhileStatementExec(conditionExec, bodyExec, label, session)

case RepeatStatement(condition, body, label) =>
  val conditionExec =
    new SingleStatementExec(condition.parsedPlan, condition.origin, isInternal = false)
  val bodyExec =
    transformTreeIntoExecutable(body, session).asInstanceOf[CompoundBodyExec]
  new RepeatStatementExec(conditionExec, bodyExec, label, session)

case leaveStatement: LeaveStatement =>
  new LeaveStatementExec(leaveStatement.label)

case iterateStatement: IterateStatement =>
  new IterateStatementExec(iterateStatement.label)

The empty lines are a suggestion for better readability of this method
case c: RepeatStatementContext
  if Option(c.beginLabel()).isDefined &&
    c.beginLabel().multipartIdentifier().getText.toLowerCase(Locale.ROOT).equals(label)
  => true
Can we merge this with the above case for WhileStatementContext? You have multiple options for this, using the | operator or the Either type.
This wouldn't be trivial, as there is no common supertype between RepeatStatementContext and WhileStatementContext with the beginLabel() method defined. Maybe we could have a labeledStatement grammar rule, or something similar, to abstract some of this label logic from all statements with labels (ifElse, while, repeat, ...).
Please fix the indentation, and it would be nice to add a few more negative tests.
b572135 to e99566a
/**
 * Executable node for RepeatStatement.
 * @param condition Executable node for the condition.
 * @param body Executable node for the body.
Is there any contract/restriction for the body?
Except for the type, not really. The body can be anything which extends CompoundBodyExec
curr = Some(condition)
condition.reset()
return retStmt
case _ =>
In which cases does this happen? Shall we raise an error here like unsupported or unexpected statement?
This match checks if the returned statement is LEAVE or ITERATE, because in those cases it should return early. If it's not LEAVE or ITERATE, then the method will continue normally. Because we only check if we should return early, we shouldn't throw any error. Similar logic exists in WhileStatementExec.
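The early-return idea can be illustrated with a small standalone sketch. Everything here is hypothetical (`Stmt`, `Leave`, `Iterate`, `Plain` and `runBody` are stand-ins, not the Spark classes), and the real execution node does more work in the matching branches. The point is only that the default case is the normal path, not an error path.

```scala
object EarlyReturnSketch {
  sealed trait Stmt
  final case class Leave(label: String) extends Stmt
  final case class Iterate(label: String) extends Stmt
  final case class Plain(text: String) extends Stmt

  // Runs one pass over a loop body. Returns the statements handed back to
  // the caller and whether a control-flow statement cut the pass short.
  def runBody(body: List[Stmt]): (List[Stmt], Boolean) = {
    val emitted = List.newBuilder[Stmt]
    val it = body.iterator
    while (it.hasNext) {
      val stmt = it.next()
      emitted += stmt
      stmt match {
        case _: Leave | _: Iterate =>
          // Control-flow statement: stop this pass immediately.
          return (emitted.result(), true)
        case _ =>
          // Ordinary statement: nothing special to do, so no error is raised.
      }
    }
    (emitted.result(), false)
  }
}
```

In this sketch the `case _` branch is reached for every ordinary statement, so throwing there would break every loop whose body contains anything other than LEAVE or ITERATE.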
+1, LGTM. Merging to master.
@dusantism-db Congratulations on your first contribution to Apache Spark!
Thanks for your help, Max!
…Class` in `SqlScriptingInterpreterSuite`

### What changes were proposed in this pull request?
In the PR, I propose to replace `errorClass` by `condition` in `SqlScriptingInterpreterSuite`.

### Why are the changes needed?
The changes from the PR #47756 conflict with #48027.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *SqlScriptingInterpreterSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48072 from MaxGekk/fix-errorClass-REPEAT.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
In this PR, support for REPEAT statement in SQL scripting is introduced.

Changes summary:
- Grammar/parser changes
  - `repeatStatement` grammar rule
  - `visitRepeatStatement` rule visitor
  - `RepeatStatement` logical operator
- `RepeatStatementExec` execution node
  - Internal states - `Condition` and `Body`
  - Iterator implementation - switch between body and condition until condition evaluates to true
- SqlScriptingInterpreter - added logic to transform RepeatStatement logical operator to RepeatStatementExec execution node

### Why are the changes needed?
This is a part of SQL Scripting introduced to Spark. The REPEAT statement is a basic control flow construct in the SQL language.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New tests are introduced to all three scripting test suites: `SqlScriptingParserSuite`, `SqlScriptingExecutionNodeSuite` and `SqlScriptingInterpreterSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47756 from dusantism-db/sql-scripting-repeat-statement.

Authored-by: Dušan Tišma <dusan.tisma@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>