Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48358][SQL] Support for REPEAT statement #47756

Closed

Conversation

dusantism-db
Copy link
Contributor

What changes were proposed in this pull request?

In this PR, support for REPEAT statement in SQL scripting is introduced.

Changes summary:

Grammar/parser changes

  • repeatStatement grammar rule
  • visitRepeatStatement rule visitor
  • RepeatStatement logical operetor

RepeatStatementExec execution node
Internal sates - Condition and Body
Iterator implementation - switch between body and condition until condition evaluates to true
SqlScriptingInterpreter - added logic to transform RepeatStatement logical operator to RepeatStatementExec execution node

Why are the changes needed?

This is a part of SQL Scripting introduced to Spark, REPEAT statement is a basic control flow construct in the SQL language.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New tests are introduced to all of the three scripting test suites: SqlScriptingParserSuite, SqlScriptingExecutionNodeSuite and SqlScriptingInterpreterSuite.

Was this patch authored or co-authored using generative AI tooling?

No

body: CompoundBodyExec,
session: SparkSession) extends NonLeafStatementExec {

private object RepeatState extends Enumeration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to unify parts of the while and repeat? The only difference is starting state and reset() method. Does it make sense to do it for 2 loops only?

  • State enum is exactly the same
  • treeIterator is exactly the same
  • Constructor parameters are identical

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, treeIterator is not exactly the same because for while it iterates while condition is true, but for repeat it iterates while condition is false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see it. We can always override some new abstraction like continueIteration(session, condition). From my side this is ok, I'm just thinking if there is need to avoid code duplication in cases like this. Let's wait to hear from other folks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point, but let's aim to solve this in a follow-up PR to simplify things a bit in this PR, since logic is correct.

Copy link
Contributor

@davidm-db davidm-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments.

@dusantism-db dusantism-db force-pushed the sql-scripting-repeat-statement branch 2 times, most recently from 585dc0c to c3ef7a0 Compare September 5, 2024 16:23
Comment on lines +97 to +117

case WhileStatement(condition, body, label) =>
val conditionExec =
new SingleStatementExec(condition.parsedPlan, condition.origin, isInternal = false)
val bodyExec =
transformTreeIntoExecutable(body, session).asInstanceOf[CompoundBodyExec]
new WhileStatementExec(conditionExec, bodyExec, label, session)

case RepeatStatement(condition, body, label) =>
val conditionExec =
new SingleStatementExec(condition.parsedPlan, condition.origin, isInternal = false)
val bodyExec =
transformTreeIntoExecutable(body, session).asInstanceOf[CompoundBodyExec]
new RepeatStatementExec(conditionExec, bodyExec, label, session)

case leaveStatement: LeaveStatement =>
new LeaveStatementExec(leaveStatement.label)

case iterateStatement: IterateStatement =>
new IterateStatementExec(iterateStatement.label)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty lines are a suggestion for better readability of this method

Comment on lines +292 to +295
case c: RepeatStatementContext
if Option(c.beginLabel()).isDefined &&
c.beginLabel().multipartIdentifier().getText.toLowerCase(Locale.ROOT).equals(label)
=> true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this with the above case for WhileStatementContext? You have multiple options for this, using | operator or Either type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wouldn't be trivial, as there is no common supertype between RepeatStatementContext and WhileStatementContext with the beginLabel() method defined. Maybe we could have a labeledStatement grammar rule, or something similar, to abstract some of this label logic from all statements with labels (ifElse, while, repeat..)

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, fix indentations and it would be nice to add a little bit more negative tests.

/**
* Executable node for RepeatStatement.
* @param condition Executable node for the condition.
* @param body Executable node for the body.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any contract/restriction for the body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the type, not really. The body can be anything which extends CompoundBodyExec

curr = Some(condition)
condition.reset()
return retStmt
case _ =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases does this happen? Shall we raise an error here like unsupported or unexpected statement?

Copy link
Contributor Author

@dusantism-db dusantism-db Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match checks if the returned statement is LEAVE or ITERATE, because in those cases it should return early. If it's not LEAVE or ITERATE, then the method will continue normally. Because we only check if we should return early, we shouldn't throw any error. Similar logic exists in WhileStatementExec.

@MaxGekk
Copy link
Member

MaxGekk commented Sep 11, 2024

+1, LGTM. Merging to master.
Thank you, @dusantism-db and @miland-db @davidm-db for review.

@MaxGekk MaxGekk closed this in da89322 Sep 11, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Sep 11, 2024

@dusantism-db Congratulations with your first contribution to Apache Spark!

@dusantism-db
Copy link
Contributor Author

@dusantism-db Congratulations with your first contribution to Apache Spark!

Thanks for your help, Max!

MaxGekk added a commit that referenced this pull request Sep 11, 2024
…Class` in `SqlScriptingInterpreterSuite`

### What changes were proposed in this pull request?
In the PR, I propose to replace `errorClass` by `condition` in `SqlScriptingInterpreterSuite`

### Why are the changes needed?
The changes from the PR #47756 conflict to #48027

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *SqlScriptingInterpreterSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48072 from MaxGekk/fix-errorClass-REPEAT.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
In this PR, support for REPEAT statement in SQL scripting is introduced.

Changes summary:

Grammar/parser changes
- `repeatStatement` grammar rule
- `visitRepeatStatement` rule visitor
- `RepeatStatement` logical operetor

`RepeatStatementExec` execution node
Internal sates - `Condition` and `Body`
Iterator implementation - switch between body and condition until condition evaluates to true
SqlScriptingInterpreter - added logic to transform RepeatStatement logical operator to RepeatStatementExec execution node

### Why are the changes needed?
This is a part of SQL Scripting introduced to Spark, REPEAT statement is a basic control flow construct in the SQL language.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New tests are introduced to all of the three scripting test suites: `SqlScriptingParserSuite`, `SqlScriptingExecutionNodeSuite` and `SqlScriptingInterpreterSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47756 from dusantism-db/sql-scripting-repeat-statement.

Authored-by: Dušan Tišma <dusan.tisma@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…Class` in `SqlScriptingInterpreterSuite`

### What changes were proposed in this pull request?
In the PR, I propose to replace `errorClass` by `condition` in `SqlScriptingInterpreterSuite`

### Why are the changes needed?
The changes from the PR apache#47756 conflict to apache#48027

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *SqlScriptingInterpreterSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48072 from MaxGekk/fix-errorClass-REPEAT.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants