Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48348][SPARK-48376][SQL] Introduce LEAVE and ITERATE statements #47973

Closed

Conversation

davidm-db
Copy link
Contributor

@davidm-db davidm-db commented Sep 3, 2024

What changes were proposed in this pull request?

This PR proposes introduction of LEAVE and ITERATE statement types to SQL Scripting language:

  • LEAVE statement can be used in loops, as well as in BEGIN ... END compound blocks.
  • ITERATE statement can be used only in loops.

This PR introduces:

  • Logical operators for both statement types.
  • Execution nodes for both statement types.
  • Interpreter changes required to build execution plans that support new statement types.
  • New error if statements are not used properly.
  • Minor changes required to support new keywords.

Why are the changes needed?

Adding support for new statement types to SQL Scripting language.

Does this PR introduce any user-facing change?

This PR introduces new statement types that will be available to users. However, script execution logic hasn't been done yet, so the new changes are not accessible by users yet.

How was this patch tested?

Tests are introduced to all test suites related to SQL scripting.

Was this patch authored or co-authored using generative AI tooling?

No.

// Stop the iteration.
stopIteration = true

// TODO: Variable cleanup (once we add SQL script execution logic).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be done once we introduce execution. For now, we are adding DropVariable statements at the end of each block implicitly. However, introduction of LEAVE and ITERATE statements means that these implicit statements will be skipped and we need to add logic to execute them immediately. Currently, we cannot do so, because execution nodes don't have execution logic yet.

ctx: RuleContext, label: String, isLeave: Boolean): Boolean = {
ctx match {
case c: BeginEndCompoundBlockContext
if isLeave &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we have a dedicated error for using ITERATE on the label of BEGIN END compound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -35,6 +35,10 @@ class SqlScriptingExecutionNodeSuite extends SparkFunSuite with SharedSparkSessi
override def reset(): Unit = ()
}

case class TestLeaveStatement(labelText: String) extends LeaveStatementExec(labelText)

case class TestIterateStatement(labelText: String) extends IterateStatementExec(labelText)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use the real Leave/IterateStatementExec in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I'm dumb :) Used these in extractStatementValue to match on labelText directly instead of matching by type only and using Leave/IterateStatementExec... will change in the next commit to use regular exec nodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48348][SPARK-48376][SQL] Introduce LEAVE and ITERATE statements [SPARK-48348][SPARK-48376][SQL] Introduce LEAVE and ITERATE statements Sep 4, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

cc @yaooqinn , too.

@@ -2453,6 +2453,12 @@
},
"sqlState" : "42K0K"
},
"INVALID_ITERATE_LABEL_USAGE_FOR_COMPOUND" : {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these error classes be put together with subclasses? like
INVALID_LABEL.ITERATE_FOR_COMPOUND / INVALID_LABEL.IN_STATEMENT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@yaooqinn yaooqinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// Handle LEAVE or ITERATE statement if it has been encountered.
retStmt match {
case leaveStatementExec: LeaveStatementExec if !leaveStatementExec.hasBeenMatched =>
if (label.contains(leaveStatementExec.label)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we use contains instead of equals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not contains like in Strings, it's from the perspective of Option, i.e. it means whether the value within the Option is leaveStatementExec.label

@miland-db
Copy link
Contributor

LGTM

@davidm-db
Copy link
Contributor Author

@cloud-fan @yaooqinn @dongjoon-hyun can someone merge this now, all tests are fine? also, could you please close both JIRA items, because it seems that only the first one from the title gets linked to the PR.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9676b1c Sep 5, 2024
MaxGekk pushed a commit that referenced this pull request Sep 6, 2024
…nSqlScript in SQL Scripting Interpreter test suite

### What changes were proposed in this pull request?
Previous [pull request](#47973) introduced new tests to `SqlScriptingInterpreterSuite` (among others) where accidentally `parseScript` was used instead of `runSqlScript`.
While the same exception would get thrown (since it happens in the parsing phase) it violates the consistency among the tests in this suite and adds unnecessary import, so it would be nice to change it.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This patch alters tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48016 from davidm-db/interpreter_test_suite_fix.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
a0x8o added a commit to a0x8o/spark that referenced this pull request Sep 6, 2024
…nSqlScript in SQL Scripting Interpreter test suite

### What changes were proposed in this pull request?
Previous [pull request](apache/spark#47973) introduced new tests to `SqlScriptingInterpreterSuite` (among others) where accidentally `parseScript` was used instead of `runSqlScript`.
While the same exception would get thrown (since it happens in the parsing phase) it violates the consistency among the tests in this suite and adds unnecessary import, so it would be nice to change it.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This patch alters tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48016 from davidm-db/interpreter_test_suite_fix.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk pushed a commit that referenced this pull request Sep 8, 2024
### What changes were proposed in this pull request?
Previous [pull request](#47973) introduced new logic for `LEAVE` and `ITERATE` statements, but I missed to introduce minor part of `ITERATE` statement handling.
While `ITERATE` statement is not allowed for compounds (i.e. user cannot use `LEAVE <label>` where label belongs to compound) it's still possible to use `ITERATE` statement within the compound (i.e. when using label of the surrounding `WHILE` loop for example). In such cases, it's required to drop variables defined in the compound (the same way as for `ITERATE` statement).
We don't yet have execution logic (I will work on POC and design doc during the next week), but this is setting up stuff now, so we don't miss anything in future.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added small test, but not completely related to this change. Since we don't have execution logic yet, it's hard to test. I will add more tests together with the execution logic.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48022 from davidm-db/missing_logic_for_leave_statement.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…ments

### What changes were proposed in this pull request?
This PR proposes introduction of `LEAVE` and `ITERATE` statement types to SQL Scripting language:
- `LEAVE` statement can be used in loops, as well as in `BEGIN ... END` compound blocks.
- `ITERATE` statement can be used only in loops.

This PR introduces:
- Logical operators for both statement types.
- Execution nodes for both statement types.
- Interpreter changes required to build execution plans that support new statement types.
- New error if statements are not used properly.
- Minor changes required to support new keywords.

### Why are the changes needed?
Adding support for new statement types to SQL Scripting language.

### Does this PR introduce _any_ user-facing change?
This PR introduces new statement types that will be available to users. However, script execution logic hasn't been done yet, so the new changes are not accessible by users yet.

### How was this patch tested?
Tests are introduced to all test suites related to SQL scripting.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47973 from davidm-db/sql_scripting_leave_iterate.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…nSqlScript in SQL Scripting Interpreter test suite

### What changes were proposed in this pull request?
Previous [pull request](apache#47973) introduced new tests to `SqlScriptingInterpreterSuite` (among others) where accidentally `parseScript` was used instead of `runSqlScript`.
While the same exception would get thrown (since it happens in the parsing phase) it violates the consistency among the tests in this suite and adds unnecessary import, so it would be nice to change it.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This patch alters tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48016 from davidm-db/interpreter_test_suite_fix.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
### What changes were proposed in this pull request?
Previous [pull request](apache#47973) introduced new logic for `LEAVE` and `ITERATE` statements, but I missed to introduce minor part of `ITERATE` statement handling.
While `ITERATE` statement is not allowed for compounds (i.e. user cannot use `LEAVE <label>` where label belongs to compound) it's still possible to use `ITERATE` statement within the compound (i.e. when using label of the surrounding `WHILE` loop for example). In such cases, it's required to drop variables defined in the compound (the same way as for `ITERATE` statement).
We don't yet have execution logic (I will work on POC and design doc during the next week), but this is setting up stuff now, so we don't miss anything in future.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added small test, but not completely related to this change. Since we don't have execution logic yet, it's hard to test. I will add more tests together with the execution logic.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48022 from davidm-db/missing_logic_for_leave_statement.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…ments

### What changes were proposed in this pull request?
This PR proposes introduction of `LEAVE` and `ITERATE` statement types to SQL Scripting language:
- `LEAVE` statement can be used in loops, as well as in `BEGIN ... END` compound blocks.
- `ITERATE` statement can be used only in loops.

This PR introduces:
- Logical operators for both statement types.
- Execution nodes for both statement types.
- Interpreter changes required to build execution plans that support new statement types.
- New error if statements are not used properly.
- Minor changes required to support new keywords.

### Why are the changes needed?
Adding support for new statement types to SQL Scripting language.

### Does this PR introduce _any_ user-facing change?
This PR introduces new statement types that will be available to users. However, script execution logic hasn't been done yet, so the new changes are not accessible by users yet.

### How was this patch tested?
Tests are introduced to all test suites related to SQL scripting.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47973 from davidm-db/sql_scripting_leave_iterate.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…nSqlScript in SQL Scripting Interpreter test suite

### What changes were proposed in this pull request?
Previous [pull request](apache#47973) introduced new tests to `SqlScriptingInterpreterSuite` (among others) where accidentally `parseScript` was used instead of `runSqlScript`.
While the same exception would get thrown (since it happens in the parsing phase) it violates the consistency among the tests in this suite and adds unnecessary import, so it would be nice to change it.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This patch alters tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48016 from davidm-db/interpreter_test_suite_fix.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
Previous [pull request](apache#47973) introduced new logic for `LEAVE` and `ITERATE` statements, but I missed to introduce minor part of `ITERATE` statement handling.
While `ITERATE` statement is not allowed for compounds (i.e. user cannot use `LEAVE <label>` where label belongs to compound) it's still possible to use `ITERATE` statement within the compound (i.e. when using label of the surrounding `WHILE` loop for example). In such cases, it's required to drop variables defined in the compound (the same way as for `ITERATE` statement).
We don't yet have execution logic (I will work on POC and design doc during the next week), but this is setting up stuff now, so we don't miss anything in future.

### Why are the changes needed?
Changes are minor, they improve consistency among test suites for SQL scripting.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added small test, but not completely related to this change. Since we don't have execution logic yet, it's hard to test. I will add more tests together with the execution logic.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48022 from davidm-db/missing_logic_for_leave_statement.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants