Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48342][SQL] Introduction of SQL Scripting Parser #46665

Closed
wants to merge 20 commits into from

Conversation

davidm-db
Copy link
Contributor

@davidm-db davidm-db commented May 20, 2024

What changes were proposed in this pull request?

This PR proposes changes to SQL parser to introduce support for SQL scripting statements:

  • Introduces BEGIN keyword to the lexer.
  • Changes grammar to support compound statements.
  • Exposes new parseScript method in ParserInterface.
  • Implements visitor functions for new nodes in the syntax tree in AstBuilder.
  • Introduces new logical operators in SqlScriptingLogicalOperators for compound statements that are created by visiting functions and that will be used during interpretation phase.

In order to simplify the process, in this PR we only introduce the support for compound statements to the SQL parser.
Follow-up PRs will introduce interpreter, further statements, support for exceptions thrown from parser/interpreter, etc.
More details can be found in Jira item for this task and its parent (where the design doc is uploaded as well).

Why are the changes needed?

The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.

Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to breach that gap and enable complex logic to be written entirely in SQL.

Does this PR introduce any user-facing change?

No.
This PR is a first in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

How was this patch tested?

There are tests in SqlScriptingParserSuite that test the newly introduced parser changes.

Was this patch authored or co-authored using generative AI tooling?

No.

@davidm-db davidm-db changed the title SQL Batch Lang Parser [SPARK-48342] SQL Batch Lang Parser May 20, 2024
@davidm-db davidm-db changed the title [SPARK-48342] SQL Batch Lang Parser [SPARK-48342] Introduction of SQL Scripting Parser May 21, 2024
@davidm-db davidm-db changed the title [SPARK-48342] Introduction of SQL Scripting Parser [SPARK-48342][SQL] Introduction of SQL Scripting Parser May 21, 2024
@davidm-db davidm-db marked this pull request as ready for review May 21, 2024 15:17
@github-actions github-actions bot added the DOCS label May 22, 2024
Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@melin
Copy link

melin commented May 27, 2024

Multiple statements include set statements if the next statement of the set statement has a syntax error. The analysis is wrong,

BEGIN
    SET hive.exec.dynamic.partition.mode = 'nonstrict';
    CREATE TABLE IF NOT EXISTS dws_spu_user_type_agg
    (
        spu_code                   STRING
        ,detail_channelSTRING
    )
    PARTITIONED BY (pt STRING)
    STORED AS ORC
END;

image

@davidm-db
Copy link
Contributor Author

@melin we are aware of this problem, it already exists without SQL scripting... example: using SET VAR to set a SQL variable with syntax error after the = (let's say SELECT query with some syntax error) won't throw syntax error, but will rather match the SET .*? rule of setConfiguration and return completely unrelated exception...

We have a follow-up PR, that will define behavior of SET within the SQL scripts. Per standard, we should make VAR | VARIABLE to not be required in SQL scripts for SET statement. SET .*? rule creates a big problem for this approach, so we decided that for the start we will remove those rules from SQL scripts and allow only SET for SQL variables. If anyone wants to set config, they can do it through EXECUTE IMMEDIATE.

@melin
Copy link

melin commented May 29, 2024

Can remove: SET.*? , or optimize the set syntax definition?

Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the discussion!

val visitedChild = visit(ctx.getChild(i))
visitedChild match {
case statement: CompoundPlanStatement => buff += statement
case null if child.isInstanceOf[TerminalNodeImpl] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the child not be an instance of TerminalNodeImpl? If this is enforced by the parser, then you can just remove the check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I agree. We had only case null => but it was very unclear what it meant, so we decided to add this check.

This way, if anything changes in the future, we would get exceptions in our tests if this null does not happen because the child is instance of TerminalNodeImpl.

Not sure if this is the best way to do it though. I guess that it's probably better to not visit child at all if it's a terminal node. I don't know how better to achieve this. Or maybe we are fine with leaving only case null => and adding comment that explains it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example SQL to trigger this null case?

Copy link
Contributor Author

@davidm-db davidm-db Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically happens within every SQL statement - TerminalNodeImpl corresponds to keywords from the SQL grammar, so for example if the query is like:
image

Then, on the top level, TerminalNodeImpl are semicolons:
image

And then deeper in the tree it could be a SELECT:
image

Or whichever other keyword.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved with the latest changes in visitCompoundBodyImpl.

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM!

}

override def visitCompoundStatement(ctx: CompoundStatementContext): CompoundPlanStatement = {
val child = visit(ctx.getChild(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks very weird, do you mean ctx.statement() != null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser rule is

compoundStatement
    : statement
    | beginEndCompoundBlock
    ;

I think it should be very clear which one is matched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like checking if we are visiting statement or beginEndCompoundBlock first and then do the visit, instead of visiting first and than matching on the output type?
I will change it to this anyways because I think it makes more sense and it cleaner than getChild(0) etc...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

;

compoundBody
: (compoundStatement SEMICOLON)*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify the scala side, (statements+=compoundStatement SEMICOLON)*

You can check out similar rules in this rule that use +=

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is helpful... This will help us resolve the other comment for null check in visitCompoundBodyImpl as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

case body: CompoundBody => body
case _ =>
val position = Origin(None, None)
throw QueryParsingErrors.sqlStatementUnsupportedError(sqlScriptText, position)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this really happen? Or it means bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, we did the same as in parsePlan, but in my opinion this means bug... This would happen if parsing is successful, but something other than expected node (CompoundBody here) is returned, which I have no idea how it could happen. If something is bad, I would expect Syntax/Parsing exception to be thrown, otherwise the sqlScriptText should be parsed into CompoundBody. So, I don't expect this case to ever be hit, but we copied the parsePlan behavior.

Of course, I might be missing something here...

/**
* Logical operator representing result of parsing a single SQL statement
* that is supposed to be executed against Spark.
* It can also be a Spark expression that is wrapped in a statement.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing, this seems to be accidental paste from POC PR and it was a leftover from some previous version there I think... Will remove the last sentence from the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

* It can also be a Spark expression that is wrapped in a statement.
* @param parsedPlan Result of SQL statement parsing.
* @param sourceStart Index of the first char of the statement in the original SQL script text.
* @param sourceEnd Index of the last char of the statement in the original SQL script text.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird put these in the constructor. Other plan nodes keep the SQL string context as a thread local Origin, can't we do the same? Check out the withOrigin calls in AstBuilder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware of this, thanks! Will change to using Origin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9898e9d Jun 18, 2024
HyukjinKwon pushed a commit that referenced this pull request Jun 20, 2024
### What changes were proposed in this pull request?
In [SQL Scripting parser PR](#46665) we didn't notice that import for `scala.collection.immutable.Seq` was added to `AstBuilder` (probably accidentally by IntelliJ). While this import is not unused per se, it is logically not needed (since Scala 2.13, `scala.Seq` is an alias for `scala.collection.immutable.Seq`).

Anyways, no enforcement was there before and it makes sense to leave it as such. It's not in any way required for SQL Scripting parser change.

### Why are the changes needed?
To remove accidentally added package enforcement that's not needed and wasn't used before.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Already existing tests cover this change.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47025 from davidm-db/sql_scripting_remove_import.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
vkorukanti pushed a commit to delta-io/delta that referenced this pull request Jun 24, 2024
…ges (#3294)

## Description
apache/spark#46665 introduced a new
`parseScript` method in `ParserInterface`, which broke compilation
against Spark master:
```
[error] /home/runner/work/delta/delta/spark/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala:75:7: class DeltaSqlParser needs to be abstract.
[error] Missing implementation for member of trait ParserInterface:
[error]   def parseScript(sqlScriptText: String): org.apache.spark.sql.catalyst.parser.CompoundBody = ???
[error] class DeltaSqlParser(val delegate: ParserInterface) extends ParserInterface {
[error]       ^
[warn] 100 warnings found
[error] one error found
[error] (spark / Compile / compileIncremental) Compilation failed
```

This PR fixes the issue by shimming the `ParserInterface` since
`parseScript` is not available in Spark 3.5
## How was this patch tested?
Existing DeltaSqlParserSuite
cloud-fan pushed a commit that referenced this pull request Jul 8, 2024
### What changes were proposed in this pull request?
Previous [PR](#46665) introduced parser changes for SQL Scripting. This PR is a follow-up to introduce the interpreter for SQL Scripting language and proposes the following changes:
- `SqlScriptingExecutionNode` - introduces execution nodes for SQL scripting, used during interpretation phase:
  - `SingleStatementExec` - executable node for `SingleStatement` logical node; wraps logical plan of the single statement.
  - `CompoundNestedStatementIteratorExec` - implements base recursive iterator logic for all nesting statements.
  - `CompoundBodyExec` - concrete implementation of `CompoundNestedStatementIteratorExec` for `CompoundBody` logical node.
- `SqlScriptingInterpreter` - introduces the interpreter for SQL scripts. Product of interpretation is the iterator over the statements that should be executed.

Follow-up PRs will introduce further statements, support for exceptions thrown from parser/interpreter, exception handling in SQL, etc.
More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48343) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.

Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to breach that gap and enable complex logic to be written entirely in SQL.

### Does this PR introduce _any_ user-facing change?
No.
This PR is second in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:
- `SqlScriptingExecutionNodeSuite` - unit tests for execution nodes.
- `SqlScriptingInterpreterSuite` - tests for interpreter (with parser integration).

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47026 from davidm-db/sql_scripting_interpreter.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
### What changes were proposed in this pull request?
Previous [PR](apache#46665) introduced parser changes for SQL Scripting. This PR is a follow-up to introduce the interpreter for SQL Scripting language and proposes the following changes:
- `SqlScriptingExecutionNode` - introduces execution nodes for SQL scripting, used during interpretation phase:
  - `SingleStatementExec` - executable node for `SingleStatement` logical node; wraps logical plan of the single statement.
  - `CompoundNestedStatementIteratorExec` - implements base recursive iterator logic for all nesting statements.
  - `CompoundBodyExec` - concrete implementation of `CompoundNestedStatementIteratorExec` for `CompoundBody` logical node.
- `SqlScriptingInterpreter` - introduces the interpreter for SQL scripts. Product of interpretation is the iterator over the statements that should be executed.

Follow-up PRs will introduce further statements, support for exceptions thrown from parser/interpreter, exception handling in SQL, etc.
More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48343) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.

Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to breach that gap and enable complex logic to be written entirely in SQL.

### Does this PR introduce _any_ user-facing change?
No.
This PR is second in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:
- `SqlScriptingExecutionNodeSuite` - unit tests for execution nodes.
- `SqlScriptingInterpreterSuite` - tests for interpreter (with parser integration).

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47026 from davidm-db/sql_scripting_interpreter.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jul 11, 2024
### What changes were proposed in this pull request?
Previous [PR1](#46665) and [PR2](#46665) introduced parser and interpreter changes for SQL Scripting. This PR is a follow-up to introduce the concept of labels for SQL Scripting language and proposes the following changes:

- Changes grammar to support labels at start and end of the compound statements.
- Updates visitor functions for compound nodes in the syntax tree in AstBuilder to check if labels are present and valid.

More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48529) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for various SQL scripting concepts like loops, leave & iterate statements.

### Does this PR introduce any user-facing change?
No.
This PR is among first PRs in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:

SqlScriptingParserSuite - unit tests for execution nodes.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47146 from miland-db/sql_batch_labels.

Lead-authored-by: David Milicevic <david.milicevic@databricks.com>
Co-authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
biruktesf-db pushed a commit to biruktesf-db/spark that referenced this pull request Jul 11, 2024
### What changes were proposed in this pull request?
Previous [PR1](apache#46665) and [PR2](apache#46665) introduced parser and interpreter changes for SQL Scripting. This PR is a follow-up to introduce the concept of labels for SQL Scripting language and proposes the following changes:

- Changes grammar to support labels at start and end of the compound statements.
- Updates visitor functions for compound nodes in the syntax tree in AstBuilder to check if labels are present and valid.

More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48529) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for various SQL scripting concepts like loops, leave & iterate statements.

### Does this PR introduce any user-facing change?
No.
This PR is among first PRs in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:

SqlScriptingParserSuite - unit tests for execution nodes.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47146 from miland-db/sql_batch_labels.

Lead-authored-by: David Milicevic <david.milicevic@databricks.com>
Co-authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?
Previous [PR](apache#46665) introduced parser changes for SQL Scripting. This PR is a follow-up to introduce the interpreter for SQL Scripting language and proposes the following changes:
- `SqlScriptingExecutionNode` - introduces execution nodes for SQL scripting, used during interpretation phase:
  - `SingleStatementExec` - executable node for `SingleStatement` logical node; wraps logical plan of the single statement.
  - `CompoundNestedStatementIteratorExec` - implements base recursive iterator logic for all nesting statements.
  - `CompoundBodyExec` - concrete implementation of `CompoundNestedStatementIteratorExec` for `CompoundBody` logical node.
- `SqlScriptingInterpreter` - introduces the interpreter for SQL scripts. Product of interpretation is the iterator over the statements that should be executed.

Follow-up PRs will introduce further statements, support for exceptions thrown from parser/interpreter, exception handling in SQL, etc.
More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48343) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.

Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to breach that gap and enable complex logic to be written entirely in SQL.

### Does this PR introduce _any_ user-facing change?
No.
This PR is second in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:
- `SqlScriptingExecutionNodeSuite` - unit tests for execution nodes.
- `SqlScriptingInterpreterSuite` - tests for interpreter (with parser integration).

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47026 from davidm-db/sql_scripting_interpreter.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?
Previous [PR1](apache#46665) and [PR2](apache#46665) introduced parser and interpreter changes for SQL Scripting. This PR is a follow-up to introduce the concept of labels for SQL Scripting language and proposes the following changes:

- Changes grammar to support labels at start and end of the compound statements.
- Updates visitor functions for compound nodes in the syntax tree in AstBuilder to check if labels are present and valid.

More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48529) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for various SQL scripting concepts like loops, leave & iterate statements.

### Does this PR introduce any user-facing change?
No.
This PR is among first PRs in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:

SqlScriptingParserSuite - unit tests for execution nodes.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47146 from miland-db/sql_batch_labels.

Lead-authored-by: David Milicevic <david.milicevic@databricks.com>
Co-authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
In [SQL Scripting parser PR](apache#46665) we didn't notice that import for `scala.collection.immutable.Seq` was added to `AstBuilder` (probably accidentally by IntelliJ). While this import is not unused per se, it is logically not needed (since Scala 2.13, `scala.Seq` is an alias for `scala.collection.immutable.Seq`).

Anyways, no enforcement was there before and it makes sense to leave it as such. It's not in any way required for SQL Scripting parser change.

### Why are the changes needed?
To remove accidentally added package enforcement that's not needed and wasn't used before.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Already existing tests cover this change.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47025 from davidm-db/sql_scripting_remove_import.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
Previous [PR](apache#46665) introduced parser changes for SQL Scripting. This PR is a follow-up to introduce the interpreter for SQL Scripting language and proposes the following changes:
- `SqlScriptingExecutionNode` - introduces execution nodes for SQL scripting, used during interpretation phase:
  - `SingleStatementExec` - executable node for `SingleStatement` logical node; wraps logical plan of the single statement.
  - `CompoundNestedStatementIteratorExec` - implements base recursive iterator logic for all nesting statements.
  - `CompoundBodyExec` - concrete implementation of `CompoundNestedStatementIteratorExec` for `CompoundBody` logical node.
- `SqlScriptingInterpreter` - introduces the interpreter for SQL scripts. Product of interpretation is the iterator over the statements that should be executed.

Follow-up PRs will introduce further statements, support for exceptions thrown from parser/interpreter, exception handling in SQL, etc.
More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48343) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.

Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to breach that gap and enable complex logic to be written entirely in SQL.

### Does this PR introduce _any_ user-facing change?
No.
This PR is second in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:
- `SqlScriptingExecutionNodeSuite` - unit tests for execution nodes.
- `SqlScriptingInterpreterSuite` - tests for interpreter (with parser integration).

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47026 from davidm-db/sql_scripting_interpreter.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
Previous [PR1](apache#46665) and [PR2](apache#46665) introduced parser and interpreter changes for SQL Scripting. This PR is a follow-up to introduce the concept of labels for SQL Scripting language and proposes the following changes:

- Changes grammar to support labels at start and end of the compound statements.
- Updates visitor functions for compound nodes in the syntax tree in AstBuilder to check if labels are present and valid.

More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48529) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for various SQL scripting concepts like loops, leave & iterate statements.

### Does this PR introduce any user-facing change?
No.
This PR is among first PRs in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged.
In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:

SqlScriptingParserSuite - unit tests for execution nodes.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47146 from miland-db/sql_batch_labels.

Lead-authored-by: David Milicevic <david.milicevic@databricks.com>
Co-authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants