[SPARK-48342][SQL] Introduction of SQL Scripting Parser #46665
Thanks for working on this!
If a script contains multiple statements including a SET statement, and the statement after the SET statement has a syntax error, the script is parsed incorrectly:
BEGIN
SET hive.exec.dynamic.partition.mode = 'nonstrict';
CREATE TABLE IF NOT EXISTS dws_spu_user_type_agg
(
spu_code STRING
,detail_channelSTRING
)
PARTITIONED BY (pt STRING)
STORED AS ORC
END;

@melin we are aware of this problem, it already exists without SQL scripting... example: using We have a follow-up PR, that will define behavior of

Could the `SET.*?` rule be removed, or the SET syntax definition be optimized?
LGTM, thanks for the discussion!
    val visitedChild = visit(ctx.getChild(i))
    visitedChild match {
      case statement: CompoundPlanStatement => buff += statement
      case null if child.isInstanceOf[TerminalNodeImpl] =>

Why would the child not be an instance of `TerminalNodeImpl`? If this is enforced by the parser, then you can just remove the check.

Well, I agree. We had only `case null =>`, but it was very unclear what it meant, so we decided to add this check. This way, if anything changes in the future, our tests would throw exceptions if a `null` shows up for any reason other than the child being an instance of `TerminalNodeImpl`.
Not sure if this is the best way to do it though. I guess it's probably better not to visit `child` at all if it's a terminal node, but I don't know a better way to achieve this. Or maybe we are fine with leaving only `case null =>` and adding a comment that explains it?

Do you have an example SQL to trigger this null case?

Should be resolved with the latest changes in `visitCompoundBodyImpl`.
overall LGTM!
    }

    override def visitCompoundStatement(ctx: CompoundStatementContext): CompoundPlanStatement = {
      val child = visit(ctx.getChild(0))

This looks very weird, do you mean `ctx.statement() != null`?

The parser rule is
    compoundStatement
        : statement
        | beginEndCompoundBlock
        ;
I think it should be very clear which one is matched.

Do you mean checking first whether we are visiting `statement` or `beginEndCompoundBlock` and then doing the visit, instead of visiting first and then matching on the output type?
I will change it to this anyway, because I think it makes more sense and is cleaner than `getChild(0)` etc...

Done.
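The change agreed on in this thread can be sketched as follows. This is only an illustration: `CompoundStatementCtx`, `StatementCtx` and `BeginEndCtx` are hypothetical stand-ins for the ANTLR-generated contexts, and the plan classes are simplified versions of the names quoted above. The one property assumed from ANTLR is that a rule accessor returns `null` when its alternative was not matched.

```scala
// Hypothetical stand-ins for the ANTLR-generated contexts (not Spark's real classes).
final case class StatementCtx(sql: String)
final case class BeginEndCtx(body: Seq[String])

// Like generated accessors, these return null when the alternative was not matched.
final class CompoundStatementCtx(stmt: StatementCtx, block: BeginEndCtx) {
  def statement(): StatementCtx = stmt
  def beginEndCompoundBlock(): BeginEndCtx = block
}

// Simplified plan nodes, named after the ones discussed in the review.
sealed trait CompoundPlanStatement
final case class SingleStatement(sql: String) extends CompoundPlanStatement
final case class CompoundBody(statements: Seq[String]) extends CompoundPlanStatement

object AlternativeDispatchSketch {
  // Decide which alternative matched first, then build the node,
  // instead of visiting getChild(0) and matching on the result type.
  def visitCompoundStatement(ctx: CompoundStatementCtx): CompoundPlanStatement =
    if (ctx.statement() != null) SingleStatement(ctx.statement().sql)
    else CompoundBody(ctx.beginEndCompoundBlock().body)
}
```

Dispatching on the accessor keeps the grammar alternative visible at the call site, which is what the reviewer asked for.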
    ;

    compoundBody
        : (compoundStatement SEMICOLON)*

To simplify the Scala side, use `(statements+=compoundStatement SEMICOLON)*`. Other rules already use `+=` in the same way.

Thanks, this is helpful... This will help us resolve the other comment about the null check in `visitCompoundBodyImpl` as well.

Done.
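The effect of the `+=` label can be sketched without ANTLR. The classes below are hypothetical stand-ins for the generated parse tree types, not Spark's or ANTLR's real ones; the `statements` accessor plays the role of the list that ANTLR generates for a labeled element.

```scala
// Hypothetical stand-ins for parse tree nodes (illustration only).
sealed trait ParseNode
final case class TerminalNode(text: String) extends ParseNode // e.g. the SEMICOLON token
final case class StatementCtx(sql: String) extends ParseNode  // a compoundStatement subtree

final case class CompoundBodyCtx(children: Seq[ParseNode]) {
  // Mimics the list generated for `statements+=compoundStatement`:
  // only the labeled children, without the SEMICOLON tokens.
  def statements: Seq[StatementCtx] = children.collect { case s: StatementCtx => s }
}

object LabeledChildrenSketch {
  // Before: visit every child and skip the nulls that terminal nodes produce.
  def visitAllChildren(ctx: CompoundBodyCtx): Seq[String] =
    ctx.children.map {
      case StatementCtx(sql) => sql
      case TerminalNode(_)   => null
    }.filter(_ != null)

  // After: only labeled statements are visited, so no null case is needed.
  def visitLabeled(ctx: CompoundBodyCtx): Seq[String] =
    ctx.statements.map(_.sql)
}
```

Both functions return the same statements; the second one simply never sees the terminal nodes, which is why the `case null` branch discussed earlier becomes unnecessary.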
    case body: CompoundBody => body
    case _ =>
      val position = Origin(None, None)
      throw QueryParsingErrors.sqlStatementUnsupportedError(sqlScriptText, position)

Can this really happen, or does it mean there is a bug?

Basically, we did the same as in `parsePlan`, but in my opinion this means a bug... This would happen if parsing is successful but something other than the expected node (`CompoundBody` here) is returned, and I have no idea how that could happen. If something is wrong, I would expect a syntax/parsing exception to be thrown; otherwise `sqlScriptText` should be parsed into a `CompoundBody`. So I don't expect this case to ever be hit, but we copied the `parsePlan` behavior.
Of course, I might be missing something here...
    /**
     * Logical operator representing result of parsing a single SQL statement
     * that is supposed to be executed against Spark.
     * It can also be a Spark expression that is wrapped in a statement.

What does this mean?

Thanks for noticing, this seems to be an accidental paste from the POC PR, left over from some previous version there, I think... Will remove the last sentence from the comment.

Removed.
     * It can also be a Spark expression that is wrapped in a statement.
     * @param parsedPlan Result of SQL statement parsing.
     * @param sourceStart Index of the first char of the statement in the original SQL script text.
     * @param sourceEnd Index of the last char of the statement in the original SQL script text.

It's weird to put these in the constructor. Other plan nodes keep the SQL string context as a thread-local `Origin`; can't we do the same? Check out the `withOrigin` calls in `AstBuilder`.

Wasn't aware of this, thanks! Will change to using `Origin`.

Done.
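For readers unfamiliar with the thread-local pattern the reviewer refers to, here is a minimal self-contained sketch. `SketchOrigin`, `SketchCurrentOrigin` and `SketchSingleStatement` are simplified, hypothetical names chosen for illustration; they only mirror the idea behind `Origin` and `withOrigin` and are not Spark's actual definitions.

```scala
// Simplified stand-in for an origin record; field names are illustrative.
final case class SketchOrigin(startIndex: Option[Int] = None, stopIndex: Option[Int] = None)

object SketchCurrentOrigin {
  private val current = new ThreadLocal[SketchOrigin] {
    override def initialValue(): SketchOrigin = SketchOrigin()
  }

  def get: SketchOrigin = current.get()

  // Runs `f` with `o` as the ambient origin, then restores the previous one.
  def withOrigin[A](o: SketchOrigin)(f: => A): A = {
    val previous = current.get()
    current.set(o)
    try f finally current.set(previous)
  }
}

// The node captures the ambient origin when it is constructed,
// so source positions do not need to be constructor parameters.
final case class SketchSingleStatement(sql: String) {
  val origin: SketchOrigin = SketchCurrentOrigin.get
}
```

A parser built this way would wrap each visit in `withOrigin(...)`, so every node created inside the block picks up the positions of the text it was parsed from.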
thanks, merging to master!
### What changes were proposed in this pull request?
In [SQL Scripting parser PR](#46665) we didn't notice that import for `scala.collection.immutable.Seq` was added to `AstBuilder` (probably accidentally by IntelliJ). While this import is not unused per se, it is logically not needed (since Scala 2.13, `scala.Seq` is an alias for `scala.collection.immutable.Seq`). Anyways, no enforcement was there before and it makes sense to leave it as such. It's not in any way required for SQL Scripting parser change.

### Why are the changes needed?
To remove accidentally added package enforcement that's not needed and wasn't used before.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Already existing tests cover this change.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47025 from davidm-db/sql_scripting_remove_import.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
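The alias claim in the message above can be checked with a small compile-time assertion. This is a sketch that assumes Scala 2.13 or later; on Scala 2.12 the `implicitly` line would not compile, because `scala.Seq` pointed to `scala.collection.Seq` there.

```scala
object SeqAliasCheck {
  // Compiles only if both names denote the same type (true from Scala 2.13 onward).
  implicitly[Seq[Int] =:= scala.collection.immutable.Seq[Int]]

  // No explicit import is needed to use the default Seq as an immutable Seq.
  val xs: Seq[Int] = List(1, 2, 3)
  val ys: scala.collection.immutable.Seq[Int] = xs
}
```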
…ges (#3294)

## Description
apache/spark#46665 introduced a new `parseScript` method in `ParserInterface`, which broke compilation against Spark master:
```
[error] /home/runner/work/delta/delta/spark/src/main/scala/io/delta/sql/parser/DeltaSqlParser.scala:75:7: class DeltaSqlParser needs to be abstract.
[error] Missing implementation for member of trait ParserInterface:
[error]   def parseScript(sqlScriptText: String): org.apache.spark.sql.catalyst.parser.CompoundBody = ???
[error] class DeltaSqlParser(val delegate: ParserInterface) extends ParserInterface {
[error]       ^
[warn] 100 warnings found
[error] one error found
[error] (spark / Compile / compileIncremental) Compilation failed
```
This PR fixes the issue by shimming the `ParserInterface` since `parseScript` is not available in Spark 3.5

## How was this patch tested?
Existing DeltaSqlParserSuite
### What changes were proposed in this pull request?
Previous [PR](#46665) introduced parser changes for SQL Scripting. This PR is a follow-up to introduce the interpreter for SQL Scripting language and proposes the following changes:
- `SqlScriptingExecutionNode` - introduces execution nodes for SQL scripting, used during interpretation phase:
  - `SingleStatementExec` - executable node for `SingleStatement` logical node; wraps logical plan of the single statement.
  - `CompoundNestedStatementIteratorExec` - implements base recursive iterator logic for all nesting statements.
  - `CompoundBodyExec` - concrete implementation of `CompoundNestedStatementIteratorExec` for `CompoundBody` logical node.
- `SqlScriptingInterpreter` - introduces the interpreter for SQL scripts. Product of interpretation is the iterator over the statements that should be executed.

Follow-up PRs will introduce further statements, support for exceptions thrown from parser/interpreter, exception handling in SQL, etc.
More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48343) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.
Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to bridge that gap and enable complex logic to be written entirely in SQL.

### Does this PR introduce _any_ user-facing change?
No.
This PR is second in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged. In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:
- `SqlScriptingExecutionNodeSuite` - unit tests for execution nodes.
- `SqlScriptingInterpreterSuite` - tests for interpreter (with parser integration).

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47026 from davidm-db/sql_scripting_interpreter.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
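The idea of a recursive iterator over nested statements, whose product is the sequence of statements to execute, can be sketched as below. The types `ExecNode`, `LeafStatement` and `NestedBody` are hypothetical and much simpler than the execution nodes named in the message; the sketch only shows the depth-first, lazy traversal.

```scala
// Hypothetical, simplified execution tree (not the classes from the PR).
sealed trait ExecNode
final case class LeafStatement(sql: String) extends ExecNode
final case class NestedBody(children: Seq[ExecNode]) extends ExecNode

object ScriptIteratorSketch {
  // Lazily yields leaf statements depth-first, in script order.
  def statements(node: ExecNode): Iterator[LeafStatement] = node match {
    case leaf: LeafStatement  => Iterator.single(leaf)
    case NestedBody(children) => children.iterator.flatMap(statements)
  }
}
```

Because the result is an `Iterator`, a caller can execute one statement before the next one is even reached, which matters once later statements depend on the outcome of earlier ones.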
### What changes were proposed in this pull request?
Previous [PR1](#46665) and [PR2](#46665) introduced parser and interpreter changes for SQL Scripting. This PR is a follow-up to introduce the concept of labels for SQL Scripting language and proposes the following changes:
- Changes grammar to support labels at start and end of the compound statements.
- Updates visitor functions for compound nodes in the syntax tree in AstBuilder to check if labels are present and valid.

More details can be found in [Jira item](https://issues.apache.org/jira/browse/SPARK-48529) for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for various SQL scripting concepts like loops, leave & iterate statements.

### Does this PR introduce any user-facing change?
No.
This PR is among first PRs in series of PRs that will introduce changes to sql() API to add support for SQL scripting, but for now, the API remains unchanged. In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests for newly introduced parser changes:
SqlScriptingParserSuite - unit tests for execution nodes.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47146 from miland-db/sql_batch_labels.

Lead-authored-by: David Milicevic <david.milicevic@databricks.com>
Co-authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
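The message does not spell out what makes a label valid, so the following is only a guess at the kind of check involved. It assumes that an end label must be accompanied by a begin label and must equal it, compared case-insensitively; the object name and error texts are hypothetical.

```scala
object LabelCheckSketch {
  // Assumed rule (not confirmed by the source): an end label is valid only
  // when a begin label exists and both are equal, ignoring case.
  def validate(beginLabel: Option[String], endLabel: Option[String]): Either[String, Unit] =
    (beginLabel, endLabel) match {
      case (Some(b), Some(e)) if !b.equalsIgnoreCase(e) =>
        Left(s"Begin label '$b' does not match end label '$e'")
      case (None, Some(e)) =>
        Left(s"End label '$e' has no matching begin label")
      case _ =>
        Right(())
    }
}
```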
### What changes were proposed in this pull request?
This PR proposes changes to SQL parser to introduce support for SQL scripting statements:
- Adds the `BEGIN` keyword to the lexer.
- Adds the `parseScript` method in `ParserInterface`.
- Adds visitor functions for compound statements to `AstBuilder`.
- Adds logical operators in `SqlScriptingLogicalOperators` for compound statements that are created by visiting functions and that will be used during interpretation phase.

In order to simplify the process, in this PR we only introduce the support for compound statements to the SQL parser.
Follow-up PRs will introduce interpreter, further statements, support for exceptions thrown from parser/interpreter, etc.
More details can be found in Jira item for this task and its parent (where the design doc is uploaded as well).

### Why are the changes needed?
The intent is to add support for SQL scripting (and stored procedures down the line). It gives users the ability to develop complex logic and ETL entirely in SQL.
Until now, users had to write verbose SQL statements or combine SQL + Python to efficiently write the logic. This is an effort to bridge that gap and enable complex logic to be written entirely in SQL.

### Does this PR introduce any user-facing change?
No.
This PR is the first in a series of PRs that will introduce changes to the `sql()` API to add support for SQL scripting, but for now, the API remains unchanged. In the future, the API will remain the same as well, but it will have new possibility to execute SQL scripts.

### How was this patch tested?
There are tests in `SqlScriptingParserSuite` that test the newly introduced parser changes.

### Was this patch authored or co-authored using generative AI tooling?
No.