Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48990][SQL] Unified variable related SQL syntax keywords #47469

Closed
wants to merge 3 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jul 24, 2024

What changes were proposed in this pull request?

The pr aims to unified variable related SQL syntax keywords, enable syntax DECLARE (OR REPLACE)? ... and DROP TEMPORARY ... to support keyword: VAR (not only VARIABLE).

Why are the changes needed?

When setting variables, we support (VARIABLE | VAR), but when declaring and dropping variables, we only support the keyword VARIABLE (not support the keyword VAR)

image

setStatementWithOptionalVarKeyword
: SET (VARIABLE | VAR)? assignmentList #setVariableWithOptionalKeyword
| SET (VARIABLE | VAR)? LEFT_PAREN multipartIdentifierList RIGHT_PAREN EQ
LEFT_PAREN query RIGHT_PAREN #setVariableWithOptionalKeyword
;

| DECLARE (OR REPLACE)? VARIABLE?
identifierReference dataType? variableDefaultExpression? #createVariable
| DROP TEMPORARY VARIABLE (IF EXISTS)? identifierReference #dropVariable

The syntax seems a bit weird, inconsistent experience in SQL syntax related to variable usage by end-users, so I propose to unify it.

Does this PR introduce any user-facing change?

Yes, enable end-users to use variable related SQL with consistent keywords.

How was this patch tested?

Updated existed UT.

Was this patch authored or co-authored using generative AI tooling?

No.

@panbingkun
Copy link
Contributor Author

cc @srielau @cloud-fan

@panbingkun panbingkun marked this pull request as ready for review July 24, 2024 07:06
@@ -26,9 +26,9 @@ DECLARE VARIABLE IF NOT EXISTS var1 INT;
DROP TEMPORARY VARIABLE IF EXISTS var1;

SET VARIABLE title = 'Drop Variable';
DECLARE VARIABLE var1 INT;
DECLARE VAR var1 INT;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update only one SQL in sql-session-variables.sql so that UT can cover it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a parser suite for variable? I think that's a better place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not maybe we should create one, like ShowTablesParserSuite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Currently we don't have a parser suite for variable, let me create one.
  • Additionally, there are already sql-session-variables.sql assisted tests available. Do we still need to create some UT like DeclareVariableSuiteBase, v1\DeclareVariableSuite, v2\DeclareVariableSuite ... ?
    (I'm not sure, it looks a bit duplicate, does it?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only support session variable today so we don't need cross-test with v1/v2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

@@ -215,9 +215,9 @@ statement
routineCharacteristics
RETURN (query | expression) #createUserDefinedFunction
| DROP TEMPORARY? FUNCTION (IF EXISTS)? identifierReference #dropFunction
| DECLARE (OR REPLACE)? VARIABLE?
| DECLARE (OR REPLACE)? (VARIABLE | VAR)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we create a common parser rule for VARIABLE | VAR, like namespace did?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, let me improve it.

Copy link
Member

@yaooqinn yaooqinn Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a common parser, using a lexer VARIABLE: 'VARIABLE' | 'VAR'; shall be better. Because a common parser can still be misused in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you mean by misuse is that the developer may have mistakenly used the keywords VARIABLE or VAR (directly use VARIABLE or VAR instead of variable) in the file SqlBaseParser.g4, right?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 34e65a8 Jul 24, 2024
ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
### What changes were proposed in this pull request?
The pr aims to unified `variable` related `SQL syntax` keywords, enable syntax `DECLARE (OR REPLACE)? ...` and `DROP TEMPORARY ...` to support keyword: `VAR` (not only `VARIABLE`).

### Why are the changes needed?
When `setting` variables, we support `(VARIABLE | VAR)`, but when `declaring` and `dropping` variables, we only support the keyword `VARIABLE` (not support the keyword `VAR`)

<img width="597" alt="image" src="https://github.com/user-attachments/assets/07084fef-4080-4410-a74c-e6001ae0a8fa">

https://github.com/apache/spark/blob/285489b0225004e918b6e937f7367e492292815e/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L68-L72

https://github.com/apache/spark/blob/285489b0225004e918b6e937f7367e492292815e/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L218-L220

The syntax seems `a bit weird`, `inconsistent experience` in SQL syntax related to variable usage by end-users, so I propose to `unify` it.

### Does this PR introduce _any_ user-facing change?
Yes, enable end-users to use `variable related SQL` with `consistent` keywords.

### How was this patch tested?
Updated existed UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47469 from panbingkun/SPARK-48990.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
fusheng-rd pushed a commit to fusheng-rd/spark that referenced this pull request Aug 6, 2024
### What changes were proposed in this pull request?
The pr aims to unified `variable` related `SQL syntax` keywords, enable syntax `DECLARE (OR REPLACE)? ...` and `DROP TEMPORARY ...` to support keyword: `VAR` (not only `VARIABLE`).

### Why are the changes needed?
When `setting` variables, we support `(VARIABLE | VAR)`, but when `declaring` and `dropping` variables, we only support the keyword `VARIABLE` (not support the keyword `VAR`)

<img width="597" alt="image" src="https://github.com/user-attachments/assets/07084fef-4080-4410-a74c-e6001ae0a8fa">

https://github.com/apache/spark/blob/285489b0225004e918b6e937f7367e492292815e/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L68-L72

https://github.com/apache/spark/blob/285489b0225004e918b6e937f7367e492292815e/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L218-L220

The syntax seems `a bit weird`, `inconsistent experience` in SQL syntax related to variable usage by end-users, so I propose to `unify` it.

### Does this PR introduce _any_ user-facing change?
Yes, enable end-users to use `variable related SQL` with `consistent` keywords.

### How was this patch tested?
Updated existed UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47469 from panbingkun/SPARK-48990.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
The pr aims to unified `variable` related `SQL syntax` keywords, enable syntax `DECLARE (OR REPLACE)? ...` and `DROP TEMPORARY ...` to support keyword: `VAR` (not only `VARIABLE`).

### Why are the changes needed?
When `setting` variables, we support `(VARIABLE | VAR)`, but when `declaring` and `dropping` variables, we only support the keyword `VARIABLE` (not support the keyword `VAR`)

<img width="597" alt="image" src="https://github.com/user-attachments/assets/07084fef-4080-4410-a74c-e6001ae0a8fa">

https://github.com/apache/spark/blob/285489b0225004e918b6e937f7367e492292815e/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L68-L72

https://github.com/apache/spark/blob/285489b0225004e918b6e937f7367e492292815e/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L218-L220

The syntax seems `a bit weird`, `inconsistent experience` in SQL syntax related to variable usage by end-users, so I propose to `unify` it.

### Does this PR introduce _any_ user-facing change?
Yes, enable end-users to use `variable related SQL` with `consistent` keywords.

### How was this patch tested?
Updated existed UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47469 from panbingkun/SPARK-48990.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants