[SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY #48413

Open
wants to merge 5 commits into
base: master

dtenedor
Contributor

What changes were proposed in this pull request?

This PR adds SQL pipe syntax support for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY.

For example:

CREATE TABLE t(x INT, y STRING) USING CSV;
INSERT INTO t VALUES (0, 'abc'), (1, 'def');

TABLE t
|> ORDER BY x
|> LIMIT 1 OFFSET 1

1	def
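The row-level effect of the example can be modelled on plain Scala collections. This is only a sketch of what each pipe step does to the row set, not how Spark plans or executes the query; `Row` and `PipeDemo` are hypothetical names introduced for illustration.

```scala
// Hypothetical model of table t; this is not Spark's Row type.
final case class Row(x: Int, y: String)

object PipeDemo {
  // INSERT INTO t VALUES (0, 'abc'), (1, 'def')
  val t: List[Row] = List(Row(0, "abc"), Row(1, "def"))

  // TABLE t |> ORDER BY x |> LIMIT 1 OFFSET 1
  def result: List[Row] =
    t.sortBy(_.x) // |> ORDER BY x
      .drop(1)    // OFFSET 1: skip the first sorted row
      .take(1)    // LIMIT 1: keep at most one row
}
```

Each pipe operator consumes the rows produced by the step before it, which is why the ordering is applied before the offset and limit.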

Why are the changes needed?

The SQL pipe operator syntax will let users compose queries in a more flexible fashion.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds a few unit test cases but relies mostly on golden file test coverage. The golden files verify that the query results stay correct as the feature is implemented, and they also let us inspect the analyzer output plans to confirm they look right.

Was this patch authored or co-authored using generative AI tooling?

No

@dtenedor dtenedor marked this pull request as ready for review October 10, 2024 16:47
@github-actions github-actions bot added the SQL label Oct 10, 2024
@dtenedor
Contributor Author

cc @cloud-fan @gengliangwang this is the PR to support LIMIT/OFFSET + sorting. There are a few more changes in the AstBuilder for this one but still contained only in the parser.

Contributor Author

@dtenedor dtenedor left a comment

Thanks @cloud-fan for your review!

throw QueryParsingErrors.combinationQueryResultClausesUnsupportedError(ctx)
val allClauses = "ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY"
if (forPipeOperators) {
throw QueryParsingErrors.clausesWithPipeOperatorsUnsupportedError(ctx, allClauses)
Contributor

We should always call combinationQueryResultClausesUnsupportedError here, as this is not related to pipe. We don't support this combination in both pipe and classic SQL.

Contributor Author

Sounds good, this is done.

val limitClause = "LIMIT"
if (forPipeOperators && clause.nonEmpty && clause != offsetClause) {
throw QueryParsingErrors.clausesWithPipeOperatorsUnsupportedError(
ctx, s"the $clause and $limitClause clauses")
Contributor

Suggested change
- ctx, s"the $clause and $limitClause clauses")
+ ctx, s"the co-existence of $clause and $limitClause clauses")

Contributor Author

Sounds good, this is done.

val withOffset = withWindow.optional(offset) {
if (forPipeOperators && clause.nonEmpty) {
throw QueryParsingErrors.clausesWithPipeOperatorsUnsupportedError(
ctx, s"the $clause and $offsetClause clauses")
Contributor

Suggested change
- ctx, s"the $clause and $offsetClause clauses")
+ ctx, s"the co-existence of $clause and $offsetClause clauses")

Contributor Author

Sounds good, this is done.
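The two checks discussed in these threads can be read together as one rule. The sketch below is a simplified, standalone restatement of the conditions quoted in the diff fragments, not the actual AstBuilder code; `PipeClauseCheck` and `conflict` are hypothetical names, and the sketch returns the message fragment instead of throwing through `QueryParsingErrors`.

```scala
object PipeClauseCheck {
  val Limit = "LIMIT"
  val Offset = "OFFSET"

  // Returns the fragment to place in the error message, or None when the
  // combination is allowed. `clause` is the result clause already present in
  // the same pipe operator ("" when there is none); `next` is the clause
  // being added (LIMIT or OFFSET).
  def conflict(forPipeOperators: Boolean, clause: String, next: String): Option[String] =
    if (forPipeOperators && clause.nonEmpty && !(clause == Offset && next == Limit))
      // Wording follows the reviewer's suggested change.
      Some(s"the co-existence of $clause and $next clauses")
    else
      None
}
```

For example, `PipeClauseCheck.conflict(true, "ORDER BY", "LIMIT")` yields a message fragment, while an existing `OFFSET` followed by `LIMIT` is let through, mirroring the `clause != offsetClause` condition in the quoted code.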

@@ -4926,9 +4926,14 @@
"Catalog <catalogName> does not support <operation>."
]
},
"CLAUSE_WITH_PIPE_OPERATORS" : {
Contributor

Maybe @srielau can also comment here, but it looks to me that we have two new errors:

  1. WINDOW in the SQL pipe operator. We don't support window definition in SQL pipe yet.
  2. More than one QUERY_RESULT_CLAUSES in SQL pipe. SQL pipe is designed to specify one operator at a time, so we don't allow any combination, such as ORDER BY col LIMIT 1, or LIMIT 1 OFFSET 1

Contributor

  1. This appears to be a 0A000 (unsupported feature).
  2. Isn't that a syntax error?

Contributor

yes, 2 is a syntax error for the pipe statement.

Contributor Author

Thanks, split this into two new errors per suggestion.
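A rough sketch of the split discussed in this thread follows. It models the two failure modes as separate cases with different SQLSTATE values. Several points are assumptions: which error name belongs to which case, the use of `42601` as the syntax-error state (only `0A000` is named in the thread), and the names `PipeErrors`, `PipeError`, and `classify`, which are hypothetical.

```scala
object PipeErrors {
  sealed trait PipeError {
    def errorClass: String
    def sqlState: String
  }

  // A WINDOW definition inside a pipe operator: an unsupported feature.
  case object WindowInPipe extends PipeError {
    val errorClass = "CLAUSE_WITH_PIPE_OPERATORS" // assumed mapping
    val sqlState = "0A000"
  }

  // More than one query result clause in one pipe operator: a syntax error.
  case object MultipleResultClauses extends PipeError {
    val errorClass = "MULTIPLE_QUERY_RESULT_CLAUSES_WITH_PIPE_OPERATORS" // assumed mapping
    val sqlState = "42601" // assumption: a syntax-error SQLSTATE
  }

  // Decide which error, if any, applies to a single pipe operator.
  def classify(hasWindow: Boolean, resultClauseCount: Int): Option[PipeError] =
    if (hasWindow) Some(WindowInPipe)
    else if (resultClauseCount > 1) Some(MultipleResultClauses)
    else None
}
```

Keeping the cases separate lets each carry its own SQL state, which is the distinction the reviewers ask for.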

respond to code review comments

Contributor Author

@dtenedor dtenedor left a comment

Thanks @cloud-fan for your reviews!


// OFFSET
// - OFFSET 0 is the same as omitting the OFFSET clause
val offsetClause = "OFFSET"
Member

nit: shall we put all the constant clauses into an object?

Contributor Author

Sure, it is good to keep magic strings out of the code.
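One way the suggestion could look is sketched below: the clause names collected in a single object so that error messages are built from shared constants. `QueryResultClauses` and its member names are hypothetical; the object actually added in the PR is not shown in this conversation.

```scala
// Hypothetical holder for the clause-name constants discussed above.
object QueryResultClauses {
  val OrderBy = "ORDER BY"
  val SortBy = "SORT BY"
  val DistributeBy = "DISTRIBUTE BY"
  val ClusterBy = "CLUSTER BY"
  val Limit = "LIMIT"
  val Offset = "OFFSET"

  // The ordering and distribution clauses, in the order used by the
  // "ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY" string in the diff.
  val SortClauses: Seq[String] = Seq(OrderBy, SortBy, DistributeBy, ClusterBy)

  // Slash-separated form for use in error messages.
  def allSortClauses: String = SortClauses.mkString("/")
}
```

Call sites can then refer to `QueryResultClauses.Offset` rather than repeating the literal `"OFFSET"`.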

@@ -5006,6 +5011,11 @@
"Multiple bucket TRANSFORMs."
]
},
"MULTIPLE_QUERY_RESULT_CLAUSES_WITH_PIPE_OPERATORS" : {
Contributor

Shall we turn it into a top-level error class? This is a syntax error, not an unsupported feature, and should have a different SQL state.

@@ -5006,6 +5011,11 @@
"Multiple bucket TRANSFORMs."
]
},
"MULTIPLE_QUERY_RESULT_CLAUSES_WITH_PIPE_OPERATORS" : {
"message" : [
"Syntax error: the SQL pipe operator syntax using |> does not support <clauses>. Please separate the multiple result clauses into separate pipe operators and then retry the query again."
Contributor

@cloud-fan cloud-fan Oct 17, 2024

how about

<clause1> and <clause2> can not coexist in SQL pipe operator using '|>'. Please separate ...

Contributor

The idea is to keep the error parameters simple, so that people can consume them programmatically.
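To illustrate the point about simple parameters, here is a minimal sketch in which each parameter carries one clause name and the message is produced by plain placeholder substitution. `ErrorTemplate` and `render` are hypothetical names, not Spark's error framework, and the template covers only the first sentence of the proposed message.

```scala
object ErrorTemplate {
  // First sentence of the message proposed in the review comment above.
  val template = "<clause1> and <clause2> can not coexist in SQL pipe operator using '|>'."

  // Replace every <name> placeholder with the matching parameter value.
  def render(template: String, params: Map[String, String]): String =
    params.foldLeft(template) { case (msg, (name, value)) =>
      msg.replace(s"<$name>", value)
    }
}
```

With parameters such as `Map("clause1" -> "ORDER BY", "clause2" -> "LIMIT")`, a caller can inspect the offending clauses directly instead of parsing them out of a preformatted phrase.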
