[SPARK-43922] Add named parameter support in parser for function calls #41429

Closed
wants to merge 39 commits
Changes from 18 commits

Commits (39)
c7215fb
[SPARK-43922] Add named parameter support in parser for function calls
learningchess2003 Jun 1, 2023
5e168de
Adding some compilation fixes
learningchess2003 Jun 6, 2023
e044d14
Adding tests
learningchess2003 Jun 12, 2023
6e44c55
Adding table valued function tests
learningchess2003 Jun 12, 2023
246d5c1
Merge branch 'master' into SPARK-43922
learningchess2003 Jun 12, 2023
20b9b93
Adding error analysis test
learningchess2003 Jun 12, 2023
12c814b
Addressing some comments
learningchess2003 Jun 12, 2023
95b3f20
Removing file
learningchess2003 Jun 12, 2023
8312980
Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expr…
learningchess2003 Jun 12, 2023
30a72e9
Adding some minor comments
learningchess2003 Jun 12, 2023
515d153
Comment
learningchess2003 Jun 12, 2023
7a8f4ad
Adding suggested tests
learningchess2003 Jun 12, 2023
23f9c98
Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expr…
learningchess2003 Jun 12, 2023
f53dab7
Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expr…
learningchess2003 Jun 12, 2023
b26fd43
Some changes
learningchess2003 Jun 12, 2023
3ae0ee6
Adding test and error
learningchess2003 Jun 13, 2023
4eff719
Making conversion
learningchess2003 Jun 13, 2023
4038773
Updating test to reflect error spec
learningchess2003 Jun 13, 2023
3209f60
Addressing some comments
learningchess2003 Jun 14, 2023
23075d5
Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expr…
learningchess2003 Jun 20, 2023
c879012
Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/pars…
learningchess2003 Jun 20, 2023
c60f1a4
Update sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/pars…
learningchess2003 Jun 20, 2023
208d005
Update sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryC…
learningchess2003 Jun 20, 2023
90dafcb
Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/pars…
learningchess2003 Jun 20, 2023
dc868e7
Adding addressed comments
learningchess2003 Jun 20, 2023
25775f7
Merge branch 'master' into SPARK-43922
learningchess2003 Jun 21, 2023
cb0d011
Merge branch 'master' into SPARK-43922
learningchess2003 Jun 22, 2023
9a5f74e
Fixing scalastyle rebase error
learningchess2003 Jun 22, 2023
566c994
Fixing some problems
learningchess2003 Jun 22, 2023
65d9d95
Fixing indentation
learningchess2003 Jun 27, 2023
7653225
Update core/src/main/resources/error/error-classes.json
learningchess2003 Jun 27, 2023
f071073
Addressing comments
learningchess2003 Jun 27, 2023
7d99b58
Add golden file for end-to-end tests
learningchess2003 Jun 28, 2023
345b96d
Remove unnecessary line
learningchess2003 Jun 28, 2023
704dcad
Line deletion
learningchess2003 Jun 29, 2023
01c3af9
Reverting error class change
learningchess2003 Jun 29, 2023
c6ca3bd
Try
learningchess2003 Jun 29, 2023
da49eed
Moving stuff around
learningchess2003 Jun 29, 2023
5735732
Original
learningchess2003 Jun 29, 2023
5 changes: 5 additions & 0 deletions core/src/main/resources/error/error-classes.json
@@ -1580,6 +1580,11 @@
"Not allowed to implement multiple UDF interfaces, UDF class <className>."
]
},
"NAMED_ARGUMENTS_SUPPORT_DISABLED" : {
Review comment (Member): Sounds like it can be enabled. If so, could you describe how in the error message:
"... are not supported here"
If it is just not supported, could you add a sub-class to UNSUPPORTED_FEATURE.

"message" : [
"Cannot call function ‘<functionName>’ because named argument references are not supported here. In this case, the named argument reference was ‘<argument>’.\n"
]
},
"NESTED_AGGREGATE_FUNCTION" : {
"message" : [
"It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query."
@@ -443,6 +443,7 @@ CONCAT_PIPE: '||';
HAT: '^';
COLON: ':';
ARROW: '->';
FAT_ARROW : '=>';
HENT_START: '/*+';
HENT_END: '*/';

@@ -789,7 +789,7 @@ inlineTable
;

functionTable
: funcName=functionName LEFT_PAREN (expression (COMMA expression)*)? RIGHT_PAREN tableAlias
: funcName=functionName LEFT_PAREN (functionArgument (COMMA functionArgument)*)? RIGHT_PAREN tableAlias
;

tableAlias
@@ -862,6 +862,15 @@ expression
: booleanExpression
;

namedArgumentExpression
: key=identifier FAT_ARROW value=expression
Review comment (Member): identifier can be quoted and unquoted. Shall we make it IDENTIFIER (unquoted) only?
Reply (Contributor Author): @gengliangwang After looking at the design doc, would it be fine to keep it like this?

;

functionArgument
: expression
| namedArgumentExpression
;

expressionSeq
: expression (COMMA expression)*
;
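As a quick illustration (not part of the diff): the new namedArgumentExpression and functionArgument rules are what allow key => value arguments in both functionTable and functionCall. A minimal sketch of exercising them through the Catalyst parser; this covers parsing only, since this PR does not make any particular function resolve named arguments:

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Scalar function call: positional and named arguments can be mixed freely at parse time.
val expr = CatalystSqlParser.parseExpression("encode('abc', charset => 'utf-8')")
// The second argument parses to NamedArgumentExpression("charset", Literal("utf-8")).

// Table-valued function: functionTable reuses the same functionArgument rule.
val plan = CatalystSqlParser.parsePlan(
  "SELECT * FROM my_tvf(arg1 => 'value1', 2, arg2 => true)")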
@@ -921,7 +930,8 @@ primaryExpression
| LEFT_PAREN namedExpression (COMMA namedExpression)+ RIGHT_PAREN #rowConstructor
| LEFT_PAREN query RIGHT_PAREN #subqueryExpression
| IDENTIFIER_KW LEFT_PAREN expression RIGHT_PAREN #identifierClause
| functionName LEFT_PAREN (setQuantifier? argument+=expression (COMMA argument+=expression)*)? RIGHT_PAREN
| functionName LEFT_PAREN (setQuantifier? argument+=functionArgument
(COMMA argument+=functionArgument)*)? RIGHT_PAREN
(FILTER LEFT_PAREN WHERE where=booleanExpression RIGHT_PAREN)?
(nullsOption=(IGNORE | RESPECT) NULLS)? ( OVER windowSpec)? #functionCall
| identifier ARROW expression #lambda
@@ -0,0 +1,59 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.expressions

import org.apache.spark.sql.types.DataType

/**
* This represents an argument expression to a function call accompanied with an
* explicit reference to the corresponding argument name as a string. In this way,
* the analyzer can make sure that the provided values match up to the arguments
* as intended, and the arguments may appear in any order.
* This unary expression is unevaluable because we intend to replace it with
* the provided value itself during query analysis (after possibly rearranging
* the parsed argument list to match up the names to the expected function
* signature).
*
* SQL Syntax: key => value
* SQL grammar: key=identifier FAT_ARROW value=expression
*
* Example usage with the "encode" scalar function:
* SELECT encode("abc", charset => "utf-8");
* The second argument generates NamedArgumentExpression("charset", Literal("utf-8"))
* SELECT encode(charset => "utf-8", value => "abc");
*
* @param key The name of the function argument
* @param value The value of the function argument
*/
case class NamedArgumentExpression(key: String, value: Expression)
extends UnaryExpression with Unevaluable {
override def nullable: Boolean = value.nullable
Review comment (Contributor): nit: we can remove this line as UnaryExpression already provides an implementation of nullable.

override def dataType: DataType = value.dataType

override def toString: String = s"$key => $value"

// NamedArgumentExpression has a single child, which is its value expression,
// so the value expression can be resolved by Analyzer rules recursively.
// For example, when the value is a built-in function expression,
// it must be resolved by [[ResolveFunctions]]
override def child: Expression = value

override protected def withNewChildInternal(newChild: Expression): Expression =
copy(value = newChild)
}
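As the scaladoc above notes, NamedArgumentExpression is only a carrier: the analyzer is expected to rearrange the argument list to the function's expected signature and then replace each wrapper with its value. A rough, hypothetical sketch of such a rearrangement (not part of this PR; it assumes positional arguments precede named ones and that every parameter is supplied):

import org.apache.spark.sql.catalyst.expressions.{Expression, NamedArgumentExpression}

// Hypothetical helper: reorder parsed arguments to a function's declared parameter
// names, then drop the NamedArgumentExpression wrappers.
def rearrangeArgs(paramNames: Seq[String], args: Seq[Expression]): Seq[Expression] = {
  val (named, positional) = args.partition(_.isInstanceOf[NamedArgumentExpression])
  val byName = named.collect { case NamedArgumentExpression(k, v) => k -> v }.toMap
  paramNames.zipWithIndex.map { case (name, i) =>
    if (i < positional.length) positional(i) else byName(name)
  }
}

// e.g. for encode(value, charset):
//   rearrangeArgs(Seq("value", "charset"),
//     Seq(Literal("abc"), NamedArgumentExpression("charset", Literal("utf-8"))))
//   returns Seq(Literal("abc"), Literal("utf-8")).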
@@ -1544,8 +1544,20 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] with SQLConfHelper wit
if (name.length > 1) {
throw QueryParsingErrors.invalidTableValuedFunctionNameError(name, ctx)
}
val args = func.functionArgument.asScala.map { e =>
Option(e.namedArgumentExpression).map { n =>
if (SQLConf.get.getConf(SQLConf.ALLOW_NAMED_FUNCTION_ARGUMENTS)) {
NamedArgumentExpression(n.key.getText, expression(n.value))
} else {
throw QueryCompilationErrors.namedArgumentsNotEnabledError(
func.functionName.getText, n.key.getText)
}
}.getOrElse {
expression(e)
}
}.toSeq

val tvf = UnresolvedTableValuedFunction(name, func.expression.asScala.map(expression).toSeq)
val tvf = UnresolvedTableValuedFunction(name, args)

val tvfAliases = if (aliases.nonEmpty) UnresolvedTVFAliases(name, tvf, aliases) else tvf

@@ -2176,7 +2188,18 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] with SQLConfHelper wit
val name = ctx.functionName.getText
val isDistinct = Option(ctx.setQuantifier()).exists(_.DISTINCT != null)
// Call `toSeq`, otherwise `ctx.argument.asScala.map(expression)` is `Buffer` in Scala 2.13
val arguments = ctx.argument.asScala.map(expression).toSeq match {
val arguments = ctx.argument.asScala.map { e =>
Option(e.namedArgumentExpression).map { n =>
if (SQLConf.get.getConf(SQLConf.ALLOW_NAMED_FUNCTION_ARGUMENTS)) {
NamedArgumentExpression(n.key.getText, expression(n.value))
} else {
throw QueryCompilationErrors.namedArgumentsNotEnabledError(
name, n.key.getText)
}
}.getOrElse {
expression(e)
}
}.toSeq match {
case Seq(UnresolvedStar(None))
if name.toLowerCase(Locale.ROOT) == "count" && !isDistinct =>
// Transform COUNT(*) into COUNT(1).
@@ -193,6 +193,13 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
messageParameters = Map("configKey" -> configKey))
}

def namedArgumentsNotEnabledError(functionName: String, argumentName: String): Throwable = {
new AnalysisException(
errorClass = "NAMED_ARGUMENTS_SUPPORT_DISABLED",
messageParameters = Map("functionName" -> functionName, "argument" -> argumentName)
)
}
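For illustration only: the two message parameters here fill the NAMED_ARGUMENTS_SUPPORT_DISABLED template added in error-classes.json above, so a blocked call should surface roughly like this (the object is private[sql], so the sketch assumes code living under org.apache.spark.sql):

val err = QueryCompilationErrors.namedArgumentsNotEnabledError("encode", "value")
// err.getErrorClass == "NAMED_ARGUMENTS_SUPPORT_DISABLED"
// err.getMessageParameters contains {functionName=encode, argument=value}
// Rendered message, approximately:
//   Cannot call function 'encode' because named argument references are not
//   supported here. In this case, the named argument reference was 'value'.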

def unresolvedUsingColForJoinError(
colName: String, suggestion: String, side: String): Throwable = {
new AnalysisException(
@@ -321,6 +321,13 @@ object SQLConf {
.booleanConf
.createWithDefault(false)

val ALLOW_NAMED_FUNCTION_ARGUMENTS = buildConf("spark.sql.allowNamedFunctionArguments")
.doc("If true, Spark will turn on support for named parameters for all functions that has" +
" it implemented.")
.version("3.5.0")
.booleanConf
.createWithDefault(true)
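A minimal usage sketch for the new flag, assuming a running SparkSession named spark; the QueryParsingErrorsSuite test below toggles the same key via withSQLConf:

// Named arguments are accepted by default (the flag defaults to true).
spark.conf.set("spark.sql.allowNamedFunctionArguments", "false")
// While disabled, a named-argument call such as
//   SELECT * FROM encode(value => 'abc', charset => 'utf-8')
// is rejected with error class NAMED_ARGUMENTS_SUPPORT_DISABLED.
spark.conf.set("spark.sql.allowNamedFunctionArguments", "true")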

val DYNAMIC_PARTITION_PRUNING_ENABLED =
buildConf("spark.sql.optimizer.dynamicPartitionPruning.enabled")
.doc("When true, we will generate predicate for partition column when it's used as join key")
@@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, _}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last}
import org.apache.spark.sql.catalyst.plans.logical.{OneRowRelation, Project}
import org.apache.spark.sql.catalyst.util.{DateTimeTestUtils, IntervalUtils}
import org.apache.spark.sql.catalyst.util.DateTimeConstants._
import org.apache.spark.sql.internal.SQLConf
@@ -329,6 +330,27 @@ class ExpressionParserSuite extends AnalysisTest {
parameters = Map("error" -> "'x'", "hint" -> ": extra input 'x'"))
}

test("function expressions with named arguments") {
assertEqual("encode(value => 'abc', charset => 'utf-8')",
$"encode".function(NamedArgumentExpression("value", Literal("abc")),
NamedArgumentExpression("charset", Literal("utf-8"))))
assertEqual("encode('abc', charset => 'utf-8')",
$"encode".function(Literal("abc"),
NamedArgumentExpression("charset", Literal("utf-8"))))
assertEqual("encode(charset => 'utf-8', 'abc')",
$"encode".function(
NamedArgumentExpression("charset", Literal("utf-8")), Literal("abc")))
assertEqual("encode('abc', charset => 'utf' || '-8')",
$"encode".function(Literal("abc"),
NamedArgumentExpression("charset",
Concat(Literal("utf") :: Literal("-8") :: Nil))))
val unresolvedAlias = Project(Seq(UnresolvedAlias(Literal("1"))), OneRowRelation())
assertEqual("encode(value => ((select '1')), charset => 'utf-8')",
$"encode".function(
NamedArgumentExpression("value", ScalarSubquery(plan = unresolvedAlias)),
NamedArgumentExpression("charset", Literal("utf-8"))))
}

private def lv(s: Symbol) = UnresolvedNamedLambdaVariable(Seq(s.name))

test("lambda functions") {
@@ -1412,6 +1412,36 @@ class PlanParserSuite extends AnalysisTest {
assertEqual("select a, b from db.c; ;; ;", table("db", "c").select($"a", $"b"))
}

test("table valued function with named arguments") {
// All named arguments
assertEqual(
"select * from my_tvf(arg1 => 'value1', arg2 => true)",
UnresolvedTableValuedFunction("my_tvf",
NamedArgumentExpression("arg1", Literal("value1")) ::
NamedArgumentExpression("arg2", Literal(true)) :: Nil).select(star()))

// Unnamed and named arguments
assertEqual(
"select * from my_tvf(2, arg1 => 'value1', arg2 => true)",
UnresolvedTableValuedFunction("my_tvf",
Literal(2) ::
NamedArgumentExpression("arg1", Literal("value1")) ::
NamedArgumentExpression("arg2", Literal(true)) :: Nil).select(star()))

// Mixed arguments
assertEqual(
"select * from my_tvf(arg1 => 'value1', 2, arg2 => true)",
UnresolvedTableValuedFunction("my_tvf",
NamedArgumentExpression("arg1", Literal("value1")) ::
Literal(2) ::
NamedArgumentExpression("arg2", Literal(true)) :: Nil).select(star()))

assertEqual(
"select * from my_tvf(group => 'abc')",
UnresolvedTableValuedFunction("my_tvf",
NamedArgumentExpression("group", Literal("abc")) :: Nil).select(star()))
}

test("SPARK-32106: TRANSFORM plan") {
// verify schema less
assertEqual(
@@ -30,7 +30,7 @@ import org.apache.spark._
import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, QueryTest, Row, SaveMode}
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.analysis.{Parameter, UnresolvedGenerator}
import org.apache.spark.sql.catalyst.expressions.{Grouping, Literal, RowNumber}
import org.apache.spark.sql.catalyst.expressions.{Grouping, Literal, NamedArgumentExpression, RowNumber}
import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
import org.apache.spark.sql.catalyst.expressions.objects.InitializeJavaBean
@@ -837,6 +837,18 @@ class QueryExecutionErrorsSuite
sqlState = "XX000")
}

test("INTERNAL_ERROR: Calling eval on Unevaluable NamedArgumentExpression") {
Review comment (Contributor): I don't think we need to test it. Unevaluable is not added by this PR and has been there for a long time.

val e = intercept[SparkException] {
NamedArgumentExpression("arg0", Literal("value0")).eval()
}
checkError(
exception = e,
errorClass = "INTERNAL_ERROR",
parameters = Map("message" -> "Cannot evaluate expression: arg0 => value0"),
sqlState = "XX000"
)
}

test("INTERNAL_ERROR: Calling doGenCode on unresolved") {
val e = intercept[SparkException] {
val ctx = new CodegenContext
@@ -20,16 +20,27 @@ package org.apache.spark.sql.errors
import org.apache.spark.SparkThrowable
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.catalyst.parser.ParseException
import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.test.SharedSparkSession

// Turn of the length check because most of the tests check entire error messages
// scalastyle:off line.size.limit
class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession {
class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQLHelper {

private def parseException(sqlText: String): SparkThrowable = {
intercept[ParseException](sql(sqlText).collect())
}

test("NAMED_ARGUMENTS_SUPPORT_DISABLED: named arguments not turned on") {
withSQLConf("spark.sql.allowNamedFunctionArguments" -> "false") {
checkError(
exception = parseException("SELECT * FROM encode(value => 'abc', charset => 'utf-8')"),
errorClass = "NAMED_ARGUMENTS_SUPPORT_DISABLED",
parameters = Map("functionName" -> "encode", "argument" -> "value")
)
}
}

test("UNSUPPORTED_FEATURE: LATERAL join with NATURAL join not supported") {
checkError(
exception = parseException("SELECT * FROM t1 NATURAL JOIN LATERAL (SELECT c1 + c2 AS c2)"),
@@ -368,6 +379,25 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession {
parameters = Map("error" -> "end of input", "hint" -> ""))
}

def checkParseSyntaxError(sqlCommand: String, errorString: String, hint: String = ""): Unit = {
checkError(
exception = parseException(sqlCommand),
errorClass = "PARSE_SYNTAX_ERROR",
sqlState = "42601",
parameters = Map("error" -> errorString, "hint" -> hint)
)
}

test("PARSE_SYNTAX_ERROR: named arguments invalid syntax") {
checkParseSyntaxError("select * from my_tvf(arg1 ==> 'value1')", "'>'")
checkParseSyntaxError("select * from my_tvf(arg1 = => 'value1')", "'=>'")
checkParseSyntaxError("select * from my_tvf((arg1 => 'value1'))", "'=>'")
checkParseSyntaxError("select * from my_tvf(arg1 => )", "')'")
checkParseSyntaxError("select * from my_tvf(arg1 => , 42)", "','")
checkParseSyntaxError("select * from my_tvf(my_tvf.arg1 => 'value1')", "'=>'")
checkParseSyntaxError("select * from my_tvf(arg1 => table t1)", "'t1'", hint = ": extra input 't1'")
}

test("PARSE_SYNTAX_ERROR: extraneous input") {
checkError(
exception = parseException("select 1 1"),