Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45033][SQL] Support maps by parameterized sql() #42752

Closed
wants to merge 7 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 31, 2023

What changes were proposed in this pull request?

In the PR, I propose to allow a few more column expressions as parameters of sql() to construct maps/array: CreateArray, MapFromArrays, CreateMap, MapFromArraysandMapFromEntries`. Need to allow such expression because Spark SQL doesn't support constructing literals of the map type.

Why are the changes needed?

To improve user experience with Spark SQL, and support the rest built-in type.

Does this PR introduce any user-facing change?

No. It extends the existing API.

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *ParametersSuite"

Was this patch authored or co-authored using generative AI tooling?

No.

…ark-proto` uber jar to test the `connect` module"

This reverts commit df63adf.
@MaxGekk MaxGekk changed the title [WIP][SQL] Support maps constructed from arrays in parameterized sql() [WIP][SPARK-45033][SQL] Support maps constructed from arrays in parameterized sql() Aug 31, 2023
@MaxGekk MaxGekk changed the title [WIP][SPARK-45033][SQL] Support maps constructed from arrays in parameterized sql() [SPARK-45033][SQL] Support maps constructed from arrays in parameterized sql() Aug 31, 2023
@MaxGekk MaxGekk requested a review from cloud-fan August 31, 2023 18:22
@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 1, 2023

@cloud-fan Could you review this PR, please.

@MaxGekk MaxGekk changed the title [SPARK-45033][SQL] Support maps constructed from arrays in parameterized sql() [SPARK-45033][SQL] Support maps by parameterized sql() Sep 2, 2023
@@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, args: Array[Expression])
*/
object BindParameters extends Rule[LogicalPlan] with QueryErrorsBase {
private def checkArgs(args: Iterable[(String, Expression)]): Unit = {
args.find(!_._2.isInstanceOf[Literal]).foreach { case (name, expr) =>
def isNotAllowed(expr: Expression): Boolean = expr.exists {
case _: Literal | _: CreateArray | _: MapFromArrays | _: CreateMap => false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should recursively check the children of CreateArray, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_.exists checks the children recursively, doesn't it? So, we allow mix of arrays, maps and literals.

Copy link
Member Author

@MaxGekk MaxGekk Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add an negative test with a deep nested prohibited expression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see! It's TreeNode#exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add CreateStruct? It's weird that we only leave it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. How about to support MapFromEntries as well? And maybe CreateNamedStruct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateNamedStruct is the actual expression for CreateStruct. MapFromEntries also look fine to me.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 4, 2023

Merging to master. Thank you, @cloud-fan and @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in 4162076 Sep 4, 2023
MaxGekk added a commit that referenced this pull request Sep 18, 2023
…`sql()`

### What changes were proposed in this pull request?
In the PR, I propose to extend the proto protocol, and add two more fields to `SqlCommand`: `arguments` and `pos_arguments`. This should allow to construct arrays/maps/structs by the `map()`, `array()`, `struct()`, `map_from_arrays()` and `map_from_entries()`, and pass them to `sql()` as arguments.

### Why are the changes needed?
To fix the issue demonstrated by the example:
```scala
spark.sql("select element_at(?, 1)", Array(array(lit(1)))).collect()
java.lang.UnsupportedOperationException: literal unresolved_function {
[info]   function_name: "array"
[info]   arguments {
[info]     literal {
[info]       integer: 1
[info]     }
[info]   }
[info] }
[info]  not supported (yet).
```

Note: The PR #42752 supported constructors, and this PR extends the feature on Scala connect client.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By new checks:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.ClientE2ETestSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42931 from MaxGekk/fix-parameterized-sql-connect-map.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk added a commit that referenced this pull request Sep 18, 2023
…ed `sql()`

### What changes were proposed in this pull request?
In the PR, I propose to update some error formats and comments regarding `sql()` parameters - maps, arrays and struct might be used as `sql()` parameters. New behaviour has been added by #42752.

### Why are the changes needed?
To inform users about recent changes introduced by SPARK-45033.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the affected test suite:
```
$ build/sbt "core/testOnly *SparkThrowableSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42957 from MaxGekk/clean-ClientE2ETestSuite.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk added a commit that referenced this pull request Sep 20, 2023
…`sql()`

### What changes were proposed in this pull request?
In the PR, I propose to add a few more examples for the `sql()` method in PySpark API with array and map parameters.

### Why are the changes needed?
To inform users about recent changes introduced by #42752 and #42470, and check the changes work actually.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running new examples:
```
$ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session SparkSession.sql'
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42996 from MaxGekk/map-sql-parameterized-python-connect.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants