
[SPARK-24574][SQL] array_contains, array_position, array_remove and element_at functions deal with Column type #21581

Closed
wants to merge 8 commits

Conversation

@chongguang (Contributor) commented Jun 17, 2018

What changes were proposed in this pull request?

For the function def array_contains(column: Column, value: Any): Column, passing the value parameter as a Column yields a runtime exception.

This PR proposes pattern matching to detect whether value is a Column. If so, it uses the column's .expr; otherwise it works as before.

The same change applies to the array_position, array_remove and element_at functions.
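For illustration, a minimal sketch of this pattern-matching approach for array_contains (paraphrasing the diff discussed below, not the exact patch; withExpr is the private helper in org.apache.spark.sql.functions, and ArrayContains and Literal come from org.apache.spark.sql.catalyst.expressions):

def array_contains(column: Column, value: Any): Column = withExpr {
  value match {
    // a Column argument contributes its own expression...
    case c: Column => ArrayContains(column.expr, c.expr)
    // ...anything else keeps the previous behaviour and is wrapped in a Literal
    case _ => ArrayContains(column.expr, Literal(value))
  }
}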

How was this patch tested?

Unit tests were modified to cover this code change.

Ping @ueshin

@@ -3077,12 +3077,16 @@ object functions {
//////////////////////////////////////////////////////////////////////////////////////////////

/**
* Returns null if the array is null, true if the array contains `value`, and false otherwise.
* Returns null if the array is null, true if the array contains `value` or the content of
Member

Do we need to update this comment? I think "the content of value" is a little ambiguous.

Contributor Author

Hey, thanks for the message. Do you want me to change the comment back? I see that you have started the tests; is it too late?

Member

You can fix it now.

@maropu (Member) commented Jun 18, 2018

BTW, I found three other similar issues here:

scala> Seq((Seq(1, 2, 3), 2)).toDF("a", "b").selectExpr("array_position(a, b)").show
+--------------------+
|array_position(a, b)|
+--------------------+
|                   2|
+--------------------+

scala> Seq((Seq(1, 2, 3), 2)).toDF("a", "b").selectExpr("element_at(a, b)").show
+----------------+
|element_at(a, b)|
+----------------+
|               2|
+----------------+

scala> Seq((Seq(1, 2, 3), 2)).toDF("a", "b").selectExpr("array_remove(a, b)").show
+------------------+
|array_remove(a, b)|
+------------------+
|            [1, 3]|
+------------------+

I think this is a tiny fix, so IMHO this PR might need to address all the issues here. cc: @ueshin
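For contrast with the SQL-string examples above (which already work), a rough spark-shell sketch of the DataFrame-API calls this PR targets; the exception message is approximate:

val df = Seq((Seq(1, 2, 3), 2)).toDF("a", "b")
// Before this PR, passing a Column as the second argument fails, roughly:
//   df.select(array_position($"a", $"b"))
//   java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Column ...
// With this PR, the Column argument is resolved instead of being wrapped in a Literal:
df.select(array_position($"a", $"b")).show()
df.select(element_at($"a", $"b")).show()
df.select(array_remove($"a", $"b")).show()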

@HyukjinKwon (Member)

ok to test

@SparkQA commented Jun 18, 2018

Test build #92015 has finished for PR 21581 at commit 28aa515.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented Jun 18, 2018

retest this please

* @group collection_funcs
* @since 1.5.0
*/
def array_contains(column: Column, value: Any): Column = withExpr {
ArrayContains(column.expr, Literal(value))
Member

Other similar expressions are ArrayPosition, ElementAt, ArrayRemove.

Member

Oh, I just found that @maropu has already pointed it out in #21581 (comment).

@SparkQA commented Jun 18, 2018

Test build #92017 has finished for PR 21581 at commit 28aa515.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chongguang chongguang changed the title [SPARK-24574][SQL] array_contains function deals with Column type [SPARK-24574][SQL] array_contains, array_position, array_remove and element_at functions deal with Column type Jun 18, 2018
@SparkQA commented Jun 18, 2018

Test build #92028 has finished for PR 21581 at commit e65611e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 18, 2018

Test build #92030 has finished for PR 21581 at commit 8440d5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 18, 2018

Test build #92029 has finished for PR 21581 at commit b6e150c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chongguang (Contributor Author)

@viirya @maropu @HyukjinKwon All four functions have been modified.

@ueshin (Member) left a comment

I'm sorry for the delay.
I left some comments. Thanks!

@@ -3082,7 +3082,10 @@ object functions {
* @since 1.5.0
*/
def array_contains(column: Column, value: Any): Column = withExpr {
ArrayContains(column.expr, Literal(value))
value match {
Member

On second thoughts, we should use lit(), e.g., ArrayContains(column.expr, lit(value).expr)? WDYT? @viirya @maropu @HyukjinKwon

Member

Yup, that's what I was thinking too from my glance.

Member

Yes, I think so.

Member

+1

checkAnswer(
  df.select(array_contains(df("a"), df("c"))),
  Seq(Row(true), Row(false))
)
Member

Can you add another test to use selectExpr, e.g., df.selectExpr("array_contains(a, c)")?
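A minimal sketch of the suggested addition, assuming the same df (with array column a and value column c) used in the test above:

checkAnswer(
  df.selectExpr("array_contains(a, c)"),
  Seq(Row(true), Row(false))
)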

@@ -3082,7 +3082,10 @@ object functions {
* @since 1.5.0
*/
def array_contains(column: Column, value: Any): Column = withExpr {
ArrayContains(column.expr, Literal(value))
value match {
Member

@chongguang I mean, use just ArrayContains(column.expr, lit(value).expr) instead of value match { ... }. The lit() should handle both literals and columns well.
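For reference, a minimal sketch of the lit()-based version suggested here; the same one-liner would apply to ArrayPosition, ElementAt and ArrayRemove:

def array_contains(column: Column, value: Any): Column = withExpr {
  // lit() passes a Column through unchanged and wraps anything else in a literal Column,
  // so no explicit pattern match on `value` is needed.
  ArrayContains(column.expr, lit(value).expr)
}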

@SparkQA commented Jun 20, 2018

Test build #92126 has finished for PR 21581 at commit f8c5b43.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chongguang (Contributor Author) commented Jun 20, 2018

@ueshin Done! :) But tests fail... can you launch the tests again please?

@HyukjinKwon (Member)
You could just push the newer changes; that will retrigger the tests.

@SparkQA commented Jun 20, 2018

Test build #92136 has finished for PR 21581 at commit ddd94f7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin (Member) commented Jun 20, 2018

Jenkins, retest this please.

@ueshin (Member) commented Jun 20, 2018

LGTM pending tests.

@HyukjinKwon (Member) left a comment

LGTM too

@SparkQA commented Jun 20, 2018

Test build #92141 has finished for PR 21581 at commit ddd94f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin (Member) commented Jun 21, 2018

@chongguang I think this PR is ready to merge, so I tried, but it seems the commits in this PR aren't connected with your GitHub account. If you want the merge commit connected to your account, could you let me know the email address associated with it? Thanks!

@HyukjinKwon (Member) commented Jun 21, 2018

Oh haha, FYI that works after it's merged if @chongguang links the email to his GitHub profile too. I asked the same thing in databricks/spark-xml before :)

@ueshin (Member) commented Jun 21, 2018

But the email addresses on the commits in this PR don't seem to be valid; they look like local-machine-only addresses.

@HyukjinKwon (Member)
Oh, I see.

@chongguang (Contributor Author)
Hey @ueshin, I just updated the email address linked to my GitHub account; it is now lcg31439@gmail.com.

Thanks

@HyukjinKwon (Member)
Merged to master.
