[SPARK-24574][SQL] array_contains, array_position, array_remove and element_at functions deal with Column type #21581
@@ -3077,12 +3077,16 @@ object functions {
//////////////////////////////////////////////////////////////////////////////////////////////

/**
 * Returns null if the array is null, true if the array contains `value`, and false otherwise.
 * Returns null if the array is null, true if the array contains `value` or the content of
Do we need to update this comment? I think `content of value` is a little ambiguous.
Hey, thanks for the message. Do you want me to change the comment back? I see that you have started the test, is it too late?
You can fix it now.
BTW, I found three other similar issues there.
I think this is a tiny fix, so IMHO this PR should address all of them. cc: @ueshin
ok to test
Test build #92015 has finished for PR 21581 at commit
retest this please
 * @group collection_funcs
 * @since 1.5.0
 */
def array_contains(column: Column, value: Any): Column = withExpr {
  ArrayContains(column.expr, Literal(value))
Other similar expressions are `ArrayPosition`, `ElementAt`, `ArrayRemove`.
Oh, I just found @maropu has pointed it out in #21581 (comment).
Test build #92017 has finished for PR 21581 at commit
Test build #92028 has finished for PR 21581 at commit
Test build #92030 has finished for PR 21581 at commit
Test build #92029 has finished for PR 21581 at commit
@viirya @maropu @HyukjinKwon All the 4 functions have been modified.
I'm sorry for the delay.
I left some comments. Thanks!
@@ -3082,7 +3082,10 @@ object functions {
 * @since 1.5.0
 */
def array_contains(column: Column, value: Any): Column = withExpr {
  ArrayContains(column.expr, Literal(value))
  value match {
On second thought, we should use `lit()`, e.g., `ArrayContains(column.expr, lit(value).expr)`? WDYT? @viirya @maropu @HyukjinKwon
Yup, that's what I was thinking too from my glance.
Yes, I think so.
+1
checkAnswer(
  df.select(array_contains(df("a"), df("c"))),
  Seq(Row(true), Row(false))
)
Can you add another test that uses `selectExpr`, e.g., `df.selectExpr("array_contains(a, c)")`?
@@ -3082,7 +3082,10 @@ object functions {
 * @since 1.5.0
 */
def array_contains(column: Column, value: Any): Column = withExpr {
  ArrayContains(column.expr, Literal(value))
  value match {
@chongguang I mean, just use `ArrayContains(column.expr, lit(value).expr)` instead of `value match { ...`. `lit()` should handle both literals and columns well.
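A minimal sketch of why `lit()` makes the explicit `value match` unnecessary. The types below (`Expr`, `Literal`, `ColumnRef`, `ArrayContains`, `Column`) and the `ToyFunctions` object are hypothetical toy stand-ins, not Spark's real classes; the only behaviour assumed from Spark is the documented contract of `functions.lit`, which returns its argument unchanged if it is already a `Column` and otherwise wraps it as a literal.

```scala
// Toy stand-ins for Catalyst expressions; NOT Spark's real classes.
sealed trait Expr
final case class Literal(value: Any) extends Expr
final case class ColumnRef(name: String) extends Expr
final case class ArrayContains(array: Expr, element: Expr) extends Expr

// Toy stand-in for org.apache.spark.sql.Column.
final case class Column(expr: Expr)

object ToyFunctions {
  // Mirrors the documented contract of functions.lit: a Column is returned
  // unchanged, anything else is wrapped in a literal column.
  def lit(value: Any): Column = value match {
    case c: Column => c
    case other     => Column(Literal(other))
  }

  // Because lit() already dispatches on the argument type, the caller
  // needs no pattern match of its own.
  def arrayContains(column: Column, value: Any): Column =
    Column(ArrayContains(column.expr, lit(value).expr))
}
```

With this shape, a plain value such as `1` ends up as a `Literal`, while a `Column` argument contributes its own expression, so the same one-liner could serve `array_position`, `array_remove` and `element_at` as well.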
Test build #92126 has finished for PR 21581 at commit
@ueshin Done! :) But tests fail... can you launch the tests again please?
You could just push the newer changes. That will retrigger the tests.
Test build #92136 has finished for PR 21581 at commit
Jenkins, retest this please.
LGTM pending tests.
LGTM too
Test build #92141 has finished for PR 21581 at commit
@chongguang I think this PR is ready to merge, so I tried, but it seems the commits in this PR aren't connected to your GitHub account. If you want the merge commit connected to your account, could you let me know the email address linked to it? Thanks!
Oh haha, FYI that also works after it's merged if @chongguang links the email to his GitHub profile. I asked the same thing in databricks/spark-xml before :)
But the email addresses on the commits in this PR don't seem valid; they look like local-machine addresses.
Oh, I see.
Hey @ueshin, I just updated the email address linked to my GitHub account; it is now lcg31439@gmail.com. Thanks!
Merged to master.
What changes were proposed in this pull request?
For the function `def array_contains(column: Column, value: Any): Column`, if we pass the `value` parameter as a Column type, it will yield a runtime exception.
This PR proposes a pattern match to detect whether `value` is of type Column. If yes, it will use the `.expr` of the column; otherwise it will work as it used to.
The same applies to the `array_position`, `array_remove` and `element_at` functions.
How was this patch tested?
Unit test modified to cover this code change.
Ping @ueshin