[SPARK-23917][SQL] Add array_max function #21024
Test build #89107 has finished for PR 21024.
python/pyspark/sql/functions.py (Outdated)

    :param col: name of column or expression

    >>> df = spark.createDataFrame([([2, 1, 3],),([None, 10, -1],)], ['data'])
Quick nit: `,(` -> `, (`
    }

    override def dataType: DataType = child.dataType match {
      case ArrayType(dt, _) => dt
We should also check if `dt` is orderable.
I added the check in the `checkInputDataTypes` method, thanks.
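The orderability check discussed here can be pictured with a small model. This is only a sketch in plain Python, not the Scala code from the PR: `AtomicT`, `ArrayT`, `MapT`, `is_orderable`, and `check_input_data_types` are hypothetical stand-ins for Catalyst's type classes and check method, and the rule that map types are not orderable is the one assumption carried over from Spark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomicT:
    """Hypothetical stand-in for an atomic type such as int or string."""
    name: str


@dataclass(frozen=True)
class ArrayT:
    """Hypothetical stand-in for ArrayType(elementType, containsNull)."""
    element: object
    contains_null: bool = True


@dataclass(frozen=True)
class MapT:
    """Hypothetical stand-in for a map type, assumed not orderable."""
    key: object
    value: object


def is_orderable(dt) -> bool:
    # Atomic types compare directly; arrays compare if their elements do.
    if isinstance(dt, AtomicT):
        return True
    if isinstance(dt, ArrayT):
        return is_orderable(dt.element)
    # Maps, and anything this sketch does not model, are rejected.
    return False


def check_input_data_types(child_type) -> Optional[str]:
    """Return None on success, or an error message on a type mismatch."""
    if not isinstance(child_type, ArrayT):
        return "argument must be an array"
    if not is_orderable(child_type.element):
        return "array elements must be orderable"
    return None
```

With this model, an array of ints passes the check, while an array of maps is rejected before `dataType` is ever consulted.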
Test build #89114 has finished for PR 21024.
Test build #89110 has finished for PR 21024.
Test build #89120 has finished for PR 21024.
cc @ueshin
retest this please
Test build #89126 has finished for PR 21024.
Test build #89145 has finished for PR 21024.
    s"""
       |${childGen.code}
       |boolean ${ev.isNull} = true;
       |$javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
Do we need to use the `MIN` value for each data type instead of the default value?
If we perform this operation against (-10, -100, -1000), I think that we would get `-1` as a result.
nvm, `isNull` is used for assigning the initial value.
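To see why a null flag removes the need for a per-type `MIN` seed, here is a minimal sketch in plain Python. It is not the generated Java from the PR; the function name `array_max_sketch` and its variables are hypothetical, and it only mirrors the idea that the first non-null element becomes the initial value.

```python
def array_max_sketch(values):
    """Model of a max scan that is seeded by a null flag, not a MIN constant."""
    if values is None:
        return None  # null input array gives a null result

    is_null = True   # plays the role of `boolean isNull = true`
    result = None    # placeholder; never compared while is_null is True

    for v in values:
        if v is None:
            continue  # null elements are skipped
        # The first non-null element is taken unconditionally, so the
        # placeholder never leaks into the answer.
        if is_null or v > result:
            result = v
            is_null = False

    return None if is_null else result
```

For the input from the question, `array_max_sketch([-10, -100, -1000])` returns `-10`, because the placeholder is never part of a comparison.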
    override def nullable: Boolean =
      child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull

    override def foldable: Boolean = child.foldable
The same line of code is in `UnaryExpression`.
Test build #89248 has finished for PR 21024.
Test build #89252 has finished for PR 21024.
retest this please
    case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {

      override def nullable: Boolean =
        child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull
Shouldn't this always be `true`, since the array might be empty?
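The empty-array case can be checked against the nullability rule in the diff with a short sketch. This is plain Python written for illustration; `declared_nullable` and `max_or_none` are hypothetical names that model the Scala expression and the function's result, not code from the PR.

```python
def declared_nullable(child_nullable: bool, contains_null: bool) -> bool:
    """The rule from the diff: child.nullable || containsNull."""
    return child_nullable or contains_null


def max_or_none(values):
    """Model of the result: max of the non-null elements, else None."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


# A non-null array column whose element type forbids nulls is declared
# non-nullable by the rule, yet an empty array still yields None.
rule_says_nullable = declared_nullable(child_nullable=False, contains_null=False)
empty_result = max_or_none([])
```

Here `rule_says_nullable` is `False` while `empty_result` is `None`, which is the mismatch the comment points at.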
Test build #89261 has finished for PR 21024.
Test build #89271 has finished for PR 21024.
Jenkins, retest this please.
LGTM pending Jenkins.
Test build #89311 has finished for PR 21024.
retest this please
     * Returns the maximum value in the array.
     */
    @ExpressionDescription(
      usage = "_FUNC_(array) - Returns the maximum value in the array.",
The same here: document that we ignore any null values.
Test build #89318 has finished for PR 21024.
Jenkins, retest this please.
Test build #89326 has finished for PR 21024.
Test build #89329 has finished for PR 21024.
LGTM. Thanks! Merged to master.
What changes were proposed in this pull request?
The PR adds the SQL function `array_max`. It takes an array as an argument and returns the maximum value in it.

How was this patch tested?
Added UTs.