Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23917][SQL] Add array_max function #21024

Closed
wants to merge 10 commits into from

Conversation

mgaido91
Copy link
Contributor

What changes were proposed in this pull request?

The PR adds the SQL function array_max. It takes an array as argument and returns the maximum value in it.

How was this patch tested?

added UTs

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89107 has finished for PR 21024 at commit 93c5b8a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCastInputTypes


:param col: name of column or expression

>>> df = spark.createDataFrame([([2, 1, 3],),([None, 10, -1],)], ['data'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick nit ,( ->, (

}

override def dataType: DataType = child.dataType match {
case ArrayType(dt, _) => dt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check if dt is orderable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the check in the checkInputDataTypes method, thanks.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89114 has finished for PR 21024 at commit c8c1d03.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89110 has finished for PR 21024 at commit a296bc0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89120 has finished for PR 21024 at commit d017ccf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

cc @ueshin

@kiszk
Copy link
Member

kiszk commented Apr 10, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89126 has finished for PR 21024 at commit e082f00.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89145 has finished for PR 21024 at commit e082f00.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

s"""
|${childGen.code}
|boolean ${ev.isNull} = true;
|$javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use MIN value for each data type instead of default value?
If we perform this operation against (-10, -100, -1000), I think that we would get -1 as a result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, isNull is used for assigning the initial value.

override def nullable: Boolean =
child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull

override def foldable: Boolean = child.foldable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same line of code is in UnaryExpression.

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89248 has finished for PR 21024 at commit a4fd616.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89252 has finished for PR 21024 at commit 8dfd263.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

retest this please

case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {

override def nullable: Boolean =
child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always be true because the array might be empty?

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89261 has finished for PR 21024 at commit 8dfd263.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89271 has finished for PR 21024 at commit e739a0a.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Apr 13, 2018

Jenkins, retest this please.

@ueshin
Copy link
Member

ueshin commented Apr 13, 2018

LGTM pending Jenkins.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89311 has finished for PR 21024 at commit e739a0a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

* Returns the maximum value in the array.
*/
@ExpressionDescription(
usage = "_FUNC_(array) - Returns the maximum value in the array.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here. Document we ignore any null values.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89318 has finished for PR 21024 at commit e739a0a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Apr 13, 2018

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89326 has finished for PR 21024 at commit e739a0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89329 has finished for PR 21024 at commit 1cde795.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM. Thanks! Merged to master.

@asfgit asfgit closed this in 6931022 Apr 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants