[SPARK-8782] [SQL] Fix code generation for ORDER BY NULL #7179

JoshRosen · 2015-07-02T06:20:55Z

This fixes code generation for queries containing ORDER BY NULL. Previously, the generated code would fail to compile.

JoshRosen · 2015-07-02T06:21:09Z

This should block on #7176 being merged.

JoshRosen · 2015-07-02T06:21:43Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala

+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>


This test is total overkill, but it's a neat example of how randomized data generation plus a list of types can be used for exploratory testing.
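The idea behind the test (generate random rows for a type, then check that two ordering implementations agree on random pairs) can be sketched without Spark. In this minimal sketch the row shape, the generators, and both comparators are hypothetical stand-ins for RowOrdering and GenerateOrdering; only the sign of each comparison is checked, since comparators may return any negative or positive value.

```scala
import scala.util.Random

object OrderingDiff {
  // Hypothetical row shape: two nullable Int columns, with None standing in for SQL NULL.
  type Row = (Option[Int], Option[Int])

  // Small value range so that ties (including NULL vs NULL) actually occur.
  def genNullableInt(rnd: Random): Option[Int] =
    if (rnd.nextInt(4) == 0) None else Some(rnd.nextInt(5) - 2)

  def genRow(rnd: Random): Row = (genNullableInt(rnd), genNullableInt(rnd))

  // Reference ordering written out by hand: NULLs first, then column by column.
  def nullsFirst(a: Option[Int], b: Option[Int]): Int = (a, b) match {
    case (None, None)       => 0
    case (None, _)          => -1
    case (_, None)          => 1
    case (Some(x), Some(y)) => Integer.compare(x, y)
  }

  def referenceCompare(r1: Row, r2: Row): Int = {
    val c = nullsFirst(r1._1, r2._1)
    if (c != 0) c else nullsFirst(r1._2, r2._2)
  }

  // Candidate under test: Scala's derived tuple ordering, which also puts None before Some.
  def candidateCompare(r1: Row, r2: Row): Int = Ordering[Row].compare(r1, r2)

  // Returns the first random pair on which the two comparators disagree in sign, if any.
  def findDisagreement[T](gen: Random => T, reference: (T, T) => Int,
                          candidate: (T, T) => Int, trials: Int, seed: Long): Option[(T, T)] = {
    val rnd = new Random(seed)
    Iterator.fill(trials)((gen(rnd), gen(rnd))).find { case (a, b) =>
      Integer.signum(reference(a, b)) != Integer.signum(candidate(a, b))
    }
  }
}
```

Returning the counterexample rather than a Boolean makes a failure immediately reproducible; a fixed seed keeps the run deterministic.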

AmplabJenkins · 2015-07-02T06:23:11Z

Merged build triggered.

AmplabJenkins · 2015-07-02T06:23:19Z

Merged build started.

JoshRosen · 2015-07-02T06:24:01Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala

+        StructField("b", dataType, nullable = true) :: Nil)
+      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
+      // Sort ordering is not defined for NaN, so skip any random inputs that contain it:
+      def isIncomparable(v: Any): Boolean = v match {


While working on this, I discovered that RowOrdering and GenerateOrdering disagree for inputs containing NaN. This isn't a bug per se, since many systems have undefined behavior when sorting on NaN. For this reason, I think that some databases prohibit NaN and Infinity from being used.
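Why NaN makes the ordering undefined: every `<` and `>` involving NaN is false. A comparator with the ternary shape quoted in the CodeGenerator diff, `(c1 > c2 ? 1 : c1 < c2 ? -1 : 0)`, therefore reports NaN as "equal" to every number. A minimal sketch in plain Scala, assuming doubles go through that primitive template, shows that this breaks transitivity of equality, which any valid comparator must satisfy.

```scala
object NaNOrderingSketch {
  // Same shape as the primitive comparison template in the diff:
  // (c1 > c2 ? 1 : c1 < c2 ? -1 : 0)
  def naiveCompare(a: Double, b: Double): Int =
    if (a > b) 1 else if (a < b) -1 else 0

  // A comparator's notion of equality must be transitive:
  // cmp(a, b) == 0 and cmp(b, c) == 0 must imply cmp(a, c) == 0.
  def equalityIsTransitive(cmp: (Double, Double) => Int,
                           a: Double, b: Double, c: Double): Boolean =
    !(cmp(a, b) == 0 && cmp(b, c) == 0) || cmp(a, c) == 0
}
```

With `naiveCompare`, 1.0 "equals" NaN and NaN "equals" 2.0, yet 1.0 < 2.0, so the check fails for `(1.0, NaN, 2.0)`; with `java.lang.Double.compare` it holds.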

Given that we might use sorting for clustering as part of a sort-based distinct operator, I wonder whether this has any bad implications for performing distinct on columns that contain NaN. Should we warn about this undefined behavior somewhere in our documentation?
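To make the distinct concern concrete: a sort-based distinct typically sorts the values and then drops each one that equals its predecessor. The minimal sketch below is not Spark's operator; it assumes a NaN-aware sort (`java.lang.Double.compare`) and lets the caller choose how adjacent values are judged equal. If that check uses IEEE `==`, NaN never equals itself and duplicate NaNs survive.

```scala
object SortDistinctSketch {
  // Total order on doubles: java.lang.Double.compare treats NaN as equal to
  // itself and greater than every other value, including +Infinity.
  private def totalCmp(a: Double, b: Double): Int = java.lang.Double.compare(a, b)

  // Sort, then keep a value only if `same` says it differs from the last kept value.
  def sortDistinct(xs: Seq[Double])(same: (Double, Double) => Boolean): Seq[Double] = {
    val sorted = xs.sortWith((a, b) => totalCmp(a, b) < 0)
    sorted.foldLeft(Vector.empty[Double]) { (acc, x) =>
      if (acc.nonEmpty && same(acc.last, x)) acc else acc :+ x
    }
  }

  // IEEE 754 equality: NaN == NaN is false.
  def ieeeEquals(a: Double, b: Double): Boolean = a == b

  // Equality derived from the same total order used for sorting.
  def orderEquals(a: Double, b: Double): Boolean = totalCmp(a, b) == 0
}
```

For the input `Seq(NaN, 1.0, NaN, 1.0)`, `ieeeEquals` keeps both NaNs while `orderEquals` keeps one; deriving equality from the sort's own comparator keeps the two consistent.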

It turns out that it's actually possible to crash the Sort operator with "Comparison method violates its general contract!" errors if NaNs are present in the column being sorted.
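"Comparison method violates its general contract!" is the `IllegalArgumentException` message raised by JDK-style TimSort when it detects an inconsistent comparator; whether it fires depends on input size and order. One way to avoid it, sketched here as an illustration and not necessarily what Spark adopted, is to sort with a comparator that is a genuine total order. `java.lang.Double.compareTo` (equivalently `Double.compare`) treats NaN as equal to itself and larger than any other value.

```scala
import java.util.{Arrays, Comparator}

object NaNSafeSortSketch {
  // Total order from java.lang.Double: -0.0 < 0.0, and NaN is equal to itself
  // and greater than every other value, including +Infinity.
  val totalOrder: Comparator[java.lang.Double] = (a, b) => a.compareTo(b)

  def sortBoxed(values: Array[Double]): Array[Double] = {
    val boxed: Array[java.lang.Double] = values.map(v => java.lang.Double.valueOf(v))
    // Arrays.sort on objects uses TimSort; this comparator is consistent, so the
    // contract check cannot be triggered.
    Arrays.sort(boxed, totalOrder)
    boxed.map(_.doubleValue)
  }
}
```

Sorting `Array(NaN, 3.0, +Infinity, -1.0)` this way yields `-1.0, 3.0, +Infinity, NaN`.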

SparkQA · 2015-07-02T06:24:34Z

Test build #36352 has started for PR 7179 at commit f9efbb5.

JoshRosen · 2015-07-02T06:33:54Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala

+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
+    test(s"GenerateOrdering with $dataType") {


The nesting of the loops here is slightly misleading, because we'll always report a passed test for types where we don't have a data generator. We at least test that we're able to generate code for the ordering even if we don't actually execute that code. Maybe this is an okay trade-off, but it's a concern to watch out for.
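One way to keep missing generators from showing up as passes is to report them as a distinct outcome. In ScalaTest, calling `assume(...)` or `cancel(...)` inside the test marks it as canceled rather than passed. The framework-free sketch below shows the same idea; the generator registry and outcome types are hypothetical.

```scala
import scala.util.Random

object GeneratorCoverage {
  sealed trait Outcome
  case object Passed extends Outcome
  final case class Skipped(reason: String) extends Outcome
  final case class Failed(detail: String) extends Outcome

  // Hypothetical registry: type name -> random value generator.
  // Types absent from the map have no generator.
  val generators: Map[String, Random => Any] = Map(
    "IntegerType" -> ((r: Random) => r.nextInt()),
    "StringType"  -> ((r: Random) => r.alphanumeric.take(5).mkString)
  )

  // Runs `check` only when a generator exists; otherwise reports Skipped
  // instead of silently counting the type as passed.
  def checkOrdering(typeName: String, check: (Random => Any) => Boolean): Outcome =
    generators.get(typeName) match {
      case None      => Skipped(s"no data generator for $typeName")
      case Some(gen) => if (check(gen)) Passed else Failed(s"ordering mismatch for $typeName")
    }
}
```

The codegen-only check the comment mentions could still run for skipped types; the point is that the report distinguishes "executed and agreed" from "nothing to execute".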

SparkQA · 2015-07-02T07:54:48Z

Test build #36352 has finished for PR 7179 at commit f9efbb5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-07-02T07:55:21Z

Merged build finished. Test PASSed.

rxin · 2015-07-02T22:30:25Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -185,6 +185,7 @@ class CodeGenContext {
    // use c1 - c2 may overflow
    case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
    case BinaryType => s"org.apache.spark.sql.catalyst.util.TypeUtils.compareBinary($c1, $c2)"
+    case NullType => "0"
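The fix adds a `NullType` case that emits the constant `0`: a column of that type holds only nulls, so any two of its values compare equal. A minimal standalone sketch of this kind of type-dispatching comparison generator, using hypothetical type names rather than Catalyst's classes, also illustrates the diff's comment that `c1 - c2` may overflow.

```scala
object CompareCodegenSketch {
  // Hypothetical, simplified stand-ins for Catalyst data types.
  sealed trait DType
  case object IntT extends DType
  case object DoubleT extends DType
  case object BinaryT extends DType
  case object NullT extends DType

  // Emits a Java expression that compares two values named c1 and c2.
  def genComp(dt: DType, c1: String, c2: String): String = dt match {
    // Ternary instead of c1 - c2, because subtraction can overflow.
    case IntT | DoubleT => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
    case BinaryT        => s"compareBinary($c1, $c2)" // hypothetical helper name
    // A NullType column only ever holds null, so all its values compare equal.
    case NullT          => "0"
  }

  // Why subtraction is avoided: Int.MinValue - 1 wraps around to Int.MaxValue.
  def subtractCompare(a: Int, b: Int): Int = a - b
  def ternaryCompare(a: Int, b: Int): Int = if (a > b) 1 else if (a < b) -1 else 0
}
```

`subtractCompare(Int.MinValue, 1)` returns a positive number even though the first argument is smaller, while `ternaryCompare` correctly returns `-1`.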


rxin · 2015-07-02T22:47:15Z

Can you take the testing stuff out of this PR and merge this first?

AmplabJenkins · 2015-07-02T22:58:11Z

Merged build triggered.

AmplabJenkins · 2015-07-02T22:58:19Z

Merged build started.

JoshRosen · 2015-07-02T22:58:53Z

@rxin, done.

rxin · 2015-07-02T22:59:48Z

LGTM.

AmplabJenkins · 2015-07-02T23:03:11Z

Merged build triggered.

AmplabJenkins · 2015-07-02T23:03:18Z

Merged build started.

SparkQA · 2015-07-02T23:04:04Z

Test build #36441 has started for PR 7179 at commit 6ef49a6.

AmplabJenkins · 2015-07-02T23:13:31Z

Merged build finished. Test FAILed.

SparkQA · 2015-07-02T23:20:56Z

Test build #36441 has finished for PR 7179 at commit 6ef49a6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-07-02T23:21:03Z

Merged build finished. Test FAILed.

rxin · 2015-07-02T23:26:47Z

Jenkins, retest this please.

AmplabJenkins · 2015-07-02T23:28:10Z

Merged build triggered.

AmplabJenkins · 2015-07-02T23:28:18Z

Merged build started.

SparkQA · 2015-07-02T23:31:34Z

Test build #36445 has started for PR 7179 at commit 6ef49a6.

SparkQA · 2015-07-03T01:06:04Z

Test build #36445 has finished for PR 7179 at commit 6ef49a6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-07-03T01:06:57Z

Merged build finished. Test PASSed.

rxin · 2015-07-03T01:07:41Z

Thanks - merged.

JoshRosen reviewed Jul 2, 2015

JoshRosen mentioned this pull request Jul 2, 2015

[SPARK-8777] [SQL] Add random data generator test utilities to Spark SQL #7176

Closed

JoshRosen reviewed Jul 2, 2015

JoshRosen mentioned this pull request Jul 2, 2015

[SPARK-8797] [SPARK-9146] [SPARK-9145] [SPARK-9147] Support NaN ordering and equality comparisons in Spark SQL #7194

Closed

rxin reviewed Jul 2, 2015

JoshRosen added 2 commits July 2, 2015 15:58

Add regression test for SPARK-8782 (ORDER BY NULL)

0036696

Fix ORDER BY NULL

6ef49a6

JoshRosen force-pushed the generate-order-fixes branch from 21325e2 to 6ef49a6 July 2, 2015 22:58

asfgit closed this in d983819 Jul 3, 2015

JoshRosen deleted the generate-order-fixes branch July 3, 2015 04:27
