Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8782] [SQL] Fix code generation for ORDER BY NULL #7179

Closed
wants to merge 2 commits into from

Conversation

JoshRosen
Copy link
Contributor

This fixes code generation for queries containing ORDER BY NULL. Previously, the generated code would fail to compile.

@JoshRosen
Copy link
Contributor Author

This should block on #7176 being merged.

// Test GenerateOrdering for all common types. For each type, we construct random input rows that
// contain two columns of that type, then for pairs of randomly-generated rows we check that
// GenerateOrdering agrees with RowOrdering.
(DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is total overkill, but it's a neat example of how randomized data generation plus a list of types can be used for exploratory testing.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

StructField("b", dataType, nullable = true) :: Nil)
val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
// Sort ordering is not defined for NaN, so skip any random inputs that contain it:
def isIncomparable(v: Any): Boolean = v match {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on this, I discovered that RowOrdering and GenerateOrdering disagree for inputs containing NaN. This isn't a bug per-se, since many systems have undefined behavior when sorting on NaN. For this reason, I think that some databases prohibit NaN and Infinity from being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we might use sorting for clustering as part of a sort-based distinct operator, I wonder whether this has any bad implications for performing distinct on columns that contain NaN. Should we warn about this undefined behavior somewhere in our documentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that it's actually possible to crash the Sort operator with "Comparison method violates its general contract!" errors if NaNs are present in the column being sorted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36352 has started for PR 7179 at commit f9efbb5.

// contain two columns of that type, then for pairs of randomly-generated rows we check that
// GenerateOrdering agrees with RowOrdering.
(DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
test(s"GenerateOrdering with $dataType") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nesting of the loops here is slightly misleading, because we'll always report a passed test for types where we don't have a data generator. We at least test that we're able to generate code for the ordering even if we don't actually execute that code. Maybe this is an okay trade-off, but it's a concern to watch out for.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36352 has finished for PR 7179 at commit f9efbb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@@ -185,6 +185,7 @@ class CodeGenContext {
// use c1 - c2 may overflow
case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
case BinaryType => s"org.apache.spark.sql.catalyst.util.TypeUtils.compareBinary($c1, $c2)"
case NullType => "0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @davies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rxin
Copy link
Contributor

rxin commented Jul 2, 2015

Can you take the testing stuff out of this pr and merge this first?

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@JoshRosen
Copy link
Contributor Author

@rxin, done.

@rxin
Copy link
Contributor

rxin commented Jul 2, 2015

LGTM/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36441 has started for PR 7179 at commit 6ef49a6.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36441 has finished for PR 7179 at commit 6ef49a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@rxin
Copy link
Contributor

rxin commented Jul 2, 2015

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36445 has started for PR 7179 at commit 6ef49a6.

@SparkQA
Copy link

SparkQA commented Jul 3, 2015

Test build #36445 has finished for PR 7179 at commit 6ef49a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@rxin
Copy link
Contributor

rxin commented Jul 3, 2015

Thanks - merged.

@asfgit asfgit closed this in d983819 Jul 3, 2015
@JoshRosen JoshRosen deleted the generate-order-fixes branch July 3, 2015 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants