[SPARK-26021][SQL] replace minus zero with zero in Platform.putDouble/Float #23043
Conversation
This only works for attributes, not literals or intermediate results. Is there a better place to fix it?
IIUC, we discussed handling
@@ -56,17 +56,32 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
val javaType = JavaCode.javaType(dataType)
val value = CodeGenerator.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
if (nullable) {
ev.copy(code =
var codeBlock =
nit: better to use `val` instead of `var`.
} else {
ev.copy(code = code"$javaType ${ev.value} = $value;", isNull = FalseLiteral)
var codeBlock = code"$javaType ${ev.value} = $value;"
ditto
private def genReplaceMinusZeroWithZeroCode(javaType: String, value: String): Block = {
val code = s"\nif ($value == -0.0%c) $value = 0.0%c;"
var formattedCode = ""
ditto
@kiszk This spun out of https://issues.apache.org/jira/browse/SPARK-24834 and #21794; is that what you may be thinking of? I'm not aware of others.
Before rushing to a fix that replaces -0.0 with 0.0, I'd like to know how this bug happens. One possible reason might be that 0.0 and -0.0 have different binary formats. Spark uses the unsafe API to write float/double, so maybe we can investigate that first.
They do, FWIW:
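A minimal JDK-only sketch of that point: `0.0` and `-0.0` compare equal under `==`, yet their raw IEEE-754 bit patterns differ, and `Double.compare` tells them apart.

```java
public class ZeroBitPatterns {
  public static void main(String[] args) {
    System.out.println(0.0d == -0.0d);                // true: == treats the two zeros as equal
    System.out.println(Double.compare(0.0d, -0.0d));  // positive: compare orders -0.0 below 0.0
    // The raw bit patterns differ only in the sign bit.
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(0.0d)));   // 0
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(-0.0d)));  // 8000000000000000
    System.out.println(Integer.toHexString(Float.floatToRawIntBits(-0.0f)));  // 80000000
  }
}
```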
Looking at
@cloud-fan that's what I thought as well at first, but the flow doesn't go through that code. The reason for -0.0 and 0.0 being put in different buckets of "group by" is in UnsafeFixedWidthAggregationMap::getAggregationBufferFromUnsafeRow():
The hashing is done on the UnsafeRow, and by this point the whole row is hashed as a unit, so it's hard to find the double columns and their values.
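To illustrate why that matters, a small JDK-only sketch (a stand-in for Spark's hash of the key row's bytes, not the actual hash function): two key rows holding 0.0 and -0.0 have different bytes, so a byte-level hash puts them in different buckets.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class RowBytesHashSketch {
  // Stand-in for a single-column grouping-key row laid out as raw bytes.
  static byte[] keyRowBytes(double key) {
    return ByteBuffer.allocate(Double.BYTES).order(ByteOrder.LITTLE_ENDIAN).putDouble(key).array();
  }

  public static void main(String[] args) {
    byte[] positiveZeroKey = keyRowBytes(0.0d);
    byte[] negativeZeroKey = keyRowBytes(-0.0d);
    System.out.println(Arrays.equals(positiveZeroKey, negativeZeroKey));  // false: the bytes differ
    // A byte-level hash therefore separates the two keys into different buckets.
    System.out.println(Arrays.hashCode(positiveZeroKey));
    System.out.println(Arrays.hashCode(negativeZeroKey));
  }
}
```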
@adoron indeed this doesn't pass through setFloat, but all values go through - which goes through - so using such code for example solves your example:

```scala
override def update(i: Int, value: Any): Unit = {
  val ignoreMinusZeroValue = value match {
    case v: Double => if (v == 0d) 0d else value
    case v: Float => if (v == 0f) 0f else value
    case _ => value
  }
  values(i) = ignoreMinusZeroValue
}
```

not sure if that holds for other cases mentioned in this PR though.
@cloud-fan changing writeDouble/writeFloat in UnsafeWriter indeed does the trick!
@@ -120,6 +120,9 @@ public static float getFloat(Object object, long offset) {
}

public static void putFloat(Object object, long offset, float value) {
if(value == -0.0f) {
I'm fine with putting this trick here; shall we also move the IsNaN logic here as well?
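A sketch of the normalization being discussed, as a standalone helper (names assumed; not the committed Platform code): rewrite -0.0 to 0.0, and, if the IsNaN trick moved here as well, collapse every NaN bit pattern to the canonical one.

```java
// Standalone sketch; names are assumptions, not the committed Platform methods.
public final class FloatingPointNormalization {
  static float normalize(float value) {
    if (Float.isNaN(value)) return Float.NaN;  // canonical NaN, if the IsNaN logic moves here
    if (value == -0.0f) return 0.0f;           // also matches +0.0f; rewriting it to itself is harmless
    return value;
  }

  static double normalize(double value) {
    if (Double.isNaN(value)) return Double.NaN;
    if (value == -0.0d) return 0.0d;
    return value;
  }

  public static void main(String[] args) {
    System.out.println(Float.floatToRawIntBits(normalize(-0.0f)));      // 0: the sign bit is gone
    System.out.println(Double.doubleToRawLongBits(normalize(-0.0d)));   // 0
  }
}
```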
byte[] floatBytes = new byte[Float.BYTES];
Platform.putDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET, -0.0d);
Platform.putFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET, -0.0f);
Assert.assertEquals(0, Double.compare(0.0d, ByteBuffer.wrap(doubleBytes).getDouble()));
are you sure this test fails before the fix? IIUC `0.0 == -0.0` is true, but they have different binary formats
BTW thanks for adding the unit test! It's a good complement to the end-to-end test.
yeah, it fails. Indeed 0.0 == -0.0 is true, so I'm using Double.compare == 0 to test this.
def assertResult[T](result: Array[Row], zero: T)(implicit ordering: Ordering[T]): Unit = {
assert(result.length == 1)
// using compare since 0.0 == -0.0 is true
assert(ordering.compare(result(0).getAs[T](0), zero) == 0)
Instead of checking the result, I prefer the code snippet in the JIRA ticket, which makes it more obvious where the problem is.
Let's run a group-by query with both 0.0 and -0.0 in the input. Then we check the number of result rows; ideally 0.0 and -0.0 are the same, so we should only have one group (one result row).
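A hedged sketch of that end-to-end check (class name and session setup are assumptions): group by a double column containing both 0.0 and -0.0 and expect exactly one result row.

```java
import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MinusZeroGroupByCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[1]").appName("minus-zero-group-by").getOrCreate();

    Dataset<Row> grouped = spark
        .createDataset(Arrays.asList(0.0d, -0.0d, 0.0d), Encoders.DOUBLE())
        .groupBy("value")  // the single column of a Dataset<Double> is named "value"
        .count();

    // With -0.0 normalized on write, both zeros land in the same group.
    System.out.println(grouped.count());  // expected: 1

    spark.stop();
  }
}
```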
I'm not sure I follow. Below this I'm constructing Seqs with 0 and -0 like in the JIRA, and in the assertResult helper I'm checking that there's only one row, like you said.
Do you mean the check that the key is indeed 0.0 and not -0.0 is redundant?
ah sorry I misread the code.
Would it be better to update the PR title now?
Do we need to consider
@kiszk is there a use case where the preliminary RDD isn't created with UnsafeRows? If not, the data will already be corrected on reading. Anyway, looking at all the different implementations of InternalRow.setDouble, I found the following places that aren't handled (though I'm not sure there's a use case where a -0.0 can get there after the fix):
val doublesBoxed =
groupByCollect(Seq(Double.box(0.0d), Double.box(0.0d), Double.box(-0.0d)).toDF(colName))
val floats =
groupByCollect(Seq(0.0f, -0.0f, 0.0f).toDF(colName))
why do we have to turn off whole-stage codegen?
looks like leftovers from a different solution. Also there's no need to test the boxed version now that it's not in the codegen. I'll simplify the test.
ok to test
Platform.putDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET, -0.0d);
Platform.putFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET, -0.0f);
Assert.assertEquals(0, Double.compare(0.0d, ByteBuffer.wrap(doubleBytes).getDouble()));
Assert.assertEquals(0, Float.compare(0.0f, ByteBuffer.wrap(floatBytes).getFloat()));
can we use `Platform.getFloat` to read the value back, to match how we write it?
and it would be better to directly check that the binary of `0.0` and `-0.0` is the same.
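A sketch of what that suggestion could look like (not the committed test): write both zeros through Platform, compare the raw bytes directly, and read the value back with Platform.getDouble to match how it was written.

```java
import java.util.Arrays;
import org.apache.spark.unsafe.Platform;

public class MinusZeroBinaryCheck {
  public static void main(String[] args) {
    byte[] fromPositiveZero = new byte[Double.BYTES];
    byte[] fromNegativeZero = new byte[Double.BYTES];
    Platform.putDouble(fromPositiveZero, Platform.BYTE_ARRAY_OFFSET, 0.0d);
    Platform.putDouble(fromNegativeZero, Platform.BYTE_ARRAY_OFFSET, -0.0d);

    // Directly check that the binary written for 0.0 and -0.0 is the same.
    System.out.println(Arrays.equals(fromPositiveZero, fromNegativeZero));  // expected: true after the fix

    // Read the value back through Platform, matching how it was written.
    System.out.println(Platform.getDouble(fromNegativeZero, Platform.BYTE_ARRAY_OFFSET));  // 0.0
  }
}
```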
Test build #99184 has finished for PR 23043 at commit
…/Float

GROUP BY treats -0.0 and 0.0 as different values, which is unlike Hive's behavior. In addition, the current behavior with codegen is unpredictable (see the example in the JIRA ticket).

## What changes were proposed in this pull request?

In Platform.putDouble/Float(), check if the value is -0.0 and, if so, replace it with 0.0. This is used by UnsafeRow so it won't have -0.0 values.

## How was this patch tested?

Added tests

Closes #23043 from adoron/adoron-spark-26021-replace-minus-zero-with-zero.

Authored-by: Alon Doron <adoron@palantir.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0ec7b99)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/2.4!
## What changes were proposed in this pull request?

A followup of #23043

There are 4 places we need to deal with NaN and -0.0:

1. comparison expressions. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. Different NaNs should be assigned to the same group.
4. window partition keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.

The case 1 is OK. Our comparison already handles NaN and -0.0, and for struct/array/map, we will recursively compare the fields/elements.

Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different NaNs have different binary representation, and the same thing happens for -0.0 and 0.0.

To fix it, a simple solution is: normalize float/double when building unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't need to worry about it anymore.

Following this direction, this PR moves the handling of NaN and -0.0 from `Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not handle them, which reduces the perf overhead. It's also easier to add comments explaining why we do it in `UnsafeWriter`.

## How was this patch tested?

existing tests

Closes #23239 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
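A hedged sketch of the writer-side direction described above (class shape and method names are assumptions, not the committed UnsafeWriter code): normalize in the writer right before the raw put, so per-field setters such as UnsafeRow.setFloat no longer pay the cost.

```java
// Sketch only; the abstract raw-write methods stand in for the real buffer writes.
abstract class NormalizingWriterSketch {
  protected abstract void putFloatRaw(long offset, float value);
  protected abstract void putDoubleRaw(long offset, double value);

  public void writeFloat(long offset, float value) {
    if (Float.isNaN(value)) {
      value = Float.NaN;   // one canonical NaN bit pattern
    } else if (value == -0.0f) {
      value = 0.0f;        // one canonical zero
    }
    putFloatRaw(offset, value);
  }

  public void writeDouble(long offset, double value) {
    if (Double.isNaN(value)) {
      value = Double.NaN;
    } else if (value == -0.0d) {
      value = 0.0d;
    }
    putDoubleRaw(offset, value);
  }
}
```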
…feWriter backport #23239 to 2.4. Closes #23265 from cloud-fan/minor. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Given the behavior change, I think we should revert it from branch 2.4. Although I have a different fix without behavior change, that's a little risky to backport.
## What changes were proposed in this pull request?

In #23043, we introduced a behavior change: Spark users are not able to distinguish 0.0 and -0.0 anymore.

This PR proposes an alternative fix to the original bug, to retain the difference between 0.0 and -0.0 inside Spark. The idea is, we can rewrite the window partition key, join key and grouping key during the logical phase, to normalize the special floating numbers. Thus only operators that care about special floating numbers need to pay the perf overhead, and end users can distinguish -0.0.

## How was this patch tested?

existing test

Closes #23388 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?

a followup of apache#23043. Add a test to show the minor behavior change introduced by apache#23043, and add a migration guide.

## How was this patch tested?

a new test

Closes apache#23141 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…tDouble/Float

This PR reverts apache#23043 and its followup apache#23265, from branch 2.4, because it has behavior changes.

existing tests

Closes apache#23389 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
GROUP BY treats -0.0 and 0.0 as different values, which is unlike Hive's behavior.
In addition, the current behavior with codegen is unpredictable (see the example in the JIRA ticket).
What changes were proposed in this pull request?
In Platform.putDouble/Float(), check if the value is -0.0 and, if so, replace it with 0.0.
This is used by UnsafeRow, so it won't have -0.0 values.
How was this patch tested?
Added tests