-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData #37807
Conversation
@@ -369,6 +369,9 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression { | |||
val valueHashCode = value match { | |||
case null => 0 | |||
case binary: Array[Byte] => util.Arrays.hashCode(binary) | |||
// SPARK-40315: Literals of ArrayBasedMapData should have deterministic hashCode. | |||
case arrayBasedMapData: ArrayBasedMapData => | |||
arrayBasedMapData.keyArray.hashCode() * 37 + arrayBasedMapData.valueArray.hashCode() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simply re-adding the old hashCode function that was removed in: #13847
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! I didn't realize that we already handled ArrayBasedMapData
in Literal.equals
### What changes were proposed in this pull request? There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values). In this PR, we add a `hashCode` function so that it works exactly as we expect. ### Why are the changes needed? This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A simple unit test was added. Closes #37807 from c27kwan/SPARK-40315-lit. Authored-by: Carmen Kwan <carmen.kwan@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e85a4ff) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values). In this PR, we add a `hashCode` function so that it works exactly as we expect. ### Why are the changes needed? This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A simple unit test was added. Closes #37807 from c27kwan/SPARK-40315-lit. Authored-by: Carmen Kwan <carmen.kwan@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e85a4ff) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values). In this PR, we add a `hashCode` function so that it works exactly as we expect. ### Why are the changes needed? This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A simple unit test was added. Closes #37807 from c27kwan/SPARK-40315-lit. Authored-by: Carmen Kwan <carmen.kwan@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e85a4ff) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/3.3/3.2/3.1! |
### What changes were proposed in this pull request? There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values). In this PR, we add a `hashCode` function so that it works exactly as we expect. ### Why are the changes needed? This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A simple unit test was added. Closes apache#37807 from c27kwan/SPARK-40315-lit. Authored-by: Carmen Kwan <carmen.kwan@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e85a4ff) Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit c7cc0ae)
What changes were proposed in this pull request?
There is no explicit
hashCode()
function override forArrayBasedMapData
. As a result, there is a non-deterministic error where thehashCode()
computed forLiteral
s ofArrayBasedMapData
can be different for two equal objects (Literal
s ofArrayBasedMapData
with equal keys and values).In this PR, we add a
hashCode
function so that it works exactly as we expect.Why are the changes needed?
This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the
hashCode
method instead of relying on defaults. We can't add thehashCode
directly toArrayBasedMapData
because of SPARK-9415.Does this PR introduce any user-facing change?
No
How was this patch tested?
A simple unit test was added.