-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40315][SQL] Add equals() and hashCode() to ArrayBasedMapData #37771
Conversation
override def hashCode(): Int = { | ||
val seed = MurmurHash3.productSeed | ||
val keyHash = scala.util.hashing.MurmurHash3.mix(seed, keyArray.hashCode()) | ||
val valueHash = scala.util.hashing.MurmurHash3.mix(keyHash, valueArray.hashCode()) | ||
scala.util.hashing.MurmurHash3.finalizeHash(valueHash, 2) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cloud-fan , can you take a look? Do you think we need to also add an explicit hashCode()
override to ArrayData
(used by keyArray
and valueArray
)?
override def equals(obj: Any): Boolean = { | ||
if (obj == null && this == null) { | ||
return true | ||
} | ||
|
||
if (obj == null || !obj.isInstanceOf[ArrayBasedMapData]) { | ||
return false | ||
} | ||
|
||
val other = obj.asInstanceOf[ArrayBasedMapData] | ||
|
||
keyArray.equals(other.keyArray) && valueArray.equals(other.valueArray) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this part is also causing ComplexDataSuite
to fail the "inequality test for MapData" test case, where we use the triple inequality check for object equality. That should be calling equalsTo
? This PR's implementation of equals
follows the expectation of Literal
. Can you comment on the expected behaviour for equals()
, @cloud-fan ? Is it supposed to be object equality or element equality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally equals
should do element equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how we do it in other code, but there is Arrays.deepEquals for element-wise equality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ArrayBasedMapData
equals
check in Literal
s does a ==
check between the keyArray
and the valueArray
: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L377-L390
To be consistent, I'll change it to use ==
@@ -35,6 +37,29 @@ class ArrayBasedMapData(val keyArray: ArrayData, val valueArray: ArrayData) exte | |||
override def toString: String = { | |||
s"keys: $keyArray, values: $valueArray" | |||
} | |||
|
|||
override def equals(obj: Any): Boolean = { | |||
if (obj == null && this == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can this
be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think IDEA can generate equals and hashCode implementations pretty well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this
can't be null. I'll remove this check.
The IDEA defaults in this case are not super helpful because they just defer to the parent class super.equals(obj)
.
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilderSuite.scala
Outdated
Show resolved
Hide resolved
Can one of the admins verify this patch? |
I've been going through old PRs and see that the We've also chosen to honour this removal by adding the checks in other places like I also saw a newer PR (not merged) attempting to add these methods back in: #32552 Is this going to break an invariant if we now provide an equals and hashCode function only for ArrayBasedMapData? I'm going to make another variant of this PR and fix the |
I made a new PR that I think works better because it has a more narrow area of impact: #3780 I'm going to close this PR because the proposed solution in this PR isn't as good. |
There are 2 things:
In long term, we need to support equality check of map values, so that we can use map type as join/group keys. In short term, we need to correctly implement I'm +1 to this idea: |
@c27kwan did you put the wrong link of your new PR? |
Oops, here's the correct PR link: #37807 |
What changes were proposed in this pull request?
There is no explicit
hashCode()
function override for theArrayBasedMapData
LogicalPlan. As a result, there is a non-deterministic error where thehashCode()
computed forArrayBasedMapData
can be different for two equal objects (objects with equal keys and values).In this PR, we override the
hashCode
function so that it works exactly as we expect. We also have an explicitequals()
function for consistency with howLiteral
s check for equality ofArrayBasedMapData
.Why are the changes needed?
This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the
hashCode
andequals
methods instead of relying on defaults.Does this PR introduce any user-facing change?
No
How was this patch tested?
A simple unit test was added.