[SPARK-16135][SQL] Remove hashCode and equals in ArrayBasedMapData #13847
LGTM
@maropu I think you also need to add these methods to Where in the Spark code base do we compare two (unsafe)
You could also just use the approach taken in
It seems
aha, yes. It'd be better to take the same approach in
Yeah you are right about I would take the same approach as
okay, I'm fixing now.
okay, done.
Test build #61041 has started for PR 13847 at commit
Test build #61038 has finished for PR 13847 at commit
Do we need to hash all values? This could be a performance issue if Story: MLlib had some performance issues caused by
Does the current implementation of
The performance of
At the very least, we'd better leave comments about that.
// This `hashCode` computation could consume much processor time for large data.
// If the computation becomes a bottleneck, we can use a light-weight logic; the first fixed bytes
// are used to compute `hashCode` (See `Vector.hashCode`).
// The same issue exists in `UnsafeMapData.hashCode`.
The same issue also exists for UnsafeRow...
okay, I'll add now.
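For readers following the thread, the "light-weight logic" mentioned in the suggested comment can be sketched roughly as below. This is only an illustration of hashing a bounded prefix; `PrefixHash`, `MaxHashBytes`, and the cutoff of 16 are hypothetical names and values, not taken from Spark.

```scala
import java.util.Arrays

// Hypothetical helper, not part of Spark: hash only a bounded prefix of the data.
object PrefixHash {
  // Assumed cutoff, chosen only for illustration.
  val MaxHashBytes: Int = 16

  // The cost is bounded by MaxHashBytes instead of growing with bytes.length.
  // Arrays with equal contents share the same prefix, so a full-content
  // `equals` stays consistent with this hash.
  def hash(bytes: Array[Byte]): Int = {
    val n = math.min(bytes.length, MaxHashBytes)
    Arrays.hashCode(Arrays.copyOf(bytes, n))
  }
}
```

The trade-off is more collisions: any two values that share their first `MaxHashBytes` bytes hash to the same bucket, so this only pays off when hashing the full data is the actual bottleneck.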
Test build #61052 has finished for PR 13847 at commit
Test build #61053 has finished for PR 13847 at commit
Test build #3126 has finished for PR 13847 at commit
I think we don't need to implement We should remove the
Thx, good direction. The current master doesn't throw any exception in an analyzer when map-typed data are passed into
yea we should improve the type check of
Test build #61093 has finished for PR 13847 at commit
I'm now checking failed tests...
Test build #61168 has finished for PR 13847 at commit
@hvanhovell please check again?
@@ -115,7 +115,7 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
       new GenericArrayData(0 until length),
       new GenericArrayData(Seq.fill(length)(true))))

-    if (!checkResult(actual, expected)) {
+    if (!actual.zip(expected).forall { case (data, answer) => checkResult(data, answer)}) {
Can we turn the map data in `actual` into a Scala map and compare it with `expected`? (Also use a Scala map in `expected`.)
okay, I'll try to replace it.
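The suggestion above might look roughly like the following. It is a self-contained sketch: `SimpleMapData` is a hypothetical stand-in for an array-backed map type without `equals`/`hashCode`, and `MapCompare.toScalaMap` is an illustrative helper, not the code that ended up in the PR.

```scala
// Hypothetical stand-in for an array-backed map; not Spark's ArrayBasedMapData.
// It deliberately defines no equals/hashCode, so instances compare by reference.
final class SimpleMapData(val keys: Array[Any], val values: Array[Any]) {
  require(keys.length == values.length, "keys and values must have the same length")
}

object MapCompare {
  // Pair keys and values positionally and build an immutable Scala Map,
  // which does provide content-based (and order-insensitive) equality.
  def toScalaMap(m: SimpleMapData): Map[Any, Any] =
    m.keys.zip(m.values).toMap
}
```

A test can then write `MapCompare.toScalaMap(actual) == expected` with `expected` given as a plain Scala `Map`, instead of relying on equality of the internal map type.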
looks pretty good, thanks for working on it!
Test build #61173 has finished for PR 13847 at commit
Test build #61174 has finished for PR 13847 at commit
Test build #61175 has finished for PR 13847 at commit
Test build #61178 has finished for PR 13847 at commit
@@ -39,4 +39,7 @@ abstract class MapData extends Serializable {
       i += 1
     }
   }

+  // `MapData` should not implement `equals` and `hashCode` because the type cannot be used as join
Shall we put this in the class header?
okay
LGTM - @cloud-fan?
Test build #61208 has finished for PR 13847 at commit
@@ -19,6 +19,10 @@ package org.apache.spark.sql.catalyst.util

 import org.apache.spark.sql.types.DataType

+/**
+ * `MapData` should not implement `equals` and `hashCode` because the type cannot be used as join
It's not your fault, but I think we need to add a comment for `MapData` itself, with this comment following it as a note. We can simply say: `An internal data representation for map type in Spark SQL`.
okay, how about this?
Test build #61237 has finished for PR 13847 at commit
retest this please
LGTM, retest this as the last test pass was 2 days ago.
Test build #61298 has finished for PR 13847 at commit
## What changes were proposed in this pull request?

This pr is to remove `hashCode` and `equals` in `ArrayBasedMapData` because the type cannot be used as join keys, grouping keys, or in equality tests.

## How was this patch tested?

Add a new test suite `MapDataSuite` for comparison tests.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13847 from maropu/UnsafeMapTest.

(cherry picked from commit 3e4e868)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

thanks, merging to master/2.0!
## What changes were proposed in this pull request?

This pr is to remove `hashCode` and `equals` in `ArrayBasedMapData` because the type cannot be used as join keys, grouping keys, or in equality tests.

## How was this patch tested?

Add a new test suite `MapDataSuite` for comparison tests.
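As a rough illustration of what the absence of `equals` and `hashCode` means in plain Scala (this says nothing about how Spark's analyzer treats map types), grouping by such objects falls back to reference identity. `OpaqueMap` and `GroupingDemo` are hypothetical names used only for this sketch.

```scala
// Hypothetical class with no equals/hashCode override; not a Spark type.
final class OpaqueMap(val entries: Seq[(String, Int)])

object GroupingDemo {
  // Groups by the objects themselves: identity-based, so two instances with
  // the same entries land in different groups.
  def groupsByInstance(xs: Seq[OpaqueMap]): Int =
    xs.groupBy(identity).size

  // Groups by an explicit content view (a Scala Map), which compares by content.
  def groupsByContent(xs: Seq[OpaqueMap]): Int =
    xs.groupBy(_.entries.toMap).size
}
```

With two distinct instances holding the same entries, `groupsByInstance` yields 2 while `groupsByContent` yields 1, which is why comparison tests have to convert to a type with content equality first.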