-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-11193 - Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver #10203
Conversation
Why don't you just replace the use of SynchronizedMap in KinesisReceiver with a ConcurrentHashMap instead? |
There are two things:
Thought ? |
That problem actually applies to all types for which Kryo provides a default ser/de. Mostly because kryo will try to deserialize to the type known during registration and this syntax |
Good point. Let me upgrade KinesisReceiver to use Java ConcurrentHashMap implementation in this PR. We will see what the others think about this. |
Yes, the simpler solution is better here. Use |
All right. I'm updating the PR. Maybe it could make sense to inform people who use Kryo that the SynchronizedMap trait is "lost". WDYT ? |
313517f
to
caa4363
Compare
PR rebased and updated to use Java ConcurrentHashMap. I removed the change on the KryoSeralizer to deal with SynchronizedMap trait. |
@@ -222,7 +222,7 @@ private[kinesis] class KinesisReceiver[T]( | |||
|
|||
/** Get the latest sequence number for the given shard that can be checkpointed through KCL */ | |||
private[kinesis] def getLatestSeqNumToCheckpoint(shardId: String): Option[String] = { | |||
shardIdToLatestStoredSeqNum.get(shardId) | |||
return Option[String]{ shardIdToLatestStoredSeqNum.get(shardId) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified to just Option(shardIdToLatestStoredSeqNum.get(shardId))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point Sean. Let me improve this !
Thanks !
Aside from the two instances of that comment, looks OK |
@@ -222,7 +222,7 @@ private[kinesis] class KinesisReceiver[T]( | |||
|
|||
/** Get the latest sequence number for the given shard that can be checkpointed through KCL */ | |||
private[kinesis] def getLatestSeqNumToCheckpoint(shardId: String): Option[String] = { | |||
shardIdToLatestStoredSeqNum.get(shardId) | |||
return Option(shardIdToLatestStoredSeqNum.get(shardId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop the return
keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I forgot this cleanup. Thanks !
Looks good |
@@ -124,8 +125,7 @@ private[kinesis] class KinesisReceiver[T]( | |||
private val seqNumRangesInCurrentBlock = new mutable.ArrayBuffer[SequenceNumberRange] | |||
|
|||
/** Sequence number ranges of data added to each generated block */ | |||
private val blockIdToSeqNumRanges = new mutable.HashMap[StreamBlockId, SequenceNumberRanges] | |||
with mutable.SynchronizedMap[StreamBlockId, SequenceNumberRanges] | |||
private val blockIdToSeqNumRanges = new ConcurrentHashMap[StreamBlockId, SequenceNumberRanges] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to fix this -- but I think the style convention is to use () when the invocation has a side effect and I'd argue that constructors always do. I should have said it earlier but don't know that it's worth changing as the original call didn't either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks for this reminder Sean. You are right, I used the same syntax as in the original code. Let me know if you want I change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably always a good idea to use ConcurrentHashMap
instead of the mixed-in trait. The typesafe people themselves deprecated the trait and said it's unreliable and recommended that users use java's map instead.
Test build #2190 has finished for PR 10203 at commit
|
@jbonofre can you update the title to reflect the change? possibly description too |
Test build #2194 has started for PR 10203 at commit |
Sure. You mean the PR title or also the commit comment ? |
PR title/description. The squashed commit gets a new message and the squashed commit descriptions look OK anyway. |
All right, let me do it. |
9cad42d
to
67aa4e6
Compare
retest this please. The changes here LGTM |
Thanks @andrewor14 (again ;)). Let me retest it. |
…dMap trait in order to avoid Kryo serializer issue
67aa4e6
to
5cec007
Compare
Tests OK on my box. The Jenkins test failure doesn't look related. |
Test build #2208 has finished for PR 10203 at commit
|
Merged to master/1.6 |
…rait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #10203 from jbonofre/SPARK-11193. (cherry picked from commit 03138b6) Signed-off-by: Sean Owen <sowen@cloudera.com>
No description provided.