-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make ordered serialization stable across compilations #1664
Conversation
Seems like a worthwhile fix to me. 👍 |
* Note: because of an implementation detail, immutable Scala sets | ||
* preserve the insertion ordering for up to four elements (see `Set1`, | ||
* `Set2`, etc). This method maintains the same behavior for backward | ||
* compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we need this? what are we being compatible with? I'd rather we did the same thing regardless of set size. What if the next version of scala adds more or less special Set implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it might be simpler to always sort
Ideally, we'd probably sort these by something like thrift field ID, but I guess it doesn't really matter, I don't think OrderedSerialization makes any promises about schema evolution right? |
@isnotinvain I'm not sure, looking at the code I don't see why this could be a problem to the user who reported the issue since it's stable for a single compilation. Maybe there's a path somewhere that reuses serialized data from previous executions/versions? I haven't heard back from the user on this, and the problem was affecting only an integration test, so maybe we can drop the backward compatibility if we consider that OrderedSerialization shouldn't support this scenario? cc/ @johnynek |
I think the use case here is for ephemeral serialization (between tasks deployed from the same jar). I don't think we have contemplated this as a replacement for any long term storage. |
👍 |
* ordered serialization - fix `knownDirectSubclasses` indeterminism * remove special handling for hierarchies with <= 4 elements
Problem
Ordered serialization for thrift unions is not stable across scalac executions. This issue happens because
knownDirectSubclasses
returns an unorderedSet
and theType
concrete classes don't implement stablehashCode
s.Solution
Sort the result of the
knownDirectSubclasses
method by the type name.Notes
knownDirectSubclasses
is actually stable across compilations for hierarchies with up to four elements becauseSet
preserves the insertion order if the set has <= 4 elements. It's a side effect of the performance optimization that usesSet1
,Set2
, etc. for small sets. I've kept this behavior, but I'm not sure if it's necessary and would like to hear your thoughts on it.This issue doesn't seem to introduce runtime issues since I think users typically compile once and then execute the job, but it seems worth fixing it to avoid confusion.
It's hard to test this fix since it depends on multiple scalac executions. We could try to invoke the compiler programmatically but I'm not sure it's worth it.