Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ordered serialization stable across compilations #1664

Merged
merged 2 commits into from
Apr 17, 2017
Merged

make ordered serialization stable across compilations #1664

merged 2 commits into from
Apr 17, 2017

Conversation

fwbrasil
Copy link
Contributor

@fwbrasil fwbrasil commented Apr 11, 2017

Problem

Ordered serialization for thrift unions is not stable across scalac executions. This issue happens because knownDirectSubclasses returns an unordered Set and the Type concrete classes don't implement stable hashCodes.

Solution

Sort the result of the knownDirectSubclasses method by the type name.

Notes

  • knownDirectSubclasses is actually stable across compilations for hierarchies with up to four elements because Set preserves the insertion order if the set has <= 4 elements. It's a side effect of the performance optimization that uses Set1, Set2, etc. for small sets. I've kept this behavior, but I'm not sure if it's necessary and would like to hear your thoughts on it.

  • This issue doesn't seem to introduce runtime issues since I think users typically compile once and then execute the job, but it seems worth fixing it to avoid confusion.

  • It's hard to test this fix since it depends on multiple scalac executions. We could try to invoke the compiler programmatically but I'm not sure it's worth it.

@fwbrasil fwbrasil changed the title [WIP] fix ordered serialization indeterminism make ordered serialization stable across compilations Apr 12, 2017
@johnynek
Copy link
Collaborator

Seems like a worthwhile fix to me.

👍

* Note: because of an implementation detail, immutable Scala sets
* preserve the insertion ordering for up to four elements (see `Set1`,
* `Set2`, etc). This method maintains the same behavior for backward
* compatibility.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we need this? what are we being compatible with? I'd rather we did the same thing regardless of set size. What if the next version of scala adds more or less special Set implementations?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it might be simpler to always sort

@isnotinvain
Copy link
Contributor

Ideally, we'd probably sort these by something like thrift field ID, but I guess it doesn't really matter, I don't think OrderedSerialization makes any promises about schema evolution right?

@fwbrasil
Copy link
Contributor Author

I don't think OrderedSerialization makes any promises about schema evolution right?

@isnotinvain I'm not sure, looking at the code I don't see why this could be a problem to the user who reported the issue since it's stable for a single compilation. Maybe there's a path somewhere that reuses serialized data from previous executions/versions? I haven't heard back from the user on this, and the problem was affecting only an integration test, so maybe we can drop the backward compatibility if we consider that OrderedSerialization shouldn't support this scenario? cc/ @johnynek

@johnynek
Copy link
Collaborator

I think the use case here is for ephemeral serialization (between tasks deployed from the same jar). I don't think we have contemplated this as a replacement for any long term storage.

@johnynek
Copy link
Collaborator

👍

@piyushnarang piyushnarang merged commit e9d5d33 into twitter:develop Apr 17, 2017
ttim pushed a commit to ttim/scalding that referenced this pull request May 16, 2017
* ordered serialization - fix `knownDirectSubclasses` indeterminism

* remove special handling for hierarchies with <= 4 elements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants