Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix serialization/deserialization of weakrefs #61

Merged
merged 6 commits into from
Dec 24, 2020
Merged

Conversation

tlambert03
Copy link
Owner

fixes #60 and pickling of OME objects with weakrefs by adding __getstate__ and __setstate__ methods that de-reference re-reference the weakrefs.

I added tests, and most of them are passing, but a handful of them with xml annotations are still failing with a very weird message:

_pickle.PicklingError: Can't pickle <class 'xml.etree.ElementTree.Element'>: it's not the same object as xml.etree.ElementTree.Element

furthermore, I don't get that error when I do the test manually in ipython. @jmuhlich, would love to hear if you have any thoughts on what might be happening there.

@jmuhlich
Copy link
Collaborator

Oh man I waged epic battle with that xml.etree.ElementTree.Element identity issue myself. I discovered that due to how xml.etree.ElementTree is implemented in C, every time you import classes from it in a different module of your own, you get distinct class objects. Instances of one such distinct class are not considered instances of any other instance. This means that when you create e.g. an Element object in one module, then pass it to a function in another module which calls isinstance(elt, Element), it will return false. The way I dealt with it was to ensure our code only imports xml.etree.ElementTree in one module, then all our other modules import it from there. This isn't an issue in the actual module code since all XML handling is in one module, but it does impact the tests:

# Import ElementTree from one central module to avoid problems passing Elements around,
from ome_types.schema import ElementTree # isort: skip

The PicklingError is almost certainly a manifestation of this issue. I'm not sure if there's a way to influence how pickle obtains the class object to use when unpickling, and if there isn't one we'll need to re-serialize XMLAnnotation elements as XML strings for pickling. I agree that's not great, but at least it's not the entire XML document. XMLAnnotations seem pretty rare in the wild from what I've seen anyway.

@jmuhlich
Copy link
Collaborator

Even once we fix the ElementTree identity issue, there will still be an issue with the equality testing in test_serialization. Element objects don't define a custom __eq__ method so even if you parse the same XML string twice, the resulting Element objects will not compare equal. We might need to implement a custom __eq__ method on XMLAnnotation elements that re-serializes the Element and compares that.

@jmuhlich
Copy link
Collaborator

I think any direct pickling of Element objects is doomed. There seems to always be some sequence of imports of our code and xml.etree that leads to the PicklingError, and in the face of user code doing its own imports there's nothing we can do. It looks we'll need to use the XML string representation here for pickling, and due to Element's lack of a custom __eq__ we'll probably want to use the XML string for equality comparison as well (in a canonicalized form). Here are the two approaches I see right away:

1. On-demand XML serialization in XMLAnnotation.__getstate__ and __eq__, and XML re-parsing in __setstate__.
This preserves our current XMLAnnotation model where .value is an Element. Performance of OME-XML parsing and generation is unchanged. Performance of code that interrogates the XML structure of XMLAnnotations also remains unchanged. Pickling/unpickling and __eq__ will incur XML serialization/parsing overhead every time.

2. Change the XMLAnnotation model so value holds an ahead-of-time canonicalized XML-serialized string.
This will make pickling and equality just trivially work without any custom __getstate__ or __eq__ implementations since the XML data is all in a plain string, and they will be fast. We can still provide a @property that builds an Element on demand at the cost of the XML re-parsing. The performance of OME-XML parsing in the presence of XMLAnnotations will be affected, since each XMLAnnotation will need to be re-serialized on the way in rather than just taking a direct reference to the already parsed Element object. OME-XML generation will be slower too because we'll need to re-parse on the way out. Accessing the Element will also be slowed down by the on-demand re-parse.

I'm not sure how much real-world performance is actually on the table here. It depends on 1) How many XMLAnnotations are present in a given workload of images and how complex their XML is, and 2) How much pickling/unpickling is done. Maybe the clarity of the code is more important... (My assumptions are that pickling/unpickling would occur only once per image in typical use cases, like sending images to worker processes, and model equality testing will be very rare. I think poking around in the XMLAnnotation Element content will be rare too, but I'm less certain of that.)

Sorry this has gotten pretty long. You guys let me know what you think, and if you can come up with any other solutions.

@tlambert03
Copy link
Owner Author

tlambert03 commented Dec 23, 2020

This is awesome, thanks for all the sleuthing @jmuhlich! I guess I favor on-demand serialization in option number 1: putting the performance penalty (however small/large) on the pickling and equality checking. Feels like that's the less common operation, and the code would be a bit more sandboxed... You agree?

I added benchmarking in #63 to help tell whether any of this really matters...

@tlambert03
Copy link
Owner Author

updated this PR with overrides for __getstate__, __setstate__ and __eq__ on XMLAnnotation. Seems to be working with all roundtrip serialization tests passing now. @jmuhlich, if you have a moment to take a quick look at the approach before merging, that'd be great.

thanks @JacksonMaxfield for raising this issue. this is definitely a useful thing to be testing.

@evamaxfield
Copy link
Contributor

I just want to chime in and say this:

The way I dealt with it was to ensure our code only imports xml.etree.ElementTree in one module, then all our other modules import it from there.

Is WILD.

@tlambert03
Copy link
Owner Author

agreed... that's A) some nonsense going on there in etree and B) incredible detective work!

@jmuhlich
Copy link
Collaborator

I agree it's wild, and it never sat right with me... You encouraged me to create a minimum reproducible example and figure out exactly what's going on! Turns out the culprit isn't xml.etree, it's xmlschema! I think this code is the issue but I'm still digging:
https://github.com/sissaschool/xmlschema/blob/master/xmlschema/etree.py#L24-L59
If we can figure out how to keep the ElementTree imports from getting messed up (maybe we just have to move our ElementTree import after our xmlschema import?) we might be able to avoid messing with XMLAnnotation pickling entirely! The custom __eq__ will be required regardless, though.

@tlambert03
Copy link
Owner Author

wow...

@jmuhlich
Copy link
Collaborator

I have convinced myself there is definitely a bug in that module namespace manipulation code in xmlschema. I submitted a two-line fix at sissaschool/xmlschema#220 . From the pace that the developers there are closing issues and cutting new releases, I think there's a good chance this could make it into a published release quickly.

@jmuhlich
Copy link
Collaborator

jmuhlich commented Dec 23, 2020

In my local working directory, I removed the new XMLAnnotation __getstate__ and __setstate__ methods and changed __eq__ to use ElementTree.tostring, and that passes on py38 using my xmlschema fix. :) I guess we can move forward with this PR as it stands and then roll back the pickle part once xmlschema releases my fix?

Now that we're down to it, I'm not sure resolving the weakrefs is the best way to handle pickling the References. I'm a little nervous about what happens when fragments of a model are pickled and not the top-level OME object. What about removing the ref_ entirely from the pickled state, and re-running the code in OME.__post_init_post_parse__ in OME.__getstate__ to rebuild the weakrefs from the ID strings? Maybe the logic we have in __post_init_post_parse__ should be moved to a method that can be called more easily from elsewhere. (Calling __post_init_post_parse__ explicitly feels wrong)

@jmuhlich
Copy link
Collaborator

That proposed change means people could still pickle a fragment of a model, but any refs will be broken on unpickle. I think this would prevent the kinds of misuse of pickled model objects that I'd expect.

@tlambert03
Copy link
Owner Author

tlambert03 commented Dec 24, 2020

awesome. thanks for the PR to xmlschema. will be cool if that fixes other things too

What about removing the ref_ entirely from the pickled state, and re-running the code in OME.__post_init_post_parse__ in OME.__getstate__ to rebuild the weakrefs from the ID strings

I'm happy to try that out before we commit to the approach in this PR. will give it a go tomorrow.

@tlambert03
Copy link
Owner Author

looks like your fix is already merged at xmlschema. I'm going to go ahead and merge this anyway so @JacksonMaxfield can at least use our master branch, and we can look at bumping our xmlschema requirement and cleaning it up in a later PR... perhaps before cutting a new release.

@tlambert03
Copy link
Owner Author

forgot to add... I made the change you suggested in #61 (comment) (removing refs on getstate and rebuilding them in OME.__setstate__). works well, and I like it better

@tlambert03 tlambert03 merged commit 8528da0 into master Dec 24, 2020
@tlambert03 tlambert03 deleted the fix-serialization branch December 24, 2020 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove or replace call to weakref / support serdes
3 participants