Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discuss] Recent breaking changes in new dill releases #589

Open
tvalentyn opened this issue Apr 12, 2023 · 10 comments
Open

[Discuss] Recent breaking changes in new dill releases #589

tvalentyn opened this issue Apr 12, 2023 · 10 comments
Labels

Comments

@tvalentyn
Copy link

In the project I work on (apache-beam), dill is used both on client-side and service side. Since pickled bytes created by newer version of dill cannot be unpickled by an older version of dill, we require a very narrow version band in our requirements. Practically speaking, we have been using dill==0.3.1.1 for a while.

Recently we received many requests from users asking us to upgrade to newer versions of dill (example: apache/beam#22893). We also had to update dill version to support Python 3.11.

It appears that updating to the newer version of dill risks breaking many of our customers. I have visibility into a significant user-base who use Beam at my company and I see that changing dill from 0.3.1.1 to 0.3.6 dill breaks many users, with various failure modes. Multiple breaking changes have also been reported on dill's issue tracker. It is understandable that sometimes bugs happen and are fixed in follow up releases, however it seems that there were some large non-trivial changes that changed what is being pickled and how. For example, we noticed that some our tests suites dill invocations started to pickle unpickleable pytest classes from main session (unnecessary for that particular test and a new behavior).

The impact of the breaking changes is not immediately obvious without diving into the details of dill implementation. The release notes (https://github.com/uqfoundation/dill/releases) don't call out breaking changes and/or mitigations for the breaking changes that users can take. It doesn't seem that new behavior may be reverted in a followup releases.

My assumptions that none of the breakages are intentional and the intent behind the changes is to pickle more and pickle better. However, if giving advantages to some group users also breaks another group of users, it would be better to enable new behaviors via opt-in or at least provide an opt-out option.

My questions to dill maintainers:

  • Can substantially different pickling behaviors be enabled/disabled by dill settings?
  • Can breaking changes and mitigation options be called out in release notes when breaking changes are unavoidable or become known after the release?
  • Any other things we can do to minimize the impact of braking changes?
@mmckerns
Copy link
Member

mmckerns commented Apr 13, 2023

I try to ensure full backward compatibility, but sometimes (1) python forces a breakage or (2) I don't catch a subtle omission someone (could be me) makes which causes a breakage. Pickling is a nasty, messy, nasty and messy business. Functions by default try to pickle some version of the global namespace, so that in itself is already a mess. As helper functions in dill that are used to pickle registered objects, the global namespace for those functions is the _dill module... so that has its own complexity. Anyway, you get the idea.

In short:

  • Can substantially different pickling behaviors be enabled/disabled by dill settings?

Yes

  • Can breaking changes and mitigation options be called out in release notes when breaking changes are unavoidable or become known after the release?

This is not done now, but it could be. Breaking changes are generally found after the release. Maybe a pre-release would be a good path forward.

  • Any other things we can do to minimize the impact of braking changes?

Recently, dill added the concept of a shim... which gives infrastructure to load old pickles across breaking changes. So, to go from 0.3.1.1 to the current version, you should be able to add shims to get you across the breaking changes. It might be good to have easy user function access to a shim_registry (like register with the pickle_registry). @anivegesana

https://github.com/uqfoundation/dill/blob/master/dill/_shims.py

@anivegesana
Copy link
Contributor

anivegesana commented Apr 13, 2023

Adding on top of that, dill is relatively popular. Because of that, people are bound to pickle objects that were not considered before and may have been pickled differently or incorrectly, sometimes causing regressions that are inevitable to fix things that would be considered bugs for most other users.

https://xkcd.com/1172/

That (but of course less extreme) is how I would classify #572. It was caused by a change in order to fix #482, but because we cannot change the behavior of how individual objects are pickled, it is impossible to create code that solves both issues. An extensible way to customize how dill pickles objects is still needed.

@tvalentyn
Copy link
Author

tvalentyn commented Apr 15, 2023

Thanks for the feedback.

. Pickling is a nasty, messy, nasty and messy business.

There is no question about it, and I sincerely appreciate your efforts on this front. My goal is to save apache-beam users from this realization as much as possible.

I try to ensure full backward compatibility

Backward compatibility in the sense of being able to unpickle old pickles on new version is not very relevant for us; the problem is when the same payload that was pickled previously is no longer able to pickle, or unpickles into something that causes unexplained behavior downstream. Are shims still relevant for this problem?

regressions that are inevitable to fix things that would be considered bugs for most other users

Well... this is the tricky part. Occasional issues are fine, but in my particular scenario, I see a large number of errors that show different failure modes on upon random inspection. The question becomes, should I dive deep to to identify understand and fix these errors, or this task is basically a non-starter because something fundamental has changed and would require weeks of work to migrate.

Can substantially different pickling behaviors be enabled/disabled by dill settings?

Yes

This would be great. FWIW, on one of the issues there was a suggestion to use recurse=True, which wasn't sufficient (and led to more failures).

@mmckerns
Copy link
Member

@tvalentyn: do dill._shims provide a solution for you?

@tvalentyn
Copy link
Author

Recently, dill added the concept of a shim... which gives infrastructure to load old pickles across breaking changes

Sounds like it might be addressing some other problem, our pickles are using consistent versions. The problem that pickling algorithm has somehow changed in fundamental ways.

@mmckerns
Copy link
Member

The intent of a shim is to allow older pickles to be loaded after the algorithm has changed. Essentially, if it identifies an older version of the algorithm (stored as a shim), it uses it to load the pickle as opposed to what is currently used. This can happen due to a change where python has broken backward compatibility or there was a pickling bug that was fixed. So this to me would apply in cases where the algorithm has changed (even for the same version of python).

@tvalentyn
Copy link
Author

tvalentyn commented Sep 24, 2024

I meant that we use the same version of dill. we don't need to unpickle payloads created on older version of dill, we create payloads using the same version of dill that deserializes them.

To give you a perspective, I work on a mono-repo, and upgrading dill from 0.3.1.1 to on 0.3.6 caused several hundreds of test failures inside a mono-repo, which didn't have any obvious pattern across them after preliminary inspection; the only motivation for us to upgrade dill was to add support for newer python versions. so instead of upgrading, we patched the necessary changes to support multiple python version on top of 0.3.1.1

@mmckerns
Copy link
Member

Sure, that fits within my understanding of what your usage is. What I'm saying is that the shim infrastructure was implemented to deal with any change to dill that breaks backward compatibility of old pickles stored to disk. So, given you use the same version of dill everywhere, when you upgrade to a newer version, and something changes either in dump or load, or dill internals that would cause stored pickles to no longer work... then you can create a shim with the old pattern in it, and it will be used to load the old pickle into the new definition. If I understand you correctly, this would be what you want.

@tvalentyn
Copy link
Author

We don't have stored pickles. What I want to do is use newer version of dill but use the "older" algorithm for pickling and unpickling. For example, IIRC there were changes related to pickling by value vs by reference, that likely caused some of the changes.

I agree with #589 (comment) which calls out two issues:

  1. Fixing some behaviors that in pickling that were considered "bugs" by from some standpoint resulted in failures in other scenarios that relied on the prior behavior.
  2. "An extensible way to customize how dill pickles objects is still needed."

@mmckerns
Copy link
Member

Ah, I see. I believe a smarter thing I could have done is to have all the reduction mechanisms in a separate file from the loading mechanisms and separate from the rest of the internals. That would probably insulate the pickling process a bit more to change of algorithm or internals. Pickling just grabs all sorts of references that one doesn't consider... so I'm very hesitant to make a structural change like that until there's a clear way to move these functions and etc without serious breakage.

To use older algorithms, they'd have to be untagled from other internal references. Then you could load an older algorithm to the pickle registry. So, I think best case, you'd need to register your custom reducer plus potentially other reducers that were referenced by the function or dependency chaining. That's essentially the process for extending dill that is used when adding a new object type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants