-
-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix backward compatibility with pickles before v22.2.0 #1085
Conversation
f537661
to
3cdb31d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry and thanks!
Two little quibbles, should be quick to merge.
3cdb31d
to
1b3d89b
Compare
Is there any way to add a warning when someone uses the antipattern of persisting pickles across versions? I think a lot of people don't realize that this is a profoundly bad idea. If attrs is going to enable this behavior (even on a best-effort basis), it might be a good idea to at least tell people they are doing something wrong here. |
Cc @djipko for overall assessment |
Across versions - in a general way - sounds difficult as you'd need to store the version in the pickle itself and then you may still suffer from not being able to unpickle in older versions anyway - it'd be backward incompatible. The only viable way would be to store ...but then we'd be changing the state again and we'd need to know that the tuple contained a version in the first item rather than being the tuple state from a pre-v22.2.0 version of attrs.
This is true. Others of us are saddled with these issues until we have resources to be able to undo decisions made by others in the past.
This. I think a nice big admonition in the documentation would be a good idea. A documentation search yields very little - one small warning in the slotted classes glossary entry. It may also be worth stating that some cache implementations, e.g. Django's caching framework, make use of pickle serialization by default, which could lead to compatibility issues across versions. Further, I know that whether this is considered public API is a possible grey area, but given the fragility it's probably worth mentioning backward incompatible changes to pickle serialization in the release notes in future. That linked comment did also say "So if you can make it compatible" which I would argue didn't happen, hence this PR. |
Another thought: Do we want to add a |
I think that's not great because it would warn about exactly one version change, as you wrote. Kinda a mix of enough to be annoying, but not enough to solve the problem. While I agree with Paul that it's not great to pickle across versions, with attrs it shouldn't be a problem and I bet people do it and love it and are gonna yell at us if we annoy them (just as they yelled at us about the 3.6 warning). IOW I agree, but I'm afraid it's too disruptive. I think I'm fine with the code as it is. I'd just like @djipko to give it a look to ensure we're not breaking something only he knows and then it's ready to merge. |
I mean, I think people shouldn't save pickles to disk at all, but 🤷 FWIW, this issue also has blocked me from upgrading attrs internally for now, but I don't mind when this happens so much because it's a decent way to ferret out programs that are inappropriately using pickle. There are a bunch of other ways that this sort of thing can go wrong in much more subtle ways, so I appreciate when things break a bit more loudly. Also, I understand the perspective of "people do it and love it", but I suspect that in reality people are neutral about it. They think pickle is the way you serialize stuff and don't think about it until it breaks, at which point their reactive "You broke me" emotions kick in. It's kind of a thankless job to be like, "Oops you were doing it wrong sorry pickle exists" but it's almost certainly educational. That said, in some ways I think you are a kinder person than me and not as inclined to say, "This is for your own good." when breaking private APIs, and I'm not going to lobby too hard for just letting this stuff break. I figured this point where we know that someone is relying on cross-version unpickling might be a good place to put a warning (annoying enough to show people they are doing the wrong thing but not difficult to silence or fix, and with no blast radius outside of this particular subset), but it's not a hill I'm interested in dying on (or even just getting grass stains on my knees on). |
I think the change itself is fine and something I considered doing in the first place when I proposed to move to dict. If it's breaking enough people and stopping them from upgrading, even though they are doing something that's known to be "a bad idea" (tm), on the whole, that's a bad thing. But as demonstrated in the issue I raised, using tuple is inherently unsafe - now everyone's just broken which is, in some ways better than their code doing the wrong thing (even if it's an edge case that doesn't come up as often). With this change, everyone is still vulnerable to the original issue though which is not great either. Here's what I propose if @hynek agrees - let's have an issue open for say next version (v23.0.0 or whatever) to really remove tuple then? While I don't doubt this will still break some people even then - most folks would have had enough time to re-process data if there are any stragglers. I'd be happy to keep an eye on such issue and put up a fix and land it when safe! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One tiny code nit
I know, I know! 🙈 I'm the victim, I promise! 😛 I'm happy if you want to remove it again in a major version bump - I'd expect nothing less 😄 What this provides is a (few) version(s) where we can read the tuple and dict versions and write the dict version only. That allows for a seamless upgrade of cache values to the new pickle format - via a load-and-dump cycle of the cache - which is not currently possible. Yes, the original issue is still a problem in a sense, but only as far as reading existing pickles goes. With a load-and-dump upgrade this potential issue will go away for us (although our structure is not currently at risk of changing). |
Unfortunately we're using `django-redis` to cache some `attrs` instances and the change to make `__getstate__` and `__setstate__` use `dict` values instead of `tuple` values prevents reading any values from the cache when updating to `attrs>=22.2.0`. We get an `AttributeError` raised for all attributes. This happens because the tuple of values stored in the original pickle is checked for the name of the attribute and so none of the attributes are set on unpickle. To address this I've made a change so that `__setstate__` will still be able to handle values pickled as `tuple`s allowing existing pickles to be unpickled. The test just contains a pre-pickled value due to the complexities around tying to mock methods on a slotted as documented[^1]. I realise that documentation also mentions that we should "think twice" before using pickle, but sometimes those decisions predate our involvement in a project! In addition, pretty much all of Django's cache implementations are built around using pickle by default. This change will allow an easier path to upgrade by unpickling and repickling all values in the cache. Regression in 89a890a. [^1]: https://www.attrs.org/en/latest/glossary.html#term-slotted-classes
ac27afc
to
1139a3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Any chance of cutting a dot release with this? |
One thing to note before cutting an official release with this patch - if code was susceptible to the original issue #1004 i.e. unpickling will assign wrong data to an attribute, and someone runs the newly cut release, that reads and then saves data again, the saved pickled data will now be permanently corrupted (read as tuple which does the wrong thing, but then saved as dict). Releasing it officially will make it more likely people hit this. Now I am not saying not to release it - ultimately saving pickled data is very error prone anyway (hence this patch) - but I feel silently corrupting data is a slightly worse kind of edge case then just loudly breaking, even if it is less likely to happen. |
Why isn't that a great lose-lose scenario. ;) My impression of the situation is that people who will take advantage of this bw-fix are probably holding back their upgrades and had all the time in the world to fix it. There's only so much we can do... |
Yes, I am.
No, I don't. Storing pickles was a choice made by somebody else long before I came along. Although wrong, it worked "fine", then it got broken by a backward incompatible change.
I'm sorry, but I think this is misleading. It'd only be the case if you upgraded and reordered attributes. Having this fix released would allow for stored pickles to be replaced automatically and gradually with a "safer" version. (Yes, we'd still need to not reorder attributes until we were sure that all stored data had been upgraded.) I just get the impression that the argument being made is that we should just break so hard that we intentionally make everyone's lives difficult rather than stick to the promises of backward compatibility and issue a big fat warning. |
That's what I meant. You either held back and have the same pickles as before, or upgraded and fixed your pickles and therefore aren't affected anymore. |
Sorry, I didn't mean to come across so grumpy! Yeah, it's definitely lose-lose 🤷🏻♂️ |
I also think we should release this personally but wanted to make sure we are considering all the angles. It is indeed a lose lose. |
23.1.0 is on PyPI |
Summary
Unfortunately we're using
django-redis
to cache someattrs
instances and the change to make__getstate__
and__setstate__
usedict
values instead oftuple
values prevents reading any values from the cache when updating toattrs>=22.2.0
. We get anAttributeError
raised for all attributes.This happens because the tuple of values stored in the original pickle is checked for the name of the attribute and so none of the attributes are set on unpickle.
To address this I've made a change so that
__setstate__
will still be able to handle values pickled astuple
s allowing existing pickles to be unpickled.The test just contains a pre-pickled value due to the complexities around tying to mock methods on a slotted as documented1.
I realise that documentation also mentions that we should "think twice" before using pickle, but sometimes those decisions predate our involvement in a project! In addition, pretty much all of Django's cache implementations are built around using pickle by default. This change will allow an easier path to upgrade by unpickling and repickling all values in the cache.
Regression in 89a890a.
Pull Request Check List
Our CI fails if coverage is not 100%.
.pyi
).tests/typing_example.py
.attr/__init__.pyi
, they've also been re-imported inattrs/__init__.pyi
.docs/api.rst
by hand.@attr.s()
have to be added by hand too.versionadded
,versionchanged
, ordeprecated
directives.Find the appropriate next version in our
__init__.py
file..rst
files is written using semantic newlines.changelog.d
.Footnotes
https://www.attrs.org/en/latest/glossary.html#term-slotted-classes ↩