DOC: Serialization recommendation is deprecated #39956
Comments
No, we just need to update to the renamed pyarrow format.
I may not understand the situation fully. What is renamed? The deprecation message in the pyarrow docs linked above recommends pickle for non-pyarrow objects. One can convert a dataframe to/from a pyarrow table, but it may not be fully compatible.
@chrisroat thanks for the report! We should indeed have updated our docs after pyarrow deprecated the serialization functionality. The most appropriate alternative will depend on your exact use case, but in general I think we can indeed refer users to pickle instead.
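As a minimal sketch of the pickle alternative mentioned above (the DataFrame contents are invented for illustration), a round trip through bytes looks like this. Note that `pickle.loads` can execute arbitrary code, so it should only be used on data from a trusted source.

```python
import pickle

import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Serialize to bytes suitable for sending over the wire.
payload = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize on the receiving side (trusted input only).
restored = pickle.loads(payload)
```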
Removing the 1.3 milestone.
It's important to fix this, as our docs are simply pointing to a (soon) no-longer existing alternative. Since this is arrow-related, I will look into it in the next few days (not crucial for the RC of course).
Opened a PR for this at #41899
Does anyone have any information as to why this was deprecated?
Using Pickle5 (as suggested) doesn't seem to have the same performance as PyArrow's deprecated Serialization method. Are there ANY proper replacements for
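For reference, pickle protocol 5 (Python 3.8+) supports out-of-band buffers, which lets large array data be handed over without being copied into the pickle byte stream. The sketch below shows that mechanism on an invented DataFrame. It is not a benchmark and makes no claim about matching the performance of the deprecated pyarrow functions.

```python
import pickle

import numpy as np
import pandas as pd

# Illustrative data only: a numeric frame backed by a NumPy array.
df = pd.DataFrame(np.random.default_rng(0).random((1000, 4)), columns=list("abcd"))

# Collect large buffers out-of-band instead of embedding them in the payload.
buffers = []
payload = pickle.dumps(df, protocol=5, buffer_callback=buffers.append)

# The receiver needs both the payload and the buffers, in the same order.
restored = pickle.loads(payload, buffers=buffers)
```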
@Neltherion can you show some example code that illustrates the performance difference? That might help find out the reason / how this can be improved.
@jorisvandenbossche Here's a simplified example that compares PyArrow and Pickle when serializing/deserializing:
The outputs on my system are:
@jorisvandenbossche Did the example help?
@Neltherion I answered at apache/arrow#11239
Location of the documentation
https://pandas.pydata.org/pandas-docs/dev/user_guide/io.html#io-msgpack
Documentation problem
Since the deprecation of msgpack for on-the-wire transmission, it is recommended to use pyarrow serialization/deserialization. However, since pyarrow 2.0, this has been deprecated for arbitrary objects. A deprecation message is emitted when using the documented code snippet.
Suggested fix for documentation
Would pickle be next in line for a recommended on-the-wire format?