WIP: Support metadata at de/serialization time #3297
@y-p nothing really, but I think we can change pickle to make this compatible in an easy way: because all objects will have a block manager pickle (rather than being pickled separately like now), it should be pretty straightforward to add. Adding it to HDFStore is also straightforward. That said, we again come back to the problem of propagation (and I agree that no propagation is probably the right thing). What are the use cases for this? In other words, have you actually needed something like this?
I've never needed propagation, and I don't think it fits pandas' data model well.
Is there anything you care about for interop of this with HDFStore?
@y-p no, HDFStore support for meta is trivial; see storing attributes on the group node: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore (which in fact pickles the data).
Is pickle compatible across py2/py3, at least for JSONables? Strings obviously. If there's another breaking change planned for pickle, let's roll it all into one; every extra variation is a pain,
Not going to break pickle at all, well, only sort of. The code will be able to read all existing pickles (as there is a compatibility layer), but new pickles will only be readable by that version going forward. I don't think that's a problem? (Well, I haven't changed anything yet, but this would.)
Of course, I meant a new pandas pickle "version", though unfortunately that's
That's what I was saying: it's not hard to add a new 'format' and keep it reasonably backward compatible, e.g.
make things a bit easier, and make it reasonably future-proof (as far as pickle can be).
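A minimal sketch of what a versioned, backward-compatible pickle format could look like, using only the standard library. `Frame` is a hypothetical stand-in, not pandas' actual class, and the legacy branch assumes old pickles stored a bare, non-dict payload; pandas' real compatibility layer may work differently.

```python
import pickle


class Frame:
    """Hypothetical container standing in for a pandas object (not pandas code)."""

    # Bump whenever the pickled layout changes; 2 is the first layout with metadata.
    PICKLE_VERSION = 2

    def __init__(self, data, meta=None):
        self.data = data
        self.meta = {} if meta is None else meta

    def __getstate__(self):
        # New layout: explicit version tag, payload, and metadata.
        return {"version": self.PICKLE_VERSION, "data": self.data, "meta": self.meta}

    def __setstate__(self, state):
        if isinstance(state, dict) and state.get("version", 0) >= 2:
            # Versioned layout written by this code or a later revision of it.
            self.data = state["data"]
            self.meta = state.get("meta", {})
        else:
            # Assumed legacy layout: the bare payload with no metadata.
            self.data = state
            self.meta = {}
```

New code reads both layouts, while an older reader that knows nothing about the version tag cannot make sense of the new one, which is the one-way compatibility described in the thread.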
That's only frame/panel; Series does things differently, having to do with
My only concern is that users would expect
We'll be very clear about this in the documentation and RELEASE.rst. And .name does not propagate everywhere either, because that's not well defined.
@y-p that's what the great unification is about: Series becomes a subclass of NDFrame (almost done).
Note that I'm pretty sure we have to solve the "pickle problem" before Series can stop being an ndarray subclass.
This is an interesting idea: it basically allows metadata to be attached to the index labels, which are naturally carried around anyhow. It doesn't have to deal with the propagation issue at all. I guess I would call this a shadow index (kind of like the name attribute on an index).
Something to think about for the future, but I want to keep this PR strictly about
http://docs.python.org/2/library/pickle.html#pickle-python-object-serialization
Yes, we are aware of the pickle security issues, though that's more an issue to warn
Please try to add a smidgen of prose in the future, so it doesn't feel like
Since this PR is only concerned with attaching metadata to a serialized object and
or some such. Although I ran across the need for this myself in the past,
The difficulty I have
Upon unserialization (
Because of the warning in the Python documentation, this functionality of pickle is not an:
JSON is not as fast, but is there a way to avoid executing arbitrary Python code when loading a DataFrame?
I see. Well, if that's a real concern in the context you're using pandas in, there are the following
Hope that helps.
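One standard-library mitigation, described in the Python `pickle` documentation under "Restricting Globals", is to override `Unpickler.find_class` so that only an explicit allowlist of globals can be resolved. This is a sketch, not a pandas feature: a pandas pickle references pandas and NumPy globals, so those would have to be allowlisted explicitly, and the check assumes pickles written by Python 3 with the default protocol (module name `builtins`).

```python
import builtins
import io
import pickle

# Builtins considered harmless to reconstruct; everything else is refused.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global a pickle references (classes, functions).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Anything else, e.g. builtins.eval, is rejected.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but only resolves allowlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```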
I've pushed #3472. I agree that some minimal effort to raise awareness among users is
Moved to #3643.
#2485, with much reduced scope.
While collecting vbench data recently, it became painfully
obvious how useful it would be to be able to attach metadata
tags to serialized frames/series/panel.
This is a rough draft for something I'd like to see happen in 0.12.
It creates yet another version of pickle files, so much testing
needs to be done there.
There's a plan (#686, timeline unclear) for implementing a binary serialization
format for pandas, which will need to replicate this functionality if this makes it in.
@jreback, if you have something planned in this direction, I'm glad to
withdraw this PR; it's just a statement of intent to prod us into getting something
working during the 0.12 release cycle.
The design choices I'm going for right now:
if present, and are guaranteed on load, only.
Example use case: store measurements as dataframes with data,
location, etc. Then, load a mess of them back up and (generally) you may
use the metadata either as column/index labels, or just pick out a subset
based on them, sort based on them, etc. Edit: document this very clearly.
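The use case above (load a batch of pickled objects, then filter and sort on their metadata) might look like the sketch below. The `(obj, meta)` record layout and the `select`, `sort_by`, and `label_by` helpers are hypothetical; plain strings stand in for DataFrames so it runs without pandas.

```python
# Each record pairs a loaded object with its metadata dict; strings stand in
# for DataFrames so the sketch runs without pandas.
records = [
    ("run-1", {"location": "lab-a", "date": "2013-04-02"}),
    ("run-2", {"location": "lab-b", "date": "2013-04-01"}),
    ("run-3", {"location": "lab-a", "date": "2013-03-30"}),
]


def select(records, **criteria):
    """Keep records whose metadata matches every key=value pair in criteria."""
    return [(obj, meta) for obj, meta in records
            if all(meta.get(key) == value for key, value in criteria.items())]


def sort_by(records, field):
    """Sort records on a single metadata field (ISO dates sort correctly as strings)."""
    return sorted(records, key=lambda rec: rec[1][field])


def label_by(records, field):
    """Map a metadata field to each object; assumes the field is unique per record."""
    return {meta[field]: obj for obj, meta in records}
```

With real DataFrames, the labels and objects could then be passed to `pd.concat(objs, keys=labels)` so the metadata becomes an outer index level.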
us back to the pickle problem, might create issues if the binary storage format from #686 gets implemented
(difficulty in serializing arbitrary objects in metadata without using pickle), and is generally
an unknown quantity. JSON can cover a lot of mileage.
letting objects define their own way to be serialized for metadata purposes
is reinventing pickle, so no. (JSONable not yet enforced in the code).
o, meta = pd.load()
?).meta
that gets pickled and unpickled with the object.
code taught me that much.
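Since the JSONable rule is not yet enforced in the code, here is one hypothetical way to enforce it, with a loader that returns `(obj, meta)` in the style of `o, meta = pd.load()`. `dumps_with_meta` and `loads_with_meta` are illustrative names, not pandas API. Storing the metadata as a JSON string inside the pickle keeps it in a form that could be carried over unchanged to a non-pickle format later.

```python
import json
import pickle


def dumps_with_meta(obj, meta=None):
    """Hypothetical helper: pickle obj together with JSON-encoded metadata."""
    meta = {} if meta is None else meta
    # json.dumps raises TypeError for sets, arbitrary objects, etc.
    encoded = json.dumps(meta)
    # Reject metadata that would silently change shape on reload,
    # e.g. tuples coming back as lists or int keys coming back as strings.
    if json.loads(encoded) != meta:
        raise ValueError("metadata does not survive a JSON round trip")
    return pickle.dumps({"obj": obj, "meta": encoded})


def loads_with_meta(payload):
    """Hypothetical helper: return (obj, meta) from dumps_with_meta output."""
    state = pickle.loads(payload)
    return state["obj"], json.loads(state["meta"])
```

The round-trip check is stricter than calling `json.dumps` alone, which accepts tuples and integer dict keys but hands back lists and string keys on load.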