HDFStore('file.ext') can't load HDF5 stores created by pytables #1985

Closed
vitteloil opened this issue Sep 28, 2012 · 12 comments
Labels
IO Data IO issues that don't fit into a more specific label

@vitteloil

Hello,

A store created outside of pandas can't be loaded into pandas with HDFStore().

An error occurs: AttributeError: Attribute 'pandas_type' does not exist in node: '/'

To reproduce, create a store using pytables and try to load it in pandas. It will raise this error.

In my opinion, if pandas is supposed to support HDF5, it should do so fully, not only for stores created in pandas. If nothing can be done, can you tell me what this attribute is so that I can create it when I create my store with pytables?

Thanks !

@leroygr

leroygr commented Oct 1, 2012

Hello,

I have the same issue. I created a PyTables file using pandas to store a DataFrame and a Panel. After closing the file, I reopened it using tables to store metadata in the file by creating a new node (named 'description') under the root.

If I want to read the file using HDFStore, I must specify the pandas object to retrieve (i.e., the name of the DataFrame or Panel that I stored); otherwise I get the following error:

AttributeError: Attribute 'pandas_type' does not exist in node: '/description'

I agree with enjoyaol. The HDF5 support in pandas is very nice, but we should be able to access more of the functionality of PyTables. It would also be nice to be able to add metadata to the PyTables files generated with pandas, or directly to the pandas DataFrames/Panels. For now I'm doing it manually, but that is not really workable because it leads to the attribute error above.

Regards,

Greg

@ghost

ghost commented Dec 12, 2012

@jreback, would appreciate your opinion.

@wesm
Member

wesm commented Dec 12, 2012

I'm not sure what's desired here. The point of HDFStore is to deal with the mungy details of storing pandas objects and all of their requisite metadata in PyTables so that they can survive the serialization roundtrip. In some sense, PyTables is merely an implementation detail for a fast, somewhat portable binary data format for pandas objects.

@ghost

ghost commented Dec 12, 2012

I read it as a question about HDF as a data source for pandas: like xls or csv.
obviously there are constraints on what can be accepted, but what's the state of
support for creating HDF files outside pandas as a preferred storage format, and
allowing pandas to import the data in some useful way. Does that make sense at all?

@jreback
Contributor

jreback commented Dec 12, 2012

I think this is a bit more complicated than roundtripping to xls/csv formats because those have really well defined APIs for their storage formats.

PyTables itself is well defined, but pandas' implementation of the various types is fairly complicated and type dependent (csv is pretty straightforward but of course has many variants that make it hard to parse; xls has the Excel front end).

That said, one COULD externally create a file that HDFStore can read natively. Creating a Series or DataFrame format is not that hard, actually. I could write up a short bit if that is of interest.

I don't know how many people will really want to do it this way. If you already have the data stored in PyTables, then just read it into memory, put it in a frame, and save it back as a store.

so my vote is provide a simple API (and/or doc the current frame API)
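
The "read it into memory, put it in a frame, save it back as a store" route can be sketched roughly as follows. This is only an illustration under assumptions: the file names, the `readings` table, and its two columns are hypothetical, and it relies on a reasonably recent pandas and PyTables rather than the versions discussed in this thread.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables

workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw_pytables.h5")    # hypothetical name
store_path = os.path.join(workdir, "pandas_store.h5")  # hypothetical name

# 1. A file written with plain PyTables: no pandas metadata on any node.
records = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("sensor_id", "i8"), ("value", "f8")],
)
with tables.open_file(raw_path, mode="w") as h5:
    h5.create_table("/", "readings", obj=records)

# 2. Read the table into memory with PyTables and wrap it in a DataFrame.
with tables.open_file(raw_path, mode="r") as h5:
    data = h5.root.readings.read()  # NumPy structured array
df = pd.DataFrame.from_records(data)

# 3. Save it back through HDFStore so pandas writes its own metadata.
with pd.HDFStore(store_path, mode="w") as store:
    store["readings"] = df

# The new file is a regular pandas store and loads without special handling.
with pd.HDFStore(store_path, mode="r") as store:
    roundtripped = store["readings"]
```

The cost of this approach is a second copy of the data on disk, which is probably acceptable for small tables and less so for large ones.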


@wesm
Member

wesm commented Dec 12, 2012

I agree: a simple API for reading Table objects (which give you column names) and Array objects (which don't) would be a good idea.

@ghost

ghost commented Dec 12, 2012

Also, a clairvoyant request to serialize metadata to HDF5.

@jreback
Contributor

jreback commented Dec 12, 2012

done c0003cb

So I save the attribute 'meta' if it's there; on the roundtrip it is only populated if it's not None.

Of course, since it's not propagated by anything right now, you have to add it right before you save in HDFStore.

# ok: meta is set right before saving
df.meta = 'blahh'
store['a'] = df

# not ok: the result of a slicing operation does not carry meta over
df = df['a slicing operation']
store['a'] = df
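
For readers on a later pandas, where the `meta` attribute above may not be available, a different technique is to attach metadata to the stored node through the storer's `attrs` rather than to the DataFrame itself. The sketch below assumes a reasonably recent pandas with PyTables installed; the attribute name `user_meta` and the file name are hypothetical choices made for this illustration.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "with_meta.h5")  # hypothetical name
df = pd.DataFrame({"a": [1, 2, 3]})

with pd.HDFStore(path, mode="w") as store:
    store.put("a", df)
    # `user_meta` is a made-up attribute name; it lives on the HDF5 node,
    # not on the DataFrame, so slicing the frame cannot lose it.
    store.get_storer("a").attrs.user_meta = "blahh"

with pd.HDFStore(path, mode="r") as store:
    restored = store["a"]
    user_meta = store.get_storer("a").attrs.user_meta
```

Because the metadata is tied to the key in the store, it has to be written again whenever the object is saved under a new key.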

@jreback
Contributor

jreback commented Dec 12, 2012

@enjoyaol or @leroygr can you provide a sample of the pytables data structure you wish to store?

@jreback
Contributor

jreback commented Jan 6, 2013

@enjoyaol or @leroygr any comments on this?

@leroygr

leroygr commented Jan 8, 2013

Hi all,

Sorry for my late answer. I used @jreback's method and it fits my needs! Thanks for this.

Greg

@jreback
Contributor

jreback commented Jan 15, 2013

This is fixed in GH #2675.
HDFStore will now read a PyTables-like table, i.e. one that was not written by pandas (works from R as well).
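
A minimal sketch of what reading such a table directly might look like, assuming a reasonably recent pandas and PyTables. The file name, the `readings` table, and its columns are hypothetical, and only numeric columns are used to keep the example simple.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables

path = os.path.join(tempfile.mkdtemp(), "native_pytables.h5")  # hypothetical name

# Write a table with plain PyTables, without any pandas_type attribute.
records = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("sensor_id", "i8"), ("value", "f8")],
)
with tables.open_file(path, mode="w") as h5:
    h5.create_table("/", "readings", obj=records)

# Read it straight into a DataFrame; the table's column names
# become the DataFrame's columns.
df = pd.read_hdf(path, "readings")
```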
