API: remove the table keyword, replaced by fmt='s|t' #4645

Merged
merged 3 commits into from
Aug 26, 2013

Conversation

jreback
Contributor

@jreback jreback commented Aug 22, 2013

  • API: the fmt keyword now replaces the table keyword; allowed values are s|t.
    The same defaults as in versions prior to 0.13.0 remain, e.g. put implies 's' (Storer) format
    and append implies 't' (Table) format

closes #4584 as well

@jreback
Contributor Author

jreback commented Aug 22, 2013

cc @michaelaye, cc @Meteore, cc @bluefir

since you guys have given comments recently...any thoughts on this API change?

@bluefir

bluefir commented Aug 23, 2013

Looks good to me!

…es are ``s|t``

     the same defaults as prior < 0.13.0 remain, e.g. ``put`` implies 's' (Storer) format
     and ``append`` implies 't' (Table) format
jreback added a commit that referenced this pull request Aug 26, 2013
API: remove the table keyword, replaced by fmt='s|t'
@jreback jreback merged commit 49a21db into pandas-dev:master Aug 26, 2013
@michaelaye
Contributor

Sorry for my silence, I was in Yellowstone completely offline! ;) I haven't used this functionality. What I am worried about in general is that knowledge of PyTables is becoming more and more of a requirement for using pandas properly, at least for the HDF functionality. One could argue that data people need to deal with it in any case, but I love pandas so much because it integrates many other Python libraries seamlessly. In this case I wouldn't know what a 'storer' really is, apart from its usage shown in the docs. Maybe my worries could be nullified by a helpful intro paragraph, unless that already exists and my 2 weeks' absence made me miss it.

@jreback
Contributor Author

jreback commented Aug 26, 2013

http://pandas.pydata.org/pandas-docs/dev/io.html#storer-format

(if you can think of a better name than storer, let me know)

you don't need knowledge of the internals; just read the docs on the various formats that you can store

lmk if this is still unclear

@alvorithm

Sorry, was offline as well.

The new convention seems a bit more cryptic, but I have no other objection, and no code depending on the old one (but I am planning on using the HDF IO very soon). The name 'storer' could be substituted by something more indicative of what it is as opposed to 'table' (that also is a storer in a wide sense), though admittedly that may involve mentioning some pytables|hdf5-specific lingo.

Not yet landed home, just a first impression from a quick glance. Grain of salt.

@michaelaye
Contributor

maybe 'fixed'(format) vs 'table' ?

@michaelaye
Contributor

Reading a bit further, I definitely agree with the API change, because I feel there is no natural preference for one format over the other, which is something a boolean switch could easily lead people to presume.
I am wondering though, could it be beneficial to offer 2 more wrapper calls that imply the respective format setting? Something like to_hdfixed() (or hdstorer()) and to_hdtable() maybe? Or would this clutter the API too much?

@jreback
Contributor Author

jreback commented Aug 27, 2013

I like your suggestion - going to go with

format=fixed(f) | table(t)

I'll change all the storer refs to fixed

I don't think we should add additional to_hdf methods, that's too much clutter; and I think it makes sense to have a default of format=fixed (which is the equivalent of table=False)
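
To make the naming concrete, here is a minimal sketch of how the proposed values could be normalized. It is not the code from this PR: `normalize_format` and its `legacy_table` argument are hypothetical names. The only facts taken from the thread are that `fixed` and `table` have the short forms `f` and `t`, and that `fixed` is the equivalent of the old `table=False`.

```python
# Hypothetical helper, not part of pandas: normalizes the proposed
# format values and maps the old boolean `table` keyword onto them.
_ALIASES = {"f": "fixed", "fixed": "fixed", "t": "table", "table": "table"}


def normalize_format(format=None, legacy_table=None):
    """Return 'fixed' or 'table'.

    format       -- 'fixed'/'f' or 'table'/'t' (the proposed keyword)
    legacy_table -- the old boolean: False meant fixed, True meant table
    """
    if format is not None:
        try:
            return _ALIASES[format.lower()]
        except (KeyError, AttributeError):
            raise ValueError(
                f"invalid format {format!r}; expected fixed(f) | table(t)"
            ) from None
    if legacy_table is not None:
        return "table" if legacy_table else "fixed"
    # Default discussed in the thread: fixed, for backward compatibility.
    return "fixed"
```

An explicit `format` wins over the legacy boolean in this sketch; that precedence is an assumption made for the illustration.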

@michaelaye
Contributor

> and I think it makes sense to have a default of format=fixed (which is the equivalent of table=False)

Really? I would have thought that most users expect the table to be append-able? Something along the credo of 'functionality before speed', so let the hardcore user that requires speed find out about the non-default setting?

@jreback
Contributor Author

jreback commented Aug 27, 2013

the reason fixed is the default is just back compat (HDFStore originally started with just a fixed type)

what about an option setting eg

io.hdf_format = fixed (but you can change the default to table)

then to_hdf will respect a passed format but default to the option setting?
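
A rough sketch of that fallback order, for illustration only. The option name `io.hdf_format` is the one floated above; `set_option`, `get_option` and `resolve_hdf_format` are hypothetical stand-ins written for this example, not pandas functions.

```python
# Hypothetical sketch of the proposed fallback, not pandas code.
# A module-level dict holds the option; it is changed through a function.
_options = {"io.hdf_format": "fixed"}  # fixed stays the default for back compat


def set_option(name, value):
    """Change a known option, accepting only the two proposed formats."""
    if name not in _options:
        raise KeyError(name)
    if value not in ("fixed", "table"):
        raise ValueError(f"invalid format {value!r}")
    _options[name] = value


def get_option(name):
    return _options[name]


def resolve_hdf_format(format=None):
    """Format a to_hdf-like call would use: the passed value, else the option."""
    return format if format is not None else get_option("io.hdf_format")
```

With nothing passed, `resolve_hdf_format()` falls back to the option; after `set_option("io.hdf_format", "table")` the fallback becomes `table`, while an explicitly passed format is still respected.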

@michaelaye
Contributor

I like that!

@jtratner
Contributor

I also find the behavior of HDFStore confusing. What happens if you get table instead of storer? If the only difference is a performance hit, then maybe you could consider changing the default? A global default is nice, though maybe it should be set via a method instead, since I don't think you can have module-level properties.

@jreback
Contributor Author

jreback commented Aug 28, 2013

tables are fundamentally different than fixed

they can be appended and queried (via expression)

see put vs append

the default is for back compat

they are two different storage back ends

think hard disk vs tape (not a great analogy because fixed are much faster)

PyTables supports many different types of storage formats (because HDF5 does)

the impetus for the format parameter in general is really to support a new table type at some point

ctable - or column oriented tables

the user has to select a backend at creation time, and they each have fundamentally different access patterns and perf characteristics, and so can/should be used in diverse situations

u basically pick the format depending on the problem
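
The difference in access patterns can be pictured with a toy model. This is only an illustration in plain Python, with lists standing in for the on-disk layouts; `FixedStore` and `TableStore` are hypothetical classes, not the HDFStore implementation, and they say nothing about real performance.

```python
# Toy model only: plain Python lists stand in for the on-disk layouts.
class FixedStore:
    """Written in one shot and read back whole; no append, no query."""

    def __init__(self):
        self._rows = None

    def put(self, rows):
        self._rows = list(rows)  # replaces whatever was stored before

    def get(self):
        return list(self._rows or [])


class TableStore:
    """Supports appending rows and selecting them with a condition."""

    def __init__(self):
        self._rows = []

    def append(self, rows):
        self._rows.extend(rows)

    def select(self, where=None):
        if where is None:
            return list(self._rows)
        return [row for row in self._rows if where(row)]


fixed = FixedStore()
fixed.put([{"a": 1}, {"a": 2}])
fixed.put([{"a": 3}])  # a second put replaces, it does not append

table = TableStore()
table.append([{"a": 1}, {"a": 2}])
table.append([{"a": 3}])  # rows accumulate
large = table.select(lambda row: row["a"] > 1)
```

In this model the choice is made when the store is created, and only the table variant can grow or be filtered afterwards, which mirrors the put vs append distinction described above.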

Development

Successfully merging this pull request may close these issues.

BUG: 0.12.0 using DataFrame.to_hdf() with mode='a' does not append the data
5 participants