ENH: HDFStore enhancements #3531

Merged
merged 2 commits into from
May 8, 2013

Contributor

jreback commented May 6, 2013

appending to an existing table will now warn if the frequency of the appended index differs from the frequency of the stored index
(raising seemed too strict)

In [5]: df  = DataFrame(dict(A = Series(xrange(3), index=date_range('2000-1-1',periods=3,freq='H'))))

In [6]: df2 = DataFrame(dict(A = Series(xrange(3), index=date_range('2002-1-1',periods=3,freq='D'))))

In [9]: df.index
Out[9]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00, ..., 2000-01-01 02:00:00]
Length: 3, Freq: H, Timezone: None

In [10]: df2.index
Out[10]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2002-01-01 00:00:00, ..., 2002-01-03 00:00:00]
Length: 3, Freq: D, Timezone: None

In [12]: df.to_hdf('test.h5','data',mode='w',append=True)

In [13]: df2.to_hdf('test.h5','data',append=True)
pandas/io/pytables.py:1148: FrequencyWarning: 
the frequency of the existing index is [<1 Hour>] which conflicts with the new freq [<1 Day>],
resetting the frequency to None

  warnings.warn(ws, FrequencyWarning)
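
The behaviour in the session above (warn on a conflicting frequency, then reset the frequency to None) can be sketched in plain Python. This is a minimal illustration and not the code in pandas/io/pytables.py: the `FrequencyWarning` class below is a local stand-in, and `merge_freq` is a hypothetical helper name.

```python
import warnings


class FrequencyWarning(Warning):
    """Local stand-in for the warning category shown in the session above."""


def merge_freq(existing_freq, new_freq):
    """Return the frequency to keep after an append (hypothetical helper).

    If the stored and appended frequencies agree, keep that frequency.
    If they conflict, warn and reset to None instead of raising.
    """
    if existing_freq == new_freq:
        return existing_freq
    warnings.warn(
        "the frequency of the existing index is [%s] which conflicts with "
        "the new freq [%s], resetting the frequency to None"
        % (existing_freq, new_freq),
        FrequencyWarning,
    )
    return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    same = merge_freq("H", "H")   # no conflict, frequency is kept
    mixed = merge_freq("H", "D")  # conflict, one warning is emitted
```

Warning rather than raising leaves the decision with the caller, who can ignore the warning, log it, or turn it into an error.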

Contributor Author

jreback commented May 6, 2013

this is technically an API change, so does anyone have an issue with it?

or should I allow it, and maybe issue a warning?

combining indexes with different frequencies is technically ok (see below), but I think with an HDFStore it is just wrong....

@wesm ?

In [1]: date_range('20130101',periods=3,freq='D')
Out[1]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-03 00:00:00]
Length: 3, Freq: D, Timezone: None

In [2]: date_range('20130101',periods=3,freq='H')
Out[2]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 02:00:00]
Length: 3, Freq: H, Timezone: None

In [3]: date_range('20130101',periods=3,freq='D')+date_range('20130101',periods=3,freq='H')
Out[3]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-03 00:00:00]
Length: 5, Freq: None, Timezone: None
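
Why the combined index ends up with `Freq: None` can be illustrated without pandas. The sketch below uses only the standard library; `regular_range` and `infer_step` are hypothetical helper names, and the code is not how pandas infers a frequency.

```python
from datetime import datetime, timedelta


def regular_range(start, periods, step):
    """Build `periods` timestamps spaced `step` apart (hypothetical helper)."""
    return [start + i * step for i in range(periods)]


def infer_step(stamps):
    """Return the common spacing of sorted timestamps, or None if irregular."""
    gaps = {b - a for a, b in zip(stamps, stamps[1:])}
    return gaps.pop() if len(gaps) == 1 else None


start = datetime(2013, 1, 1)
daily = regular_range(start, 3, timedelta(days=1))
hourly = regular_range(start, 3, timedelta(hours=1))

# Union of the two ranges; the shared first timestamp is counted once.
combined = sorted(set(daily) | set(hourly))

print(len(combined))         # 5
print(infer_step(daily))     # 1 day, 0:00:00
print(infer_step(combined))  # None
```

The union has five entries with gaps of 1 hour, 1 hour, 22 hours and 24 hours, so no single spacing describes it.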

jreback added 2 commits May 7, 2013 10:01
…on (GH3499_)

TST: added legacy_table_0.11 table and tests
DOC: update release notes/whatsnew, added whatsnew 0.11.1 to index.rst

ENH: warn a FrequencyWarning if appending with a different frequency than existing

Contributor Author

jreback commented May 7, 2013

I changed this to a warning (see description) when attempting to append with a different frequency than what exists in the store (so you can catch the warning if you really want to do this)

appending with a different timezone will still raise, however
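
Catching the warning can be sketched with the standard `warnings` module. This is a rough illustration only: `FrequencyWarning` is defined locally as a stand-in, and `append_with_conflict` is a hypothetical placeholder for an append whose frequency conflicts with the store.

```python
import warnings


class FrequencyWarning(Warning):
    """Local stand-in for the warning category issued on a frequency conflict."""


def append_with_conflict():
    """Hypothetical stand-in for an append whose frequency conflicts."""
    warnings.warn("frequency conflict, resetting the frequency to None",
                  FrequencyWarning)
    return "appended"


# Option 1: record the warning and decide what to do afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = append_with_conflict()
conflicts = [w for w in caught if issubclass(w.category, FrequencyWarning)]

# Option 2: escalate the warning to an exception to get strict behaviour.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", FrequencyWarning)
    try:
        append_with_conflict()
    except FrequencyWarning:
        escalated = True
```

Option 2 gives back the stricter "raise" behaviour for callers who want it, without making it the default.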

Contributor

rockg commented Jul 1, 2013

This is great, but I think there is a problem with the tz storage. If you run the test with tz set in date_range, the resulting date_range retrieved from the store is shifted 5 hours.

df = DataFrame(dict(A = Series(xrange(3), index=date_range('2000-1-1',periods=3,freq='H', tz='US/Eastern')))) will fail the equal test.

Contributor Author

jreback commented Jul 1, 2013

works in master (this is not released yet btw), it's in 0.12

what version are you trying on?

In [13]: df
Out[13]: 
                           A
2000-01-01 00:00:00-05:00  0
2000-01-01 01:00:00-05:00  1
2000-01-01 02:00:00-05:00  2

In [14]: df.index
Out[14]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00, ..., 2000-01-01 02:00:00]
Length: 3, Freq: H, Timezone: US/Eastern

In [15]: df.to_hdf('tz.h5','df',mode='w',table=True)

In [16]: pd.read_hdf('tz.h5','df')
Out[16]: 
                           A
2000-01-01 05:00:00-05:00  0
2000-01-01 06:00:00-05:00  1
2000-01-01 07:00:00-05:00  2

In [17]: pd.read_hdf('tz.h5','df').index
Out[17]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 05:00:00, ..., 2000-01-01 07:00:00]
Length: 3, Freq: H, Timezone: US/Eastern

In [18]: pd.__version__
Out[18]: '0.12.0.dev-f09a03c'
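
A shift of this size is what you get when a UTC value is relabelled with a -05:00 zone instead of being converted to it. The standard-library sketch below shows that general mechanism. It is an assumption offered for illustration, not something this thread confirms as the cause, and the fixed offset is a stand-in for US/Eastern in January.

```python
from datetime import datetime, timedelta, timezone

# Fixed -05:00 offset as a stand-in for US/Eastern in January (assumption).
EASTERN = timezone(timedelta(hours=-5))

original = datetime(2000, 1, 1, 0, 0, tzinfo=EASTERN)

# Store the instant as a naive UTC wall-clock value, dropping the zone.
stored = original.astimezone(timezone.utc).replace(tzinfo=None)

# Wrong: treat the stored UTC value as if it were already local time.
relabelled = stored.replace(tzinfo=EASTERN)

# Right: mark the stored value as UTC, then convert to the target zone.
converted = stored.replace(tzinfo=timezone.utc).astimezone(EASTERN)

print(relabelled.isoformat())  # 2000-01-01T05:00:00-05:00
print(converted.isoformat())   # 2000-01-01T00:00:00-05:00
```

The relabelled value matches the shifted output in the session above, while the converted value round-trips to the original timestamp.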

Contributor Author

jreback commented Jul 1, 2013

never mind, it is NOT working...thought my test was catching it...thanks!

Contributor Author

jreback commented Jul 1, 2013

see #4098; this was tested as a column (and not an index), I'll see if I can fix this

Contributor

rockg commented Jul 1, 2013

It seems to me that table mode is what breaks, which I think is consistent with what you're saying (in the first example, without table mode, it is not clear to me whether the DatetimeIndex is stored as an index rather than as a column). Also, I'm curious to see whether 'select' will be affected by this change.

Contributor Author

jreback commented Jul 1, 2013

@rockg thanks for the report...fixed in master
