HDFStore loses data because it silently loses microseconds in datetime index conversion #513

Closed
bshanks opened this issue Dec 20, 2011 · 1 comment
Labels
Datetime Datetime data dtype
Comments

bshanks commented Dec 20, 2011

from datetime import datetime
from pandas import DataFrame, HDFStore

store = HDFStore('test.h5')
store.put('test', DataFrame([0, 1, 2], [datetime.utcnow(), datetime.utcnow(), datetime.utcnow()], ['col1']), table=True)
store['test']
Duplicate entries in table, taking most recently appended
Out[50]: 
                     col1
2011-12-20 19:20:19  2   

This happens because HDFStore uses .timetuple() to serialize the index, but two distinct datetimes can share the same .timetuple(), because .timetuple() discards microseconds. The three index values above differ only in their microseconds, so they collapse into one entry on write. (Slightly related discussion: http://bugs.python.org/issue2736; I suppose whatever approach was chosen for converting datetime to datetime64 might work here too?)
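To make the collision concrete, here is a minimal sketch using only the standard library. It shows two distinct datetimes producing equal `.timetuple()` values, then one possible microsecond-preserving encoding. The helpers `to_epoch_micros` and `from_epoch_micros` are hypothetical names for illustration; they are not part of pandas or HDFStore, and naive datetimes are assumed to be UTC.

```python
from datetime import datetime, timedelta

# Two timestamps in the same second, differing only in microseconds.
a = datetime(2011, 12, 20, 19, 20, 19, 100)
b = datetime(2011, 12, 20, 19, 20, 19, 200)

assert a != b                          # distinct datetimes
assert a.timetuple() == b.timetuple()  # identical once microseconds are dropped

EPOCH = datetime(1970, 1, 1)

def to_epoch_micros(dt):
    """Hypothetical helper: naive datetime -> integer microseconds since the epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)

def from_epoch_micros(us):
    """Hypothetical helper: inverse of to_epoch_micros."""
    return EPOCH + timedelta(microseconds=us)

# The integer encoding keeps the two values distinct and round-trips exactly.
assert to_epoch_micros(a) != to_epoch_micros(b)
assert from_epoch_micros(to_epoch_micros(a)) == a
```

Storing a single integer per timestamp is only one option; the point is that any serialization that keeps the microsecond field avoids the duplicate-index collapse.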

bshanks commented Dec 20, 2011

I think this is what datetime64 does when constructed from a datetime.datetime (from https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_datetime.py):

        # Construction from datetime.datetime
        assert_equal(np.datetime64('1980-01-25T14:36:22.5Z'),
                     np.datetime64(datetime.datetime(1980,1,25,
                                                14,36,22,500000)))
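As a rough check of the same idea against the failing case, the sketch below converts two datetimes that differ only in microseconds to `np.datetime64` and confirms they stay distinct. It assumes a NumPy version in which constructing `np.datetime64` from a `datetime.datetime` yields microsecond resolution; it does not show how HDFStore itself should be changed.

```python
from datetime import datetime

import numpy as np

# Same second, different microseconds (the case that collides under .timetuple()).
a = datetime(2011, 12, 20, 19, 20, 19, 100)
b = datetime(2011, 12, 20, 19, 20, 19, 200)

a64 = np.datetime64(a)
b64 = np.datetime64(b)

# The microsecond field survives the conversion, so the values remain distinct.
assert a64 != b64
assert a64.dtype == np.dtype('datetime64[us]')

# Converting back to datetime.datetime recovers the original value exactly.
assert a64.astype(datetime) == a
assert b64.astype(datetime) == b
```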
