
DataFrame.to_records converts dates wrongly #1908

Closed

ukch opened this issue Sep 13, 2012 · 7 comments

ukch (Author) commented Sep 13, 2012

Possibly related to #1720:

When converting a DataFrame to a recarray using df.to_records(), date indexes are converted incorrectly.

>>> import pandas
>>> df = pandas.DataFrame([["one", "two", "three"], ["four", "five", "six"]], index=pandas.date_range("2012-01-01", "2012-01-02"))
>>> df
               0     1      2
2012-01-01   one   two  three
2012-01-02  four  five    six
>>> df.to_records()
rec.array([(datetime.datetime(1970, 1, 16, 224, 0), 'one', 'two', 'three'),
       (datetime.datetime(1970, 1, 16, 248, 0), 'four', 'five', 'six')], 
      dtype=[('index', ('<M8[ns]', {})), ('0', '|O8'), ('1', '|O8'), ('2', '|O8')])

Notice the dates have been converted to 1970, even though the original dates were in 2012.

ukch (Author) commented Sep 13, 2012

>>> pandas.__version__
'0.9.0.dev-a83e691'
>>> numpy.__version__
'1.6.2'

ukch (Author) commented Sep 13, 2012

I have found that converting the index to Python datetime values (using index.to_pydatetime()) yields the expected values.
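
For reference, a minimal sketch of that workaround (assuming the df from the original report); it builds the record array from Python datetimes so NumPy's datetime64 handling is never involved:

import numpy as np

# Convert the DatetimeIndex to an array of datetime.datetime objects,
# then assemble the record array by hand.
py_dates = df.index.to_pydatetime()
columns = [df[c].values for c in df.columns]
recs = np.core.records.fromarrays(
    [py_dates] + columns,
    names=['index'] + [str(c) for c in df.columns],
)
# recs['index'][0] is now datetime.datetime(2012, 1, 1, 0, 0), as expected.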

wesm (Member) commented Sep 13, 2012

It's a display/repr issue in NumPy 1.6, unfortunately. The actual nanosecond timestamps have not been altered.
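
One way to check this (a sketch, not from the original thread): view the datetime64 field as raw int64 nanoseconds and convert those with pandas rather than NumPy.

recs = df.to_records()
raw_ns = recs['index'].view('i8')   # nanoseconds since the epoch; untouched by the repr bug
# pandas.Timestamp(raw_ns[0]) gives Timestamp('2012-01-01 00:00:00'),
# showing the stored values are still the 2012 dates.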

ukch (Author) commented Sep 13, 2012

I am pretty sure this is not simply a display/repr issue. See the following output:

>>> recs[0][0]
1970-01-16 224:00:00
>>> recs[0][0].astype(datetime.datetime)
datetime.datetime(1970, 1, 16, 224, 0)

I noticed this problem while trying to convert a DataFrame object into a PostgreSQL table using the psycopg2 library. When the datetime-converted objects above were passed to psycopg2, the dates it generated were in 1970.
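
For reference, a rough sketch of that insertion path with the to_pydatetime() workaround applied (the connection string, table, and column names below are made up for illustration):

import psycopg2

conn = psycopg2.connect("dbname=test")   # hypothetical connection string
cur = conn.cursor()

# Pass Python datetime objects to psycopg2 instead of the mangled
# datetime64-derived values, so the 2012 dates survive the round trip.
rows = list(zip(df.index.to_pydatetime(), df[0], df[1], df[2]))
cur.executemany(
    "INSERT INTO frame_data (stamp, a, b, c) VALUES (%s, %s, %s, %s)",
    rows,
)
conn.commit()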

wesm (Member) commented Sep 13, 2012

This is all caused by the same NumPy 1.6 bug. Maybe a solution is to add an option to to_records that sidesteps NumPy and properly converts the values to datetime.datetime objects.
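
A rough sketch of what such an option could look like (the function and parameter names below, to_records_safe and convert_datetime64, are hypothetical, not existing pandas API):

import numpy as np

def to_records_safe(frame, convert_datetime64=True):
    # When requested, convert a DatetimeIndex to datetime.datetime objects
    # up front, so NumPy 1.6's broken datetime64 handling is never exercised.
    if convert_datetime64 and hasattr(frame.index, 'to_pydatetime'):
        index_values = frame.index.to_pydatetime()
    else:
        index_values = np.asarray(frame.index)
    arrays = [index_values] + [frame[c].values for c in frame.columns]
    names = ['index'] + [str(c) for c in frame.columns]
    return np.core.records.fromarrays(arrays, names=names)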

petergx commented Oct 31, 2012

+1

paulproteus commented

A new contributor who can reproduce this on their system should be able to write an implementation fairly quickly. (They would need a buggy version of NumPy, which may be very common! Otherwise, they would need to be familiar with pip or other ways of installing or changing the installed NumPy version.)

A question: should skipping NumPy be the default mode?

It should be possible to write a test case that checks that NumPy is skipped when the new argument to to_records is used, so it seems to me that the pull request should include a test.
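
A test along those lines might look roughly like this (a sketch reusing the hypothetical to_records_safe/convert_datetime64 names from above, not real pandas API):

import datetime
import pandas

def test_to_records_converts_dates_correctly():
    df = pandas.DataFrame(
        [["one", "two", "three"], ["four", "five", "six"]],
        index=pandas.date_range("2012-01-01", "2012-01-02"),
    )
    recs = to_records_safe(df, convert_datetime64=True)
    # With the NumPy path skipped, the index round-trips as the original 2012 dates.
    assert recs['index'][0] == datetime.datetime(2012, 1, 1)
    assert recs['index'][1] == datetime.datetime(2012, 1, 2)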
