BUG: Series ops with a rhs of a Timestamp raising exception (#2898) #2899

Merged (1 commit) on Feb 23, 2013


@jreback jreback commented Feb 19, 2013

  • Series ops with a Timestamp on the rhs were raising an exception (BUG: Series subtraction with Timestamp on rhs #2898)
  • fixed an issue in _index.convert_scalar where the rhs was a non-scalar and the lhs had dtype M8[ns]; it was trying
    to convert the rhs to a scalar (which didn't need conversion)
  • added some utilities in tslib.pyx, inference.pyx to detect and convert timedelta/timedelta64
  • added timedelta64 to the checknull routines, representing null timedelta64 values as NaT
  • added setitem support to set NaT via np.nan (analogously to datetime64)
  • added ability for Series to detect and set a timedelta64[ns] dtype (if all passed objects are timedelta-like)
  • added correct printing of timedelta64[ns] (looks like py3.3 changed the default for str(x), so rolled a new one)
In [149]: from datetime import datetime, timedelta

In [150]: s  = Series(date_range('2012-1-1', periods=3, freq='D'))

In [151]: td = Series([ timedelta(days=i) for i in range(3) ])

In [152]: df = DataFrame(dict(A = s, B = td))

In [153]: df
Out[153]: 
                    A                B
0 2012-01-01 00:00:00          0:00:00
1 2012-01-02 00:00:00   1 day, 0:00:00
2 2012-01-03 00:00:00  2 days, 0:00:00

In [154]: df['C'] = df['A'] + df['B']

In [155]: df
Out[155]: 
                    A                B                   C
0 2012-01-01 00:00:00          0:00:00 2012-01-01 00:00:00
1 2012-01-02 00:00:00   1 day, 0:00:00 2012-01-03 00:00:00
2 2012-01-03 00:00:00  2 days, 0:00:00 2012-01-05 00:00:00

In [156]: df.dtypes
Out[156]: 
A     datetime64[ns]
B    timedelta64[ns]
C     datetime64[ns]
Dtype: object

In [60]: s - s.max()
Out[60]: 
0    -2 days, 0:00:00
1     -1 day, 0:00:00
2             0:00:00
Dtype: timedelta64[ns]

In [61]: s - datetime(2011,1,1,3,5)
Out[61]: 
0    364 days, 20:55:00
1    365 days, 20:55:00
2    366 days, 20:55:00
Dtype: timedelta64[ns]

In [62]: s + timedelta(minutes=5)
Out[62]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
Dtype: datetime64[ns]

In [160]: y = s - s.shift()

In [161]: y
Out[161]: 
0              NaT
1   1 day, 0:00:00
2   1 day, 0:00:00
Dtype: timedelta64[ns]

These can be set to NaT using np.nan, analogously to datetimes

In [162]: y[1] = np.nan

In [163]: y
Out[163]: 
0              NaT
1              NaT
2   1 day, 0:00:00
Dtype: timedelta64[ns]

# works on lhs too
In [64]: s.max() - s
Out[64]: 
0    2 days, 0:00:00
1     1 day, 0:00:00
2            0:00:00
Dtype: timedelta64[ns]

In [65]: datetime(2011,1,1,3,5) - s
Out[65]: 
0    -365 days, 3:05:00
1    -366 days, 3:05:00
2    -367 days, 3:05:00
Dtype: timedelta64[ns]

In [66]: timedelta(minutes=5) + s
Out[66]: 
0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
Dtype: datetime64[ns]
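
For readers on a current pandas release, the session above can be approximated with the sketch below. It assumes pandas and NumPy are imported under their usual aliases (the original session relied on names already being in the namespace), uses `.iloc` for the positional assignment, and checks only the dtype kind, since the exact resolution unit may differ between pandas versions.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2012-01-01", periods=3, freq="D"))
td = pd.Series([timedelta(days=i) for i in range(3)])

# datetime + timedelta gives datetimes; datetime - datetime gives timedeltas
shifted = s + td
since_2011 = s - datetime(2011, 1, 1, 3, 5)

# the first row has no predecessor, so its difference is NaT
diffs = s - s.shift()

# np.nan is stored as NaT and the timedelta dtype is kept
diffs.iloc[1] = np.nan

# a scalar on the left-hand side works as well
back = s.max() - s

print(diffs)
print(back)
```

The dtype kind is `"m"` for timedelta columns and `"M"` for datetime columns, which is a convenient version-independent check.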

jreback commented Feb 19, 2013

anybody have an issue with the API change via unique, i.e. that it now returns a Series rather than an ndarray?

wesm commented Feb 19, 2013

I do. I would rather users use drop_duplicates than unique to achieve this
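
A minimal sketch of the distinction being drawn here, using an integer Series so the result types are unambiguous (the sample values are made up for illustration): `unique` hands back a bare array of values, while `drop_duplicates` keeps the Series and its index.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 2, 3])

# unique() returns the distinct values without an index
as_array = values.unique()

# drop_duplicates() returns a Series and keeps the labels of the first occurrences
as_series = values.drop_duplicates()

print(type(as_array).__name__)
print(as_series)
```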

jreback commented Feb 19, 2013

the problem is I was using this with a datetime64[ns] column;
since it is not boxed as a Timestamp, I think it's a bit unexpected (though correct)

jreback commented Feb 19, 2013

all fixed up, restored unique behavior

fyi: py3 does funny things with np.timedelta64; it auto-converts them to int when you astype('O')
(which no other Python version does!)
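
A related conversion can be seen in current NumPy, sketched below. Whether this is the same mechanism as the 2013 py3 behaviour described above is an assumption; the sketch only shows that nanosecond-resolution values become plain integers when cast to object, while second-resolution values become `datetime.timedelta`.

```python
from datetime import timedelta

import numpy as np

ns_values = np.array([1, 2], dtype="timedelta64[ns]")
s_values = np.array([1, 2], dtype="timedelta64[s]")

# datetime.timedelta cannot hold nanoseconds, so NumPy falls back to integers
ns_as_object = ns_values.astype(object)

# second resolution fits in datetime.timedelta, so the values are boxed
s_as_object = s_values.astype(object)

print(type(ns_as_object[0]).__name__, type(s_as_object[0]).__name__)
```

Code that casts `timedelta64[ns]` data to object therefore needs its own conversion step if it wants timedelta objects back.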

jreback commented Feb 22, 2013

@y-p when you have a chance, can you take a look at test_timedelta in tests/test_format.py. I had to put in my own formatter to handle NaT and py3.3 change in the way printing works for timedelta64. It works and returns a string. Not sure if I need to deal with unicode in any way? thanks
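
To make the question concrete, here is a rough sketch of what such a formatter could look like. The function name `format_timedelta64` and its output layout are hypothetical and are not the `repr_timedelta64` added in this PR; the sketch only illustrates handling NaT explicitly and building the string by hand instead of relying on `str(x)`.

```python
import numpy as np


def format_timedelta64(value):
    """Hypothetical helper: render a np.timedelta64 as text, or 'NaT' when null."""
    if np.isnat(value):
        return "NaT"

    # normalise to an integer count of nanoseconds
    total_ns = int(value.astype("timedelta64[ns]").astype(np.int64))
    sign = "-" if total_ns < 0 else ""

    total_seconds, ns = divmod(abs(total_ns), 1_000_000_000)
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)

    text = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    if ns:
        text += f".{ns:09d}"
    if days:
        text = f"{days} days, {text}"
    return sign + text


print(format_timedelta64(np.timedelta64(1, "D")))
print(format_timedelta64(np.timedelta64("NaT")))
```

The result is always a plain `str` built from digits and ASCII punctuation, which is the property the question about unicode turns on.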

ghost commented Feb 22, 2013

you're returning a string that always contains pure ascii characters,
that should be ok.
python2 would try to convert that to unicode as needed using the system
encoding, and that should succeed on all encodings I've seen.
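
The same point can be checked in Python 3 terms with a small sketch: a pure-ASCII string encodes to identical bytes and decodes back unchanged under the common ASCII-compatible codecs. The sample text and the choice of codecs are illustrative assumptions.

```python
text = "1 days, 00:00:00"

# pure-ASCII text always encodes cleanly as ASCII
encoded = text.encode("ascii")

# decoding the same bytes under ASCII-compatible codecs gives the original text
decoded = {codec: encoded.decode(codec) for codec in ("utf-8", "latin-1", "cp1252")}

print(decoded)
```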

jreback commented Feb 22, 2013

great thanks

jreback commented Feb 22, 2013

@wesm any problem with merging this in?

     unique of a Series now returns a Series
     _index.convert_scalar now will ignore a ndarray object if the lhs is timelike
        (eg a non-scalar that is a series/ndarray is passed)

BUG: py3 issue with np.datetime64 conversion
     added array_timedelta_to_int conversions in tslib.pyx
     removed unique changes

DOC: added docs to timeseries for timedeltas and v0.11.0 whatsnew
     minor fixing of doc reference in v0.10.0
     updated RELEASE.rst

ENH: added ability for Series to set its dtype if it detects all timedelta like objects

ENH: added null checking and NaT support for timedelta64
     added formatter for timedelta64
     added setting via np.nan for NaT values (similar to datetime64[ns] support)

FMT: fixed timedelta64[ns] formatting (was breaking on py3.3)
     had to roll a new printer (repr_timedelta64) rather than use
     default str(x)

ENH: increased robustness in detection/conversion of timedelta64
     that are intermixed with np.nan,NaT,iNaT
jreback added a commit that referenced this pull request Feb 23, 2013
BUG: Series ops with a rhs of a Timestamp raising exception (#2898)
@jreback jreback merged commit e9b9ca5 into pandas-dev:master Feb 23, 2013

ghost commented Mar 18, 2013

Putting example from docs here, for readers of RELEASE.txt

  - Timedeltas are now fully operational (closes GH2898_)

    .. ipython:: python

        from datetime import datetime, timedelta
        s  = Series(date_range('2012-1-1', periods=3, freq='D'))
        td = Series([ timedelta(days=i) for i in range(3) ])
        df = DataFrame(dict(A = s, B = td))
        df
        s - s.max()
        df['C'] = df['A'] + df['B']
        df
        df.dtypes

        # null timedeltas are represented as ``NaT``
        y = s - s.shift()
        y

        # can be set via ``np.nan``
        y[1] = np.nan
        y

        # works on lhs too
        s.max() - s

        # some timedelta numeric operations are supported
        td - timedelta(minutes=5,seconds=5,microseconds=5)
