BUG: bugs in DataFrame.duplicated with datetime64 columns close #1833
wesm committed Sep 9, 2012
1 parent 3a1de71 commit dc0db65
Showing 3 changed files with 21 additions and 2 deletions.
2 changes: 2 additions & 0 deletions RELEASE.rst
@@ -51,6 +51,7 @@ pandas 0.9.0
   - Don't modify NumPy suppress printoption at import time
   - The internal HDF5 data arrangement for DataFrames has been
     transposed. Legacy files will still be readable by HDFStore (#1834, #1824)
+  - Legacy cruft removed: pandas.stats.misc.quantileTS

 **Bug fixes**

@@ -118,6 +119,7 @@ pandas 0.9.0
   - Fix bug in __doc__ patching when -OO passed to interpreter (#1792, #1741)
   - Fix unicode console encoding issue in IPython notebook (#1782, #1768)
   - Fix unicode formatting issue with Series.name (#1782)
+  - Fix bug in DataFrame.duplicated with datetime64 columns (#1833)

 pandas 0.8.1
 ============
10 changes: 8 additions & 2 deletions pandas/core/frame.py
@@ -2663,11 +2663,17 @@ def duplicated(self, cols=None, take_last=False):
         -------
         duplicated : Series
         """
+        # kludge for #1833
+        def _m8_to_i8(x):
+            if issubclass(x.dtype.type, np.datetime64):
+                return x.view(np.int64)
+            return x
+
         if cols is None:
-            values = list(self.values.T)
+            values = list(_m8_to_i8(self.values.T))
         else:
             if np.iterable(cols):
-                values = [self[x].values for x in cols]
+                values = [_m8_to_i8(self[x].values) for x in cols]
             else:
                 values = [self[cols]]
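
The kludge above reinterprets datetime64 data as int64 before the values are compared. A minimal standalone sketch of that view trick follows, using plain NumPy rather than the pandas internals touched by this commit. The helper name `m8_to_i8`, the sample arrays, and the use of `np.issubdtype` in place of the `issubclass` check are all illustrative assumptions, not part of the patch.

```python
import numpy as np


def m8_to_i8(x):
    """Return an int64 view of a datetime64 array; pass other arrays through."""
    # Hypothetical stand-in for the commit's _m8_to_i8 helper.
    if np.issubdtype(x.dtype, np.datetime64):
        # .view reinterprets the same 8-byte buffer; no data is copied.
        return x.view(np.int64)
    return x


dates = np.array(["2010-07-01", "2010-07-02", "2010-07-01"],
                 dtype="datetime64[ns]")
as_int = m8_to_i8(dates)

floats = np.array([1.0, 2.0])
untouched = m8_to_i8(floats)
```

Equal timestamps map to equal integers, so any duplicate detection that works on int64 values gives the same answer on the viewed array.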

11 changes: 11 additions & 0 deletions pandas/tseries/tests/test_timeseries.py
@@ -1085,6 +1085,17 @@ def test_frame_timeseries_to_records(self):

         result = df.to_records(index=False)

+    def test_frame_datetime64_duplicated(self):
+        dates = date_range('2010-07-01', end='2010-08-05')
+
+        tst = DataFrame({'symbol': 'AAA', 'date': dates})
+        result = tst.duplicated(['date', 'symbol'])
+        self.assert_((-result).all())
+
+        tst = DataFrame({'date': dates})
+        result = tst.duplicated()
+        self.assert_((-result).all())
+
 def _simple_ts(start, end, freq='D'):
     rng = date_range(start, end, freq=freq)
     return Series(np.random.randn(len(rng)), index=rng)
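
The new test can be approximated as a standalone script. This is a sketch that assumes a recent pandas install rather than the 0.9.0 development tree. It checks `not result.any()` instead of `(-result).all()`, because current NumPy rejects unary minus on boolean arrays. The `with_dup` frame is an extra illustrative case that is not in the commit.

```python
import pandas as pd

dates = pd.date_range("2010-07-01", end="2010-08-05")

# Same frames as the test in the diff: every date is unique,
# so no row should be flagged as a duplicate.
tst = pd.DataFrame({"symbol": "AAA", "date": dates})
result_two_cols = tst.duplicated(["date", "symbol"])

tst_dates_only = pd.DataFrame({"date": dates})
result_one_col = tst_dates_only.duplicated()

# Illustrative extra case (assumption, not in the commit):
# repeat the first row so exactly one duplicate exists.
with_dup = pd.concat([tst, tst.iloc[:1]], ignore_index=True)
dup_flags = with_dup.duplicated(["date", "symbol"])
```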