DataFrame.drop_duplicates() does not properly handle the array objects returned by DataFrame.columns. Passing the Index itself fails, and so does passing the NumPy array from DataFrame.columns.values. Both fail with "AssertionError: Index length did not match values". If you first convert the columns to a list with
list(DataFrame.columns.values)
then it works, but this conversion is needless overkill, especially when dealing with a large number of columns. Below is an example from IPython.
In [71]: dfrm=pandas.DataFrame({"A":[1,2,1,2,1,2], "B":[3,4,3,4,3,4], "C":[1,2,1,2,1,3]})
In [72]: dfrm
Out[72]:
   A  B  C
0  1  3  1
1  2  4  2
2  1  3  1
3  2  4  2
4  1  3  1
5  2  4  3

In [73]: dfrm.drop_duplicates(dfrm.columns)
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (882, 0))

ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (6442, 0))

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/espears/<ipython-input-73-bee9ee352073> in <module>()
----> 1 dfrm.drop_duplicates(dfrm.columns)

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/core/frame.pyc in drop_duplicates(self, cols, take_last)
   2254         deduplicated : DataFrame
   2255         """
-> 2256         duplicated = self.duplicated(cols, take_last=take_last)
   2257         return self[-duplicated]
   2258

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/core/frame.pyc in duplicated(self, cols, take_last)
   2283
   2284         duplicated = lib.duplicated(keys, take_last=take_last)
-> 2285         return Series(duplicated, index=self.index)
   2286
   2287     #----------------------------------------------------------------------

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/core/series.pyc in __new__(cls, data, index, dtype, name, copy)
    286         else:
    287             subarr = subarr.view(Series)
--> 288         subarr.index = index
    289         subarr.name = name
    290

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/_tseries.so in pandas._tseries.SeriesIndex.__set__ (pandas/src/tseries.c:73097)()

AssertionError: Index length did not match values

In [74]: dfrm.drop_duplicates(dfrm.columns.values)
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (882, 0))

ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (6442, 0))

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/espears/<ipython-input-74-cb96df701a9b> in <module>()
----> 1 dfrm.drop_duplicates(dfrm.columns.values)

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/core/frame.pyc in drop_duplicates(self, cols, take_last)
   2254         deduplicated : DataFrame
   2255         """
-> 2256         duplicated = self.duplicated(cols, take_last=take_last)
   2257         return self[-duplicated]
   2258

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/core/frame.pyc in duplicated(self, cols, take_last)
   2283
   2284         duplicated = lib.duplicated(keys, take_last=take_last)
-> 2285         return Series(duplicated, index=self.index)
   2286
   2287     #----------------------------------------------------------------------

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/core/series.pyc in __new__(cls, data, index, dtype, name, copy)
    286         else:
    287             subarr = subarr.view(Series)
--> 288         subarr.index = index
    289         subarr.name = name
    290

/opt/epd/7.2-1/lib/python2.7/site-packages/pandas/_tseries.so in pandas._tseries.SeriesIndex.__set__ (pandas/src/tseries.c:73097)()

AssertionError: Index length did not match values

In [75]: dfrm.columns.values
Out[75]: array([A, B, C], dtype=object)
In [76]: list(dfrm.columns.values)
Out[76]: ['A', 'B', 'C']
In [77]: dfrm.drop_duplicates(list(dfrm.columns.values))
Out[77]:
   A  B  C
0  1  3  1
1  2  4  2
5  2  4  3
FWIW:
In [91]: pandas.__version__
Out[91]: '0.7.3'
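The traceback shows the failure path. duplicated() builds row keys and passes them to lib.duplicated. It then wraps the boolean result in a Series indexed by self.index, and that raises because the result's length differs from the number of rows. A minimal pure-Python sketch of that logic follows. It assumes a hypothetical column-oriented frame stored as a dict of equal-length lists. The names row_keys, duplicated_mask and kept_rows are illustrative, not pandas API, and take_last only imitates the 0.7.3 parameter of the same name.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical column-oriented frame: column label -> list of values.
Frame = Dict[Hashable, List[object]]


def row_keys(frame: Frame, cols: Iterable[Hashable]) -> List[Tuple[object, ...]]:
    """Build one key tuple per row from the selected columns."""
    # zip(*columns) transposes the selected column lists into row tuples.
    return list(zip(*(frame[c] for c in cols)))


def duplicated_mask(frame: Frame, cols: Iterable[Hashable],
                    take_last: bool = False) -> List[bool]:
    """Mark rows whose key already appeared (earlier rows, or later ones if take_last)."""
    keys = row_keys(frame, cols)
    n_rows = len(next(iter(frame.values()), []))
    # Analogue of the check that failed in the traceback: one flag per row.
    if len(keys) != n_rows:
        raise ValueError("key length did not match number of rows")
    seen = set()
    mask = [False] * n_rows
    # Scan backwards when keeping the last occurrence instead of the first.
    order = range(n_rows - 1, -1, -1) if take_last else range(n_rows)
    for i in order:
        if keys[i] in seen:
            mask[i] = True
        else:
            seen.add(keys[i])
    return mask


def kept_rows(frame: Frame, cols: Iterable[Hashable],
              take_last: bool = False) -> List[int]:
    """Row positions that survive de-duplication."""
    mask = duplicated_mask(frame, cols, take_last)
    return [i for i, dup in enumerate(mask) if not dup]


dfrm = {"A": [1, 2, 1, 2, 1, 2], "B": [3, 4, 3, 4, 3, 4], "C": [1, 2, 1, 2, 1, 3]}
print(kept_rows(dfrm, ["A", "B", "C"]))                  # [0, 1, 5], as in Out[77]
print(kept_rows(dfrm, ["A", "B", "C"], take_last=True))  # [3, 4, 5]
```

Because row_keys consumes the selector only by iterating over it, a list, a tuple or any other iterable of labels yields one key per row. The mask then always has the length the index expects.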
In [2]: dfrm=pandas.DataFrame({"A":[1,2,1,2,1,2], "B":[3,4,3,4,3,4], "C":[1,2,1,2,1,3]})
In [3]: dfrm.drop_duplicates(dfrm.columns)
Out[3]:
   A  B  C
0  1  3  1
1  2  4  2
5  2  4  3
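In the session above, dfrm.drop_duplicates(dfrm.columns) returns the de-duplicated rows directly. One way a method can accept a list, a tuple, an Index or a NumPy array interchangeably is to normalize the selector once. A string is treated as a single label; any other iterable is materialized into a list. The sketch below uses a hypothetical helper, normalize_cols, and a stand-in FakeIndex class in place of a real Index or array. It illustrates the idea only and is not the change pandas actually made.

```python
from collections.abc import Iterable
from typing import Hashable, List


def normalize_cols(cols: object) -> List[Hashable]:
    """Turn a column selector into a plain list of labels (hypothetical helper).

    A string is treated as one label, since iterating it would yield characters.
    Any other iterable (list, tuple, generator, array-like) is materialized once.
    Anything else is treated as a single label.
    """
    if isinstance(cols, (str, bytes)):
        return [cols]
    if isinstance(cols, Iterable):
        return list(cols)
    return [cols]


class FakeIndex:
    """Stand-in for an array-like of labels; it only supports iteration."""

    def __init__(self, labels):
        self._labels = list(labels)

    def __iter__(self):
        return iter(self._labels)


print(normalize_cols("A"))                         # ['A']
print(normalize_cols(["A", "B", "C"]))             # ['A', 'B', 'C']
print(normalize_cols(FakeIndex(["A", "B", "C"])))  # ['A', 'B', 'C']
print(normalize_cols(3))                           # [3]
```

Normalizing at the top of a method lets the rest of the code assume a plain list of labels, so callers can pass dfrm.columns without converting it themselves.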