qcut user-reported failure #1978

Closed
wesm opened this issue Sep 26, 2012 · 5 comments

@wesm
Member

wesm commented Sep 26, 2012

from pystatsmodels mailing list

qcut in pandas 0.8.1 is failing for some quantile lists but not others (see below). Sorry if I'm missing something...

type(F.g)

Out[3]:
pandas.core.series.TimeSeries

chg = (F.g[20:]-F.g[20:].shift(1))
fac = qcut(chg, [0, .5, 1])
fac

Out[4]:
Categorical: g
array([nan, [-1005094.81, 0], [-1005094.81, 0], ..., (0, 1478547.3],
       (0, 1478547.3], (0, 1478547.3]], dtype=object)
Levels (2): Index([[-1005094.81, 0], (0, 1478547.3]], dtype=object)

chg = (F.g[20:]-F.g[20:].shift(1))
fac = qcut(chg, [0, .5, 1])
fac

Out[5]:
Categorical: g
array([nan, [-1005094.81, 0], [-1005094.81, 0], ..., (0, 1478547.3],
       (0, 1478547.3], (0, 1478547.3]], dtype=object)
Levels (2): Index([[-1005094.81, 0], (0, 1478547.3]], dtype=object)

fac = qcut(chg, [0, .5, .75, 1])
fac

Out[6]:
Categorical: g
array([nan, [-1005094.81, 0], [-1005094.81, 0], ..., (0, 1478547.3],
       (0, 1478547.3], (0, 1478547.3]], dtype=object)
Levels (3): Index([[-1005094.81, 0], (0, 0], (0, 1478547.3]], dtype=object)

fac = qcut(chg, [0, .25, .5, .75, 1])
fac

Out[7]:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/birone/<ipython-input-9-3896feb36dcd> in <module>()
----> 1 fac = qcut(chg, [0, .25, .5, .75, 1])
      2 fac

/usr/lib/pymodules/python2.7/pandas/tools/tile.pyc in qcut(x, q, labels, retbins, precision)
    139     bins = algos.quantile(x, quantiles)
    140     return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
--> 141                          precision=precision, include_lowest=True)
    142 
    143 

/usr/lib/pymodules/python2.7/pandas/tools/tile.pyc in _bins_to_cuts(x, bins, right, labels, retbins, precision, name, include_lowest)
    177         levels = np.asarray(levels, dtype=object)
    178         np.putmask(ids, na_mask, 0)
--> 179         fac = Categorical(ids - 1, levels, name=name)
    180     else:
    181         fac = ids - 1

/usr/lib/pymodules/python2.7/pandas/core/categorical.pyc in __init__(self, labels, levels, name)
     43     def __init__(self, labels, levels, name=None):
     44         self.labels = labels
---> 45         self.levels = levels
     46         self.name = name
     47 

/usr/lib/pymodules/python2.7/pandas/core/categorical.pyc in _set_levels(self, levels)
     62         levels = _ensure_index(levels)
     63         if not levels.is_unique:
---> 64             raise ValueError('Categorical levels must be unique')
     65         self._levels = levels
     66 

ValueError: Categorical levels must be unique

fac = qcut(chg, n) fails with the same error for n>3
@ghost ghost assigned wesm Nov 2, 2012
@wesm
Member Author

wesm commented Nov 3, 2012

Leaving this issue open with no milestone until reproducible test case found

@adamgreenhall
Contributor

I get this error with an all zeros timeseries:

x = Series(0, index=date_range('2011-01-01', end='2011-01-31'))
cut(x, 10)

I think the series being cut needs more than one unique value for this error not to happen.
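
One plausible reading of why a constant series trips this: `cut(x, 10)` asks for ten equal-width bins over the range of `x`, and a range of zero width leaves every edge identical. The sketch below mimics that edge computation with plain NumPy. It is an assumed model of the binning, not the actual pandas code path, and `np.linspace` is only a stand-in for whatever `cut` does internally.

```python
import numpy as np

# All-zero data, like the 31-day Series in the comment above.
x = np.zeros(31)

# Assumed model of equal-width binning: 10 bins need 11 edges
# spread evenly between the minimum and the maximum of the data.
edges = np.linspace(x.min(), x.max(), 10 + 1)

print(edges)                   # every edge is 0.0
print(len(np.unique(edges)))   # number of distinct edges
```

With a single distinct edge there is no way to build ten distinct interval labels, which would fit the "levels must be unique" complaint.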

@wesm
Member Author

wesm commented Feb 18, 2013

From the mailing list: "I hit this error, and there is an easy way to reproduce it... it is just a matter of having an array with multiple repeated values, with one of them being as large as the number of elements on one category."
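
A minimal sketch of that recipe, using hypothetical data and `np.quantile` as a stand-in for the quantile routine that `qcut` calls (the real one is `algos.quantile`, per the traceback above). When one value makes up more than a bin's share of the data, neighbouring quantiles land on that same value:

```python
import numpy as np

# Hypothetical data: six repeated zeros plus four distinct values.
x = np.array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=float)

# Quartile boundaries, as in qcut(x, [0, .25, .5, .75, 1]).
q = [0, 0.25, 0.5, 0.75, 1]
edges = np.quantile(x, q)
print(edges)

# More than half of the values are 0, so the 25% and 50%
# quantiles coincide with the minimum.
has_duplicates = len(np.unique(edges)) < len(edges)
print(has_duplicates)
```

This also seems consistent with the original report: with `[0, .5, .75, 1]` the output shows a degenerate `(0, 0]` level, and adding the `.25` cut point would produce that same label a second time.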

@why-not

why-not commented Feb 26, 2013

Another example:

import pandas as pd
quantile = [0, 0.50, 0.75, 0.90, 0.95, 0.99, 1]
pd.qcut([0,0,0,0,0,0,0,0,0,0], quantile)

results in

ValueError: Categorical levels must be unique

Note:

pd.__version__ 
'0.10.1'

@wesm wesm closed this as completed in 8a18af8 Mar 28, 2013
@wesm
Member Author

wesm commented Mar 28, 2013

Added a better error message. Even R craps out if there are duplicate bin edges so it doesn't make sense for qcut to "guess".

In [1]: paste
import pandas as pd
quantile = [0, 0.50, 0.75, 0.90, 0.95, 0.99, 1]
pd.qcut([0,0,0,0,0,0,0,0,0,0], quantile)

## -- End pasted text --
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-774b2988cbd4> in <module>()
      1 import pandas as pd
      2 quantile = [0, 0.50, 0.75, 0.90, 0.95, 0.99, 1]
----> 3 pd.qcut([0,0,0,0,0,0,0,0,0,0], quantile)

/home/wesm/code/pandas/pandas/tools/tile.pyc in qcut(x, q, labels, retbins, precision)
    139     bins = algos.quantile(x, quantiles)
    140     return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
--> 141                          precision=precision, include_lowest=True)
    142 
    143 

/home/wesm/code/pandas/pandas/tools/tile.pyc in _bins_to_cuts(x, bins, right, labels, retbins, precision, name, include_lowest)
    152 
    153     if len(algos.unique(bins)) < len(bins):
--> 154         raise Exception('Bin edges must be unique: %s' % repr(bins))
    155 
    156     if include_lowest:

Exception: Bin edges must be unique: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.])
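
Since `qcut` now refuses to guess, callers who expect heavily repeated values may want to check the edges themselves before binning. The following is only a sketch of such a pre-check: `quantile_edges_are_unique` is a hypothetical helper, not part of pandas, and it assumes `np.quantile` yields the same edges as the routine `qcut` uses internally.

```python
import numpy as np
import pandas as pd


def quantile_edges_are_unique(values, q):
    """Return True if the quantile edges for `values` are all distinct.

    Hypothetical helper, not part of pandas. It uses np.quantile as a
    stand-in for the quantile routine that qcut calls internally.
    """
    edges = np.quantile(np.asarray(values, dtype=float), q)
    return len(np.unique(edges)) == len(edges)


quantile = [0, 0.50, 0.75, 0.90, 0.95, 0.99, 1]
constant = [0] * 10          # the failing input from this thread
spread = list(range(10))     # ten distinct values

for data in (constant, spread):
    if quantile_edges_are_unique(data, quantile):
        print(pd.qcut(data, quantile))
    else:
        print("skipping qcut: duplicate bin edges for", data)
```

The check mirrors the `len(unique(bins)) < len(bins)` guard shown in the traceback, just moved to the caller so the decision about what to do with degenerate data stays explicit.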
