ENH: upgrade categoricals to a first class pandas type
     GH3943, GH5313, GH5314, GH7444

ENH: delegate _reduction and ops from Series to the categorical
     to support min/max and raise TypeError on other ops (numerical) and reduction

Add Categorical Properties to Series

Default to 'ordered' Categoricals if values are ordered

Categorical: add level assignments and reordering + changed default for ordered

Add a `Categorical.reorder_levels()` method. Change some naming in `Series`,
so that the methods do not clash with established standards and rename the
other categorical methods accordingly.

Also change the default for `ordered` to True if values + levels are passed
in at creation time.

Initial doc version for working with Categorical data

Categorical: add Categorical.mode() and use that in Series.mode()

Categorical: implement remove_unused_levels()

Categorical: implement value_counts() for categorical series

Categorical: make Series.astype("category") work

ENH: add setitem to Categorical

BUG: assigning to levels not in level set now raises ValueError

API: disallow numpy ufuncs with categoricals

Categorical: Categorical assignment to int/obj column

ENH: add support for fillna to Categoricals

API: deprecate old style categorical constructor usage and change default

Before it was possible to pass in precomputed labels/pointer and the
corresponding levels (e.g.: `Categorical([0,1,2], levels=["a","b","c"])`).

This could lead to subtle errors in case of integer categoricals: the
following could be both interpreted as "precomputed pointers and
levels" or "values and levels", but converting it back to an integer
array would result in different arrays:

`np.array(Categorical([1,2], levels=[1,2,3]))`
interpreted as pointers: `[2,3]`
interpreted as values: `[1,2]`

Up to now we would favour old style "pointers and levels" if the
values could be interpreted as such (see the code for details). With
this commit we favour new style "values and levels" and only attempt
to interpret them as "pointers and levels" if "compat=True" is passed
to the constructor.

BREAKS: This will break code which uses Categoricals with "pointers and
levels". A short Google search and a search on Stack Overflow revealed
no such usage.
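
The ambiguity described above can be illustrated without pandas. The following is only a minimal sketch in plain Python: the helper names `as_pointers` and `as_values` are hypothetical and not part of pandas, and the sketch ignores how the real constructor handles values that are missing from the levels.

```python
# Plain-Python sketch of the two readings of Categorical([1, 2], levels=[1, 2, 3]).
# The helper names are hypothetical; they are not part of pandas.

def as_pointers(data, levels):
    """Old style: each entry of `data` is a position into `levels`."""
    return [levels[i] for i in data]


def as_values(data, levels):
    """New style: each entry of `data` is already a value.

    Assumption: every value is present in `levels`; this sketch does not
    model what happens otherwise.
    """
    return list(data)


data, levels = [1, 2], [1, 2, 3]
pointer_result = as_pointers(data, levels)  # [2, 3]
value_result = as_values(data, levels)      # [1, 2]
print(pointer_result, value_result)
```

The same input yields two different arrays, which is why the constructor now needs a single default reading and an explicit opt-in ("compat=True") for the other.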

Categorical: document constructor changes and small fixes

Categorical: document that inappropriate numpy functions won't work anymore

ENH: concat support
jreback committed Jul 14, 2014
1 parent 4ddb73a commit 0f62d3f
Showing 30 changed files with 3,233 additions and 224 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -88,3 +88,5 @@ doc/source/vbench
doc/source/vbench.rst
doc/source/index.rst
doc/build/html/index.html
# Windows specific leftover:
doc/tmp.sv
50 changes: 49 additions & 1 deletion doc/source/api.rst
@@ -429,7 +429,7 @@ Time series-related
Series.tz_localize

String handling
~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~
``Series.str`` can be used to access the values of the series as
strings and apply several methods to it. Due to implementation
details the methods show up here as methods of the
@@ -468,6 +468,54 @@ details the methods show up here as methods of the
StringMethods.upper
StringMethods.get_dummies

.. _api.categorical:

Categorical
~~~~~~~~~~~

.. currentmodule:: pandas.core.categorical

If the Series is of dtype ``category``, ``Series.cat`` can be used to access the underlying
``Categorical``. This data type is similar to the numpy array that would otherwise back the Series
and has the following usable methods and properties (all available as
``Series.cat.<method_or_property>``).


.. autosummary::
   :toctree: generated/

   Categorical
   Categorical.levels
   Categorical.ordered
   Categorical.reorder_levels
   Categorical.remove_unused_levels
   Categorical.min
   Categorical.max
   Categorical.mode

To provide compatibility with `pandas.Series` and `numpy` arrays, the following (non-API) methods
are also introduced. Apart from these methods, ``np.asarray(categorical)`` works by implementing the
array interface (`Categorical.__array__()`). Be aware that this converts the
Categorical back to a numpy array, so the levels and the order information are not preserved!

.. autosummary::
   :toctree: generated/

   Categorical.from_array
   Categorical.get_values
   Categorical.copy
   Categorical.dtype
   Categorical.ndim
   Categorical.sort
   Categorical.describe
   Categorical.equals
   Categorical.unique
   Categorical.order
   Categorical.argsort
   Categorical.fillna
   Categorical.__array__


Plotting
~~~~~~~~
.. currentmodule:: pandas