API: deprecate setting of .ordered directly (GH9347, GH9190) #9611

Closed
wants to merge 6 commits into from
2 changes: 2 additions & 0 deletions doc/source/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -585,6 +585,8 @@ following usable methods and properties (all available as ``Series.cat.<method_o
Categorical.remove_categories
Categorical.remove_unused_categories
Categorical.set_categories
Categorical.as_ordered
Categorical.as_unordered
Categorical.codes

To create a Series of dtype ``category``, use ``cat = s.astype("category")``.
Expand Down
52 changes: 31 additions & 21 deletions doc/source/categorical.rst
Original file line number Diff line number Diff line change
Expand Up @@ -90,8 +90,6 @@ By using some special functions:
See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.

By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
This is the only possibility to specify differently ordered categories (or no order at all) at
creation time and the only reason to use :class:`pandas.Categorical` directly:

.. ipython:: python

Expand All @@ -103,6 +101,14 @@ creation time and the only reason to use :class:`pandas.Categorical` directly:
df["B"] = raw_cat
df

You can also specify differently ordered categories, or make the resulting data ordered, by passing ``categories`` and ``ordered`` to ``astype()``:

.. ipython:: python

s = Series(["a","b","c","a"])
s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
s_cat

Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:

.. ipython:: python
Expand Down Expand Up @@ -176,10 +182,9 @@ It's also possible to pass in the categories in a specific order:
s.cat.ordered

.. note::
New categorical data is automatically ordered if the passed in values are sortable or a
`categories` argument is supplied. This is a difference to R's `factors`, which are unordered
unless explicitly told to be ordered (``ordered=TRUE``). You can of course overwrite that by
passing in an explicit ``ordered=False``.

New categorical data are NOT automatically ordered. You must explicitly pass ``ordered=True`` to
indicate an ordered ``Categorical``.
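
The note above can be illustrated with a minimal sketch. It assumes a pandas version in which ``ordered`` defaults to unordered, as this change proposes; the variable names are illustrative only.

```python
import pandas as pd

# No ``ordered`` keyword is passed, so the categorical is treated as unordered.
unordered = pd.Categorical(["a", "b", "c", "a"])
print(unordered.ordered)

# Ordering has to be requested explicitly.
ordered = pd.Categorical(["a", "b", "c", "a"], ordered=True)
print(ordered.ordered)

# min() and max() are meaningful only for the ordered categorical.
print(ordered.min(), ordered.max())
```

Requiring the keyword makes the intent visible at the call site, so sortable values alone never imply an ordering.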


Renaming categories
Expand Down Expand Up @@ -270,29 +275,37 @@ Sorting and Order

.. _categorical.sort:

.. warning::

The default for construction has changed in v0.16.0 to ``ordered=False``, from the prior implicit ``ordered=True``.

If categorical data is ordered (``s.cat.ordered == True``), then the order of the categories has a
meaning and certain operations are possible. If the categorical is unordered, a `TypeError` is
raised.
meaning and certain operations are possible. If the categorical is unordered, an ``OrderingWarning`` is shown.

.. ipython:: python

s = Series(Categorical(["a","b","c","a"], ordered=False))
try:
s.sort()
except TypeError as e:
print("TypeError: " + str(e))
s = Series(["a","b","c","a"], dtype="category") # ordered per default!
s.sort()
s = Series(["a","b","c","a"]).astype('category',ordered=True)
s.sort()
s
s.min(), s.max()

You can set categorical data to be ordered by using ``as_ordered()`` or unordered by using ``as_unordered()``. These will by
default return a *new* object.

.. ipython:: python

s.cat.as_ordered()
s.cat.as_unordered()
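
As a hedged sketch of the "returns a *new* object" behavior described above, the following converts a Series in both directions and checks that the original is left untouched. The names ``s_ordered`` and ``s_back`` are illustrative and do not appear in the diff.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# as_ordered() returns a new Series; ``s`` itself is not modified.
s_ordered = s.cat.as_ordered()

# as_unordered() goes the other way, again returning a new Series.
s_back = s_ordered.cat.as_unordered()

print(s.cat.ordered)
print(s_ordered.cat.ordered)
print(s_back.cat.ordered)

# Order-dependent reductions are available on the ordered result.
print(s_ordered.min(), s_ordered.max())
```

Because the original is unchanged, the result has to be assigned back (``s = s.cat.as_ordered()``) for the ordering to persist.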

Sorting will use the order defined by categories, not any lexical order present on the data type.
This is even true for strings and numeric data:

.. ipython:: python

s = Series([1,2,3,1], dtype="category")
s.cat.categories = [2,3,1]
s = s.cat.set_categories([2,3,1], ordered=True)
s
s.sort()
s
Expand All @@ -310,7 +323,7 @@ necessarily make the sort order the same as the categories order.
.. ipython:: python

s = Series([1,2,3,1], dtype="category")
s = s.cat.reorder_categories([2,3,1])
s = s.cat.reorder_categories([2,3,1], ordered=True)
s
s.sort()
s
Expand All @@ -326,8 +339,8 @@ necessarily make the sort order the same as the categories order.

.. note::

If the `Categorical` is not ordered, ``Series.min()`` and ``Series.max()`` will raise
`TypeError`. Numeric operations like ``+``, ``-``, ``*``, ``/`` and operations based on them
If the `Categorical` is not ordered, ``Series.min()`` and ``Series.max()`` will show an ``OrderingWarning``.
Numeric operations like ``+``, ``-``, ``*``, ``/`` and operations based on them
(e.g. ``Series.median()``, which would need to compute the mean between two values if the length
of an array is even) do not work and raise a `TypeError`.

Expand All @@ -339,7 +352,7 @@ The ordering of the categorical is determined by the ``categories`` of that colu

.. ipython:: python

dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b']),
dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b'],ordered=True),
'B' : [1,2,1,2,2,1,2,1] })
dfs.sort(['A','B'])
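
The effect of the ``categories`` order on a multi-column sort can be sketched as follows. This assumes ``DataFrame.sort_values``, the method that later pandas releases provide in place of the ``dfs.sort(...)`` call used in the diff; the data are the same as above.

```python
import pandas as pd

dfs = pd.DataFrame({
    "A": pd.Categorical(list("bbeebbaa"), categories=["e", "a", "b"], ordered=True),
    "B": [1, 2, 1, 2, 2, 1, 2, 1],
})

# Column A sorts in category order (e, a, b), not alphabetically;
# ties within a category are broken by column B.
result = dfs.sort_values(["A", "B"])
print(result)
```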

Expand Down Expand Up @@ -664,9 +677,6 @@ The following differences to R's factor functions can be observed:

* R's `levels` are named `categories`
* R's `levels` are always of type string, while `categories` in pandas can be of any dtype.
* New categorical data is automatically ordered if the passed in values are sortable or a
`categories` argument is supplied. This is a difference to R's `factors`, which are unordered
unless explicitly told to be ordered (``ordered=TRUE``).
* It's not possible to specify labels at creation time. Use ``s.cat.rename_categories(new_labels)``
afterwards.
* In contrast to R's `factor` function, using categorical data as the sole input to create a
Expand Down
1 change: 1 addition & 0 deletions doc/source/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ Highlights include:
- ``Series.to_coo/from_coo`` methods to interact with ``scipy.sparse``, see :ref:`here <whatsnew_0160.enhancements.sparse>`
- Backwards incompatible change to ``Timedelta`` to conform the ``.seconds`` attribute with ``datetime.timedelta``, see :ref:`here <whatsnew_0160.api_breaking.timedelta>`
- Changes to the ``.loc`` slicing API to conform with the behavior of ``.ix`` see :ref:`here <whatsnew_0160.api_breaking.indexing>`
- Changes to the default for ordering in the ``Categorical`` constructor, see :ref:`here <whatsnew_0160.api_breaking.categorical>`

See the :ref:`v0.16.0 Whatsnew <whatsnew_0160>` overview or the issue tracker on GitHub for an extensive list
of all API changes, enhancements and bugs that have been fixed in 0.16.0.
Expand Down
64 changes: 64 additions & 0 deletions doc/source/whatsnew/v0.16.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ users upgrade to this version.
* ``Series.to_coo/from_coo`` methods to interact with ``scipy.sparse``, see :ref:`here <whatsnew_0160.enhancements.sparse>`
* Backwards incompatible change to ``Timedelta`` to conform the ``.seconds`` attribute with ``datetime.timedelta``, see :ref:`here <whatsnew_0160.api_breaking.timedelta>`
* Changes to the ``.loc`` slicing API to conform with the behavior of ``.ix`` see :ref:`here <whatsnew_0160.api_breaking.indexing>`
* Changes to the default for ordering in the ``Categorical`` constructor, see :ref:`here <whatsnew_0160.api_breaking.categorical>`

- Check the :ref:`API Changes <whatsnew_0160.api>` and :ref:`deprecations <whatsnew_0160.deprecations>` before updating

Expand Down Expand Up @@ -307,6 +308,69 @@ Backwards incompatible API changes
- ``Series.describe`` for categorical data will now give counts and frequencies of 0, not ``NaN``, for unused categories (:issue:`9443`)


Categorical Changes
~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0160.api_breaking.categorical:

In prior versions, ``Categoricals`` that had an unspecified ordering (meaning no ``ordered`` keyword was passed) defaulted to being ``ordered`` Categoricals. Going forward, the ``ordered`` keyword in the ``Categorical`` constructor will default to ``False``; ordering must now be explicit.

Furthermore, previously you *could* change the ``ordered`` attribute of a Categorical by just setting the attribute, e.g. ``cat.ordered=True``. This is now deprecated and you should use ``cat.as_ordered()`` or ``cat.as_unordered()``. These will by default return a **new** object and not modify the existing object. (:issue:`9347`, :issue:`9190`)

Previous Behavior

.. code-block:: python

In [3]: s = Series([0,1,2], dtype='category')

In [4]: s
Out[4]:
0 0
1 1
2 2
dtype: category
Categories (3, int64): [0 < 1 < 2]

In [5]: s.cat.ordered
Out[5]: True

In [6]: s.cat.ordered = False

In [7]: s
Out[7]:
0 0
1 1
2 2
dtype: category
Categories (3, int64): [0, 1, 2]

New Behavior

.. ipython:: python

s = Series([0,1,2], dtype='category')
s
s.cat.ordered
s = s.cat.as_ordered()
s
s.cat.ordered

# you can set in the constructor of the Categorical
s = Series(Categorical([0,1,2],ordered=True))
s
s.cat.ordered

For ease of creation of series of categorical data, we have added the ability to pass keywords when calling ``.astype()``. These are passed directly to the constructor.

.. ipython:: python

s = Series(["a","b","c","a"]).astype('category',ordered=True)
s
s = Series(["a","b","c","a"]).astype('category',categories=list('abcdef'),ordered=False)
s
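
The keyword form of ``.astype()`` shown above is the API proposed in this change and may not be accepted by every pandas version. As an alternative sketch, and as an assumption not taken from this diff, the same result can be expressed with ``CategoricalDtype`` from ``pandas.api.types``, which later pandas releases provide.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "c", "a"])

# Explicit categories, unordered: mirrors categories=list('abcdef'), ordered=False.
s_cat = s.astype(CategoricalDtype(categories=list("abcdef"), ordered=False))
print(s_cat)

# Categories inferred from the data, ordered: mirrors ordered=True.
s_ord = s.astype(CategoricalDtype(ordered=True))
print(s_ord)
```

Bundling ``categories`` and ``ordered`` into one dtype object lets the same specification be reused across several columns.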

- In prior versions, trying to ``.order()/.argsort()/.searchsorted()`` on an unordered ``Categorical`` would raise a ``TypeError``. This has been relaxed in that the operation will now succeed but show an ``OrderingWarning``. This will perform the ordering in the order of the categories, then in order of appearance for the values within that category. This operation will NOT modify the existing object. (:issue:`9148`)

Indexing Changes
~~~~~~~~~~~~~~~~

Expand Down