Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Added axis argument to rename, reindex #17800

Merged
merged 2 commits into from
Oct 10, 2017

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Oct 5, 2017

xref: #12392

I want to test this a bit further and clean up the code for a bit, but figured I'd put this up as I'm going offline for a few hours.

Any comments on the general approach of adding *args, and then manually validating that the parameters are consistent?

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Oct 5, 2017
@TomAugspurger
Copy link
Contributor Author

Ahh, I forgot that was keyword-only args are python3 only :/

@TomAugspurger
Copy link
Contributor Author

OK, this is closer to @jorisvandenbossche's suggesting of adding a new parameter at the beginning. I chose mapper for this, it'l be labels for reindex.

@codecov
Copy link

codecov bot commented Oct 6, 2017

Codecov Report

Merging #17800 into master will decrease coverage by 0.02%.
The diff coverage is 82.6%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17800      +/-   ##
==========================================
- Coverage   91.24%   91.22%   -0.03%     
==========================================
  Files         163      163              
  Lines       49967    49993      +26     
==========================================
+ Hits        45593    45606      +13     
- Misses       4374     4387      +13
Flag Coverage Δ
#multiple 89.03% <82.6%> (-0.01%) ⬇️
#single 40.24% <30.43%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 94.89% <ø> (ø) ⬆️
pandas/core/panel.py 96.94% <ø> (ø) ⬆️
pandas/core/frame.py 97.76% <100%> (-0.09%) ⬇️
pandas/core/generic.py 91.85% <33.33%> (-0.19%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/internals.py 94.38% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22515f5...7ac2e3b. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 6, 2017

Codecov Report

Merging #17800 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17800      +/-   ##
==========================================
+ Coverage   91.22%   91.22%   +<.01%     
==========================================
  Files         163      163              
  Lines       49994    50009      +15     
==========================================
+ Hits        45606    45621      +15     
  Misses       4388     4388
Flag Coverage Δ
#multiple 89.03% <100%> (+0.02%) ⬆️
#single 40.25% <46.42%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/panel.py 96.94% <ø> (ø) ⬆️
pandas/core/series.py 94.88% <ø> (-0.1%) ⬇️
pandas/core/generic.py 92.03% <ø> (-0.06%) ⬇️
pandas/core/sparse/series.py 95.26% <ø> (ø) ⬆️
pandas/core/frame.py 97.77% <100%> (-0.08%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/datetimes.py 95.58% <0%> (+0.09%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d12a7a0...0e668bf. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Oct 6, 2017

DataFrame docstring:

Signature: df.rename(mapper=None, index=None, columns=None, axis=None, **kwargs)
Docstring:
Alter axes input function or functions. Function / dict values must be
unique (1-to-1). Labels not contained in a dict / Series will be left
as-is. Extra labels listed don't throw an error. Alternatively, change
``Series.name`` with a scalar value (Series only).

Parameters
----------
mapper : dict-like or function
    Applied to the axis specified by `axis`
index, columns : scalar, list-like, dict-like or function, optional
    Scalar or list-like will alter the ``Series.name`` attribute,
    and raise on DataFrame or Panel.
    dict-like or functions are transformations to apply to
    that axis' values
copy : boolean, default True
    Also copy underlying data
inplace : boolean, default False
    Whether to return a new DataFrame. If True then value of copy is
    ignored.
level : int or level name, default None
    In case of a MultiIndex, only rename labels in the specified
    level.

Returns
-------
renamed : DataFrame (new object)
...

Examples:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], 'C': [1, 2]})

In [4]: df.rename(id)
Out[4]:
            A  B  C
4370471600  1  a  1
4370471632  2  b  2

In [5]: df.rename(id, axis=1)
Out[5]:
   4373200824 4372848512  4372782920
0           1          a           1
1           2          b           2

In [6]: df.rename(id, id)
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py:2936: UserWarning: Interpreting call to '.rename(a, b)' as '.rename(index=a, columns=b)'. Use keyword arguments to remove ambiguity.
  UserWarning)
Out[6]:
            4373200824 4372848512  4372782920
4370471600           1          a           1
4370471632           2          b           2

In [7]: df.rename(index=id, columns=id)
Out[7]:
            4373200824 4372848512  4372782920
4370471600           1          a           1
4370471632           2          b           2

In [8]: df.rename(index=id, axis=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-8a1d2adfffdf> in <module>()
----> 1 df.rename(index=id, axis=1)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in rename(self, mapper, index, columns, axis, **kwargs)
   2916             axis = self._get_axis_name(axis)
   2917             if index is not None or columns is not None:
-> 2918                 raise TypeError("Can't specify both 'axis' and 'index' or "
   2919                                 "'columns' Specify either\n"
   2920                                 "\t.rename(mapper, axis=axis), or\n"

TypeError: Can't specify both 'axis' and 'index' or 'columns' Specify either
        .rename(mapper, axis=axis), or
        .rename(index=index, columns=columns)

In [9]: df.rename(columns=id, axis=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-2dd7c9eb0abf> in <module>()
----> 1 df.rename(columns=id, axis=1)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in rename(self, mapper, index, columns, axis, **kwargs)
   2916             axis = self._get_axis_name(axis)
   2917             if index is not None or columns is not None:
-> 2918                 raise TypeError("Can't specify both 'axis' and 'index' or "
   2919                                 "'columns' Specify either\n"
   2920                                 "\t.rename(mapper, axis=axis), or\n"

TypeError: Can't specify both 'axis' and 'index' or 'columns' Specify either
        .rename(mapper, axis=axis), or
        .rename(index=index, columns=columns)

In [10]: df.rename(index=id, columns=id, axis=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-a15c6c06cf1d> in <module>()
----> 1 df.rename(index=id, columns=id, axis=1)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in rename(self, mapper, index, columns, axis, **kwargs)
   2916             axis = self._get_axis_name(axis)
   2917             if index is not None or columns is not None:
-> 2918                 raise TypeError("Can't specify both 'axis' and 'index' or "
   2919                                 "'columns' Specify either\n"
   2920                                 "\t.rename(mapper, axis=axis), or\n"

TypeError: Can't specify both 'axis' and 'index' or 'columns' Specify either
        .rename(mapper, axis=axis), or
        .rename(index=index, columns=columns)

In [11]: df.rename(mapper=id, index=id, columns=id)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-0175716afaa0> in <module>()
----> 1 df.rename(mapper=id, index=id, columns=id)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in rename(self, mapper, index, columns, axis, **kwargs)
   2925                 columns = mapper
   2926         elif all(x is not None for x in (mapper, index, columns)):
-> 2927             raise TypeError("Cannot specify all of 'mapper', 'index', and "
   2928                             "'columns'. Specify 'mapper' and 'axis', or "
   2929                             "'index' and 'columns'.")

TypeError: Cannot specify all of 'mapper', 'index', and 'columns'. Specify 'mapper' and 'axis', or 'index' and 'columns'.

The most concerning part (to me) is the one emitting a warning. I interpret df.rename(id, id), as df.rename(index=id, columns=id), which was the previous signature. This means we also interpret df.rename(mapper=id, index=id) as df.rename(index=id, columns=id), which feels wrong, but I think is unavoidable.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 6, 2017

Probably too late, but in case it can still be used as inspiration, my original branch: https://github.com/pandas-dev/pandas/compare/master...jorisvandenbossche:harmonize-reindex-drop-original?expand=1 (from a long time ago, in the recent PR I removed the reindex related changes and only cleaned-up drop)

Using axis-style parameters

>>> df.rename(str.lower, axis='columns')
A B
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower case ?

@jorisvandenbossche
Copy link
Member

Are you also going to do reindex ? In that case, you can copy the tests from the branch I linked to above (and maybe also code)

@TomAugspurger
Copy link
Contributor Author

Are you also going to do reindex ?

Working on it now. 1e62a0e refactored the common bits out. Perhaps I'll just push my reindex branch in this PR? They're similar enough.

I'll take your tests.

@jorisvandenbossche
Copy link
Member

Perhaps I'll just push my reindex branch in this PR? They're similar enough.

As you prefer, no problem to have them both here (certainly if they would share code to do the validation)

@TomAugspurger
Copy link
Contributor Author

Ok (force) pushed two clean commits, one for rename and one for reindex.

I think this should be ready for review.

@@ -109,6 +109,33 @@ For example:
# the following is now equivalent
df.drop(columns=['B', 'C'])

``rename``, ``reindex`` now also accepts axis keyword
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, typo on accepts -> accept.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Oct 6, 2017

Trying to figure out the failing Panel test https://travis-ci.org/pandas-dev/pandas/jobs/284200489#L1392 now (cc @jreback)

Edit: nvm, I think I found it.

@TomAugspurger TomAugspurger changed the title [WIP]API: Added axis argument to rename API: Added axis argument to rename Oct 6, 2017
elif axis == 'columns':
columns = arg

elif all(x is not None for x in (arg, index, columns)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use pandas.core.common._all_not_none(arg, index, columns) here instead.

).format(arg_name=arg_name)
raise TypeError(msg)

elif axis is None and (arg is not None and index is not None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can just be _all_not_none(arg, index). The axis is None part should always be true due to the first if statement.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(didn't look at the tests yet)

Implementation-wise, I am wondering if it would be cleaner to convert the args to axis style instead to index/columns style. As inside reindex and rename, the index/column style is then converted to axis style (with _construct_axes_from_arguments). So now you do the conversion sometimes twice in the different direction.
But if that is harder, I find it more important we get this in than to make it internally cleaner :-)

On a separate note, we should think whether we want to deprecate reindex_axis and rename_axis or not (can be separate though).

optional_labels="""labels : array-like
New labels / index to conform the axis specified by 'axis' to.""",
optional_axis="""axis : int or str, optional
Axis to target. Can be either the axis name ('rows', 'columns')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rows -> index ?

versionadded_to_excel='',
optional_mapper="""mapper : dict-like or function
Applied to the axis specified by `axis`""",
optional_labels="""labels : array-like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is 'optional' as well (this is also specified for index/columns)

@@ -742,11 +742,13 @@ def swaplevel(self, i=-2, j=-1, axis=0):

Parameters
----------
%(optional_mapper)s
%(axes)s : scalar, list-like, dict-like or function, optional
Scalar or list-like will alter the ``Series.name`` attribute,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also true for the mapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Series.rename docstring right now:

Signature: df.A.rename(index=None, **kwargs)
Docstring:
Alter axes input function or functions. Function / dict values must be
unique (1-to-1). Labels not contained in a dict / Series will be left
as-is. Extra labels listed don't throw an error. Alternatively, change
``Series.name`` with a scalar value (Series only).

Parameters
----------

index : scalar, list-like, dict-like or function, optional
    Scalar or list-like will alter the ``Series.name`` attribute,
    and raise on DataFrame or Panel.
    dict-like or functions are transformations to apply to
    that axis' values

So i didn't modify the signature at all. It doesn't have an mapper argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that not having mapper or axis kwargs is the better option here, but I may be wrong)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wanted to say that for the DataFrame docstring (this is in generic.py).
So I meant, for a dataframe.rename, is the explanation of index/column, also valid for mapper (scalar vs mapping)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the dict / function-like arguments descriptions are correct. Scalar / list-like don't apply. I could make that bit a shared_doc section... Let me try quick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, at some point a separate docstring might be easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, this is pretty messy. I'm just going to leave as-is for now. For DataFrame.rename the docstring starts

Parameters
----------
mapper : dict-like or function
    Applied to the axis specified by `axis`
index, columns : scalar, list-like, dict-like or function, optional
    Scalar or list-like will alter the ``Series.name`` attribute,
    and raise on DataFrame or Panel.
    dict-like or functions are transformations to apply to
    that axis' values
axis : int or str, optional
    Axis to target. Can be either the axis name ('index', 'columns')
    or number (0, 1).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would also create separate docstrings for the series/dataframe versions of those methods. It will give a bit of duplication, but it will be much easier and take less time to make sure this duplication is done correctly than trying to figure out how to make it work with the docstring substitution.
But agree that can be in another PR.

Note that ``.reindex`` can be called by specifying either

* One or both of ``index`` and ``columns``
* ``labels`` and ``axis``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no indentation needed

def _validate_axis_style_args(self, arg, arg_name, index, columns,
axis, method_name):
if axis is not None:
# Using "axis" style, along with a positional arg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it would make it a lot harder if the default axis=0 is used instead of None ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now I raise on

df.reindex(labels, axis=0)
df.reindex(index=labels, axis=0)

If I change the default axis to 0, then I can't detect those cases. I could go either way here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you raise on df.reindex(labels, axis=0) ? That seems perfectly valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid, just the axis=0 is redundant. Likewise with columns=labels, axis=1 (I currently raise). Happy to adjust that though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that are two different things; because you can either specify mapper/axis or index/columns.
df.reindex(labels, axis=0) is a case of mapper/axis, and is thus perfectly valid (and even more explicit than leaving out axis=0, although it is the default, so I don't think we should raise on this), while df.reindex(columns=labels, axis=1) is mixture of columns and axis, so OK to raise on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df.reindex(index=labels, axis=0) is a bit the tricky case, as this one is indeed redundant and mixing the two. So in principle I would also raise here like for df.reindex(columns=labels, axis=1).
But if it is easier implementation-wise to allow that, I think that is OK (as although it is mixing both idioms, it is consistent in which axis compared to eg df.reindex(index=labels, axis=1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, #17800 (comment) was incorrect. .reindex(labels, axis=0) should clearly work!

Copy link
Contributor Author

@TomAugspurger TomAugspurger Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do raise on the "mixing" case, so I think we agree on what should happen? And I think the current implementation does that. I'll ensure there are tests for all this.

In [1]: import pandas as pd
d
In [2]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

In [3]: df.reindex([0, 1], axis=0)
Out[3]:
   A  B
0  1  4
1  2  5

In [4]: df.reindex(index=[0, 1], axis=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-d6f30fd70cc7> in <module>()
----> 1 df.reindex(index=[0, 1], axis=0)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in reindex(self, labels, index, columns, axis, **kwargs)
   2945         index, columns = self._validate_axis_style_args(labels, 'labels',
   2946                                                         index, columns,
-> 2947                                                         axis, 'reindex')
   2948         return super(DataFrame, self).reindex(index=index, columns=columns,
   2949                                               **kwargs)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in _validate_axis_style_args(self, arg, arg_name, index, columns, axis, method_name)
   2788                     "\t.{method_name}.rename(index=index, columns=columns)"
   2789                 ).format(arg_name=arg_name, method_name=method_name)
-> 2790                 raise TypeError(msg)
   2791             if axis == 'index':
   2792                 index = arg

TypeError: Can't specify both 'axis' and 'index' or 'columns'. Specify either
        .reindex.rename(labels, axis=axis), or
        .reindex.rename(index=index, columns=columns)

In [5]: df.reindex(columns=[0, 1], axis=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-02a868f92f79> in <module>()
----> 1 df.reindex(columns=[0, 1], axis=1)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in reindex(self, labels, index, columns, axis, **kwargs)
   2945         index, columns = self._validate_axis_style_args(labels, 'labels',
   2946                                                         index, columns,
-> 2947                                                         axis, 'reindex')
   2948         return super(DataFrame, self).reindex(index=index, columns=columns,
   2949                                               **kwargs)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in _validate_axis_style_args(self, arg, arg_name, index, columns, axis, method_name)
   2788                     "\t.{method_name}.rename(index=index, columns=columns)"
   2789                 ).format(arg_name=arg_name, method_name=method_name)
-> 2790                 raise TypeError(msg)
   2791             if axis == 'index':
   2792                 index = arg

TypeError: Can't specify both 'axis' and 'index' or 'columns'. Specify either
        .reindex.rename(labels, axis=axis), or
        .reindex.rename(index=index, columns=columns)

@@ -109,6 +109,33 @@ For example:
# the following is now equivalent
df.drop(columns=['B', 'C'])

``rename``, ``reindex`` now also accepts axis keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ref tag

@@ -109,6 +109,33 @@ For example:
# the following is now equivalent
df.drop(columns=['B', 'C'])

``rename``, ``reindex`` now also accepts axis keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be same length as the title (hard to tell)

.. ipython::

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.rename(str.lower, axis='columns')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe show both usages?

raise TypeError(msg)

elif _all_not_none(arg, index):
# This is the "ambiguous" case, so emit a warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth factoring this function out if its common with the drop changes? not sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put in pandas/util/_validators.py with all other arg validation code

@@ -837,6 +837,107 @@ def test_rename_objects(self):
assert 'FOO' in renamed
assert 'foo' not in renamed

def test_rename_columns(self):
df = pd.DataFrame({"A": [1, 2], "B": [1, 2]}, index=['X', 'Y'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue number

@TomAugspurger TomAugspurger changed the title API: Added axis argument to rename API: Added axis argument to rename, reindex Oct 6, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor doc-comments, otherwise lgtm.


And ``reindex``:

.. ipython::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add some docs where these are using (in basics.rst), or a link to here?

@@ -802,12 +805,27 @@ def swaplevel(self, i=-2, j=-1, axis=0):
0 1 4
1 2 5
2 3 6

Using axis-style parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you showing examples of both? maybe put your highly recommended warning here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, added.

@@ -2825,6 +2849,26 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
IE10 404 0.08
Chrome 200 0.02

We can also reindex the columns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same add the warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added above.

@TomAugspurger
Copy link
Contributor Author

Got all the doc comments addressed I think.

I haven't had time to clean up the implementation to either refactor the changes to .drop to share code, or to use Joris' suggestion of using axis-style instead of index/columns.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2017

@TomAugspurger ping when ready to have a look

@TomAugspurger
Copy link
Contributor Author

I think it's ready.

I still haven't had time to refactor the implementation, and probably won't have time before the RC.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments/

There are two cases that currently do not fail:

In [9]:  df.rename(index=id, axis=0)
Out[9]: 
                 A  B
139753784065696  1  4
139753784065728  2  5
139753784065760  3  6

In [10]:  df.rename(columns=id, axis=0)
Out[10]: 
   139753756957136  139753785070512
0                1                4
1                2                5
2                3                6

But since axis=0 is the default we cannot detect this I suppose? (and I asked myself to make axis=0 the default instead of axis=None :-)). In that case, I prefer keeping the default axis and not raising for those above.
(so as it is now :-))

The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained
the ``axis`` keyword to specify the axis to target with the operation. This
harmonizes the method signature of these methods with most of the rest of the
library (:issue:`12392`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally wouldn't add the "harmonizes the method signature of these methods with most of the rest of the library " (but that's more personal preference maybe :-) as I said in the issue, IMO it is not comparable with many other functions that use axis as arg, as it is a different use case, it does make it consistent with drop though, so I would mention that (which also makes it more match with the whatsnew entry about drop above))

.. ipython:: python

df.rename(index=id, columns=str.lower)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C']) as well? to make it clear that it is not only rename but also reindex that keeps the old way

@@ -742,11 +742,13 @@ def swaplevel(self, i=-2, j=-1, axis=0):

Parameters
----------
%(optional_mapper)s
%(axes)s : scalar, list-like, dict-like or function, optional
Scalar or list-like will alter the ``Series.name`` attribute,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would also create separate docstrings for the series/dataframe versions of those methods. It will give a bit of duplication, but it will be much easier and take less time to make sure this duplication is done correctly than trying to figure out how to make it work with the docstring substitution.
But agree that can be in another PR.

warnings.warn(msg, stacklevel=3)
index, columns = arg, index
elif index is None:
index = arg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the default (labels, axis=0) case ? (if so, maybe good to add a comment)

with tm.assert_raises_regex(TypeError, None):
df.rename(str.lower, str.lower, str.lower)

def test_drop_api_equivalence(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one already existed, and I don't see it removed somewhere else in the diff (if you would have moved it from another file). So maybe this was duplicated by accident?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I messed up when I brought in your tests.

@@ -2891,7 +2940,11 @@ def align(self, other, join='outer', axis=None, level=None, copy=True,
broadcast_axis=broadcast_axis)

@Appender(_shared_docs['reindex'] % _shared_doc_kwargs)
def reindex(self, index=None, columns=None, **kwargs):
def reindex(self, labels=None, index=None, columns=None, axis=0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be easier to simply default axis=None, then in the code if you can set it to 0 where appropriate. This way you can detect whether its set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This messes with the default displayed in the method signature though (I believe that was @jorisvandenbossche's reason for wanting axis=0 as the default?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am not too fixed on it. It is just a trade-off between having a more informative function signature (having the actual default value) vs being able to raise an error in the corner case of the user specifying df.reindex/rename(columns=.., axis=0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you have to give that up. We don't do this anywhere else. Sure if you can preserve it would be great, but generally wouldn't work too hard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I mean we don't do this anywhere else is that when we need to change an arg to see if we need to deprcate it, it should simply be None. this case is a little different so it may be possible to preserve it.

@TomAugspurger
Copy link
Contributor Author

Fixups:

  1. Dedicated docstrings for Series.rename and DataFrame.rename. I bit of duplication, but agreed w/ @jorisvandenbossche that it's better
  2. Changed the default axis from 0 to None for DataFrame.reindex and rename. I think it's better to catch this.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some doc comments

----------
mapper : dict-like or function, optional
Applied to the axis specified by ``axis``
index, columns : scalar, list-like, dict-like or function, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'scalar' is not correct anymore ? (as this is only for series)

also list-like is here and not for mapper. Although I am not sure when you actually can use a list-like

BTW, since the type and explanation of both mapper and index/columns are the same, we could also combine all three of them like now index, columns are combined.
Then inside the description can explain that either axis needs to be specified or either index/columns can be used.


See Also
--------
pandas.NDFrame.rename_axis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NDFrame -> DataFrame

def rename(self, index=None, **kwargs):
"""Alter Series row labels or name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'row labels' -> 'index labels' ? (only a suggestion)


Parameters
----------
index : scalar, list-like, dict-like or function, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as before, I am not sure about the 'list-like'. I tried a few cases and it just errors. When looking at the code, list-likes seem to take the "changing Series.name" route, but names need to be hashable. So a tuple works, but not a list.


See Also
--------
pandas.DataFrame.rename_axis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataFrame -> Series

@jreback
Copy link
Contributor

jreback commented Oct 10, 2017

lgtm merge when ready (rebase)

@TomAugspurger TomAugspurger merged commit 727ea20 into pandas-dev:master Oct 10, 2017
@TomAugspurger TomAugspurger deleted the harmonize-rename branch October 10, 2017 15:18
@jorisvandenbossche
Copy link
Member

@TomAugspurger Thanks a lot for finishing this!

Any opinion on #17833 ?

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
* API: Added axis argument to rename

xref: pandas-dev#12392

* API: Accept 'axis' keyword argument for reindex
@TomAugspurger
Copy link
Contributor Author

One backwards-incompatible change that I didn't consider, using

In [8]: df = pd.DataFrame(columns=['A', 'B'])

In [10]: df.rename(None, str.lower)
Out[10]:
Empty DataFrame
Columns: [A, B]
Index: []

In previous versions that meant .rename(index=None, columns=str.lower), but now it's .rename(mapper=None, index=str.lower). I think this is fundamentally ambiguous correct? This is breaking dask, which used positional arguments, but I suspect it'll break more code :/

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
* API: Added axis argument to rename

xref: pandas-dev#12392

* API: Accept 'axis' keyword argument for reindex
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
* API: Added axis argument to rename

xref: pandas-dev#12392

* API: Accept 'axis' keyword argument for reindex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants