Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename signature fix #17966

Merged
merged 13 commits into from
Oct 27, 2017
Merged

Conversation

TomAugspurger
Copy link
Contributor

Sample usage:

In [3]: df = pd.DataFrame(index=['a', 'b'], columns=['A', 'B'])

In [4]: df.reindex(['a'], ['A'])
/Users/taugspurger/.virtualenvs/pandas-dev/bin/ipython:1: UserWarning: Intepreting call
        '.reindex(a, b)' as
        '.reindex(index=a, columns=b)'.
Use named arguments to remove any ambiguity.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3.6
Out[4]:
     A
a  NaN

In [5]: df.reindex(None, ['A'])
/Users/taugspurger/.virtualenvs/pandas-dev/bin/ipython:1: UserWarning: Intepreting call
        '.reindex(a, b)' as
        '.reindex(index=a, columns=b)'.
Use named arguments to remove any ambiguity.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3.6
Out[5]:
     A
a  NaN
b  NaN

I'd like to simplify this further if possible, things go messy :/
I'd also like to fix the function signature, which is doable on python3 at least.

@TomAugspurger
Copy link
Contributor Author

Added signature re-writing for python3 only. AFAICT, there's not a simple way to do that for both python 2 and 3.

@codecov
Copy link

codecov bot commented Oct 24, 2017

Codecov Report

Merging #17966 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17966      +/-   ##
==========================================
- Coverage   91.26%   91.23%   -0.04%     
==========================================
  Files         163      163              
  Lines       50117    50165      +48     
==========================================
+ Hits        45739    45766      +27     
- Misses       4378     4399      +21
Flag Coverage Δ
#multiple 89.04% <100%> (-0.02%) ⬇️
#single 40.33% <75.29%> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/core/panel.py 97% <100%> (+0.04%) ⬆️
pandas/core/panel4d.py 100% <100%> (ø) ⬆️
pandas/core/generic.py 92.44% <100%> (-0.09%) ⬇️
pandas/core/frame.py 97.75% <100%> (-0.1%) ⬇️
pandas/util/_decorators.py 80.7% <100%> (+2.7%) ⬆️
pandas/util/_validators.py 96.34% <100%> (+2.59%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7aeccd3...edcbb9a. Read the comment docs.

@jorisvandenbossche
Copy link
Member

In principle this is fine for me (anyhow, the solution to interpret it as positional index=, columns= but warning is good), but have the feeling it is more complex as needed?
In the case the user passes positional arguments index, columns, this is captured as labels=index, index=columns, but specifying both labels as index is not valid? So normally we should raise for that case, but can't we catch this situation, reinterprete and do the warning as in this PR? Or am I missing something?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Oct 24, 2017 via email

@jorisvandenbossche
Copy link
Member

Ah, yes, that's a case I didn't think of. Personally, I would go for accepting that minor disadvantage to have the simpler code. The question then maybe is if we would make the warning stronger in mentioning that this will be changed in the future (actually deprecate instead of a UserWarning), so that in the future we can actually raise for the labels=foo, index=bar case


sig = inspect.Signature(params)

func.__signature__ = sig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool that u can do this

axes = self._validate_axis_style_args(
labels, 'labels', axes=[items, major_axis, minor_axis],
axis=axis, method_name='reindex')
def reindex(self, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might actually make this more complicated, but would be ok with leaving panel/panel4d as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I briefly tried that, but had to add several if obj.ndim >= 3 inside internals. Got a bit messy.

@TomAugspurger
Copy link
Contributor Author

Ah, yes, that's a case I didn't think of. Personally, I would go for accepting that minor disadvantage to have the simpler code

Hmm I'm having trouble making the simple way (no *args, **kwargs) work. IIUC, then all three of these have to be handled the same way (this is master):

In [15]: df.rename(None, str.lower)  # <----- This is the problem one
Out[15]:
     A    B
a  NaN  NaN
b  NaN  NaN

In [16]: df.rename(None, index=str.lower)
Out[16]:
     A    B
a  NaN  NaN
b  NaN  NaN

In [17]: df.rename(mapper=None, index=str.lower)
Out[17]:
     A    B
a  NaN  NaN
b  NaN  NaN

We want to treat [15] differently, since that is (presumably) using the old index/columns style.

@TomAugspurger
Copy link
Contributor Author

The question then maybe is if we would make the warning stronger in mentioning that this will be changed in the future

Once we're python3 only, we could using keyword-only arguments I think:

def rename(mapper=None, *, index=None, columns=None, axis=None, ...):

I should be able to emit a warning whenever len(args) > 1, and in the future we can have a regular signature and remove all the hacks in this PR.

@jorisvandenbossche
Copy link
Member

IUC, then all three of these have to be handled the same way (this is master):

I personally don't mind doing that (just all treating them as positional to deal with [15]). There will be a warning in any case, so people doing one of those three cases will typically change their code to not have the warning anymore.
In the case of df.rename(mapper=None, index=str.lower) the warning will not be fully accurate, but then, this is really a very strange thing to do.

@TomAugspurger
Copy link
Contributor Author

I'd be fine with raising on In[17], since if they passed anything other than None we would already raise. But I think In[16] is perfectly fine. We would be reinterpreting that as df.rename(index=None, columns=str.lower), or just .rename(columns=str.lower).

Let me try using different defaults other than None to see if that helps at all...

@jorisvandenbossche
Copy link
Member

@TomAugspurger if it is too complicated, I am also fine with this though! It's working, and we can later on remove the hacks again.

@TomAugspurger
Copy link
Contributor Author

FWIW, I find this version of _validate_axis_style_kwargs more understandable, but presumably because it's more recent :) My last commit add a docstring / examples, so hopefully that helps.

@jorisvandenbossche
Copy link
Member

To be clear, I mainly meant the signature hacking, I didn't look at the changes in _validate_axis_style_kwargs (since it moved it is a bit difficult to see what changed)

@TomAugspurger
Copy link
Contributor Author

Ah, yeah. FWIW, I'm 51/49 on whether overwriting the signature is a good thing or not. It's going to confuse static type checkers anyway. I'm OK with it for now since it'll be removed once we're Python3 only.

@TomAugspurger TomAugspurger changed the title [WIP]Rename signature fix Rename signature fix Oct 25, 2017
@TomAugspurger
Copy link
Contributor Author

OK, this should be all good.

warnings.warn(msg.format(method_name=method_name,), FutureWarning,
stacklevel=4)
out[self._AXIS_NAMES[0]] = args[0]
out[self._AXIS_NAMES[1]] = args[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would really love to move this out of generic (and just pass self into a static function, maybe into pandas.compat.validations or something).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used by reindex/rename, and now is sits next to those functions. IMO it is easier to keep it that way.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now also looked at the rewritten validate_axis_args method. Looks good!

elif len(args) == 2:
if 'axis' in kwargs:
# Unambiguously wrong
msg = "Cannot specify both {} and any of 'index' or 'columns'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be "Cannot specify both axis and index / columns" ?(as you check that axis is specified)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and maybe ' ' around {})

warnings.warn(msg.format(method_name=method_name,), FutureWarning,
stacklevel=4)
out[self._AXIS_NAMES[0]] = args[0]
out[self._AXIS_NAMES[1]] = args[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used by reindex/rename, and now is sits next to those functions. IMO it is easier to keep it that way.

@TomAugspurger
Copy link
Contributor Author

@jreback are you OK with it in generic? I use self to figure out whether we're a Panel or a DataFrame, and it's only used on those two methods, so I think it's OK where it is.

@TomAugspurger TomAugspurger mentioned this pull request Oct 26, 2017
64 tasks
@jreback
Copy link
Contributor

jreback commented Oct 26, 2017

@jreback are you OK with it in generic? I use self to figure out whether we're a Panel or a DataFrame, and it's only used on those two methods, so I think it's OK where it is.

I don't really like it, because this is really validation code.

If you could move it would be much better (and forr sure pass in self as the arg)

@TomAugspurger
Copy link
Contributor Author

Moved to utils/_validators.py.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jorisvandenbossche jorisvandenbossche merged commit 3863063 into pandas-dev:master Oct 27, 2017
@TomAugspurger TomAugspurger deleted the rename-sig-fix branch October 27, 2017 12:04
peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API break with DataFrame.rename
3 participants