ENH: Have pivot and pivot_table take similar arguments #5505
If you want to propose something we'd likely be open to it (depending on
if rows is None:
    rows = kwarg['index']
else:
    raise TypeError("Can only specify either 'rows' or 'index'")
I think a better way to do this (without all the for-loop stuff) is:
index = kwargs.get('index', None)
if index is not None:
    if rows is None:
        rows = index
    else:
        raise TypeError("Can only specify either 'rows' or 'index'")
could index be an ndarray?
ha, yes, should be "is not None"
@jtratner I'm sorry, but I don't follow how index being an ndarray breaks things.
@jsexauer "if something" does bool(something) under the hood, and ndarrays with more than one element can't be converted to bool:
>>> bool(np.array([1, 2, 3, 4]))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
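The difference between the two checks can be reproduced in a few lines. This is a minimal sketch, assuming NumPy is installed; the helper name describe is hypothetical and is not part of the PR.

```python
import numpy as np


def describe(index):
    """Classify `index` with an identity check, which is safe for arrays."""
    # `is not None` compares object identity and never asks the array
    # for a truth value.
    if index is not None:
        return "given"
    return "missing"


arr = np.array([1, 2, 3, 4])

print(describe(arr))   # given
print(describe(None))  # missing

# A bare `if arr:` calls bool(arr), which NumPy refuses for arrays with
# more than one element.
try:
    if arr:
        pass
except ValueError as exc:
    print("ValueError:", exc)
```

The identity check is the safer default whenever an argument may legitimately be an array, a list, or a single label.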
@hayd I think the flaw with your approach is that if the user passes an incorrect keyword argument, no error is raised. It seems to me we would want to raise a TypeError in such an event. This is especially important, I feel, for pivot_table(), where for example someone might pass agg_func instead of aggfunc and get frustrated wondering why the function is returning apparently incorrect results.
@jtratner The following experiment works for me:
>>> if np.array([1, 2]) is None:
...     print('None')
... else:
...     print('Not None')
...
Not None
Make it a pop rather than a get then check the length of kwargs is 0. (if kwargs: raise...)
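The pop-then-check idea can be sketched as follows. This is only an illustration under stated assumptions: resolve_rows is a hypothetical helper, not the code that was merged, and the wording of the second error message is invented here.

```python
def resolve_rows(rows=None, **kwargs):
    """Hypothetical helper: accept `index` as an alias for `rows`."""
    # pop() removes the alias, so anything still left in kwargs afterwards
    # is a keyword this function does not understand.
    index = kwargs.pop("index", None)
    if index is not None:
        if rows is not None:
            raise TypeError("Can only specify either 'rows' or 'index'")
        rows = index
    if kwargs:
        unexpected = ", ".join(sorted(kwargs))
        raise TypeError("Unexpected keyword argument(s): " + unexpected)
    return rows
```

With this shape, a misspelling such as agg_func fails loudly with a TypeError instead of being silently ignored.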
I guess the problem is that the API (order of args) won't be changed, so it's not really going to ever fully fix... :(
@hayd hopefully my last commit addresses your concerns.
A remark: would it make sense to choose which of the keywords (col/columns and rows/index) is preferential and only provide those to the other function (and not have both options in both functions)?
@jsexauer does this have an issue? (aside from this PR)
Not sure what you're asking @jreback. The included PR, which has no issues to my knowledge, fixes the issue regarding a discrepancy between the parameters for
am asking if there is a reference issue for this (e.g. a prior issue); guess not
I submitted this originally as an issue ticket but changed into a PR ticket when I coded the enhancement.
    msg = "Can only specify either 'cols' or 'columns'"
    raise TypeError(msg)

if len(kwarg) > 0:
pythonic tip - you can just use "if kwarg" (my bad for saying otherwise)
This has tripped me in the past, so I think I'm +1 on this.
I've implemented @hayd 's suggestions.
(waiting til after 0.13 is final, but then it would need something in release notes and some tests, sorry I hadn't realised there were no tests 'til just now!)
I want to repeat my remark (as there was not really a response). Isn't it possible to really make it more consistent instead of just providing both options to both functions as additional keyword arguments?
+1 on Joris's suggestion. Maybe attach a deprecation warning to use of cols? columns and index would be the most consistent with the rest of pandas, though rows is OK too (definitely not cols, though).
I think lets break
should probably deprecate old args too.
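One way to deprecate the old names while keeping them working is sketched below. It is an assumption-laden illustration, not the merged implementation: pivot_like and _resolve are hypothetical names, and the warning text is invented for this example.

```python
import warnings


def _resolve(new_value, old_value, new_name, old_name):
    """Hypothetical helper: prefer the new keyword, warn on the old one."""
    if old_value is None:
        return new_value
    # stacklevel=3 attributes the warning to the caller of pivot_like.
    warnings.warn(
        "'%s' is deprecated, use '%s' instead" % (old_name, new_name),
        FutureWarning,
        stacklevel=3,
    )
    if new_value is not None:
        raise TypeError(
            "Can only specify either '%s' or '%s'" % (old_name, new_name)
        )
    return old_value


def pivot_like(index=None, columns=None, rows=None, cols=None):
    """Hypothetical stand-in for a function whose keywords were renamed."""
    index = _resolve(index, rows, "index", "rows")
    columns = _resolve(columns, cols, "columns", "cols")
    return index, columns
```

Callers using index and columns see no warning, while callers still passing rows or cols get a FutureWarning for each old keyword.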
Assuming this passes CI, I believe this addresses everyone's concerns. Sorry for the delay.
    self.assertIsInstance(w[0], FutureWarning)
    self.assertIsInstance(w[1], FutureWarning)

with warnings.catch_warnings(record=True) as w2:
use tm.assert_produces_warning here (same thing just more consistent)
Ok good to know. Sorry you're going to get my full commit history then. I will do this in the future.
But you can do it now, that is no problem.
God I hope I did this right...
Hmm, you have one commit (the third last one) that seems to be a squashed one, but the original commits are not gone, so clearly something went wrong, but I don't know what. If git rebase -i does not work, another option is to create a new branch and cherry-pick that one commit that is the good one, something like:
and then push this branch (to a new PR)
So I deleted branch fix5505 on my local repo, and created a new version which was a squashed rebase of commit e3dad66 onto upstream/master (which is commit 0d74287). I then tried to push this new version of branch fix5505 onto jsexauer/pandas/fix5505 but it said I had to pull down the old version first, so I made a new empty commit [2409c0e] to make it happy. If there's a better way to do this last step, let me know, but this was the best I could figure out short of opening a new PR. If you'd prefer I did that I will, but it seems like it would be best to keep this mess in one ticket instead of spreading the wealth :)
You have to force push at the end (as you've changed the history); you've now done it as a merge. Best advice is to create a new branch off master and cherry-pick as Joris describes, then force push.
Bam, looks good I think?
I'm really at a loss, as the test it fails on works for me:
@jsexauer In any case, the rebase looks good now! One clean commit. Regarding the test warning, this is because the warning is already triggered somewhere else, namely in the TestCrosstab tests in the same file (and only triggered once). You should also replace all rows/cols with index/columns there, as well as in e.g. the function crosstab (which is based on pivot_table).
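The "only triggered once" behaviour can be illustrated with the standard library alone. This is a rough sketch on a current Python 3, not a reproduction of the failing pandas test; deprecated_call and count_warnings are hypothetical names.

```python
import warnings


def deprecated_call():
    """Hypothetical function that emits a deprecation-style warning."""
    warnings.warn("rows is deprecated, use index", FutureWarning, stacklevel=2)


def count_warnings(action):
    """Count how many warnings are recorded under the given filter action."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(2):
            deprecated_call()  # same call site on both iterations
    return len(caught)


# "default" reports a given warning once per call site, so the repeat from
# the same line is suppressed.
print(count_warnings("default"))
# "always" reports every occurrence, which is what a test that asserts on
# warnings generally wants.
print(count_warnings("always"))
```

A test that asserts on a warning is therefore more robust if it forces the "always" filter inside its own catch_warnings block rather than relying on being the first caller.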
Ok I'll do that when I get home from work today. Would you like that squashed into the previous commit or as a new one?
You can try to squash it at once, see if it works now .. :-) If you have further git questions, just ask!
Thank you all for your patience in helping me figure out how to do this within your process. Unless someone indicates otherwise, I believe this PR is ready for merging.
And thank you for the contribution! Travis is happy now. One last thing: this should also be added to the release notes, under API changes I think. You can add a short line mentioning this in both https://github.com/pydata/pandas/blob/master/doc/source/v0.14.0.txt and https://github.com/pydata/pandas/blob/master/doc/source/release.rst.
Updated
Looks good. @jorisvandenbossche merge when you are ready
Looks good! Thanks a lot @jsexauer
ENH: Have pivot and pivot_table take similar arguments
Right now, there is
and
Because of dirtiness in my data I occasionally have to go back and forth between the two forms (and use df.pivot_table(..., aggfunc='count') to see who had the bad data). It would be nice if I didn't have to change the keyword arguments each time and one had an alias of the other. It would also be easier to remember, especially in the trivial case of cols vs columns. Finally, the two are related operations and having an option for consistent keyword parameters makes sense.
I could make a PR to solve this easily enough, if this is an enhancement you are interested in. I'm not really sure what your stance is on redundant keyword arguments.
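The workflow described above can be sketched with the keyword names that pandas uses today (index, columns, values). The data, column names, and the drop_duplicates cleanup step are invented for illustration, and the sketch assumes a reasonably recent pandas.

```python
import pandas as pd

# Made-up data: "bob" has two conflicting "city" rows.
df = pd.DataFrame({
    "person": ["ann", "ann", "bob", "bob", "bob"],
    "field": ["age", "city", "age", "city", "city"],
    "value": ["31", "Oslo", "45", "Rome", "Roma"],
})

# pivot_table aggregates, so counting values per cell shows who has
# duplicate entries.
counts = df.pivot_table(index="person", columns="field",
                        values="value", aggfunc="count")
print(counts)

# Once the duplicates are dealt with (here: keep the first row), the same
# keyword names work unchanged with pivot.
clean = df.drop_duplicates(subset=["person", "field"])
wide = clean.pivot(index="person", columns="field", values="value")
print(wide)
```

Because both calls share the same keywords, switching between the aggregating and non-aggregating form only means changing the method name and dropping aggfunc.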