Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pivot_table very slow on Categorical data; how about an observed keyword argument? #24923 #24953
pivot_table very slow on Categorical data; how about an observed keyword argument? #24923 #24953
Changes from 23 commits
d1554c2
0662fa3
6121313
9f93ab9
ebe5972
416e9c8
5c62063
8663be2
a1e3afe
22637a3
088f277
672847b
d97a077
9de99fa
9a9569f
c8e085d
2516386
0efeed8
13168d2
8518833
58a8f6e
12b8fac
09af30b
6df9e6d
a23b5d0
8d50e85
3d39dff
ee696d9
12c0f82
f586e42
cf7e8f5
a3bcf1a
bb7cfef
5921646
3c1720c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
versionadded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have time in here? these types of things are handled by the asv suite. i would remove this entire test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure how to otherwise address the test requirements @WillAyd highlighted. The only measurable outcome that observed changes is the time of execution. Have now removed.
Happy to listen to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have an asv result, yes? IOW if you timeit under master and then under the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjaminr right I wasn't asking for a test on timing as the ASV will handle that. I was asking for a test where the data makes a difference when this argument is supplied.
Right now the existing test gives the same result whether or not observed is True or False, so it doesn't test that adding this actually does anything for the result explicitly. Can you not come up with a test and data where the keyword would yield different results and test that explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added an additional benchmark for this to asv_bench/benchmarks/reshape.py in 5c62063.
I've just added an additional benchmark where observed would default to False, with no kwarg being passed.
Master
Fails because the kwarg doesn't exist in the master branch yet.
feature/pivot_table_groupby_observed
Speed increase is demonstrated here.