Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: masked ops for reductions (sum) #30982
PERF: masked ops for reductions (sum) #30982
Changes from 18 commits
9795f35
7fb1f88
cd26920
28cd331
6298fbd
735a741
6df454f
5eb48d6
3ef1331
d2230fd
19ac821
2776436
68b4dc2
18d5bfa
457efb1
4df858f
f5120db
76c5149
476f768
b2162dc
d4746f5
d9c2cbf
f8705c2
1a43e10
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid overriding a builtin-in name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care? In practice this is only used as
masked_reductions.sum(..)
, so there is no risk of confusing it (eg numpy also has a sum ..).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't have a problem with renaming it to eg
masked_sum
, though (but all together it then becomes a bit longmasked_reductions.masked_sum(..)
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nansum
? Not a strong enough opinion to be a blocker.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no nan's involved here ;), that's exactly the difference with nanops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha, sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required >=0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a count, so yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche This seems like a useful pattern for other reductions like min and max (maybe a special case is needed if we have a
min_count
?), wondering if it might make sense to generalize this to those? Could be more performant than usingnanops
, and could also be done in a way that's extensible to string dtype (wherenanops
can't be used)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's indeed the goal of this PR (I just wanted to limit it to a single function for the initial PR). For min/max, I think it will be better to write a separate function in module since they don't use
min_count
, butprod
should be able to reuse this.I was planning to do follow-up PRs the and of this week, but feel free to pick it up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does
Tuple[int]
describe a 1-tuple, or any-length-tuple?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, apparently that needs to be
Tuple[int, ...]
for a variable length tuple