-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement string
, boolean
and binary
dtype in top_k
#15488
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15488 +/- ##
==========================================
- Coverage 81.14% 81.13% -0.01%
==========================================
Files 1362 1362
Lines 174951 175115 +164
Branches 2533 2532 -1
==========================================
+ Hits 141960 142083 +123
- Misses 32508 32551 +43
+ Partials 483 481 -2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, but this is not a valid solution. The whole point of top_k
is that it is more efficient than sort
followed by head
.
If we want to support more datatypes in top_k
, we should implement the algorithm for those datatypes. Booleans can have a dedicated fast implementation because we know there are only two possible values. For strings, I don't know the correct approach.
Series.sort
in top_k
string
, boolean
and binary
dtype in top_k
Thanks for review. I've rewrite the pr. |
Co-authored-by: Ritchie Vink <ritchie46@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there. Only one remark on the string path.
string
, boolean
and binary
dtype in top_k
string
, boolean
and binary
dtype in top_k
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Going in. Thank you!
This allows more dtype to use
top_k
which only supported numeric dtypes before.