bug: topk() always says a column contains 0 NULLs #8518
Comments
This is expected behavior due to the [...] If you want to count [...]
tk = t.x.topk(10, by=t.count())  # or by=_.count()
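For readers following along, the difference between counting a column and counting rows can be seen in plain SQL. The sketch below uses Python's built-in sqlite3 rather than ibis or DuckDB, and the table `t` with its four rows is made up for illustration. It assumes the default topk() metric behaves like COUNT(x), which this thread suggests but does not spell out.

```python
import sqlite3

# Hypothetical data: two "a", one "b", and one NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("a",), ("a",), ("b",), (None,)])

# COUNT(x) skips NULL values, so the NULL group reports 0.
per_column = dict(con.execute("SELECT x, COUNT(x) FROM t GROUP BY x").fetchall())

# COUNT(*) counts rows, so the NULL group reports 1.
per_row = dict(con.execute("SELECT x, COUNT(*) FROM t GROUP BY x").fetchall())

print(per_column[None], per_row[None])
```

The non-NULL groups get the same count either way; only the NULL group differs.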
Could we consider changing what is expected behavior to what I expect? I could see two main uses for topk():
Messing up the expectations for use case 1 reveals itself much earlier than for use case 2. With case 2, your error might lurk for months without you realizing it. I was in this second boat. If we keep the current behavior, then we definitely should remove the NULL: 0 row from the output, since that is actively wrong.
Whatever we decide, value_counts() should also have the same semantics
Hm, yeah the NULL is confusing in the output. I see what you mean. I think it makes sense to change this to be
@cpcloud just to be sure you saw it, I edited my comment with some more info/context
Thanks, got it! I think avoiding a loss of information is probably the way to go here. You can't recover uncounted nulls, whereas you can always discard the counted ones after the fact.
related: #8540
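The information-loss argument can be illustrated without any backend. This is a sketch in plain Python, with `None` standing in for NULL and a made-up list of values; it is not how ibis implements topk() or value_counts().

```python
from collections import Counter

# Hypothetical column values; None plays the role of NULL.
values = ["a", "a", "b", None]

# Counting every row keeps the NULL group in the result.
counts_with_nulls = Counter(values)

# Discarding the NULL group afterwards is a simple filter on that result.
counts_without_nulls = {
    key: count for key, count in counts_with_nulls.items() if key is not None
}

print(counts_with_nulls[None], counts_without_nulls)
```

Going the other way is not possible: once NULLs have been left out of the count, `counts_without_nulls` no longer says how many there were.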
What happened?
I would expect this to say that there is 1 NULL:
What version of ibis are you using?
main
What backend(s) are you using, if any?
duckdb
Relevant log output
No response