-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add ==, isequal <, and isless for DataFrameRow and GroupKey #2669
Conversation
Ah - and it is mildly breaking as we now define This means in particular that this hash is different than |
We probably don't care about this since it's internal, right? |
Yes - I have checked that we use rowhash only in one place in the code so it should be OK. |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Also |
OK - I hope I fixed everything :). This was hard (but hopefully now Julia Base will be also more consistent with |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@pdeffebach - this is what we concluded is a good design. Could you please comment on it before merging (as you were against it)? |
Sounds good! I'm fine with this. |
src/dataframerow/utils.jl
Outdated
# table columns are passed as a tuple of vectors to ensure type specialization | ||
rowhash(cols::Tuple{AbstractVector}, r::Int, h::UInt = zero(UInt))::UInt = | ||
hash(cols[1][r], h) | ||
function rowhash(cols::Tuple{Vararg{AbstractVector}}, r::Int, h::UInt = zero(UInt))::UInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is interesting that we do not have this covered by tests. I will add some.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it turns out we do not use these functions. I have removed them. @nalimilan - could you please have a look if we would ever need them? (I have never used them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT one guy removed the last use of findrow
and group_rows
in a recent PR. :-D #2641
It's weird that coverage didn't notice this. Maybe we only looked at the changed code and not at unrelated parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were tests that checked correctness of internal functions.
ngroups, rhashes, gslots, sorted = | ||
row_group_slots(ntuple(i -> df[!, i], ncol(df)), Val(true), groups, false, false) | ||
rperm, starts, stops = compute_indices(groups, ngroups) | ||
return RowGroupDict(df, rhashes, gslots, groups, rperm, starts, stops) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think RowGroupDict
can also be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed - I am running tests to double check. It is astonishing how much we have reworked internally in this release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - all seems clean. I will merge the PR after CI passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this file contains only grouping code, we will be able to move it to the corresponding folder and rename it without touching anything else. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK will move it in this PR as otherwise we will forget. Also nonunique
uses it, but I think it is not a problem.
Thank you! |
This allows
right? |
Right, and even more e.g.:
|
Fixes #2639.
As usual, before I add tests and update documentation please have a look at the implementation.
It would be good to have a decision on JuliaLang/julia#40142 before finalizing this PR.
In particular we fix a bug according to which two
DataFrameRow
weretrue
in==
if they were the same row from the same data frame even if they containedmissing
(missing
should be produced then).