fix aliasing detection in sort! #2981
Maybe one more note. By doing stripping of
is now an error (so we now disallow non-identical column aliasing in the parent data frame, not only in the view; I think this is OK and safer). |
Good catch. So the problem mentioned in the description is due to this, right? I'm not sure whether that's intended or not in Base (I see a comment "# We cannot do any better than the usual dataids check" there).
julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3
julia> Base.mightalias(view([1, 2, 3], x), view([1, 2, 3], x))
true
The problem in your last comment seems to be more general, and not even specific to DataFrames: aliased vectors can give weird results on many occasions. Is |
For views, Base checks not only whether the underlying data aliases but also whether the vectors specifying the subset of rows to be picked alias. So two views of completely different vectors that share an indexing vector are considered to be aliased.
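To make the point about indexing vectors concrete, here is a small sketch (the variable names `idx`, `a`, and `b` are mine, chosen only for illustration). It contrasts calling `Base.mightalias` on two views with calling it on their parents:

```julia
# Two views over *different* parent vectors that share one index vector.
idx = [1, 2, 3]
a = view([10, 20, 30], idx)
b = view([40, 50, 60], idx)

# Base also compares the index vectors of the views, so the shared `idx`
# is enough for the views to be reported as possibly aliased.
views_alias = Base.mightalias(a, b)                    # true

# The parents are distinct arrays, so comparing them reports no aliasing.
parents_alias = Base.mightalias(parent(a), parent(b))  # false
```

Comparing parents avoids the false positive caused by the shared index vector.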
We do not define For
On the other hand, we could drop this aliasing check altogether, as the problem can happen only in very rare cases (someone puts two views of the same vector with different offsets). This would save us from doing a check whose cost is quadratic in the number of columns. |
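For reference, the quadratic cost comes from comparing every pair of columns. The following is only a rough sketch of such a pairwise check, not the code in DataFrames.jl; `has_nonidentical_alias` is a hypothetical helper name, and `cols` is assumed to be a vector of column vectors:

```julia
# Hypothetical helper: returns true if any two columns that are not the
# very same object might share memory. Cost is O(n^2) in the number of columns.
function has_nonidentical_alias(cols)
    for i in eachindex(cols), j in i+1:lastindex(cols)
        ci, cj = cols[i], cols[j]
        ci === cj && continue                 # identical columns are handled separately
        Base.mightalias(ci, cj) && return true
    end
    return false
end

v = [1, 2, 3, 4]
# The rare case described above: two views of the same vector with different offsets.
offset_views = has_nonidentical_alias([view(v, 1:3), view(v, 2:4)])
# Independent copies do not alias.
independent = has_nonidentical_alias([v, copy(v)])
```

Dropping the check removes this nested loop at the price of not catching the offset-views case.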
Yes, ideally we would have a way to check only whether the contents alias, ignoring indices. But that's probably not easy.
We still have to check whether some columns are equal, but that's probably less costly to do. So maybe we could drop the |
This is what we essentially do now (as we are comparing parents)
This is what I will do and will make a note in release notes. |
@nalimilan - I have fixed the approach. Now we only consider columns passing |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@nalimilan - can you please have a look again at this PR before I merge it? I have marked several tests as broken (previously they threw an error because of more aggressive aliasing detection; now they pass producing a wrong result) |
Thank you! |
The problem in DataFrames.jl 1.3.1 is:
This is caused by an incorrect aliasing detection rule.
This PR fixes this.