Levels, setindex!, copyto! and vcat #258
Comments
I had mixed opinions (and rewrote this comment several times, in particular I considered the rule But in the end I concluded Drop 2 should be recommended. The reasoning would be:
But frankly - I am not sure what is the best default. |
Yeah, that's really a difficult choice. The main problem with Drop 2 is what I have put in italics:
I hadn't realized this at first, but this means that even things like |
I know. But the other option is to keep 4. Can you remind me why it is bad? Maybe we should go back and accept 4. Then probably we should deprecate The only drawback of |
What do you mean by "keep 4"? |
Ah - sorry. "keep 4" means - do not change how we handle property 4 (i.e. to keep " What I mean is to make it more convenient to "unwrap"
|
OK. Makes sense, but |
This was my fear. Let us then stick to |
The behavior of `setindex!`, `copyto!` and `vcat` regarding levels is tricky to get right as we have conflicting goals (below "level" also implies "orderedness"):

1. `copyto!(similar(x), x)` should have the same levels as `x` (including unused levels)
2. `x2 = similar(x); foreach(i -> x2[i] = x[i], eachindex(x, x2))` should be equivalent to `copyto!`
3. `vcat(x, y)` should be equivalent to `z = similar(x, length(x)+length(y)); copyto!(z, x); copyto!(z, length(x)+1, y, 1, length(y))`
4. `setindex!(x::CategoricalArray, v::CategoricalValue, ...)` should only affect the assigned value and not add other levels

Currently we ensure 1, 2* and 3, but not 4: `setindex!` merges source `CategoricalValue` levels with destination levels. This may be surprising or inconvenient, for example if you merge a variable holding current occupation with another variable holding last occupation for the unemployed, which also has extra levels like "Never had a job". Another behavior which may be weird is that `setindex!` may insert new levels in the middle and not only at the end, which forces recomputing all reference codes and invalidates existing `CategoricalValue` objects.

If we wanted to ensure property 4 (`setindex!`), we would have to either:

- `setindex!` on each entry would not copy unused levels, but `copyto!` would. This shouldn't be too problematic in practice, though it introduces an inconsistency. Note that for performance, we would probably also have to add levels at the end rather than trying to merge orders, to avoid recoding all references each time a new value is encountered. This would have the major drawback that levels would be in their order of appearance in the data.
- `copyto!` would copy levels when the destination array has no levels, but otherwise only add levels for values that appear in the data. This would have the advantage that `setindex!` could do the same, so that we ensure property 2.
- `copyto!` would drop unused levels. In that case, `vcat` should do the same to ensure 3, which sounds even more problematic.

(* Since #253 `copyto!` also sets levels when copying zero elements, which differs from calling `setindex!` zero times. So there's already an inconsistency.)
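To make the trade-off around property 4 concrete, here is a minimal sketch using a toy model written in plain Julia. `ToyCat`, `ToyValue`, `value`, `code!`, `set_merge!` and `set_single!` are all hypothetical names invented for this illustration; none of them is part of CategoricalArrays.jl, and the toy always appends new levels at the end instead of merging level orders. It only contrasts "merge all source levels on assignment" with "add only the assigned level".

```julia
# Toy stand-in for a categorical array: integer codes pointing into a level list.
# Hypothetical type for illustration, NOT the CategoricalArrays.jl implementation.
mutable struct ToyCat
    levels::Vector{String}
    codes::Vector{Int}      # 0 means "not assigned yet"
end

# Empty array of length n with no levels (roughly what `similar` would give).
ToyCat(n::Integer) = ToyCat(String[], zeros(Int, n))

# A value remembers the full level list of the array it was read from.
struct ToyValue
    level::String
    pool::Vector{String}
end

value(x::ToyCat, i) = ToyValue(x.levels[x.codes[i]], copy(x.levels))

# Return the code of `lvl`, appending it at the end if it is missing.
function code!(x::ToyCat, lvl::String)
    k = findfirst(==(lvl), x.levels)
    if k === nothing
        push!(x.levels, lvl)
        k = length(x.levels)
    end
    return k
end

# Assignment that copies every level of the source value (property 4 violated).
function set_merge!(x::ToyCat, v::ToyValue, i)
    foreach(l -> code!(x, l), v.pool)
    x.codes[i] = code!(x, v.level)
    return x
end

# Assignment that adds only the assigned level (property 4 holds).
function set_single!(x::ToyCat, v::ToyValue, i)
    x.codes[i] = code!(x, v.level)
    return x
end

src = ToyCat(["a", "b", "c"], [1, 3])   # "b" is an unused level

merged = ToyCat(2)
foreach(i -> set_merge!(merged, value(src, i), i), 1:2)
merged.levels   # ["a", "b", "c"]: the unused level "b" is carried over

single = ToyCat(2)
foreach(i -> set_single!(single, value(src, i), i), 1:2)
single.levels   # ["a", "c"]: levels in order of appearance, "b" is gone
```

In this toy, the element-wise loop with `set_merge!` ends up with the same levels as the source, which is what property 2 asks for when `copyto!` keeps unused levels. The loop with `set_single!` satisfies property 4 but loses the unused level and orders levels by first appearance, which is the drawback noted for the first option.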