diff --git a/docs/src/implementation.md b/docs/src/implementation.md index b429703f..c04d283f 100644 --- a/docs/src/implementation.md +++ b/docs/src/implementation.md @@ -2,25 +2,12 @@ `CategoricalArray` is made of the two fields: -- `refs`: an integer array that stores the position of the category level in the `index` field of `CategoricalPool` for each `CategoricalArray` element; `0` denotes a missing value (for `CategoricalArray{Union{T, Missing}}` only). +- `refs`: an integer array that stores the position of the category level in the `levels` field of `CategoricalPool` for each `CategoricalArray` element; `0` denotes a missing value (for `CategoricalArray{Union{T, Missing}}` only). - `pool`: the `CategoricalPool` object that maintains the levels of the array. -!!! warning +The `CategoricalPool{V,R,C}` type keeps track of the levels of type `V` and associates them with an integer reference code of type `R` (for internal use). It offers methods to add new levels, and efficiently get the integer index corresponding to a level and vice-versa. Whether the values of `CategoricalArray` are ordered or not is defined by an `ordered` field of the pool. Finally, `CategoricalPool{V,R,C}` keeps a `valindex` vector of value objects of type `C == CategoricalValue{V, R}`, so that `getindex` can return the existing object instead of allocating a new one. - Integer codes in the `x.refs` field *cannot* be used to index into the vector returned - by `levels(x)`. These codes refer to the position in the *index*, which can be accessed - using `CategoricalArrays.index(x.pool)`. That is, - `CategoricalArrays.index(x.pool)[x.refs] == x` always holds, but - `levels(x.pool)[x.refs] == x` is *not* correct in general. To obtain the position in - `levels(x)` of entries in `x`, use `CategoricalArrays.order(x.pool)[x.refs]`. - - The reason for this subtlety is that it allows changing the order of levels without - having to reset all the underlying integer codes. 
This is especially useful for the - `CategoricalArray(::AbstractArray)` constructor, which needs to assign new codes as - new levels are encountered, potentially conflicting with the default ordering of - levels (based on `sort`). - -The `CategoricalPool{V,R,C}` type keeps track of the levels of type `V` and associates them with an integer reference code of type `R` (for internal use). It offers methods to set the levels, change their order while preserving the references, and efficiently get the integer index corresponding to a level and vice-versa. Whether the values of `CategoricalArray` are ordered or not is defined by an `ordered` field of the pool. Finally, `CategoricalPool{V,R,C}` keeps a `valindex` vector of value objects of type `C == CategoricalValue{V, R}`, so that `getindex` can return the existing object instead of allocating a new one. +Do note that `CategoricalPool` levels are semi-mutable: it is only allowed to add new levels, but never to remove or reorder existing ones. This ensures existing `CategoricalValue` objects remain valid and always point to the same level as when they were created. Therefore, `CategoricalArray`s create a new pool each time some of their levels are removed or reordered. This happens when calling `levels!`, but also when assigning a `CategoricalValue` via `setindex!`, `push!`, `append!`, `copy!` or `copyto!` (as new levels may be added to the front to preserve relative order of both source and destination levels). Doing so requires updating all reference codes to point to the new pool, and makes it impossible to compare existing ordered `CategoricalValue` objects with values from the array using `<` and `>`. The type parameters of `CategoricalArray{T, N, R <: Integer, V, C, U}` are a bit complex: - `T` is the type of array elements without `CategoricalValue` wrappers; if `T >: Missing`, then the array supports missing values. 
@@ -32,6 +19,4 @@ The type parameters of `CategoricalArray{T, N, R <: Integer, V, C, U}` are a bit Only `T`, `N` and `R` could be specified upon construction. The last three parameters are chosen automatically, but are needed for the definition of the type. In particular, `U` allows expressing that `CategoricalArray{T, N}` inherits from `AbstractArray{Union{C, U}, N}` (which is equivalent to `AbstractArray{C, N}` for arrays which do not support missing values, and to `AbstractArray{Union{C, Missing}, N}` for those which support them). -The `CategoricalPool` type is designed to limit the need to go over all elements of the vector, either for reading or for writing. This is why unused levels are not dropped automatically (this would force checking all elements on every modification or keeping a counts table), but only when `droplevels!` is called. `levels` is a (very fast) O(1) operation since it merely returns the (ordered) vector of levels without accessing the data at all. - -Another useful feature is that integer indices referring to levels are preserved when adding or reordering levels: the order of levels exposed to the user by the `levels` function does not necessarily match these internal indices, which are stored in the `index` field of the pool. This means a reordering of the levels is also an O(1) operation. On the other hand, deleting levels may change the indices and therefore requires iterating over all elements in the array to update the references. +The `CategoricalPool` type is designed to limit the need to go over all elements of the vector, either for reading or for writing. This is why unused levels are not dropped automatically (this would force checking all elements on every modification or keeping a counts table), but only when `droplevels!` is called. `levels` is a (very fast) O(1) operation since it merely returns the (ordered) vector of levels without accessing the data at all. 
\ No newline at end of file diff --git a/docs/src/using.md b/docs/src/using.md index f6304589..d6e1a870 100644 --- a/docs/src/using.md +++ b/docs/src/using.md @@ -193,7 +193,116 @@ julia> levels!(y, ["Young", "Middle"]; allow_missing=true) ``` -## Working with categorical arrays +## Combining levels + +Some operations imply combining levels of two categorical arrays: this is the case when concatenating arrays (`vcat`, `hcat` and `cat`) and when assigning a `CategoricalValue` from another categorical array. + +For example, imagine we have two sets of observations, one with only the younger part of the population and one with the older part: +```jldoctest using +julia> x = categorical(["Middle", "Old", "Middle"], ordered=true); + +julia> y = categorical(["Young", "Middle", "Middle"], ordered=true); + +julia> levels!(y, ["Young", "Middle"]); +``` + +If we concatenate the two sets, the levels of the resulting categorical vector are chosen so that the relative orders of levels in `x` and `y` are preserved, if possible. In that case, comparisons with `<` and `>` are still valid, and the resulting vector is marked as ordered: +```jldoctest using +julia> xy = vcat(x, y) +6-element CategoricalArray{String,1,UInt32}: + "Middle" + "Old" + "Middle" + "Young" + "Middle" + "Middle" + +julia> levels(xy) +3-element Array{String,1}: + "Young" + "Middle" + "Old" + +julia> isordered(xy) +true +``` + +Likewise, assigning a `CategoricalValue` from `y` to an entry in `x` expands the levels of `x`, *adding a new level to the front to respect the ordering of levels in both vectors*. The new level is added even if the assigned value belongs to another level which is already present in `x`. Note that adding new levels requires marking `x` as unordered: +```jldoctest using +julia> x[1] = y[1] +ERROR: cannot add new level Young since ordered pools cannot be extended implicitly. Use the levels! function to set new levels, or the ordered! function to mark the pool as unordered. +Stacktrace: +[...]
+ +julia> ordered!(x, false); + +julia> levels(x) +2-element Array{String,1}: + "Middle" + "Old" + +julia> x[1] = y[1] +CategoricalValue{String,UInt32} "Old" (3/3) + +julia> levels(x) +3-element Array{String,1}: + "Young" + "Middle" + "Old" +``` + +In cases where levels with incompatible orderings are combined, the ordering of the first array wins and the resulting array is marked as unordered: +```jldoctest using +julia> a = categorical(["a", "b", "c"], ordered=true); + +julia> b = categorical(["a", "b", "c"], ordered=true); + +julia> ab = vcat(a, b) +6-element CategoricalArray{String,1,UInt32}: + "a" + "b" + "c" + "a" + "b" + "c" + +julia> levels(ab) +3-element Array{String,1}: + "a" + "b" + "c" + +julia> isordered(ab) +true + +julia> levels!(b, ["c", "b", "a"]) +3-element CategoricalArray{String,1,UInt32}: + "a" + "b" + "c" + +julia> ab2 = vcat(a, b) +6-element CategoricalArray{String,1,UInt32}: + "a" + "b" + "c" + "a" + "b" + "c" + +julia> levels(ab2) +3-element Array{String,1}: + "a" + "b" + "c" + +julia> isordered(ab2) +false +``` + +Note that in some cases the two sets of levels may have compatible orderings, but it is not possible to determine in what order levels should appear in the merged set. This is the case for example with `["a", "b", "d"]` and `["c", "d", "e"]`: there is no way to detect that `"c"` should be inserted exactly after `"b"` (lexicographic ordering is not relevant here). In such cases, the resulting array is marked as unordered. This situation can only happen when working with data subsets selected based on non-contiguous subsets of levels.
+ +## Exported functions `categorical(A)` - Construct a categorical array with values from `A` diff --git a/src/CategoricalArrays.jl b/src/CategoricalArrays.jl index b232f19f..d5e52743 100644 --- a/src/CategoricalArrays.jl +++ b/src/CategoricalArrays.jl @@ -15,8 +15,6 @@ module CategoricalArrays include("typedefs.jl") - include("buildfields.jl") - include("pool.jl") include("value.jl") diff --git a/src/array.jl b/src/array.jl index 36e1c151..f9a015e9 100644 --- a/src/array.jl +++ b/src/array.jl @@ -248,7 +248,7 @@ function convert(::Type{CategoricalArray{T, N, R}}, A::AbstractArray{S, N}) wher # if order is defined for level type, automatically apply it L = leveltype(res) if hasmethod(isless, Tuple{L, L}) - levels!(res.pool, sort(levels(res.pool))) + levels!(res, sort(levels(res))) end res @@ -331,29 +331,80 @@ end size(A::CategoricalArray) = size(A.refs) Base.IndexStyle(::Type{<:CategoricalArray}) = IndexLinear() +function update_refs!(A::CategoricalArray, newlevels::AbstractVector) + oldlevels = levels(A) + levelsmap = similar(A.refs, length(oldlevels)+1) + # 0 maps to a missing value + levelsmap[1] = 0 + levelsmap[2:end] .= something.(indexin(oldlevels, newlevels), 0) + + refs = A.refs + @inbounds for (i, x) in enumerate(refs) + refs[i] = levelsmap[x+1] + end + A +end + +function merge_pools!(A::CatArrOrSub, + B::Union{CategoricalValue, CatArrOrSub}; + updaterefs::Bool=true) + if isordered(A) && length(pool(A)) > 0 && pool(B) ⊈ pool(A) + lev = A isa CategoricalValue ? get(B) : first(setdiff(levels(B), levels(A))) + throw(OrderedLevelsException(lev, levels(A))) + end + newpool = merge_pools(pool(A), pool(B)) + oldlevels = levels(A) + newlevels = levels(newpool) + ordered = isordered(newpool) + if isordered(A) != ordered + A isa SubArray && + throw(ArgumentError("cannot set ordered=$ordered on dest SubArray as it " * + "would affect the parent. "* + "Found when trying to set levels to $newlevels.")) + ordered!(A, ordered) + end + pA = A isa SubArray ? 
parent(A) : A + # If A's levels are an ordered superset of new (merged) pool, no need to recompute refs + if updaterefs && + (length(newlevels) < length(oldlevels) || + view(newlevels, 1:length(oldlevels)) != oldlevels) + update_refs!(pA, newlevels) + end + pA.pool = newpool + A +end + @inline function setindex!(A::CategoricalArray, v::Any, I::Real...) @boundscheck checkbounds(A, I...) + # TODO: use a global table to cache subset relations for all pairs of pools + if v isa CategoricalValue && pool(v) !== pool(A) && pool(v) ⊈ pool(A) + merge_pools!(A, v) + end @inbounds A.refs[I...] = get!(A.pool, v) end -Base.fill!(A::CategoricalArray, v::Any) = - (fill!(A.refs, get!(A.pool, v)); A) +function Base.fill!(A::CategoricalArray, v::Any) + # TODO: use a global table to cache subset relations for all pairs of pools + if v isa CategoricalValue && pool(v) !== pool(A) && pool(v) ⊈ pool(A) + merge_pools!(A, v, updaterefs=false) + end + fill!(A.refs, get!(A.pool, v)) + A +end # Methods preserving levels and more efficient than AbstractArray fallbacks copy(A::CategoricalArray{T, N}) where {T, N} = CategoricalArray{T, N}(copy(A.refs), copy(A.pool)) -CatArrOrSub{T, N} = Union{CategoricalArray{T, N}, - SubArray{<:Any, N, <:CategoricalArray{T}}} where {T, N} - -function copyto!(dest::CatArrOrSub{T, N}, dstart::Integer, +function copyto!(dest::CatArrOrSub{T, N, R}, dstart::Integer, src::CatArrOrSub{<:Any, N}, sstart::Integer, - n::Integer) where {T, N} - n == 0 && return dest + n::Integer) where {T, N, R} n < 0 && throw(ArgumentError(string("tried to copy n=", n, " elements, but n should be nonnegative"))) destinds, srcinds = LinearIndices(dest), LinearIndices(src) - (dstart ∈ destinds && dstart+n-1 ∈ destinds) || throw(BoundsError(dest, dstart:dstart+n-1)) - (sstart ∈ srcinds && sstart+n-1 ∈ srcinds) || throw(BoundsError(src, sstart:sstart+n-1)) + if n > 0 + (dstart ∈ destinds && dstart+n-1 ∈ destinds) || throw(BoundsError(dest, dstart:dstart+n-1)) + (sstart ∈ srcinds && sstart+n-1 
∈ srcinds) || throw(BoundsError(src, sstart:sstart+n-1)) + end drefs = refs(dest) srefs = refs(src) @@ -368,44 +419,30 @@ function copyto!(dest::CatArrOrSub{T, N}, dstart::Integer, throw(MissingException("cannot copy array with missing values to an array with element type $T")) end - newlevels, ordered = mergelevels(isordered(dest), dlevs, slevs) - if isordered(dest) && (length(newlevels) != length(dlevs)) - # Uncomment this when removing deprecation - # throw(OrderedLevelsException(newlevels[findfirst(!in(Set(dlevs)), newlevels)], - # dlevs)) - Base.depwarn("adding new levels to ordered CategoricalArray destination " * - "will throw an error in the future", :copyto!) - ordered &= isordered(src) | (length(newlevels) == length(dlevs)) - end - # Exception: empty pool marked as ordered if new value is ordered - if isempty(dlevs) && isordered(src) - ordered = true - end - if ordered != isordered(dest) - isa(dest, SubArray) && throw(ArgumentError("cannot set ordered=$ordered on dest SubArray as it would affect the parent. Found when trying to set levels to $newlevels.")) - ordered!(dest, ordered) - end + destp = dest isa SubArray ? 
parent(dest) : dest - # Simple case: replace all values - if !isa(dest, SubArray) && dstart == dstart == 1 && n == length(dest) == length(src) - # Set index to reflect refs - levels!(dpool, T[]) # Needed in case src and dest share some levels - levels!(dpool, index(spool)) - - # Set final levels in their visible order - levels!(dpool, newlevels) + # For partial copy, need to recompute existing refs + # TODO: for performance, avoid adjusting refs which are going to be overwritten + updaterefs = isa(dest, SubArray) || !(n == length(dest) == length(src)) + newpool = merge_pools!(dest, src, updaterefs=updaterefs) + newlevels = levels(newpool) + # If destination levels are an ordered superset of source, no need to recompute refs + if length(dlevs) >= length(slevs) && view(dlevs, 1:length(slevs)) == slevs + newlevels != dlevs && levels!(dpool, newlevels) copyto!(drefs, srefs) - else # More work to do: preserve some values (and therefore index) - levels!(dpool, newlevels) - - indexmap = indexin(index(spool), index(dpool)) + else # Otherwise, recompute refs according to new levels + # Then adjust refs from source + levelsmap = similar(drefs, length(slevs)+1) + # 0 maps to a missing value + levelsmap[1] = 0 + levelsmap[2:end] = indexin(slevs, newlevels) @inbounds for i = 0:(n-1) x = srefs[sstart+i] - drefs[dstart+i] = x > 0 ? indexmap[x] : 0 + drefs[dstart+i] = levelsmap[x+1] end - + destp.pool = CategoricalPool{nonmissingtype(T), R}(newlevels, isordered(newpool)) end dest @@ -479,7 +516,7 @@ While this will reduce memory use, this function is type-unstable, which can aff performance inside the function where the call is made. Therefore, use it with caution. """ function compress(A::CategoricalArray{T, N}) where {T, N} - R = reftype(length(index(A.pool))) + R = reftype(length(levels(A.pool))) convert(CategoricalArray{T, N, R}, A) end @@ -501,7 +538,7 @@ function vcat(A::CategoricalArray...) newlevels, ordered = mergelevels(ordered, map(levels, A)...)
refsvec = map(A) do a - ii = convert(Vector{Int}, indexin(index(a.pool), newlevels)) + ii = convert(Vector{Int}, indexin(levels(a.pool), newlevels)) [x==0 ? 0 : ii[x] for x in a.refs]::Array{Int,ndims(a)} end @@ -552,42 +589,39 @@ If `A` accepts missing values (i.e. `eltype(A) >: Missing`) and `allow_missing=t entries corresponding to omitted levels will be set to `missing`. Else, `newlevels` must include all levels which appear in the data. """ -function levels!(A::CategoricalArray{T}, newlevels::Vector; allow_missing=false) where {T} +function levels!(A::CategoricalArray{T, N, R}, newlevels::Vector; allow_missing=false) where {T, N, R} if !allunique(newlevels) throw(ArgumentError(string("duplicated levels found: ", join(unique(filter(x->sum(newlevels.==x)>1, newlevels)), ", ")))) end + oldlevels = levels(A.pool) + # first pass to check whether, if some levels are removed, changes can be applied without error # TODO: save original levels and undo changes in case of error to skip this step # equivalent to issubset but faster due to JuliaLang/julia#24624 - if !isempty(setdiff(index(A.pool), newlevels)) - deleted = [!(l in newlevels) for l in index(A.pool)] + if !isempty(setdiff(oldlevels, newlevels)) + deleted = [!(l in newlevels) for l in oldlevels] @inbounds for (i, x) in enumerate(A.refs) if T >: Missing !allow_missing && x > 0 && deleted[x] && - throw(ArgumentError("cannot remove level $(repr(index(A.pool)[x])) as it is used at position $i and allow_missing=false.")) + throw(ArgumentError("cannot remove level $(repr(oldlevels[x])) as it " * + "is used at position $i and allow_missing=false.")) else deleted[x] && - throw(ArgumentError("cannot remove level $(repr(index(A.pool)[x])) as it is used at position $i. " * - "Change the array element type to Union{$T, Missing} using convert if you want to transform some levels to missing values.")) + throw(ArgumentError("cannot remove level $(repr(levels(A.pool)[x])) as it " * + "is used at position $i. 
Change the array element " * + "type to Union{$T, Missing} using convert if you want " * + "to transform some levels to missing values.")) end end end - # actually apply changes - oldindex = copy(index(A.pool)) - levels!(A.pool, newlevels) - - if index(A.pool) != oldindex - levelsmap = similar(A.refs, length(oldindex)+1) - # 0 maps to a missing value - levelsmap[1] = 0 - levelsmap[2:end] .= something.(indexin(oldindex, index(A.pool)), 0) - - @inbounds for (i, x) in enumerate(A.refs) - A.refs[i] = levelsmap[x+1] - end + # replace the pool and recode refs to reflect new pool + if newlevels != oldlevels + newpool = CategoricalPool{nonmissingtype(T), R}(newlevels, isordered(A.pool)) + update_refs!(A, newlevels) + A.pool = newpool end A @@ -596,7 +630,7 @@ end function _unique(::Type{S}, refs::AbstractArray{T}, pool::CategoricalPool) where {S, T<:Integer} - nlevels = length(index(pool)) + 1 + nlevels = length(levels(pool)) + 1 order = fill(0, nlevels) # 0 indicates not seen # If we don't track missings, short-circuit even if none has been seen count = S >: Missing ? 0 : 1 @@ -607,7 +641,7 @@ function _unique(::Type{S}, count == nlevels && break end end - S[i == 1 ? missing : index(pool)[i - 1] for i in sortperm(order) if order[i] != 0] + S[i == 1 ? 
missing : levels(pool)[i - 1] for i in sortperm(order) if order[i] != 0] end """ @@ -662,14 +696,22 @@ function Base.resize!(A::CategoricalVector, n::Integer) A end -function Base.push!(A::CategoricalVector, item) - r = get!(A.pool, item) +function Base.push!(A::CategoricalVector, v::Any) + # TODO: use a global table to cache subset relations for all pairs of pools + if v isa CategoricalValue && pool(v) !== pool(A) && pool(v) ⊈ pool(A) + merge_pools!(A, v) + end + r = get!(A.pool, v) push!(A.refs, r) A end function Base.append!(A::CategoricalVector, B::CatArrOrSub) - levels!(A, union(levels(A), levels(B))) + # TODO: use a global table to cache subset relations for all pairs of pools + if pool(B) !== pool(A) && pool(B) ⊈ pool(A) + merge_pools!(A, B) + end + # TODO: optimize recoding len = length(A) len2 = length(B) resize!(A.refs, len + len2) @@ -732,7 +774,7 @@ function in(x::CategoricalValue, y::CategoricalArray{T, N, R}) where {T, N, R} if x.pool === y.pool return x.level in y.refs else - ref = get(y.pool, index(x.pool)[x.level], zero(R)) + ref = get(y.pool, levels(x.pool)[x.level], zero(R)) return ref != 0 ? ref in y.refs : false end end @@ -776,11 +818,10 @@ Base.Broadcast.broadcasted(::typeof(!ismissing), A::CategoricalArray{T}) where { Base.Broadcast.broadcasted(_ -> true, A.refs) function Base.Broadcast.broadcasted(::typeof(levelcode), A::CategoricalArray{T}) where {T} - ord = order(A.pool) if T >: Missing - Base.Broadcast.broadcasted(i -> i > 0 ? Signed(widen(ord[i])) : missing, A.refs) + Base.Broadcast.broadcasted(r -> r > 0 ? 
Signed(widen(r)) : missing, A.refs) else - Base.Broadcast.broadcasted(i -> Signed(widen(ord[i])), A.refs) + Base.Broadcast.broadcasted(r -> Signed(widen(r)), A.refs) end end @@ -806,9 +847,10 @@ function Base.sort!(v::CategoricalVector; perm = sortperm(view(index, seen), order=ord) nzcounts = counts[seen] j = 0 + refs = v.refs @inbounds for ref in perm tmpj = j + nzcounts[ref] - v.refs[(j+1):tmpj] .= ref - anymissing + refs[(j+1):tmpj] .= ref - anymissing j = tmpj end diff --git a/src/buildfields.jl b/src/buildfields.jl deleted file mode 100644 index 598f6184..00000000 --- a/src/buildfields.jl +++ /dev/null @@ -1,42 +0,0 @@ -function buildindex(invindex::Dict{S, R}) where {S, R <: Integer} - index = Vector{S}(undef, length(invindex)) - for (v, i) in invindex - index[i] = v - end - return index -end - -function buildinvindex(index::Vector{T}, ::Type{R}=DefaultRefType) where {T, R} - if length(index) > typemax(R) - throw(LevelsException{T, R}(index[typemax(R)+1:end])) - end - - invindex = Dict{T, R}() - for (i, v) in enumerate(index) - invindex[v] = i - end - return invindex -end - -function buildvalues!(pool::CategoricalPool) - resize!(pool.valindex, length(levels(pool))) - for i in eachindex(pool.valindex) - v = CategoricalValue(i, pool) - @inbounds pool.valindex[i] = v - end - return pool.valindex -end - -function buildorder!(order::Array{R}, - invindex::Dict{S, R}, - levels::Vector{S}) where {S, R <: Integer} - for (i, v) in enumerate(levels) - order[invindex[convert(S, v)]] = i - end - return order -end - -function buildorder(invindex::Dict{S, R}, levels::Vector) where {S, R <: Integer} - order = Vector{R}(undef, length(invindex)) - return buildorder!(order, invindex, levels) -end diff --git a/src/deprecated.jl b/src/deprecated.jl index bcec24c5..90faf28b 100644 --- a/src/deprecated.jl +++ b/src/deprecated.jl @@ -125,3 +125,6 @@ import Unicode: normalize, graphemes @deprecate findfirst(needle::Base.Fix2, haystack::CategoricalValue{String}) findfirst(needle, 
String(haystack)) @deprecate findlast(needle::Base.Fix2, haystack::CategoricalValue{String}) findlast(needle, String(haystack)) @deprecate replace(x::CategoricalValue{String}, old_new::Pair...; kwargs...) replace(String(x), old_new...; kwargs...) + +@deprecate index(pool::CategoricalPool) levels(pool) false +@deprecate order(pool::CategoricalPool) 1:length(levels(pool)) false \ No newline at end of file diff --git a/src/pool.jl b/src/pool.jl index f406a45c..d9d3183e 100644 --- a/src/pool.jl +++ b/src/pool.jl @@ -1,96 +1,35 @@ -function CategoricalPool{T, R, V}(index::Vector{T}, - invindex::Dict{T, R}, - order::Vector{R}, - ordered::Bool) where {T, R, V} - levels = similar(index) - levels[order] = index - pool = CategoricalPool{T, R, V}(index, invindex, order, levels, V[], ordered) - buildvalues!(pool) - return pool -end - -function CategoricalPool(index::Vector{S}, - invindex::Dict{S, T}, - order::Vector{R}, - ordered::Bool=false) where {S, T <: Integer, R <: Integer} - invindex = convert(Dict{S, R}, invindex) - V = CategoricalValue{S, R} - CategoricalPool{S, R, V}(index, invindex, order, ordered) -end - CategoricalPool{T, R, V}(ordered::Bool=false) where {T, R, V} = - CategoricalPool{T, R, V}(T[], Dict{T, R}(), R[], ordered) + CategoricalPool{T, R, V}(T[], ordered) CategoricalPool{T, R}(ordered::Bool=false) where {T, R} = - CategoricalPool(T[], Dict{T, R}(), R[], ordered) + CategoricalPool{T, R}(T[], ordered) CategoricalPool{T}(ordered::Bool=false) where {T} = - CategoricalPool{T, DefaultRefType}(ordered) - -function CategoricalPool{T, R}(index::Vector, - ordered::Bool=false) where {T, R} - invindex = buildinvindex(index, R) - order = Vector{R}(1:length(index)) - CategoricalPool(index, invindex, order, ordered) -end - -function CategoricalPool(index::Vector, ordered::Bool=false) - invindex = buildinvindex(index) - order = Vector{DefaultRefType}(1:length(index)) - return CategoricalPool(index, invindex, order, ordered) -end - -function 
CategoricalPool(invindex::Dict{S, R}, - ordered::Bool=false) where {S, R <: Integer} - index = buildindex(invindex) - order = Vector{DefaultRefType}(1:length(index)) - return CategoricalPool(index, invindex, order, ordered) -end - -# TODO: Add tests for this -function CategoricalPool(index::Vector{S}, - invindex::Dict{S, R}, - ordered::Bool=false) where {S, R <: Integer} - order = Vector{DefaultRefType}(1:length(index)) - return CategoricalPool(index, invindex, order, ordered) -end + CategoricalPool{T, DefaultRefType}(T[], ordered) -function CategoricalPool(index::Vector{T}, - levels::Vector{T}, - ordered::Bool=false) where {T} - invindex = buildinvindex(index) - order = buildorder(invindex, levels) - return CategoricalPool(index, invindex, order, ordered) -end +CategoricalPool{T, R}(levels::Vector, ordered::Bool=false) where {T, R} = + CategoricalPool{T, R, CategoricalValue{T, R}}(convert(Vector{T}, levels), ordered) +CategoricalPool(levels::Vector{T}, ordered::Bool=false) where {T} = + CategoricalPool{T, DefaultRefType}(convert(Vector{T}, levels), ordered) -function CategoricalPool(invindex::Dict{S, R}, - levels::Vector{S}, - ordered::Bool=false) where {S, R <: Integer} - index = buildindex(invindex) - order = buildorder(invindex, levels) - return CategoricalPool(index, invindex, order, ordered) -end +CategoricalPool(invindex::Dict{T, R}, ordered::Bool=false) where {T, R <: Integer} = + CategoricalPool{T, R, CategoricalValue{T, R}}(invindex, ordered) Base.convert(::Type{T}, pool::T) where {T <: CategoricalPool} = pool Base.convert(::Type{CategoricalPool{S}}, pool::CategoricalPool{T, R}) where {S, T, R <: Integer} = convert(CategoricalPool{S, R}, pool) -function Base.convert(::Type{CategoricalPool{S, R}}, pool::CategoricalPool) where {S, R <: Integer} +function Base.convert(::Type{CategoricalPool{T, R}}, pool::CategoricalPool) where {T, R <: Integer} if length(levels(pool)) > typemax(R) - throw(LevelsException{S, R}(levels(pool)[typemax(R)+1:end])) + 
throw(LevelsException{T, R}(levels(pool)[typemax(R)+1:end])) end - indexS = convert(Vector{S}, pool.index) - invindexS = convert(Dict{S, R}, pool.invindex) - order = convert(Vector{R}, pool.order) - return CategoricalPool(indexS, invindexS, order, pool.ordered) + levelsT = convert(Vector{T}, pool.levels) + invindexT = convert(Dict{T, R}, pool.invindex) + return CategoricalPool{T, R, CategoricalValue{T, R}}(levelsT, invindexT, pool.ordered) end -function Base.copy(pool::CategoricalPool{T, R, V}) where {T, R, V} - newpool = CategoricalPool{T, R, V}(copy(pool.index), copy(pool.invindex), copy(pool.order), - copy(pool.levels), similar(pool.valindex), pool.ordered) - buildvalues!(newpool) # With a plain copy values would refer to the old pool - newpool -end +Base.copy(pool::CategoricalPool{T, R, V}) where {T, R, V} = + CategoricalPool{T, R, V}(copy(pool.levels), copy(pool.invindex), pool.ordered) function Base.show(io::IO, pool::CategoricalPool{T, R}) where {T, R} @printf(io, "%s{%s,%s}([%s])", typeof(pool).name, T, R, @@ -99,7 +38,7 @@ function Base.show(io::IO, pool::CategoricalPool{T, R}) where {T, R} pool.ordered && print(io, " with ordered levels") end -Base.length(pool::CategoricalPool) = length(pool.index) +Base.length(pool::CategoricalPool) = length(pool.levels) Base.getindex(pool::CategoricalPool, i::Integer) = pool.valindex[i] Base.get(pool::CategoricalPool, level::Any) = pool.invindex[level] @@ -117,8 +56,6 @@ avoid doing a dict lookup twice end i = R(n + 1) - push!(pool.index, x) - push!(pool.order, i) push!(pool.levels, x) push!(pool.valindex, CategoricalValue(i, pool)) i @@ -194,9 +131,8 @@ end @inline function Base.get!(pool::CategoricalPool, level::CategoricalValue) pool === level.pool && return level.level - # Use invindex for O(1) lookup - # TODO: use a global table to cache this information for all pairs of pools - if level.pool.levels ⊈ keys(pool.invindex) + # TODO: use a global table to cache subset relations for all pairs of pools + if level.pool ⊈ 
pool if isordered(pool) throw(OrderedLevelsException(level, pool.levels)) end @@ -225,33 +161,36 @@ function Base.append!(pool::CategoricalPool, levels) return pool end -function Base.delete!(pool::CategoricalPool{S}, levels...) where S - for level in levels - levelS = convert(S, level) - if haskey(pool.invindex, levelS) - ind = pool.invindex[levelS] - delete!(pool.invindex, levelS) - splice!(pool.index, ind) - ord = splice!(pool.order, ind) - splice!(pool.levels, ord) - splice!(pool.valindex, ind) - for i in ind:length(pool) - pool.invindex[pool.index[i]] -= 1 - pool.valindex[i] = CategoricalValue(i, pool) - end - for i in 1:length(pool) - pool.order[i] > ord && (pool.order[i] -= 1) - end - end +# Do not override Base.merge as for internal use we need to use the type and orderedness +# of the first pool rather than promoting both pools +function merge_pools(a::CategoricalPool{T, R}, b::CategoricalPool) where {T, R} + if length(a) == 0 && length(b) == 0 + newlevs = T[] + ordered = isordered(a) + elseif length(a) == 0 + newlevs = Vector{T}(levels(b)) + ordered = isordered(b) + elseif length(b) == 0 + newlevs = copy(levels(a)) + ordered = isordered(a) + else + nl, ordered = mergelevels(isordered(a), a.levels, b.levels) + newlevs = convert(Vector{T}, nl) end - return pool + CategoricalPool{T, R}(newlevs, ordered) end +Base.issubset(a::CategoricalPool, b::CategoricalPool) = issubset(a.levels, keys(b.invindex)) + +# Contrary to the CategoricalArray one, this method only allows adding new levels at the end +# so that existing CategoricalValue objects still point to the same value function levels!(pool::CategoricalPool{S, R}, newlevels::Vector) where {S, R} levs = convert(Vector{S}, newlevels) if !allunique(levs) throw(ArgumentError(string("duplicated levels found in levs: ", join(unique(filter(x->sum(levs.==x)>1, levs)), ", ")))) + elseif length(levs) < length(pool) || view(levs, 1:length(pool)) != pool.levels + throw(ArgumentError("removing or reordering levels of 
existing CategoricalPool is not allowed")) end n = length(levs) @@ -260,34 +199,20 @@ function levels!(pool::CategoricalPool{S, R}, newlevels::Vector) where {S, R} throw(LevelsException{S, R}(setdiff(levs, levels(pool))[typemax(R)-length(levels(pool))+1:end])) end - # No deletions: can preserve position of existing levels - # equivalent to issubset but faster due to JuliaLang/julia#24624 - if isempty(setdiff(pool.index, levs)) - append!(pool, setdiff(levs, pool.index)) - else - empty!(pool.invindex) - resize!(pool.index, n) - resize!(pool.valindex, n) - resize!(pool.order, n) - resize!(pool.levels, n) - for i in 1:n - v = levs[i] - pool.index[i] = v - pool.invindex[v] = i - pool.valindex[i] = CategoricalValue(i, pool) - end + empty!(pool.invindex) + resize!(pool.levels, n) + resize!(pool.valindex, n) + for i in 1:n + v = levs[i] + pool.levels[i] = v + pool.invindex[v] = i + pool.valindex[i] = CategoricalValue(i, pool) end - buildorder!(pool.order, pool.invindex, levs) - for (i, x) in enumerate(pool.order) - pool.levels[x] = pool.index[i] - end return pool end -index(pool::CategoricalPool) = pool.index DataAPI.levels(pool::CategoricalPool) = pool.levels -order(pool::CategoricalPool) = pool.order isordered(pool::CategoricalPool) = pool.ordered ordered!(pool::CategoricalPool, ordered) = (pool.ordered = ordered; pool) diff --git a/src/recode.jl b/src/recode.jl index 7e48ead1..647a2b53 100644 --- a/src/recode.jl +++ b/src/recode.jl @@ -140,9 +140,11 @@ function recode!(dest::CategoricalArray{T}, src::AbstractArray, default::Any, pa dest end -function recode!(dest::CategoricalArray{T}, src::CategoricalArray, default::Any, pairs::Pair...) where {T} +function recode!(dest::CategoricalArray{T, N, R}, src::CategoricalArray, + default::Any, pairs::Pair...) 
where {T, N, R<:Integer} if length(dest) != length(src) - throw(DimensionMismatch("dest and src must be of the same length (got $(length(dest)) and $(length(src)))")) + throw(DimensionMismatch("dest and src must be of the same length " * + "(got $(length(dest)) and $(length(src)))")) end vals = T[p.second for p in pairs] @@ -175,20 +177,24 @@ function recode!(dest::CategoricalArray{T}, src::CategoricalArray, default::Any, ordered = false end - srcindex = src.pool === dest.pool ? copy(index(src.pool)) : index(src.pool) - levels!(dest.pool, levs) + srclevels = src.pool === dest.pool ? copy(levels(src.pool)) : levels(src.pool) + if length(levs) > length(srclevels) && view(levs, 1:length(srclevels)) == srclevels + levels!(dest.pool, levs) + else + dest.pool = CategoricalPool{nonmissingtype(T), R}(levs, isordered(dest)) + end drefs = dest.refs srefs = src.refs - origmap = [get(dest.pool, v, 0) for v in srcindex] - indexmap = Vector{DefaultRefType}(undef, length(srcindex)+1) + origmap = [get(dest.pool, v, 0) for v in srclevels] + levelsmap = Vector{DefaultRefType}(undef, length(srclevels)+1) # For missing values (0 if no missing in pairs' keys) - indexmap[1] = 0 + levelsmap[1] = 0 for p in pairs if ((isa(p.first, Union{AbstractArray, Tuple}) && any(ismissing, p.first)) || ismissing(p.first)) - indexmap[1] = get(dest.pool, p.second) + levelsmap[1] = get(dest.pool, p.second) break end end @@ -197,28 +203,28 @@ function recode!(dest::CategoricalArray{T}, src::CategoricalArray, default::Any, ordered && (ordered = issorted(pairmap)) ordered!(dest, ordered) defaultref = default === nothing || ismissing(default) ? 
0 : get(dest.pool, default) - @inbounds for (i, l) in enumerate(srcindex) + @inbounds for (i, l) in enumerate(srclevels) for j in 1:length(pairs) p = pairs[j] if ((isa(p.first, Union{AbstractArray, Tuple}) && any(l ≅ y for y in p.first)) || l ≅ p.first) - indexmap[i+1] = pairmap[j] + levelsmap[i+1] = pairmap[j] @goto nextitem end end # Value not in any of the pairs if default === nothing - indexmap[i+1] = origmap[i] + levelsmap[i+1] = origmap[i] else - indexmap[i+1] = defaultref + levelsmap[i+1] = defaultref end @label nextitem end @inbounds for i in eachindex(drefs) - v = indexmap[srefs[i]+1] + v = levelsmap[srefs[i]+1] if !(eltype(dest) >: Missing) v > 0 || throw(MissingException("missing value found, but dest does not support them: " * "recode them to a supported value")) diff --git a/src/subarray.jl b/src/subarray.jl index 15f741f4..16ff1b5a 100644 --- a/src/subarray.jl +++ b/src/subarray.jl @@ -4,12 +4,12 @@ DataAPI.levels(sa::SubArray{T,N,P}) where {T,N,P<:CategoricalArray} = levels(par isordered(sa::SubArray{T,N,P}) where {T,N,P<:CategoricalArray} = isordered(parent(sa)) # This method cannot support allow_missing=true since that would modify the parent levels!(sa::SubArray{T,N,P}, newlevels::Vector) where {T,N,P<:CategoricalArray} = - levels!(parent(sa), levels) + levels!(parent(sa), newlevels) function unique(sa::SubArray{T,N,P}) where {T,N,P<:CategoricalArray} A = parent(sa) refs = view(A.refs, sa.indices...) - S = eltype(P) >: Missing ? Union{eltype(index(A.pool)), Missing} : eltype(index(A.pool)) + S = eltype(P) >: Missing ? 
Union{eltype(levels(A.pool)), Missing} : eltype(levels(A.pool)) _unique(S, refs, A.pool) end diff --git a/src/typedefs.jl b/src/typedefs.jl index 00114fb1..9bdb2590 100644 --- a/src/typedefs.jl +++ b/src/typedefs.jl @@ -7,18 +7,40 @@ const DefaultRefType = UInt32 # * `R` integer type for referencing category levels # * `V` categorical value type mutable struct CategoricalPool{T, R <: Integer, V} - index::Vector{T} # category levels ordered by their reference codes + levels::Vector{T} # category levels ordered by their reference codes invindex::Dict{T, R} # map from category levels to their reference codes - order::Vector{R} # 1-to-1 map from `index` to `level` (position of i-th category in `levels`) - levels::Vector{T} # category levels ordered by externally specified order valindex::Vector{V} # "category value" objects 1-to-1 matching `index` ordered::Bool - function CategoricalPool{T, R, V}(index::Vector{T}, + function CategoricalPool{T, R, V}(levels::Vector{T}, + ordered::Bool) where {T, R, V} + if length(levels) > typemax(R) + throw(LevelsException{T, R}(levels[Int(typemax(R))+1:end])) + end + invindex = Dict{T, R}(v => i for (i, v) in enumerate(levels)) + if length(invindex) != length(levels) + throw(ArgumentError("Duplicate entries are not allowed in levels")) + end + CategoricalPool{T, R, V}(levels, invindex, ordered) + end + function CategoricalPool{T, R, V}(invindex::Dict{T, R}, + ordered::Bool) where {T, R, V} + levels = Vector{T}(undef, length(invindex)) + # If invindex contains non consecutive values, a BoundsError will be thrown + try + for (k, v) in invindex + levels[v] = k + end + catch BoundsError + throw(ArgumentError("Reference codes must be in 1:length(invindex)")) + end + if length(invindex) > typemax(R) + throw(LevelsException{T, R}(levels[typemax(R)+1:end])) + end + CategoricalPool{T, R, V}(levels, invindex, ordered) + end + function CategoricalPool{T, R, V}(levels::Vector{T}, invindex::Dict{T, R}, - order::Vector{R}, - levels::Vector{T}, - 
valindex::Vector{V}, ordered::Bool) where {T, R, V} if T <: CategoricalValue && T !== Union{} throw(ArgumentError("Level type $T cannot be a categorical value type")) @@ -26,13 +48,13 @@ mutable struct CategoricalPool{T, R <: Integer, V} if !(V <: CategoricalValue) throw(ArgumentError("Type $V is not a categorical value type")) end - if leveltype(V) !== T - throw(ArgumentError("Level type of the categorical value ($(leveltype(V))) and of the pool ($T) do not match")) + if V !== CategoricalValue{T, R} + throw(ArgumentError("V must be CategoricalValue{T, R}")) end - if reftype(V) !== R - throw(ArgumentError("Reference type of the categorical value ($(reftype(V))) and of the pool ($R) do not match")) - end - new(index, invindex, order, levels, valindex, ordered) + valindex = Vector{V}(undef, length(levels)) + pool = new(levels, invindex, valindex, ordered) + pool.valindex .= CategoricalValue.(1:length(levels), Ref(pool)) + return pool end end @@ -78,7 +100,7 @@ abstract type AbstractCategoricalArray{T, N, R, V, C, U} <: AbstractArray{Union{ const AbstractCategoricalVector{T, R, V, C, U} = AbstractCategoricalArray{T, 1, R, V, C, U} const AbstractCategoricalMatrix{T, R, V, C, U} = AbstractCategoricalArray{T, 2, R, V, C, U} -struct CategoricalArray{T, N, R <: Integer, V, C, U} <: AbstractCategoricalArray{T, N, R, V, C, U} +mutable struct CategoricalArray{T, N, R <: Integer, V, C, U} <: AbstractCategoricalArray{T, N, R, V, C, U} refs::Array{R, N} pool::CategoricalPool{V, R, C} @@ -92,3 +114,7 @@ struct CategoricalArray{T, N, R <: Integer, V, C, U} <: AbstractCategoricalArray end const CategoricalVector{T, R, V, C, U} = CategoricalArray{T, 1, V, C, U} const CategoricalMatrix{T, R, V, C, U} = CategoricalArray{T, 2, V, C, U} + +CatArrOrSub{T, N, R} = Union{CategoricalArray{T, N, R}, + SubArray{<:Any, N, <:CategoricalArray{T, <:Any, R}}} where + {T, N, R<:Integer} \ No newline at end of file diff --git a/src/value.jl b/src/value.jl index 7e25790a..d6569fdc 100644 --- 
a/src/value.jl +++ b/src/value.jl @@ -22,7 +22,7 @@ unwrap_catvaluetype(::Type{Union{}}) = Union{} # prevent incorrect dispatch to T unwrap_catvaluetype(::Type{Any}) = Any # prevent recursion in T>:Missing method unwrap_catvaluetype(::Type{T}) where {T <: CategoricalValue} = leveltype(T) -Base.get(x::CategoricalValue) = index(pool(x))[level(x)] +Base.get(x::CategoricalValue) = levels(x)[level(x)] """ levelcode(x::CategoricalValue) @@ -30,7 +30,7 @@ Base.get(x::CategoricalValue) = index(pool(x))[level(x)] Get the code of categorical value `x`, i.e. its index in the set of possible values returned by [`levels(x)`](@ref). """ -levelcode(x::CategoricalValue) = Signed(widen(order(pool(x))[level(x)])) +levelcode(x::CategoricalValue) = Signed(widen(level(x))) """ levelcode(x::Missing) diff --git a/test/01_typedef.jl b/test/01_typedef.jl deleted file mode 100644 index 96c6581c..00000000 --- a/test/01_typedef.jl +++ /dev/null @@ -1,120 +0,0 @@ -module TestTypeDef -using Test -using CategoricalArrays -using CategoricalArrays: DefaultRefType, level, reftype, leveltype - -@testset "CategoricalPool, a b c order" begin - pool = CategoricalPool( - [ - "a", - "b", - "c" - ], - Dict( - "a" => DefaultRefType(1), - "b" => DefaultRefType(2), - "c" => DefaultRefType(3), - ) - ) - - @test isa(pool, CategoricalPool) - - @test isa(pool.index, Vector) - @test length(pool.index) == 3 - @test pool.index[1] == "a" - @test pool.index[2] == "b" - @test pool.index[3] == "c" - - @test isa(pool.invindex, Dict) - @test length(pool.invindex) == 3 - @test pool.invindex["a"] === DefaultRefType(1) - @test pool.invindex["b"] === DefaultRefType(2) - @test pool.invindex["c"] === DefaultRefType(3) - - @test isa(pool.order, Vector{DefaultRefType}) - @test length(pool.order) == 3 - @test pool.order[1] === DefaultRefType(1) - @test pool.order[2] === DefaultRefType(2) - @test pool.order[3] === DefaultRefType(3) - - @test leveltype("abc") === String - @test leveltype(String) === String - @test leveltype(1.0) 
=== Float64 - @test leveltype(Float64) === Float64 - - for i in 1:3 - x = CategoricalValue(i, pool) - - @test leveltype(x) === String - @test leveltype(typeof(x)) === String - @test reftype(x) === DefaultRefType - @test reftype(typeof(x)) === DefaultRefType - @test x isa CategoricalValue{String, DefaultRefType} - - @test isa(level(x), DefaultRefType) - @test level(x) === DefaultRefType(i) - - @test isa(CategoricalArrays.pool(x), CategoricalPool) - @test CategoricalArrays.pool(x) === pool - - @test typeof(x)(x) === x - - @test CategoricalValue(UInt8(i), pool) == x - end -end - -@testset "CategoricalPool, c b a order" begin - pool = CategoricalPool( - [ - "a", - "b", - "c" - ], - Dict( - "a" => DefaultRefType(1), - "b" => DefaultRefType(2), - "c" => DefaultRefType(3), - ), - [ - DefaultRefType(3), - DefaultRefType(2), - DefaultRefType(1), - ] - ) - - @test isa(pool, CategoricalPool) - - @test isa(pool.index, Vector) - @test length(pool.index) == 3 - @test pool.index[1] == "a" - @test pool.index[2] == "b" - @test pool.index[3] == "c" - - @test isa(pool.invindex, Dict) - @test length(pool.invindex) == 3 - @test pool.invindex["a"] === DefaultRefType(1) - @test pool.invindex["b"] === DefaultRefType(2) - @test pool.invindex["c"] === DefaultRefType(3) - - @test isa(pool.order, Vector{DefaultRefType}) - @test length(pool.order) == 3 - @test pool.order[1] === DefaultRefType(3) - @test pool.order[2] === DefaultRefType(2) - @test pool.order[3] === DefaultRefType(1) - - for i in 1:3 - y = CategoricalValue(i, pool) - - @test isa(level(y), DefaultRefType) - @test level(y) === DefaultRefType(i) - - @test isa(CategoricalArrays.pool(y), CategoricalPool) - @test CategoricalArrays.pool(y) === pool - - @test typeof(y)(y) === y - - @test CategoricalValue(UInt8(i), pool) == y - end -end - -end diff --git a/test/01_value.jl b/test/01_value.jl new file mode 100644 index 00000000..3c03e62c --- /dev/null +++ b/test/01_value.jl @@ -0,0 +1,65 @@ +module TestValue +using Test +using 
CategoricalArrays +using CategoricalArrays: DefaultRefType, level, reftype, leveltype + +@testset "leveltype on non CategoricalValue types" begin + @test leveltype("abc") === String + @test leveltype(String) === String + @test leveltype(1.0) === Float64 + @test leveltype(Float64) === Float64 +end + +@testset "CategoricalValue on DefaultRefType pool in sorted order" begin + pool = CategoricalPool( + Dict( + "a" => DefaultRefType(1), + "b" => DefaultRefType(2), + "c" => DefaultRefType(3), + ) + ) + + for i in 1:3 + x = CategoricalValue(i, pool) + + @test leveltype(x) === String + @test leveltype(typeof(x)) === String + @test reftype(x) === DefaultRefType + @test reftype(typeof(x)) === DefaultRefType + @test x isa CategoricalValue{String, DefaultRefType} + + @test level(x) === DefaultRefType(i) + @test CategoricalArrays.pool(x) === pool + + @test typeof(x)(x) === x + @test CategoricalValue(UInt8(i), pool) == x + end +end + +@testset "CategoricalValue on UInt8 pool in custom order" begin + pool = CategoricalPool( + Dict( + "a" => UInt8(3), + "b" => UInt8(2), + "c" => UInt8(1), + ) + ) + + for i in 1:3 + x = CategoricalValue(i, pool) + + @test leveltype(x) === String + @test leveltype(typeof(x)) === String + @test reftype(x) === UInt8 + @test reftype(typeof(x)) === UInt8 + @test x isa CategoricalValue{String, UInt8} + + @test level(x) === UInt8(i) + @test CategoricalArrays.pool(x) === pool + + @test typeof(x)(x) === x + @test CategoricalValue(UInt32(i), pool) == x + end +end + +end diff --git a/test/02_buildorder.jl b/test/02_buildorder.jl deleted file mode 100644 index 8f6192af..00000000 --- a/test/02_buildorder.jl +++ /dev/null @@ -1,38 +0,0 @@ -module TestUpdateOrder -using Test -using CategoricalArrays -using CategoricalArrays: DefaultRefType - -@testset "buildorder!(b a c)" begin - pool = CategoricalPool( - [ - "a", - "b", - "c" - ], - Dict( - "a" => convert(DefaultRefType, 1), - "b" => convert(DefaultRefType, 2), - "c" => convert(DefaultRefType, 3), - ) - ) - - 
order = Vector{DefaultRefType}(undef, length(pool.index)) - - CategoricalArrays.buildorder!(order, pool.invindex, ["b", "a", "c"]) - - @test order[1] == convert(DefaultRefType, 2) - @test order[2] == convert(DefaultRefType, 1) - @test order[3] == convert(DefaultRefType, 3) -end - -@testset "levels are built correctly" begin - orig_index = [2, 5, 1, 3, 4] - orig_levels = [1, 2, 3, 4, 5] - pool = CategoricalPool(orig_index, orig_levels, true) - @test orig_index == CategoricalArrays.index(pool) - @test orig_levels == levels(pool) - @test CategoricalArrays.index(pool) == levels(pool)[CategoricalArrays.order(pool)] -end - -end diff --git a/test/03_buildfields.jl b/test/03_buildfields.jl deleted file mode 100644 index 8d5d1a6a..00000000 --- a/test/03_buildfields.jl +++ /dev/null @@ -1,44 +0,0 @@ -module TestBuildFields -using Test -using CategoricalArrays -using CategoricalArrays: DefaultRefType - -@testset "buildindex(), buildinvindex(), buildorder() for b a c" begin - index = ["b", "a", "c"] - - invindex = Dict( - "b" => DefaultRefType(1), - "a" => DefaultRefType(2), - "c" => DefaultRefType(3), - ) - - order = [ - DefaultRefType(2), - DefaultRefType(1), - DefaultRefType(3), - ] - - pool = CategoricalPool(index, invindex) - - levels = ["c", "a", "b"] - - built_index = CategoricalArrays.buildindex(invindex) - @test isa(index, Vector) - @test built_index == index - - built_invindex = CategoricalArrays.buildinvindex(index) - @test isa(invindex, Dict) - @test built_invindex == invindex - - neworder = [ - DefaultRefType(3), - DefaultRefType(2), - DefaultRefType(1), - ] - - built_order = CategoricalArrays.buildorder(pool.invindex, levels) - @test isa(order, Vector{DefaultRefType}) - @test built_order == neworder -end - -end diff --git a/test/04_constructors.jl b/test/04_constructors.jl index 40d639cf..7975c1cc 100644 --- a/test/04_constructors.jl +++ b/test/04_constructors.jl @@ -6,15 +6,21 @@ using CategoricalArrays: DefaultRefType @testset "Type parameter constraints" begin 
# cannot use categorical value as level type @test_throws ArgumentError CategoricalPool{CategoricalValue{Int,UInt8}, UInt8, CategoricalValue{CategoricalValue{Int,UInt8},UInt8}}( - CategoricalValue{Int,UInt8}[], Dict{CategoricalValue{Int,UInt8}, UInt8}(), UInt8[], false) + Dict{CategoricalValue{Int,UInt8}, UInt8}(), false) + @test_throws ArgumentError CategoricalPool{CategoricalValue{Int,UInt8}, UInt8, CategoricalValue{CategoricalValue{Int,UInt8},UInt8}}( + CategoricalValue{Int,UInt8}[], false) # cannot use non-categorical value as categorical value type - @test_throws ArgumentError CategoricalPool{Int, UInt8, Int}(Int[], Dict{Int, UInt8}(), UInt8[], false) - # level type of the pool and categorical value should match - @test_throws ArgumentError CategoricalPool{Int, UInt8, CategoricalValue{String, UInt8}}(Int[], Dict{Int, UInt8}(), UInt8[], false) - # reference type of the pool and categorical value should match - @test_throws ArgumentError CategoricalPool{Int, UInt8, CategoricalValue{Int, UInt16}}(Int[], Dict{Int, UInt8}(), UInt8[], false) + @test_throws ArgumentError CategoricalPool{Int, UInt8, Int}(Int[], false) + @test_throws ArgumentError CategoricalPool{Int, UInt8, Int}(Dict{Int, UInt8}(), false) + # level type of the pool and categorical value must match + @test_throws ArgumentError CategoricalPool{Int, UInt8, CategoricalValue{String, UInt8}}(Int[], false) + @test_throws ArgumentError CategoricalPool{Int, UInt8, CategoricalValue{String, UInt8}}(Dict{Int, UInt8}(), false) + # reference type of the pool and categorical value must match + @test_throws ArgumentError CategoricalPool{Int, UInt8, CategoricalValue{Int, UInt16}}(Int[], false) + @test_throws ArgumentError CategoricalPool{Int, UInt8, CategoricalValue{Int, UInt16}}(Dict{Int, UInt8}(), false) # correct types combination - @test CategoricalPool{Int, UInt8, CategoricalValue{Int, UInt8}}(Int[], Dict{Int, UInt8}(), UInt8[], false) isa CategoricalPool + @test CategoricalPool{Int, UInt8, CategoricalValue{Int, 
UInt8}}(Int[], false) isa CategoricalPool + @test CategoricalPool{Int, UInt8, CategoricalValue{Int, UInt8}}(Dict{Int, UInt8}(), false) isa CategoricalPool end @testset "empty CategoricalPool{String}" begin @@ -22,8 +28,8 @@ end @test isa(pool, CategoricalPool{String}) - @test isa(pool.index, Vector{String}) - @test length(pool.index) == 0 + @test isa(pool.levels, Vector{String}) + @test length(pool.levels) == 0 @test isa(pool.invindex, Dict{String, DefaultRefType}) @test length(pool.invindex) == 0 @@ -34,8 +40,8 @@ end @test isa(pool, CategoricalPool{Int, UInt8, CategoricalValue{Int, UInt8}}) - @test isa(pool.index, Vector{Int}) - @test length(pool.index) == 0 + @test isa(pool.levels, Vector{Int}) + @test length(pool.levels) == 0 @test isa(pool.invindex, Dict{Int, UInt8}) @test length(pool.invindex) == 0 @@ -46,11 +52,8 @@ end @test isa(pool, CategoricalPool{String, UInt32, CategoricalValue{String, UInt32}}) - @test isa(pool.index, Vector{String}) - @test length(pool.index) == 3 - @test pool.index[1] == "a" - @test pool.index[2] == "b" - @test pool.index[3] == "c" + @test isa(pool.levels, Vector{String}) + @test pool.levels == ["a", "b", "c"] @test isa(pool.invindex, Dict{String, DefaultRefType}) @test length(pool.invindex) == 3 @@ -64,11 +67,8 @@ end @test isa(pool, CategoricalPool) - @test isa(pool.index, Vector{String}) - @test length(pool.index) == 3 - @test pool.index[1] == "a" - @test pool.index[2] == "b" - @test pool.index[3] == "c" + @test isa(pool.levels, Vector{String}) + @test pool.levels == ["a", "b", "c"] @test isa(pool.invindex, Dict{String, UInt8}) @test length(pool.invindex) == 3 @@ -77,7 +77,7 @@ end @test pool.invindex["c"] === UInt8(3) end -@testset "CategoricalPool(a b c) with specified reference codes" begin +@testset "CategoricalPool(a b c) with invindex" begin pool = CategoricalPool( Dict( "a" => DefaultRefType(1), @@ -88,11 +88,8 @@ end @test isa(pool, CategoricalPool) - @test isa(pool.index, Vector{String}) - @test length(pool.index) == 3 - 
@test pool.index[1] == "a" - @test pool.index[2] == "b" - @test pool.index[3] == "c" + @test isa(pool.levels, Vector{String}) + @test pool.levels == ["a", "b", "c"] @test isa(pool.invindex, Dict{String, DefaultRefType}) @test length(pool.invindex) == 3 @@ -101,9 +98,7 @@ end @test pool.invindex["c"] === DefaultRefType(3) end -@testset "CategoricalPool(a b c) with specified Int ref codes" begin - # TODO: Make sure that invindex input is exhaustive - # Raise an error if map misses any entries +@testset "CategoricalPool(a b c) with invindex" begin pool = CategoricalPool( Dict( "a" => 1, @@ -114,17 +109,14 @@ end @test isa(pool, CategoricalPool) - @test isa(pool.index, Vector{String}) - @test length(pool.index) == 3 - @test pool.index[1] == "a" - @test pool.index[2] == "b" - @test pool.index[3] == "c" + @test isa(pool.levels, Vector{String}) + @test pool.levels == ["a", "b", "c"] - @test isa(pool.invindex, Dict{String, DefaultRefType}) + @test isa(pool.invindex, Dict{String, Int}) @test length(pool.invindex) == 3 - @test pool.invindex["a"] === DefaultRefType(1) - @test pool.invindex["b"] === DefaultRefType(2) - @test pool.invindex["c"] === DefaultRefType(3) + @test pool.invindex["a"] === 1 + @test pool.invindex["b"] === 2 + @test pool.invindex["c"] === 3 end @testset "CategoricalPool(c b a)" begin @@ -132,22 +124,13 @@ end @test isa(pool, CategoricalPool) - @test length(pool.index) == 3 - @test pool.index[1] == "c" - @test pool.index[2] == "b" - @test pool.index[3] == "a" + @test pool.levels == ["c", "b", "a"] @test isa(pool.invindex, Dict{String, DefaultRefType}) @test length(pool.invindex) == 3 @test pool.invindex["c"] === DefaultRefType(1) @test pool.invindex["b"] === DefaultRefType(2) @test pool.invindex["a"] === DefaultRefType(3) - - @test isa(pool.order, Vector{DefaultRefType}) - @test length(pool.order) == 3 - @test pool.order[1] === DefaultRefType(1) - @test pool.order[2] === DefaultRefType(2) - @test pool.order[3] === DefaultRefType(3) end @testset 
"CategoricalPool(a b c) with ref codes not matching the natural order" begin @@ -161,75 +144,13 @@ end @test isa(pool, CategoricalPool) - @test length(pool.index) == 3 - @test pool.index[1] == "c" - @test pool.index[2] == "b" - @test pool.index[3] == "a" - - @test isa(pool.invindex, Dict{String, DefaultRefType}) - @test length(pool.invindex) == 3 - @test pool.invindex["c"] === DefaultRefType(1) - @test pool.invindex["b"] === DefaultRefType(2) - @test pool.invindex["a"] === DefaultRefType(3) - - @test isa(pool.order, Vector{DefaultRefType}) - @test length(pool.order) == 3 - @test pool.order[1] === DefaultRefType(1) - @test pool.order[2] === DefaultRefType(2) - @test pool.order[3] === DefaultRefType(3) -end - -@testset "CategoricalPool(a b c) with specified levels order" begin - pool = CategoricalPool(["c", "b", "a"], ["c", "b", "a"]) - - @test isa(pool, CategoricalPool) - - @test length(pool.index) == 3 - @test pool.index[1] == "c" - @test pool.index[2] == "b" - @test pool.index[3] == "a" - - @test isa(pool.invindex, Dict{String, DefaultRefType}) - @test length(pool.invindex) == 3 - @test pool.invindex["c"] === DefaultRefType(1) - @test pool.invindex["b"] === DefaultRefType(2) - @test pool.invindex["a"] === DefaultRefType(3) - - @test isa(pool.order, Vector{DefaultRefType}) - @test length(pool.order) == 3 - @test pool.order[1] === DefaultRefType(1) - @test pool.order[2] === DefaultRefType(2) - @test pool.order[3] === DefaultRefType(3) -end - -@testset "CategoricalPool(a b c) with specified index and levels order" begin - pool = CategoricalPool( - Dict( - "a" => DefaultRefType(3), - "b" => DefaultRefType(2), - "c" => DefaultRefType(1), - ), - ["c", "b", "a"] - ) - - @test isa(pool, CategoricalPool) - - @test length(pool.index) == 3 - @test pool.index[1] == "c" - @test pool.index[2] == "b" - @test pool.index[3] == "a" + @test pool.levels == ["c", "b", "a"] @test isa(pool.invindex, Dict{String, DefaultRefType}) @test length(pool.invindex) == 3 @test pool.invindex["c"] 
=== DefaultRefType(1) @test pool.invindex["b"] === DefaultRefType(2) @test pool.invindex["a"] === DefaultRefType(3) - - @test isa(pool.order, Vector{DefaultRefType}) - @test length(pool.order) == 3 - @test pool.order[1] === DefaultRefType(1) - @test pool.order[2] === DefaultRefType(2) - @test pool.order[3] === DefaultRefType(3) end @testset "CategoricalPool{Float64, UInt8}()" begin @@ -239,4 +160,9 @@ end @test CategoricalValue(1, pool) isa CategoricalValue{Float64, UInt8} end +@testset "Invalid arguments" begin + @test_throws ArgumentError CategoricalPool(Dict("a" => 1, "b" => 3)) + @test_throws ArgumentError CategoricalPool(["a", "a"]) end + +end \ No newline at end of file diff --git a/test/05_convert.jl b/test/05_convert.jl index e5bc209d..004e378e 100644 --- a/test/05_convert.jl +++ b/test/05_convert.jl @@ -125,8 +125,7 @@ end end @testset "levelcode" begin - pool = CategoricalPool{Int,UInt8}([3, 1, 2]) - levels!(pool, [2, 1, 3]) + pool = CategoricalPool{Int,UInt8}([2, 1, 3]) for i in 1:3 v = CategoricalValue(i, pool) @test levelcode(v) isa Int16 diff --git a/test/05_copy.jl b/test/05_copy.jl index 27e0d54f..7a4ec3f9 100644 --- a/test/05_copy.jl +++ b/test/05_copy.jl @@ -9,20 +9,18 @@ using CategoricalArrays: CategoricalPool pool2 = copy(pool) @test length(pool2) == 3 - @test pool2.levels == pool2.index == ["d", "c", "b"] + @test pool2.levels == ["d", "c", "b"] @test pool2.invindex == Dict("d"=>1, "c"=>2, "b"=>3) - @test pool2.order == 1:3 @test pool2.valindex == [CategoricalValue(i, pool2) for i in 1:3] @test all(v -> v.pool === pool2, pool2.valindex) @test pool2.ordered - levels!(pool2, ["a", "b", "c", "d"]) + levels!(pool2, ["d", "c", "b", "e"]) ordered!(pool2, false) @test length(pool) == 3 - @test pool.levels == pool.index == ["d", "c", "b"] + @test pool.levels == ["d", "c", "b"] @test pool.invindex == Dict("d"=>1, "c"=>2, "b"=>3) - @test pool.order == 1:3 @test pool.valindex == [CategoricalValue(i, pool) for i in 1:3] @test all(v -> v.pool === pool, 
pool.valindex) @test pool.ordered diff --git a/test/06_length.jl b/test/06_length.jl deleted file mode 100644 index ea6b6918..00000000 --- a/test/06_length.jl +++ /dev/null @@ -1,13 +0,0 @@ -module TestLength -using Test -using CategoricalArrays - -@testset "length(pool)" begin - pool = CategoricalPool([1, 2, 3]) - @test length(pool) == 3 - - pool = CategoricalPool([1, 2, 3], [3, 2, 1]) - @test length(pool) == 3 -end - -end diff --git a/test/06_show.jl b/test/06_show.jl index 475a1147..d8de407b 100644 --- a/test/06_show.jl +++ b/test/06_show.jl @@ -5,8 +5,7 @@ using CategoricalArrays @testset "show() for CategoricalPool{String} and its values" begin pool = CategoricalPool(["c", "b", "a"]) - - opool = CategoricalPool(["c", "b", "a"], ["a", "b", "c"], true) + opool = CategoricalPool(["c", "b", "a"], true) nv1 = CategoricalValue(1, pool) nv2 = CategoricalValue(2, pool) @@ -17,15 +16,15 @@ using CategoricalArrays ov3 = CategoricalValue(3, opool) @test sprint(show, pool) == "$CategoricalPool{String,UInt32}([\"c\",\"b\",\"a\"])" - @test sprint(show, opool) == "$CategoricalPool{String,UInt32}([\"a\",\"b\",\"c\"]) with ordered levels" + @test sprint(show, opool) == "$CategoricalPool{String,UInt32}([\"c\",\"b\",\"a\"]) with ordered levels" @test sprint(show, nv1) == "$CategoricalValue{String,UInt32} \"c\"" @test sprint(show, nv2) == "$CategoricalValue{String,UInt32} \"b\"" @test sprint(show, nv3) == "$CategoricalValue{String,UInt32} \"a\"" - @test sprint(show, ov1) == "$CategoricalValue{String,UInt32} \"c\" (3/3)" + @test sprint(show, ov1) == "$CategoricalValue{String,UInt32} \"c\" (1/3)" @test sprint(show, ov2) == "$CategoricalValue{String,UInt32} \"b\" (2/3)" - @test sprint(show, ov3) == "$CategoricalValue{String,UInt32} \"a\" (1/3)" + @test sprint(show, ov3) == "$CategoricalValue{String,UInt32} \"a\" (3/3)" @test sprint(show, nv1, context=:typeinfo=>typeof(nv1)) == "\"c\"" @test sprint(show, nv2, context=:typeinfo=>typeof(nv2)) == "\"b\"" @@ -68,9 +67,7 @@ end @testset 
"show() for CategoricalPool{Date} and its values" begin pool = CategoricalPool([Date(1999, 12), Date(1991, 8), Date(1993, 10)]) - - opool = CategoricalPool([Date(1999, 12), Date(1991, 8), Date(1993, 10)], - [Date(1991, 8), Date(1993, 10), Date(1999, 12)], true) + opool = CategoricalPool([Date(1999, 12), Date(1991, 8), Date(1993, 10)], true) nv1 = CategoricalValue(1, pool) nv2 = CategoricalValue(2, pool) @@ -81,15 +78,15 @@ end ov3 = CategoricalValue(3, opool) @test sprint(show, pool) == "$CategoricalPool{Dates.Date,UInt32}([1999-12-01,1991-08-01,1993-10-01])" - @test sprint(show, opool) == "$CategoricalPool{Dates.Date,UInt32}([1991-08-01,1993-10-01,1999-12-01]) with ordered levels" + @test sprint(show, opool) == "$CategoricalPool{Dates.Date,UInt32}([1999-12-01,1991-08-01,1993-10-01]) with ordered levels" @test sprint(show, nv1) == "$CategoricalValue{Dates.Date,UInt32} 1999-12-01" @test sprint(show, nv2) == "$CategoricalValue{Dates.Date,UInt32} 1991-08-01" @test sprint(show, nv3) == "$CategoricalValue{Dates.Date,UInt32} 1993-10-01" - @test sprint(show, ov1) == "$CategoricalValue{Dates.Date,UInt32} 1999-12-01 (3/3)" - @test sprint(show, ov2) == "$CategoricalValue{Dates.Date,UInt32} 1991-08-01 (1/3)" - @test sprint(show, ov3) == "$CategoricalValue{Dates.Date,UInt32} 1993-10-01 (2/3)" + @test sprint(show, ov1) == "$CategoricalValue{Dates.Date,UInt32} 1999-12-01 (1/3)" + @test sprint(show, ov2) == "$CategoricalValue{Dates.Date,UInt32} 1991-08-01 (2/3)" + @test sprint(show, ov3) == "$CategoricalValue{Dates.Date,UInt32} 1993-10-01 (3/3)" @test sprint(show, nv1, context=:typeinfo=>typeof(nv1)) == "1999-12-01" @test sprint(show, nv2, context=:typeinfo=>typeof(nv2)) == "1991-08-01" diff --git a/test/07_levels.jl b/test/07_levels.jl index 946793a2..fcb2a8ee 100644 --- a/test/07_levels.jl +++ b/test/07_levels.jl @@ -3,26 +3,23 @@ using Test using CategoricalArrays using CategoricalArrays: DefaultRefType, levels! 
-@testset "CategoricalPool{Int} updates levels/index/order correctly" begin +@testset "CategoricalPool{Int} updates levels and order correctly" begin pool = CategoricalPool([2, 1, 3]) @test isa(levels(pool), Vector{Int}) - @test length(levels(pool)) === 3 - @test levels(pool) == pool.index == [2, 1, 3] + @test length(pool) === 3 + @test levels(pool) == [2, 1, 3] @test all([levels(CategoricalValue(i, pool)) for i in 1:3] .=== Ref(levels(pool))) - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3) - @test pool.order == [1, 2, 3] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3) @test pool.valindex == [CategoricalValue(i, pool) for i in 1:3] for rep in 1:3 push!(pool, 4) - @test isa(pool.index, Vector{Int}) + @test isa(pool.levels, Vector{Int}) @test length(pool) === 4 - @test pool.index == [2, 1, 3, 4] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 4=>4) - @test pool.order == [1, 2, 3, 4] - @test pool.levels == [2, 1, 3, 4] + @test levels(pool) == [2, 1, 3, 4] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4) @test get(pool, 4) === DefaultRefType(4) @test pool[4] === CategoricalValue(4, pool) @test pool.valindex == [CategoricalValue(i, pool) for i in 1:4] @@ -31,12 +28,10 @@ using CategoricalArrays: DefaultRefType, levels! for rep in 1:3 push!(pool, 0) - @test isa(pool.index, Vector{Int}) + @test isa(pool.levels, Vector{Int}) @test length(pool) === 5 - @test levels(pool) == pool.index == [2, 1, 3, 4, 0] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 4=>4, 0=>5) - @test pool.order == [1, 2, 3, 4, 5] - @test pool.levels == [2, 1, 3, 4, 0] + @test levels(pool) == [2, 1, 3, 4, 0] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5) @test get(pool, 0) === DefaultRefType(5) @test pool[5] === CategoricalValue(5, pool) @test pool.valindex == [CategoricalValue(i, pool) for i in 1:5] @@ -45,12 +40,10 @@ using CategoricalArrays: DefaultRefType, levels! 
for rep in 1:3 push!(pool, 10, 11) - @test isa(pool.index, Vector{Int}) + @test isa(pool.levels, Vector{Int}) @test length(pool) === 7 - @test levels(pool) == pool.index == [2, 1, 3, 4, 0, 10, 11] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7) - @test pool.order == [1, 2, 3, 4, 5, 6, 7] - @test pool.levels == [2, 1, 3, 4, 0, 10, 11] + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7) @test get(pool, 10) === DefaultRefType(6) @test get(pool, 11) === DefaultRefType(7) @test pool[6] === CategoricalValue(6, pool) @@ -61,12 +54,10 @@ using CategoricalArrays: DefaultRefType, levels! for rep in 1:3 push!(pool, 12, 13) - @test isa(pool.index, Vector{Int}) + @test isa(pool.levels, Vector{Int}) @test length(pool) === 9 - @test levels(pool) == pool.index == [2, 1, 3, 4, 0, 10, 11, 12, 13] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9) - @test pool.order == [1, 2, 3, 4, 5, 6, 7, 8, 9] - @test pool.levels == [2, 1, 3, 4, 0, 10, 11, 12, 13] + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9) @test get(pool, 12) === DefaultRefType(8) @test get(pool, 13) === DefaultRefType(9) @test pool[8] === CategoricalValue(8, pool) @@ -74,207 +65,86 @@ using CategoricalArrays: DefaultRefType, levels! 
@test pool.valindex == [CategoricalValue(i, pool) for i in 1:9] end - for rep in 1:3 - delete!(pool, 13) - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 8 - @test levels(pool) == pool.index == [2, 1, 3, 4, 0, 10, 11, 12] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8) - @test pool.order == [1, 2, 3, 4, 5, 6, 7, 8] - @test pool.levels == [2, 1, 3, 4, 0, 10, 11, 12] - @test_throws KeyError get(pool, 13) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:8] - end - - for rep in 1:3 - delete!(pool, 12, 11) - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 6 - @test levels(pool) == pool.index == [2, 1, 3, 4, 0, 10] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 4=>4, 0=>5, 10=>6) - @test pool.order == [1, 2, 3, 4, 5, 6] - @test pool.levels == [2, 1, 3, 4, 0, 10] - @test_throws KeyError get(pool, 11) - @test_throws KeyError get(pool, 12) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:6] - end - - for rep in 1:3 - delete!(pool, 4) - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 5 - @test levels(pool) == pool.index == [2, 1, 3, 0, 10] - @test pool.invindex == Dict(1=>2, 2=>1, 3=>3, 0=>4, 10=>5) - @test pool.order == [1, 2, 3, 4, 5] - @test pool.levels == [2, 1, 3, 0, 10] - @test_throws KeyError get(pool, 4) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:5] - end - - @test levels!(pool, [1, 2, 3]) === pool - @test levels(pool) == [1, 2, 3] - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 3 - @test length(pool.valindex) == 3 - @test levels(pool) == pool.index == [1, 2, 3] - @test pool.invindex == Dict(1=>1, 2=>2, 3=>3) - @test pool.order == [1, 2, 3] - @test pool.levels == [1, 2, 3] - @test get(pool, 1) === DefaultRefType(1) - @test_throws KeyError get(pool, 0) - @test_throws KeyError get(pool, 10) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:3] - - @test levels!(pool, [1, 2, 4]) === pool - @test levels(pool) == [1, 2, 4] - 
- @test isa(pool.index, Vector{Int}) - @test length(pool) == 3 - @test length(pool.valindex) == 3 - @test levels(pool) == pool.index == [1, 2, 4] - @test pool.invindex == Dict(1=>1, 2=>2, 4=>3) - @test pool.order == [1, 2, 3] - @test pool.levels == [1, 2, 4] - @test get(pool, 1) === DefaultRefType(1) - @test_throws KeyError get(pool, 3) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:3] - - @test levels!(pool, [6, 5, 4]) === pool - @test levels(pool) == [6, 5, 4] - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 3 - @test length(pool.valindex) == 3 - @test levels(pool) == pool.index == [6, 5, 4] - @test pool.invindex == Dict(6=>1, 5=>2, 4=>3) - @test pool.order == [1, 2, 3] - @test pool.levels == [6, 5, 4] - @test get(pool, 5) === DefaultRefType(2) - @test_throws KeyError get(pool, 3) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:3] + # Removing levels + @test_throws ArgumentError levels!(pool, levels(pool)[2:end]) # Changing order while preserving existing levels - @test levels!(pool, [5, 6, 4]) === pool - @test levels(pool) == [5, 6, 4] - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 3 - @test length(pool.valindex) == 3 - @test levels(pool) == [5, 6, 4] - @test pool.index == [6, 5, 4] - @test pool.invindex == Dict(6=>1, 5=>2, 4=>3) - @test pool.order == [2, 1, 3] - @test pool.levels == [5, 6, 4] - @test get(pool, 5) === DefaultRefType(2) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:3] + @test_throws ArgumentError levels!(pool, reverse(levels(pool))) # Adding levels while preserving existing ones - @test levels!(pool, [5, 2, 3, 6, 4]) === pool - @test levels(pool) == [5, 2, 3, 6, 4] - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 5 - @test length(pool.valindex) == 5 - @test levels(pool) == [5, 2, 3, 6, 4] - @test pool.index == [6, 5, 4, 2, 3] - @test pool.invindex == Dict(6=>1, 5=>2, 4=>3, 2=>4, 3=>5) - @test pool.order == [4, 1, 5, 2, 3] - @test pool.levels == [5, 2, 3, 
6, 4] - @test get(pool, 2) === DefaultRefType(4) - @test get(pool, 3) === DefaultRefType(5) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:5] - - for rep in 1:3 - delete!(pool, 6) - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 4 - @test length(pool.valindex) == 4 - @test levels(pool) == [5, 2, 3, 4] - @test pool.index == [5, 4, 2, 3] - @test pool.invindex == Dict(5=>1, 4=>2, 2=>3, 3=>4) - @test pool.order == [1, 4, 2, 3] - @test pool.levels == [5, 2, 3, 4] - @test get(pool, 4) === DefaultRefType(2) - @test_throws KeyError get(pool, 6) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:4] - end + @test levels!(pool, [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14]) === pool + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14] + + @test isa(pool.levels, Vector{Int}) + @test length(pool) === 11 + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9, + 15=>10, 14=>11) + @test get(pool, 15) === DefaultRefType(10) + @test get(pool, 14) === DefaultRefType(11) + @test pool[10] === CategoricalValue(10, pool) + @test pool[11] === CategoricalValue(11, pool) + @test pool.valindex == [CategoricalValue(i, pool) for i in 1:11] # get! 
ordered!(pool, true) - @test_throws OrderedLevelsException get!(pool, 10) + @test_throws OrderedLevelsException get!(pool, 1000) ordered!(pool, false) - @test get!(pool, 10) === DefaultRefType(5) + @test get!(pool, 20) === DefaultRefType(12) + + @test isa(pool.levels, Vector{Int}) + @test length(pool) == 12 + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14, 20] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9, + 15=>10, 14=>11, 20=>12) + @test get(pool, 20) === DefaultRefType(12) + @test pool.valindex == [CategoricalValue(i, pool) for i in 1:12] - @test isa(pool.index, Vector{Int}) - @test length(pool) == 5 - @test length(pool.valindex) == 5 - @test levels(pool) == [5, 2, 3, 4, 10] - @test pool.index == [5, 4, 2, 3, 10] - @test pool.invindex == Dict(5=>1, 4=>2, 2=>3, 3=>4, 10=>5) - @test pool.order == [1, 4, 2, 3, 5] - @test pool.levels == [5, 2, 3, 4, 10] - @test get(pool, 10) === DefaultRefType(5) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:5] + # get! with CategoricalValue adding new levels in conflicting order + v = CategoricalValue(2, CategoricalPool([100, 99, 4, 2])) + @test_throws ArgumentError get!(pool, v) - # get! with CategoricalValue adding new levels - v = CategoricalValue(2, CategoricalPool([100, 99, 5, 2])) + # get! 
with CategoricalValue adding new levels in compatible order + v = CategoricalValue(4, CategoricalPool([2, 4, 100, 99])) ordered!(pool, true) @test_throws OrderedLevelsException get!(pool, v) ordered!(pool, false) - @test get!(pool, v) === DefaultRefType(7) + @test get!(pool, v) === DefaultRefType(14) - @test isa(pool.index, Vector{Int}) - @test length(pool) == 7 - @test length(pool.valindex) == 7 - @test levels(pool) == [100, 99, 5, 2, 3, 4, 10] - @test pool.index == [5, 4, 2, 3, 10, 100, 99] - @test pool.invindex == Dict(5=>1, 4=>2, 2=>3, 3=>4, 10=>5, 100=>6, 99=>7) - @test pool.order == [3, 6, 4, 5, 7, 1, 2] - @test pool.levels == [100, 99, 5, 2, 3, 4, 10] - @test get(pool, 99) === DefaultRefType(7) - @test get(pool, 100) === DefaultRefType(6) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:7] + @test isa(pool.levels, Vector{Int}) + @test length(pool) == 14 + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14, 20, 100, 99] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9, + 15=>10, 14=>11, 20=>12, 100=>13, 99=>14) + @test get(pool, 100) === DefaultRefType(13) + @test get(pool, 99) === DefaultRefType(14) + @test pool.valindex == [CategoricalValue(i, pool) for i in 1:14] # get! 
with CategoricalValue not adding new levels v = CategoricalValue(1, CategoricalPool([100, 2])) - @test get!(pool, v) === DefaultRefType(6) - - @test isa(pool.index, Vector{Int}) - @test length(pool) == 7 - @test length(pool.valindex) == 7 - @test levels(pool) == [100, 99, 5, 2, 3, 4, 10] - @test pool.index == [5, 4, 2, 3, 10, 100, 99] - @test pool.invindex == Dict(5=>1, 4=>2, 2=>3, 3=>4, 10=>5, 100=>6, 99=>7) - @test pool.order == [3, 6, 4, 5, 7, 1, 2] - @test pool.levels == [100, 99, 5, 2, 3, 4, 10] - @test get(pool, 99) === DefaultRefType(7) - @test get(pool, 100) === DefaultRefType(6) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:7] + @test get!(pool, v) === DefaultRefType(13) + + @test isa(pool.levels, Vector{Int}) + @test length(pool) == 14 + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14, 20, 100, 99] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9, + 15=>10, 14=>11, 20=>12, 100=>13, 99=>14) + @test pool.valindex == [CategoricalValue(i, pool) for i in 1:14] # get! 
with CategoricalValue from same pool @test get!(pool, pool[1]) === DefaultRefType(1) - @test isa(pool.index, Vector{Int}) - @test length(pool) == 7 - @test length(pool.valindex) == 7 - @test levels(pool) == [100, 99, 5, 2, 3, 4, 10] - @test pool.index == [5, 4, 2, 3, 10, 100, 99] - @test pool.invindex == Dict(5=>1, 4=>2, 2=>3, 3=>4, 10=>5, 100=>6, 99=>7) - @test pool.order == [3, 6, 4, 5, 7, 1, 2] - @test pool.levels == [100, 99, 5, 2, 3, 4, 10] - @test get(pool, 99) === DefaultRefType(7) - @test get(pool, 100) === DefaultRefType(6) - @test pool.valindex == [CategoricalValue(i, pool) for i in 1:7] + @test isa(pool.levels, Vector{Int}) + @test length(pool) == 14 + @test levels(pool) == [2, 1, 3, 4, 0, 10, 11, 12, 13, 15, 14, 20, 100, 99] + @test pool.invindex == Dict(2=>1, 1=>2, 3=>3, 4=>4, 0=>5, 10=>6, 11=>7, 12=>8, 13=>9, + 15=>10, 14=>11, 20=>12, 100=>13, 99=>14) + @test pool.valindex == [CategoricalValue(i, pool) for i in 1:14] - # get! with CategoricalValue not adding new levels + # get! with CategoricalValue conversion error v = CategoricalValue(1, CategoricalPool(["a", "b"])) @test_throws MethodError get!(pool, v) @@ -307,7 +177,7 @@ end @test sprint(showerror, res.value) == "cannot store level(s) 285, 286, 287 and 288 since reference type UInt8 can only hold 255 levels. Use the decompress function to make room for more levels." pool = CategoricalPool{String, UInt8}(string.(318:-1:65)) - res = @test_throws LevelsException{String, UInt8} levels!(pool, vcat("az", levels(pool), "bz", "cz")) + res = @test_throws LevelsException{String, UInt8} levels!(pool, vcat(levels(pool), "az", "bz", "cz")) @test res.value.levels == ["bz", "cz"] @test sprint(showerror, res.value) == "cannot store level(s) \"bz\" and \"cz\" since reference type UInt8 can only hold 255 levels. Use the decompress function to make room for more levels." 
lev = copy(levels(pool)) @@ -315,4 +185,20 @@ end @test levels(pool) == vcat(lev, "az") end +@testset "issubset" begin + pool1 = CategoricalPool(["a", "b", "c"]) + pool2 = CategoricalPool(["c", "a", "b"]) + pool3 = CategoricalPool(["a", "b", "c", "d"]) + pool4 = CategoricalPool(["a", "b"]) + pool5 = CategoricalPool(["a", "b", "e"]) + pool6 = CategoricalPool(String[]) + + @test issubset(pool1, pool1) + @test issubset(pool2, pool1) + @test !issubset(pool3, pool1) + @test issubset(pool4, pool1) + @test !issubset(pool5, pool1) + @test issubset(pool6, pool1) +end + end diff --git a/test/10_isless.jl b/test/10_isless.jl index a8414556..ebf70fe3 100644 --- a/test/10_isless.jl +++ b/test/10_isless.jl @@ -185,142 +185,6 @@ end end end -@testset "comparisons with reordered levels" begin - @test levels!(pool, [2, 3, 1]) === pool - @test levels(pool) == [2, 3, 1] - - @test (v1 < v1) === false - @test (v1 < v2) === false - @test (v1 < v3) === false - @test (v2 < v1) === true - @test (v2 < v2) === false - @test (v2 < v3) === true - @test (v3 < v1) === true - @test (v3 < v2) === false - @test (v3 < v3) === false - - @test (v1 <= v1) === true - @test (v1 <= v2) === false - @test (v1 <= v3) === false - @test (v2 <= v1) === true - @test (v2 <= v2) === true - @test (v2 <= v3) === true - @test (v3 <= v1) === true - @test (v3 <= v2) === false - @test (v3 <= v3) === true - - @test (v1 > v1) === false - @test (v1 > v2) === true - @test (v1 > v3) === true - @test (v2 > v1) === false - @test (v2 > v2) === false - @test (v2 > v3) === false - @test (v3 > v1) === false - @test (v3 > v2) === true - @test (v3 > v3) === false - - @test (v1 >= v1) === true - @test (v1 >= v2) === true - @test (v1 >= v3) === true - @test (v2 >= v1) === false - @test (v2 >= v2) === true - @test (v2 >= v3) === false - @test (v3 >= v1) === false - @test (v3 >= v2) === true - @test (v3 >= v3) === true - - @test isless(v1, v1) === false - @test isless(v1, v2) === false - @test isless(v1, v3) === false - @test isless(v2, 
v1) === true - @test isless(v2, v2) === false - @test isless(v2, v3) === true - @test isless(v3, v1) === true - @test isless(v3, v2) === false - @test isless(v3, v3) === false - - @testset "comparison with values of different types" begin - @test isless(v1, 1) === false - @test isless(v1, 2) === false - @test isless(v2, 1) === true - @test_throws KeyError isless(v1, 10) - @test_throws KeyError isless(v1, "a") - @test isless(1, v1) === false - @test isless(2, v1) === true - @test_throws KeyError isless("a", v1) - @test (v1 < 1) === false - @test (v1 < 2) === false - @test (v2 < 1) === true - @test_throws KeyError v1 < 10 - @test_throws KeyError v1 < "a" - @test (v1 <= 1) === true - @test (v1 <= 2) === false - @test (v2 <= 1) === true - @test_throws KeyError v1 <= "a" - @test (v1 > 1) === false - @test (v1 > 2) === true - @test (v2 > 1) === false - @test_throws KeyError v1 > "a" - @test (v1 >= 1) === true - @test (v1 >= 2) === true - @test (v2 >= 1) === false - @test_throws KeyError v1 >= "a" - end - - @test ordered!(pool, false) === pool - @test isordered(pool) === false - - @test_throws ArgumentError v1 < v1 - @test_throws ArgumentError v1 < v2 - @test_throws ArgumentError v1 < v3 - @test_throws ArgumentError v2 < v1 - @test_throws ArgumentError v2 < v2 - @test_throws ArgumentError v2 < v3 - @test_throws ArgumentError v3 < v1 - @test_throws ArgumentError v3 < v2 - @test_throws ArgumentError v3 < v3 - - @test_throws ArgumentError v1 <= v1 - @test_throws ArgumentError v1 <= v2 - @test_throws ArgumentError v1 <= v3 - @test_throws ArgumentError v2 <= v1 - @test_throws ArgumentError v2 <= v2 - @test_throws ArgumentError v2 <= v3 - @test_throws ArgumentError v3 <= v1 - @test_throws ArgumentError v3 <= v2 - @test_throws ArgumentError v3 <= v3 - - @test_throws ArgumentError v1 > v1 - @test_throws ArgumentError v1 > v2 - @test_throws ArgumentError v1 > v3 - @test_throws ArgumentError v2 > v1 - @test_throws ArgumentError v2 > v2 - @test_throws ArgumentError v2 > v3 - 
@test_throws ArgumentError v3 > v1 - @test_throws ArgumentError v3 > v2 - @test_throws ArgumentError v3 > v3 - - @test_throws ArgumentError v1 >= v1 - @test_throws ArgumentError v1 >= v2 - @test_throws ArgumentError v1 >= v3 - @test_throws ArgumentError v2 >= v1 - @test_throws ArgumentError v2 >= v2 - @test_throws ArgumentError v2 >= v3 - @test_throws ArgumentError v3 >= v1 - @test_throws ArgumentError v3 >= v2 - @test_throws ArgumentError v3 >= v3 - - @test isless(v1, v1) === false - @test isless(v1, v2) === false - @test isless(v1, v3) === false - @test isless(v2, v1) === true - @test isless(v2, v2) === false - @test isless(v2, v3) === true - @test isless(v3, v1) === true - @test isless(v3, v2) === false - @test isless(v3, v3) === false -end - @testset "ordering comparisons between pools fail" begin pool2 = CategoricalPool([1, 2, 3]) ordered!(pool2, true) diff --git a/test/11_array.jl b/test/11_array.jl index 578b9109..bc9670f9 100644 --- a/test/11_array.jl +++ b/test/11_array.jl @@ -102,9 +102,9 @@ using CategoricalArrays: DefaultRefType, leveltype @test_throws Exception x[1] > x[2] @test_throws Exception x[3] > x[2] - @test x[1] === x.pool.valindex[1] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[1] + @test x[1] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[2] @test_throws BoundsError x[4] x2 = x[:] @@ -137,35 +137,33 @@ using CategoricalArrays: DefaultRefType, leveltype @test isordered(x2) == isordered(x) x[1] = x[2] - @test x[1] === x.pool.valindex[2] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[1] + @test x[1] === x.pool.valindex[1] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[2] x[3] = "c" - @test x[1] === x.pool.valindex[2] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[3] @test levels(x) == ["a", "b", "c"] + @test x[1] === x.pool.valindex[1] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[3] x[2:3] 
.= "b" - @test x[1] === x.pool.valindex[2] - @test x[2] === x.pool.valindex[1] - @test x[3] === x.pool.valindex[1] @test levels(x) == ["a", "b", "c"] + @test x[1] === x.pool.valindex[1] + @test x[2] === x.pool.valindex[2] + @test x[3] === x.pool.valindex[2] @test droplevels!(x) === x @test levels(x) == ["a", "b"] @test x[1] === x.pool.valindex[1] @test x[2] === x.pool.valindex[2] @test x[3] === x.pool.valindex[2] - @test levels(x) == ["a", "b"] @test levels!(x, ["b", "a"]) === x @test levels(x) == ["b", "a"] - @test x[1] === x.pool.valindex[1] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[2] - @test levels(x) == ["b", "a"] + @test x[1] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[1] @test_throws ArgumentError levels!(x, ["a"]) @test_throws ArgumentError levels!(x, ["e", "b"]) @@ -173,15 +171,14 @@ using CategoricalArrays: DefaultRefType, leveltype @test levels!(x, ["e", "a", "b"]) === x @test levels(x) == ["e", "a", "b"] - @test x[1] === x.pool.valindex[1] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[2] - @test levels(x) == ["e", "a", "b"] + @test x[1] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[3] + @test x[3] === x.pool.valindex[3] x[1] = "c" @test x[1] === x.pool.valindex[4] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[3] + @test x[3] === x.pool.valindex[3] @test levels(x) == ["e", "a", "b", "c"] push!(x, "a") @@ -202,7 +199,7 @@ using CategoricalArrays: DefaultRefType, leveltype x2 = copy(x) @test_throws MethodError push!(x, 1) @test x == x2 - @test x.pool.index == x2.pool.index + @test x.pool.levels == x2.pool.levels @test x.pool.invindex == x2.pool.invindex empty!(x) @@ -619,7 +616,7 @@ using CategoricalArrays: DefaultRefType, leveltype levels!(x, ["c", "a", "xyz", "b"]) end x[1] = v - @test x[1] === x.pool.valindex[3] + @test x[1] === x.pool.valindex[4] @test x[2] === x.pool.valindex[1] @test 
levels(x) == ["c", "a", "xyz", "b"] end diff --git a/test/12_missingarray.jl b/test/12_missingarray.jl index 4a023b36..73bfc609 100644 --- a/test/12_missingarray.jl +++ b/test/12_missingarray.jl @@ -113,9 +113,9 @@ const ≅ = isequal @test_throws Exception x[1] > x[2] @test_throws Exception x[3] > x[2] - @test x[1] === x.pool.valindex[1] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[1] + @test x[1] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[2] @test_throws BoundsError x[4] x2 = x[:] @@ -148,20 +148,20 @@ const ≅ = isequal @test isordered(x2) == isordered(x) x[1] = x[2] - @test x[1] === x.pool.valindex[2] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[1] + @test x[1] === x.pool.valindex[1] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[2] x[3] = "c" - @test x[1] === x.pool.valindex[2] - @test x[2] === x.pool.valindex[2] + @test x[1] === x.pool.valindex[1] + @test x[2] === x.pool.valindex[1] @test x[3] === x.pool.valindex[3] @test levels(x) == ["a", "b", "c"] x[2:3] .= "b" - @test x[1] === x.pool.valindex[2] - @test x[2] === x.pool.valindex[1] - @test x[3] === x.pool.valindex[1] + @test x[1] === x.pool.valindex[1] + @test x[2] === x.pool.valindex[2] + @test x[3] === x.pool.valindex[2] @test levels(x) == ["a", "b", "c"] @test droplevels!(x) === x @@ -173,9 +173,9 @@ const ≅ = isequal @test levels!(x, ["b", "a"]) === x @test levels(x) == ["b", "a"] - @test x[1] === x.pool.valindex[1] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[2] + @test x[1] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[1] + @test x[3] === x.pool.valindex[1] @test levels(x) == ["b", "a"] @test_throws ArgumentError levels!(x, ["a"]) @@ -184,15 +184,15 @@ const ≅ = isequal @test levels!(x, ["e", "a", "b"]) === x @test levels(x) == ["e", "a", "b"] - @test x[1] === x.pool.valindex[1] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[2] + 
@test x[1] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[3] + @test x[3] === x.pool.valindex[3] @test levels(x) == ["e", "a", "b"] x[1] = "c" @test x[1] === x.pool.valindex[4] - @test x[2] === x.pool.valindex[2] - @test x[3] === x.pool.valindex[2] + @test x[2] === x.pool.valindex[3] + @test x[3] === x.pool.valindex[3] @test levels(x) == ["e", "a", "b", "c"] @test_throws ArgumentError levels!(x, ["e", "c"]) diff --git a/test/13_arraycommon.jl b/test/13_arraycommon.jl index 1561ae55..771b608c 100644 --- a/test/13_arraycommon.jl +++ b/test/13_arraycommon.jl @@ -2,7 +2,7 @@ module TestArrayCommon using Test using Future: copy! using CategoricalArrays, DataAPI -using CategoricalArrays: DefaultRefType, index +using CategoricalArrays: DefaultRefType const ≅ = isequal const ≇ = !isequal @@ -243,61 +243,175 @@ end end end - @testset "copy! and copyto!" begin + @testset "copy! and copyto!" for ordered in (false, true) x = CategoricalArray{Union{T, String}}(["Old", "Young", "Middle", "Young"]) levels!(x, ["Young", "Middle", "Old"]) - ordered!(x, true) + ordered!(x, ordered) y = CategoricalArray{Union{T, String}}(["X", "Z", "Y", "X"]) for copyf! in (copy!, copyto!) 
x2 = copy(x) - @test copyf!(x2, y) === x2 - @test x2 == y - @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] - @test !isordered(x2) + if ordered + @test_throws OrderedLevelsException copyf!(x2, y) + @test x2 == x + @test levels(x2) == ["Young", "Middle", "Old"] + @test isordered(x2) + else + @test copyf!(x2, y) === x2 + @test x2 == y + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + @test !isordered(x2) + end end x = CategoricalArray{Union{T, String}}(["Old", "Young", "Middle", "Young"]) levels!(x, ["Young", "Middle", "Old"]) - ordered!(x, true) + ordered!(x, ordered) + x2 = copy(x) y = CategoricalArray{Union{T, String}}(["X", "Z", "Y", "X"]) - a = (Union{String, Missing})["Z", "Y", "X", "Young"] + a = Union{String, Missing}["Z", "Y", "X", "Young"] # Test with missing values if T === Missing - x[3] = missing + x[3] = x2[3] = missing y[3] = a[2] = missing end - @test copyto!(x, 1, y, 2) === x - @test x ≅ a - @test levels(x) == ["Young", "Middle", "Old", "X", "Y", "Z"] - @test !isordered(x) + if ordered + @test_throws OrderedLevelsException copyto!(x2, 1, y, 2) + @test x2 ≅ x + @test levels(x2) == ["Young", "Middle", "Old"] + @test isordered(x2) + else + @test copyto!(x2, 1, y, 2) === x2 + @test x2 ≅ a + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + @test !isordered(x2) + end - @testset "0-length copy!/copyto! does nothing (including bounds checks)" begin + @testset "0-length copy!/copyto!" begin + # 0-length copy!/copyto! 
does nothing (including bounds checks) except setting levels u = x[1:0] v = y[1:0] - @test copyto!(x, 1, y, 3, 0) === x - @test x ≅ a - @test copyto!(x, 1, y, 5, 0) === x - @test x ≅ a + x2 = copy(x) + if ordered + @test_throws OrderedLevelsException copyto!(x2, 1, y, 3, 0) + @test levels(x2) == ["Young", "Middle", "Old"] + else + @test copyto!(x2, 1, y, 3, 0) === x2 + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test x2 ≅ x - @test copyto!(u, -5, v, 2, 0) === u - @test u ≅ v - @test copyto!(x, -5, v, 2, 0) === x - @test x ≅ a - @test copyto!(u, v) === u - @test u ≅ v - @test copyto!(x, v) === x - @test x ≅ a - @test copy!(u, v) === u - @test u ≅ v - @test copy!(x, v) === x - @test x ≅ a + x2 = copy(x) + if ordered + @test_throws OrderedLevelsException copyto!(x2, 1, y, 5, 0) + @test levels(x2) == ["Young", "Middle", "Old"] + else + @test copyto!(x2, 1, y, 5, 0) === x2 + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test x2 ≅ x + + u2 = copy(u) + if ordered + @test_throws OrderedLevelsException copyto!(u2, -5, v, 2, 0) + @test levels(u2) == ["Young", "Middle", "Old"] + else + @test copyto!(u2, -5, v, 2, 0) === u2 + @test levels(u2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test isempty(u2) + + x2 = copy(x) + if ordered + @test_throws OrderedLevelsException copyto!(x2, -5, v, 2, 0) + @test levels(x2) == ["Young", "Middle", "Old"] + else + @test copyto!(x2, -5, v, 2, 0) === x2 + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test x2 ≅ x + + u2 = copy(u) + if ordered + @test_throws OrderedLevelsException copyto!(u2, v) + @test levels(u2) == ["Young", "Middle", "Old"] + else + @test copyto!(u2, v) === u2 + @test levels(u2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test isempty(u2) + + x2 = copy(x) + if 
ordered + @test_throws OrderedLevelsException copyto!(x2, v) + @test levels(x2) == ["Young", "Middle", "Old"] + else + @test copyto!(x2, v) === x2 + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test x2 ≅ x + + u2 = copy(u) + if ordered + @test_throws OrderedLevelsException copy!(u2, v) + @test levels(u2) == ["Young", "Middle", "Old"] + else + @test copy!(u2, v) === u2 + @test levels(u2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test isempty(u2) + + x2 = copy(x) + if ordered + @test_throws OrderedLevelsException copy!(x2, v) + @test levels(x2) == ["Young", "Middle", "Old"] + else + @test copy!(x2, v) === x2 + @test levels(x2) == ["Young", "Middle", "Old", "X", "Y", "Z"] + end + @test isordered(x2) === ordered + @test x2 ≅ x + + # test with zero-levels source + x2 = copy(x) + @test copy!(x2, categorical(String[])) === x2 + @test levels(x2) == ["Young", "Middle", "Old"] + @test isordered(x2) === ordered + @test x2 ≅ x + + # test with zero-levels destination + for ordered2 in (true, false) + x2 = CategoricalArray{String}(undef, 2, ordered=ordered2) + @test copy!(x2, u) === x2 + @test levels(x2) == ["Young", "Middle", "Old"] + @test isordered(x2) === ordered + @test length(x2) == 2 + @test !any(isassigned(x2, i) for i in eachindex(x2)) + end + + # test with zero-levels source and destination + for ordered2 in (true, false) + x2 = CategoricalArray{String}(undef, 2, ordered=ordered2) + @test copy!(x2, categorical(String[], ordered=ordered)) === x2 + @test isempty(levels(x2)) + @test isordered(x2) === ordered2 + @test length(x2) == 2 + @test !any(isassigned(x2, i) for i in eachindex(x2)) + end end @testset "nonzero-length copy!/copyto! 
into/from empty array throws bounds error" begin u = x[1:0] v = y[1:0] + x2 = copy(x) @test_throws BoundsError copy!(u, x) @test u ≅ v @@ -305,25 +419,25 @@ end @test u ≅ v @test_throws BoundsError copyto!(u, 1, v, 1, 1) @test u ≅ v - @test_throws BoundsError copyto!(x, 1, v, 1, 1) - @test x ≅ a + @test_throws BoundsError copyto!(x2, 1, v, 1, 1) + @test x2 ≅ x end @testset "no corruption happens in case of bounds error" begin - @test_throws BoundsError copyto!(x, 10, y, 2) - @test x ≅ a - @test_throws BoundsError copyto!(x, 1, y, 10) - @test x ≅ a - @test_throws BoundsError copyto!(x, 10, y, 20) - @test x ≅ a - @test_throws BoundsError copyto!(x, 10, y, 2) - @test x ≅ a - @test_throws BoundsError copyto!(x, 1, y, 2, 10) - @test x ≅ a - @test_throws BoundsError copyto!(x, 4, y, 1, 2) - @test x ≅ a - @test_throws BoundsError copyto!(x, 1, y, 4, 2) - @test x ≅ a + @test_throws BoundsError copyto!(x2, 10, y, 2) + @test x2 ≅ x + @test_throws BoundsError copyto!(x2, 1, y, 10) + @test x2 ≅ x + @test_throws BoundsError copyto!(x2, 10, y, 20) + @test x2 ≅ x + @test_throws BoundsError copyto!(x2, 10, y, 2) + @test x2 ≅ x + @test_throws BoundsError copyto!(x2, 1, y, 2, 10) + @test x2 ≅ x + @test_throws BoundsError copyto!(x2, 4, y, 1, 2) + @test x2 ≅ x + @test_throws BoundsError copyto!(x2, 1, y, 4, 2) + @test x2 ≅ x end end @@ -434,6 +548,18 @@ end copyf!(vdest, src) @test vdest == src[1:2] @test levels(dest) == levels(vdest) == ["e", "f", "b", "a"] + + # Destination without any levels should be marked as ordered + src = levels!(CategoricalVector(v, ordered=true), reverse(v)) + dest = CategoricalVector{Union{String,Missing}}([missing, missing]) + dest2 = copy(dest) + vdest = view(dest2, 1:2) + res = @test_throws ArgumentError copyf!(vdest, src) + @test res.value.msg == "cannot set ordered=true on dest SubArray as it would " * + "affect the parent. Found when trying to set levels to [\"b\", \"a\"]." 
+ @test dest2 ≅ dest + @test levels(dest2) == levels(vdest) == levels(dest) + @test !isordered(dest2) && !isordered(vdest) end @testset "copy a src into viewed dest and breaking orderedness" begin @@ -441,18 +567,18 @@ end src = levels!(CategoricalVector(v), reverse(v)) dest = CategoricalVector{String}(["e", "f", "g"], ordered=true) vdest = view(dest, 1:2) - res = @test_throws ArgumentError copyf!(vdest, src) - @test res.value.msg == "cannot set ordered=false on dest SubArray as it would affect the parent. " * - "Found when trying to set levels to $(["e", "f", "g", "b", "a"])." + res = @test_throws OrderedLevelsException copyf!(vdest, src) + @test res.value.newlevel == "b" + @test res.value.levels == levels(dest) @test dest[1:2] == ["e", "f"] @test levels(dest) == levels(vdest) == ["e", "f", "g"] @test isordered(dest) && isordered(vdest) dest = CategoricalVector{String}(["e", "f"], ordered=true) vdest = view(dest, 1:2) - res = @test_throws ArgumentError copyf!(vdest, src) - @test res.value.msg == "cannot set ordered=false on dest SubArray as it would affect the parent. " * - "Found when trying to set levels to $(["e", "f", "b", "a"])." 
+        res = @test_throws OrderedLevelsException copyf!(vdest, src)
+        @test res.value.newlevel == "b"
+        @test res.value.levels == levels(dest)
         @test dest == ["e", "f"]
         @test levels(dest) == levels(vdest) == ["e", "f"]
         @test isordered(dest) && isordered(vdest)
@@ -626,9 +752,10 @@ end
         ordered!(x, true)
         y = CategoricalArray{Union{T, String}}(["Middle", "Middle", "Old", "Young"])
         levels!(y, ["X", "Young", "Middle", "Old"])
-        @test copyf!(x, y) === x
-        @test levels(x) == ["X", "Young", "Middle", "Old"]
-        @test !isordered(x)
+        res = @test_throws OrderedLevelsException copyf!(x, y)
+        @test res.value.newlevel == "X"
+        @test levels(x) == ["Young", "Middle", "Old"]
+        @test isordered(x)
     end
 
     @testset "fill!()" begin
@@ -697,8 +824,8 @@ end
 
     x = CategoricalArray{Union{T, Int}, 1, UInt8}([1, 3, 256])
     res = @test_throws LevelsException{Int, UInt8} levels!(x, collect(1:256))
-    @test res.value.levels == [255]
-    @test sprint(showerror, res.value) == "cannot store level(s) 255 since reference type UInt8 can only hold 255 levels. Use the decompress function to make room for more levels."
+    @test res.value.levels == [256]
+    @test sprint(showerror, res.value) == "cannot store level(s) 256 since reference type UInt8 can only hold 255 levels. Use the decompress function to make room for more levels."
 
     x = CategoricalArray{Union{T, Int}}(30:2:131115)
     res = @test_throws LevelsException{Int, UInt16} CategoricalVector{Int, UInt16}(x)
@@ -760,7 +887,7 @@ end
     @test sort!(x) === x
     @test x == ["Young", "Young", "Middle", "Old"]
 
-    if T === Missing
+    if T !== Missing
         v = rand(["a", "b", "c", "d"], 1000)
     else
         v = rand(["a", "b", "c", "d", missing], 1000)
@@ -776,7 +903,7 @@ end
         cv = categorical(v)
         levels!(cv, ["b", "a", "c", "d"])
         @test sort(cv, rev=rev) ≅
-            [levels(cv); missing][sort([5; CategoricalArrays.order(cv.pool)][cv.refs .+ 1], rev=rev)]
+            ["b", "a", "c", "d", missing][sort([5, 1, 2, 3, 4][cv.refs .+ 1], rev=rev)]
 
         cv = categorical(v)
         @test sort(cv, rev=rev, lt=(x, y) -> isless(y, x)) ≅
@@ -827,7 +954,6 @@ end
         @test isordered(y) === isordered(x)
         @test isordered(x) === ordered_orig
         @test y.refs == x.refs
-        @test index(y.pool) == index(x.pool)
         @test levels(y) == levels(x)
         @test y.refs !== x.refs
         @test y.pool !== x.pool
@@ -839,7 +965,6 @@ end
         @test isordered(y) === isordered(x)
         @test isordered(x) === ordered_orig
         @test y.refs == x.refs
-        @test index(y.pool) == index(x.pool)
         @test levels(y) == levels(x)
         @test y.refs !== x.refs
         @test y.pool !== x.pool
@@ -859,7 +984,6 @@ end
         @test isordered(y) === ordered
         @test isordered(x) === ordered_orig
         @test y.refs == x.refs
-        @test index(y.pool) == index(x.pool)
         @test levels(y) == levels(x)
         @test y.refs !== x.refs
         @test y.pool !== x.pool
@@ -871,7 +995,6 @@ end
         @test isordered(y) === ordered
         @test isordered(x) === ordered_orig
         @test y.refs == x.refs
-        @test index(y.pool) == index(x.pool)
         @test levels(y) == levels(x)
         @test y.refs !== x.refs
         @test y.pool !== x.pool
@@ -886,7 +1009,6 @@ end
         @test isordered(y) === isordered(x)
         @test isordered(x) === ordered_orig
         @test y.refs == x.refs
-        @test index(y.pool) == index(x.pool)
         @test levels(y) == levels(x)
         @test (y.refs === x.refs) == (eltype(x.refs) === eltype(y.refs))
         @test (y.pool === x.pool) == (eltype(x.refs) === eltype(y.refs))
@@ -1203,20 +1325,37 @@ end
 
         b = ["z","y","x"]
         y = CategoricalVector{String}(b)
-        append!(x, y)
-        @test isordered(x) === ordered
-        @test length(x) == 9
-        @test x == ["a", "b", "c", "a", "b", "c", "z", "y", "x"]
-        @test levels(x) == ["a", "b", "c", "x", "y", "z"]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, y)
+            @test isordered(x) === ordered
+            @test length(x) == 6
+            @test x == ["a", "b", "c", "a", "b", "c"]
+            @test levels(x) == ["a", "b", "c"]
+        else
+            append!(x, y)
+            @test isordered(x) === ordered
+            @test length(x) == 9
+            @test x == ["a", "b", "c", "a", "b", "c", "z", "y", "x"]
+            @test levels(x) == ["a", "b", "c", "x", "y", "z"]
+        end
 
         z1 = view(CategoricalVector{String}(["ex1", "ex2"]), 1)
         z2 = view(CategoricalVector{String}(["ex3", "ex4"]), 1:1)
-        append!(x, z1)
-        append!(x, z2)
-        @test isordered(x) === ordered
-        @test length(x) == 11
-        @test x == ["a", "b", "c", "a", "b", "c", "z", "y", "x", "ex1", "ex3"]
-        @test levels(x) == ["a", "b", "c", "x", "y", "z", "ex1", "ex2", "ex3", "ex4"]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, z1)
+            @test_throws OrderedLevelsException append!(x, z2)
+            @test isordered(x) === ordered
+            @test length(x) == 6
+            @test x == ["a", "b", "c", "a", "b", "c"]
+            @test levels(x) == ["a", "b", "c"]
+        else
+            append!(x, z1)
+            append!(x, z2)
+            @test isordered(x) === ordered
+            @test length(x) == 11
+            @test x == ["a", "b", "c", "a", "b", "c", "z", "y", "x", "ex1", "ex3"]
+            @test levels(x) == ["a", "b", "c", "x", "y", "z", "ex1", "ex2", "ex3", "ex4"]
+        end
     end
 
     @testset "append! Float64" begin
@@ -1231,20 +1370,36 @@ end
 
         b = [2.5, 3.0, 3.5]
         y = CategoricalVector{Float64}(b, ordered=ordered)
-        append!(x, y)
-        @test length(x) == 9
-        @test x == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.5, 3.0, 3.5]
-        @test isordered(x) === ordered
-        @test levels(x) == [-1.0, 0.0, 1.0, 2.5, 3.0, 3.5]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, y)
+            @test length(x) == 6
+            @test x == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0]
+        else
+            append!(x, y)
+            @test length(x) == 9
+            @test x == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.5, 3.0, 3.5]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0, 2.5, 3.0, 3.5]
+        end
 
         z1 = view(CategoricalVector{Float64}([100.0, 101.0]), 1)
         z2 = view(CategoricalVector{Float64}([102.0, 103.0]), 1:1)
-        append!(x, z1)
-        append!(x, z2)
-        @test length(x) == 11
-        @test x == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.5, 3.0, 3.5, 100.0, 102.0]
-        @test isordered(x) === ordered
-        @test levels(x) == [-1.0, 0.0, 1.0, 2.5, 3.0, 3.5, 100.0, 101.0, 102.0, 103.0]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, y)
+            @test length(x) == 6
+            @test x == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0]
+        else
+            append!(x, z1)
+            append!(x, z2)
+            @test length(x) == 11
+            @test x == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.5, 3.0, 3.5, 100.0, 102.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0, 2.5, 3.0, 3.5, 100.0, 101.0, 102.0, 103.0]
+        end
     end
 end
 
@@ -1261,19 +1416,37 @@ end
 
         b = ["x","y",missing]
         y = CategoricalVector{Union{String, Missing}}(b)
-        append!(x, y)
-        @test length(x) == 9
-        @test isordered(x) === ordered
-        @test levels(x) == ["a", "b", "x", "y"]
-        @test x ≅ [a; a; b]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, y)
+            @test x ≅ [a; a]
+            @test levels(x) == ["a", "b"]
+            @test isordered(x) === ordered
+            @test length(x) == 6
+        else
+            append!(x, y)
+            @test length(x) == 9
+            @test isordered(x) === ordered
+            @test levels(x) == ["a", "b", "x", "y"]
+            @test x ≅ [a; a; b]
+        end
+
         z1 = view(CategoricalVector{Union{String, Missing}}([missing, "ex2"]), 1)
         z2 = view(CategoricalVector{Union{String, Missing}}(["ex3", "ex4"]), 1:1)
-        append!(x, z1)
-        append!(x, z2)
-        @test length(x) == 11
-        @test isordered(x) === ordered
-        @test levels(x) == ["a", "b", "x", "y", "ex2", "ex3", "ex4"]
-        @test x ≅ [a; a; b; missing; "ex3"]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, z1)
+            @test_throws OrderedLevelsException append!(x, z2)
+            @test x ≅ [a; a]
+            @test levels(x) == ["a", "b"]
+            @test isordered(x) === ordered
+            @test length(x) == 6
+        else
+            append!(x, z1)
+            append!(x, z2)
+            @test length(x) == 11
+            @test isordered(x) === ordered
+            @test levels(x) == ["a", "b", "x", "y", "ex2", "ex3", "ex4"]
+            @test x ≅ [a; a; b; missing; "ex3"]
+        end
     end
 
     @testset "Float64" begin
@@ -1288,19 +1461,133 @@ end
 
         b = [2.5, 3.0, missing]
         y = CategoricalVector{Union{Float64, Missing}}(b)
-        append!(x, y)
-        @test length(x) == 9
-        @test x ≅ [a; a; b]
-        @test isordered(x) === ordered
-        @test levels(x) == [0.0, 0.5, 1.0, 2.5, 3.0]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, y)
+            @test length(x) == 6
+            @test x == [a; a]
+            @test isordered(x) === ordered
+            @test levels(x) == [0.0, 0.5, 1.0]
+        else
+            append!(x, y)
+            @test length(x) == 9
+            @test x ≅ [a; a; b]
+            @test isordered(x) === ordered
+            @test levels(x) == [0.0, 0.5, 1.0, 2.5, 3.0]
+        end
+
         z1 = view(CategoricalVector{Union{Float64, Missing}}([missing, 101.0]), 1)
         z2 = view(CategoricalVector{Union{Float64, Missing}}([102.0, 103.0]), 1:1)
-        append!(x, z1)
-        append!(x, z2)
-        @test length(x) == 11
-        @test x ≅ [a; a; b; missing; 102.0]
+        if ordered
+            @test_throws OrderedLevelsException append!(x, z1)
+            @test_throws OrderedLevelsException append!(x, z2)
+            @test length(x) == 6
+            @test x == [a; a]
+            @test isordered(x) === ordered
+            @test levels(x) == [0.0, 0.5, 1.0]
+        else
+            append!(x, z1)
+            append!(x, z2)
+            @test length(x) == 11
+            @test x ≅ [a; a; b; missing; 102.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [0.0, 0.5, 1.0, 2.5, 3.0, 101.0, 102.0, 103.0]
+        end
+    end
+end
+
+@testset "push! ordered=$ordered" for ordered in (false, true)
+    @testset "push! String" begin
+        a = ["a", "b", "c"]
+        x = CategoricalVector{String}(a, ordered=ordered)
+
+        push!(x, "a")
+        @test x == ["a", "b", "c", "a"]
+        @test isordered(x) === ordered
+        @test levels(x) == ["a", "b", "c"]
+
+        if ordered
+            @test_throws OrderedLevelsException push!(x, "z")
+            @test isordered(x) === ordered
+            @test x == ["a", "b", "c", "a"]
+            @test levels(x) == ["a", "b", "c"]
+        else
+            push!(x, "z")
+            @test isordered(x) === ordered
+            @test x == ["a", "b", "c", "a", "z"]
+            @test levels(x) == ["a", "b", "c", "z"]
+        end
+
+        b = ["z","y","x"]
+        y = CategoricalVector{String}(b)
+        if ordered
+            @test_throws OrderedLevelsException push!(x, y[1])
+            @test isordered(x) === ordered
+            @test x == ["a", "b", "c", "a"]
+            @test levels(x) == ["a", "b", "c"]
+        else
+            push!(x, y[1])
+            @test isordered(x) === ordered
+            @test x == ["a", "b", "c", "a", "z", "z"]
+            @test levels(x) == ["a", "b", "c", "x", "y", "z"]
+        end
+    end
+
+    @testset "push! Float64" begin
+        a = [-1.0, 0.0, 1.0]
+        x = CategoricalVector{Float64}(a, ordered=ordered)
+
+        push!(x, 0.0)
+        @test x == [-1.0, 0.0, 1.0, 0.0]
+        @test isordered(x) === ordered
+        @test levels(x) == [-1.0, 0.0, 1.0]
+
+        if ordered
+            @test_throws OrderedLevelsException push!(x, 3.0)
+            @test x == [-1.0, 0.0, 1.0, 0.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0]
+        else
+            push!(x, 3.0)
+            @test x == [-1.0, 0.0, 1.0, 0.0, 3.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0, 3.0]
+        end
+
+        b = [2.5, 3.0, 3.5]
+        y = CategoricalVector{Float64}(b, ordered=ordered)
+        if ordered
+            @test_throws OrderedLevelsException push!(x, y[1])
+            @test x == [-1.0, 0.0, 1.0, 0.0]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0]
+        else
+            push!(x, y[1])
+            @test x == [-1.0, 0.0, 1.0, 0.0, 3.0, 2.5]
+            @test isordered(x) === ordered
+            @test levels(x) == [-1.0, 0.0, 1.0, 2.5, 3.0, 3.5]
+        end
+    end
+end
+
+@testset "append! ordered=$ordered" for ordered in (false, true)
+    cases = (["b", "a", missing], Union{String, Missing}["b", "a", "b"])
+    @testset "String, has missing: $(any(ismissing.(a)))" for a in cases
+        x = CategoricalVector{Union{String, Missing}}(a, ordered=ordered)
+
+        push!(x, missing)
+        @test x ≅ [a; missing]
+        @test levels(x) == ["a", "b"]
+        @test isordered(x) === ordered
+    end
+
+    @testset "Float64" begin
+        a = 0.0:0.5:1.0
+        x = CategoricalVector{Union{Float64, Missing}}(a, ordered=ordered)
+
+        push!(x, missing)
+        @test x ≅ [a; missing]
         @test isordered(x) === ordered
-        @test levels(x) == [0.0, 0.5, 1.0, 2.5, 3.0, 101.0, 102.0, 103.0]
+        @test levels(x) == [0.0, 0.5, 1.0]
     end
 end
 
diff --git a/test/16_recode.jl b/test/16_recode.jl
index 277c4d2d..54c13cf6 100644
--- a/test/16_recode.jl
+++ b/test/16_recode.jl
@@ -434,7 +434,7 @@ end
     end
 end
 
-@testset "Recoding from $(typeof(x)) to Int/String (i.e. Any), with index and levels in different orders" for
+@testset "Recoding from $(typeof(x)) to Int/String (i.e. Any), with levels in custom order" for
     x in (10:-1:1, CategoricalArray(10:-1:1))
 
     y = @inferred recode(x, 0, 1=>"a", 2:4=>"c", [5; 9:10]=>"b")
@@ -447,7 +447,7 @@ end
         @test typeof(y) === Vector{Any}
     end
 
-    # Recoding from Int to String via default, with index and levels in different orders
+    # Recoding from Int to String via default, with levels in custom order
     y = @inferred recode(x, "x", 1=>"a", 2:4=>"c", [5; 9:10]=>"b")
     @test y == ["b", "b", "x", "x", "x", "b", "c", "c", "c", "a"]
     if isa(x, CategoricalArray)
diff --git a/test/runtests.jl b/test/runtests.jl
index f1185744..588b15db 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -11,14 +11,11 @@ module TestCategoricalArrays
     using CategoricalArrays
 
     tests = [
-        "01_typedef.jl",
-        "02_buildorder.jl",
-        "03_buildfields.jl",
+        "01_value.jl",
         "04_constructors.jl",
         "05_convert.jl",
         "05_copy.jl",
         "06_show.jl",
-        "06_length.jl",
         "07_levels.jl",
         "08_equality.jl",
         "08_string.jl",