Make CategoricalValue <: AbstractString #77

nalimilan · 2017-09-14T20:08:13Z

Also implement all methods of the AbstractString interface.
Unfortunately, CategoricalValue{T} <: AbstractString even when
!(T <: AbstractString), but in practice it should not be a big problem.

Also implement all methods of the AbstractString interface. Unfortunately, CategoricalValue{T} <: AbstractString even when !(T <: AbstractString), but in practice it should not be a big problem.

ararslan · 2017-09-15T20:01:45Z

I can see why this would be practically useful but IMO conceptually it's pretty weird.

nalimilan · 2017-09-16T10:12:37Z

I'm not sure what "conceptually" exactly means, can you elaborate? :-)

In my experience, string operations can be useful even for clearly categorical data. For example, European regional codes are formed of the country code plus a digit (e.g. FR1), and it's often useful to extract the two first letters to get the country. If somebody reads a CSV file with regional codes and wants to create a country variable, it's nice that df[:country] = [code[1:2] for code in df[:region]] works even for CategoricalArray columns. Conceptually, I don't see what's the problem.

ararslan · 2017-09-16T18:35:33Z

It's not a problem per se; by "conceptually weird" I mean that the concept of every categorical value being an AbstractString seems odd to me. You could have an integer-valued factor with levels 1, 2, and 3, but they would be considered as strings from the perspective of the type system. It's not bad or wrong or even confusing, really, I just find the idea of it a little odd. I agree that it seems like in practice it would be useful though.

nalimilan · 2017-09-16T18:54:43Z

OK. In theory, yes, we could have integer categories, and indeed we still support them. Even though CategoricalValue <: AbstractString, string methods are only defined for CategoricalValue{<:AbstractString}. So they should never interfere with non-string types. Maybe in the future we will be able to say that only CategoricalValue{<:AbstractString} <: AbstractString, which would be ideal.

In practice, integer categorical variables seem to be very rare, since using a custom ordering of levels for integer codes would be really weird, and there are generally no storage gains to be had by using a CategoricalArray rather than a Vector{Int}. So I even wonder whether it makes sense to support anything other than strings; though supporting other types is pretty cheap.

ararslan · 2017-09-16T19:29:49Z

Yep. This is good as far as I'm concerned.

nalimilan · 2017-09-22T13:54:01Z

Turns out there's a practical issue, not just a conceptual one, at JuliaData/DataFrames.jl#1237. Since CategoricalValue{T} <: AbstractString even when T is not an AbstractString, dispatch sometimes uses a method which works only for strings, and it fails later because the data isn't actually a string. Even if we implement the full AbstractString interface, a package can always define a custom function which will break in some cases.

The best solution would of course to be having the ability to declare that only CategoricalValue{<:AbstractString} is an AbstractString, but that's currently not possible. I'm inclined to consider that CategoricalValue is mainly intended to work with strings, and that non-string uses cases are supported, but may not work in all situations. We could also enforce this type restriction, but that would prevent using it in cases where a non-string type would work.

bkamins · 2017-09-22T13:58:31Z

In my opinion CategoricalValue{T} <: AbstractString should imply that T <: AbstractString.
How do you want to assure all functions defined on AbstractString to always work on CategoricalValue{T} if !T<AbstractString?

nalimilan · 2017-09-22T14:02:02Z

Yeah, the answer is: you can't. Though in practice it works fine for many functions, that's why I'm not sure we should forbid non-string types.

bkamins · 2017-09-22T14:12:02Z

So I have the following question:
what type except <:AbstractString is:

immutable,
large enough to benefit from CategoricalArrays,
can be expected to have multiple identical values in a column,
really would be used in practice?

I would say that this is a rare case.
E.g. what is the benefit of using CatagoricalArrays{Char} instead of Char?
(I am not an expert on CategoricalArrays so maybe there is some benefit I am not aware of)

nalimilan · 2017-09-22T14:36:18Z

Honestly I don't see a serious use case except strings, but others might disagree (@araslan?).

cjprybol · 2017-09-22T19:02:03Z

The only serious non-string use case I can think of is using CategoricalArrays to sort and order values in DataFrames e.g. unstack. As long as this doesn't break existing uses like that, I think the practical benefits described so far of CategoricalValue{T} <: AbstractString make these changes a good idea until CategoricalValue{<:AbstractString} <: AbstractString works.

ararslan · 2017-09-22T19:26:42Z

Honestly I don't see a serious use case except strings

Perhaps declaring a variable to be categorical would be more efficient for group-by operations on DataFrames. However, I think the vast majority of situations in which CategoricalValue is useful is for string data. I can envision things like calling a yes/no indicator variable categorical or other things like that, but I'm not sure that declaring those to be categorical would really help all that much.

Perhaps @dmbates could provide some sagely wisdom here?

bkamins · 2017-09-22T20:10:10Z

Also related to an earlier comment by @nalimilan:

In practice, integer categorical variables seem to be very rare

For modeling purposes probably it would be nice if you could have Int vector treated as continuous variable and CategoricalValue{Int} treated as factor. It is quite common to have Likert scale encoded by numbers in source data.

After thinking about it given the current possibilities of subtyping I would not make CategoricalValue <: AbstractString. However, I would extend the list of predefined functions on CategoricalValue{AbstractString} (over the ones that are already defined) to give more or less a complete coverage of AbstractString.

The other option would be (I do not know what would be the performance but I understand that small unions are now fast?), names are examples:

CategoricalAny{T}: categorical type that can hold anything;
CategoricalString{T} <: T where {T<:AbstractString}: categorical type holding strings; (edit - removed unnecesary second subtype clause)
CategoricalValue{T} = Union{CategoricalAny{T}, CategoricalString{T}}

with constructor for CategoricalValue{T} that properly dispatches to CategoricalAny{T}or CategoricalString{T}. Would it work then?

dmbates · 2017-09-22T20:37:01Z

As @bkamins indicates, it is not uncommon for the levels of ordered categorical data to be numbers. The levels for unordered categorical data are just labels and could be restricted to String. I sometimes use Char or even Symbol. The latter may be a misuse.

nalimilan · 2017-09-23T10:24:55Z

I guess we could offer another mechanism to mark that an integer variable should be considered as categorical rather than continuous in models. If you have a Vector{Int}, it would be really silly to force you to convert it to CategoricalArray for that. A simple wrapper type (or even a special formula syntax) would be enough. Cf. JuliaData/DataFrames.jl#867. What is particularly problematic with CategoricalArray{Int} is that if you reorder the levels, < will not respect the standard comparison rules for real, which would be quite disturbing. So in that case better convert them to String to make it clear they are not numbers.

After thinking about it given the current possibilities of subtyping I would not make CategoricalValue <: AbstractString. However, I would extend the list of predefined functions on CategoricalValue{AbstractString} (over the ones that are already defined) to give more or less a complete coverage of AbstractString.

@bkamins That's the approach I had taken, but it actually doesn't work because packages are free to create new methods. So even if I add escape_string, a user may run into issues at some point with a particular function, and we may never know about it.

The solution to use CategoricalAny and CategoricalString would work, but introducing Union into the picture just for a corner case isn't ideal. We would rather have two different types of categorical arrays, and categorical would choose the appropriate one. We can always add this kind of thing at any point, but for now I'd rather keep the design simple and concentrate on the main use case: strings.

Finally, the case of unstack is quite specific, and actually looking at the code it seems we could make it more efficient by not allocating a CategoricalArray copy for non-categorical columns, but instead compute codes on the fly (with an optimized path for CategoricalArray). So I'd say it's mostly a separate issue.

bkamins · 2017-09-23T14:46:00Z

@nalimilan Would the following be an acceptable solution?

Keep CategoricalValue{T} accept any T and be subtype of AbstractString;
Normally accessing the value held in x::CategoricalValue{T} should return string representation of the value held in x or null;
Have a special get function (something like get raw) that would return the underlying value held in x;

Then a documentation should be clear about that difference and when what should be used, because the corner case is that T is eg. Float64 and two different categorical values have the same string representation (so. e.g. comparison of those returned strings should not be used for equality tests, but either CategoricalValue{T} itself should be used for comparisons or get raw should be used).

nalimilan · 2017-09-24T15:35:31Z

That hybrid approach is interesting, but it sounds too complex to me. To make it reasonably efficient, we would have to convert each level to a string and keep a copy to avoid allocating a new one on each access.

The lesson we have learned from the experience with PooledDataArray is that CategoricalArray should not be just an efficient way of storing values of any type, but a type which a restricted set of features adapted to categorical data. So better keep it simple for strings, and not add too many features for other types which are unusual for categorical data. It's always easier to add features later if we feel the need for them, than to remove them.

bkamins · 2017-09-24T18:31:20Z

Good approach :).
Actually then CategoricalArray should ideally accept (looking at my past experience and @dmbates suggestions):

<:AbstractString
<:Integer
::Symbol
::Char

All should be relatively simple to handle (in particular they all have a property that there is a unique mapping from their value to their string representation (edit here about the mapping to be more precise) so there will be no problem with uniqueness of values) and we do not have to cover every possibility from the start (e.g. initially the user would be instructed to convert to <:AbstractString anything that is intended to be stored in CategoricalAray).

Additionally (this is covered in the above, but current CategoricalArrays do not have this property): I would recommend that whatever is decided CategoricalValue{T} constructor should throw an error if any part of T is mutable (i.e. T is mutable or recursively any field contained in T is mutable).

alyst · 2017-10-16T17:08:03Z

So what is exactly an alternative to e.g. CategoricalArray{Float64} (suppose it is used to denote the timepoint of an experiment) if it's going to be deprecated?
Of course, the simplest is to convert it to a string and make the latter a category, but that's exactly what I didn't like in R. These tostring()/parse() roundtrips are pretty annoying.

We cannot do CategoricalValue{<:AbstractString} <: AbstractString [yet], but similar functionality could be implemented through traits.
If there are some specific operations that now require <: AbstractString (what are those exactly?) to work transparently, maybe it's still time to propose the change to Julia string API?

nalimilan · 2017-10-16T17:47:08Z

Would a PooledArray suit your needs? The idea is that if all you're looking for is a way to store a float efficiently, CategoricalArray is not the best structure. The fact that levels can be reordered can give weird results with a float since < can be inconsistent with operations on reals. When the order is already given by the type, a CategoricalArray doesn't add much. Can you describe your use case?

I agree it's annoying that the string interface isn't based on traits, but I don't think it's realistic to expect it to be changed by 1.0. There are no operations that require <:AbstractString per se, but since CSV.jl is going to create CategoricalArray columns by default when the number of unique values is small, I wanted to make it as practical as possible by making CategoricalValue implement the string interface.

alyst · 2017-10-16T21:13:47Z

Can you describe your use case?

I'm using CategoricalArray{Float64} to define a timepoint of an experiment.
There is finite (usually, <10) number of timepoints, with thousands of measurements associated to each, so it's natural to make timepoint a factor (for GLMs; to exclude any floating point comparison issues; to have error checking/diagnostic for incorrect values).
But it would be also convenient to make it CategoricalValue{Float64}, not CategoricalValue{String} as it helps for e.g. plotting: the timepoints are unevenly distributed, so CategoricalValue{String} would plot it wrong unless you would convert it back to Float64.

The fact that levels can be reordered can give weird results with a float since < can be inconsistent with operations on reals.

IIUC, by default CategoricalArray{<:Number} would have the natural order, so the reordering of the levels have to be done explicitly.
It's also not something that the novice user is likely to accidentally type. So if the user really wants the weird order, let him have it.
But allowing only CategoricalArray{String} doesn't prevent misordering. Actually, it just makes it easy. E.g for "1", "2", "10" the default lexicographic order would be "1", "10", "2".

Another aspect of "transparent" string operations is nullable data.
Right now it doesn't always work as expected:

julia> match.(r"a", categorical(["a", "b", "ab", null]))
ERROR: MethodError: no method matching match(::Regex, ::Nulls.Null)
....

julia> string.(categorical(["a", "b", "ab", null]))
4-element Array{String,1}:
 "a"   
 "b"   
 "ab"  
 "null"

This is easy to fix, but the last example ("null") shows that there are some decisions to make, and some of them may be not obvious for every user.
So making the user to write an explicit get() could help avoiding some surprises.

CSV.jl is going to create CategoricalArray columns by default

This is a little bit OT (is there a ticket # for CSV.jl? I couldn't find it myself), but actually Hadley's readr made a change in an opposite direction (from base R's read.table(stringsAsFactors=TRUE)), and this must have been based on quite some real cases.
Personally I find strings by default a more reliable/reproducible behaviour as these guesses are often incorrect. (for corner cases there's always col_types=)
What is worse, different files of the same format may give different predictions; the script that was working fine for dataset1.csv may unexpectedly fail for dataset2.csv.
My only hope is that DataFrames.jl would not fully replicate R's data.frame(stringsAsFactors=TRUE) ;)

ararslan · 2017-10-16T21:35:46Z

My only hope is that DataFrames.jl would not fully replicate R's data.frame(stringsAsFactors=TRUE) ;)

💯 Let's avoid assuming things about users' data as much as possible

bkamins · 2017-10-16T22:30:26Z

@alyst if you put 0.0 and -0.0 into CategoricalArray would you want it to be the same category or two separate categories? (if you make them the same you lose byte level information; if you make them different then this is against == equality)

Switching to a general view: in my opinion CategoricalArray in principle should be used for modeling purposes only therefore I would recommend to require an explicit conversion of raw data to CategoricalArray. If we want to have an array type that performs string (or any other immutable type) interning to save space this should be completely transparent to the user and is a different use case.

I think that CategoricalArray <: AbstractString in general is not a crucial feature, but an important thing is to make sure that T in CategoricalArray{T} is clearly unique not to surprise the user with an unexpected behavior. That is why I believe that allowing only AbstractString, Integer, Char and Symbol is a good restriction to start with (this can be always made looser in the future). Allowing Rational could be considered as they are clearly unique.

In the above by clearly unique I mean that == is true only for identical objects and that their natural representation as string is also unique.

nalimilan · 2017-10-17T08:52:16Z

There is finite (usually, <10) number of timepoints, with thousands of measurements associated to each, so it's natural to make timepoint a factor (for GLMs; to exclude any floating point comparison issues; to have error checking/diagnostic for incorrect values).
But it would be also convenient to make it CategoricalValue{Float64}, not CategoricalValue{String} as it helps for e.g. plotting: the timepoints are unevenly distributed, so CategoricalValue{String} would plot it wrong unless you would convert it back to Float64.

I'm not sure why you find it natural. You won't save a lot of space by using a CategoricalArray (better use Float32 for that), and you'll introduce a lot of complexity. You won't fix any floating point comparison issues either, since unique levels will be determined using isequal anyway. And what kind of error checking/diagnostic are you referring to? The only advantage I can see is that they won't be treated as a numeric variable in GLMs.

Also, I'm not suggesting converting your data to CategoricalArray{String}. That would make no sense. I'm suggesting using the Float64 values directly, or using PooledArray if all you want is save some RAM.

IIUC, by default CategoricalArray{<:Number} would have the natural order, so the reordering of the levels have to be done explicitly.
It's also not something that the novice user is likely to accidentally type. So if the user really wants the weird order, let him have it.

My point is that for reals the possibility to use a custom ordering for levels makes no sense, yet it's the main reason to use a CategoricalArray. So if that's not an advantage and that it's even a potential risk if you set an ordering different from the mathematical ordering, what's the point? Note that the risk is not so limited:

julia> x = CategoricalVector{Int}(2);

julia> x[1] = 100;

julia> x[2] = 0;

julia> levels(x)
2-element Array{Int64,1}:
 100
   0

By default < doesn't work as long as you don't mark the array as ordered, but if you add a new value to an already ordered array, < will go against mathematical ordering.

That said, I wouldn't be opposed to be PR introducing CategoricalString <: AbstractCategoricalValue so that we can keep the general CategoricalValue type as is, as @bkamins suggested above. In general I prefer keeping things as open as possible and leave users decide what's the most appropriate for them as long as it doesn't hurt the most standard use case. Would you be willing to work on this before I deprecate CategoricalValue{<:T} for non-AbstractString T? Sorry for rushing things, but that's going to block the new DataFrames release since people moving from PooledDataArray will try to use non-string types, so we must at least warn them that it's not supported since it triggers bugs e.g. with DataFrames printing.

The choice of creating CategoricalArrays by default is a relatively independent issue, which has basically be done for performance. Discussion happened at JuliaData/DataFrames.jl#895 (and linked issues), please add your arguments there. That's a difficult decision, but performance gains are so large for most cases that it's hard to resist. I precisely decided to make CategoricalValue <: AbstractString to avoid repeating the same mistakes as stringsAsFactors=TRUE in R: since CategoricalValue would then behave like a string, it shouldn't be a problem to use them by default. Contrary to R we're still young so we could easily switch back if experience shows it's problematic.

The question of the treatment of missing values is quite separate too. The point is precisely that you don't need to call string.() since CategoricalValue objects are already strings. In fact, you're arguing in favor of CategoricalValue <: AbstractString, since without it people will call string.() to be able to use them as strings.

alyst · 2017-10-17T09:00:33Z

if you put 0.0 and -0.0 into CategoricalArray would you want it to be the same category or two separate categories?

there's also NaN that is not equal to itself. So right now

julia> categorical([-0.0, 0.0, 1.0, NaN, NaN])
ERROR: ArgumentError: cannot remove level NaN as it is used at position 4. Change the array element type to Union{Float64, Null} using convert if you want to transform some levels

julia> categorical([-0.0, 0.0, 1.0])
3-element CategoricalArrays.CategoricalArray{Float64,1,UInt32}:
 -0.0
 0.0 
 1.0

I would say, NaN behaviour looks too restrictive, we can have NaNs in numeric data, why not in categorical, NaN != null. Overall, the rule that x and y are in the same category iff isequal(x, y) makes total sense.
But the full answer is that the proper behaviour depends on the use case, and it's very easy to adjust (change -0.0 to 0.0 before conversion to categorical, convert number to string etc).

CategoricalArray in principle should be used for modeling purposes only therefore I would recommend to require an explicit conversion of raw data to CategoricalArray. If we want to have an array type that performs string (or any other immutable type) interning to save space this should be completely transparent to the user and is a different use case

+1 for explicit conversion, -1 to any alternative categorical data types. JuliaData is suffering a lot from fragmentation already. Categories are useful to save space and modeling, but I also hope to use categories for enforcing [referential] data integrity.

I believe that allowing only AbstractString, Integer, Char and Symbol is a good restriction to start with

That's what the founding fathers of Julia would probably call "passive agressive" ;) "We don't let you do X because you can misuse it". I think the general principle should be

the behavior should meet the common sense in the typical scenarios (the types you listed), so it "just works" for the average user
there is simple uniform logic that covers as much types as possible (at the expense of being subtle in some non-default scenarios), so that power users have control/understanding of what's going on.

In addition to AbstractFloat it would be so nice to support ranges for cutting data, because parsing the factor labels or implementing any other struts to extract ranges is very ugly.

alyst · 2017-10-17T09:48:12Z

@nalimilan

You won't save a lot of space by using a CategoricalArray (better use Float32 for that), and you'll introduce a lot of complexity. You won't fix any floating point comparison issues either, since unique levels will be determined using isequal anyway. And what kind of error checking/diagnostic are you referring to? The only advantage I can see is that they won't be treated as a numeric variable in GLMs.

CategoricalArray{Float64, UInt8} should be also quite efficient for space saving, but that's not my primary concern.
With levels(A) I can easily diagnose what values are in the column and whether something needs fixing.
If it would be possible to restrict the implicit level addition, I would have some data integrity checks, which are not possible with float.
levels(A) == levels(B) helps a lot already.

So if that's not an advantage and that it's even a potential risk if you set an ordering different from the mathematical ordering, what's the point? Note that the risk is not so limited: ...

But the same applies to any other type as long as implicit level addition is possible.
With this code pattern you can have "Bob" < "Alice" as well.
That's why IMO implicit level addition is a dangerous thing when enabled by default for ordered categories.
It can also lead to misuses of categorical arrays (aggressive data transformations resulting in length(pool) ~ length(array) that should be done on non-categorical data instead).

In general I prefer keeping things as open as possible and leave users decide what's the most appropriate for them as long as it doesn't hurt the most standard use case.

That's the point. I don't see how CategoricalArray{<:Number} hurts the standard use cases.
AFAIU so far CategoricalValue <: AbstractString wasn't crucial for the default scenarios either.

In fact, you're arguing in favor of CategoricalValue <: AbstractString, since without it people will call string.() to be able to use them as strings.

I'm not arguing for that, because I don't want CSV.jl to create categories by default in the first place. :)
In fact, I put the string() example because it was introduced by this PR.

nalimilan · 2017-10-17T16:01:02Z

Actually I think we could make things safer by calling ordered!(x, false) when adding a level from setindex! (in addition to adding a field to disallow adding levels implicitly at all). Should be easy to do.

As I see it, CategoricalValue <: AbstractString is crucial for default scenarios since it's really convenient to be able to treat string categorical values as strings in some contexts. I think I've mentioned elsewhere the example of country + region codes, which are the poster child for categorical variables, and yet can sometimes be treated as strings if you want to extract the two-letter country prefix. Or if you have country codes and region codes in separate (categorical) variables and need to paste them to compute the full codes. Even if you don't consider it as the number-one feature, having to choose between supporting non-string types and supporting string operations is annoying.

How about my suggestion to experiment with CategoricalString <: AbstractCategoricalValue? Would you be willing to investigate this solution?

alyst · 2017-10-17T23:13:49Z

Actually I think we could make things safer by calling ordered!(x, false) when adding a level from setindex! (in addition to adding a field to disallow adding levels implicitly at all). Should be easy to do.

That's a nice way to force the user to re-sort the levels.
One potential problem with implicit ordered!(x, false) is that the same code may end up with x being ordered or not depending on the input data, but I guess it's a minor issue.

How about my suggestion to experiment with CategoricalString <: AbstractCategoricalValue?

So your idea is to:

relax the requirement that the values returned by CategoricalArray are of type CategoricalValue{V} and allow any arbitrary class S that implements "categorical value concept", that is at least get(s::S)::V, eltype(s::S) = V and iscategorical(s::S) = true (so essentially instead of String traits that do not exist, we would introduce categorical traits and use them in the dependent packages instead of CategoricalValue).
Unfortunately, CategoricalString <: AbstractCategoricalValue would not work, as it has to be <: AbstractString, so there would be CategoricalValue{T} for any T except <:AbstractString as before plus CategoricalString
in values.jl all methods would have to be updated to dispatch by something like Union{CategoricalValue, CategoricalString}
additionally, CategoricalString would define string interface methods.
for CategoricalArray(String[]) outer ctor the result would be of type CategoricalArray{...CategoricalString...}, for any other types CategoricalArray{...CategoricalValue{T}...} as before
?

nalimilan · 2017-10-18T08:59:06Z

That's more or less what I have in mind, yes. But as you spotted the lack of multiple inheritance makes it less clean in terms of type hierarchy than it could be. Though we don't really need AbstractCategoricalValue, we can simply define a type alias const CatValue = Union{CategoricalValue, CategoricalString}. If at some point other packages want to extend this, we could create a trait (and maybe they will be supported in Base by then).

These changes should actually be quite easy to do since CategoricalPool is already parameterized on the CategoricalValue due to circular definition issues. CategoricalArray would just need to have its V parameter be a CatValue instead of being the element type of the value. Then all dispatch on CategoricalValue would have to use CatValue instead, and the CategoricalArray constructors would have to choose CategoricalString or CategoricalValue depending on the element type.

bkamins · 2017-10-18T09:36:32Z

I like Union approach 👍.

nalimilan · 2019-06-09T15:21:29Z

Since we now have PooledArray, which is used by CSV.jl by default instead of CategoricalArray, it would be less of an issue to drop CategoricalString and require users to call String(x::CategoricalValue{String}) when they want to do string operations. The (only) advantage is that the design (and the number of methods) could be simplified again.

#198 deprecates all AbstractString methods for CategoricalString so that we then can deprecate it in favor of CategoricalValue{String} in a second phase without breaking user's code.

nalimilan mentioned this pull request Sep 14, 2017

Support CategoricalArray parsing JuliaData/CSV.jl#43

Closed

nalimilan force-pushed the nl/string branch from 090974c to da6563e Compare September 15, 2017 08:10

Make CategoricalValue <: AbstractString

c650265

Also implement all methods of the AbstractString interface. Unfortunately, CategoricalValue{T} <: AbstractString even when !(T <: AbstractString), but in practice it should not be a big problem.

nalimilan force-pushed the nl/string branch from da6563e to c650265 Compare September 15, 2017 08:28

JuliaData deleted a comment from codecov bot Sep 15, 2017

nalimilan mentioned this pull request Sep 19, 2017

Pooling strings JuliaData/DataFrames.jl#895

Closed

nalimilan merged commit cff2b8b into master Sep 19, 2017

nalimilan deleted the nl/string branch September 19, 2017 07:48

nalimilan mentioned this pull request Sep 22, 2017

Improve show in corner cases JuliaData/DataFrames.jl#1237

Merged

This was referenced Oct 11, 2017

Show for columns of CategoricalVector{Int} JuliaData/DataFrames.jl#1250

Closed

fix ambiguous method with Base.in(::CategoricalValue, ::Set) #83

Merged

cjprybol mentioned this pull request Oct 17, 2017

Deprecate use of categorical! function on non-AbstractString columns JuliaData/DataFrames.jl#1254

Closed

alyst mentioned this pull request Oct 17, 2017

Fix #43 by supporting CategoricalArray parsing. JuliaData/CSV.jl#102

Merged

alyst mentioned this pull request Oct 18, 2017

Allow arbitrary "categorical value" types #87

Merged

nalimilan mentioned this pull request Apr 29, 2022

Add parse method #392

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make CategoricalValue <: AbstractString #77

Make CategoricalValue <: AbstractString #77

nalimilan commented Sep 14, 2017 •

edited

Loading

ararslan commented Sep 15, 2017

nalimilan commented Sep 16, 2017

ararslan commented Sep 16, 2017

nalimilan commented Sep 16, 2017

ararslan commented Sep 16, 2017

nalimilan commented Sep 22, 2017

bkamins commented Sep 22, 2017

nalimilan commented Sep 22, 2017

bkamins commented Sep 22, 2017

nalimilan commented Sep 22, 2017

cjprybol commented Sep 22, 2017

ararslan commented Sep 22, 2017

bkamins commented Sep 22, 2017 •

edited

Loading

dmbates commented Sep 22, 2017

nalimilan commented Sep 23, 2017

bkamins commented Sep 23, 2017

nalimilan commented Sep 24, 2017

bkamins commented Sep 24, 2017 •

edited

Loading

alyst commented Oct 16, 2017 •

edited

Loading

nalimilan commented Oct 16, 2017

alyst commented Oct 16, 2017 •

edited

Loading

ararslan commented Oct 16, 2017

bkamins commented Oct 16, 2017

nalimilan commented Oct 17, 2017

alyst commented Oct 17, 2017

alyst commented Oct 17, 2017

nalimilan commented Oct 17, 2017

alyst commented Oct 17, 2017

nalimilan commented Oct 18, 2017

bkamins commented Oct 18, 2017

nalimilan commented Jun 9, 2019

Make CategoricalValue <: AbstractString #77

Make CategoricalValue <: AbstractString #77

Conversation

nalimilan commented Sep 14, 2017 • edited Loading

ararslan commented Sep 15, 2017

nalimilan commented Sep 16, 2017

ararslan commented Sep 16, 2017

nalimilan commented Sep 16, 2017

ararslan commented Sep 16, 2017

nalimilan commented Sep 22, 2017

bkamins commented Sep 22, 2017

nalimilan commented Sep 22, 2017

bkamins commented Sep 22, 2017

nalimilan commented Sep 22, 2017

cjprybol commented Sep 22, 2017

ararslan commented Sep 22, 2017

bkamins commented Sep 22, 2017 • edited Loading

dmbates commented Sep 22, 2017

nalimilan commented Sep 23, 2017

bkamins commented Sep 23, 2017

nalimilan commented Sep 24, 2017

bkamins commented Sep 24, 2017 • edited Loading

alyst commented Oct 16, 2017 • edited Loading

nalimilan commented Oct 16, 2017

alyst commented Oct 16, 2017 • edited Loading

ararslan commented Oct 16, 2017

bkamins commented Oct 16, 2017

nalimilan commented Oct 17, 2017

alyst commented Oct 17, 2017

alyst commented Oct 17, 2017

nalimilan commented Oct 17, 2017

alyst commented Oct 17, 2017

nalimilan commented Oct 18, 2017

bkamins commented Oct 18, 2017

nalimilan commented Jun 9, 2019

nalimilan commented Sep 14, 2017 •

edited

Loading

bkamins commented Sep 22, 2017 •

edited

Loading

bkamins commented Sep 24, 2017 •

edited

Loading

alyst commented Oct 16, 2017 •

edited

Loading

alyst commented Oct 16, 2017 •

edited

Loading