Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to use NullableArrays and CategoricalArrays #145

Merged
merged 9 commits into from
Sep 29, 2016
Merged

Conversation

randy3k
Copy link
Member

@randy3k randy3k commented Sep 26, 2016

TODO

  • Handle ordered factor
  • Optionally convert scalar NA to Nullable instead of NullableArray

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

import Compat.String

import DataStructures: OrderedDict

import Base: eltype, show, convert, isascii,
import Base: eltype, show, convert, isascii, isnull,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need import for this, merely using AFAICT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is because RCall also has a isnull method.

@test rcopy(Nullable, RObject(1)).value == 1
@test rcopy(Nullable, RObject("abc")).value == "abc"
@test rcopy(RObject(Nullable(1))) == 1
@test rcopy(Nullable, RObject(Nullable(1, true))).isnull
Copy link
Member

@nalimilan nalimilan Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should really avoid accessing the isnull field directly, as it's likely to go away soon (JuliaLang/julia#18510). Use the accessor instead. (This also applies in other places.)

@test rcopy(DataArray,"c(NA,'NA')").na == @data([NA,"NA"]).na
@test_throws ErrorException rcopy(DataArray,"as.factor(c('a','a','c'))")
@test rcopy(PooledDataArray,"as.factor(c('a','a','c'))").pool == ["a","c"]
@test rcopy(NullableArray,"c(NA,TRUE)").isnull == NullableArray([true,true], [true,false]).isnull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for NullableArray. You can use isequal to test for equality of both contents and missingness, which is even better.

Copy link
Member Author

@randy3k randy3k Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any method to compare the following withoout values?

isequal(NullableArray([1,2,3]), collect(1:3)) # true
isequal(NullableArray([1,2,3]).values, collect(1:3)) # false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you need to do NullableArray(1:3).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isequal(NullableArray(1:3), 1:3) or isequal(NullableArray(1:3), collect(1:3)) both return false..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant isequal(rcopy(NullableArray,"c(NA,TRUE)"), NullableArray([true,true], [true,false])and so on (i.e. convert both sides to aNullableArray`).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops.. i see

function sexp{T<:Compat.String,N,R<:Integer}(v::$typ{T,N,R})
rv = protect(sexp(v.refs))
try
for (i,isna) = enumerate(v.refs .== 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v.refs .== 0 will allocate a vector. Better do == 0 inside the loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching it

rv[i] = naeltype(eltype(rv))
end
end
setattrib!(rv, Const.LevelsSymbol, sexp(v.pool.index))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use CategoricalArrays.index(v), ordered(v) and levels(v). Private fields should really not be accessed directly as they could change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CategoricalArrays.index does not work?

julia> v = CategoricalArray(repeat(["a", "b"], inner = 5));

julia> CategoricalArrays.levels(v)
2-element Array{String,1}:
 "a"
 "b"

julia> CategoricalArrays.index(v)
ERROR: MethodError: no method matching index(::CategoricalArrays.CategoricalArray{String,1,UInt32})
Closest candidates are:
  index(::CategoricalArrays.CategoricalPool{T,R<:Integer,V}) at /Users/Randy/.julia/v0.5/CategoricalArrays/src/pool.jl:161

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed, at first I didn't intend it to be used outside of the package. For now the best approach is CategoricalArrays.index(v.pool). I'll try to see if I can improve the API in next releases.

@randy3k randy3k merged commit 303e97b into master Sep 29, 2016
@randy3k randy3k deleted the dataframes branch September 29, 2016 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants