port to Julia 0.7 #182

shashi · 2018-07-11T11:03:46Z

Replace DataValues.jl with Union{Missing,T} and NamedTuples.jl with native named tuples.

Get all but TableTraits.jl code to pass tests.

cc @davidanthoff I have commented out the tabletraits.jl code for now. Are you planning to make it usable on 0.7 with Union{Missing, T}?

cc @quinnj

@piever maybe _is_subtype is now kind of redundant since Union{Missing, T} can be tested with <: for the same property?

- Remove Nullables and DataValues
- colnames now returns a tuple
- Remove NamedTuples dependency
- Get test_core.jl to pass

piever · 2018-07-11T11:18:44Z

I think we should switch to Base.promote_typejoin to widen element when collecting an iterable of tuples, but @nalimilan probably knows more about this.

I would also like to get rid of is_subtype, but we need to make sure that Base.promote_typejoin(S, T) === T if and only if S <: T. If that's not the case we could even use Base.promote_typejoin(S, T) !== T as a condition to widen (meaning, widen only if it would actually get strictly wider).
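A minimal sketch of the widening condition proposed above, assuming Julia 1.x. `needs_widening` is a hypothetical helper, not code from IndexedTables.jl: it widens only when `Base.promote_typejoin` produces something strictly wider than the current element type, and the loop prints the plain `<:` check next to it so the two can be compared. Pairs where one side already contains `Missing` are the ones most likely to behave differently across Julia versions, so this is worth running on the version you actually target.

```julia
# Hypothetical helper: widen the current element type `T` only if the
# newly seen type `S` would make the joined type strictly wider.
needs_widening(S::Type, T::Type) = Base.promote_typejoin(S, T) !== T

# A few representative (new type, current element type) pairs.
cases = [
    (Int, Int),
    (Int, Union{Missing, Int}),
    (Missing, Union{Missing, Int}),
    (Missing, Int),
    (Float64, Int),
]

# Print the proposed condition next to the plain subtype test.
for (S, T) in cases
    println(S, " into ", T, ": widen = ", needs_widening(S, T),
            ", !(S <: T) = ", !(S <: T))
end
```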

nalimilan · 2018-07-11T11:40:41Z

> I think we should switch to Base.promote_typejoin to widen element when collecting an iterable of tuples, but @nalimilan probably knows more about this.

Yes, that sounds right. Basically, do the same thing as map.

> I would also like to get rid of is_subtype, but we need to make sure that Base.promote_typejoin(S, T) === T if and only if S <: T. If that's not the case we could even use Base.promote_typejoin(S, T) !== T as a condition to widen (meaning, widen only if it would actually get strictly wider).

I would say that's the case. At least promote_typejoin is equivalent to typejoin except for Union{T,Missing} and Union{T,Nothing}.
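A quick way to see the difference nalimilan describes, assuming Julia 1.x: `typejoin` goes up to the nearest common supertype, while `Base.promote_typejoin` keeps a small `Union` when `Missing` is involved. The last lines show that `collect` on a generator, which has to widen its element type as it goes, ends up with the same small `Union`.

```julia
# typejoin finds the nearest common supertype; for Int and Missing that is Any.
@show typejoin(Int, Missing)

# promote_typejoin special-cases Missing (and Nothing) and keeps a small Union.
@show Base.promote_typejoin(Int, Missing)

# For ordinary types the two agree.
@show typejoin(Int, Float64)
@show Base.promote_typejoin(Int, Float64)

# Collecting a generator widens the element type as new values appear.
v = collect(x for x in (1, missing, 3))
@show eltype(v)
```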

nalimilan · 2018-07-11T11:41:06Z

src/table.jl

@@ -28,7 +27,7 @@ struct NextTable{C<:Columns} <: AbstractIndexedTable
    # Cache permutations by various subsets of columns
    perms::Vector{Perm}
    # store what percent of the data in each column is unique
-    cardinality::Vector{Nullable{Float64}}
+    cardinality::Vector{Any}


Should this be Union{Float64,Missing}?
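If the field were typed as suggested, it could look like the sketch below. `CardinalityCache` and `uniquefraction` are hypothetical names used only for illustration. The point is that `Vector{Union{Float64,Missing}}` keeps a small-union element type (where `Vector{Any}` does not) while still allowing a "not computed yet" state, the role `Nullable{Float64}` played before.

```julia
# Hypothetical stand-in for the `cardinality` field under review: one slot
# per column, `missing` until the fraction of unique values is computed.
struct CardinalityCache
    cardinality::Vector{Union{Float64, Missing}}
end

# Start with every column's cardinality unknown.
CardinalityCache(ncols::Integer) =
    CardinalityCache(Vector{Union{Float64, Missing}}(missing, ncols))

# Hypothetical helper: fraction of distinct values in a column.
uniquefraction(col) = length(unique(col)) / length(col)

cache = CardinalityCache(2)
cache.cardinality[1] = uniquefraction([1, 1, 2, 3])  # 3 distinct / 4 values = 0.75
@show cache.cardinality
```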

davidanthoff · 2018-07-22T22:21:32Z

TableTraits.jl (and the broader Queryverse.jl family of packages) will continue to use DataValue for missing values. I think the consensus in various discussions was that the combination of named tuples and Union{T,Missing} is not ready for the Query.jl use case in the Julia 1.0 time frame.

I'm kind of surprised that you aren't running into some of the same issues? This is essentially still the same set of problems that we discussed in JuliaData/Missings.jl#6 and https://discourse.julialang.org/t/missing-data-and-namedtuple-compatibility/8136. Have you benchmarked this branch?

TableTraits.jl is ported to 0.7, do you want me to open a PR that reenables the TableTraits.jl integration?

andreasnoack · 2018-08-17T07:14:36Z

.travis.yml

@@ -4,7 +4,7 @@ os:
  - linux
  - osx
 julia:
-  - 0.6
+  - nightly


Should add 0.7 and 1.0 as well.

andreasnoack · 2018-08-17T07:15:40Z

Bump. It would be great to get this merged soon.

JeffBezanson · 2018-09-04T18:42:24Z

David's performance concern here is valid --- it might be better to do everything but the Missing change first.

shashi · 2018-09-04T23:01:39Z

@davidanthoff I think @piever's iterator-based approach with widencolumns was key to making this possible with missing. He designed it strategically with this change in mind.

quinnj · 2018-09-04T23:05:29Z

It would certainly be helpful if some benchmarks were actually performed. @piever, do you happen to have any comparisons you did in doing the widencolumns work?

shashi · 2018-09-04T23:08:36Z

Of course. I was just commenting on the point about not being able to use Iterators + missing.

shashi · 2018-09-04T23:14:36Z

Ah, okay, I hadn't read up on that Discourse discussion, my bad.

davidanthoff · 2018-09-05T00:32:28Z

@piever's code is great, but I wouldn't expect it to solve the performance problems. I might be wrong, so benchmarking and comparing would be a good idea.

piever · 2018-09-05T11:00:43Z

At the time I mainly tested that it didn't lose performance compared to the previous inference-based implementation in the type-stable case; since Missing was slow in Julia 0.6 anyway, I didn't test that. I suspect there may be issues in the case of multiple columns allowing missing data, with the combinatorial explosion of possible types, but I don't have benchmarks.

OTOH, I think that JuliaData/Tables.jl#10 is managing to combine the good ideas from collect_columns and the good ideas from Tables and may be the ideal implementation: in case my implementation here has performance issues, we could try using that one instead (I also think it's slightly suboptimal to have to maintain two separate implementations of the same thing). It would also allow several other optimizations, such as iterating lazy rows (see here) rather than materializing NamedTuples as we currently do.

On a related note, I also seem to remember that some functions had yet to be ported to the new iteration framework (join for example) and still rely on inference, so we probably need to keep that in mind.
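To make the "combinatorial explosion" concrete: with `n` columns that each allow `missing`, every row has one of two concrete types per column, so a stream of rows can carry up to 2^n distinct concrete `NamedTuple` types. The sketch below only counts those types (it is not a benchmark), and `concrete_row_types` is a hypothetical helper.

```julia
# Every concrete NamedTuple row type for columns `cols`, where each column
# holds either a value of type `T` or `missing`.
function concrete_row_types(cols::NTuple{N,Symbol}, T::Type) where {N}
    choices = ntuple(_ -> (T, Missing), N)   # two options per column
    return vec([NamedTuple{cols, Tuple{ts...}} for ts in Iterators.product(choices...)])
end

for n in 1:4
    cols = ntuple(i -> Symbol("c", i), n)
    println(n, " missable columns => ", length(concrete_row_types(cols, Int)), " row types")
end
```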

tshort · 2018-09-05T16:04:23Z

test/test_collect.jl

-    v = [@NT(a = 1, b = 2), @NT(a = 1, b = 3)]
-    @test collect_columns(v) == Columns(@NT(a = Int[1, 1], b = Int[2, 3]))
+    v = [(a = 1, b = 2), (a = 1, b = 3)]
+    @test collect_columns(v) == Columns((a = Int[1, 1], b = Int[2, 3]))


It'd be nice to have a constructor for Columns using keyword args, so a level of parens can be dropped.

@test collect_columns(v) == Columns(a = Int[1, 1], b = Int[2, 3])

I think there is such a constructor; maybe this is just checking that passing a (named)tuple also works.
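A minimal sketch of the keyword-argument constructor pattern suggested above, on a hypothetical `ToyColumns` wrapper rather than the real `Columns` type (whose actual constructors are not shown in this thread). In Julia 1.x, `values(kw)` turns the collected keyword arguments into a `NamedTuple`, so both call styles end up holding the same data.

```julia
# Hypothetical stand-in for a Columns-like type wrapping a NamedTuple of vectors.
struct ToyColumns{C<:NamedTuple}
    columns::C
end

# Keyword constructor: `values(kw)` is a NamedTuple such as (a = ..., b = ...),
# so ToyColumns(a = ..., b = ...) needs no extra level of parentheses.
ToyColumns(; kw...) = ToyColumns(values(kw))

# Compare by the wrapped columns.
Base.:(==)(x::ToyColumns, y::ToyColumns) = x.columns == y.columns

@show ToyColumns(a = Int[1, 1], b = Int[2, 3]) == ToyColumns((a = Int[1, 1], b = Int[2, 3]))
```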

davidanthoff · 2018-09-06T20:05:24Z

> I suspect there may be issues in the case of multiple columns allowing missing data, with the combinatorial explosion of possible types, but I don't have benchmarks.

Yes, that is the scenario I would worry about. Doesn't seem like a corner case to me :)

I think in general a good strategy would be to get this working on Julia 0.7 with the minimal set of changes, and then think about broader redesigns. No need to couple those two decisions and thereby delay everything by a lot.

quinnj · 2018-09-11T22:04:56Z

Just pushed a commit that gets this further for 0.7/1.0. The remaining test failures involve:

- Some complaints about isnull not being defined: @davidanthoff is this a DataValues thing/change?
- Some complaints about no method for setindex_shape_check(::Bool, ::Int64); I'm guessing that was an internal Base method that got changed?
- A bunch of join tests are still checking DataValue missing vs. missing.
- MethodError: no method matching +(::Int64, ::UnitRange{Int64}) from [1] valuenames(::NDSparse{Tuple{Int64,Int64},Tuple{Int64,Int64},Columns{Tuple{Int64,Int64},Tuple{Array{Int64,1},Array{Int64,1}}},Columns{Tuple{Int64,Int64},Tuple{Array{Int64,1},Array{Int64,1}}}}) at /Users/jacobquinn/.julia/dev/IndexedTables/src/ndsparse.jl:271
- A PooledArray error: MethodError: no method matching copy!(::Array{String,1}, ::Int64, ::PooledArray{String,UInt8,1,Array{UInt8,1}}, ::Int64, ::Int64).

Can anybody else pick things up from here? If not, I can try to take another crack or two over the next few days.

piever · 2018-09-11T22:44:36Z

> Some complaints about isnull not being defined: @davidanthoff is this a DataValues thing/change?

This also affects StatPlots; FWIU it is isna now.

JeffBezanson · 2018-09-11T22:56:05Z

The + should be .+ and copy! should be copyto!.
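The two renames pointed out above, shown on plain Base types (a `Vector{String}` stands in for the `PooledArray` from the error message): scalar-plus-range arithmetic now needs the dotted broadcast form, and the five-argument `copy!(dest, doffs, src, soffs, n)` becomes `copyto!` with the same argument order.

```julia
# 0.6 allowed `scalar + range`; on 0.7/1.0 the elementwise form is `.+`.
offsets = 1 .+ (1:3)          # still a range: 2:4
@show offsets

# copyto!(dest, doffs, src, soffs, n) copies `n` elements starting at
# src[soffs] into dest starting at dest[doffs].
src  = ["a", "b", "c", "d"]
dest = fill("", 3)
copyto!(dest, 1, src, 2, 3)
@show dest                    # ["b", "c", "d"]
```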

davidanthoff · 2018-09-12T02:02:37Z

Yes, isna is the new isnull on 0.7+ for DataValue.

I haven’t gotten the broadcasting stuff to work for DataValues.jl on 0.7 yet. Not sure that is used anywhere here, though; just a heads-up.

shashi · 2018-09-12T02:28:11Z

BTW I did a simple map with tables of n missable/DataValue columns. It does scale exponentially with n if you're using Missing...

Nice work @quinnj! I'll try to fix up the tests tomorrow (they still use missing and there are a couple of failures related to reflection). You can continue to fix other things...

piever · 2018-09-12T11:32:18Z

@quinnj did you happen to test the scaling of your buildcolumns implementation in the case of an unknown schema? If it performs better, we could try to port IndexedTables to use it; I suspect materializing the NamedTuples is bad for performance with several missable columns.

davidanthoff · 2018-09-12T14:54:04Z

I wouldn’t expect unknown schemas to be the problem. I think iterators that produce streams of values with heterogeneous types are the culprit (and you easily get those in a projection with Missing, but not so easily with DataValue). Those are really distinct issues.
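A small illustration of the distinction drawn above, assuming Julia 1.x. With `Missing`, rows from a projection over missable columns each carry their own concrete `NamedTuple` type. With a container that keeps the "missing" flag inside the value (the hypothetical `Maybe` struct below, a simplified stand-in and not the DataValues.jl API), every row has the same concrete type.

```julia
# Rows as they might come out of a projection over two missable columns.
rows = Any[(a = 1, b = missing), (a = missing, b = 2), (a = 3, b = 4)]
rowtypes = unique(typeof.(rows))
foreach(println, rowtypes)
@show length(rowtypes)        # 3 distinct concrete types in a 3-row stream

# Hypothetical DataValue-like container: the "missing" flag lives inside
# the value, so the concrete type of a field never changes.
struct Maybe{T}
    hasvalue::Bool
    value::T
end
Maybe(x::T) where {T} = Maybe{T}(true, x)
Maybe{T}() where {T} = Maybe{T}(false, zero(T))

wrapped = [(a = Maybe(1), b = Maybe{Int}()),
           (a = Maybe{Int}(), b = Maybe(2)),
           (a = Maybe(3), b = Maybe(4))]
@show length(unique(typeof.(wrapped)))   # 1
```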

JeffBezanson · 2018-09-19T02:11:20Z

Got tests passing locally.

JeffBezanson · 2018-09-19T03:23:55Z

OK, this fails on 0.7 due to the conflicting export of select, but should pass on 1.0 once PooledArrays is updated.

JeffBezanson · 2018-09-19T06:55:27Z

Now with more TableTraits.

JeffBezanson · 2018-09-19T15:28:35Z

src/reduce.jl

@@ -431,23 +427,28 @@ end

 struct ApplyColwise{T}
    functions::T
-    names::Vector{Symbol}
+    names


Why this change?
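For context on the question above, here is what the choice of field declaration means for the stored type, on hypothetical copies of the struct. An unannotated field is typed `Any`, while a type parameter keeps it concrete for whatever container is passed. One plausible motivation, not confirmed in this thread, is that `colnames` now returns a tuple (per the PR description), which a `Vector{Symbol}` field would not accept as-is.

```julia
# Hypothetical variants of the ApplyColwise struct from the diff.
struct TypedNames{T}
    functions::T
    names::Vector{Symbol}    # the original declaration
end

struct UntypedNames{T}
    functions::T
    names                    # no annotation: the field type is Any
end

struct ParamNames{T,N}
    functions::T
    names::N                 # concrete for whatever container is passed in
end

fs = (sum, length)
@show fieldtype(UntypedNames{typeof(fs)}, :names)   # Any
p = ParamNames(fs, (:total, :count))
@show fieldtype(typeof(p), :names)                  # Tuple{Symbol,Symbol}
```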

JeffBezanson · 2018-09-19T16:47:18Z

Brought back the disabled tests. This should be pretty good now.

quinnj · 2018-09-20T13:46:10Z

Merge?

Shashi Gowda added 2 commits July 11, 2018 16:10

Julia 0.7 changes

fe08d7c

- Remove Nullables and DataValues - colnames now returns a tuple - Remove NamedTuples dependency - Get test_core.jl to pass

get more tests to pass

f3a4131

travis on 0.7

4f3b3fc

shashi force-pushed the s/0.7 branch from cf962fe to 4f3b3fc Compare July 11, 2018 11:28

nalimilan reviewed Jul 11, 2018

fix some deprecations

293013c

andreasnoack reviewed Aug 17, 2018

andreasnoack mentioned this pull request Aug 22, 2018

Support for Julia 0.7/1.0 (i.e. NamedTuples in core) ? #186

Closed

Shashi Gowda and others added 2 commits August 23, 2018 21:27

more updates

9232b41

fix warnings from using IndexedTables

f9bd839

joshday mentioned this pull request Aug 23, 2018

Error with 0.7 & 1.0 JuliaData/JuliaDB.jl#218

Closed

fix some more depwarns

e39d7fc

tshort reviewed Sep 5, 2018

Shashi Gowda and others added 4 commits September 9, 2018 17:58

updates and some debug statements to track down hang

f22488b

[very wip] first attempt at reverting Missing

b8aaffa

fix Iterators.product wonkyness

91e29c1

Updates for 1.0

4eeea99

JeffBezanson force-pushed the s/0.7 branch from cc5b37c to 3018d30 Compare September 19, 2018 02:10

JeffBezanson force-pushed the s/0.7 branch from 3018d30 to 1e008d6 Compare September 19, 2018 02:18

JeffBezanson force-pushed the s/0.7 branch 2 times, most recently from 443f2e3 to 890e303 Compare September 19, 2018 06:55

JeffBezanson reviewed Sep 19, 2018

more 1.0 fixes

a13cc94

JeffBezanson force-pushed the s/0.7 branch from 890e303 to a13cc94 Compare September 19, 2018 16:35

This was referenced Sep 24, 2018

Fix deprecations #183

Closed

WIP: Revise to work with new NamedTuple constructor #168

Closed

JeffBezanson changed the title ~~WIP port to Julia 0.7~~ port to Julia 0.7 Sep 24, 2018

JeffBezanson merged commit 4e0d676 into master Sep 24, 2018

simonbyrne mentioned this pull request Oct 4, 2018

Julia Version 1.0.0 "ERROR: LoadError: UndefVarError: T not defined" #188

Closed

shashi deleted the s/0.7 branch December 24, 2018 22:27
