RFC: Implement operators on Nullable with lifting semantics #16988

nalimilan · 2016-06-17T15:21:49Z

This moves to Base operators which were defined in NullableArrays. It keeps the lifting semantics: Nullable(x) $op Nullable(y) = Nullable(x $op y) when not null, and Nullable{promote_op{...}() when null. It generalizes these definitions in a safe yet efficient way by introducing a new null_safe_op(f, ::Type...) function for a fast-path with unchecked types (see commit message).

See JuliaStats/NullableArrays.jl#111 for discussion of the design. I've marked it as 0.5.0 since this really sounds like a blocker for progress on the data management front, and it shouldn't break anything.

The first commit is a quick hack to get the new tests to pass. Not sure that's the best strategy.

Cc: @johnmyleswhite @davidagold @quinnj @tshort

Error was triggered e.g. by false >> 0x01.

nalimilan · 2016-06-17T16:03:53Z

@JeffBezanson Care to discuss the milestone? That's a relatively limited change, but essential to make Nullable useful in real applications.

JeffBezanson · 2016-06-17T16:15:58Z

We have been doing weekly triage, and are trying very hard to work through the milestone issues. I think it's too late to add new features. Why will it make a big difference to have this in Base now?

nalimilan · 2016-06-17T16:22:01Z

I know it's late, but there's been a push recently to start using Nullable with DataFrames and to improve their support (which is quite limited). Having to wait for 0.6 wouldn't help reaching that goal.

I guess we could keep these operators in NullableArrays, but they really don't belong there as they aren't specific to that type at all. For example, I also need them in CategoricalArrays.jl. Also, Base currently defines ==(::Nullable, ::Nullable) as an error, and we would need to overwrite it to make it useful.

quinnj · 2016-06-17T16:24:21Z

+1 to including in 0.5 (or at least 0.5.x). We really are trying to make a big push to get rid of DataArrays and make DataFrames a stronger standard of type-niceness.

davidagold · 2016-06-17T16:26:53Z

Might a middle road be to host them in a NullableMath package for now?

StefanKarpinski · 2016-06-17T16:30:49Z

How about a Nullables package, required be NullableArrays and CategoricalArrays, etc. FWIW, I think that having most of the Nullable machinery outside of Base Julia for the 0.5 cycle will help accelerate the development of nullables and everything that depends on them.

nalimilan · 2016-06-17T16:33:37Z

That could work. I would say it depends on the strength of the consensus around these operators: if we all agree on the semantics, I'd say they could go into 0.5; if there's still some debate, better keep them in a package.

JeffBezanson · 2016-06-17T16:42:42Z

I do have some objections to the content here.

I don't want to officially sanction computing on invalid data. If there were an efficient way to give an error when accessing an uninitialized Int, we'd be doing that.
It seems strange to be concerned with SIMD performance on nullable values. Do we have performance numbers? I would think if you're at the point where you need SIMD, you have to eliminate nullables anyway.
Needing to define every function on Nullable strikes me as the vectorization issue all over again. I prefer doing lift(+), which can be implemented with a single definition. There is also a Nullables-as-containers proposal out there, which could potentially use the f.() syntax, but I haven't yet looked at that in detail.

StefanKarpinski · 2016-06-17T16:45:23Z

Is the part of this change that affects base/bool.jl separable? What's the motivation for that part?

nalimilan · 2016-06-17T16:55:35Z

@JeffBezanson OK, if we still have to discuss fundamental issues like this, let's go with a package. That said, the discussion is interesting, so:

I don't want to officially sanction computing on invalid data. If there were an efficient way to give an error when accessing an uninitialized Int, we'd be doing that.

Well, yeah, but the day such a feature is introduced, the instruction set will probably provide a way of combining it with vectorization (i.e. ignore errors). Anyway, that's quite theoretical, and that optimization is an implementation detail which can be abandoned.

It seems strange to be concerned with SIMD performance on nullable values. Do we have performance numbers? I would think if you're at the point where you need SIMD, you have to eliminate nullables anyway.

Unfortunately, SIMD doesn't work at this point in my tests. Cf. JuliaStats/NullableArrays.jl#111 (comment). But enabling SIMD in the long-term is essential for data analysis with NullableArrays. @davidagold and @johnmyleswhite have done a lot of work to ensure the efficiency of that structure.

Needing to define every function on Nullable strikes me as the vectorization issue all over again. I prefer doing lift(+), which can be implemented with a single definition. There is also a Nullables-as-containers proposal out there, which could potentially use the f.() syntax, but I haven't yet looked at that in detail.

The apparent consensus among JuliaStats people seems to be that we should follow C# in implementing lifting for all standard operators, and use an explicit syntax for functions (a macro? the f.(x) syntax?). So this PR wouldn't have to be extended to other functions.

Is the part of this change that affects base/bool.jl separable? What's the motivation for that part?

@StefanKarpinski The first commit is a quick hack to make tests pass in corner cases without moving Bool to a specific block. But that's really peripheral. The changes included in the second commit are the cleanest way I could find to implement comparison operators; I could as well treat them as a special case and remove the promote_op method, but I figured it was natural to have it.

Use fast path without a branch for types with unchecked arithmetic (for which the operation can be computed even when value is missing) and a slow path for other types. The new null_safe_op() function allows custom types to opt-in to the fast path when possible. Also use this strategy in isequal(), which keeps its current (non-lifting) behavior.

johnmyleswhite · 2016-06-17T17:13:32Z

I would personally prefer not trying to land these things for 0.5 to give us more time to consider them.

nalimilan · 2016-06-17T18:28:54Z

I'm fine with that, but it looks like we still need a few changes in Base:

The promote_op for Bool: promote_op() fixes #16995
The bitshift operators on Bool fixes (less important, but still good to fix): Fix bitshift operators on Bool #16996
==(::Nullable, ::Nullable): let's remove it as it's always an error anyway?
Less important, but isequal could be made faster using the null_safe_op mechanism. Do we care at this point? That's probably not a performance-critical function.

StefanKarpinski · 2016-06-17T18:55:21Z

Those changes all seem good except the last one, which seems like a pure performance improvement, which means it's not essential to get it into the release.

JeffBezanson · 2016-06-17T19:12:16Z

If you remove == for Nullable, keep in mind it falls back to ===, so you will not get an error.

nalimilan · 2016-06-17T19:27:04Z

If you remove == for Nullable, keep in mind it falls back to ===, so you will not get an error.

Yes, that's what I just realized. That sounds dangerous, since loading the package would change the behaviour of that function in a subtle way, possibly breaking code that relies on it. Maybe better keep it so that it always fails, and overwrite it from the package. That's not very nice because a warning is going to be printed all the time, but...

nalimilan · 2016-06-18T13:04:25Z

Should I make a PR against NullableArrays or create a new package? Maybe better keep everything in one place until these are moved to Base. Else we need to handle deprecations in NullableArrays, which is a pain.

nalimilan · 2016-06-18T21:20:07Z

I've moved this to JuliaStats/NullableArrays.jl#119 for now. Closing, but please continue discussing any point you think needs more consideration. Better do that sooner than later.

Fix StackOverflow error with >>, << and >>> on Bool and Unsigned

f143ae0

Error was triggered e.g. by false >> 0x01.

nalimilan added this to the 0.5.0 milestone Jun 17, 2016

nalimilan force-pushed the nl/nullops branch from 14a81d4 to 62c3994 Compare June 17, 2016 15:34

nalimilan mentioned this pull request Jun 17, 2016

RFC: Nullables as collections #16961

Merged

JeffBezanson removed this from the 0.5.0 milestone Jun 17, 2016

nalimilan force-pushed the nl/nullops branch from 62c3994 to ef687a3 Compare June 17, 2016 16:56

nalimilan mentioned this pull request Jun 17, 2016

promote_op() fixes #16995

Merged

nalimilan mentioned this pull request Jun 17, 2016

Fix bitshift operators on Bool #16996

Merged

nalimilan closed this Jun 18, 2016

nalimilan deleted the nl/nullops branch June 18, 2016 21:20

nalimilan mentioned this pull request Aug 31, 2016

Fast versions of isequal() and isless() for Nullables #18304

Merged

davidanthoff mentioned this pull request Sep 23, 2016

Tag Query v0.0.4 JuliaLang/METADATA.jl#6468

Merged

This was referenced Oct 4, 2016

WIP: Nullable lifting infrastructure #18758

Closed

Port to NullableArrays and CategoricalArrays JuliaData/DataFrames.jl#1008

Merged

nalimilan mentioned this pull request Oct 11, 2016

type hacking JuliaStats/NullableArrays.jl#161

Open

nalimilan mentioned this pull request Oct 19, 2016

Implement more operators on Nullable with lifting semantics #19034

Closed

nalimilan added the missing data Base.missing and related functionality label Oct 19, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Implement operators on Nullable with lifting semantics #16988

RFC: Implement operators on Nullable with lifting semantics #16988

nalimilan commented Jun 17, 2016

nalimilan commented Jun 17, 2016

JeffBezanson commented Jun 17, 2016

nalimilan commented Jun 17, 2016

quinnj commented Jun 17, 2016

davidagold commented Jun 17, 2016

StefanKarpinski commented Jun 17, 2016

nalimilan commented Jun 17, 2016

JeffBezanson commented Jun 17, 2016

StefanKarpinski commented Jun 17, 2016

nalimilan commented Jun 17, 2016

johnmyleswhite commented Jun 17, 2016

nalimilan commented Jun 17, 2016 •

edited

Loading

StefanKarpinski commented Jun 17, 2016

JeffBezanson commented Jun 17, 2016

nalimilan commented Jun 17, 2016

nalimilan commented Jun 18, 2016

nalimilan commented Jun 18, 2016

RFC: Implement operators on Nullable with lifting semantics #16988

RFC: Implement operators on Nullable with lifting semantics #16988

Conversation

nalimilan commented Jun 17, 2016

nalimilan commented Jun 17, 2016

JeffBezanson commented Jun 17, 2016

nalimilan commented Jun 17, 2016

quinnj commented Jun 17, 2016

davidagold commented Jun 17, 2016

StefanKarpinski commented Jun 17, 2016

nalimilan commented Jun 17, 2016

JeffBezanson commented Jun 17, 2016

StefanKarpinski commented Jun 17, 2016

nalimilan commented Jun 17, 2016

johnmyleswhite commented Jun 17, 2016

nalimilan commented Jun 17, 2016 • edited Loading

StefanKarpinski commented Jun 17, 2016

JeffBezanson commented Jun 17, 2016

nalimilan commented Jun 17, 2016

nalimilan commented Jun 18, 2016

nalimilan commented Jun 18, 2016

nalimilan commented Jun 17, 2016 •

edited

Loading