Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Nullable Redesign #21

Closed
wants to merge 3 commits into from
Closed

[WIP] Nullable Redesign #21

wants to merge 3 commits into from

Conversation

johnmyleswhite
Copy link
Member

This draft describes the core ideas behind a redesign of nullables. It will require substantial elaboration before we have a final design, but my goal is to get this out in front of people while we are still debating the remaining points of uncertainty in the design. The primary source of remaining uncertainty are the arguments in favor of Union{T, Void} vs Union{Some{T}, Void}.

This approach is essentially the inverse of the `Union{T, Void}`: here no
methods will work on objects of this type unless they are defined for both
`f(x::Some{T})` and `f(x::Void)`. As such, this approach requires consistent
of specialized lifting machinery via dot-broadcasting syntax. See the later
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"consistent of"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> "consistent usage of"

The representation of null-like constructs in Julia is too complicated. We
currently have all of the following kinds of null-like constructs:

* A ` Void` type with a singleton value `nothing`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space after the backtick

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

## Step 2:: Define Core Operations on Nullable Types by Lifting

The core semantic issue for nullable types is maximizing code reuse: we want
to evaluate `f(x::?T)`, but not require that every function be defined for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should mention what ?T is intended to mean—it's used here mid-discussion with no prior mention of the syntax

* `hasvalue`
* `isnull`
* `get`
* `get_or_default`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a function in Base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this for C# comparisons. This is our two argument get. Will indicate that.

`T?`. Which is chosen will likely depend upon how they interact with the
ternary operator.

We might also introduce a null-coalescing operator of `??` with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentence flows better without "of" IMO

First, we define `broadcast(f, x::Nullable)` as follows:

```
immutable Lifted{T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could specify T<:Function, since type constructors can't lift (i.e. Lifted{Int} whaaat)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need type constructors to lift if you want things like pdf(Binomial(n, 0.5), x) to work correctly when evaluated against a stream of tuples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right, I didn't think of that.

The `Union` type would have the same semantics as unions currently have. Their
implementation in the scalar case would likely match that of a possible
`TaggedUnion` type, which represents a discriminated union, but is a concrete
type that is not a parent of `T`, `Void`, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain what would be the advantage of it not being a parent of these types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove this section as I think it adds complexity without many gains.


The counter-argument is that such code will only work if the methods being
called are defined for both `T` and `Void`. As such, one must redefine
`f(x::Void) = nothing` for every new function, even though the semantics are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could mention the possibility of having a generic fallback definition like this for any function f (even if this might be undesirable and Jeff wouldn't support it).

This approach is essentially the inverse of the `Union{T, Void}`: here no
methods will work on objects of this type unless they are defined for both
`f(x::Some{T})` and `f(x::Void)`. As such, this approach requires consistent
of specialized lifting machinery via dot-broadcasting syntax. See the later
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably shouldn't mention dot-broadcasting syntax here: that's one of the possibilities for lifting, but that choice is kind of orthogonal to the issues tackled in this section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

section on lifting for details.

Because users will have to work with lifted functions, this approach is
less convenient, but more general and often safer. It also allows one to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense is it more general?

distinguish between outcomes in cases where a method returns something like
`Union{Some{Void}, Void}` -- as might happen in a hash table implementation
that returns `Some` when a key is present and associated with `nothing` as a
value, but returns `Void` when a key is not present.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some -> Some(nothing)
Void -> nothing

There are natural extensions of this logic to the n-ary function case.

Because we would use broadcasting, lifting would happen via dot-syntax:
`x .+ y` would be lifted by default. In addition, we would special several
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

special -> specialize

Because we would use broadcasting, lifting would happen via dot-syntax:
`x .+ y` would be lifted by default. In addition, we would special several
operators like `+`, `-`, `==`, etc. to automatically be equivalent to
broadcasting on nullables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in C# (for the first two at least).

Following the lead of
[Flow for Javascript](https://flowtype.org/docs/nullable-types.html), we
would introduce `?T` as sugar for `Nullable{T}`. We have also considered C#'s
`T?`. Which is chosen will likely depend upon how they interact with the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also used by Swift.


We might also introduce a null-coalescing operator of `??` with
right-associativity that essentially performs lifting: in this case,
`x ?? y ?? 0` would be equivalent to `hasvalue(x) ? x : (hasvalue(y) ? y : 0)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasvalue -> !isnull

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or isnull and switch the order


Some such functions are:

* `hasvalue`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't exist in Base either.

Copy link
Member

@andyferris andyferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up, @johnmyleswhite.

I realise this is WIP, but in general I'm left a bit confused whether you want Nullable to be a one-element container similar to RefValue{Union{T,Void}} or whether you literally mean typalias Nullable{T} Union{T,Void} and to use Void like NA was used in DataArrays, except that the compiler will be faster.

Because in the latter case, you don't want to use broadcast and higher-order lifiting, but instead you need method specializations (because methods can only ever receive concrete types).

That problem is essentially the problem we already have with `Nullable{T}`:
writing "clean" functions requires information about type-inference because of
a coupling of result types to knowledge about counterfactual outcomes in
branches.
Copy link
Member

@andyferris andyferris Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that this applies? Wouldn't you use just use dispatch, and define e.g. &(::Bool, Null{Bool})?

For containers of Union{T,Null{T}}, functions like map and broadcast will have this reliance on inference just as containers of other types always do currently (it won't matter at all that the element type is Union{T,Null{T}} as opposed to anything else).

### `Union{T, Null{T}}` Pros & Cons

This approach allows us to distinguish `Null{Bool}` from `Null{Int}` in
cases in which that distinction might be useful. R makes such a distinction,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this approach is reminiscent of NA from DataArrays, but much more precise, in that you can have true & Null{String} give an error, say, while true & Null{Bool} can return Null{Bool}

functions will return `null` as their output whenever any of their arguments
are `null`. Implementing this lifting of functions can be solved as follows:

First, we define `broadcast(f, x::Nullable)` as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, wait, I'm lost here. If we literally replace Nullable{T} with Union{T, Void} or Union{T, Null{T}} then dispatch will handle function calls. broadcast, like every other function, would receive a concrete type like T or Void or Null{T}.

Further, if you mean typealias Nullable{T} Union{T, Void}, then we would also have T <: Nullable for all possible types T.

Like DataArrays and NA, lifting is not necessary. You do need a bunch more method definitions for Void / Null{T} / Some{T}.

@andyferris
Copy link
Member

andyferris commented Dec 14, 2016

my goal is to get this out in front of people while we are still debating the remaining points of uncertainty in the design. The primary source of remaining uncertainty are the arguments in favor of Union{T, Void} vs Union{Some{T}, Void}.

I'm taking this as requests for feedback/opinions (apologies if I misunderstood).

I would advocate strongly for Union{T, Null{T}} because I feel it would help enormously with writing generic methods. If my method receives a Null{Vector{Int}} at least I know what's the element type of the vector that I was supposed to get, and that might help me to arrive at the correct type of my output. Having Void here doesn't help me predict what the user wants me to do, and it doesn't help me to write different methods of a function that might have slightly different semantics depending on input types.

I also feel that Void and nothing already have a meaning, in terms they are the return value of the thing the compiler doesn't have a return value for - like the result of a for loop, or an empty function body, or an if false statement with no else branch. Having to diagnose whether an unfamiliar function returns nothing on purpose or just because it just happened to be a "null" value for my chosen input would add unnecessary complication.

I see you also mentioned #undef. How would this work? Ideally, I guess we could make the GC aware of how Union works. This could be pretty compelling... but we probably need to solve the below:

There is one thing that remains to confuse me. Currently we can have Nullable{Any}. However Union{Any, Void} = Any (and so-on). I don't think we want to change these properties. So what, for example, would a Nullable{Any}() or an #undef of Any become?

@johnmyleswhite
Copy link
Member Author

Thanks for the comments, Andy!

I should point out that the Union{T, Null{T}} implementation is not really seriously being considered as far as I understand the growing consensus. Instead, that implementation is being documented to demonstrate why it offers few benefits over our current implementation.

On that note, I think you're misunderstanding the way Union{T, Void} would work. That's really helpful for me to see because I'll try harder to ensure that revisions of this Julep communicate this point more clearly.

If we use Union{T, Void}, we'll be abandoning the zero-or-one element container way of thinking of Nullable{T} and instead we'd be using what is essentially a tagged union that always contains exactly one value: either a value of type T or a value of type Void. We can think of this as a container to make the use of broadcast more reasonable, but we're mostly taking about broadcast because it provides minimal syntax for lifting.

Will address your other comments tomorrow morning.

@andyferris
Copy link
Member

andyferris commented Dec 14, 2016

Thanks @johnmyleswhite - that clarifies things immensely. So we would have something which roughly has this structure:

immutable Nullable{T}
    value::Union{T,Void}
end
size(::Nullable) = ()
getindex(n::Nullable) = n.value
eltype{T}(::Nullable{T}) = Union{T,Void} # ??
isnull(n::Nullable) = isa(n.value, Void)

and then use broadcast, "lifting", and other machinery to push the nullables through the system?

@andyferris
Copy link
Member

I should point out that the Union{T, Null{T}} implementation is not really seriously being considered as far as I understand the growing consensus. Instead, that implementation is being documented to demonstrate why it offers few benefits over our current implementation.

Of course this:

immutable Nullable{T}
    value::Union{T,Null{T}}
end

offers few benefits, and if that is what you meant, then I agree entirely. To be clear, I was suggesting to use "bare" nullables with no wrapper type, to replace DataArrays.NA with Base.Null{T}, and to remove Base.Nullable entirely (except maybe as a typealias for a Union).

@johnmyleswhite
Copy link
Member Author

Revised the document. Still a bunch more to do (and a few comments left to respond to), but I'm now almost completely convinced we'd want to make ?T mean Union{Some{T}, Void} since the benefits from having T and ?T represent disjoint sets seem to outweigh other considerations.

lifting.

The unique concern with `Union{T, Void}` is `f(x::T)` will often be defined
because it is exists even in the parts of Julia that deal with only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it is exists"

that can take in a function with a method of the form `f(x::T)` and employ a
default semantics for null values to handle the `f(x::Void)` case. In
particular, we will refer to the pattern of extending an `f(x::T)` method to
handle `f(x::Union{T, Void)` by assuming that `f(x::Void) = nothing` as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a }

for x in xs
s += x
end
return x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return s?

current `Nullable{T}` implementation as one must determine the type parameter
`T` even for the null case.

## Step 2:: Define Core Operations on Nullable Types by Lifting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2:: -> 2:?

`x ?? y ?? 0` would be equivalent to `hasvalue(x) ? x : (hasvalue(y) ? y : 0)`.

Finally, we might introduce `.?` for lifted field access such that
`x?.field_name` would be equivalent to `getfield.(x, :field_name)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You first mention .? but then use x?.. Which one is being considered? Or are both?

following occurs:

```
julia> Array(String, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That syntax is "deprecated" per the manual (though no warning or anything is emitted). In the now-preferred syntax, it would be Array{String,1}(1).


Deciding between these three options will require balancing a variety of
concerns. The arguments for each option are summarized below. We include all
of them for clarity, although we believe that the arguments on behalf of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in favor of" rather?

handle `f(x::Union{T, Void)` by assuming that `f(x::Void) = nothing` as
lifting.

The unique concern with `Union{T, Void}` is `f(x::T)` will often be defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say it's the unique concern. The strongest concern for me is that this nullable type wouldn't allow distinguishing between no value and a stored null value when indexing a dictionary.

functions in Base Julia were repeated again when adding methods for `NAtype`:
instead of describing a universal pattern using a generic abstraction, we
repeated that pattern manually over and over again, while always discovering
new cases that had been left out.
Copy link
Member

@nalimilan nalimilan Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem of manual redefinitions is not a criticism of the Union{T, Void} approach per se: it would rather justify hard-coding something like (pseudo-code) (::F){F<:Function}(args...) = nothing for all functions. (The next third paragraph below gives reasons why this would not be a good idea, but it's a different argument IMHO.)


```
function sum(xs::Vector{Union{Some{Int}, Void}})
s = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Some(0)?

end
```

What type does this return when `xs = ?Int[]`? What type does it return when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you illustrate the alternative solutions? It's not completely clear to me what problem you're describing.

the implementation details of the language. But Julia does not provide such
guarantees, so type-inference must be invoked to determine information about
the return type of the counterfactual branch. An alternative approach is to
use `promote_type` to explicitly register return types, but this is clearly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promote_op? Anyway, I don't think it's worth mentioning as it's clearly not a solution.

implementation that uses distinct sentinel values for each nullable type in
the language. Otherwise, this approach is similar to the `Union{T, Void}` case
described above, except for one crucial problem that would seem to suggest this
approach offers few benefits over our current implementation of `Nullable{T}`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should mention possible benefits which justify detailing this approach at all. One benefit I can see is preventing things like Null{String}() + 1, which would most likely occur due to a bug somewhere.

What benefits we expect also matters for discussing the problems with this approach. In particular, when inference fails, we could fall back to Null{Any} or Null{Union{}}. Depending on what that type information is used for, it could be a problem or not.

@TotalVerb
Copy link

Let me add something to ensure that all potential questions have answers.

One difficulty with that approach is distinguishing when an f(x) = ... definition intends to lift over x::Null or otherwise. For instance, we might want string(Null()) == "Null()". Now if someone defines

foo(x) = length(string(x))

is it clear whether this should be Null() or 6 for foo(Null())?

@TotalVerb
Copy link

Unrelated to the current matter of discussion; but one thing we should do in 0.6 if at all still possible, is deprecate using the ternary operator without spaces. Otherwise things like

@foo x ?Int :sym

will parse in an unexpected way.

@TotalVerb
Copy link

There are also other functions which we must consider Null to not lift over. I wonder if maintaining such a list of functions might be just as much work as maintaining a list of functions it should lift over.

  • string, print, println, show, dump, display etc.; all IO functions
  • by extension, functions that call string, e.g. lpad. annoyingly, we might want to lift lpad over Null() as the first argument
  • Base.vect, i.e. [x, y]; also vcat hcat hvcat cat
  • push! unshift! setindex! fill! etc. in fact basically every function with a bang
  • in contains
  • => e.g. for Dict(x => Null()), Set
  • isnull, get, filter, map
  • most (but not all? e.g. Complex) type constructors
  • ifelse, isa, typeof, Ref, and most other builtins

And then of course a list of functions where the lifting behaviour is ambiguous

  • searchsorted and friends
  • |>, ==, !=, isequal
  • run, close, sleep, wait and other "non-bang but still imperative" functions for which a null argument doesn't seem to make sense

I am sure this list is incomplete; but it must be something to consider.

@ararslan
Copy link
Member

The more I think about it, the more I think we shouldn't lift by default; cases where f(Null) should not be Null seem to be equally prevalent if not more so than those where it should be Null. (Plus, I don't think Jeff has changed his stance on automatic lifting for Base functions.) It's easy enough to determine the subset of Base functions which should provide lifting, and packages can opt to provide functionality for their functions as well. But making a broad assumption on what f(Null) should be seems a little dangerous.

@TotalVerb
Copy link

TotalVerb commented Feb 27, 2017

There seems to be some consensus recently of Nullable going back to what it used to be, with NAType, except now with compiler support so it's faster. I wonder if the name should also change to NAType, as this seems in some ways to be moving away from what is called "null" in other programming languages.

@ararslan
Copy link
Member

I don't think it matters too much how we spell it so long as the behavior is well defined and well documented.

@andyferris
Copy link
Member

andyferris commented Feb 27, 2017

I also think "universal" default definitions (lifting) for Null would be undesirable. I think there are many situations where an error is better than continuing to propagate the Null.

I still believe that Null{T} would be a nicer semantic. Some functions support a wide range of signatures and not having the T means it's ambiguous which method to correspond with (and there may exist important semantic differences between the different method signatures). I feel that would be a real problem and become a major pain point when you are doing anything more complex than basic operations like +. The other nice thing about this that you can (safely) define operations on Null{T} such as &(x::Bool, ::Null{Bool}) = x ? Null{Bool}() : false for e.g. 3 valued logic (and users can do the same safely for a variety of types). Without the T then we should assume the Null value might be of Any type, so I wouldn't like to see 3-valued logic implemented with &(::Bool, ::Null) since users are free to overload the behavior of &(::Bool, ::T) for their own types T.

However, to get this semantic of a missing value of type T, we might benefit from some compiler support (specifically, from the type system). Ideally, Null{T} would be covariant in T to make working with non-concrete types easier and more consistent. For example, if Null{Int} represents some missing value of type Int, then I wonder if this missing value should also be considered of type Null{Integer}? That is, given that types represent sets of possible values, for every (concrete?) type T we create the nullable set by adding an extra member of the set Null{T}() which represents a missing value (let's name here the combined set as type Nullable{T} = Union{T, Null{T}}, though I think we should find a new name like ?T or something). We could imagine the set Nullable{S} (where S is an abstract supertype of T) as including the value Null{T}() (as well as Null{S}()?). (Does this make any sense, @JeffBezanson? no it does not)

If we don't get this behavior then a signature like f(::Nullable{Integer}) would fail for Null{Int}() and I do not believe that accurately captures the intentions of the author of f. If we did implement this covariant behavior, we do get that Null (being Null{T} where T) and Null{Any} are more-or-less equivalent (as may be Any and Nullable{Any}...), so in many cases ::Null would still be useful to put in signatures and there shouldn't be undue burden on users/package authors unless they actually require the type specificity to make dispatch consistent.

EDIT: What I said here about covariance of Null{T} is rubbish. Null{<:T} seems to have all the desired properties, e.g. Null{<:Integer} includes Null{Int64}(), Null{Unsigned}(), Null{Integer}(), Null{Union{UInt8, Int8}}(), etc.

@ararslan
Copy link
Member

@andyferris I don't see how Null{T} would differ significantly, or in any beneficial way, from the current Nullable{T}.

Some functions support a wide range of signatures and not having the T means it's ambiguous which method to correspond with

That's why you define proper promotions in Base and locally define f(::Null) to be whatever makes sense. If not knowing the type when calling a method on a null value is a problem, I think that's more indicative of a design flaw.

@andyferris
Copy link
Member

If not knowing the type when calling a method on a null value is a problem, I think that's more indicative of a design flaw.

Sorry, some functions are overloaded with lots and lots of methods on purpose, e.g. try methods(+). Making use of multiple dispatch is the whole point of Julia, and null values of certain types will follow distinctive propagation rules. If we allow for e.g. 3-valued logic,

&(x::Bool, ::Null) = x ? Null() : false
&(::Null, y::Bool) = y ? Null() : false
&(::Null, ::Null) = Null()

and another user creates a type T which

  • overloads &
  • interacts with Bool, such as &(::Bool, ::T) -> T
  • they want to use Union{T,Null} and propagate it through &(::Union{Bool,Null}, ::Union{T,Null})

then they will not ever be able to stop &(false, Null) from being false::Bool instead of some value of T or Null.

I would prefer we didn't add things which constrain multiple dispatch and user's ability to overload methods, whenever they decide they also need to use missing values. In the example above, the user would need to define another &-like function instead of adding a method to &. It seems, to me, to be against the Julian way.

I do agree that for typical, database stype operations, you're not going to run into this esoteric set of conditions. But the fact is that we can foresee circumstances where using the type and dispatch system more strongly will lead to bad interactions with Null.

@andyferris
Copy link
Member

In short terms, if a user defines T such that &(::Bool, ::T) -> T, they might expect to be able to enforce that &(::?Bool, ::?T) -> ?T. However, if anyone implements 3-valued logic on Bool, they won't be allowed to do that.

@ararslan
Copy link
Member

That doesn't seem like a problem to me; that's just kind of how I expect Unions to work. The behavior is clear when you think in terms of the methods defined for the given function. We just have to make sure the behavior is well documented, then people will know they can't expect to be able to enforce Union{Bool, Null} & Union{T, Null} = Union{T, Null}.

@andyferris
Copy link
Member

andyferris commented Feb 27, 2017

That doesn't seem like a problem to me; that's just kind of how I expect Unions to work.

Of course, this is exactly how Union{T,Null} would work. I was just suggesting that Union{T,Null{T}} would follow the same Union rules but is more flexible for users.

Do note that the covariance idea means that 99.9% of the time you can just use Null and ?T for dispatch, and any users that want Union{Bool, Null{Bool}} & Union{T, Null{T}} = Union{T, Null{T}} can still get that. It just seems like a more flexible solution, is all. The other problem with untyped Null and ?T = Union{T,Null} is that dispatch would have many ambiguities if users use ::?T in signatures. I can imagine that I'd like to write:

f(x::?T1) = if isnull(x) ? #= do a =# : #= do b =#  # note that branch is elided by compiler!
f(x::?T2) = if isnull(x) ? #= do c =# : #= do d =#
f(Null) # ambiguous

That basically means it's never safe to dispatch on ?T since two users/packages (who might not be collaborating) might both overload Base.f(::?T) for same f and different T, and end up with this ambiguity problem. This seems (to me at least) as a major flaw.

I don't see how Null{T} would differ significantly, or in any beneficial way, from the current Nullable{T}.

I'm arguing that there is (or that there might be) a benefit for Null{T} over Null. I'm wondering what people see as the downsides (especially w.r.t. dispatching on ?T)?

EDIT: please replace "covariance idea" with "using Null{<:T}" when reading the above.

@andyferris
Copy link
Member

andyferris commented Feb 27, 2017

I don't see how Null{T} would differ significantly, or in any beneficial way, from the current Nullable{T}.

I'm arguing that there is (or that there might be) a benefit for Null{T} over Null. I'm wondering what people see as the downsides (especially w.r.t. dispatching on ?T)?

Or to answer your question another way, the only problem I saw with Nullable was that it is a container. Null{T}() is not a container, it's a value, just as Null() would be.

What are the problems that Null{T}() inherits from Nullable{T}()? Is it that you might need to sometimes use promote_op and so-on? (I actually think you won't need that nearly as much for Null{T} as you would for Nullable{T}, since you can just treat it as a value. Only containers of ?T would use promote_op just as they do for T itself).

@johnmyleswhite
Copy link
Member Author

The relevant problem with Nullable{T} is also the problem with Null{T}: they're parametric types. Being a container or a union is not the problem: the problem is computing the type parameters of a parametric type in the absence of type inference.

@andyferris
Copy link
Member

andyferris commented Feb 28, 2017

The relevant problem with Nullable{T} is also the problem with Null{T}: they're parametric types. Being a container or a union is not the problem: the problem is computing the type parameters of a parametric type in the absence of type inference.

Actually it's only parametric containers that rely on inference.

You can for instance define +(::Null{Int32}, ::Null{Int64}) = Null{Int64}(). No inference required. Just dispatch.

You could use promote or promote_op(+, ...) to have a generic method for +(::Null{T1}, Null{T2}), but unlike the container situation, you are not forced to, and I would suggest we don't use promote_op (promote might be OK, wherever it is already used?).

@johnmyleswhite
Copy link
Member Author

I give up. I'm done having these debates when no one involved is paying me for my time.

@andyferris
Copy link
Member

John - please don't take me the wrong way. We all really appreciate the effort that is going into this - I think it is really fantastic!

(I'm only trying to bounce some ideas around and have a technical discussion about some things which I thought might have been overlooked)

@TotalVerb
Copy link

@johnmyleswhite I too greatly appreciate the effort that has gone into this Julep. Thanks for all your hard work; it's truly important for building a strong foundation for 1.0.

@andyferris There is a flaw, in my opinion, with the covariant Null{T} idea. It makes the assumption that every type on its own represents a coherent algebraic structure, to which we can add a unique null type. By making Null{T}() an instance iff T is concrete, we lose the ability to adjoin a null element to algebraic structures which are, for whatever reason, represented by an abstract type.

Consider for example

abstract type Symbolic <: Real end
struct SymInteger <: Symbolic; val::BigInt; end
struct Reciprocal <: Symbolic; of::Symbolic; end

and imagine that, conventionally, Symbolic is used to denote some real number represented symbolically in this fashion. Then, what we desire is a Null{Symbolic}() constant. But with Null{T}() existing iff T concrete, what we instead have are two disjoint Null{SymInteger}() and Null{Reciprocal}() types, which does not seem fun to have to deal with.

@andyferris
Copy link
Member

@TotalVerb I think you might be right (I think there was a ? in my post next to the "concrete"). Can we have consistent covariance (Null{SymInteger}()::Null{Symbolic}) as well as a Null{Symbolic}() value?

@TotalVerb
Copy link

Hmm, that would be unusual compared to anything else in the language (it would be a type which has an instance, but also has subtypes), but I suppose there is nothing preventing its implementation. Moving on to practical issues though, as mentioned in the Julep, consider

function map(f, x::Nullable{T})
    if !isnull(x)
        return Nullable(f(get(x)))
    else
        return Nullable{Core.Inference.return_type(f, (T, ))}()
    end
end

If we rewrite this with Union{T, Null{T}} we obtain

map(f, x::Any) = f(x)
map{T}(f, x::Null{T}) = Null{Core.Inference.return_type(f, Tuple{T})}()

which not only has an inference dependency, but also has a strange definition of map. It seems like we would be giving up on map entirely with Union{T, Null{T}}. This would be a departure from Scala, Java, Rust, F#, Swift, Kotlin, {Ca|Standard}ML. While I'm not saying that is necessarily a bad thing, we should consider why map is considered useful in such languages, and see if we have a replacement for those use cases.

@TotalVerb
Copy link

An instructive exercise, perhaps, is to find some concrete examples of code that uses Nullables in practice, and consider how it would be written using each suggested interpretation. The applications in NullableArrays are well-known and of course well-studied, but we should consider other applications also.

Let me get the ball rolling with some parts of Base Julia that use Nullable or an equivalent type.

  • match in Base returns nothing when there is no match. (This is effectively Union{T, Void} or Union{T, Null}.) As far as I can tell, any code using conditional regex matches is practically forced to use an if statement, because there is neither autolifting nor opt-in lifting for the Void in `Base.
  • get(dict::Associative, key) as proposed in Add get(coll, key) for Associative returning Nullable julia#18211. In particular, this is not only a convenience issue, but also a performance one, because this can avoid the double hash lookup.
  • The revised iteration protocol gives yet another perspective.

I suspect that a lot of these use cases will be relatively independent of which interpretation is chosen, but we should also consider package code that makes use of nullable.

@andyferris
Copy link
Member

andyferris commented Feb 28, 2017

Hmm, that would be unusual compared to anything else in the language (it would be a type which has an instance, but also has subtypes), but I suppose there is nothing preventing its implementation.

Actually that does seem like a really big departure from the current type system.

If we rewrite this (map) with Union{T, Null{T}} we obtain

Do people need map(f, x::Union{T, Null}) for values? If T is not a container; you just call f(x). Of course, you can do map(sin, 1) and for the null case we can make Null an iterable of length 1, just like Int.

If T is a container such as T <: AbstractArray, say, then we'd need an extra method for map for that, and it might invoke inference on the element types if that is what it does now for empty arrays anyway... seems equivalent.

@TotalVerb
Copy link

I think you're right. map is for containers; Union{T, Null{T}} is not a container and so makes it not generally necessary.

However, this goes back to the original issue of when functions should be lifted over null arguments. If I want to compute f.(x::Nullable{T}), whether or not the person who defined function f defined f(::Null{T}), there should still be a way to do that; otherwise we are losing the functionality of map. This is, modulo the inference dependency, I think fairly easy to resolve:

lift(f, x) = f(x)
lift(f, x::Null{T}) = Null{C.I.return_type(f, Tuple{T})}()

However, because there is no more Some{T} tag indicating that something was part of a Nullable, this means that we need to use a new syntax, and can no longer reuse the broadcast dot syntax.

@andyferris
Copy link
Member

Thank you very much @TotalVerb for helping me figure out the trade-offs here. It seems true that the Null{T} approach does seem to preclude "lifting" methods (or entire functions) without performing a type inference step.

My general approach here has been to see if sufficiently clever dispatch would mean lifting is not necessary - and demanding that users/package authors provide the necessary method. I will concede that this might not be very user-friendly, especially in the case that the methods are coming from somebody else's package. In any case, I would suggest providing a range of common methods for Null or Null{T} so that lifting is relatively rare. And, if lifting is relatively rare it (speculatively) seems not too ridiculous to invoke inference in these cases? I dunno.

Anyway, it seems that Null is better suited for an explicitly annotated lifting approach and Null{T} is more geared towards dispatch.

@andyferris
Copy link
Member

andyferris commented Feb 28, 2017

By the way, everything I was saying about the covariance of Null{T} is rubbish. In Julia v0.6 it's dead easy to express Null{<:Integer} which includes Null{Int64}(), Null{Unsigned}(), Null{Integer}(), Null{Union{UInt8, Int8}}(), etc.

Thus I would suggest having ?T = Union{T, Null{<:T}} (or whatever the syntax for the left-hand side here becomes). Then dispatch for f(::?Integer) would be quite sensible. No special compiler/type-system treatment required.

@nalimilan
Copy link
Member

My general approach here has been to see if sufficiently clever dispatch would mean lifting is not necessary - and demanding that users/package authors provide the necessary method. I will concede that this might not be very user-friendly, especially in the case that the methods are coming from somebody else's package. In any case, I would suggest providing a range of common methods for Null or Null{T} so that lifting is relatively rare. And, if lifting is relatively rare it (speculatively) seems not too ridiculous to invoke inference in these cases? I dunno.

@andyferris This approach has already been tried with vectorized methods, and it was considered as a design flaw. In 0.6 we've been able to move away from it by using the f.(x) syntax for element-wise operation. So we've followed that route already and we know it won't work. If a user needs to pass a null values to a function provided by a package which doesn't support it, s/he shouldn't need to rewrite the function. I think repeating these discussions once again is one of the reasons John closed the PR.

Also the issue with inference is a deep one. Even if we get rid of map support for nullables, lifting will have by definition to infer the T for Null{T} if we make it parametric. It's not OK for it to fail in some cases, as these cases will behave differently from others if you dispatch on the inferred type. What's the point of opting for Null{T} if we can't rely on it? Note that the case of empty collections is considered as fragile, but we haven't found a better solution. Making it the foundation of nullables isn't the best idea to write a robust system.

That basically means it's never safe to dispatch on ?T since two users/packages (who might not be collaborating) might both overload Base.f(::?T) for same f and different T, and end up with this ambiguity problem. This seems (to me at least) as a major flaw.

Yes, and that shows the only solution in that case is to have a common definition f(::Null) = Null(). That's why I suggest we do this by default for all functions.

(The example of 3VL & and | is not representative at all, since these are the only two functions I know for which we want lifting, but not with the standard lifting semantics. So let's not base the whole system on this special case, for which it could be OK to define methods manually.)

Finally, I'm not sure we need so many functions not to be lifted. Functions for which passing a nullable makes no sense should generally not be lifted, and people shouldn't pass nullables to them. Why would one call lpad(Null()) or run(Null()) and expect not getting Null() as a result?

Anyway, we could start without a default lifting fallback and see how it goes. At least, Union{T, Null} would be as functional as DataArrays or Nullable, but more convenient than the latter since non-null values would not even need unwrapping.

@andyferris
Copy link
Member

Thank you so much @nalimilan for your wonderful and very detailed response. I have certainly learned a lot today from this discussion. I had been interested in the development of Nullable and for quite some time have been following the public discussions, issues and this PR as best I can, yet I always was left with (central) one question: "why use lifting with Union-based nullables?" (as opposed to dispatch, which does require extra method definitions and certainly inconvenience, but nothing worse than e.g. C++ would be). I am very glad for receiving a full and direct explanation.

In any case, I feel that having this discussion has been valuable exercise. This certainly is a tricky design problem, and there are clearly rather few available options which satisfy e.g. determinism w.r.t. inference and user-friendliness.

@johnmyleswhite I'd like to apologize to you directly - as I alluded to earlier, I have been very impressed with the leadership shown with this PR and, in fact, have been looking forward to seeing this PR merge. It was never my intention to derail this process. However, I think it's reasonable to say this Julep has become a focus point for the future of Nullable and I assumed it would be an OK place to check my understanding and assumptions about this design, clear up some misunderstandings, float ideas, and in general, to come to a consensus. Thank you for your efforts - and I still believe this Julep process has been, and will continue to be, very worthwhile.

@nalimilan
Copy link
Member

I think we should merge this PR and add the missing details later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.