Rename None to Union() and Nothing to Void? #8423

Closed
johnmyleswhite opened this issue Sep 20, 2014 · 34 comments

@johnmyleswhite
Member

In the discussion of #8152, there was some concern about the potential existence of three "NULL"-like types in Julia:

  • None
  • Nothing
  • Nullable

One suggestion was to rename types to clarify their purpose. @JeffBezanson suggested renaming None to Union() and Nothing to Void to reflect their respective roles as the empty union of zero types and the result of functions that "do not return a value".

I personally think this would be a great change.

@johnmyleswhite johnmyleswhite added needs decision A decision on this change is needed breaking This change will break code labels Sep 20, 2014
@eschnett
Contributor

In the spirit of "None -> Union()", one could also "Nothing -> ()", i.e. the empty tuple. Functions return multiple values as tuples, so a function returning nothing returns ().

@johnmyleswhite
Member Author

I like that idea, but believe it might require a change in semantics rather than a change in names.

@nalimilan
Member

And what about NA from DataArrays? What can it be replaced with, in the perspective of renaming DataArray to NullableArray and making it consistent with Nullable?

EDIT: for reference, https://github.com/johnmyleswhite/NullableTypes.jl/pull/3

@quinnj
Member

quinnj commented Sep 20, 2014

Would we really need a separate type for NullableArrays? Or would they just become Array{Nullable{T},1}?


@StefanKarpinski
Member

Arrays of Nullables would not have good performance characteristics or memory efficiency.

@StefanKarpinski
Member

I'm fine with renaming None to Union() but I think renaming Nothing to Void would be a mistake. The empty tuple is a completely valid and useful value – using it to indicate that there's nothing interesting to return is not a good idea.

@JeffBezanson
Member

Yes, we cannot use () to mean "nothing". For example, it is the size of a 0-dimensional array. In that case there is definitely a value there, representing 0 dimensions.

I think making Void === Nothing is effectively a bugfix. A Void ccall returns nothing in Julia, so anything else is just bound to cause problems.

I would much prefer simply renaming Nothing to Void, but I'm willing to accept Nothing === Void in the interest of fixing the bug.
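
The point about () being a real value can be checked directly. The following is a minimal sketch, assuming a current Julia release; it uses only `fill`, `size`, `ndims`, and `length` from Base.

```julia
# A 0-dimensional array still holds exactly one element.
a = fill(1.0)

# Its size is the empty tuple: zero dimensions, not "no value".
dims = size(a)

println(dims)        # prints ()
println(ndims(a))    # prints 0
println(length(a))   # prints 1
```

Because () already carries this meaning, overloading it to signal "nothing interesting to return" would make the two cases indistinguishable.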

@johnmyleswhite
Member Author

@nalimilan: My plan is to remove NA as a concept from Julia completely because it has no coherent place in the type system. In R, NA is shorthand for what might be called NA_logical; that is, NA is a value of type logical. But the existence of multiple NA values for each of R's fundamental types gives rise to some paradoxical situations. See this gist for one example: https://gist.github.com/johnmyleswhite/fd6cbed2f691a9119cfe

In Julia, we started with an approach in which NA was a singleton object of a completely novel type, NAtype. This was convenient at the start, but problematic in the long run because it induced endemic type-instability in all code that interacted with any source of NA, since that code always produced Union(NAtype, T) as the inferred type. From the perspective of the current Julia compiler, we might as well have been producing Any everywhere as the output type -- we were actively sabotaging everything clever about the design of Julia's compiler.

One could improve support for union types in Julia substantially using techniques like polymorphic inline caching, but Jeff and others felt that this was not the best way to move forward. I've also come to feel that Julia doesn't use Union types in a way that's meaningfully similar to languages like ML or Haskell, where the Union(NAtype, T) pattern makes sense and is referred to as sum types. In those languages, the compiler forces one to decompose a sum type into separate cases for each possible type in the sum via explicit pattern matching. In other words, sum types are used to push branching into the type system's verifier for program correctness. Julia does not verify exhaustivity when working with union types.

From my perspective, Julia's union types tend to be used only for writing out a catch-all case that expresses a generic statement about all types. Specialized cases are handled via multiple dispatch, rather than pattern matching with exhaustive case analysis. As such, I don't think Julia's union type will ever come to be used in the same way that functional languages use their sum types. This makes me think that the old Union(NAtype, T) pattern in Julia should be expunged from the language completely.
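
The contrast between dispatch and exhaustive pattern matching can be sketched as follows. This is an illustration only, written for a current Julia release: `NAtypeSketch`, `NA_SKETCH`, `describe`, and `try_describe` are hypothetical names, not the DataArrays definitions.

```julia
# Hypothetical stand-in for the old NAtype; not the DataArrays definition.
struct NAtypeSketch end
const NA_SKETCH = NAtypeSketch()

# Each case is an ordinary method chosen by multiple dispatch.
describe(::NAtypeSketch) = "missing"
describe(x::Number) = "number"

# Nothing forces the cases to be exhaustive: the missing String method
# is only noticed at run time, as a MethodError.
function try_describe(x)
    try
        describe(x)
    catch err
        err isa MethodError ? "no method" : rethrow()
    end
end

println(try_describe(NA_SKETCH))
println(try_describe(2))
println(try_describe("a"))
```

In ML or Haskell the equivalent omission in a `match`/`case` over a sum type would be reported by the compiler, which is the verification step the comment above says Julia does not perform.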

@johnmyleswhite
Member Author

@quinnj: The plan is to maintain a separate NullableArray type rather than use Array{Nullable}. The arguments for and against that I see are as follows.

Pros:

  • NullableArray is more efficient because it uses bits to store Boolean information, rather than bytes.
  • NullableArray mimics the data structure of Nullable more closely by decomposing values and missingness masks. This memory layout means that the values component can be operated on by standard functions over Julia's Array{T} type.
  • By using a more standard memory layout, we can avoid redefining operations. We get, for example, FFTs on NullableArrays for free.
  • By reusing existing functions, we also get to avoid having to define elementary operations between two Nullable{T <: Number} objects. We don't, for example, have to define + between two Nullable objects in order to define linear algebra operations.

Cons:

  • People will surely create Array{Nullable} objects. It will take some cultural force to ensure that people learn why those objects aren't supported by most libraries.
  • We need to reinvent lots of functionality currently defined over AbstractArray, including things like map, reduce, etc. But we have to do that anyway, because those functions aren't able to cope with Nullable objects.
  • We have to do a lot more work to support an extra data structure.

I think we'll end up revisiting this question a few times, but I think the current plan is the best one we have so far.
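
The values-plus-mask layout described in the pros can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the NullableArrays implementation: `MaskedVector`, `setnull!`, and `sum_present` are hypothetical names, and the syntax targets a current Julia release.

```julia
# Hypothetical struct-of-arrays container: a dense payload plus a bit mask.
struct MaskedVector{T}
    values::Vector{T}    # dense payload, usable by ordinary Array{T} code
    isnull::BitVector    # one bit per element; true marks a missing entry
end

# Start with every entry present.
MaskedVector(values::Vector{T}) where {T} =
    MaskedVector{T}(values, falses(length(values)))

Base.length(v::MaskedVector) = length(v.values)

# Mark element i as null; the payload slot keeps whatever value it had.
setnull!(v::MaskedVector, i::Integer) = (v.isnull[i] = true; v)

# Sum only the non-null entries by reading the two dense arrays directly.
function sum_present(v::MaskedVector{T}) where {T}
    s = zero(T)
    for i in eachindex(v.values)
        v.isnull[i] || (s += v.values[i])
    end
    return s
end

v = MaskedVector([1.0, 2.0, 3.0])
setnull!(v, 2)
println(sum_present(v))   # prints 4.0
```

Because `values` is a plain `Vector{T}`, existing array routines can be applied to it unchanged, and the mask costs one bit per element rather than one byte.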

@johnmyleswhite
Member Author

@JeffBezanson and @StefanKarpinski: How about starting by making Void === Nothing and then considering a complete transition to Void over time?

@JeffBezanson
Member

Yes, that's probably what we'd do anyway as a deprecation process.

If we had a general way to do the array-of-structs-to-struct-of-arrays transformation, then NullableArray could become redundant. But that is a significant challenge.

@eschnett
Contributor

Nullable{T} is in many ways similar to Array{T} with a size that is constrained to be either 0 or 1. We may want to introduce "map" or "reduce" or the iterator interface for nullable types. Similarly, the .+ kind of operators would make sense, if both nullable objects are either null or non-null.


@StefanKarpinski
Member

Can't we support both by just writing DataFrames et al. to work with any representation that takes indices and produces Nullables? That would include both Array{Nullable} and NullableArray.

@JeffBezanson JeffBezanson added this to the 0.4 milestone Sep 20, 2014
@JeffBezanson JeffBezanson self-assigned this Sep 20, 2014
@johnmyleswhite
Member Author

We can certainly support both in some cases, but it seems pointless to recreate the work we've done to support things like matrix multiplication for Array{Nullable}.

@nalimilan
Member

@johnmyleswhite Fine, but then how would you set an element of a NullableArray to be null, without a NULL object equivalent to the old NA? Will that require something like setnull(a, i) instead of a[i] = NULL?

@johnmyleswhite
Member Author

a[i] = Nullable{Int}()

@StefanKarpinski
Member

We can have some value of type Nullable{None}() and support conversion from that to any kind of Nullable, which should do the trick. I'm not sure what we want to call that, but NA or NULL would be reasonable.
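
One way this conversion idea could look is sketched below. It is an assumption-laden illustration for a current Julia release: `Opt` stands in for Nullable, and a singleton `UntypedNull` stands in for Nullable{None}; none of these names are Base types or the API discussed in #8152.

```julia
# Hypothetical stand-ins: `Opt` plays the role of Nullable and
# `UntypedNull` the role of Nullable{None}; neither is a Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # typed null; `value` is left unset
    Opt{T}(x) where {T} = new{T}(true, x)   # wraps a value
end

struct UntypedNull end
const NULL = UntypedNull()

isnull(o::Opt) = !o.hasvalue

# The untyped null converts to a typed null of whatever type is requested.
Base.convert(::Type{Opt{T}}, ::UntypedNull) where {T} = Opt{T}()

a = [Opt{Int}(1), Opt{Int}(2), Opt{Int}(3)]
a[2] = NULL   # setindex! calls convert(Opt{Int}, NULL)

println(isnull(a[2]))   # prints true
println(eltype(a))      # element type is unchanged
```

The assignment works because array `setindex!` converts the right-hand side to the element type, so the short constant never has to be stored with its own type.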

@johnmyleswhite
Member Author

It is worth noting that we didn't opt into using const NULL = Nullable{None}() during our discussion in #8152.

There are some potential gains from having NULL, but I worry about negative consequences. In particular, I really don't want to wind up in a situation in which two distinct values that render as NULL behave in powerfully different ways, as occurs in R: https://gist.github.com/johnmyleswhite/fd6cbed2f691a9119cfe

@nalimilan
Member

@johnmyleswhite Two big differences from the R behavior illustrated in your gist are that 1) NULL in Julia would not be equal to Nullable{Bool}(), but to Nullable{None}(), and that 2) indexing with missing values fails in Julia (at least for now; but even if it didn't, indexing with Nullable{None}() could still trigger an error).

@johnmyleswhite
Member Author

@nalimilan: That solves the specific issue in that gist, but doesn't solve the broader lesson: it's dangerous to teach people to think about values that have ill-defined positions in the type system. Even though Nullable{Int}() is more verbose, it's more clear. And it's something you'll probably write quite rarely.

@StefanKarpinski
Member

I think that having NULL as a shorthand for Nullable{None}() is pretty reasonable though. We have handy shorthands for a lot of things that behave generically and this seems to me no different.

@johnmyleswhite
Member Author

Abstractly, I agree that having NULL as shorthand for Nullable{None}() is reasonable. What worries me is how this NULL would be used.

I'm happy to allow a[i] = NULL. Indeed, that was my original idea for creating a NULL constant.

But I'm worried about people writing functions like:

function mean(na::NullableArray)
  s, n = 0.0, 0
  for i in 1:length(na)
    if isnull(na[i])
      return NULL
    else
      s += get(na[i])
      n += 1
    end
  end
  return Nullable(s / n)
end

I'm especially worried that the brevity of NULL will lead people to use it without understanding its position in the type system. That is to say: its strength is what makes it dangerous, because people will misuse things that are brief.
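
For comparison, a variant that returns a typed null on the early-exit path might look like the sketch below. It assumes a current Julia release and uses a hypothetical `Opt` type in place of Nullable; `mean_typed` is an illustrative name, not a proposed API.

```julia
# Hypothetical stand-in for Nullable; not a Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # typed null; `value` is left unset
    Opt{T}(x) where {T} = new{T}(true, x)   # wraps a value
end

isnull(o::Opt) = !o.hasvalue

# Both return paths produce Opt{Float64}, so callers see one concrete type.
function mean_typed(xs::Vector{Opt{Float64}})
    s, n = 0.0, 0
    for x in xs
        isnull(x) && return Opt{Float64}()   # typed null, not an untyped NULL
        s += x.value
        n += 1
    end
    return Opt{Float64}(s / n)
end

full = mean_typed([Opt{Float64}(1.0), Opt{Float64}(3.0)])
holed = mean_typed([Opt{Float64}(1.0), Opt{Float64}()])

println(full.value)                      # prints 2.0
println(isnull(holed))                   # prints true
println(typeof(full) == typeof(holed))   # prints true
```

With an untyped NULL on the early-exit path, the two results would have different types, which is the kind of misuse the comment above is worried about.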

@JeffBezanson
Member

It is also confusingly different from what NULL means in C, Java, and all other languages that have it.

@eschnett
Contributor

This problem could also be avoided by allowing type annotations on functions -- similar to local variables -- and then automatically converting the returned value to this type.

@johnmyleswhite
Member Author

That's #1090.

@johnmyleswhite
Member Author

If people want shorter ways to create appropriate Nullable objects, I'd rather use Null(T) and NotNull(x::T).

@StefanKarpinski
Member

I'm also ok with that.

@eschnett
Contributor

Is the name Nullable too long?

I believe Swift uses a question mark after the type to indicate types that are nullable, e.g. Int?. Nullable types could also be useful for optional arguments, and then a very short syntax to create null objects as default values would be quite handy.

@johnmyleswhite
Member Author

A bunch of languages use ? to indicate Nullable objects. I'm not super fond of it, but I also personally don't feel that Nullable is too long.

@eschnett
Contributor

I thought the discussion above was about introducing Null(T) as abbreviation of Nullable{T}().

@johnmyleswhite
Member Author

Yes, but I don't actually think we need to make any changes. If we do make changes to provide shorthand for typed nulls, I think Null(T) is the way to do that.

@eschnett
Contributor

Or null(T), since Null(T) looks as if it constructed a type Null.

@nalimilan
Member

OTOH there's 0 and zero(T), so patterns requiring people to (sometimes) carefully choose the type of variables and return values already exist in Julia. Having both NULL and Nullable(T) would follow this schema.

To me the strongest argument in favor of NULL is that a short string is needed to print missing values in NullableArrays. For this use case, including the type would be a waste of space; and it would be good that the value used for printing can be used in the code. But instead of NULL, null() without any type argument could be fine too.

@johnmyleswhite
Member Author

I think we can set up NullableArray objects to print out null entries as NULL without needing to define NULL. Think of NULL as the showcompact for Nullable.
