Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

Some nullable operations can throw exceptions #116

Closed
TotalVerb opened this issue Jun 15, 2016 · 14 comments
Closed

Some nullable operations can throw exceptions #116

TotalVerb opened this issue Jun 15, 2016 · 14 comments

Comments

@TotalVerb
Copy link

TotalVerb commented Jun 15, 2016

Not all of the operations defined here are safe, because some operations are not defined on the entire domain, even if the types are bitstypes. Null values should not cause exceptions, especially in unpredictable ways.

Example:

julia> Nullable{Int}() ^ (Nullable{Int}() - Nullable{Int}() - Nullable{Int}())
ERROR: DomainError:
Cannot raise an integer x to a negative power -n. 
Make x a float by adding a zero decimal (e.g. 2.0^-n instead of 2^-n), or write 1/x^n, float(x)^-n, or (x//1)^-n.
 in power_by_squaring(::Int64, ::Int64) at ./intfuncs.jl:85
 in ^(::Nullable{Int64}, ::Nullable{Int64}) at /home/fengyang/.julia/v0.5/NullableArrays/src/operators.jl:38
 in eval(::Module, ::Any) at ./boot.jl:225
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

(this happens to crash on my REPL fairly often, but your mileage may vary)

It's somewhat rare to actually have issues, but the following three operations should definitely be removed:

  • % (doesn't work with 0)
  • ^ (doesn't work with negative exponents)
  • sqrt (doesn't work with negatives)
@nalimilan
Copy link
Member

Actually, no operation is really safe for all possible types, even bitstypes. For example, a checked integer type would throw an error in case of overflow with +. I think what we need is a an internal undef_safe_op(f::Function, T::Type...)::Bool which we would define as true for standard bitstypes and standard operations which are OK to call on any value (i.e. not the ones you list above).

Since you seem to be willing to work on Nullable, would you want to open a PR against Base to implement the operators which currently live in this package, but in fully safe approach? Cf. #111.

@davidagold
Copy link
Contributor

Would a more appropriate name be null_safe_op? Also, how would such a solution solve the overflow problem? That seems to be distinct from (yet related to) what @TotalVerb cites in their original post.

@nalimilan
Copy link
Member

I proposed undef_safe_op since it's not really related to null strictly speaking: it just means that one can always apply the operation. But unsafe isn't really accurate either, so maybe null is better.

Whatever its name, it would prevent the "overflow problem" by forcing a checked integer type to use the slow path in which the operation is only applied when the value is not missing. In the original example, ^ wouldn't be call with a negative exponent, so it wouldn't throw an error.

@davidagold
Copy link
Contributor

I see now. Thank you for explaining.

@TotalVerb
Copy link
Author

TotalVerb commented Jun 15, 2016

@nalimilan Could you point to any discussion where abs, abs2, and sqrt were decided to be autolifted? I find that behaviour strange, especially since very similar functions are not. I would prefer to keep autolifting to operators with an elementwise equivalent.

I am not a big fan of autolifting or autovectorization, but sane syntax is needed for elementwise lifted operations, and otherwise one would need to autolift elementwise operations on NullableArrays, which seems even more of a special case. So autolifting operators with elementwise equivalents seems sane.

@davidagold
Copy link
Contributor

@TotalVerb IIRC there was no discussion -- I believe I included abs and abs2 because they were needed for some reducing operations to be extended to NullableArrays. The fact that I included sqrt along with the (then used) Base.MinFun and Base.MaxFun "functors" (b6b9902) suggests to me that it was also included as something that was needed for a reducing operation. So, their inclusion was not entirely arbitrary, but it wasn't terribly principled either.

@TotalVerb
Copy link
Author

@davidagold Thanks for the explanation. These reducing operations have been reimplemented here anyway, so I think we are better off using explicit lifting, especially once abs.(A) syntax becomes available.

@nalimilan
Copy link
Member

once abs.(A) syntax becomes available.

AFAIK it is available on 0.5 (and on 0.4 using Compat), so we can start deprecating the lifting version as soon as your broadcast implementation is merged. That won't be possible immediately for sqrt because of type inference issues (the function has to return the same type or it fails, which is an issue e.g. for integer arrays).

@davidagold
Copy link
Contributor

@TotalVerb Also, I meant to ask, what do you mean by an "elementwise equivalent" --- i.e., this notion according to which you differentiate between operators such as abs and operators that you believe should have some auto-lifting semantics?

@nalimilan
Copy link
Member

There's a weird interaction with JuliaLang/julia#16961: in that PR, lifting on a single Nullable happens via broadcast, i.e. f.(x::Nullable) gives Nullable(f(x)) when not null. OTOH, with NullableArrays, f.(a::NullableArray) = NullableArray(f.(a.values)) when not null. This definition is different from what we'd get with an Array{Nullable}: there, f.(a::Array{Nullable}) = [f(x::Nullable) for x in a]. The lifting semantics of NullableArray correspond to a double elementwise operation, something like f..(a).

That's not necessarily an issue, but it means NullableArray is not merely an more efficient alternative to Array{Nullable}, but a different structure which is also more convenient to work with. I think we need to keep this in mind and make sure that's what we want.

@davidagold
Copy link
Contributor

@nalimilan didn't you bring up this automatic lifting behavior of broadcast when entries are all nonnull elsewhere? I can't remember where. I kind of think we should change that behavior.

@nalimilan
Copy link
Member

Not sure, there have been so many discussions in different places. The problem if we get rid of the automatic lifting with NullableArray is that we'll need to find another convenient syntax (a macro?). I'm still not certain what's the best approach.

@davidagold
Copy link
Contributor

I'm pretty sure the reason I put that behavior in there in the first place was for performance --- if no entries are null, just treat it like broadcasting over a regular array. I think I didn't realize that I was conflating that optimization with lifting behavior. Is there anything wrong with map(f, X, lift=true)?

@nalimilan
Copy link
Member

Fixed by #119.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants