[RFC] Fix [map]reduce with no non-null and single non-null element #158
base: master
0.4 and 0.5 ok, failing on nightlies in |
Codecov Report
@@             Coverage Diff             @@
##           master     #158      +/-   ##
===========================================
- Coverage   85.89%   58.12%   -27.77%
===========================================
  Files          14       13        -1
  Lines         865      855       -10
===========================================
- Hits          743      497      -246
- Misses        122      358      +236
Continue to review full report at Codecov.
|
Also fixed the return type of Actually, what should be the behaviour of |
Now fails on nightlies and also on 0.4 (all-nulls test). I would like to fix that, but we should clarify the correct behaviour first. |
We should do the same thing as what Base does on empty arrays. Apparently it throws an error, so let's continue doing that. If that's annoying, we should file an issue in Julia. |
@nalimilan I suppose one of the Base.mr_empty{T}(f, op, Nullable{T}) = Nullable{T}() |
OK, that's an argument. Yet I guess Base has a strong reason for throwing an error instead of guessing the return type. I'm not sure which one, since in theory we should be able to infer the return type even when no element is present as long as we know the element type. But the definition you give doesn't do that: it assumes I think we need 1) to understand why Base does this, and 2) see how we can use |
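The idea of inferring the result type from the element type alone, with no element present, can be sketched as follows. This is a minimal illustration for current Julia (1.x), not the method proposed in this PR: `empty_result_type` is a hypothetical helper, and `Base.promote_op` is an unexported function that relies on type inference, so its result is best treated as a hint rather than a guarantee.

```julia
# Hypothetical helper: guess the type that `mapreduce(f, op, xs)` would produce
# for an empty `xs` with element type `T`, using only type information.
# Assumption: `Base.promote_op` (unexported, inference-based) is acceptable here.
function empty_result_type(f, op, ::Type{T}) where {T}
    fT = Base.promote_op(f, T)           # type of f(x) for x::T
    return Base.promote_op(op, fT, fT)   # type of op(f(x), f(y))
end

int_sum_type   = empty_result_type(identity, +, Int)
float_sum_type = empty_result_type(float, +, Int)
```

Knowing the type is not enough to pick a value, which is one reason a reduction over an empty collection may still have to throw for operators that have no identity element.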
@nalimilan You're right, this definition is not correct; it should be something like
function Base.mr_empty{T<:DataType}(f, op, ::Type{Nullable{T}})
    f_T = Core.Inference.return_type(f, Tuple{T})
    op_T = Core.Inference.return_type(Base.r_promote, Tuple{typeof(op), f_T})
    return Nullable{op_T}()
end
Base defines the result of a reduction over an empty array when it makes sense "mathematically": |
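For readers on current Julia (1.x), the distinction between operators that have an identity element and those that do not can be checked with a short sketch. It reflects the behaviour of recent releases rather than the 0.4/0.5 versions discussed in this thread, and `throws` is a hypothetical helper introduced only for this example.

```julia
# Hypothetical helper: returns true when calling `f` throws any exception.
function throws(f)
    try
        f()
        return false
    catch
        return true
    end
end

xs = Int[]                                   # empty, but with a known element type

sum_empty  = reduce(+, xs)                   # + has an identity: zero(Int)
prod_empty = reduce(*, xs)                   # * has an identity: one(Int)
max_throws = throws(() -> reduce(max, xs))   # max has no identity for Int
max_init   = reduce(max, xs; init = typemin(Int))  # explicit starting value
```

Supplying `init` is the usual way to make the empty case well defined when the operator has no natural identity.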
exposes some corner cases that are not observed with N>2
Avoid counting the zeros twice (for missingdata flag and for nnull) if skipnull=true.
* mapreduce() returns Nullable{T}(mr_empty) instead of just mr_empty
* added tests for [map]reduce() over non-empty collection that contain no non-null
Shouldn't the function call Anyway, I wonder whether we couldn't get rid of the
x = [Nullable{Int}()]
mapreduce(x->x*Nullable(2), +, x) # -> Nullable{Int64}()
So I'm not sure why we would need our custom method. |
The problem is that there is no valid element of type
Since
|
Use But I'd really like to know why Base doesn't do the same when the type can be inferred. There must be an issue about it. |
I cannot find
I think it's just "an issue" to be created. There's also e.g.
julia> sum(Nullable{Float64}[])
ERROR: MethodError: no method matching zero(::Type{Nullable{Float64}})
Closest candidates are:
zero(::Type{Base.LibGit2.Oid}) at libgit2/oid.jl:88
zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:80
zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:120
...
in _mapreduce(::Base.#identity, ::Base.#+, ::Base.LinearFast, ::Array{Nullable{Float64},1}) at ./reduce.jl:148
in sum(::Array{Nullable{Float64},1}) at ./reduce.jl:229
in eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:64
in macro expansion at ./REPL.jl:95 [inlined]
in (::Base.REPL.##3#4{Base.REPL.REPLBackend})() at ./event.jl:68 |
Funny, it was added two days ago in master. So I guess we need to use the less pretty form for now. Would be nice to have a The issue of reductions on empty collections is hardly new. See the links at: Though it might not have been discussed again since the introduction of As regards the |
|
What alternative is there to throwing an error when one reduces over an empty array? Since The case of
I'd argue that this is almost exactly what we want. I kind of want to have a discussion about whether I think that, in general, we should try to rely on extant frameworks in Base as much as possible. The main reasons not to are (1) performance concerns and (2) usability concerns. The community seems to be heading towards addressing (2) in packages that are closer to (tabular) data structures built on top of |
@davidagold Thanks for your thorough considerations. I totally agree that the Nullable behaviour should be defined by Base as much as possible. Actually, this PR started as a simple fix for the typo in the 1-non-null case (it throws an error), which was not covered by the tests, and was then extended to the 0-non-null return type. So while the behaviour fixes should probably be done in Base, I guess there should be some immediate fix for the bug. Re the behaviour. It could work that the downstream packages (e.g. |
@alyst hmm, you're definitely right that there's a fifth behavior we want, namely non-throwing over non-empty but all-null
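The cases at stake here (no non-null element, exactly one, and several) can be illustrated with a self-contained sketch for current Julia. It does not use NullableArrays: `Maybe` is a hypothetical stand-in for `Nullable`, `reduce_skipnull` is a hypothetical name, and the sketch assumes that `op` maps two values of type `T` back to `T`.

```julia
# Hypothetical stand-in for Nullable{T}: a flag plus a possibly unset value.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)                    # null: value left unset
    Maybe{T}(v) where {T} = new{T}(true, convert(T, v))     # wraps a value
end

# Reduce over the non-null elements only. Returns a null Maybe{T} instead of
# throwing when the input is empty or contains nothing but nulls.
# Assumption: op(::T, ::T) yields something convertible to T.
function reduce_skipnull(op, xs::AbstractVector{Maybe{T}}) where {T}
    acc = Maybe{T}()
    for x in xs
        x.hasvalue || continue                  # skip nulls
        acc = acc.hasvalue ? Maybe{T}(op(acc.value, x.value)) : x
    end
    return acc
end

all_null = reduce_skipnull(+, [Maybe{Int}(), Maybe{Int}()])
one_left = reduce_skipnull(+, [Maybe{Int}(), Maybe{Int}(5)])
several  = reduce_skipnull(+, [Maybe{Int}(1), Maybe{Int}(), Maybe{Int}(2)])
no_elems = reduce_skipnull(+, Maybe{Int}[])
```

Starting the accumulator as null and adopting the first non-null element avoids needing an identity element for `op`, so the zero-element and one-element cases fall out of the same loop.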
|
|
The difference between We could handle this by requiring/lowering to
> x <- c()
> mean(x)
[1] NA
Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
Though it doesn't seem to be returning |
Actually, R doesn't warn if it's an empty logical or numeric,
> x <- c()
> x
NULL
> x <- as.integer()
> x
integer(0)
> mean(x)
[1] NaN
> mean(x, na.rm=TRUE)
[1] NaN |
Good point. Looks like Julia implements the same behavior for |
Got it, thanks. Indeed, returning null whenever the input contains a null isn't really useful for a type which is designed precisely to support missing values. I've always argued that this apparent correctness wouldn't offer any additional safety to users for whom there are always nulls in the data. But as you say, this can be handled by convenience macros. |
For consistency with #166, I think we should have This approach avoids calling |
Also test [map]reduce() with N=2 to expose this corner case.