Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unspecified behavior of mapreduce is bad #49042

Open
fedeinthemix opened this issue Mar 18, 2023 · 23 comments · May be fixed by #53945
Open

unspecified behavior of mapreduce is bad #49042

fedeinthemix opened this issue Mar 18, 2023 · 23 comments · May be fixed by #53945
Labels
fold sum, maximum, reduce, foldl, etc.

Comments

@fedeinthemix
Copy link

The documentation for mapreduce says that it is unspecified if the init key argument is used for non-empty collections. Since there is a single julia implementation leaving the behavior unspecified is bad. In general the init argument is a starting point that you want to combine with the rest. For example

sol = nlsolve(f!, initial_x)
 # mapreduce doesn't specify if `init` is used. Need to use reduce(..., map(...))
 if mapreduce(map(v -> v > 0, &,  sol.zero, init=converged(sol))
     return ...
 else
   ...
 end

Using the init argument to specify the result in case of an empty collection is a special case.

@jishnub
Copy link
Contributor

jishnub commented Mar 21, 2023

From what I understand, the nomenclature is unfortunate, and init doesn't specify a starting point, rather it provides the identity element in the reduction. The documentation of mapreduce states this as well:

If provided, `init` must be a neutral element for `op` that will be returned for empty collections.

This implies that init is not arbitrary, and must satisfy op(op(init, x), y) == op(x, y). For the operator &, therefore, init should be set to true. In such a case, whether init is used or not doesn't change the result, and it's left as an implementation detail.

In your case, perhaps you might prefer foldl which guarantees that init is used exactly once, even if it's not set to the identity element. Or, perhaps the more explicit

if converged(sol) && mapreduce(map(v -> v > 0, &, sol.zero)

@fedeinthemix
Copy link
Author

fedeinthemix commented Mar 21, 2023

You are right that the documentation states that init must be a neutral element of the operation, but then the similarly worded documentation for reduce shows the following example

reduce(*, [2; 3; 4]; init=-1)

There are plenty of alternatives for my example, that's not the issue. My points are two. The first is that mathematically init doesn't have to be the neutral element (like -1 above) and it would be useful to allow this generality (like in the example I gave). Some other languages embrace this freedom
https://hackage.haskell.org/package/base-4.18.0.0/docs/Prelude.html#v:foldr

Second, documenting some behavior of an algorithm implementation as unspecified is unhelpful. Unspecified behavior may be acceptable in standards that want to allow for radically different implementations.

@jakobnissen
Copy link
Contributor

jakobnissen commented Mar 21, 2023

Second, documenting some behavior of an algorithm implementation as unspecified is unhelpful. Unspecified behavior may be acceptable in standards that want to allow for radically different implementations.

Documenting it as unspecified allows Julia's implementation to change without being a breaking change. In other words, this is about being able to have different implementations across time.

That being said, the example you point to with reduce(*, [2; 3; 4]; init=-1) should indeed be changed, since its usage is incorrect.

Edit: Actually, it might be an issue (a bug?) that the two documented methods of reduce disagree on what init is. Is it the initial value, or the value returned on empty collections? Is it guaranteed to be used once? We should definitely clean this up.

Given that init has been documented to be the initial value for reduce on arrays, and it's called init, and the similar keyword in foldl, I think we should make it documented behaviour that it's used exactly once. Since we don't guarantee associativity, I'm not sure it's even meaningful to define it as the "initial value".

Are there any optimisation opportunities we lose if the guarantee that init is used exactly once?

@N5N3
Copy link
Member

N5N3 commented Mar 21, 2023

See #44819.

@fedeinthemix
Copy link
Author

Since we don't guarantee associativity, I'm not sure it's even meaningful to define it as the "initial value".

I guess I don't clearly understand the difference between reduce and foldr/foldl. Using (for clarity) Haskell's notation for the type of functions, the latter two require functions with a different signature

foldr :: (a -> b -> b) -> b -> t a -> b
foldl :: (b -> a -> b) -> b -> t a -> b 

Associativity is irrelevant if the types a and b are the same, and the function is associative. In that case the two give the same result, but with different memory and speed results. Was reduce added as a way of helping picking the fastest one? If not, what is the actual semantic of reduce then?

@jakobnissen
Copy link
Contributor

Re-ordering the operations can have various benefits. For example, re-ordering floating point operations allow you to use SIMD, making it much faster. It also enables pairwise summation, reducing the impact of float rounding errors.

So, if you have a function which is associative, you would use reduce in order to let the implementation decide the best order of operations. In that sense, foldl and foldr can be seen as special cases of reduce.

@mcabbott
Copy link
Contributor

Are there any optimisation opportunities we lose if the guarantee that init is used exactly once?

CUDA does parallel reduction, and relies on init being the neutral element:

julia> let x = collect(1:10^4), init=2
        a = reduce(+, x; init)
        b = reduce(+, CuArray(x); init)
        a, b, sum(x)
       end
(50005002, 50046150, 50005000)

@ctarn
Copy link
Contributor

ctarn commented Apr 5, 2023

It confuses me too. Both C++ and Python treat init of reduce as initial value or starting point.

C++: https://en.cppreference.com/w/cpp/algorithm/reduce

Generalized sum of init and *first, *(first+1), ... *(last-1) over binary_op

Python: https://docs.python.org/3/library/functools.html#functools.reduce

If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty.

@jishnub
Copy link
Contributor

jishnub commented Apr 5, 2023

The definition of functools.reduce in python seems to be equivalent to foldl in Julia

@N5N3 N5N3 added the fold sum, maximum, reduce, foldl, etc. label Apr 5, 2023
@fedeinthemix
Copy link
Author

If init must be a unit and the operation op associative, then op and type ty form a monoid. Wouldn't it be useful to add something like monoidunit(ty, op) generalizing e.g. one, ones, zero and zeros. Then the init parameter could be either dropped or, more usefully, used as an initial value.

@jariji
Copy link
Contributor

jariji commented Apr 6, 2023

There is InitialValues.jl which says

InitialValues.jl provides a generic singleton initial value Init(f) that can be used as a₀ in f(a₀, x). For a binary operator op, it means that Init(op) acts like the left identity for any type of x

I'm not sure if it's better to do it with the two-argument monoidunit(t, op) or the one-argument wrapper Init(op).

@fedeinthemix
Copy link
Author

fedeinthemix commented Apr 6, 2023

I'm not sure if it's better to do it with the two-argument monoidunit(t, op) or the one-argument wrapper Init(op).

The type is important. For example for + if you specify the type you can use it for any scalar number, but also matrices, vectors, ... Also, without type you may loose type information. For example for vector concatenation the unit is the empty one, but without specifying the type you get

julia> v = [1, 2]
2-element Vector{Int64}:
 1
 2

julia> vcat(v, [])
2-element Vector{Any}:
 1
 2

@ctarn
Copy link
Contributor

ctarn commented Apr 6, 2023

InitialValues.jl gives the following example:

julia> float(InitialValue(*))
1.0

julia> Integer(InitialValue(+))
0

So an InitialValue can also be casted to a specific type.

Personally I prefer to leave the feature as InitialValues.jl instead of adding it to standard lib of Julia.

@fedeinthemix
Copy link
Author

Personally I prefer to leave the feature as InitialValues.jl instead of adding it to standard lib of Julia.

It is a suggestion to remove the requirement that the init parameter of reduce and mapreduce be a unit. Instead the implementation could use monoidunit.

@vtjnash
Copy link
Member

vtjnash commented Oct 31, 2023

Trying to wrap my head around this question a bit, it seems like the original PR for init left the meaning of init unspecified for mapreduce, but did specify that reduce is defined in terms of mapreduce and specified that init have the meaning of being the first element of the reduction, thus transitively enforcing this upon mapreduce also:
https://github.com/JuliaLang/julia/pull/27711/files#diff-cc983f27f9c33e9b278f74989b8aaca48e461e0717bb96ea116d02754554746fR290. I don't know if @andyferris remembers this from that time, and if there was a counter-example that prevented it from being defined at that point?

@nlw0
Copy link
Contributor

nlw0 commented Oct 31, 2023

Perhaps it's not correct to say this is a case of "unspecified behavior". My understanding is that you should assume we could apply "op(init,x)" an arbitrary number of times to your input. Only the number of times is not specified. The idea is that we are enabling a potential parallel version of the reduce based on a number of folds that start with init. Is this fair to say?

Or I guess the number of applications is specified as "at least once", what includes the case of an empty list? And this would be the slightly complicated detail. Everything else, whether it's something suitable to do or not according to the operator in eg maximum or sum is just a consequence of the operator. My point is to clarify that it all sounds crazy unless you know you're talking about sum etc, and then it all sounds perfectly reasonable. And saying this is "unspecified behavior" doesn't do justice to this reasonableness.

@vtjnash
Copy link
Member

vtjnash commented Oct 31, 2023

There is no reason that fold needs to use init other than exactly once. In a parallel implementation, the init would likely be dealt with at the end, in the last reducer step, since non-empty chunks do not need it. Indeed, it is required that the implementations can use init zero times already, so it merely needs to arrange for init to only be used in the final serial potion (or any one of the initial parallel portions--either option is equivalent)

@nlw0
Copy link
Contributor

nlw0 commented Nov 1, 2023

I didn't mean it would have to be like this, it's just one option on my mind. I believe you're looking at it from the point of view of what is strictly required, that init is the output at an empty list, and that's great. I'm merely illustrating a situation that exercises the other requirement: that it might be applied an indefinite number of times. When you write down your fold(s) as a for-loop it's very handy to initialize an accumulator variable before the loop with something such as an "init" value (eg zero in a summation) instead of carefully loading the first datapoint and then proceeding from there --- what actually implies testing for an empty list as well, I believe that may be the best illustration to why this makes sense.

@andyferris
Copy link
Member

andyferris commented Nov 1, 2023

My memory is a bit vague. I feel init should probably be used if specified, not just for empty collections.

There might be an argument that (eg parallel) algorithms might use a divide-and-conquer approach and benefit from using init more than once.

I’m not sure if a different verb (generic function) is necessary for when we assume the operator is associative, or commutative, or that init is an identity, as opposed to one where the result must match the accumulation in iteration order as if you wrote it out imperatively?

@mikmoore
Copy link
Contributor

mikmoore commented Mar 20, 2024

EDIT: it's pointed out below that my tirade here only concerns one of several (sometimes conflicting) purposes of init. As such, it is incomplete in both its analysis and proposed "solution."

The init keyword is ridiculous except in the fold family where it is guaranteed to be applied exactly once. The reduce family should never have had this (badly-defined) init. It's a pun and it's a bad one.

If it wasn't a pun, then we would have maximum(Float64[]) == -Inf because the only valid init for this call is -Inf.
Instead it's a MethodError. (ASIDE: this should be an ArgumentError). We also have a minor infraction with our current sum(Float64[]) == +0.0 because the additive identity for IEEEFloat is -0.0 (compare +(-0.0, -0.0) and +(-0.0, +0.0).

Do I think that we should have maximum(Float64[]) == -Inf? No, it's a footgun. Do I think we should have sum(Float64[]) == -0.0? No, it's obnoxious and would be wildly unpopular. But this just shows that init was the wrong keyword with the wrong semantics.

An init keyword is useless for a reduce because the reduction order is unspecified. In other words, sum(x) + y is programmatically equivalent to what would want from sum(x; init=y) (if this were usable). I think we should deprecate init for all reduce functions.

Right now the correct thing to do is write (e.g.) isempty(x) ? DEFAULT_VALUE : maximum(x). If we want a keyword for this, we should make one like ifempty (insert bikeshedding) with usable semantics.

Under the semantics of init, we could make a nonbreaking change to deprecate init and introduce ifempty if we default ifempty = init when init is specified. This would simply resolve the ambiguous behavior of init to the semantic "only apply to empty collections when ifempty is not defined."

@mbauman
Copy link
Member

mbauman commented Mar 20, 2024

It's worth connecting this to #52397 — one of the sticky points there is figuring out exactly how far to lean into init's unspecified-ness and what its current docs mean.

I can count four distinct (but entwined) motivating reasons why someone might use init in the context of foldl:

  1. Provide the return value when empty
  2. Ensure op is called for all non-empty collections (including single-element collections)
  3. A shorthand to pretend init exists in the collection exactly once like foldl(op, [init; x]) or sum(x) + init.
  4. Ensure all operations happen on a promoted type like foldl(+, rand(UInt8,100), init=0)

But when it comes to a order-unspecified (possibly parallel) reduce, points 3 and 4 are in direct conflict with eachother. And while I am in near-complete agreement with @mikmoore's points above, renaming this to ifempty leaves point 2 in the dark.

@vtjnash
Copy link
Member

vtjnash commented Mar 20, 2024

I think the aside comment points to #52004 and a couple others like it. Just waiting for a rebase if someone wants to take those over.

@JeffBezanson
Copy link
Member

I agree it would have been better for reduce not to have the init argument. But it does, and what it does in Base seems to be Matt's point (3) above, pretend it's added to the collection once, so that's probably what the spec should be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fold sum, maximum, reduce, foldl, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.