unspecified behavior of `mapreduce` is bad #49042
Comments
From what I understand, the nomenclature is unfortunate, and "If provided, `init` must be a neutral element for `op` that will be returned for empty collections." This implies that In your case, perhaps you might prefer if converged(sol) && mapreduce(map(v -> v > 0, &, sol.zero)
You are right that the documentation states that `reduce(*, [2; 3; 4]; init=-1)` There are plenty of alternatives for my example, that's not the issue. I have two points. The first is that mathematically Second, documenting some behavior of an algorithm implementation as unspecified is unhelpful. Unspecified behavior may be acceptable in standards that want to allow for radically different implementations.
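As a hedged illustration of the neutral-element requirement in the docstring quoted above, the sketch below spot-checks whether a candidate `init` is neutral for `op` on a few sample values. The helper `is_neutral` is hypothetical (it is not part of Base), and passing on a handful of samples does not prove neutrality in general.

```julia
# Hypothetical helper, not part of Base: checks op(init, x) == x == op(x, init)
# on the given samples only.
is_neutral(op, init, samples) = all(x -> op(init, x) == x && op(x, init) == x, samples)

is_neutral(*, 1, [2, 3, 4])    # true: 1 is the multiplicative identity
is_neutral(*, -1, [2, 3, 4])   # false: -1 * 2 == -2
is_neutral(+, 0, [2, 3, 4])    # true
```

Under the documented contract, only an `init` that passes this kind of check for every possible element gives a result that does not depend on how often `init` is combined in.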
Documenting it as unspecified allows Julia's implementation to change without being a breaking change. In other words, this is about being able to have different implementations across time. That being said, the example you point to with Edit: Actually, it might be an issue (a bug?) that the two documented methods of Given that Are there any optimisation opportunities we lose if the guarantee that
See #44819.
I guess I don't clearly understand the difference between

```haskell
foldr :: (a -> b -> b) -> b -> t a -> b
foldl :: (b -> a -> b) -> b -> t a -> b
```

Associativity is irrelevant if the types
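To make the left/right distinction concrete in Julia, here is a small sketch with a non-associative operator (subtraction). It uses only `foldl` and `foldr` from Base; the groupings in the comments are worked out by hand.

```julia
foldl(-, 1:4)           # ((1 - 2) - 3) - 4 == -8
foldr(-, 1:4)           # 1 - (2 - (3 - 4)) == -2
foldl(-, 1:4; init=0)   # (((0 - 1) - 2) - 3) - 4 == -10
foldr(-, 1:4; init=0)   # 1 - (2 - (3 - (4 - 0))) == -2
```

For an associative operator such as `+` on integers the two groupings agree, which is the situation in which an order-unspecified `reduce` can be expected to give the same answer as either fold.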
Re-ordering the operations can have various benefits. For example, re-ordering floating-point operations allows you to use SIMD, making it much faster. It also enables pairwise summation, reducing the impact of float rounding errors. So, if you have a function which is associative, you would use
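A minimal sketch of the pairwise summation mentioned here, assuming an indexable collection. `pairwise_sum` is a hypothetical name for illustration, not the routine Base uses internally; it only shows how re-associating the additions changes the rounding behaviour compared with a strict left fold.

```julia
# Hypothetical illustration: split the index range in half and add the two
# partial sums, instead of accumulating strictly left to right.
function pairwise_sum(xs, lo=firstindex(xs), hi=lastindex(xs))
    lo > hi && return zero(eltype(xs))   # empty range
    lo == hi && return xs[lo]            # single element
    mid = (lo + hi) >>> 1
    return pairwise_sum(xs, lo, mid) + pairwise_sum(xs, mid + 1, hi)
end

xs = fill(0.1f0, 10^6)                    # Float32 values; exact sum is near 1.0f5
err_left = abs(foldl(+, xs) - 1.0f5)      # error of the strict left fold
err_pair = abs(pairwise_sum(xs) - 1.0f5)  # error of the pairwise grouping
```

Comparing `err_left` and `err_pair` shows why an implementation may want the freedom to choose the grouping; that freedom is only safe when `op` is associative, at least approximately.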
CUDA does parallel reduction, and relies on

```julia
julia> let x = collect(1:10^4), init=2
           a = reduce(+, x; init)
           b = reduce(+, CuArray(x); init)
           a, b, sum(x)
       end
(50005002, 50046150, 50005000)
```
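The GPU result above is consistent with `init` being combined in once per block of work rather than once overall. A CPU-only sketch of that effect follows; `chunked_reduce` is a hypothetical helper written for this illustration, and it makes no claim about how CUDA.jl actually partitions the work.

```julia
# Hypothetical helper: reduce each chunk separately, seeding every chunk with
# `init`, then combine the partial results.
function chunked_reduce(op, xs; init, nchunks=4)
    isempty(xs) && return init
    chunks = Iterators.partition(xs, cld(length(xs), nchunks))
    partials = [foldl(op, c; init=init) for c in chunks]
    return foldl(op, partials)
end

x = collect(1:100)
chunked_reduce(+, x; init=0)   # 5050: a neutral init is harmless
chunked_reduce(+, x; init=2)   # 5058: init was added once per chunk (4 times)
```

With a neutral `init` the number of chunks does not affect the result; with a non-neutral one it does, which is exactly the dependence the documentation leaves unspecified.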
It confuses me too. Both C++ and Python treat
C++: https://en.cppreference.com/w/cpp/algorithm/reduce
Python: https://docs.python.org/3/library/functools.html#functools.reduce
The definition of
If
There is InitialValues.jl which says
I'm not sure if it's better to do it with the two-argument
The type is important. For example for

```julia
julia> v = [1, 2]
2-element Vector{Int64}:
 1
 2

julia> vcat(v, [])
2-element Vector{Any}:
 1
 2
```
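A short sketch of the same point using only Base functions: an untyped `[]` widens the element type, while a typed empty vector passed as `init` keeps it and also fixes the result type for an empty input.

```julia
eltype(vcat([1, 2], []))                   # Any, as in the REPL output above
eltype(vcat([1, 2], Int[]))                # Int
reduce(vcat, Vector{Int}[]; init=Int[])    # empty Vector{Int}
reduce(vcat, [[1, 2], [3]]; init=Int[])    # [1, 2, 3]
```

So a value can be neutral in terms of contents and still change the result type, which is why a generic default for `init` is hard to choose.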
InitialValues.jl gives the following example:
So an Personally I prefer to leave the feature as
It is a suggestion to remove the requirement that the
Trying to wrap my head around this question a bit, it seems like the original PR for
Perhaps it's not correct to say this is a case of "unspecified behavior". My understanding is that you should assume we could apply "op(init,x)" an arbitrary number of times to your input. Only the number of times is not specified. The idea is that we are enabling a potential parallel version of the reduce based on a number of Or I guess the number of applications is specified as "at least once", which includes the case of an empty list? And this would be the slightly complicated detail. Everything else, whether it's something suitable to do or not according to the operator in e.g.
There is no reason that
I didn't mean it would have to be like this, it's just one option on my mind. I believe you're looking at it from the point of view of what is strictly required, that init is the output at an empty list, and that's great. I'm merely illustrating a situation that exercises the other requirement: that it might be applied an indefinite number of times. When you write down your fold(s) as a for-loop, it's very handy to initialize an accumulator variable before the loop with something such as an "init" value (e.g. zero in a summation) instead of carefully loading the first data point and then proceeding from there, which actually implies testing for an empty list as well. I believe that may be the best illustration of why this makes sense.
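The two loop shapes described in this comment can be sketched as follows. Both function names are hypothetical and chosen for illustration; neither is how Base implements its folds.

```julia
# Accumulator seeded with `init`: no special case for an empty input.
function fold_with_init(op, xs, init)
    acc = init
    for x in xs
        acc = op(acc, x)
    end
    return acc
end

# Accumulator seeded with the first element: needs an explicit empty check.
function fold_from_first(op, xs)
    isempty(xs) && throw(ArgumentError("cannot fold an empty collection without init"))
    acc = first(xs)
    for x in Iterators.drop(xs, 1)
        acc = op(acc, x)
    end
    return acc
end

fold_with_init(+, 1:4, 0)    # 10
fold_with_init(+, Int[], 0)  # 0, the init itself
fold_from_first(+, 1:4)      # 10
```

The first form returns `init` for an empty input for free, but it also combines `init` into every non-empty result, so the two forms agree only when `init` is neutral for `op`.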
My memory is a bit vague. I feel There might be an argument that (e.g. parallel) algorithms might use a divide-and-conquer approach and benefit from using I’m not sure if a different verb (generic function) is necessary for when we assume the operator is associative, or commutative, or that
EDIT: it's pointed out below that my tirade here only concerns one of several (sometimes conflicting) purposes of The If it wasn't a pun, then we would have Do I think that we should have An Right now the correct thing to do is write (e.g.) Under the semantics of
It's worth connecting this to #52397 — one of the sticky points there is figuring out exactly how far to lean into I can count four distinct (but entwined) motivating reasons why someone might use
But when it comes to an order-unspecified (possibly parallel)
I think the aside comment points to #52004 and a couple others like it. Just waiting for a rebase if someone wants to take those over.
I agree it would have been better for
The documentation for `mapreduce` says that it is unspecified if the `init` keyword argument is used for non-empty collections. Since there is a single `julia` implementation, leaving the behavior unspecified is bad. In general the `init` argument is a starting point that you want to combine with the rest. For example

Using the `init` argument to specify the result in case of an empty collection is a special case.