Clarify the possible uses of the `init` keyword in `minimum`, `maximum` and `extrema` #44819

greimel · 2022-04-01T08:14:21Z

The `init` keyword is quite nice because it allows computing extrema iteratively (e.g., updating bounds as new data come in).

```julia
julia> x = 0:10
0:10

julia> y = -1:5
-1:5

julia> extrema(x)
(0, 10)

julia> extrema(y)
(-1, 5)

julia> extrema(y, init=extrema(x)) == extrema(x ∪ y) == (-1, 10)
true
```

This PR documents and tests this use case. See also #43604 (comment)

tkf

This is not a valid use case. For the reduce family of APIs, `init` does not mean initial value. It means identity/neutral element. If we were to mention this type of usage, I believe we should say that the user has to combine the results using the corresponding binary operator:

```julia
x1, x2 = extrema(xs)
y1, y2 = extrema(ys)
min(x1, y1), max(x2, y2)
```
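A runnable sketch of this suggestion, assuming plain numeric inputs. `combine_extrema` is a hypothetical helper (not part of Base): each batch is reduced on its own with `extrema`, and the partial `(min, max)` pairs are merged with `min`/`max`, so no `init` is needed for a non-empty batch.

```julia
# Hypothetical helper: merge two (min, max) pairs with the binary
# operators that extrema is built from.
combine_extrema(a::Tuple, b::Tuple) = (min(a[1], b[1]), max(a[2], b[2]))

xs = 0:10
ys = -1:5

# Reduce each batch separately, then combine the partial results.
combined = combine_extrema(extrema(xs), extrema(ys))
println(combined)                             # (-1, 10)
println(combined == extrema(vcat(xs, ys)))    # true

# Streaming: fold the per-batch extrema as new batches arrive.
batches = [0:10, -1:5, 3:20]
running = foldl(combine_extrema, (extrema(b) for b in batches))
println(running)                              # (-1, 20)
```

This covers the "update bounds as new data come in" case without depending on how `init` is treated.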

N5N3 · 2022-04-01T08:35:23Z

I believe many users think `init` means initial value (and the default `init` is named `_InitialValue()`).
The current implementation in Base uses `init` as an initial value rather than as an identity/neutral element.

If init is used as the identity/neutral element, we should never use it if the input is not empty.

tkf · 2022-04-01T09:11:09Z

> the default `init` is named as `_InitialValue()`

This is mainly a mechanism for `foldl`. That's why I only talked about "the reduce family of APIs."

> If it's an identity/neutral element, we should never use it if the input is not empty.

An empty collection is the identity element of the free monoid (i.e., `isequal(vcat(vector, []), vector)`). Considering reduction as a monoid morphism (i.e., `isequal(reduce(⊗, vcat(xs, ys)), reduce(⊗, xs) ⊗ reduce(⊗, ys))`), the empty collection must be mapped to the identity element, so the identity is exactly what we should use.
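A small numerical check of the morphism property, using `+` with identity `0` and string `*` with identity `""`; the sample vectors are arbitrary. Reducing a concatenation matches combining the partial reductions, and the empty collection maps to `init`.

```julia
xs = [1, 2, 3]
ys = [10, 20]

# Morphism: reducing the concatenation equals combining the partial reductions.
@show reduce(+, vcat(xs, ys)) == reduce(+, xs) + reduce(+, ys)   # true

# The empty collection (identity of concatenation) maps to the identity of +.
@show reduce(+, Int[]; init = 0)                                  # 0

# Same story for string concatenation, whose identity is "".
ss = ["aa", "bb"]
ts = ["cc"]
@show reduce(*, vcat(ss, ts)) == reduce(*, ss) * reduce(*, ts)    # true
@show reduce(*, String[]; init = "")                              # ""
```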

N5N3 · 2022-04-01T09:29:38Z

> This is mainly a mechanism for `foldl`. That's why I only talked about "the reduce family of APIs."

This makes sense. I just realized that treating `init` as an identity/neutral element also simplifies parallel reduction (although we don't have that in Base).
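A sequential sketch of why an identity `init` makes chunked (parallel-style) reduction safe. `chunked_reduce` is a hypothetical helper, not a Base function: it seeds every chunk with `init`, as a parallel implementation plausibly would, and then combines the chunk results. A real parallel version would reduce the chunks on separate tasks.

```julia
# Hypothetical helper: reduce each chunk independently, seeding every
# chunk with `init`, then combine the partial results (again from `init`).
function chunked_reduce(op, xs, init; chunksize::Int = 2)
    partials = [foldl(op, chunk; init = init) for chunk in Iterators.partition(xs, chunksize)]
    return foldl(op, partials; init = init)
end

xs = [2, 3, 4, 5, 6]

# With a true identity, the chunking does not matter.
@show chunked_reduce(+, xs, 0; chunksize = 2)   # 20
@show chunked_reduce(+, xs, 0; chunksize = 5)   # 20

# With a non-identity init, every extra chunk adds another copy of init.
@show chunked_reduce(+, xs, 1; chunksize = 2)   # 24
@show chunked_reduce(+, xs, 1; chunksize = 5)   # 22
```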

greimel · 2022-04-04T12:53:42Z

> If it's an identity/neutral element, we should never use it if the input is not empty.
>
> An empty collection is the identity element of the free monoid (i.e., `isequal(vcat(vector, []), vector)`). Considering reduction as a monoid morphism (i.e., `isequal(reduce(⊗, vcat(xs, ys)), reduce(⊗, xs) ⊗ reduce(⊗, ys))`), it is exactly what we should use.

I don't understand this, sorry.

Could you elaborate on why my attempt seemed to work in my examples above? What would have to happen for `extrema(x, init=extrema(y)) != extrema(x ∪ y)`?

Given this potential source of confusion, shouldn't init give an error if the itr is not empty?

nlw0 · 2022-04-05T08:44:46Z

Completely agree with @tkf. If I may try to help explain: the problem is that there are clear constraints expected from the arguments, but unfortunately it is not easy, or maybe not even possible, to enforce these constraints in code.

I think we can say `init` will probably always behave as if it were an extra element appended to the input vector, and we kind of have to live with that. It's not guaranteed to always be like that, though. The user is expected to provide a value that is consistent with a monoid, i.e., an identity element for `op`. Otherwise, it's basically abusing the semantics of the function, and the user may run into bugs if, for instance, this were a parallelized version of the function.

There seems to be little that can be done other than offering defaults and explaining in the documentation that `init` is supposed to be the identity element of a monoid defined along with `op`. Changing the name seems pretty drastic to me.

To be clear, the idea is that this function has great potential for parallelization. Or even more than that: the user should not assume that underneath, the function will simply iterate over the list and compute `((a+b)+c)`, or `((init+a)+b)+c`, etc. `((((a+init)+b)+init)+(init+c))` should return the same result. That's the "contract", if you will. It so happens that in the way things are implemented, especially in single-threaded execution, using `init` as a means to append a value to the input works. But this is not the "contract" and is not guaranteed. `reduce` should feel free to combine `init` with your data via `op` as many times as it wants.
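The bracketings from this comment, evaluated directly for the arbitrary sample values `a, b, c = 1, 2, 3`. With the identity `0`, every placement of `init` agrees; with `init = 1`, the answer depends on how often and where `init` is used.

```julia
a, b, c = 1, 2, 3

# Three evaluation orders a reduction could use, parameterized by init.
order1(init) = (a + b) + c                               # init not used at all
order2(init) = ((init + a) + b) + c                      # init used once, on the left
order3(init) = (((a + init) + b) + init) + (init + c)    # init used three times

# init = 0 is the identity of +: all orders agree.
@show order1(0), order2(0), order3(0)   # (6, 6, 6)

# init = 1 is not an identity: the result depends on the evaluation order.
@show order1(1), order2(1), order3(1)   # (6, 7, 9)
```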

In fact, it's easy to see how this is a problem when you consider string or list concatenation. Should `init` go to the left or to the right of the result? Only if it's an empty list or string does this not matter. Once you consider non-commutative operations, it becomes clearer that you cannot really use `init` like that.
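The left/right ambiguity can be seen with `foldl` and `foldr`, which, unlike `reduce`, fix where `init` goes. The word list is just sample data.

```julia
words = ["aa", "bb", "cc"]

# foldl puts init leftmost, foldr puts it rightmost.
@show foldl(*, words; init = "zz")   # "zzaabbcc"
@show foldr(*, words; init = "zz")   # "aabbcczz"

# With the identity "" the placement no longer matters.
@show foldl(*, words; init = "") == foldr(*, words; init = "")   # true
```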

It might be great, in fact, to know whether `op` is commutative or not. Working with commutative monoids and Abelian groups can bring advantages. `reduce` is intended for the more general case, though.

greimel · 2022-04-05T09:08:23Z

Thanks, @nlw0 for your input.

Not sure if I've understood it already. You are saying that one requirement for init is that one should be able to push!(vec, init) arbitrarily often.

I see that this is a problem for sum.

```julia
sum([itr; init]) != sum([itr; init; init])
```

But this is not a problem for `minimum`, `maximum`, and `extrema`.

```julia
minimum([itr; init]) == minimum([itr; init; init]) == minimum([itr; fill(init, N)])
```

What am I missing?

EDIT: Does the contract imply that init is "appended" at least once?
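A quick check of this observation with arbitrary sample values: duplicating `init` changes a `sum`, but not a `minimum`, since `min` is idempotent. What can still differ for `minimum` is whether `init` is used zero times or at least once.

```julia
itr = [4, 7, 1, 9]
init = 5

# sum is not idempotent: each extra copy of init changes the result.
@show sum([itr; init]), sum([itr; init; init])              # (26, 31)

# min is idempotent: extra copies of init never change the result...
@show minimum([itr; init]), minimum([itr; fill(init, 10)])  # (1, 1)

# ...but zero uses vs. one use can still differ when init is below every element.
@show minimum(itr), minimum([itr; 0])                       # (1, 0)
```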

greimel · 2022-04-05T09:35:16Z

And to add a little more confusion, here are two contradictory lines from the `reduce` docstring.

The first line (julia/base/reduce.jl, line 430 in bf53498)

```
collections. It is unspecified whether `init` is used for non-empty collections.
```

answers my previous question (no, it's not guaranteed that `init` is used for non-empty collections). But the second one (julia/base/reduce.jl, lines 454 to 455 in bf53498)

```julia
julia> reduce(*, [2; 3; 4]; init=-1)
-24
```

relies on its use for a non-empty collection.

EDIT: Even worse: this example relies on applying init exactly once. So this would qualify as an invalid use case according to @tkf, I suppose?

nlw0 · 2022-04-05T09:39:06Z

The contract implies that `init` might be inserted within your data zero or more times, unless the list is empty, in which case it is inserted one or more times. Reducing a list of n copies of `init` should yield `init`.
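The "n copies of `init` reduce to `init`" requirement, checked on a few of the `op`/`init` pairs discussed in this thread (the copy counts are arbitrary).

```julia
# For a lawful init, reducing any number of copies of init gives back init.
@show reduce(+, fill(0, 4))       # 0
@show reduce(*, fill(1, 4))       # 1
@show reduce(*, fill("", 4))      # ""

# Non-identities fail this check.
@show reduce(*, fill(-1, 2))      # 1, not -1
@show reduce(*, fill("zz", 2))    # "zzzz", not "zz"
```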

`min` and `max` are a bit of a weird case. Julia does not specify a default `init` for `reduce`, and `minimum` and `maximum` do not accept empty lists (without `init`). I suspect `+Inf` and `-Inf` would work as neutral elements to constitute a monoid with `min` and `max` over numbers, but I understand most programmers would rather have `minimum(Int32[])` fail on an empty list than get `2147483647` (`typemax(Int32)`) as an answer.
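These neutral elements can be passed explicitly through the `init` keyword of `minimum`/`maximum` (available since Julia 1.6). The element types and values below are arbitrary examples.

```julia
# Without init, an empty reduction has nothing to return and throws.
empty_min_throws = try
    minimum(Int32[])
    false
catch
    true
end
@show empty_min_throws                           # true

# Explicit neutral elements for min/max over a given element type.
@show minimum(Int32[]; init = typemax(Int32))     # 2147483647
@show maximum(Float64[]; init = -Inf)             # -Inf
@show minimum([2.5, 1.5]; init = Inf)             # 1.5
```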

In fact, you won't have a problem with min and max as long as init is a data value in your original list. The result is going to be consistent. It's like you're defining a peculiar, ad-hoc monoid, but it does work. It's a bit of a quirk of this specific case... And it implies you're not processing an empty list.
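A check of this point with arbitrary sample data: when `init` is one of the data values (more generally, any value not below the minimum), using it zero, one, or many times gives the same `minimum`; a value below every element does not.

```julia
data = [3, 1, 2]

# init taken from the data: zero, one, or many uses give the same minimum.
for init in data
    @assert minimum(data) == minimum([data; init]) == minimum([data; fill(init, 3)])
end

# init below every element: zero uses vs. one use give different answers.
@show minimum(data), minimum([data; 0])   # (1, 0)
```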

nlw0 · 2022-04-05T09:50:40Z

The case `reduce(*, [2; 3; 4]; init=-1)` is exactly that story: it runs, but it breaks the monoid laws. The result is not really guaranteed to be -24; it might be 24 as well. But you can bet a milkshake it's going to be -24. Likewise, `reduce(*, ["aa","bb","cc"], init="zz")` is probably always going to be `"zzaabbcc"`, but that is not guaranteed. Nothing that is inconsistent with the monoid laws is guaranteed, but you may still get deterministic behavior.
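The two outcomes, made concrete: `foldl` is documented to apply `init` exactly once, on the left, while a reduction that inserted `-1` twice (something the contract would allow if `init` were a true identity) gives the other answer.

```julia
xs = [2, 3, 4]

# foldl uses init exactly once, on the left.
@show foldl(*, xs; init = -1)    # -24

# Inserting init twice gives a different result, because -1 is not
# the identity of *.
@show prod([-1; xs; -1])         # 24
```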

The docstring says that you "can't" use `-`, but actually you "can"; it's just bonkers. Again, it violates the monoid laws, but the function call will run...

As for the function not specifying that `init` should be returned, I'm not sure about that. I would imagine that it should be specified, but I kind of understand being conservative and leaving the responsibility of handling empty inputs up to the user.

nlw0 · 2022-04-05T11:11:35Z

A short summary, perhaps:

- These are good to go: `reduce(+, [1,2,3], init=0)`, `reduce(*, [1,2,3], init=1)` and `reduce(*, ["aa","bb","cc"], init="")`.
- These don't obey the monoid laws: `reduce(-, [1,2,3], init=0)`, `reduce(*, [1,2,3], init=-1)` and `reduce(*, ["aa","bb","cc"], init="zz")` (`-` is not associative; the others don't have valid identity elements).
- `reduce(min, [1,2,3],init=3)` sort of follows the monoid laws as well... There's probably a special name for this group.
- `reduce(op, [], init=init)`: the docstring says `init` is necessary for empty inputs, but that it is unspecified whether it is going to be used. I would suggest specifying that `reduce` returns `init` for empty inputs as long as it follows the monoid laws; otherwise, getting a result of `op(init, init)` should be permissible as well. I don't know whether there's a reason for leaving it completely unspecified other than an unwillingness to commit to anything when the monoid laws aren't followed, or simply that it's difficult to explain.
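A rough way to test the summary on sample data. `satisfies_monoid_laws` is a hypothetical helper that spot-checks associativity and two-sided identity only on the finite `samples` given, so `true` is evidence rather than proof. Note how `min` with `init=3` passes on `[1, 2, 3]` but fails once `4` is in the domain.

```julia
# Hypothetical checker: associativity of `op` and two-sided identity of `e`,
# tested only on the given finite samples (a spot check, not a proof).
function satisfies_monoid_laws(op, e, samples)
    assoc = all(op(op(x, y), z) == op(x, op(y, z)) for x in samples, y in samples, z in samples)
    ident = all(op(e, x) == x && op(x, e) == x for x in samples)
    return assoc && ident
end

@show satisfies_monoid_laws(+, 0, [1, 2, 3])               # true
@show satisfies_monoid_laws(*, 1, [1, 2, 3])               # true
@show satisfies_monoid_laws(*, "", ["aa", "bb", "cc"])     # true
@show satisfies_monoid_laws(-, 0, [1, 2, 3])               # false (not associative)
@show satisfies_monoid_laws(*, -1, [1, 2, 3])              # false (not an identity)
@show satisfies_monoid_laws(*, "zz", ["aa", "bb", "cc"])   # false (not an identity)
@show satisfies_monoid_laws(min, 3, [1, 2, 3])             # true on this domain
@show satisfies_monoid_laws(min, 3, [1, 2, 3, 4])          # false once 4 is possible
```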

vtjnash · 2023-10-30T17:22:20Z

On `base/reduce.jl`:

```diff
 other element) as it is unspecified whether `init` is used
-for non-empty collections.
+for non-empty collections. If `init > maximum(itr)`, return `init`.
```

I am okay with making the behavior specified here (cf. #49042), but the docstring needs to be consistent: it must not both say the behavior is unspecified and also specify it here.

My understanding is that the current `maximum` (and `minimum`, and maybe elsewhere?) documentation is merely repeating what is said about the `init` parameter of `mapreduce`. But this advice is only true for `mapreduce` in general, not for the specific cases of `maximum` and `minimum`. Here we should be able to say that `init` will indeed be treated as just another element of the input, and will be returned as the output in the case of an empty list. It would just be nice to have (1) a confirmation that this is the desired behavior for `maximum`, `minimum` and `mapreduce`, and (2) perhaps the ability to prove this is the case by looking at the implementation and at which methods are called for `maximum` and `minimum`. Unfortunately, I'm not too familiar with the implementation and can't easily make sense of it myself, and I'm not sure where the implementation of these methods diverges from other reducing operations (if it diverges at all; is it still unspecified in general for `mapreduce`?).

In other words, I suggest removing the text mentioning anything unspecified for these methods, and only mentioning that `init` will be returned for an empty list and will generally be treated as an extra item in the input.

That behavior is already specified for `mapfoldr`, `mapfoldl`, and `mapreduce(identity)`, so it seems reasonable to assume the same for `mapreduce(max)` and thus `maximum` as well. But that is the question intended to be answered by #49042 (which now looks to me like only a doc change).
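For comparison, `foldl` documents that `init` is used exactly once, as a leading extra element, and the `maximum` docstring says the value returned for an empty collection can be specified with `init`. The sample values are arbitrary; the non-empty `maximum` case is intentionally left out because its docstring calls `init` usage unspecified.

```julia
xs = [1, 2, 3]

# foldl: init behaves like an extra leading element of the input.
@show foldl(max, xs; init = 5)       # 5
@show foldl(max, xs; init = 0)       # 3
@show foldl(max, Int[]; init = 0)    # 0

# maximum with init: the returned value for an empty collection.
@show maximum(Int[]; init = 0)       # 0
```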

Aside, to be pedantic about this question:

> `reduce(min, [1,2,3],init=3)` sort of follows the monoid laws as well... There's probably a special name for this group

I think this is exactly the same monoid law as the first example. In particular, the `init` is supposed to be defined relative to the domain of the inputs. So if the input were `UInt8` instead, then the `init` would be `0xff`. But if the input function generating that array was `2pi*sin%Int` (an integer-valued `2π·sin(x)`), then the `init` is arguably `6`, since the values of that input function lie in [-6, 6]. Using `typemax` is just a rough approximation of the expected domain in any case.
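A sketch of the domain argument. It assumes `2pi*sin%Int` means an integer-truncated `2π·sin(x)`, written here as `trunc(Int, 2pi * sin(x))` with a hypothetical function `f`; over sampled inputs its values stay within [-6, 6], so `6` behaves as an identity for `min` there, just as `0xff` does for all `UInt8` values.

```julia
# Over all UInt8 values, 0xff == typemax(UInt8) is an identity for min.
@show all(min(0xff, x) == x for x in typemin(UInt8):typemax(UInt8))   # true

# Assumed reading of the example: an integer-truncated 2π·sin(x).
f(x) = trunc(Int, 2pi * sin(x))
samples = f.(range(0, 2pi; length = 1000))

@show extrema(samples)                                # (-6, 6)
@show all(min(6, y) == y for y in samples)            # true: 6 is an identity here
@show minimum(samples; init = 6) == minimum(samples)  # true
```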

adienes · 2024-09-11T16:54:28Z

probably subsumed / closed by documentation improvements in #53945

greimel added 2 commits April 1, 2022 09:41

maximum & minimum & extrema with init

9511405

update docstrings for minimum, maximum, extrema

ada79a0

greimel mentioned this pull request Apr 1, 2022

Fix extrema(x; dims) for inputs with NaN/missing #43604

Merged


tkf suggested changes Apr 1, 2022


N5N3 added the `docs` (This change adds or pertains to documentation) and `fold` (sum, maximum, reduce, foldl, etc.) labels Apr 1, 2022

nlw0 mentioned this pull request Apr 5, 2022

Allow empty reductions for maximum(Unsigned) and compose #44702

Open

N5N3 mentioned this pull request Mar 21, 2023

unspecified behavior of mapreduce is bad #49042

Open

vtjnash reviewed Oct 30, 2023


Merge branch 'master' into extrema-docs

3b6df79

fingolfin changed the title ~~Clarify the possible uses of theinit keyword in minimum, maximum and extrema~~ Clarify the possible uses of the init keyword in minimum, maximum and extrema Feb 8, 2024

vtjnash mentioned this pull request Mar 27, 2024

Change init's role in reduce-like functions: remove "neutral element" restriction and guarantee its use #53871

Closed
