-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify the possible uses of the init
keyword in minimum
, maximum
and extrema
#44819
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a valid use case. For the reduce
-family of API, init
does not mean initial value. It means identity/neutral element. If we were to mention this type of usage, I believe that we should mention that the user has to combine the result using the corresponding binary operator:
x1, x2 = extrema(xs)
y1, y2 = extrema(ys)
min(x1, y1), max(x2, y2)
I believe many users thought If |
This is mainly a mechanism for
An empty collection is the identity element of the free monoid (i.e., |
This make sense. I just realize that assuming |
I don't understand this, sorry. Could you elaborate why my attempt seemed to work in my examples above? What would have to happen so that Given this potential source of confusion, shouldn't |
Completely agree with @tkf. If I may try to help explain, the problem is that there are clear constraints expected from the arguments, but unfortunately it is not easy, or maybe possible, to enforce these constraints with code. I think we can say There seems to be little that can be done other than offering defaults and explaining in the documentation that To be clear, the idea is that this function has a great potential to being parallelized. Or even more than that, the user should not assume that underneath the function we will simply iterate over the list and do In fact, it's easy to see how this is a problem when you consider strings or list concatenation. Should this It might be great, in fact, to have knowledge about whether |
Thanks, @nlw0 for your input. Not sure if I've understood it already. You are saying that one requirement for I see that this is a problem for
But this is not a problem for
What am I missing? EDIT: Does the contract imply that |
And to add a little more confusion, here are two contradicting lines from the The first line Line 430 in bf53498
answers my previous question (no, it's not guaranteed that init is used for non-empty collections). But the second line Lines 454 to 455 in bf53498
relies on its use for a non-empty collection. EDIT: Even worse: this example relies on applying |
The contract implies that min and max is a bit of a weird case. Julia does not specify a default In fact, you won't have a problem with min and max as long as |
The case The docstring is saying that you "can't" use About the function not specifying that |
A short summary, perhaps:
|
other element) as it is unspecified whether `init` is used | ||
for non-empty collections. | ||
for non-empty collections. If `init > maximum(itr)`, return `init`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with making the behavior specified here (c.f. #49042), but the docstring needs to be consistent and not both say it is unspecified behavior and to also specify the behavior here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the current maximum
(and minimum
and maybe elsewhere?) documentation is merely repeating what is said about the init
parameter of mapreduce
. But this advice is only true for mapreduce
in general, and not for the specific cases of maximum
and minimum
. Here we should be able to say init
will indeed be treated as just another element in the input, and will be returned as the output in case of an empty list. It would just be nice to have 1. a confirmation that this is the desired behavior for maximum
, minimum
and mapreduce
and 2. perhaps the ability to prove this is the case looking at the implementation, and what methods are called in the specific case of maximum
and minimum
. I'm unfortunately not too familiar with the implementation, I can't easily make sense of it myself, and I'm not sure where the implementation for these methods diverges compared to other reducing operations (if it diverges at all. is it still unspecified in general for mapreduce?).
In other words, I suggest actually removing the text mentioning anything unspecified for these methods, and only mention the term will be output for an empty list, and will generally be treated as an extra item in the input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That behavior is already specified for mapfoldr, mapfoldl, and mapreduce(identity), so it seems reasonable to assume the same for mapreduce(max) and thus maximum as well. But that is the question intended to be answered by #49042 (it looks only like a doc change to me now).
Aside, to be pedantic about this question:
reduce(min, [1,2,3],init=3) sort of follows the monoid laws as well... There's probably a special name for this group
I think this is exactly the same monoid law as the first example. In particular, the init
is supposed to be ranging over the domain of the inputs. So if the input was UInt8
instead, then the init is 0xff
instead. But if the input function generating that array was 2pi*sin%Int
, then the init is arguably 6
, since the domain of that input function is [-6, 6]
. Using typemax
is just a rough approximation of the expected domain in any case.
init
keyword in minimum
, maximum
and extrema
init
keyword in minimum
, maximum
and extrema
probably subsumed / closed by documentation improvements in #53945 |
The init keyword is quite nice because it allows to compute extrema iteratively (e.g. update bounds as new data come in).
This PR documents and tests this use case. See also #43604 (comment)