add `sort(f <: Base.Callable, x)` method #38443

pdeffebach · 2020-11-14T23:00:52Z

I just did a quick search on GitHub for other issues asking for this feature. I didn't find any, so I'm filing this issue.

Given that we have mean(f, x), count(f, x), etc., it would be nice if sort(f, x) also worked.

Currently I can use sort(x, by = f). But for convenience and consistency this method would be nice.

StefanKarpinski · 2020-11-15T00:17:40Z

The counterpoint to this is that both by and lt are possible function arguments, but I realized that we can actually look at the function that's passed in and see if it has two or one argument methods and determine whether it's a transformation or a comparison function based on that. So 👍 would be a nice syntax.

pdeffebach · 2020-11-15T01:12:17Z

I would say that with a docstring plus the existing convention of mean(f, x) etc., having the first argument just refer to by would probably be fine. But your approach also makes sense.

jw3126 · 2020-11-15T08:34:03Z

> The counterpoint to this is that both by and lt are possible function arguments, but I realized that we can actually look at the function that's passed in and see if it has two or one argument methods and determine whether it's a transformation or a comparison function based on that. So 👍 would be a nice syntax.

I think a problem with this is that functions can have methods for both single and two arguments.
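The point about mixed arities can be checked with `hasmethod`. This is only a small sketch: the variable names are made up for illustration, and it probes `Int` arguments specifically rather than implementing the arity-based dispatch discussed above.

```julia
# `abs` has a one-argument method for Int but no two-argument one.
abs_unary  = hasmethod(abs, Tuple{Int})
abs_binary = hasmethod(abs, Tuple{Int, Int})

# `-` has both: unary negation and binary subtraction, so the number of
# arguments alone cannot say whether it is meant as `by` or as `lt`.
neg_unary  = hasmethod(-, Tuple{Int})
neg_binary = hasmethod(-, Tuple{Int, Int})

x = [3, -1, 2]
by_neg = sort(x, by = -)   # used as a key transformation: [3, 2, -1]
```

Here `-` is perfectly usable as a `by` function even though it also has two-argument methods, which is the ambiguity an arity check would run into.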

mcabbott · 2020-11-15T08:37:45Z

Is the proposal to make this equal to sort(x, by=f), or to sort(map(f, x))? I would have guessed the latter, since I thought the rule was that outer(f, data) is always a shortcut for outer(f.(data)). This is true for mean/maximum/unique, which otherwise return something the same eltype as the input, and for all/count/findall which don't.

Edit: The issue I was thinking of is #27613. The new method there findmax(f, domain) -> (f(x), x) doesn't precisely follow the broadcast rule, since the one-argument findmax(f.(domain)) -> (f(x), i) regards its domain as being the indices. But it does apply f to the data in what's returned, rather than (say) using it to decide the largest, and then discarding f(x). That's the behaviour I'd expect from findmax(domain; by=f) -> (x, i).
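As a quick check of the "outer(f, data) is a shortcut for outer(f.(data))" rule, here is a minimal sketch for the functions where it does hold. The sample vector is arbitrary, and unique is left out on purpose.

```julia
using Statistics  # provides `mean`

data = [-3, 1, -2, 2]

# Each two-argument form agrees with applying `f` elementwise first.
mean_ok    = mean(abs, data) == mean(abs.(data))
maximum_ok = maximum(abs, data) == maximum(abs.(data))
count_ok   = count(iseven, data) == count(iseven.(data))
all_ok     = all(isinteger, data) == all(isinteger.(data))
findall_ok = findall(iseven, data) == findall(iseven.(data))
```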

KristofferC · 2020-11-15T10:47:43Z

> but I realized that we can actually look at the function that's passed in and see if it has two or one argument methods and determine whether it's a transformation or a comparison function based on that.

👎 to that. That type of function reflection should not be used for a public API imo. Just trying to document this becomes kind of awkward:

> If the function only has methods that accept one argument then it means this thing, if the function only has methods that accept two arguments, then it means this completely other thing, if the function has methods that accept a mixed number of arguments, then it errors...

pdeffebach · 2020-11-15T14:58:16Z

> Is the proposal to make this equal to sort(x, by=f), or to sort(map(f, x))? I would have guessed the latter, since I thought the rule was that outer(f, data) is always a shortcut for outer(f.(data)). This is true for mean/maximum/unique, which otherwise return something the same eltype as the input, and for all/count/findall which don't.

I'm proposing to make it equal to sort(x, by = f). I get what you mean about sort(map(f, x)) though. That's a valid point.

mcabbott · 2020-11-15T16:23:46Z

Turns out I'm wrong about how unique acts, and surprised. (It's not new, added in 2015.) What is the rule here?

```julia
julia> rn = rand(Complex{Int8}, 5);

julia> sort(rn, by=real)  # very clear
5-element Vector{Complex{Int8}}:
 -110 + 5im
   42 - 46im
   58 + 62im
   65 + 31im
  110 - 124im

julia> unique(real, rn)  # not what I expected
5-element Vector{Complex{Int8}}:
   65 + 31im
 -110 + 5im
   42 - 46im
   58 + 62im
  110 - 124im

julia> Base.sort(f, itr) = sort!(collect(f(x) for x in itr))  # what I expected

julia> sort(real, rn)
5-element Vector{Int8}:
 -110
   42
   58
   65
  110
```

JeffBezanson · 2020-11-17T22:30:30Z

I think in general the by= behavior is more useful, since the "mapping" behavior is available just by calling map first. maximum is a bit of an exception to that, which would normally be inconsistent but is justified by argmax being a different function.
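To make the distinction concrete, here is a rough sketch with an arbitrary sample vector. The variable names are illustrative only, and it sticks to one-argument argmax on the mapped array so that it does not depend on newer method signatures.

```julia
x = [-5, 2, 3]

# maximum(f, x) returns the mapped value...
mapped = maximum(abs, x)          # 5
# ...while argmax is how you get back to the original element.
original = x[argmax(abs.(x))]     # -5

# For sort, the mapping behaviour is one `map` call away,
# whereas by= keeps the original elements.
sorted_mapped = sort(map(abs, x))   # [2, 3, 5]
sorted_by     = sort(x, by = abs)   # [2, 3, -5]
```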

Anyway I'm against adding this since it would be a spurious variation on existing functionality. This discussion itself shows that sort(f, a) is not really clear.

StefanKarpinski added the `arrays` and `feature` labels on Nov 15, 2020