Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add sort(f <: Base.Callable, x) method #38443

Open
pdeffebach opened this issue Nov 14, 2020 · 8 comments
Open

add sort(f <: Base.Callable, x) method #38443

pdeffebach opened this issue Nov 14, 2020 · 8 comments
Labels
arrays [a, r, r, a, y, s] feature Indicates new feature / enhancement requests

Comments

@pdeffebach
Copy link
Contributor

I just did a quick search on github of for other issues asking for this feature. I didn't find any, so I'm filing this issue.

Given we have mean(f, x), count(f, x) etc. It would be nice if sort(f, x) also worked.

Currently I can use sort(x, by = f). But for convenience and consistency this method would be nice.

@StefanKarpinski
Copy link
Member

The counterpoint to this is that both by and lt are possible function arguments, but I realized that we can actually look at the function that's passed in and see if it has two or one argument methods and determine whether it's a transformation or a comparison function based on that. So 👍 would be a nice syntax.

@StefanKarpinski StefanKarpinski added arrays [a, r, r, a, y, s] feature Indicates new feature / enhancement requests labels Nov 15, 2020
@pdeffebach
Copy link
Contributor Author

I would say that with a docstring + the existing convention of mean(f, x) etc. Having the first argument just refer to by would probably be fine. But your approach also makes sense

@jw3126
Copy link
Contributor

jw3126 commented Nov 15, 2020

The counterpoint to this is that both by and lt are possible function arguments, but I realized that we can actually look at the function that's passed in and see if it has two or one argument methods and determine whether it's a transformation or a comparison function based on that. So 👍 would be a nice syntax.

I think a problem with this is that functions can have methods for both single and two arguments.

@mcabbott
Copy link
Contributor

mcabbott commented Nov 15, 2020

Is the proposal to make this equal to sort(x, by=f), or to sort(map(f, x))? I would have guessed the latter, since I thought the rule was that outer(f, data) is always a shortcut for outer(f.(data)). This is true for mean/maximum/unique, which otherwise return something the same eltype as the input, and for all/count/findall which don't.

Edit: The issue I was thinking of is #27613. The new method there findmax(f, domain) -> (f(x), x) doesn't precisely follow the broadcast rule, since the one-argument findmax(f.(domain)) -> (f(x), i) regards its domain as being the indices. But it does apply f to the data in what's returned, rather than (say) using it to decide the largest, and then discarding what f(x). That's the behaviour I'd expect from findmax(domain; by=f) -> (x, i).

@KristofferC
Copy link
Member

but I realized that we can actually look at the function that's passed in and see if it has two or one argument methods and determine whether it's a transformation or a comparison function based on that.

👎 to that. That type of function reflection should not be used for a public API imo. Just trying to document this becomes kind of awkward:

If the function only has methods that accept one argument then it means this thing, if the function only has methods that accept two arguments, then it means this completely other thing, if the function has methods that accept a mixed number of arguments, then it errors...

@pdeffebach
Copy link
Contributor Author

Is the proposal to make this equal to sort(x, by=f), or to sort(map(f, x))? I would have guessed the latter, since I thought the rule was that outer(f, data) is always a shortcut for outer(f.(data)). This is true for mean/maximum/unique, which otherwise return something the same eltype as the input, and for all/count/findall which don't.

I'm proposing to make it equal to sort(x, by = f). I get what you mean about sort(map(f, x)) though. That's a valid point.

@mcabbott
Copy link
Contributor

mcabbott commented Nov 15, 2020

Turns out I'm wrong about how unique acts, and surprised. (It's not new, added in 2015.) What is the rule here?

julia> rn = rand(Complex{Int8}, 5);

julia> sort(rn, by=real)  # very clear
5-element Vector{Complex{Int8}}:
 -110 + 5im
   42 - 46im
   58 + 62im
   65 + 31im
  110 - 124im

julia> unique(real, rn)  # not what I expected
5-element Vector{Complex{Int8}}:
   65 + 31im
 -110 + 5im
   42 - 46im
   58 + 62im
  110 - 124im

julia> Base.sort(f, itr) = sort!(collect(f(x) for x in itr))  # what I expected

julia> sort(real, rn)
5-element Vector{Int8}:
 -110
   42
   58
   65
  110

@JeffBezanson
Copy link
Member

I think in general the by= behavior is more useful, since the "mapping" behavior is available just by calling map first. maximum is a bit of an exception to that, which would normally be inconsistent but is justified by argmax being a different function.

Anyway I'm against adding this since it would be a spurious variation on existing functionality. This discussion itself shows that sort(f, a) is not really clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] feature Indicates new feature / enhancement requests
Projects
None yet
Development

No branches or pull requests

6 participants