Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename select* functions to partialsort* and various related fixes #23051

Merged
merged 3 commits into from
Aug 27, 2017

Conversation

nalimilan
Copy link
Member

The rename (first commit) fixes #22791. The second commit moves docstrings out of helpdb. The third commit fixes what appears as a documentation bug.

The fourth one changes selectperm!/partialsortperm! to return a view rather than a copy to avoid an allocation (which is really silly if the number of elements to sort is relatively large). Another possibility would be to return the input array just like sortperm! does, since the caller has enough information to extract the relevant elements. But that doesn't sound useful, and the cost of creating a view is quite small.

I've kept the commits separate to make it easier to distinguish the actual changes from mere renaming and moving operations. They don't need to be squashed when merging since they should all pass the tests.

Closes #22791.

@ararslan ararslan added the deprecation This change introduces or involves a deprecation label Jul 31, 2017
base/sort.jl Outdated
partialsortperm!(ix, v, k, [alg=<algorithm>,] [by=<transform>,] [lt=<comparison>,] [rev=false,] [initialized=false])

Like [`partialsortperm`](@ref), but accepts a preallocated index vector `ix`. If `initialized` is `false`
(the default), ix is initialized to contain the values `1:length(ix)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code highlight ix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed now.

@kmsquire
Copy link
Member

kmsquire commented Aug 1, 2017

I'm wondering about the status of using named tuples as part of dispatch (#22194)?

partialsort is reasonable as a name, but partialsortperm is starting to get a little unwieldy, in my opinion. I think this functionality would be a bit nicer as a keyword argument to sort / sortperm (and friends):

sort(x, partial=1:5)

or even

sort(x, select=1:5)

Thoughts?

@nalimilan
Copy link
Member Author

Actually, if we want to merge select into sort, we could just ask users to pass PartialQuickSort(k) as alg. A partial keyword argument would also work. A possible drawback is that it would only work with alg=QuickSort (i.e. replacing it with PartialQuickSort under the hood), and I'm not sure whether partial sorting variants exist for other sorting algorithms.

@kmsquire
Copy link
Member

kmsquire commented Aug 2, 2017

I like the idea of a partial keyword, even though it would only work with quicksort.

@nalimilan
Copy link
Member Author

Thinking about it again, a partial keyword argument would have the additional drawback that the algorithm type would be computed at runtime, so sort/sortperm would always have to go via dynamic dispatch. That would incur a performance penalty even when not passing partial=k explicitly. This could be problematic when sorting small arrays.

So I'd rather go with separate partialsort and partialsortperm functions, or remove the functions and require people to pass PartialQuickSort(k) via the alg argument.

Thoughts?

base/sort.jl Outdated
same thing as `partialsort!` but leaving `v` unmodified.
"""
partialsort(v::AbstractVector, k::Union{Int,OrdinalRange}; kws...) =
copy(partialsort!(copymutable(v), k; kws...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the call to copy. In the k isa Integer case, it means the sorted objects need to have copy methods, which should not be required to sort them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. OTOH it feels a bit weird to return a view of something which the user never had access to. But it doesn't really hurt either, and that's the most efficient solution (the user can still can copy manually if needed). I've updated the PR to drop copy.

@kmsquire
Copy link
Member

kmsquire commented Aug 5, 2017

Since alg is a keyword argument, isn't dispatch done at runtime anyway?

@nalimilan
Copy link
Member Author

Since alg is a keyword argument, isn't dispatch done at runtime anyway?

Yes, currently that's how it works, but I think there are plans to change this. @JeffBezanson can confirm or infirm that.

@JeffBezanson
Copy link
Member

I think sorting and partial sorting are different enough to warrant separate functions.

@kmsquire
Copy link
Member

kmsquire commented Aug 7, 2017

sort and partialsort are fine--it's partialsortperm that I was (originally) objecting to when I suggested merging the functions.

@kmsquire
Copy link
Member

kmsquire commented Aug 7, 2017

But anyway, I don't have any further ideas right now, so if others are fine with changing selectperm to partialsortperm, go for it.

@nalimilan nalimilan force-pushed the nl/select branch 2 times, most recently from 61b2f49 to a0ac7d1 Compare August 7, 2017 10:25
@@ -1632,6 +1632,12 @@ function SymTridiagonal(dv::AbstractVector{T}, ev::AbstractVector{S}) where {T,S
SymTridiagonal(convert(Vector{R}, dv), convert(Vector{R}, ev))
end

# issue #22791
@deprecate_binding select partialsort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you only need @deprecate_binding for types/modules? isn't @deprecate good enough for functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure what's the difference. Since the function itself is going to be removed (not only some methods), I figured deprecating the binding makes sense. Are there any unintended consequences?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You get different stack traces, and to some extent different information:

julia> foo() = 1;

julia> Base.@deprecate_binding bar foo

julia> bar()
WARNING: Main.bar is deprecated, use Main.foo instead.
  likely near no file:0
1

vs

julia> foo() = 1;

julia> Base.@deprecate bar foo
:bar

julia> bar()
WARNING: bar is deprecated, use foo instead.
Stacktrace:
 [1] depwarn at ./deprecated.jl:70 [inlined]
 [2] bar() at ./deprecated.jl:31
 [3] eval(::Module, ::Expr) at ./repl/REPL.jl:3
 [4] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./repl/REPL.jl:69
 [5] macro expansion at ./repl/REPL.jl:100 [inlined]
 [6] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
while loading no file, in expression starting on line 0
1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I don't know the reason for this difference but that's enough to simply use @deprecate. PR updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecate_binding works at the C level and is able to detect more uses like in dispatch constraints of other methods (ref #11200), deprecate only detects calls, but that should be good enough for methods

@StefanKarpinski
Copy link
Member

partialsortperm is a pretty ugly name, but it's a very specific function, so I'm fine with it.

inds = indices(v, 1)
sort!(v, first(inds), last(inds), PartialQuickSort(k), o)
v[k]

if k isa Integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps using dispatch for this logic would be better than branching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'd agree with you, but in the present case that would force duplicating the two lines above, that's why I chose the if approach.

BTW, there seems to be a lack in the API: with getindex, you can either get an array or a scalar, but with view/@view you always get a subarray. Shouldn't there be a way to get the same behavior as getindex? Then one wouldn't have to check types manually like this (which could be hard or impossible in more complex cases where the index can be of any type.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not

function partialsort!(v::AbstractVector, k::Union{Int,OrdinalRange}, o::Ordering)
    inds = indices(v, 1)
    sort!(v, first(inds), last(inds), PartialQuickSort(k), o)
    return _dispatchftw(v, k)
end
_dispatchftw(v, k::Integer) = v[k]
_dispatchftw(v, k) = view(v, k)

do the trick? Could even reuse the helper functions for the same purpose below :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would work, but I don't find it clearer than the existing version, do you? To understand what the code does you need to look at the definitions of methods for _dispatchftw, which does not have a generic meaning on its own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would work, but I don't find it clearer than the existing version, do you?

Noting that my subjective evaluation is practically meaningless, yes, I prefer this style :).

Perhaps a more meaningful question would be whether one or the approach is friendlier to the compiler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is always the third option, with the same lines of code:

function partialsort!(v::AbstractVector, k::Int, o::Ordering)
    inds = indices(v, 1)
    sort!(v, first(inds), last(inds), PartialQuickSort(k), o)
    return v[k]
end
function partialsort!(v::AbstractVector, k::OrdinalRange, o::Ordering)
    inds = indices(v, 1)
    sort!(v, first(inds), last(inds), PartialQuickSort(k), o)
    return view(v, k)
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The harsh reality is, not all lines of code are equal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm voting for status quo or perhaps change it to using the ternary operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the current version then. I've rebased again, let's not wait for too long as there were already substantial conflicts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, @views seems to do the trick. See #23760.

@@ -629,7 +709,12 @@ function selectperm!(ix::AbstractVector{<:Integer}, v::AbstractVector,

# do partial quicksort
sort!(ix, PartialQuickSort(k), Perm(ord(lt, by, rev, order), v))
return ix[k]

if k isa Integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here, perhaps using dispatch for this purpose would be better than branching?

@nalimilan
Copy link
Member Author

I'll merge tomorrow unless somebody objects.

The new name is more explicit, more consistent with sort and with one of the
most commonly used names for this operation (from the C++ stdlib). It also
does not conflict with other meanings e.g. for POSIX sockets and SQL.
…a copy

Returning a copy (partially) defeats the purpose of these functions, which is
to avoid allocations.
partialsort(x, k::Integer), partialsortperm(x, k::Integer) and partialsortperm!(x, k::Integer)
all return a single value. Also add missing backticks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation This change introduces or involves a deprecation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename select
10 participants