Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort: make sorting easier, add Slice, SliceStable, SliceIsSorted, reflect.Swapper #16721

Closed
bradfitz opened this issue Aug 16, 2016 · 158 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Aug 16, 2016

EDIT: the proposal has changed later in this thread. See #16721 (comment) and #16721 (comment)

I propose we make sorting slices a first class concept in Go 1.8.

I think everybody appreciates the cute sort.Interface type at this point, but the reality is that:

  • nobody really sorts anything except slices (the handful of exceptions can keep doing what they're doing)
  • the Len and Swap functions are redundant. They're always the same for slices.
  • naming a new type (e.g. "widgetsByName") is tedious to many (including me; I still loathe it after 6+ years)
  • making a new type and three methods at the top-level (because it has to have methods, so it can't be a local type) is more difficult to read, since I'd prefer my Less function inline with my code, in context.

I think we should add to package sort at least:

package sort

// Slice sorts the provided slice using the function less.
// The sort is not stable.
// If slice is not a slice, Slice panics.
func Slice(slice interface{}, less func(i, j int) bool)

And maybe at the same time, or in the future:

// SliceSorter returns an sorter for the provided slice, suitable
// for with Reverse or Stable.
// If slice is not a slice, SliceSorter panics.
func SliceSorter(slice interface{}, less func(i, j int) bool) Interface

The assumption is that the user's less function would close over the data to be sorted.

For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.

I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard for users.

/cc @griesemer @robpike @ianlancetaylor @adg @broady

@bradfitz bradfitz added this to the Proposal milestone Aug 16, 2016
@griesemer
Copy link
Contributor

I'd be ok with something like this. It's pragmatic, small, and probably would reduce a lot of sort-related boilerplate.

@cznic
Copy link
Contributor

cznic commented Aug 16, 2016

I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard to users.

I don't have access to the mentioned above dataset, but let me guess: The need-to-be-sorted type declarations and its associated methods declarations constitutes less than 1‰ of code in that corpus. Perhaps far less. If that is true then I don't think this proposal is justified. Also, using reflection in sorting, however small it is, feels to me like encouraging bad practices.

@bradfitz
Copy link
Contributor Author

@cznic, you do have access: https://github.com/blog/2201-making-open-source-data-more-available

We'll have to disagree on the metrics at which this is subjectively worthwhile.

@josharian josharian changed the title Proposal: first-class support for sorting slices proposal: first-class support for sorting slices Aug 16, 2016
@atdiar
Copy link

atdiar commented Aug 16, 2016

No opinion on this.

Only thing I would say is that, sorting a slice is not exactly to be seen as simply sorting an array.
Reason being that if a slice has multiple subslices, sorting one slice will somehow change the others (in a deep value kind of way).

Or should we decide that sorting a slice returns a completely new slice ? (and thus allocate ?)

Just a data point. (I guess you discussed a similar issue when adding the append function but, just to make sure).

@mackross
Copy link

mackross commented Aug 16, 2016

As the primary force behind golang at our organization, this and no generic max/min function are the two things I find embarrassingly hard to explain. I'm not sure what it's like at Shoreline Blvd but without fail when an engineer joins our team their mouth drops through the floor at their first sort. Subjectively, I would love this in 1.8.

@ianlancetaylor
Copy link
Contributor

I'm fine with adding something to the sort package, although of course func (i, j int) bool is not the function signature that most people expect.

@bradfitz
Copy link
Contributor Author

bradfitz commented Aug 16, 2016

@ianlancetaylor, the benefit of i, j int is that it lets you get at your []T's T or *T, if the values are too large to efficiently copy. And it fits with the existing Interface.Less signature. Plus it's not like we can do better, can we?

@ianlancetaylor
Copy link
Contributor

Oh, sure, I understand why you suggested it. I'm just commenting that it will still surprise people.

@jimmyfrasche
Copy link
Member

As much as this is complained about and as many slice types as I have sorted this has never bothered me, so I don't see the point. Boilerplate is never fun, but this pittance is far below the threshold of annoyance.

A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.

@ainar-g
Copy link
Contributor

ainar-g commented Aug 16, 2016

I'm with @jimmyfrasche here. Why can't this be a go generateable tool?

@natefinch
Copy link
Contributor

If it's basically as fast as sort.Inferface, and is just an addition to the stdlib, not a language change.... why would anyone ever say no to this? Yes, it's annoying that it has to use interface{} that we try not to encourage people to use... but what other choice do we have? No one has come up with a better suggestion so far.

Put me down for a +1.

@artursapek
Copy link

How would SliceSorter work? Wouldn't it have to define a new type, and return it? That's not possible in Go afaik.

@pbnjay
Copy link
Contributor

pbnjay commented Aug 16, 2016

I'd prefer usage of a []interface{} parameter (which would make the reflection unnecessary), but since that doesn't intuitively work on all slice types I know we're stuck with the proposed signature. +1 for saving the headaches anyway though.

To the go generate comments: It's still possible to use that method, this is only adding to the stdlib. But for newbies and rapid development use, go generate can be just as much of a pain...

@bradfitz
Copy link
Contributor Author

How would SliceSorter work?

It would be a private type implementing sort.Interface. It would return Interface, per the example above.

@riannucci
Copy link

@pbnjay I don't think []interface{} would be very convenient; it's not possible to go from e.g. []*MyThing -> []interface{} without allocating a new slice and copying every element (which would be way more annoying than writing out the Less/Len/Swap).

It's a bit unfortunate that it has to be func(i, j int) bool though, because frequently you'd have MyThing.Less(other *MyThing) bool, which means that you'd end up doing:

sort.Slice(mySlice, func(i, j int) bool { return mySlice[i].Less(mySlice[j]) })

But oh well :). I'd still take that over declaring a new type+3 methods just to be able to sort stuff.

@pbnjay
Copy link
Contributor

pbnjay commented Aug 16, 2016

@riannucci yup - that's what i meant by "doesn't intuitively work on all slice types" - maybe in Go 2 we can get a generic slice-interface type!

@bradfitz
Copy link
Contributor Author

@riannucci, in my experience, the Less method doesn't already exist somewhere. This is for ad-hoc sorts where you want to write the Less function inline, in context with your existing code:

   sort.Slice(people, func(i, j int) bool { return people[i].Name < people[j].Name })

@freeformz
Copy link

freeformz commented Aug 16, 2016

How about func Slice(slice interface{}, l int, less func(i, j int) bool), where l is the length up to which the slice is sorted. This means the common usage would just be calling Slice with len([]T), but saves the reflect. I'd wager it's possibly not worth it?

Edit: Nevermind, my bad, there will still be reflection.

@bradfitz
Copy link
Contributor Author

@freeformz, you already have to do the reflect a bit anyway, so it doesn't save you anything to pass the length explicitly.

@myitcv
Copy link
Member

myitcv commented Aug 16, 2016

Following on from @jimmyfrasche's comment (but my follow up may indeed be the subject of another issue/a separate go-nuts thread)

A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.

On the go generate suggestion.

This raises the question of whether to distribute core, oft-used programs that are go generate-ers.

As things stand, every go generate-er needs to be retrieved via go get (or some other means). If instead some core go generate-ers were distributed as part of a release then:

  • no further go get is required to do core things (like slice sorting)
  • we encourage people towards go generate
  • we avoid any reflection (per @cznic)
  • ...

There are of course some downsides here:

  • how to name these core-generators (to be relatively sure of avoiding PATH clashes with existing executable programs, we might need rather verbose names like goGenSort)
  • another step in the process of learning Go
  • ...

@klingtnet
Copy link

@mackross +1 for the missing max/min but Go also lacks an absolute value function for integers, which is also hard to explain.

@zephyr
Copy link

zephyr commented Aug 16, 2016

Shouldn’t there also be a stable counterpart? Maybe even with a variadic signature like

func SliceStable(slice interface{}, less ...func(i, j int) bool)

so that one could do

SliceStable(people, byName, byAge)

to sort people first by name and else (when they have the same name/are equivalent) by age?

@jimmyfrasche
Copy link
Member

This only simplifies a trivial unimportant amount of code so I don't see it as more than, at best, a minor annoyance, regardless of frequency.

I'm sure Brad's implementation is a negligible hit on performance compared to the current way, but it would have to be used in function bodies to close over the slice like:

func something() {
  //...
  sort.Slice(things, func(i, j int) bool) {
     //...
   })
  //...
}

so you have to pay the cost of allocating the closure and building the generic sorter on each invocation of something, when you don't have to pay anything defining the methods yourself, other than the usual indirections interfaces involve. That's going to be negligible to the cost of sorting anything for all but the smallest of n, and, if it shows up in profiling, it's simple enough to rewrite it the old way, but this sort of code is generally discouraged by the stdlib.

If this is going for pure convenience, I'd prefer the version @ianlancetaylor hinted at where the comparator is an interface containing a func(a, b T) bool (though the runtime hit from that would certainly be less negligible, even modulo potential copying overhead, and it's even further afield of encouraged practice).

Maybe I'm just uncomfortable that this is at a weird level between convenient and practical that results in this awkward situation where you have to close over a slice that's being permuted under the covers.

All that said, if this goes in, I'll use it—I'm lazy and hypocritical!

I don't have any problem with this existing—it's the inclusion in the stdlib that gives me pause. I have absolutely zero reservations about this being a go gettable library: people could even start using it today. The excellent go4.org or even golang.org/x seem like fine places to put this, and if it gets used a lot without issues moving it to the stdlib would be fine with me.

@twotwotwo
Copy link

Taking a key func that's allowed to return one of a list of types ((u)int, []byte, string) makes it look a bit more like sort functions in other languages that take a key func, and would let you later drop in faster sorts specific to the key types without adding APIs. Lots of downsides make that unpalatable (keyFunc'd be an interface{} in the signature, trickier internally, overhead of Less calling keyFunc twice, and can't write a custom Less to sort by two fields or whatever). Maybe someone sees a better variation of that idea that I don't :) so throwing it out there anyhow.

@bradfitz
Copy link
Contributor Author

@jimmyfrasche,

so you have to pay the cost of allocating the closure

Even today with sort.Sort you have to pay the cost of putting a slice into an interface value, since a slice (1 pointer & two ints) is not 0 or 1 pointers (which is all that goes into an interface allocation-free). So it's not significantly different.

I'd prefer to keep this issue focused on the proposal and not the implementation, though.

@broady
Copy link
Contributor

broady commented Aug 16, 2016

When I read the title, I thought you were going to propose something like a built-in:

var s []t
sort(s, func(a, b t) bool {
  return a < b
})

But this is certainly more pragmatic. Saying that 1% of programs touch sort isn't a very good argument against including this. This is scoped to the sort package, so programs that aren't sorting won't care about it. Even more reason to include it.

@rsc
Copy link
Contributor

rsc commented Sep 30, 2016

@mibk From a clean slate, maybe, but we already have IntsAreSorted, Float64sAreSorted, StringsAreSorted, not AreIntsSorted etc.

@bradfitz
Copy link
Contributor Author

Exposing reflect.Swapper may be a necessary evil. That's OK. Better to require other Go implementations to implement/port reflect.Swapper than to port unsafe code tucked into package sort.

So you're cool with 4 API additions: reflect.Swapper, sort.Slice{,Stable,IsSorted}?

@rsc
Copy link
Contributor

rsc commented Sep 30, 2016

So you're cool with 4 API additions: reflect.Swapper, sort.Slice{,Stable,IsSorted}?

SGTM

@bradfitz bradfitz self-assigned this Sep 30, 2016
@bradfitz bradfitz modified the milestones: Go1.8, Proposal Sep 30, 2016
@bradfitz bradfitz changed the title proposal: sort: make sorting easier sort: make sorting easier, add Slice, SliceStable, SliceIsSorted, reflect.Swapper Sep 30, 2016
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/30088 mentions this issue.

@rsc rsc added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 30, 2016
gopherbot pushed a commit that referenced this issue Sep 30, 2016
Swapper returns a func that swaps two elements in a slice.

Updates #16721

Change-Id: I7f2287a675c10a05019e02b7d62fb870af31216f
Reviewed-on: https://go-review.googlesource.com/30088
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@extemporalgenome
Copy link
Contributor

extemporalgenome commented Oct 1, 2016

@rsc considering that external packages may accept sort.Interface values, such as container/heap and presumably a subset of https://godoc.org/sort?importers (including a few of my own packages), I'd like to counter-argue that having no mechanism for creating a sort.Interface will be surprising and disappointing.

Even in the absence of such concerns, I'd like to argue that the proliferation of parallel functions (SliceStable) to be unfortunate -- I appreciate sort.StringSlice and friends, but see no justifiable benefit to sort.Strings and friends. Such functions bloat the package, saving users only a handful of keystrokes.

I feel that this may just boil down to a naming problem -- with a clearer name than MakeInterface, the usage of an interface builder may well become obvious.

@suyash
Copy link
Contributor

suyash commented Oct 3, 2016

sorry for bringing this up if this was resolved somewhere else, but any particular reason the sort.Search like syntax was not preferred?

I think that the fundamental argument of making reflect.Swapper function public is to allow users the option of using it or not, otherwise it can just be implemented inside the sort package itself.

another point is that getting the length of slice directly would probably be faster than getting it through reflection, though not sure by how much.

also, all the previous discussion about not encouraging generic programming through interface{}

@rsc
Copy link
Contributor

rsc commented Oct 3, 2016

@extemporalgenome It's trivial to write a converter from function triple to interface if someone needs that. That's true already today.

type funcs struct { n int; swap func(i, j int); less func(i, j int) bool }
func (x funcs) Len() int { return x.n }
func (x funcs) Swap(i, j int) { x.swap(i, j) }
func (x funcs) Less(i, j int) bool { return x.less(i, j) }
func FuncInterface(n int, swap func(i, j int), less func(i, j int) bool) Interface {
    return &funcs{n, swap, less}
}

If there is evidence that this comes up often in real programs, we can easily add it later. But we'd need to see that evidence first.

@suyash I assume that by sort.Search-like syntax you mean code like

sort.With(len(s), func(i, j int) { s[i], s[j] = s[j], s[i] }, func(i, j int) bool {
    if s[i].Foo != s[j].Foo {
        return s[i].Foo < s[j].Foo
    }
    return s[i].Bar < s[j].Bar
})

as in the comment above (Aug 21). The reason not to prefer this is that this entire proposal is about making sort easier to use. The above is not substantially less code than you have to write today. It does avoid the new type, but it is still full of boilerplate.

Compare to:

sort.Slice(s, func(i, j int) bool {
    if s[i].Foo != s[j].Foo {
        return s[i].Foo < s[j].Foo
    }
    return s[i].Bar < s[j].Bar
})

The new API is meant to be a helper, so it should be as helpful as possible - with as little boilerplate as possible - without restricting the scope of its help. My guess is that >99% of sorting happens on slices, in which case the generality added by the boilerplate serves no useful purpose. If someone has evidence otherwise, I'd be happy to see it.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/30250 mentions this issue.

@suyash
Copy link
Contributor

suyash commented Oct 4, 2016

@rsc lets take an example that comes up fairly often (for me), lets say you want to sort a set of numbers but you also want to track the original positions of the numbers.

So you had a slice like []int{5, 2, 3, 1, 4} and you sorted it to get []int{1, 2, 3, 4, 5} but at the same time, you also wanted to track positions, so in a separate slice you would want []int{3, 1, 2, 4, 0}.

With this addition, you would separately declare a pair type and a new pair array and copy original array's contents into the new array and initialize positions and then sort them, while defining the appropriate less closure.Something like

a := []int{...}

type pair struct {
    first, second int
}

p := make([]pair, len(a))
for i := 0 ; i < len(a) ; i++ {
    p[i] = pair{a[i], i}
}

sort.Slice(p, func (i, j int) bool {
    return p[i].first < p[j].first || (p[i].first == p[j].first && p[i].second < p[j].second)
})

while if we had the option to define a swapper, we can just define another slice, initialize that to positions (0, 1, 2...) and in the swap, swap in both slices.

something like

a := []int{...}

pos := make([]int, len(a))
for i := 0 ; i < len(a) ; i++ {
    pos[i] = i
}

sort.Slice(len(a), func(i, j int) {
    a[i], a[j] = a[j], a[i]
    pos[i], pos[j] = pos[j], pos[i]
}, func(i, j int) bool {
    return a[i] < a[j] || (a[i] == a[j] && pos[i] < pos[j])
})
  1. The first reason I am advocating the use of sort.Search like semantics is because sort.Search itself is defined only for slices, but the best part about it is that it can do 6 different operations simply by passing the appropriate closure (https://go-review.googlesource.com/#/c/29131/6/src/sort/example_search_test.go@58). One issue that comes up with doing things differently with sort.Slice is the difference from sort.Search in terms of semantics.

  2. More convenience (for me) means more flexibility, then less code.

  3. Again, the reason reflect.Swapper came up in the first place was to support the general case, so you could do

    sort.Slice(len(a), reflect.Swapper(a), func(i, j int) { ... })
    

    otherwise I don't think there was any need to have it as a public library function.

  4. another thing that came up in this discussion was the facilities provided by modern text editors
    w.r.t boilerplate code. editors these days have macros and snippets, so you can configure them to
    expand things, like you can define a swapper snippet that on a tabstop expands to func(i, j int) { a[i], a[j] = a[j], a[i] }

  5. @bradfitz raised a point about purity of the function irrespective of type errors (sort: make sorting easier, add Slice, SliceStable, SliceIsSorted, reflect.Swapper #16721 (comment))

  6. Again, and I don't know how much impact it will have, but getting the slice length directly will probably be faster than getting it through reflection

  7. You might think that something like sort.MakeInterface solves this, but @bradfitz himself did genzfunc.go for performance, the results of which are in the commit message at https://go-review.googlesource.com/#/c/27321/11. Also in case you didn't see this: https://github.com/nieksand/sortgenerics https://news.ycombinator.com/item?id=12602456

@ianlancetaylor
Copy link
Contributor

@suyash For your example you could use rsc's FuncInterface function above. It's marginally more typing for you, and no additional API in the standard library. To get this into the standard library there needs to be clear and common use cases.

gopherbot pushed a commit that referenced this issue Oct 4, 2016
I avoided anywhere in the compiler or things which might be used by
the compiler in the future, since they need to build with Go 1.4.

I also avoided anywhere where there was no benefit to changing it.

I probably missed some.

Updates #16721

Change-Id: Ib3c895ff475c6dec2d4322393faaf8cb6a6d4956
Reviewed-on: https://go-review.googlesource.com/30250
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
@rsc
Copy link
Contributor

rsc commented Oct 4, 2016

@suyash What Ian said, but also "text editors can macro-generate your code for you" is a particularly bad rationale for an API decision. Code has to be read and maintained, not just written. Also, if you are OK with text editor macros, one could generate the old, more flexible sort.Interface type+methods for you if you insist.

The time it takes to compute the slice length is not going to show up.

chromium-infra-bot pushed a commit to luci/luci-go that referenced this issue Jan 25, 2017
Inspired by comments on golang/go#16721.

This will be particularly handy when we get to use a version of go (1.8) which
has `sort.Slice`. But in the mean time, it's at least clearer and quicker than
hand-rolling Less methods (and potentially safer too!).

R=dnj@chromium.org, nodir@chromium.org, vadimsh@chromium.org
BUG=

Review-Url: https://codereview.chromium.org/2657733002
snsinfu added a commit to snsinfu/learn-go that referenced this issue May 13, 2017
Initial dumb implementation.

- CPU profiling. To see the result, run 07-magic_square, execute
  `go tool pprof 07-magic_square path/to/cpu.pprof` and input
  `top[Enter]` in the prompt.

- Generic algorithm. The trick is the Float64Slice adaptor; see [1].
  Maybe I will use the technique used in the sort library[2][3]
  which allows simpler API.

[1]: http://stackoverflow.com/a/6256127/5266681
[2]: golang/go#16721
[3]: golang/go@22a2bdf
@golang golang locked and limited conversation to collaborators Oct 4, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Projects
None yet
Development

No branches or pull requests