Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stable sort implementation to Slice and Array #10163

Merged
merged 5 commits into from
Jul 24, 2021

Conversation

makenowjust
Copy link
Contributor

@makenowjust makenowjust commented Dec 31, 2020

Fixed #6057

The stable sort algorithm is ported from Rust (it is called merge_sort in Rust, even though it is extended.)
It is similar to TimSort, but galloping is dropped.
Of course, it holds good properties of TimeSort.
For example, sorting is done in linear time when the collection is already sorted.

This PR adds a stable option to sorting methods (#sort, #sort!, #sort_by and #sort_by!) of Array and Slice, respectively.
The option's default value is false for backward compatibility.

p (0..20).sort_by(stable: false) { |i| i % 2 }
# => [0, 18, 16, 14, 12, 10, 8, 6, 4, 2, 20, 7, 9, 5, 11, 13, 3, 15, 17, 1, 19]

p (0..20).sort_by(stable: true) { |i| i % 2 }
# => [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

It is my last work in 2020. Have a happy new year 🎉

src/slice/sort.cr Show resolved Hide resolved
src/slice/sort.cr Show resolved Hide resolved
Copy link
Contributor

@j8r j8r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'd rename v to other and size to other_size.

Having a variable for this is not even necessary, other.size will fit as well.

src/slice.cr Outdated Show resolved Hide resolved
src/slice.cr Outdated Show resolved Hide resolved
src/slice.cr Outdated Show resolved Hide resolved
@j8r
Copy link
Contributor

j8r commented Dec 31, 2020

I think the protected methods should be tested too, ideally.

@makenowjust
Copy link
Contributor Author

@j8r suggestions are accepted in a450923. Thank you!

I think all works are done now. Please review this!

@yxhuvud
Copy link
Contributor

yxhuvud commented Jan 4, 2021

The option's default value is false for backward compatibility.

Personally I'd not consider going from undefined order to defined order a backwards compatibility breaking change. Though perhaps I'm wrong there. However, I'd argue for that the default would be whatever is fastest. Is there any performance numbers? (performance tests can perhaps be found in the previous change to the sorting).

@asterite
Copy link
Member

asterite commented Jan 4, 2021

For Crystal, like in Ruby, I prefer correct over fast. In fact, if this stable sort is fast enough I would prefer to make stable the default and remove the non-stable option. There's less code and less worries for developers.

@makenowjust
Copy link
Contributor Author

A rough benchmark:

require "benchmark"

N = 100_000

puts "=== random ==="
rand = Random.new(seed: 42)
arr = [] of Int32
N.times { arr << rand.next_int }

Benchmark.ips do |x|
  x.report("stable: true") { arr.sort(stable: true) }
  x.report("stable: false") { arr.sort(stable: false) }
end

puts "=== partially sorted ==="
arr = [] of Int32
4.times do
  (N//4).times { |i| arr << i }
end

Benchmark.ips do |x|
  x.report("stable: true") { arr.sort(stable: true) }
  x.report("stable: false") { arr.sort(stable: false) }
end

And a result on my MacBook Air 2018 is:

=== random ===
 stable: true  93.78  ( 10.66ms) (± 6.09%)  587kB/op   1.57× slower
stable: false 147.20  (  6.79ms) (± 6.21%)  391kB/op        fastest
=== partially sorted ===
 stable: true   1.51k (662.69µs) (± 7.85%)  586kB/op        fastest
stable: false 306.63  (  3.26ms) (± 4.24%)  391kB/op   4.92× slower

I don't have a strong opinion, but I think we can switch to stable: true as the default.

@asterite
Copy link
Member

asterite commented Jan 4, 2021

Wow, the stable algorithm is pretty good! I think we should just leave that as a default.

@asterite
Copy link
Member

asterite commented Jan 4, 2021

Interesting, sort in Ruby is not stable: https://bugs.ruby-lang.org/issues/1089

But looking at the times above, it seems stable sort isn't that much slower... hmmm...

@Fryguy
Copy link
Contributor

Fryguy commented Jan 4, 2021

The option's default value is false for backward compatibility.

I'd be surprised if anyone relied on instability of sorting, so is there even a backward compatibility concern? As others have stated, IMO, default should be stable: true, with stable: false still available for possibly faster alternatives (if there even is a faster alternative).

@makenowjust
Copy link
Contributor Author

Now, stable: true is the default.

@straight-shoota
Copy link
Member

straight-shoota commented Jan 6, 2021

About specs: I would like to avoid quadruple tests for Array#sort, Array#sort!, Slice#sort, Slice#sort!. All delegate to the same implementation and there's no benefit for testing that multiple times. It's just a lot of unnecessary duplication.
I would prefer to have extensive specs only once, for the main implementation Slice#sort!. The other methods can be covered by simple tests for basic functionality.

@makenowjust
Copy link
Contributor Author

@straight-shoota Please open another PR. Your concern can separate from this at all. This PR already contains two fixes (stable sort implementation and switching the default).

@straight-shoota
Copy link
Member

It would be great to include the deduplication in this PR because it makes the spec changes much easier to review. But I can make a separate PR for that and then we merge it into this.

@Sija
Copy link
Contributor

Sija commented Jan 6, 2021

@straight-shoota Or you could include those within this PR, whatcha think @makenowjust?

@asterite
Copy link
Member

asterite commented Jan 6, 2021

I still think we should make stable be the only option for sort. That would mean there won't be duplicate specs.

@straight-shoota
Copy link
Member

That would mean there won't be duplicate specs.

I'm not talking about duplication stable vs. unstable behaviour but duplication between four different methods which all are based on the same implementation.

makenowjust added a commit to makenowjust/crystal that referenced this pull request Jan 6, 2021
@makenowjust
Copy link
Contributor Author

Removing existing specs is an incomprehensible thing in general. A squashed commit is hard to describe the reason. IMHO we should open another PR in this case.

@rdp
Copy link
Contributor

rdp commented Jan 29, 2021

Is anything holding this up, I'd be happy to help.

@straight-shoota
Copy link
Member

Waiting on #10208.

makenowjust added a commit to makenowjust/crystal that referenced this pull request Feb 1, 2021
@bcardiff
Copy link
Member

I would rather wait for 1.1 to avoid surprises like edge-cases, memory footprint. Or at least more heave benchmarks since this is changing the default sort.

As mentioned in #10163 (comment) this can be seen as a non-breaking change so it can arrive after 1.0.

@bcardiff bcardiff removed this from the 1.0.0 milestone Mar 19, 2021
@straight-shoota
Copy link
Member

straight-shoota commented Mar 19, 2021

While not a huge breaking chage, it is surely going to affect code that relies on the unstable sort implementation. Unstable is still deterministic.

Not sure if that's much relevant besides specs that test for specific sort results. But even that should probably be considered a breaking change.

@oprypin
Copy link
Member

oprypin commented Mar 19, 2021

Why not merge it as off by default and later change to on by default

@oprypin
Copy link
Member

oprypin commented Mar 19, 2021

^That way people get to try it out and tell about it, so maybe you won't need to guess whether it's breaks some expectation

@oprypin
Copy link
Member

oprypin commented Mar 19, 2021

Actually that's not even my point.

It's a very useful feature to have, and the argument against merging it is only with the assumption that it becomes the default.

@straight-shoota
Copy link
Member

straight-shoota commented Mar 19, 2021

Why should it not be the default? I'd assume there is a mutual agreement that stable sort should be default (eventually) and if you don't need it but want more performance, you can disable it.

Adding (non-default) stable sort isn't relevant for 1.0. If the default behaviour doesn't change, it can just be introduced in 1.1. But changing default behaviour should better be in 1.0 or wait for 2.0.

@HertzDevil
Copy link
Contributor

Heads up: this can now be applied on top of #10609.

@stakach
Copy link
Contributor

stakach commented May 26, 2021

might be worth merging this sooner rather than later given #10748

@Sija
Copy link
Contributor

Sija commented May 27, 2021

This could be merged already and the remaining optimization using #10609 could be applied in the followup PR - if @makenowjust has no time to do it right now.

makenowjust and others added 4 commits May 28, 2021 02:55
The stable sort algorithm is ported from Rust.
It is similar to TimSort, but galloping is dropped.
Thanks @j8r for suggestion!

Co-authored-by: Julien Reichardt <git@jrei.ch>
And, tests are added.
@makenowjust
Copy link
Contributor Author

#10609 is applied. This is ready to merge now.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has already been milestoned for a long time, and I think it's ready for merging.

@makenowjust I would appreciate if you could just merge master instead of rebase the next time. Force pushes make proper reviewing unnecessarily hard.

@@ -166,7 +166,7 @@ class Crystal::Doc::Type
defs << method(def_with_metadata.def, false)
end
end
stable_sort! defs, &.name.downcase
defs.sort_by!(stable: true, &.name.downcase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since stable is default, the argument is technically not necessary. I don't mind keeping it explicit, though.

Copy link
Contributor Author

@makenowjust makenowjust May 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's stable is meaningful and essential. When sorting implementation change other, it will be broken if stable: true is missing. Thus, I set stable: true.

@HertzDevil What's your opinion in point of #10609 author.

@asterite
Copy link
Member

@straight-shoota Great!

Just to be sure: did you review the entire code and do you understand how it works? I mean, all the internals and details. I only ask because once this is merged, if there's a bug or something that needs to be changed, we must be able to fix it on our own. I didn't have time to review this yet. Or... if we aren't confident, we can mark this as experimental so we could remove it in the future if there are issues and we can't solve them.

@straight-shoota
Copy link
Member

In my recent review I just checked the diff between the previous HEAD and the force-pushed HEAD (which is really just the last commit 8647b6d), because I had already previously approved.

I don't think I completely understood every little detail. But the code looks readable, is well documented and tested. So I don't think there are should be any maintenance issues.

@straight-shoota straight-shoota modified the milestones: 1.1.0, 1.2.0 Jun 15, 2021
@straight-shoota
Copy link
Member

In the last core team meeting we decided to postpone this PR after the release of 1.1 to rethink the public sorting API. See #6057 (comment) for the discussion about that.
Sorry for the delay and thanks for your patience!

@straight-shoota straight-shoota merged commit e97096a into crystal-lang:master Jul 24, 2021
@makenowjust makenowjust deleted the stable-sort branch July 24, 2021 09:16
@straight-shoota
Copy link
Member

Ooops, I think I was a bit too quick on this one. We were not satisfied with the API design.
Should've removed the milestone before 🙈

But this is fine, probably. It brings the implementation and we can iterate on the API in a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Array stable sort