-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stable sort implementation to Slice
and Array
#10163
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I'd rename v
to other
and size
to other_size
.
Having a variable for this is not even necessary, other.size
will fit as well.
I think the protected methods should be tested too, ideally. |
Personally I'd not consider going from undefined order to defined order a backwards compatibility breaking change. Though perhaps I'm wrong there. However, I'd argue for that the default would be whatever is fastest. Is there any performance numbers? (performance tests can perhaps be found in the previous change to the sorting). |
For Crystal, like in Ruby, I prefer correct over fast. In fact, if this stable sort is fast enough I would prefer to make stable the default and remove the non-stable option. There's less code and less worries for developers. |
A rough benchmark: require "benchmark"
N = 100_000
puts "=== random ==="
rand = Random.new(seed: 42)
arr = [] of Int32
N.times { arr << rand.next_int }
Benchmark.ips do |x|
x.report("stable: true") { arr.sort(stable: true) }
x.report("stable: false") { arr.sort(stable: false) }
end
puts "=== partially sorted ==="
arr = [] of Int32
4.times do
(N//4).times { |i| arr << i }
end
Benchmark.ips do |x|
x.report("stable: true") { arr.sort(stable: true) }
x.report("stable: false") { arr.sort(stable: false) }
end And a result on my MacBook Air 2018 is:
I don't have a strong opinion, but I think we can switch to |
Wow, the stable algorithm is pretty good! I think we should just leave that as a default. |
Interesting, But looking at the times above, it seems stable sort isn't that much slower... hmmm... |
I'd be surprised if anyone relied on instability of sorting, so is there even a backward compatibility concern? As others have stated, IMO, default should be |
Now, |
About specs: I would like to avoid quadruple tests for |
@straight-shoota Please open another PR. Your concern can separate from this at all. This PR already contains two fixes (stable sort implementation and switching the default). |
It would be great to include the deduplication in this PR because it makes the spec changes much easier to review. But I can make a separate PR for that and then we merge it into this. |
@straight-shoota Or you could include those within this PR, whatcha think @makenowjust? |
I still think we should make |
I'm not talking about duplication stable vs. unstable behaviour but duplication between four different methods which all are based on the same implementation. |
Removing existing specs is an incomprehensible thing in general. A squashed commit is hard to describe the reason. IMHO we should open another PR in this case. |
Is anything holding this up, I'd be happy to help. |
Waiting on #10208. |
I would rather wait for 1.1 to avoid surprises like edge-cases, memory footprint. Or at least more heave benchmarks since this is changing the default sort. As mentioned in #10163 (comment) this can be seen as a non-breaking change so it can arrive after 1.0. |
While not a huge breaking chage, it is surely going to affect code that relies on the unstable sort implementation. Unstable is still deterministic. Not sure if that's much relevant besides specs that test for specific sort results. But even that should probably be considered a breaking change. |
Why not merge it as off by default and later change to on by default |
^That way people get to try it out and tell about it, so maybe you won't need to guess whether it's breaks some expectation |
Actually that's not even my point. It's a very useful feature to have, and the argument against merging it is only with the assumption that it becomes the default. |
Why should it not be the default? I'd assume there is a mutual agreement that stable sort should be default (eventually) and if you don't need it but want more performance, you can disable it. Adding (non-default) stable sort isn't relevant for 1.0. If the default behaviour doesn't change, it can just be introduced in 1.1. But changing default behaviour should better be in 1.0 or wait for 2.0. |
Heads up: this can now be applied on top of #10609. |
might be worth merging this sooner rather than later given #10748 |
This could be merged already and the remaining optimization using #10609 could be applied in the followup PR - if @makenowjust has no time to do it right now. |
The stable sort algorithm is ported from Rust. It is similar to TimSort, but galloping is dropped.
Thanks @j8r for suggestion! Co-authored-by: Julien Reichardt <git@jrei.ch>
And, tests are added.
#10609 is applied. This is ready to merge now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has already been milestoned for a long time, and I think it's ready for merging.
@makenowjust I would appreciate if you could just merge master instead of rebase the next time. Force pushes make proper reviewing unnecessarily hard.
@@ -166,7 +166,7 @@ class Crystal::Doc::Type | |||
defs << method(def_with_metadata.def, false) | |||
end | |||
end | |||
stable_sort! defs, &.name.downcase | |||
defs.sort_by!(stable: true, &.name.downcase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since stable is default, the argument is technically not necessary. I don't mind keeping it explicit, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's stable
is meaningful and essential. When sorting implementation change other, it will be broken if stable: true
is missing. Thus, I set stable: true
.
@HertzDevil What's your opinion in point of #10609 author.
@straight-shoota Great! Just to be sure: did you review the entire code and do you understand how it works? I mean, all the internals and details. I only ask because once this is merged, if there's a bug or something that needs to be changed, we must be able to fix it on our own. I didn't have time to review this yet. Or... if we aren't confident, we can mark this as experimental so we could remove it in the future if there are issues and we can't solve them. |
In my recent review I just checked the diff between the previous HEAD and the force-pushed HEAD (which is really just the last commit 8647b6d), because I had already previously approved. I don't think I completely understood every little detail. But the code looks readable, is well documented and tested. So I don't think there are should be any maintenance issues. |
In the last core team meeting we decided to postpone this PR after the release of 1.1 to rethink the public sorting API. See #6057 (comment) for the discussion about that. |
Ooops, I think I was a bit too quick on this one. We were not satisfied with the API design. But this is fine, probably. It brings the implementation and we can iterate on the API in a follow-up. |
Fixed #6057
The stable sort algorithm is ported from Rust (it is called
merge_sort
in Rust, even though it is extended.)It is similar to TimSort, but galloping is dropped.
Of course, it holds good properties of TimeSort.
For example, sorting is done in linear time when the collection is already sorted.
This PR adds a
stable
option to sorting methods (#sort
,#sort!
,#sort_by
and#sort_by!
) ofArray
andSlice
, respectively.The option's default value is
false
for backward compatibility.It is my last work in 2020. Have a happy new year 🎉