Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Enumerable#min(count) and #max(count) #13057

Merged
merged 5 commits into from
Feb 15, 2023

Conversation

nthiad
Copy link
Contributor

@nthiad nthiad commented Feb 10, 2023

This patch adds max(k), min(k), max?(k), and min?(k) support as requested in #12817.

puts [7, 5, 2, 4, 9].max(3) # => [9, 7, 5]
puts ["Eve", "Alice", "Bob", "Mallory", "Carol"].max(2) # => ["Mallory", "Eve"]
puts [7, 5, 2, 4, 9].min(3) # => [2, 4, 5]
puts ["Eve", "Alice", "Bob", "Mallory", "Carol"].min(2) # => ["Alice", "Bob"]

Like ruby core, this patch uses an internal quickselect implementation which empirically has much better performance than full sorting when k is small. However, when k is large, the performance is also poor, but that is also the case in ruby core.

Here is a little benchmark script comparing max(k) with sort():

[100,10_000,1_000_000].each do |n|
  data = (0..n-1).map { |i| Random.rand(4_000_000) }
  [3,10,20,n//3].each do |k|
    s_elapsed = Time.measure do
      data.sort { |a,b| b-a }
    end
    puts "n=#{n} k=#{k} sort() elapsed=#{s_elapsed}"

    max_elapsed = Time.measure do
      data.max(k)
    end
    puts "n=#{n} k=#{k} max(k) elapsed=#{max_elapsed}"
  end
end

It runs but hangs for quite a while on the last max(k):

n=100 k=3 sort() elapsed=00:00:00.000020394
n=100 k=3 max(k) elapsed=00:00:00.000005727
n=100 k=10 sort() elapsed=00:00:00.000005727
n=100 k=10 max(k) elapsed=00:00:00.000012572
n=100 k=20 sort() elapsed=00:00:00.000007124
n=100 k=20 max(k) elapsed=00:00:00.000025353
n=100 k=33 sort() elapsed=00:00:00.000007263
n=100 k=33 max(k) elapsed=00:00:00.000025842
n=10000 k=3 sort() elapsed=00:00:00.000938610
n=10000 k=3 max(k) elapsed=00:00:00.000291731
n=10000 k=10 sort() elapsed=00:00:00.000866812
n=10000 k=10 max(k) elapsed=00:00:00.000707782
n=10000 k=20 sort() elapsed=00:00:00.000919264
n=10000 k=20 max(k) elapsed=00:00:00.000897054
n=10000 k=3333 sort() elapsed=00:00:00.000617336
n=10000 k=3333 max(k) elapsed=00:00:00.116306633
n=1000000 k=3 sort() elapsed=00:00:00.108918789
n=1000000 k=3 max(k) elapsed=00:00:00.024309854
n=1000000 k=10 sort() elapsed=00:00:00.091967585
n=1000000 k=10 max(k) elapsed=00:00:00.036159780
n=1000000 k=20 sort() elapsed=00:00:00.091330064
n=1000000 k=20 max(k) elapsed=00:00:00.069409304
n=1000000 k=333333 sort() elapsed=00:00:00.090361980
^C

but that is exactly what ruby does:

$ irb
irb(main):001:0> (0..1_000_000).map { |i| Random.rand(4_000_000) }.max(3)
=> [3999995, 3999991, 3999988]
irb(main):002:0> (0..1_000_000).map { |i| Random.rand(4_000_000) }.max(300_000)
^C

It might be worth pointing out these performance characteristics in the docs, but ruby doesn't do that so I decided not to in this patch.

I tried to preserve the other quirks from the ruby implementation such as k > array.size truncating results instead of raising an exception. And raising an exception when k < 0.

src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nthiad for this work! @HertzDevil already gave a thorough review; I'm just adding two little grammar fixes.

EDIT: probably they're ok, deleting my comments.

@Sija
Copy link
Contributor

Sija commented Feb 10, 2023

Passing negative k should IMO raise regardless, making min?/max? redundant.

@straight-shoota
Copy link
Member

@Sija Agreed. See #12817 (comment)

src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have the full name of the algorithm show up somewhere to make it findable.
Either mentioning that in a comment on the helper method or renaming it to quickselect_internal - both should work. But I don't think there's a compelling reason to abbreviate the method name, so I'd prefer spelling it out.

src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
spec/std/enumerable_spec.cr Outdated Show resolved Hide resolved
…erval, qselect => quickselect, error message, formatting (crystal-lang#12817)
Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

As possible future enhancements, the algorithm could use more efficient mechanics such as wrapping operators (&+) as well as unsafe_fetch and Pointer#swap to skip bounds checking. These operations should be guaranteed to be safe in this context.

Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nthiad!

@beta-ziliani beta-ziliani added this to the 1.8.0 milestone Feb 10, 2023
@straight-shoota straight-shoota changed the title Add Enumerable#min(k)/max(k) Add Enumerable#min(count) and #max(count) Feb 10, 2023
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
src/enumerable.cr Outdated Show resolved Hide resolved
Thanks @Sija

Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty to directly commit Sija's observations. @nthiad if you disagree with these let us know, we can always roll it back.

@straight-shoota straight-shoota merged commit 677e60f into crystal-lang:master Feb 15, 2023
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants