Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Enumerable#{zip,zip?} when self is an Iterator (#9328) #9330

Merged
merged 2 commits into from
May 26, 2020
Merged

Fix Enumerable#{zip,zip?} when self is an Iterator (#9328) #9330

merged 2 commits into from
May 26, 2020

Conversation

mneumann
Copy link
Contributor

Both methods call size and as such consume the Iterator (self). Then they
call Enumerable.zip which tries to again iterate over the now empty Iterator
resulting in an empty array.

Running std_spec "before" removing the call to size results in the following
failures:

Failures:

  1) Enumerable zip works for Iterators as receiver
     Failure/Error: SpecCountUpIterator.new(3).zip(1..3, 2..4).should eq([{0, 1, 2}, {1, 2, 3}, {2, 3, 4}])

       Expected: [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
            got: []

     # spec/std/enumerable_spec.cr:1057

  2) Enumerable zip? works for Iterators as receiver
     Failure/Error: SpecCountUpIterator.new(3).zip?(1..2, 2..4).should eq([{0, 1, 2}, {1, 2, 3}, {2, nil, 4}])

       Expected: [{0, 1, 2}, {1, 2, 3}, {2, nil, 4}]
            got: []

     # spec/std/enumerable_spec.cr:1063

Fixes: #9328

Both methods call `size` and as such consume the Iterator (`self`). Then they
call Enumerable.zip which tries to again iterate over the now empty Iterator
resulting in an empty array.

Running std_spec "before" removing the call to `size` results in the following
failures:

    Failures:

      1) Enumerable zip works for Iterators as receiver
         Failure/Error: SpecCountUpIterator.new(3).zip(1..3, 2..4).should eq([{0, 1, 2}, {1, 2, 3}, {2, 3, 4}])

           Expected: [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
                got: []

         # spec/std/enumerable_spec.cr:1057

      2) Enumerable zip? works for Iterators as receiver
         Failure/Error: SpecCountUpIterator.new(3).zip?(1..2, 2..4).should eq([{0, 1, 2}, {1, 2, 3}, {2, nil, 4}])

           Expected: [{0, 1, 2}, {1, 2, 3}, {2, nil, 4}]
                got: []

         # spec/std/enumerable_spec.cr:1063

Fixes: #9328
@jhass
Copy link
Member

jhass commented May 20, 2020

Shouldn't we try to preserve the optimization for Indexables?

Copy link
Member

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

@asterite
Copy link
Member

Shouldn't we try to preserve the optimization for Indexables?

There are no changes to Indexable in this PR.

@@ -1634,7 +1634,7 @@ module Enumerable(T)
# a.zip(b, c) # => [{1, 4, 8}, {2, 5, 7}, {3, 6, 6}]
# ```
def zip(*others : Indexable | Iterable | Iterator)
pairs = Array(typeof(zip(*others) { |e| break e }.not_nil!)).new(size)
pairs = Array(typeof(zip(*others) { |e| break e }.not_nil!)).new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if other is an Indexable, we could use its size?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not correct, the size is always taken from self.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speficially, [1, 2, 3].zip(4..5) should give [[1, 4], [2, 5]]. Even if we take the size from the other indexable, it might not be enough and there will be a reallocation.

In short, maybe it can be done, but I don't know. Better to be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asterite: [1,2,3].zip(4..5) raises an IndexError as written in the documentation. All enumerable must have at least as many elements as self. So the size of the returned Array is always exactly self.size or an exception is thrown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to keep this optimization in for Indexable. self.is_a?(Indexable) ? size : 0 should work well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RX14: thanks. I did exactly that!

@straight-shoota
Copy link
Member

It's certainly not a huge deal to loose the size hint for the array allocation. But it would still be a nice to use that for Indexable which has a static size that doesn't need to iterate for counting.
Maybe overriding Indexable#zip would be a solution to this.
Alternatively, we could consider to change Enumerable#size. For example, it could be nilable and return nil by default. For some enumerables (and I think that should be reflected in the default implementation) size just doesn't work because some iterators for example can only count via iterating. And after that it's not useful anymore when you can't roll back. I realize this is a little bit different topic, but it would resolve this issue.

@mneumann
Copy link
Contributor Author

mneumann commented May 22, 2020

@straight-shoota One could add Indexable#zip and #zip? as shown below. Note that they both use size. It's a bit code-duplication though.

--- a/src/indexable.cr
+++ b/src/indexable.cr
@@ -580,6 +580,19 @@ module Indexable(T)
     indexes.map { |index| self[index] }
   end
 
+  def zip(*others : Indexable | Iterable | Iterator)
+    pairs = Array(typeof(zip(*others) { |e| break e }.not_nil!)).new(size)
+    zip(*others) { |e| pairs << e }
+    pairs
+  end
+
+  def zip?(*others : Indexable | Iterable | Iterator)
+    pairs = Array(typeof(zip?(*others) { |e| break e }.not_nil!)).new(size)
+    zip?(*others) { |e| pairs << e }
+    pairs
+  end
+
+

A more general solution would be to add a method size_hint to Enumerable (and other modules) that return the lower and upper bounds (if known), i.e. a Tuple(Int, Int?).

Rust has this:
https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint

Further adding Array.new(size_hint : Tuple(Int, Int?)) or Array.new(size_hint : SizeHint) would make creating an Array from a size_hint even easier. But I'd rather make this a separate commit. This commit tries to fix a bug.

Copy link
Member

@jhass jhass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, if you want to follow up with an optimized version for Indexable, that would be much appreciated!

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection labels May 23, 2020
@RX14 RX14 added this to the 0.35.0 milestone May 25, 2020
@bcardiff bcardiff merged commit f3f0b29 into crystal-lang:master May 26, 2020
@mneumann mneumann deleted the bug-enumerable-zip-exhausts-receiver branch May 29, 2020 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unintuitive behavior of Enumerable#zip depending on receiver arg
6 participants