Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator.drop cannot handle (very) large iterators #13052

Open
charpov opened this issue Oct 28, 2024 · 6 comments · May be fixed by scala/scala#10903
Open

Iterator.drop cannot handle (very) large iterators #13052

charpov opened this issue Oct 28, 2024 · 6 comments · May be fixed by scala/scala#10903

Comments

@charpov
Copy link

charpov commented Oct 28, 2024

I sometimes use very large iterators (more than Int.MaxValue elements) to check that students don't expend in memory when they shouldn't. However, some iterator methods, like drop, don't work correctly on those:

class TenBillionIterator extends AbstractIterator[Long]:
   private var n        = 0L
   def hasNext: Boolean = n < 10_000_000_000L
   def next: Long       = { val r = n; n += 1L; r }

val i: Iterator[Long] =
   TenBillionIterator()
      .drop(1_000_000_000)
      .drop(1_000_000_000)
      .drop(1_000_000_000)

i.next() // 2147483647, should be 3000000000

Interestingly, ++ (concat) does work:

val j = TenBillionIterator() ++ TenBillionIterator()
var n = 13_000_000_000L
while n > 0 do
   n -= 1L
   j.next()

j.next() // 3000000000, correctly

I understand it's a somewhat corner case, but the documentation is silent about such a limitation and the behavior caught me off-guard (especially because ++ worked and things only fell apart when I started using drop). I don't know if this is a documentation issue or if some code needs fixing (maybe just a matter of making dropping a Long in class SliceIterator).

(Note that Iterator.range(0L, 10_000_000_000L) also doesn't work, but it's a different issue.)

I'm using Scala 3.5.2, but this is a Scala 2 collection problem.

@som-snytt som-snytt linked a pull request Oct 28, 2024 that will close this issue
@som-snytt
Copy link

The linked solution just evaluates the underlying iterator, rather than cap the drop count.

An iterator wanting different behavior would implement appropriate overrides (such as sliceIterator).

The current workaround would be to call hasNext, which has the same effect.

@charpov
Copy link
Author

charpov commented Oct 28, 2024

Would this work?

val sum = dropping + lo
if (sum < 0) {
  dropping = Int.MaxValue
  skip()
  sum + Int.MaxValue
}
else sum

I find it easier to read, but I don't trust myself with 2s-complement arithmetic...

@som-snytt
Copy link

Simpler is

if (sum < 0) {
  skip()
  lo
}

which has the advantage of forcing less, while still being an easy read.

To force least, then skip with dropping at lo - bump, and end with dropping at MaxValue.

@charpov
Copy link
Author

charpov commented Oct 28, 2024

I like the simplest code, even if it forces more than necessary (and I meant to have this discussion in the PL, not here).

@som-snytt
Copy link

Since it doesn't matter analytically (and if I can convince myself; there is a case for minimal evaluation), I will take your advice and update the PR. It's good to practice simplicity: use it or lose it.

@SethTisue
Copy link
Member

@scala/collections

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants