
TailRecM does not make sense for lists / streams #1329

Closed · alexandru opened this issue Aug 24, 2016 · 11 comments

@alexandru (Member) commented Aug 24, 2016

The new tailRecM in FlatMap probably does not make sense for lists / streams.

I promise to read the paper (I have some difficulty parsing Haskell/PureScript code); however, this operation only makes sense for monads for which it can be implemented in a tail-recursive way (i.e. constant stack AND heap).

Looking, for example, at the List implementation, the call stack is effectively replaced with a heap buffer that balloons in size with each iteration. This may be stack safe, but it is not tail recursive, which I think misses the point. Unless I'm not reading it right, the growth is exponential. And actually, the possibility of such a space leak is why we have stack-overflow errors in the first place (the idea being to crash a single thread instead of the whole process or system), and I cannot see the utility of such a dangerous operation on lists / streams.

And what this operation does is enforced by the tailRecMConsistentFlatMap law, so we cannot provide an alternative interpretation; I tried, basically doing a .take(1) on all streams emitted by f.

My current problem is that in Monix I have to provide a Monad instance for Observable. Naturally, I noticed the law tests going from 2 seconds to 60 seconds, because for each observable emitted by f we get a fresh arbitrary observable that's then flat-mapped in that loop, and observables are made of arbitrary lists of numbers, and so on. To give an idea of why this happens, I fired up a profiler and saw that the defaultTailRecM implementation was called over 4.5 million times, with over 35 million elements being streamed by the FlatMapLaws tests; combined with the tricks I'm doing to test this synchronously and cross-platform, it gets to be freakishly slow.

But besides this: given its current laws and given that it is unsafe, I do NOT want to provide this function for any lists or streams I can think of. And I can't just provide the default implementation, because that default is untestable. I cannot have a test that takes 60 seconds, and reconfiguring ScalaCheck to emit smaller lists of numbers or to do fewer checks isn't OK at all, because I actually care about the flatMap/monad tests.

@alexandru (Member, Author) commented Aug 24, 2016

Interestingly, Scalaz has this separated into BindRec, which I think is a good idea, but then they implement it for List as well.

I don't really understand this decision; maybe somebody can explain. Avoiding a stack overflow isn't a purpose in itself; having your process not blow up is.

@alexandru (Member, Author) commented:

I'm solving this for now by configuring ScalaCheck to weaken the tests (setMaxSize etc.).
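
For reference, a minimal sketch of that kind of configuration with plain ScalaCheck (the exact wiring in the Monix test suite may differ):

import org.scalacheck.Test

// Hedged sketch: cap the size of generated structures so the law-checking
// properties stay fast (the default maximum size is 100).
val params = Test.Parameters.default
  .withMinSuccessfulTests(100)
  .withMaxSize(10)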

@johnynek (Contributor) commented Aug 24, 2016

The List and Stream examples are correct, but just as a recursive flatMap on Lists can multiply the size of the List with each flatMap, you can get exponential growth.

If you loop on Lists, you need the mean size of the resulting list to be less than 1. Some can be longer than 1, but if the average is greater than 1 it may never terminate (see the sketch below).
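
To illustrate (a hedged sketch, not code from the discussion): here every step produces a two-element list of Lefts, so the backlog of pending states keeps growing and the call never terminates.

import cats.Monad
import cats.instances.list._

// Each step emits two Lefts and no Rights: the average list size is 2 > 1,
// so tailRecM keeps accumulating pending branches and never returns a result.
Monad[List].tailRecM[Int, Int](0)(a => List(Left(a + 1), Left(a + 2)))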

You can see the discussion about this. If you don't have tailRecM, people will just do recursive flatMaps. On the other hand, tailRecM won't blow the stack (like almost every JVM program, it may OOM) for any of the Monads in cats.

We have so far seen three classes of Scala monad that can implement
tailRecM safely.

  1. Strict/Iterable: these are collection-like. These can implement tailRecM
    as a tail-recursive loop (see the sketch after this list).
  2. Composed: this is a type that is composed with another monad, where it
    can, in constant stack space, just use tailRecM on the inner monad (see
    WriterT).
  3. Trampolined: monads that have stack-safe flatMap recursion can use
    defaultTailRecM in a stack-safe way. Examples: Eval, Free, FreeT
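
For case 1, a minimal sketch of what such a tail-recursive loop can look like for List (not cats' actual implementation): the JVM call stack stays constant, and the pending work lives in an explicit heap-allocated stack of list remainders.

import scala.collection.mutable.ListBuffer

// Sketch of a stack-safe, loop-based tailRecM for List (not cats' exact code).
def tailRecM[A, B](a: A)(f: A => List[Either[A, B]]): List[B] = {
  val results = ListBuffer.empty[B]
  // Remainders of the lists produced so far, acting as an explicit stack.
  var pending: List[List[Either[A, B]]] = List(f(a))
  while (pending.nonEmpty) {
    pending.head match {
      case Nil => pending = pending.tail                // this level is done
      case head :: rest =>
        pending = rest :: pending.tail                  // consume one element
        head match {
          case Right(b) => results += b                 // emit a result
          case Left(a1) => pending = f(a1) :: pending   // descend depth-first
        }
    }
  }
  results.toList
}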

Without tailRecM (or recursive flatMap) being safe, libraries like
iteratee.io can't safely be written since they require monadic recursion.

We made the switch in 0.7.0 to always have tailRecM, since consistency with
flatMap is possible for all Monads, and to introduce a marker trait
(RecursiveTailRecM) for when it is certainly safe. In this way you can write
recursive monadic calls once (using tailRecM) and get the safest
implementation for each monad, rather than writing all such methods twice
or banning non-standard safe monads from ever doing even a single monadic
recursion (when the caller may know the recursion is shallow enough to be
safe, but can't express that sort of knowledge at the type level).

It sounds like your issue is that Observable, like List, has a multiplicative flatMap, and hence recursion can blow up depending on the "size" of the Observable at each flatMap call.

@kailuowang (Contributor) commented Aug 24, 2016

Thanks very much @johnynek. Just as a reminder, I think it will be valuable for this discussion, once concluded, to be added to the Monad docs and/or the FAQ (with some editing, of course).

@johnynek (Contributor) commented:

@alexandru the real issue may be this:

https://github.com/typelevel/cats/blob/master/laws/src/main/scala/cats/laws/FlatMapLaws.scala#L48

I made that change to exercise deeper recursion (but not too deep). Maybe that was a bad call. We could walk it back to smallN = 1 and rely on the equational reasoning that if bounce(1) == bounce(0).flatMap(f), we can induct to show that bounce(n) == bounce(n-1).flatMap(f).

With that change, you may never see the smallN = 3 case, which may be what's blowing you up by creating millions of items.
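
For context, here is roughly what the bounce helper in that law amounts to (a sketch, not the exact cats source): bounce(n) re-applies f through n extra Left steps via tailRecM, and the law then compares bounce(smallN) with bounce(smallN - 1).flatMap(f).

import cats.FlatMap

// Hedged sketch of the bounce helper behind tailRecMConsistentFlatMap.
def bounce[F[_], A](a: A, f: A => F[A], n: Int)(implicit F: FlatMap[F]): F[A] =
  F.tailRecM[(A, Int), A]((a, n)) { case (a0, i) =>
    if (i > 0) F.map(f(a0))(a1 => Left((a1, i - 1)))
    else F.map(f(a0))(Right(_))
  }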

@alexandru (Member, Author) commented Aug 24, 2016

Hi @johnynek, I appreciate your answer. To describe my problem in more detail, please have patience with me :-)

I've measured what's going on, so to put some numbers on this, I have these ScalaCheck settings:

  • max size (of the generated lists): 100 (default)
  • min successful tests: 100
  • max discard ratio: 5
  • everything else default

With these settings, that property test is generating:

  • number of tailRecM calls: 200
  • f calls: 2,923,938
  • Left observable instances created: 52,771
  • Right observable instances created: 2,871,167
  • total number of A elements streamed: 186,170,384

Generating/streaming 186 million items in a test is way too much. And you could say that the max size is too high, but 100 is a low count for testing streams. So indeed, if you could do smallN = 1, that would probably be better.

What follows is more personal opinion :-)

> tailRecM won't blow the stack (like almost every JVM program, it may OOM) for any of the Monads in cats.

I think we have different notions of "stack" in mind. For me, a tail-recursive algorithm is one that uses constant space, whether stack or heap, especially because in functional programming you can't rely on the normal C call stack that much. From what I've heard, Haskell manages its own "stack" by keeping linked lists on the heap, and we are basically replacing Java's call stack with trampolines or thread-pools.

Either way, for streams, no tail-recursive algorithm is possible for implementing flatMap or tailRecM. Take any perfect lazy stream implementation, the most efficient you can think of. This piece of code is eventually going to crash any process, regardless of the implementation of that stream, be it observables, iteratee enumerators, etc.:

F.tailRecM(1) { a =>
  Stream.continually(Left(a+1))
}

That's because, by my definition at least, this isn't tail recursive: we aren't discarding the current frame on each step, and we keep work around in our trampoline or thread-pool or whatever. But it gets worse. Creating streams of just one single element cannot be tail-recursive either, and it will also crash the process:

// This will also crash
F.tailRecM(1) { a => Stream( Left(a+1) ) }

Of course, this is counterintuitive, because the equivalent with a Task does not crash any process:

// This is tail recursive
F.tailRecM(1) { a => Task( Left(a+1) ) }

The reason is that any stream implementation needs a way to signal the end as a final event, be it a final hasNext question for Iterable, a Halt or a Nil returned by pull-based streams like that iteratee Enumerator, or an onComplete event for Observable. This means that you need to keep at least one pointer around at each level of this recursion, and you cannot discard it until the child stream you just created is finished; it functions exactly like a real call stack.
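
To make the point concrete, here is the naive flatMap-based definition of tailRecM for a lazy Stream (an illustration, not what cats or Monix actually ship): each Left opens another nested level whose completion must still be observed before the outer level can finish, so the pending levels behave exactly like call-stack frames.

// Naive tailRecM via flatMap for scala.Stream (illustration only).
def naiveTailRecM[A, B](a: A)(f: A => Stream[Either[A, B]]): Stream[B] =
  f(a).flatMap {
    case Right(b) => Stream(b)            // one result, this branch is done
    case Left(a1) => naiveTailRecM(a1)(f) // one more pending level per Left
  }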

> Without tailRecM (or recursive flatMap) being safe, libraries like iteratee.io can't safely be written since they require monadic recursion.

I understand that, but what's the use of shoving a List, or any kind of stream, into the F[_] of that Iteratee? That Iteratee needs a Task, or something that's tail recursive, not some stream. Would a stream even make sense?

My opinion here is that tailRecM should be an optional operation, not required on the base Monad type, because even though it can be described in terms of flatMap, it is dangerous.

Of course, I might be missing something. It does seem like an interesting way to generate data.

> It sounds like your issue is that Observable, like List, has a multiplicative flatMap, and hence recursion can blow up depending on the "size" of the Observable at each flatMap call.

No, Observable is lazy, so space growth is linear, but it blows up exponentially in time. And it doesn't have a stack-safe flatMap. I could fix it, but I consider it a feature, because a StackOverflowError is easier to investigate than an OOM.

Anyway, if anybody could point me to some docs where good usage of tailRecM is demonstrated for streams, I'd appreciate it. And I really do want to read that paper, promise I will :-)

Cheers,

@johnynek (Contributor) commented:

What we mean by stack-safe is that it consumes a constant amount of JVM stack, not that it consumes O(1) memory.

Note that:

scala> import cats.instances.stream._
import cats.instances.stream._

scala> Monad[Stream].tailRecM(1) { a => Stream( Left(a+1) ) }

does not crash, in fact. It simply does not terminate (I have been running it for quite some time).

This also just spins:

scala> Monad[List].tailRecM(1) { a => List( Left(a+1) ) }

without crashing or OOMing.

This is not a lot different than the function:

@annotation.tailrec
def go[A, B](a: A, fn: A => Either[A, B]): B =
  fn(a) match {
    case Right(b) => b
    case Left(a1) => go(a1, fn)
  }

then calling go(1, { a => Left(a) }). It will not terminate, but it still uses constant stack space and memory.

This is precisely what tailRecM is ideally doing. In the same way that someone can write an unsafe recursive function, you can write a non-terminating call to tailRecM, but I don't think that is justification for making it second class, any more than we aggressively penalize recursion in general.

Here is a case of using tailRecM:

import java.io.InputStream
import scala.util.Try
import cats.Monad
import cats.instances.try_._ // Monad[Try] instance (import path may vary by cats version)

def readAll(in: InputStream): Try[Vector[Array[Byte]]] = {
  val buffer = new Array[Byte](1024)
  // Left: keep reading and accumulate the chunk; Right: end of stream reached.
  def step(acc: Vector[Array[Byte]]): Try[Either[Vector[Array[Byte]], Vector[Array[Byte]]]] =
    Try(in.read(buffer)).map {
      case x if x < 0 => Right(acc)
      case x          => Left(acc :+ buffer.slice(0, x))
    }

  Monad[Try].tailRecM(Vector.empty[Array[Byte]])(step)
}

Without tailRecM such monadic recursion is unsafe for everything in cats except for Free and Eval (and maybe one or two others).

Doing this kind of monadic recursion is also common in the iteratee library. It may not be common, or safe, for all monads to recursively flatMap or call tailRecM, but for most it actually is (again, all of them in cats are safe to use in this way).

Incidentally, using tailRecM with Try in iteratee nets a 29% speed increase over using a trampolined Free[Try, _] and a 100% win over using a rerunnable Future. That kind of win is valuable: travisbrown/iteratee#117 (comment)

@alexandru (Member, Author) commented:

@johnynek I'm still not convinced about its utility for lists, and I'd argue more about tail recursion, but then I was wrong before, so I'll shut up for now.

That test fix is sufficient, thanks for it.

@johnynek (Contributor) commented:

Here is a (contrived) example for List:

import cats.Monad

def iterateM[M[_]: Monad, T](m: M[T], depth: Int)(fn: T => M[T]): M[T] = {
  def step(a: (Int, T)): M[Either[(Int, T), T]] = {
    val (d, t) = a
    if (d <= 0) Monad[M].pure(Right(t))
    else Monad[M].map(fn(t)) { nextt => Left((d - 1, nextt)) }
  }
  Monad[M].flatMap(m) { t => Monad[M].tailRecM((depth, t))(step _) }
}

import cats.instances.list._

scala> iterateM[List, Int](List(1, 2, 3), 0) { x => (0 to x).toList }
res9: List[Int] = List(1, 2, 3)

scala> iterateM[List, Int](List(1, 2, 3), 1) { x => (0 to x).toList }
res10: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3)

scala> iterateM[List, Int](List(1, 2, 3), 2) { x => (0 to x).toList }
res11: List[Int] = List(0, 0, 1, 0, 0, 1, 0, 1, 2, 0, 0, 1, 0, 1, 2, 0, 1, 2, 3)

scala> iterateM[List, Int](List(1, 2, 3), 3) { x => (0 to x).toList }
res12: List[Int] = List(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 0, 1, 2, 0, 0, 1, 0, 1, 2, 0, 1, 2, 3)

You can imagine other examples that are not as silly, like doing a search where we can find zero, one, or more candidates to recursively try; recursion on a list is nice there, and using tailRecM you can do that safely.
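
For instance, a hypothetical sketch along those lines (the names neighbours and isGoal are made up for illustration): from each state we may find zero, one, or several candidate next states, and tailRecM explores them without consuming JVM stack.

import cats.Monad
import cats.instances.list._

// Collect all reachable goal states; assumes the search space is finite
// and acyclic, so the exploration terminates.
def search[S](start: S)(neighbours: S => List[S], isGoal: S => Boolean): List[S] =
  Monad[List].tailRecM[S, S](start) { s =>
    if (isGoal(s)) List(Right(s))
    else neighbours(s).map(Left(_))
  }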

@alexandru (Member, Author) commented:

Thanks @johnynek, warming up to it :-)

I think this is one of those cases where users need some education. Hopefully some examples will make it into the docs soon.

@alexandru (Member, Author) commented:

Cleanup.
