Add combineAllOption to Foldable #2380

Merged
merged 11 commits into from
Nov 14, 2019
6 changes: 6 additions & 0 deletions core/src/main/scala/cats/syntax/foldable.scala
@@ -34,6 +34,12 @@ final class FoldableOps[F[_], A](val fa: F[A]) extends AnyVal {
  def foldr[B](b: Eval[B])(f: (A, Eval[B]) => Eval[B])(implicit F: Foldable[F]): Eval[B] =
    F.foldRight(fa, b)(f)

  /**
   * given a Monoid evidence for `A`, it returns None if the foldable is empty or combines all the `A`s if it's not
   */
  def combineAllOption(implicit ev: Monoid[A], F: Foldable[F]): Option[A] =
    if (F.isEmpty(fa)) None else Some(F.combineAll(fa))
Contributor

This should probably be using reduceLeftOption and Semigroup.combine

Contributor

If we can't use Monoid.combineAll or Semigroup.combineAllOption, we lose the optimization. It exists for cases where we can use internal mutability to aggregate many values quickly, which is pretty critical in some applications (e.g. Spark/Scalding).

If we just call out to combine we are destroying optimizations.

Contributor

Yeah, it's tricky here. We don't want to impose Monoid, and I wasn't aware there is an optimization for Semigroup.combineAllOption (actually I still can't find it). But even if there is, we are in a syntax extension, so unlike fold, we can't optimize combineAllOption in Foldable instances.

So to retain the optimization we probably need to keep the Monoid requirement, which is rather unfortunate; we'd better document the rationale. Another option is to wait until Cats 2.0, with which we'll be able to add a combineAllOption to Foldable.

In the meantime, @johnynek, would you mind pointing me to where the Semigroup.combineAllOption optimization is? We should probably document it somewhere.

Contributor

so, here is the method:

https://github.com/typelevel/cats/blob/master/kernel/src/main/scala/cats/kernel/Semigroup.scala#L39

note any semigroup can override that (as we do in other typeclasses for things like map2, product, etc.. in Applicative).

In algebird, which uses cats.kernel, we do this very frequently.

e.g.:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L582

https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala#L141

there are more. The point is the person implementing the Semigroup knows if there is a more efficient way to combine many of them, and should control that. Other typeclasses ideally should delegate to that to not undo the optimization.
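As a rough illustration of the override described above, here is a minimal sketch. `MiniSemigroup` is a hypothetical, simplified stand-in for `cats.kernel.Semigroup` (it takes an `Iterator` rather than the collection type the real trait uses), and the string instance is only an assumed example of an implementer who knows a faster bulk path.

```scala
// Hypothetical stand-in for cats.kernel.Semigroup, reduced to two methods.
trait MiniSemigroup[A] {
  def combine(x: A, y: A): A

  // Default bulk path: pairwise reduction, one `combine` call per element.
  def combineAllOption(as: Iterator[A]): Option[A] =
    as.reduceOption(combine)
}

object MiniSemigroup {
  // String concatenation with an overridden bulk path.
  val string: MiniSemigroup[String] = new MiniSemigroup[String] {
    def combine(x: String, y: String): String = x + y

    // A single StringBuilder is mutated for the whole input, so no
    // intermediate String is allocated for each pairwise step.
    override def combineAllOption(as: Iterator[String]): Option[String] =
      if (as.isEmpty) None
      else {
        val sb = new StringBuilder
        as.foreach(s => sb.append(s))
        Some(sb.toString)
      }
  }
}
```

A caller that reduces with `combine` directly never reaches the overridden method, which is the sense in which delegating typeclasses can undo the optimization.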

Contributor Author

@barambani barambani Aug 13, 2018

I didn't know about this optimization and the practice of overriding Semigroup's combineAllOption (and others) to address specific domain's performance gains. Thanks. What I can think about to exploit it is something like

def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
   if (F.isEmpty(fa)) None else ev.combineAllOption(F.toList(fa))

but it still adds a traversal of the foldable for the conversion to TraversableOnce. I'm not sure that's acceptable.
Anyway, I'm also OK with abandoning this and waiting for Cats 2.0, as @kailuowang mentioned.

Contributor

It's tough in this case since here you are trading off the potential added cost of materializing a list with the potential benefit of reducing an entire group together.

Many Foldables could actually create an Iterator to pass into combineAllOption on Semigroup, but since we don't have a method like that we can only make the List.

We could imagine:

def onIterator[B](fa: F[A])(itFn: Iterator[A] => B): B =
  itFn(toList(fa).iterator)

on Foldable in cats 2.0. In this world, a Foldable that can make an Iterator directly could override it and hand that Iterator to the caller to consume. Alternatively, we could add Fold[A, B] to cats:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Fold.scala#L40

so we could have def foldWith[A, B](fa: F[A])(fold: Fold[A, B]): B on Foldable. Since folds can potentially have internal mutable state, they can do much the same optimization as combineAllOption, although not in parallel (since you can't split over trees).
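To make the Fold idea concrete, here is a minimal sketch under stated assumptions. This `Fold` is a hypothetical, simplified shape and not algebird's actual definition; `foldWith` is specialised to `List` as a stand-in for a method on Foldable, and `Acc` and `concatOption` are invented names for illustration.

```scala
// Hypothetical, simplified Fold; algebird's real Fold is shaped differently.
trait Fold[A, B] {
  type State
  def init(): State               // fresh state per run keeps mutation local
  def step(s: State, a: A): State
  def finish(s: State): B
}

object FoldSketch {
  // Stand-in for a `foldWith` on Foldable, specialised to List here.
  def foldWith[A, B](fa: List[A])(fold: Fold[A, B]): B = {
    val end = fa.foldLeft(fold.init())((s, a) => fold.step(s, a))
    fold.finish(end)
  }

  // Mutable accumulator: a shared buffer plus a flag for "saw any element".
  final class Acc {
    val sb = new StringBuilder
    var seen = false
  }

  // A combineAllOption-like fold over strings that mutates one buffer.
  val concatOption: Fold[String, Option[String]] =
    new Fold[String, Option[String]] {
      type State = Acc
      def init(): Acc = new Acc
      def step(s: Acc, a: String): Acc = { s.sb.append(a); s.seen = true; s }
      def finish(s: Acc): Option[String] =
        if (s.seen) Some(s.sb.toString) else None
    }
}
```

The state threads through sequentially, which matches the remark that this style does not parallelise the way a splittable Semigroup reduction can.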

Contributor Author

I can't say whether we could add Fold, so I would leave that decision to others. About the other option, how would I use onIterator if it were available? Still with the Semigroup's combineAllOption, or would it be an alternative to relying on overriding that in Semigroup? Because if onIterator wouldn't need to be overridden, could it be another syntax method? Thanks for the explanation.

Contributor

def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
   if (F.isEmpty(fa)) None else F.onIterator(fa)(ev.combineAllOption(_))
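The following sketch shows how the pieces proposed in this thread might fit together. `MiniFoldable` and `MiniSemigroup` are hypothetical stand-ins for the cats typeclasses, `onIterator` is the proposed method (it does not exist in cats as far as this thread states), and the Vector and Int instances are assumed examples.

```scala
object OnIteratorSketch {
  // Hypothetical stand-in for cats.kernel.Semigroup.
  trait MiniSemigroup[A] {
    def combine(x: A, y: A): A
    def combineAllOption(as: Iterator[A]): Option[A] = as.reduceOption(combine)
  }

  // Hypothetical stand-in for cats.Foldable with the proposed onIterator.
  trait MiniFoldable[F[_]] {
    def toList[A](fa: F[A]): List[A]

    // Default from the thread: materialise a List, then iterate it.
    def onIterator[A, B](fa: F[A])(itFn: Iterator[A] => B): B =
      itFn(toList(fa).iterator)
  }

  val vectorFoldable: MiniFoldable[Vector] = new MiniFoldable[Vector] {
    def toList[A](fa: Vector[A]): List[A] = fa.toList

    // Vector can hand out an Iterator directly, skipping the List.
    override def onIterator[A, B](fa: Vector[A])(itFn: Iterator[A] => B): B =
      itFn(fa.iterator)
  }

  val intAddition: MiniSemigroup[Int] = new MiniSemigroup[Int] {
    def combine(x: Int, y: Int): Int = x + y
  }

  // Delegates to the Semigroup's bulk method so any override is respected.
  def combineAllOption[F[_], A](fa: F[A])(
      implicit F: MiniFoldable[F], ev: MiniSemigroup[A]): Option[A] =
    F.onIterator(fa)(ev.combineAllOption(_))
}
```

In this sketch the explicit emptiness check is dropped because the default `combineAllOption` already returns `None` for an empty iterator; an overriding Semigroup would need to preserve that behaviour.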

Contributor Author

@barambani barambani Aug 15, 2018

I see. Right. To mitigate the possible cost of building a list, its construction could be delayed until the iterator actually traverses it. Something like

def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
  if (F.isEmpty(fa)) None else ev.combineAllOption(toIterator)

def toIterator(implicit F: Foldable[F]): Iterator[A] =
  F.foldRight[A, Stream[A]](fa, Eval.later(Stream.empty)) {
    (a, eb) => eb map (Stream.cons(a, _))
  }.value.iterator

I'm not sure it would actually be a step forward, and it's getting late. I will keep looking.


/**
* test if `F[A]` contains an `A`, named contains_ to avoid conflict with existing contains which uses universal equality
*
7 changes: 7 additions & 0 deletions tests/src/test/scala/cats/tests/FoldableSuite.scala
@@ -129,6 +129,13 @@ abstract class FoldableSuite[F[_]: Foldable](name: String)(
    }
  }

  test(s"Foldable[$name].combineAllOption") {
    forAll { (fa: F[Int]) =>
      val list = fa.toList
      fa.combineAllOption should === (list.combineAllOption)
    }
  }

  test(s"Foldable[$name].intercalate") {
    forAll { (fa: F[String], a: String) =>
      fa.intercalate(a) should === (fa.toList.mkString(a))