Add short-circuiting left and right fold #224

Open
wants to merge 1 commit into
base: master

paldepind

This PR primarily adds a short-circuiting left and right fold to the Foldable specification. The changes proposed are adapted from my experiments in Jabz.

The problem

Fantasy Land currently specifies only a strict left fold. This has the following disadvantages:

  • Infinite data structures cannot implement the Foldable specification in any meaningful way. This is in contrast to Haskell's Foldable typeclass, which works fine with infinite structures due to laziness. Even JavaScript's Iterator specification can handle infinite structures.
  • Many operations that can be carried out on Foldables run with suboptimal performance with a strict fold. Consider for instance the following function:
function find(predicate, foldable) {
  return foldable.reduce((acc, a) => {
    if (isJust(acc)) {
      return acc;
    } else if (predicate(a) === true) {
      return just(a);
    } else {
      return nothing;
    }
  }, nothing);
}

It finds the first element in a Foldable that satisfies the predicate. But, it will always traverse the entire Foldable. Ideally, we would like the function to stop as soon as a matching element is found.
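
To make the cost concrete, here is a small sketch that counts how often the folding function runs. The Maybe helpers (nothing, just, isJust) and the visited counter are stand-ins introduced for illustration; they do not come from any particular library.

```javascript
// Hypothetical Maybe helpers, assumed for illustration only.
const nothing = { tag: "nothing" };
const just = (value) => ({ tag: "just", value });
const isJust = (m) => m.tag === "just";

let visited = 0; // counts calls to the folding function

function find(predicate, foldable) {
  return foldable.reduce((acc, a) => {
    visited += 1;
    if (isJust(acc)) return acc;
    return predicate(a) === true ? just(a) : nothing;
  }, nothing);
}

const found = find((n) => n === 2, [1, 2, 3, 4, 5]);
console.log(found.value, visited); // 2 5
```

The match is at the second element, yet the folding function still runs for all five.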

The solution

This PR adds two additional methods to the Foldable specification. The first is named shortFoldl and the second shortFoldr. They are added along with two laws and a description of their behavior. The functions are similar to normal folds with the difference that the folding function must return an object of the type:

type Result<A> = {
  done: boolean,
  value: A
}

The value property holds what the folding function would normally return, and the done property indicates whether the fold should stop. Given this, the find function above could be implemented like this:

function find(f, foldable) {
  return foldable.shortFoldl((_, a) => f(a) ? {done: true, value: just(a)} : {done: false, value: nothing}, nothing);
}

This implementation of find short-circuits as soon as a satisfying value is found. Furthermore, it will also work for infinite data-structures.
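
As a rough sketch of what an implementation might look like, the following defines shortFoldl for two hypothetical structures: an array wrapper and an infinite sequence of natural numbers. The class names ArrayFoldable and Naturals and the Maybe helpers are assumptions made for this example, not part of the PR.

```javascript
// Hypothetical Maybe helpers, assumed for illustration only.
const nothing = { tag: "nothing" };
const just = (value) => ({ tag: "just", value });

// Hypothetical Foldable wrapping an array.
class ArrayFoldable {
  constructor(items) {
    this.items = items;
  }
  shortFoldl(f, initial) {
    let acc = initial;
    for (const item of this.items) {
      const result = f(acc, item);
      acc = result.value;
      if (result.done) break; // stop as soon as the folding function says so
    }
    return acc;
  }
}

// Hypothetical infinite Foldable: the natural numbers 0, 1, 2, ...
class Naturals {
  shortFoldl(f, initial) {
    let acc = initial;
    for (let n = 0; ; n += 1) {
      const result = f(acc, n);
      acc = result.value;
      if (result.done) return acc; // terminates only if f eventually signals done
    }
  }
}

function find(f, foldable) {
  return foldable.shortFoldl(
    (_, a) => (f(a) ? { done: true, value: just(a) } : { done: false, value: nothing }),
    nothing
  );
}

console.log(find((n) => n > 2, new ArrayFoldable([1, 2, 3, 4])).value); // 3
console.log(find((n) => n * n > 50, new Naturals()).value); // 8
```

Note that on the infinite structure the fold only returns if the predicate is eventually satisfied.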

shortFoldr is added so that functions like findLast can also be implemented with optimal performance.
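
A possible findLast built on a short-circuiting right fold is sketched below. The standalone shortFoldr helper and the (acc, a) argument order of the folding function are assumptions for this example; the order mirrors Array.prototype.reduceRight and may differ from what the PR specifies.

```javascript
// Hypothetical short-circuiting right fold over an array.
// Assumption: the folding function takes (acc, a), like Array.prototype.reduceRight.
function shortFoldr(f, initial, items) {
  let acc = initial;
  for (let i = items.length - 1; i >= 0; i -= 1) {
    const result = f(acc, items[i]);
    acc = result.value;
    if (result.done) break;
  }
  return acc;
}

let visited = 0; // counts calls to the folding function

function findLast(predicate, items) {
  return shortFoldr(
    (acc, a) => {
      visited += 1;
      return predicate(a) ? { done: true, value: a } : { done: false, value: acc };
    },
    undefined,
    items
  );
}

const lastEven = findLast((n) => n % 2 === 0, [1, 2, 3, 4, 5]);
console.log(lastEven, visited); // 4 2
```

Only the last two elements are visited, which is the saving a left-only fold cannot provide.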

Other changes

This PR also adds a strict right fold to the Foldable specification. The new second law ensures that this right fold behaves symmetrically to the left fold. The law is a bit tricky, I think, but I've convinced myself that it is correct with the following concrete example:

const id = a => a;
const f = (m, n) => m - n;
const u = [1, 2, 3, 4, 5];
const acc = 4;
console.log(u.reduceRight(f, acc));
console.log(u.reduce((a, b) => (c) => a(f(c, b)), id)(acc));
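
Both lines log -11. The same construction can be wrapped in a helper to try other inputs; foldrViaFoldl is a name made up for this sketch, and plain arrays stand in for an arbitrary Foldable.

```javascript
// Right fold expressed through a left fold (Array.prototype.reduce).
// Each step wraps the continuation built so far, so the last element is applied first.
function foldrViaFoldl(f, acc, u) {
  const identity = (a) => a;
  return u.reduce((k, b) => (c) => k(f(c, b)), identity)(acc);
}

const subtract = (m, n) => m - n;
const numbers = [1, 2, 3, 4, 5];
console.log(numbers.reduceRight(subtract, 4)); // -11
console.log(foldrViaFoldl(subtract, 4, numbers)); // -11

const append = (s, x) => s + x;
console.log(foldrViaFoldl(append, "", ["a", "b", "c"])); // cba
```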

About the names

Since this is a breaking change anyway I have decided to rename reduce to foldl. If renaming reduce is controversial I suggest this alternative naming:

| Current name in PR | Alternative proposal |
| --- | --- |
| foldl | reduce (i.e. no renaming) |
| foldr | reduceRight |
| shortFoldl | shortReduce |
| shortFoldr | shortReduceRight |

But I prefer the current naming for the following reasons:

  • It better matches the name of the abstraction (Foldable)
  • foldl is explicit about the direction of the fold while reduce is not.
  • foldl and foldr have a nice symmetry that reflects the actual symmetry of the functions.

Why not use thunks

In Jabz I also experimented with versions of foldr and foldl made lazy by using thunks. However, forcing a thunk must take the form of a function invocation, and this resulted in the lazy foldr and foldl not being stack safe (i.e. they would overflow the stack for large data structures).
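
For readers unfamiliar with the approach, a thunk-based lazy right fold might look roughly like the sketch below. This is not the Jabz code; lazyFoldr and its signature are invented here to illustrate the idea and its weakness.

```javascript
// Hypothetical lazy right fold over an array.
// The folding function receives the current element and a thunk for the rest of the fold.
function lazyFoldr(f, initial, items, index = 0) {
  if (index >= items.length) return initial;
  return f(items[index], () => lazyFoldr(f, initial, items, index + 1));
}

let visited = 0; // counts calls to the folding function

// find only forces the thunk when the current element does not match.
function find(predicate, items) {
  return lazyFoldr(
    (a, rest) => {
      visited += 1;
      return predicate(a) ? a : rest();
    },
    undefined,
    items
  );
}

console.log(find((n) => n > 2, [1, 2, 3, 4, 5]), visited); // 3 3
```

The fold short-circuits, but every forced thunk is a nested call, so a long run of non-matching elements grows the call stack with the size of the structure.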

Final remarks

I'd love to hear what you all think of these changes.

@safareli
Member

safareli commented Feb 8, 2017

... not being stack safe (i.e. they would overflow the stack for large data-structures).

This article about making function composition stack safe could be related (the stack issue might be "fixed"). What would a variation using thunks look like?

#### `shortFoldl`

```hs
shortFoldl :: Foldable f => f a ~> ((b, a) -> {done :: Boolean, value :: b}) ~> b -> b
```

Member

Did you consider using []/[x] rather than {done: false}/{done: true, value: x}? I find it a more elegant way to fake the Maybe a type.

Author

Yes, I did indeed, and I can see the appeal. But I chose an object for two reasons:

  1. Arrays are more expensive to create, so using them would lower performance.
  2. In Jabz I use an Either to signal when to stop. That is even nicer than a tuple IMO. Fortunately, with the current specification an Either implementation can include a done property which is true for left and false for right. The specification is then directly compatible with that Either implementation, and find could be implemented like this:
export function find<A>(f: (a: A) => boolean, t: Foldable<A>): Maybe<A> {
  return t.shortFoldl((acc, a) => f(a) ? left(just(a)) : right(acc), nothing);
}
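
A minimal sketch of such an Either, under the assumption that left and right are plain constructors that set the done flag, could look like this. The constructors, the Maybe helpers, and the array-based shortFoldl are all hypothetical stand-ins rather than the Jabz implementation.

```javascript
// Hypothetical Either constructors that double as fold results:
// left means "stop", right means "keep going".
const left = (value) => ({ done: true, value });
const right = (value) => ({ done: false, value });

// Hypothetical Maybe helpers.
const nothing = { tag: "nothing" };
const just = (value) => ({ tag: "just", value });

// Array-based stand-in for a Foldable's shortFoldl.
function shortFoldl(f, initial, items) {
  let acc = initial;
  for (const item of items) {
    const result = f(acc, item);
    acc = result.value;
    if (result.done) break;
  }
  return acc;
}

function find(f, items) {
  return shortFoldl((acc, a) => (f(a) ? left(just(a)) : right(acc)), nothing, items);
}

console.log(find((s) => s.length > 1, ["a", "bb", "ccc"]).value); // bb
```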

@rjmk
Contributor

rjmk commented Feb 8, 2017

I think this PR is awesome.

On the details of how to represent the alternative to the thunk, I'd much prefer an explicit function representation. I love @safareli's implementation & article

Disclaimer: I've not looked at the diff yet, only read the PR description

@paldepind
Author

@safareli

Thank you for the link. I'll take a look at it 😄 The thunk-based versions that I experimented with can be found here. There are two variants: one uses a plain function as a thunk (lazyFoldrLambda) and the other a thunk class. Neither is stack safe in its current implementation.

In either case I think a version using thunks would perform worse because a new function would have to be created on each invocation of the folding function. But I'm not sure. When I tried to benchmark it I blew the stack 😅

@joneshf
Member

joneshf commented Feb 9, 2017

How does this differ from ChainRec?

@@ -338,27 +338,108 @@ the [Applicative](#applicative) and [Plus](#plus) specifications.

### Foldable

-1. `u.reduce` is equivalent to `u.reduce((acc, x) => acc.concat([x]), []).reduce`
+1. `u` is equivalent to `u.foldl((acc, x) => acc.concat([x]), [])`
Member

How does this work?

Author

You're right. This doesn't work. I'll have to think about how to fix it.

Member

Could it be: u.foldl is equivalent to u.foldl((acc, x) => acc.concat([x]), []).foldl?

Author

u.foldl((acc, x) => acc.concat([x]), []) evaluates to an array so the last foldl should be reduce.

This should be equivalent to the old law:
u.foldl is equivalent to u.foldl((acc, x) => acc.concat([x]), []).reduce.
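
One way to sanity-check this wording of the law is to run it against a small non-array Foldable. The cons-list below (nil, cons) is a hypothetical structure written only for this check.

```javascript
// Hypothetical cons-list with a strict left fold.
const nil = { foldl: (f, acc) => acc };
const cons = (head, tail) => ({
  foldl: (f, acc) => tail.foldl(f, f(acc, head)),
});

const u = cons(1, cons(2, cons(3, nil)));

// Left-hand side of the law: fold the structure directly.
const f = (acc, x) => acc * 10 + x;
const direct = u.foldl(f, 0);

// Right-hand side: first fold into an array, then use the array's own reduce.
const asArray = u.foldl((acc, x) => acc.concat([x]), []);
const viaArray = asArray.reduce(f, 0);

console.log(direct, viaArray); // 123 123
```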

@rjmk
Contributor

rjmk commented Feb 9, 2017

How does this differ from ChainRec?

Not having to remain within the HKT seems the most obvious difference to me. That is, the ability to fold to Integer instead of f Integer.

@paldepind
Author

@joneshf

How does this differ from ChainRec?

I don't fully understand ChainRec. But I think this is very different.

My understanding is that ChainRec allows tail-recursive monads to be stack safe. This PR adds an alternative to a lazy foldr and a lazy foldl. Any structure that can implement foldr can implement shortFoldr and shortFoldl. ChainRec, on the other hand, can only be implemented by structures that have a chain method, i.e. monads, more or less.

Examples of things that are foldable but cannot implement chain include data structures that need to order their elements, for instance a heap or a set implemented as a binary search tree. These aren't even functors, so they can't implement chain.

@safareli
Member

safareli commented Feb 13, 2017

In JS we have a well-defined iteration protocol, which could be used to write the efficient find you have mentioned (any structure which implements the iteration protocol could be an argument of find and it would work perfectly).
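
For comparison, an iterator-based find could be sketched as follows. The naturals generator is a hypothetical infinite source added for the example; for...of, generators, and Set are standard JavaScript.

```javascript
// find over anything that implements the iteration protocol.
// Returning from inside for...of stops the iteration early.
function find(predicate, iterable) {
  for (const a of iterable) {
    if (predicate(a)) return a;
  }
  return undefined;
}

// Hypothetical infinite iterable: 0, 1, 2, ...
function* naturals() {
  let n = 0;
  while (true) {
    yield n;
    n += 1;
  }
}

console.log(find((n) => n > 3, naturals())); // 4
console.log(find((s) => s.startsWith("b"), new Set(["apple", "banana"]))); // banana
```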

My question is: the "gap" of iterating over some structure is already "filled" in JS, so why should FL redefine it (and also merge it into Foldable)?

The main difference I see between shortFold{l,r} and the iteration protocol is that short* has a sense of direction, whereas iteration only proceeds in one direction. If you think that's useful, I think proposing support for iteration direction in the iteration protocol would be best (takeLast would need that, for example).

If we were talking about adding short-circuiting support to Foldable structures in some strict and statically typed environment like PureScript, then I think something like this should be done instead of "upgrading" Foldable.

Update:
In C++ you can iterate over a vector in both directions (link)

@joneshf
Member

joneshf commented Feb 13, 2017

I'm sorry, I phrased my question really poorly. What I meant to ask is: why don't we make this a separate algebra rather than changing Foldable? Having to implement four things rather than one is not ideal.

@paldepind
Author

paldepind commented Feb 17, 2017

@safareli By that logic, we should just remove Foldable entirely? Iterators can also be used to implement reduce.

As I mentioned in the original description of the PR this is about bridging the gap and making our Foldable as powerful as the Iterator protocol.

@joneshf

What I meant to ask is: why don't we make this a separate algebra rather than changing Foldable?

Since shortFoldl and shortFoldr can be derived from the strict folds, I think they belong in the same algebra. All structures that can implement foldr can also implement shortFoldr. This, for instance, is similar to how any monad can implement both join and >>=. Having shortFoldr and foldr as two separate algebras would be as wrong as having join and >>= as two separate algebras. The methods are two sides of the same coin.
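
The derivation could go roughly as in the sketch below, where derivedShortFoldl is a hypothetical helper and the sample object stands in for any structure with a strict foldl. The derived version gives the same answer as a native shortFoldl but still visits every element, so it is a fallback rather than an optimization.

```javascript
// Derive a short-circuiting left fold from a strict one.
// Once a step reports done, later steps pass the state through unchanged.
function derivedShortFoldl(foldable, f, initial) {
  return foldable.foldl(
    (state, a) => (state.done ? state : f(state.value, a)),
    { done: false, value: initial }
  ).value;
}

// Hypothetical structure that only provides a strict foldl.
const sample = {
  foldl: (f, initial) => [1, 2, 3, 4].reduce((acc, a) => f(acc, a), initial),
};

const firstOverTwo = derivedShortFoldl(
  sample,
  (acc, a) => (a > 2 ? { done: true, value: a } : { done: false, value: acc }),
  null
);
console.log(firstOverTwo); // 3
```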

Having to implement four things rather than one is not ideal.

IMO we should ideally have many more. Haskell's Foldable typeclass has 16 methods. As it currently stands, Fantasy Land's Foldable specification has very little utility. All functions that can be derived from it will perform as poorly as they would on a cons-list. For most data structures that is far below optimal, and thus the derived functions are prohibitively expensive. I explain this in greater detail in my blog post here.

@joneshf
Member

joneshf commented Feb 18, 2017

I must not be understanding your proposal. I still don't see how this differs from Chain and ChainRec. Every Chain should be able to implement ChainRec, but it doesn't always make sense to do so.

Every Foldable should be able to implement shortFoldl and shortFoldr, but it doesn't always make sense to do so. Why should we force everyone to implement three new functions if they want to stay compliant with Foldable just to appease the data types where it makes sense to have short circuiting?

As I understand the proposal, it would mean data types like Identity a, Tuple a b, Maybe a, Either a b, etc, would have to implement three additional functions that give no advantage over the one they were implementing before. The distinction between left and right also is not an advantage for these data types. It's not limited to these common data types either. Anything with a small finite number of polymorphic values gets very little benefit from this change for quite a bit of additional work to stay compliant. The fewer values that are polymorphic, the lower the benefit.

The proposal sounds great for recursive data types (whether infinite or not). I definitely think we should add this stuff. But I don't think it's worth the burden to every data type.

My hesitation comes from having defined the interface for Foldable in PureScript. It has three functions you have to implement, and we've been saddled with that decision for years. It makes sense for some data types, but it's not a worthwhile burden to impose on all data types.

@paldepind
Author

I must not be understanding your proposal.

I think you do 😄 But I think we're looking at this a bit differently. I will try to do a better job at explaining my point of view.

Every Chain should be able to implement ChainRec, but it doesn't always make sense to do so.

Then I think that Chain and ChainRec should be merged.

Every Foldable should be able to implement shortFoldl and shortFoldr, but it doesn't always make sense to do so. Why should we force everyone to implement three new functions if they want to stay compliant with Foldable just to appease the data types where it makes sense to have short circuiting?

Because if we have both a Foldable and a ShortFoldable algebra, there will be an arbitrary distinction between different implementations of what is conceptually the same abstraction. A function like find is conceptually defined for all foldables, including Tuple a b, Maybe a, etc. But if shortFoldl is in a separate algebra, then it will only work with the subset of foldables that also happen to implement ShortFoldable.

In my opinion, there is a very big difference between "find works for all foldables" and "find works for some foldables". The first is a successful abstraction where the second is leaking technical details through to the end user. The first is a simple API where the second is unnecessarily complex.

Anything with a small finite number of polymorphic values gets very little benefit from this change for quite a bit of additional work to stay compliant.

Since shortFoldl and shortFoldr can be derived, implementing them doesn't have to be "quite a bit of additional work". Jabz includes a deriving mechanism inspired by Haskell. It ensures that implementations only have to provide a minimal complete definition of the abstraction they're implementing. So even though Jabz requires foldables to have 8 methods, implementing foldable is as simple as this:

@foldable
class MyFoldable {
  constructor(...) { ... }
  foldr(initial, f) { ... }
}

This gets us the best of both worlds:

  • There is just one Foldable abstraction. Everything that is conceptually a foldable can be used in all places where it makes sense.
  • It is very easy for implementations to implement Foldable and they are given the opportunity to override derived methods to the extent that it makes sense for them to do so.
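
To illustrate how such deriving might work without decorator syntax, here is a sketch of a mixin-style helper. It is not the Jabz mechanism: the foldable function, the derived table, the Pair class, and the (f, initial) argument order with an (acc, a) folding function are all assumptions made for this example.

```javascript
// Default implementations, each expressed through the minimal definition (foldr).
const derived = {
  // Left fold via right fold: build a chain of continuations, then run it.
  foldl(f, initial) {
    return this.foldr((k, a) => (acc) => k(f(acc, a)), (acc) => acc)(initial);
  },
  // Short-circuiting fold via strict fold: ignore elements after done is set.
  shortFoldl(f, initial) {
    return this.foldl(
      (state, a) => (state.done ? state : f(state.value, a)),
      { done: false, value: initial }
    ).value;
  },
};

// Hypothetical deriving helper: fills in only the methods a class does not define.
function foldable(Class) {
  for (const name of Object.keys(derived)) {
    if (!Object.prototype.hasOwnProperty.call(Class.prototype, name)) {
      Class.prototype[name] = derived[name];
    }
  }
  return Class;
}

// Hypothetical two-element structure providing only foldr.
class Pair {
  constructor(first, second) {
    this.first = first;
    this.second = second;
  }
  foldr(f, initial) {
    return f(f(initial, this.second), this.first);
  }
}
foldable(Pair);

const pair = new Pair(1, 2);
console.log(pair.foldl((acc, a) => acc.concat([a]), [])); // [ 1, 2 ]
```

A structure with a cheaper native shortFoldl would simply define it on the class, and the helper would leave it alone.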

Again, a foldable that only includes foldr or foldl is of very limited utility. In practice, it means that there is very little one can do with a foldable. Haskell's Foldable and its 16 methods ensure that everything one can conceptually do with a foldable is also viable in practice. Again, I invite you to read my blog post about Jabz where I make my case in greater detail.

My hesitation comes from having defined the interface for Foldable in PureScript. It has three functions you have to implement, and we've been saddled with that decision for years. It makes sense for some data types, but it's not a worthwhile burden to impose on all data types.

I think that problem is confined to PureScript. Does it not support default method implementations like Haskell? Default method implementations ensure that type classes can include additional methods with no downsides. The deriving mechanism in Jabz achieves the same thing and it can easily be adapted to Fantasy Land.

@gabejohnson
Member

gabejohnson commented Mar 27, 2017

@paldepind @joneshf why not change Foldable to be defined in terms of foldl and foldr both of which return a Result? Then there are only two methods to implement and reduceLeft and reduceRight can be derived from them and not be part of the spec.
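
Under that suggestion, a strict fold falls out of the Result-returning one by never signalling done. The sketch below assumes a hypothetical ArrayFoldable whose only primitive is shortFoldl; the names are made up for illustration.

```javascript
// Hypothetical Foldable whose only primitive is the Result-returning fold.
class ArrayFoldable {
  constructor(items) {
    this.items = items;
  }
  shortFoldl(f, initial) {
    let acc = initial;
    for (const item of this.items) {
      const result = f(acc, item);
      acc = result.value;
      if (result.done) break;
    }
    return acc;
  }
  // Derived strict fold: wrap every step in a Result that never stops.
  reduce(f, initial) {
    return this.shortFoldl((acc, a) => ({ done: false, value: f(acc, a) }), initial);
  }
}

const total = new ArrayFoldable([1, 2, 3, 4]).reduce((acc, a) => acc + a, 0);
console.log(total); // 10
```

The cost of this direction is one wrapper object per step, which is the kind of overhead a native strict fold avoids.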

@paldepind paldepind mentioned this pull request Jun 3, 2017
@joneshf
Member

joneshf commented Jun 3, 2017

I'm sorry. I don't want to be a blocker here. I just have remorse over what happened with Foldable in PS. But as you stated, this isn't PS.

If others think it's a good addition, let's add it!

@safareli
Member

safareli commented Jun 4, 2017

remorse over what happened with Foldable in PS

what?

@joneshf
Member

joneshf commented Jun 4, 2017

My hesitation comes from having defined the interface for Foldable in PureScript. It has three functions you have to implement, and we've been saddled with that decision for years. It makes sense for some data types, but it's not a worthwhile burden to impose on all data types.

@paldepind
Author

@joneshf

I don't think that's the real issue. The real issue is that PureScript does not support default method implementations. If it had that feature the problem would be solved.

Foldable has a lot of useful derived methods. But if implementations are not given the opportunity to implement these themselves in an efficient way they are pretty useless in practice.

Haskell's Foldable typeclass has 16 methods, and that is no problem since Haskell supports default method implementations. In JavaScript we can get the same thing with mixins.

@joneshf
Member

joneshf commented Jun 4, 2017

In any case, like you said, this isn't PureScript. So, we don't have the same problems as they do.

@safareli
Member

@safareli By that logic, we should just remove Foldable entirely? Iterators can also be used to implement reduce.

That's a good idea, but it is a superclass of Traversable so we need to have it.

As I mentioned in the original description of the PR this is about bridging the gap and making our Foldable as powerful as the Iterator protocol.

Why do we need to make Foldable more powerful when the power we want already exists in Iterators? You can just reuse a bunch of iterator-based functions instead of redefining them.

"This isn't PureScript": this is JavaScript, Iterators are pretty common, and I don't think we should reinvent what is already present in JavaScript.

6 participants