Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index related helpers to Traverse #1761

Merged

Conversation

andyscott
Copy link
Contributor

@andyscott andyscott commented Jul 7, 2017

This adds traverseWithIndex, mapWithIndex and zipWithIndex to Traverse.

  • traverseWithIndexM is akin to traverse, but includes the index of the value in the structure F. This requires the effect to be monadic.
  • mapWithIndex is akin to map, but includes the index of the value in the structure F.
  • zipWithIndex returns F, with values tupled with their indices, just like Scala collection's zipWithIndex.

Instances for List, Vector, and Stream have overridden mapWithIndex and zipWithIndex that delegate to related underlying collections methods. mapWithIndex on List and Vector uses .iterator to avoid building an intermediate collection when first zipping with indices.

*/
def stateTraverse[S, A, B](fa: F[A])(f: A => State[S, B]): State[S, F[B]] =
State[S, F[B]](s =>
traverse[State[S, ?] , A, B](fa)(f).run(s).value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the additional State wrapping of run(s).value, right?

This is equal to traverse[State[S, ?] , A, B](fa)(f) and with SI 2712 fixed, we don't even need to specify the types, so it is just traverse(fa)(f).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow up on @peterneyens comment, since this is just equivalent to traverse, is there any reason to have a separate stateTraverse?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm confused why this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't!

PSA: Don't code while deliriously tired.

@codecov-io
Copy link

codecov-io commented Jul 7, 2017

Codecov Report

Merging #1761 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1761      +/-   ##
==========================================
+ Coverage   94.17%   94.21%   +0.03%     
==========================================
  Files         256      256              
  Lines        4208     4218      +10     
  Branches       93       97       +4     
==========================================
+ Hits         3963     3974      +11     
+ Misses        245      244       -1
Impacted Files Coverage Δ
core/src/main/scala/cats/instances/list.scala 100% <100%> (ø) ⬆️
core/src/main/scala/cats/instances/stream.scala 94.82% <100%> (+0.18%) ⬆️
core/src/main/scala/cats/Traverse.scala 100% <100%> (ø) ⬆️
core/src/main/scala/cats/instances/vector.scala 97.87% <100%> (+0.09%) ⬆️
core/src/main/scala/cats/Foldable.scala 100% <0%> (+1.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb84fd4...9789c2d. Read the comment docs.

@andyscott andyscott changed the title Add stateTraverse and indexed to Traverse Add indexed to Traverse Jul 7, 2017
@andyscott andyscott force-pushed the feature/stateTraverse-and-indexed branch from e3aa1ff to 98b6808 Compare July 7, 2017 16:45
*/
def stateTraverse[S, A, B](fa: F[A])(f: A => State[S, B]): State[S, F[B]] =
State[S, F[B]](s =>
traverse[State[S, ?] , A, B](fa)(f).run(s).value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm confused why this is needed.


test("indexed consistent with zipWithIndex") {
forAll { (fa: List[String]) =>
Traverse[List].indexed(fa) should === (fa.zipWithIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try this for a very long list (say 500k elements)? I'm concerned State with traverse can fail for us. I think I have seen issues like that before. I don't want to leave a landmine for someone.

I think we may be able to work around by using tailRecM on State though, but let's see if we need to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a test for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Went ahead and created a dedicated file for testing Traverse instead of piggybacking on the tests in ListTests.scala.

@andyscott andyscott force-pushed the feature/stateTraverse-and-indexed branch 2 times, most recently from edaaf4b to eef6979 Compare July 7, 2017 17:31
@andyscott andyscott force-pushed the feature/stateTraverse-and-indexed branch from eef6979 to 50ba29c Compare July 7, 2017 17:34
@johnynek
Copy link
Contributor

johnynek commented Jul 7, 2017

👍

@kailuowang
Copy link
Contributor

seems problematic on scalaJS

Security context: 0x1784390b4629
1: flatMap__F1__Lcats_Eval [/home/travis/build/typelevel/cats/tests/.js/target/scala-2.10/cats-tests-test-fastopt.js:~49091] [pc=0x323fb4f2641] (this=0x3b070c22ee61 <a $c_Lcats_Eval$$anon$5 with map 0x14937c2e6199>,f=0xe6fa70d8d09 <a $c_sjsr_AnonFunction1 with map 0x14937c2a97e1>)
2: map__F1__Lcats_Eval [/home/travis/build/typelevel/cats/tests/.js/target/scala-2.10/cats-tests-test-fasto...
[error] FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

@andyscott
Copy link
Contributor Author

andyscott commented Jul 7, 2017

I did some quick testing. On the JVM, everything is fine. However, on JS the performance of .indexed is nonlinear. It's only apparent once the collections get really big.

For testing, I used collections from 50,000 to 500,000 entries by increments of 50,000. The tests run reasonably quick for < 100,000 items in the collection. After 300,000 items, the tests get extremely slow or fail.

Currently I haven't looked for an underlying problem. For the time being, I have adjusted the tests so that JS is uses a smaller collection. It passed locally-- hopefully it does the same on the build server.

@kailuowang
Copy link
Contributor

kailuowang commented Jul 7, 2017

Do we know that 100,000 isn't enough to test against stack safety for JVM? Now the 3 tests take more than 30 seconds, not sure if that's necessary

[info] - Traverse[List].indexed stack safety (12 seconds, 943 milliseconds)
[info] - Traverse[Stream].indexed stack safety (15 seconds, 11 milliseconds)
[info] - Traverse[Vector].indexed stack safety (12 seconds, 585 milliseconds)

@andyscott andyscott force-pushed the feature/stateTraverse-and-indexed branch from b0122c9 to 0eee566 Compare July 7, 2017 23:06
@edmundnoble
Copy link
Contributor

@andyscott @kailuowang Afaik you actually need at most 70000 (see my latest merged PR).

@andyscott
Copy link
Contributor Author

@edmundnoble Aah! Right now it's been switched to 100000. I'll switch it to 70000 :).

ase enter the commit message for your changes. Lines starting
@andyscott andyscott force-pushed the feature/stateTraverse-and-indexed branch from 0eee566 to 4413058 Compare July 7, 2017 23:26
@edmundnoble
Copy link
Contributor

Can you also give me something like indexed which takes a function (A, Int) => B and produces an F[B]?

@andyscott
Copy link
Contributor Author

andyscott commented Jul 7, 2017

something like...

// untested
def mapWithIndex[A, B](fa: F[A])(f: (A, Int) => B): F[B] = 
  traverse(fa)(a => State((s: Int) => (s + 1, f(a, s)))).runA(0).value

@edmundnoble
Copy link
Contributor

Yeah sure.

@andyscott
Copy link
Contributor Author

andyscott commented Jul 7, 2017

Done. I also updated indexed to be implemented in terms of mapWithIndex.

@andyscott andyscott changed the title Add indexed to Traverse Add mapWithIndex and indexed to Traverse Jul 7, 2017
@andyscott andyscott force-pushed the feature/stateTraverse-and-indexed branch from 8c6e3e6 to 83a0883 Compare July 7, 2017 23:54
@andyscott
Copy link
Contributor Author

andyscott commented Jul 10, 2017

Any objections to also adding traverseWithIndex?

def traverseWithIndex[G[_]: Monad, A, B](fa: F[A])(f: (A, Int) => G[B]): G[F[B]] =
  traverse(fa)(a => StateT((s: Int) => f(a, s).map(b => (s + 1, b)))).runA(0)

@edmundnoble
Copy link
Contributor

That's fine with me but I think in the long run it could be better, as soon as we have a Sum new type for that Monoid you can change the constraint to G[_]: Applicative and use WriterT.

@andyscott andyscott changed the title Add mapWithIndex and indexed to Traverse Add index related helpers to Traverse Jul 10, 2017
* The behavior is consistent with the Scala collection library's
* `zipWithIndex` for collections such as `List`.
*/
def indexed[A](fa: F[A]): F[(A, Int)] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to call this zipWithIndex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we override this for List Vector and Stream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do.

}

test(s"Traverse[$name].mapWithIndex") {
forAll { (fa: F[Int]) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we take ((Int, Int)) => Int as a function and do:

fa.mapWithIndex((a, i) => fn((a, i))).toList should === (fa.toList.zipWithIndex.map(fn))


test(s"Traverse[$name].traverseWithIndex") {
forAll { (fa: F[Int]) =>
fa.traverseWithIndex((a, i) => Option((a, i))).map(_.toList) should === (Option(fa.toList.zipWithIndex))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have a stronger test, like take fn: ((Int, Int)) => Option[Int] and then

val left = fa.traverseWithIndex((a, i) => fn((a, i))).map(_.toList)
val right = (fa.toList.zipWithIndex match {
  case Nil => Some(Nil)
  case h :: tail => tail.foldM(h :: Nil) { case (l, ai) => fn(ai).map(_ :: l) }.map(_.reverse)
})
left should === right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update incoming. I switched from Option to Tuple2 to make the stronger test more straightforward.

* Akin to [[map]], but also provides the value's index in structure
* F when calling the function.
*/
def mapWithIndex[A, B](fa: F[A])(f: (A, Int) => B): F[B] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we override this for List, Vector and Stream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing

@andyscott
Copy link
Contributor Author

andyscott commented Jul 11, 2017

@johnynek I think I've addressed everything.

Using views seemed to be the best way to override mapWithIndex, as to avoid creating an intermediate collection for List and Vector traverse instances. I can switch it to something else, if desired.

Also: I added an "underlying" proxy so I could get coverage of the default method implementations. This also has the very minor benefit of being a really bad benchmark for comparing performance of the default implementation vs the overridden one. I consistently get slightly faster results for the overridden methods:

[info] TraverseListCheck:
[info] - Traverse[List].zipWithIndex (93 milliseconds)
[info] - Traverse[List].mapWithIndex (25 milliseconds)
[info] - Traverse[List].traverseWithIndex (38 milliseconds)

[info] TraverseListCheckUnderlying:
[info] - Traverse[List (underlying)].zipWithIndex (129 milliseconds)
[info] - Traverse[List (underlying)].mapWithIndex (38 milliseconds)
[info] - Traverse[List (underlying)].traverseWithIndex (32 milliseconds)

My benchmarking warmup consists of repeatedly running the one test over and over until the values settle ¯\(ツ)/¯.

Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Just one minor comment about naming, but I don't have a better suggestion at this moment.


// proxies a traverse instance so we can test default implementations
// to achieve coverage using default datatype instances
private def proxyTraverse[F[_]: Traverse]: Traverse[F] = new Traverse[F] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is nice.

* Akin to [[traverse]], but also provides the value's index in
* structure F when calling the function.
*/
def traverseWithIndex[G[_], A, B](fa: F[A])(f: (A, Int) => G[B])(implicit G: Monad[G]): G[F[B]] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my only slight reservation with the name here is that traverse takes an Applicative[G] and implies often some parallelism, where this is sequential and requires Monad[G] since the applicative for StateT still requires Monad[G] (I think).

Copy link
Contributor Author

@andyscott andyscott Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like that I had to require Monad. I suppose this could also be implemented with just Applicative if it's done with two passes on the underlying structure-- traverse(zipWithIndex(f))(ai => fa(ai._1, ai._2)). But I also don't like having to make two passes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard my previous comment that you can't do this. You can use the Group[Int] and WriterT instead of State.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can relax to Applicative that would be ideal.

Copy link
Contributor Author

@andyscott andyscott Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't sort out how to make it work with WriterT. What did you have in mind @edmundnoble?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I can't find a way to do this. I can calculate the length though :P.

@@ -82,6 +82,12 @@ trait VectorInstances extends cats.kernel.instances.VectorInstances {
G.map2Eval(f(a), lgvb)(_ +: _)
}.value

override def mapWithIndex[A, B](fa: Vector[A])(f: (A, Int) => B): Vector[B] =
fa.view.zipWithIndex.map(ai => f(ai._1, ai._2)).toVector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we benchmark with fa.iterator.zipWithIndex.map(..., I think .iterator may be faster than .view

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An official benchmark in the benchmark submodule? Or will a quick local benchmark work, updating the code according?

@andyscott
Copy link
Contributor Author

andyscott commented Jul 14, 2017

@johnynek I renamed traverseWithIndex to traverseWithIndexM and added a note about traversing with index using applicative. I also switched .view to .iterator after a bit of investigation and more trivial benchmarking locally.

@kailuowang
Copy link
Contributor

merged with 3 sign-offs

@kailuowang kailuowang merged commit 27a7399 into typelevel:master Jul 19, 2017
@kailuowang kailuowang modified the milestone: 1.0.0-MF Jul 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants