Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require an Order instance for NonEmptyList's groupBy function #1964

Merged
merged 5 commits into from
Oct 13, 2017

Conversation

igstan
Copy link
Contributor

@igstan igstan commented Oct 11, 2017

This addresses: #1959

I didn't see a way to require just an Eq instance for B while preserving the performance characteristics. A mutable.Map uses more than just universal equality, it also uses the built-in JVM hashing. So, what I did was to switch to a mutable.TreeMap which requires a scala.math.Ordering instance, and we get that by constraining B to have a cats.Order instance.

@igstan igstan changed the title Require and Order instance for NonEmptyList's groupBy function Require an Order instance for NonEmptyList's groupBy function Oct 11, 2017
@kailuowang
Copy link
Contributor

kailuowang commented Oct 11, 2017

Hmmm, Order helps performance but is over strict. How about we use the newly added Hash which is what map uses anyway right?
we can use the hashcode from Hash for map keys.

@johnynek
Copy link
Contributor

We could use Hash but we don’t yet have containers that use Hash.

@kailuowang
Copy link
Contributor

@johnynek I suggested that we could use the hashcode from Hash as key in Map, the extra cost will be map calling the hashCode() on the int hashcode, WDYT?

@johnynek
Copy link
Contributor

we have to return a Map[K, NonEmptyList[V]] right? So if we key by the hash, how do we return that? Do you want to implement a little wrapper that wraps Map[Int, List[(K, NonEmptyList[V])]] and then fetches it out?

This seems a bit complex to me, but it could work. So something like:

class MapFromHashBuckets[K, V](h: Hash[K], m: Map[Int, List[(K, NonEmptyList[V])]) extends Map[K, V] {
  def get(k: K): Option[NonEmptyList[V]] = m.get(h.hash(k)).flatMap {
    case Nil => None
    case candidates => candidates.collectFirst { case (k0, vs) if h.eqv(k, k0) => vs}
  }
}

then we just need to implement the other 3 methods of Map.

I guess it might not be so bad if we have tests. Probably better to use LongMap or something to avoid boxing the Int.

@kailuowang
Copy link
Contributor

kailuowang commented Oct 11, 2017

Update: Please disregard. the two maps below aren't consistent as @igstan pointed out.

Not sure if we need to return a wrapper, since there is already an intermediate map of Map[B, Builder] , just need to do the conversion in the last step, something like

def groupBy[B](f: A => B)(implicit B: Hash[B]): Map[B, NonEmptyList[A]] = {
    val m = mutable.LongMap.empty[(B, mutable.Builder[A, List[A]])]
    for { elem <- toList } {
     val b = f(elem)
      m.getOrElseUpdate(B.hash(b), (b, List.newBuilder[A]))._2 += elem
    }
    val b = immutable.Map.newBuilder[B, NonEmptyList[A]]
    for { (k, bs) <- m.values } {
      val head :: tail = bs.result // we only create non empty list inside of the map `m`
      b += (k, NonEmptyList(head, tail)))
    }
    b.result
  }

Not sure if this' going to be more performant than the wrapper, but I don't know how I feel about adding tests against a Map either.

@igstan
Copy link
Contributor Author

igstan commented Oct 11, 2017

I was going to propose the exact same thing that @kailuowang did in the last snippet. However, I'm thinking now that whatever constraint we're going to put on B (Hash or Order), it must be used for both maps we're producing, mutable and immutable. In the last snippet, the immutable Map uses implicitly universal equality and hashing. I don't think we want that, right?

@kailuowang
Copy link
Contributor

@igstan ah you are right. I guess we have to return a wrapper or a new implementation of Map if we really want to do this.

@igstan
Copy link
Contributor Author

igstan commented Oct 11, 2017

@kailuowang I'm actually thinking of overriding elemHashCode for both kind of maps. That would allow us to use our own hashing function.

@kailuowang
Copy link
Contributor

kailuowang commented Oct 11, 2017

Sounds like we are moving into creating our own version of Map and MapBuilder just for this groupBy method. On one hand, these could be useful, on the other hand, to be honest, I am not sure if they should go to cats.core. stew/Dog already has a Map that uses Eq and Order. Maybe this groupBy method should be moved to Dog?

@johnynek
Copy link
Contributor

why don't we do the Order based one and punt the Hash based one until we have a Map that uses Hash.

@kailuowang
Copy link
Contributor

kailuowang commented Oct 11, 2017

The Order based one isn't consistent either, the map it returns uses universal hashing and equality.

@igstan
Copy link
Contributor Author

igstan commented Oct 11, 2017

Right. I can change to an immutable TreeMap for that as well.

@igstan
Copy link
Contributor Author

igstan commented Oct 11, 2017

BTW, I've tried the overriding version and can't do it because immutable.HashMap is sealed.

@johnynek
Copy link
Contributor

johnynek commented Oct 11, 2017 via email

@igstan
Copy link
Contributor Author

igstan commented Oct 11, 2017

I'm going to force-push the changes as soon as I finish running the tests locally.

@igstan
Copy link
Contributor Author

igstan commented Oct 11, 2017

Force-pushed.

@igstan
Copy link
Contributor Author

igstan commented Oct 12, 2017

Not sure why the build is failing, but seems to be a spurious one.

@LukaJCB
Copy link
Member

LukaJCB commented Oct 12, 2017

There doesn't seem to be a mutable.TreeMap in Scala 2.10.x or 2.11.x.

@igstan
Copy link
Contributor Author

igstan commented Oct 12, 2017

Oh, silly me. I didn't see that error message when I looked earlier. Suddenly this "low-hanging fruit" bug fix seems to be very high :)

@igstan
Copy link
Contributor Author

igstan commented Oct 12, 2017

Ok, so here's a summary of the issues so far.

If we want to go ahead with Order, we can't for Scala < 2.12 as there's no mutable.TreeMap, but there is an immutable.TreeMap.

If we want to go ahead with Hash, we can't for Scala >= 2.12, as we can't override immutable.HashMap to use our own hashing function. But we can do this override for mutable.HashMap.

Not sure how to reconcile these. I'm out of ideas right now.

@igstan
Copy link
Contributor Author

igstan commented Oct 12, 2017

Well, we could of course drop the mutable.TreeMap and build the immutable map directly. Not sure how much that will affect performance, though.

@kailuowang
Copy link
Contributor

That brings back my point of such map already exists in dog

@igstan
Copy link
Contributor Author

igstan commented Oct 12, 2017

Then maybe groupBy should be removed altogether from cats? And let people use toList if they want to groupBy on it?

@codecov-io
Copy link

codecov-io commented Oct 12, 2017

Codecov Report

Merging #1964 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1964      +/-   ##
==========================================
+ Coverage   96.07%   96.08%   +<.01%     
==========================================
  Files         273      273              
  Lines        4539     4541       +2     
  Branches      122      119       -3     
==========================================
+ Hits         4361     4363       +2     
  Misses        178      178
Impacted Files Coverage Δ
core/src/main/scala/cats/NonEmptyTraverse.scala 100% <ø> (ø) ⬆️
core/src/main/scala/cats/data/NonEmptyList.scala 100% <100%> (ø) ⬆️
core/src/main/scala/cats/syntax/list.scala 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 46d0b17...2f0a165. Read the comment docs.

@igstan
Copy link
Contributor Author

igstan commented Oct 12, 2017

I've changed to use just an immutable TreeMap. I don't think we're losing anything with these changes. The previous version still had to functionally build an immutable map while traversing the mutable one. However, I kept the mutable List builder, so things should be even.


m.mapValues(v => NonEmptyList.fromListUnsafe(v.result))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, mapValues is by-name and full of surprises and not something you would ever want to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative would be to construct another TreeMap. Calling map instead of mapValues wouldn't work because we'd have to call toMap on it at the end, and that uses the universal equality and hashing we're trying to get rid off.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I never had problems with mapValues, which is why I used it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another surprise of .mapValues is that the result is not Serializable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use Functor[Map].map(m)(v => NonEmptyList.fromListUnsafe(v.result))? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm, of course meant:

Functor[Map[B, ?]].map(m)(v => NonEmptyList.fromListUnsafe(v.result))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think was wrong when I said we can't use map instead of mapValues. The result of TreeMap.map is a TreeMap as well, which means the ordering is preserved. I'll change to that.

I don't think we need to redundant Functor[Map[B, ?]] summoning here.

LukaJCB
LukaJCB previously approved these changes Oct 12, 2017

m.map { case (k, v) => (k, NonEmptyList.fromListUnsafe(v.result)) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we ascribe a type to this to make sure we are not converting to using hashCode and equals at the end?

 m.map { case (k, v) => (k, NonEmptyList.fromListUnsafe(v.result)) } : TreeMap[B, NonEmptyList[A]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. One further thought. Shall we also change the return type to SortedMap? That would explain why we require an Order instance for B.

* scala> val nel = NonEmptyList.of(12, -2, 3, -5)
* scala> nel.groupBy(_ >= 0)
* res0: Map[Boolean, cats.data.NonEmptyList[Int]] = Map(false -> NonEmptyList(-2, -5), true -> NonEmptyList(12, 3))
* }}}
*/
def groupBy[B](f: A => B): Map[B, NonEmptyList[A]] = {
val m = mutable.Map.empty[B, mutable.Builder[A, List[A]]]
def groupBy[B](f: A => B)(implicit B: Order[B]): Map[B, NonEmptyList[A]] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add to the scaladoc a brief description of why Order is required?

@johnynek
Copy link
Contributor

johnynek commented Oct 12, 2017 via email

johnynek
johnynek previously approved these changes Oct 12, 2017
kailuowang
kailuowang previously approved these changes Oct 12, 2017
@@ -27,6 +27,6 @@ final class ListOps[A](val la: List[A]) extends AnyVal {
* }}}
*/
def toNel: Option[NonEmptyList[A]] = NonEmptyList.fromList(la)
def groupByNel[B](f: A => B): Map[B, NonEmptyList[A]] =
def groupByNel[B : Order](f: A => B): Map[B, NonEmptyList[A]] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the type here also be SortedMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should. Working on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... current status: yak-shaving inside NonEmptyTraverse doctests, where SortedMap needs an Apply instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed something that addresses this. I've fixed the NonEmptyTraverse doctests by upcasting to a vanilla Map for which there's an Apply instance. I'm not completely satisfied with this approach, but it's the best I've got right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making an Apply and Traversable for SortedMap is probably out of scope, so I think that's okay.

@igstan igstan dismissed stale reviews from kailuowang and johnynek via 2f0a165 October 12, 2017 22:22
@kailuowang kailuowang merged commit c121f4f into typelevel:master Oct 13, 2017
@kailuowang kailuowang added this to the 1.0.0-RC1 milestone Oct 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants