Add distinct method on NonEmptyList and NonEmptyVector #1243

Tvaroh · 2016-07-29T11:56:13Z

Initially submitted as #1240 (see the discussion there for details).

Adds distinct method to NonEmptyList and NonEmptyVector that uses Order typeclass instance to keep track of duplicates (Scala's immutable TreeSet under the hood).
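As a rough illustration of the approach described above, here is a minimal sketch that does not depend on Cats. `Nel` is a hypothetical stand-in for `NonEmptyList`, and the standard library's `Ordering` stands in for the `Order` instance; the code actually merged in this PR may differ.

```scala
import scala.collection.immutable.TreeSet

// Hypothetical stand-in for cats.data.NonEmptyList, for illustration only.
final case class Nel[A](head: A, tail: List[A]) {

  // Keep the first occurrence of each element, preserving encounter order.
  // scala.math.Ordering stands in for the Order instance here.
  def distinct(implicit ord: Ordering[A]): Nel[A] = {
    // The head is always kept, so the set of seen elements starts with it.
    val (_, keptReversed) =
      tail.foldLeft((TreeSet(head), List.empty[A])) {
        case (acc @ (seen, kept), a) =>
          if (seen.contains(a)) acc   // O(log n) membership check
          else (seen + a, a :: kept)  // prepend now, reverse once at the end
      }
    Nel(head, keptReversed.reverse)
  }
}
```

Starting the set from the head means the result can be rebuilt as a non-empty value without any runtime emptiness check.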

frosforever · 2016-07-29T13:34:36Z

LGTM super useful!

johnynek · 2016-07-29T18:00:19Z

core/src/main/scala/cats/data/NonEmptyList.scala

+  /**
+   * Remove duplicates. Duplicates are checked using `Order[_]` instance.
+   */
+  def distinct(implicit O: Order[A]): NonEmptyList[A] = {


only question: there is an O(N^2) algorithm that uses only Eq[A]. I wonder if it makes sense to also implement that?

we could use something like: https://github.com/non/algebra/blob/master/core/src/main/scala/algebra/Priority.scala

(or we could use Xor for that) and have something like:

def distinct(implicit oe: Xor[Eq[A], Order[A]]): NonEmptyList[A] =
  oe match {
    case Xor.Right(ord) =>
      // do the tree set which is `O(log N)` per check
    case Xor.Left(eq) =>
      // do a "listset" approach of checking each item, this incurs `O(N)` per check
  }

If we had a Hash[A] type that potentially extended Eq[A] we could even have something like:

def distinct(implicit oe: Xor[Eq[A], Xor[Order[A], Hash[A]]]): NonEmptyList[A]

which would prefer to use hash sets, then tree sets, then list sets.

This is perhaps best served with different method names so readers can be more clear which complexity they get, but it is interesting that the semantics of the method don't care.
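To make the cost of the Eq-only variant concrete, here is a minimal sketch of the "list set" approach mentioned above. It is not part of the PR: `ListSetDistinct` and `distinctByEquiv` are hypothetical names, the head and tail are passed separately instead of as a `NonEmptyList`, and the standard library's `Equiv` stands in for `Eq`.

```scala
object ListSetDistinct {
  // Quadratic de-duplication that needs only an equivalence relation.
  // scala.math.Equiv stands in for Eq; all names here are hypothetical.
  def distinctByEquiv[A](head: A, tail: List[A])(implicit eqv: Equiv[A]): (A, List[A]) = {
    val keptReversed = tail.foldLeft(List.empty[A]) { (kept, a) =>
      // Linear scan over everything kept so far: O(n) per element, O(n^2) overall.
      val seen = eqv.equiv(head, a) || kept.exists(eqv.equiv(_, a))
      if (seen) kept else a :: kept
    }
    (head, keptReversed.reverse)
  }
}
```

Giving this a distinct name, as suggested above, would let the call site show which complexity is in play.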

Tvaroh · 2016-07-30

Is Priority recursive, i.e. can it support more than two implicits? Regarding hash sets and a Hash typeclass: since hash set implementations in Scala are based on Object.hashCode(), it would require some wrapping of the elements, wouldn't it?
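The kind of wrapping asked about above could look roughly like the following sketch. `Hash` is a hypothetical type class (nothing like it is assumed to exist in Cats here), and `HashDistinct`, `Wrapped`, and `distinctByHash` are illustrative names only.

```scala
import scala.collection.mutable

// Hypothetical Hash type class: an equality plus a hash code consistent with it.
trait Hash[A] {
  def eqv(x: A, y: A): Boolean
  def hash(a: A): Int
}

object HashDistinct {
  // Wrapper that routes equals/hashCode through the Hash instance, so a
  // standard hash set can be reused without relying on A's own hashCode.
  private final class Wrapped[A](val value: A, H: Hash[A]) {
    override def hashCode: Int = H.hash(value)
    override def equals(other: Any): Boolean = other match {
      case w: Wrapped[_] => H.eqv(value, w.value.asInstanceOf[A])
      case _             => false
    }
  }

  // Keeps the first occurrence of each element; expected O(1) per check.
  def distinctByHash[A](as: List[A])(implicit H: Hash[A]): List[A] = {
    val seen = mutable.HashSet.empty[Wrapped[A]]
    // HashSet#add returns true only when the element was not already present.
    as.filter(a => seen.add(new Wrapped(a, H)))
  }
}
```

The wrapper costs one allocation per element, which is part of the trade-off against the tree-set version.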

ceedubs · 2016-07-30

> This is perhaps best served with different method names so readers can be more clear which complexity they get

I strongly agree with this statement. Especially since IntelliJ has a nasty habit of telling people that imports are unused when they are only used to bring in implicit instances. It would be really unfortunate if deleting an import brought the Hash or Order instance out of scope and some code went from O(n) to O(n^2).

I don't feel a strong need to add an O(n^2) version that requires Eq[A] instead of Order[A], because O(n^2) operations are often impractical, and I suspect that most of the cases in which you want to do something like this you probably can have an Order[A] available. I could definitely see some value in a hash-based approach, but that's a higher barrier given what's currently in Cats.

Personally I'm pretty happy to go forward with this approach and leave open the possibility of another approach in the future.

codecov-io · 2016-08-03T20:07:26Z

Current coverage is 90.60% (diff: 100%)

Merging #1243 into master will increase coverage by 0.02%

@@             master      #1243   diff @@
==========================================
  Files           243        243          
  Lines          3288       3298    +10   
  Methods        3231       3237     +6   
  Messages          0          0          
  Branches         54         58     +4   
==========================================
+ Hits           2978       2988    +10   
  Misses          310        310          
  Partials          0          0

Powered by Codecov. Last update 3fb3ed3...5f2483f

non · 2016-08-12T16:52:21Z

I agree with @ceedubs -- for now let's just have .distinct require Order and go from there.

Tvaroh · 2016-08-12T17:30:44Z

I've resolved the conflicts. Let's merge this before any others appear. Frankly, it's demotivating for contributors when a PR hangs for weeks for no reason.

johnynek · 2016-08-12T17:40:06Z

👍

sorry for the delay.

kailuowang · 2016-08-12T17:52:05Z

I apologize in advance @Tvaroh for possible further delay but I have a thought that I have to speak out.
Does the duplication between NonEmptyList and NonEmptyVector bother anyone other than me? Could this be implemented in Reducible? Or does it call for a CanBuildFrom 😭 ? For now the duplication looks minor, but what if people keep finding other useful scala.collection methods? I just want to make sure that we don't accidentally start our own collection lib without planning ahead.

johnynek · 2016-08-12T18:04:38Z

can we merge this one and add an issue to maybe make a generic one using Foldable and MonoidK? Would that be okay @kailuowang ?

Tvaroh · 2016-08-12T18:08:12Z

Will be happy to generify this if needed.

johnynek · 2016-08-12T18:12:31Z

So, I think something like:

  def distinct[A](f: F[A])(implicit ord: Order[A], mk: MonoidK[F]): F[A] =

on Foldable[F] and maybe a:

def distinct[A](f: F[A])(implicit ord: Order[A], semigroup: SemigroupK[F]): F[A]

on Reducible[F] is possible, but it will be less efficient than the ones you have here, so we should specialize the implementation for Reducible[NonEmptyList] and Reducible[NonEmptyVector] (and probably Foldable[List] and Foldable[Vector]).

johnynek · 2016-08-12T18:23:24Z

actually, since you can't promise the MonoidK is a certain form, I don't think you can ever be really efficient there.

I think we should separate the generic versions from this PR. The ones in the current PR are very efficient, which is nice.

Tvaroh · 2016-08-12T18:24:26Z

I agree (about separating generification from this PR).

kailuowang · 2016-08-12T18:28:06Z

👍 . I am okay with having a separate issue for generic distinct.

Tvaroh · 2016-08-12T18:34:59Z

Thank you. Should I create an issue to discuss generic stuff there?

Tvaroh · 2016-08-12T18:53:13Z

@johnynek I think we also need an implicit A: Applicative[F] to be able to do:

val as: F[A] = ??? // distinct values
mk.combineK(as, A.pure(a)) // in foldLeft body

Not sure if it could be made more efficient.

johnynek · 2016-08-12T18:55:09Z

@Tvaroh there is MonadCombine. Can we think of a lawful way to define ApplicativeCombine?

Tvaroh · 2016-08-12T18:58:56Z

MonadCombine is fine, thanks. So, on Foldable it could be something like this:

def distinct[A](fa: F[A])(implicit O: Order[A], F: MonadCombine[F]): F[A] = {
  implicit val ord: Ordering[A] = O.toOrdering // TreeSet needs a scala.math.Ordering

  val (_, result) = foldLeft(fa, (TreeSet.empty[A], F.empty[A])) { case (acc@(elementsSoFar, as), a) =>
    if (elementsSoFar(a)) acc else (elementsSoFar + a, F.combineK(as, F.pure(a)))
  }

  result
}

Tvaroh · 2016-08-12T19:05:12Z

Though we don't have MonadCombine for NonEmptyList or NonEmptyVector.

johnynek · 2016-08-12T19:06:26Z

yeah, no empty. We need something like MonadSemigroupK....

maybe we are barking up the wrong tree here.

Tvaroh · 2016-08-12T19:07:29Z

Also MonoidK.empty cannot be implemented for "non empty" collections.

Add distinct method on NonEmptyList and NonEmptyVector

56a375e

Tvaroh mentioned this pull request Jul 29, 2016

Add distinct method to NonEmptyList and NonEmptyVector. #1240

Closed

johnynek reviewed Jul 29, 2016

ceedubs closed this Aug 3, 2016

ceedubs reopened this Aug 3, 2016

ceedubs added the in progress label Aug 3, 2016

Merge branch 'master' into distinct-on-nel

5f2483f

# Conflicts: # tests/src/test/scala/cats/tests/NonEmptyVectorTests.scala

johnynek merged commit cb7e2df into typelevel:master Aug 12, 2016

stew removed the in progress label Aug 12, 2016
