-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add distinct method on NonEmptyList and NonEmptyVector #1243
Conversation
LGTM super useful! |
/** | ||
* Remove duplicates. Duplicates are checked using `Order[_]` instance. | ||
*/ | ||
def distinct(implicit O: Order[A]): NonEmptyList[A] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only question: there is an O(N^2)
algorithm that uses only Eq[A]
I wonder is it makes sense to also implement that?
we could use something like: https://github.com/non/algebra/blob/master/core/src/main/scala/algebra/Priority.scala
(or we could use Xor
for that) and have something like:
def distinct(implicit oe: Xor[Eq[A], Order[A]]): NonEmptyList[A] = oe match {
case Xor.Right(ord) => // do the tree set which is `O(log N)` per check
case Xor.Left(eq) => // do a "listset" approach of checking each item, this incurs `O(N)` per check
}
If we had a Hash[A]
type that potentially extended Eq[A]
we could even have something like:
def distinct(implicit oe: Xor[Eq[A], Xor[Order[A], Hash[A]]]): NonEmptyList[A]
which would prefer to use hash sets, then tree sets, then list sets.
This is perhaps best served with different method names so readers can be more clear which complexity they get, but it is interesting that the semantics of the method don't care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Priority
recursive, i.e. can support > 2 implicits? Regarding hash sets and Hash
typeclass, since hash sets implementations in Scala are based on Object.hashcode()
it would require some wrapping of the elements, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perhaps best served with different method names so readers can be more clear which complexity they get
I strongly agree with this statement. Especially since IntelliJ has a nasty habit of telling people that imports are unused when they are only used to bring in implicit instances. It would be really unfortunate if deleting an import brought the Hash
or Order
instance out of scope and some code went from O(n)
to O(n^2)
.
I don't feel a strong need to add an O(n^2)
version that requires Eq[A]
instead of Order[A]
, because O(n^2)
operations are often impractical, and I suspect that most of the cases in which you want to do something like this you probably can have an Order[A]
available. I could definitely see some value in a hash-based approach, but that's a higher barrier given what's currently in Cats.
Personally I'm pretty happy to go forward with this approach and leave open the possibility of another approach in the future.
Current coverage is 90.60% (diff: 100%)@@ master #1243 diff @@
==========================================
Files 243 243
Lines 3288 3298 +10
Methods 3231 3237 +6
Messages 0 0
Branches 54 58 +4
==========================================
+ Hits 2978 2988 +10
Misses 310 310
Partials 0 0
|
I agree with @ceedubs -- for now let's just have |
# Conflicts: # tests/src/test/scala/cats/tests/NonEmptyVectorTests.scala
I've resolved conflicts. Let's merge this before any other appear. Frankly, it's counter-motivating for contributing when a PR hangs for some weeks with no reason. |
👍 sorry for the delay. |
I apologize in advance @Tvaroh for possible further delay but I have a thought that I have to speak out. |
can we merge this one and add an issue to maybe make a generic one using |
Will be happy to generify this if needed. |
So, I think something like: def distinct[A](f: F[A])(implicit ord: Order[A], mk: MonoidK[F]): F[A] =
on def distinct[A](f: F[A])(implicit ord: Order[A], semigroup: SemigroupK[F]): F[A] on |
actually, since you can't promise the I think we should separate the generic versions, from this PR. The ones in the current PR are very efficient, which is nice. |
I agree (about separating generification from this PR). |
👍 . I am okay with having a separate issue for generic distinct. |
Thank you. Should I create an issue to discuss generic stuff there? |
@johnynek I think we also need val as: F[A] = ??? // distinct values
mk.combineK(as, A.pure(a))) // in foldLeft body Not sure if it could be made more efficient. |
@Tvaroh there is |
def distinct[A](fa: F[A])(implicit O: Order[A], F: MonadCombine[F]): F[A] = {
implicit val ord = O.toOrdering
val (_, result) = foldLeft(fa, (TreeSet.empty[A], F.empty[A])) { case (acc@(elementsSoFar, as), a) =>
if (elementsSoFar(a)) acc else (elementsSoFar + a, F.combineK(as, F.pure(a)))
}
result
} |
Though we don't have |
yeah, no maybe we are barking up the wrong tree here. |
Also |
Initially submitted as #1240 (see the discussion there for details).
Adds
distinct
method toNonEmptyList
andNonEmptyVector
that usesOrder
typeclass instance to keep track of duplicates (Scala's immutableTreeSet
under the hood).