Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should symdiff care about item multiplicity? #34686

Closed
StefanKarpinski opened this issue Feb 6, 2020 · 6 comments
Closed

should symdiff care about item multiplicity? #34686

StefanKarpinski opened this issue Feb 6, 2020 · 6 comments
Labels
bug Indicates an unexpected problem or unintended behavior collections Data structures holding multiple items, e.g. sets minor change Marginal behavior change acceptable for a minor release

Comments

@StefanKarpinski
Copy link
Member

See #34675. It came up that symdiff([1, 2, 3, 2]) == [1, 3] which struck me as... unexpected. I'm opening an issue to discuss whether this is a behavior we actually want.

@StefanKarpinski StefanKarpinski added needs decision A decision on this change is needed collections Data structures holding multiple items, e.g. sets minor change Marginal behavior change acceptable for a minor release labels Feb 6, 2020
@pcjentsch
Copy link
Contributor

I feel like this is a correctness issue that should addressed sooner rather than later. I ran into it back in march.

At minimum, I think the documentation should address that the results of this function will not be correct if the arguments have duplicates.

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label May 16, 2022
@oscardssmith
Copy link
Member

triage agrees that this is a bug.

@StefanKarpinski
Copy link
Member Author

This example is just nonsense, for example:

julia> symdiff([1, 2, 2], [2, 3, 3, 4])
3-element Vector{Int64}:
 1
 2
 4

@StefanKarpinski
Copy link
Member Author

To expand on that a bit: every other set function that accepts vectors treats them as sets in the sense that an item is in the set if and only if it's in the vector, disregarding multiplicity; why should symdiff be the only one that cares about multiplicity? It's also weird that symdiff as currently implemented doesn't care about which input something is in, only how many times it occurs total in its inputs.

@pcjentsch
Copy link
Contributor

pcjentsch commented Jul 5, 2022

@StefanKarpinski to clarify, you are saying that the behaviour should be:

symdiff(a::AbstractVector, b::AbstractVector) = collect(symdiff(Set(a), Set(b)))

right?

Not that this is a good implementation, just a correct one.

@JeffBezanson JeffBezanson added bug Indicates an unexpected problem or unintended behavior and removed needs decision A decision on this change is needed triage This should be discussed on a triage call labels Aug 4, 2022
@JeffBezanson
Copy link
Member

Yes that is correct. Might be nice for it to preserve order, but that's more of a detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior collections Data structures holding multiple items, e.g. sets minor change Marginal behavior change acceptable for a minor release
Projects
None yet
Development

No branches or pull requests

4 participants