Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complement type, possible approach to #1032. #4892

Closed
wants to merge 1 commit into from

Conversation

StefanKarpinski
Copy link
Member

Paired on this with @punkrockpolly, see:

https://github.com/punkrockpolly/Playing-with-Julia/blob/master/negatedindex.jl

The premise of the Complement type is that it abstracts the complement of a collection – primarily that

x in c <=> !(x in c.collection)

This commit punts on indexing for more than two dimensions because that turns out to be an incredibly invasive change to huge amounts the multidimensional array indexing code.

Paired on this with @punkrockpolly, see:

https://github.com/punkrockpolly/Playing-with-Julia/blob/master/negatedindex.jl

The premise of the Complement type is that it abstracts the
complement of a collection – primarily  that

	`x in c` <=> `!(x in c.collection)`

This commit punts on indexing for more than two dimensions because
that turns out to be an incredibly invasive change to huge amounts
the multidimensional array indexing code.
@JeffBezanson
Copy link
Member

I think the only reasonable way to do this is as I described in #1032. Currently all the indexing code simply iterates over index objects (for i in idx), and instead we'd have to iterate over intersect(idx, 1:size(A,n) (after bounds checks).

Other than that, the idea of a collection that contains every possible value except certain ones scares me a little.

@StefanKarpinski
Copy link
Member Author

It's not really a collection – it's not iterable. The only thing you can really do with it is check for containment. The only reason I stuck it in collections is because it wraps collection objects.

@JeffBezanson
Copy link
Member

Well, that's fair, I can get over my fear :)

I love the idea of using intersect for this, but it's not quite perfect since ideally it could handle all "indexification" tasks including converting boolean arrays with find, and converting float to int, etc.

@StefanKarpinski
Copy link
Member Author

bump. I'd like to merge this and see how it pans out.

@nalimilan
Copy link
Member

For me it's a big +1. For NamedArrays, we would also add !(s::String) = Complement(s) to complement the AbstractArray method, and that would fit very nicely in the general design.

@StefanKarpinski
Copy link
Member Author

Agreed. I originally had that in there and took it out to keep the PR minimal. Easy to add back.

@StefanKarpinski
Copy link
Member Author

We could ditch Colon and render a bare : as Complement([]) instead.

@nalimilan
Copy link
Member

This pull request lacks support for indexing general Arrays with Complement objects. Something like this provides the general support for it (since Complement can be mixed with other types...):

indices(v::AbstractArray, dim::Integer, I::Union(Real, AbstractVector)) = I
indices(v::AbstractArray, dim::Integer, I::Complement) = getindex(v, filter(i -> in(i, I), 1:size(v, dim)))

getindex(v::AbstractArray, I::Union(Real, AbstractVector, Complement)...) = getindex(v, [indices(v, dim, I[dim]) for dim in 1:length(I)]...)

But:

  1. Is that the best way to do this?
  2. How can we generate efficient versions of all the (Real, AbstractVector, Complement) combinations for two and three dimensions arrays? We will also need this in NamedArrays, with an additional complication due to the special handling of String indexing.

@StefanKarpinski
Copy link
Member Author

Yes, indexing code needs a lot of work to make this work more generically. @punkrockpolly and I started down that path but it didn't look like there was any end in sight, so we just kind of gave up. At least by having the Complement type, limited uses of negated indexing could start to be made to work, e.g. in DataFrames.

[pao: fix typo in @punkrockpolly's username]

@nalimilan
Copy link
Member

OK, I see. High-performance functions can probably wait for even longer then.

@punkrockpolly
Copy link

any interest in making this work for a limited number of dimensions? (pending a rewrite of general array indexing off in the distant future)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants