RFC: make indexable collection API more uniform (keys, values, pairs) #22907

JeffBezanson · 2017-07-21T20:58:53Z

The first commit adds pairs, which is supposed to return a key-value iterator for any indexable collection, and adds methods of keys and values for arrays and sets.

The second commit is just a demo of this, generalizing findmin and findmax so that the identity findmax(array) == findmax(Dict(pairs(array))) holds. This is a breaking change, since previously these functions only returned indices 1:n.

Discussion questions:

Do we want fallbacks values(x) = x and (maybe) keys(x) = countfrom(1)? This would allow us to keep the old behavior of findmin on generic iterators with the same code. Or, we need some way to identify iterators that support keys and values, or we could decide that findmin only works on indexable collections.
keys for a 1-d array returns an integer range, but for other arrays returns a CartesianRange. Seems ok to me?
Do we want CartesianIndex to be the "official" array index type, or should we deprecate it and just use tuples?

tkelman · 2017-07-21T21:30:12Z

How is pairs any different than enumerate ?

JeffBezanson · 2017-07-21T21:32:42Z

enumerate(x) is exactly equivalent to zip(countfrom(1), x); it does not look at the indices of x when x is indexable.

timholy · 2017-07-22T12:13:18Z

Do we want CartesianIndex to be the "official" array index type, or should we deprecate it and just use tuples?

How do you feel about supporting t1 + t2 and t1 - 1? If you're reluctant to do that, then we need to keep CartesianIndex.

timholy · 2017-07-22T12:30:55Z

base/set.jl

@@ -8,6 +8,9 @@ mutable struct Set{T} <: AbstractSet{T}
 end
 Set() = Set{Any}()

+# a set iterates its values
+values(s::AbstractSet) = s


Is keys(s) supposed to return an error?

getindex is

So the new findmin now errors on sets? I think that makes sense.

andyferris · 2017-07-22T22:21:01Z

Would it be better to use indices or something to match getindex?

JeffBezanson · 2017-08-24T04:59:12Z

So I think keys and eachindex are basically the same function, the only potential difference being that eachindex returns "efficient" indices that might differ from "canonical" indices. This difference is a bit unfortunate. If we keep both, the key question is what kind of indices to return from functions like indmin. I would favor using the official CartesianIndex indices for all such purposes, and reserving eachindex as an optimization for when the indices won't be observed. @timholy @mbauman does that make sense?

nalimilan · 2017-08-24T09:42:54Z

If we keep both, the key question is what kind of indices to return from functions like indmin. I would favor using the official CartesianIndex indices for all such purposes, and reserving eachindex as an optimization for when the indices won't be observed.

That's an issue that affects all find/search functions, not only indmin/indmax. Currently all these functions return a linear index, which is convenient for general iterables because it's just an integer value. OTOH returning a cartesian index for AbstractArray objects would mean that these functions would also have to return one-element tuples or CartesianIndex objects even for one-dimensional arrays and other iterables, which would be quite inconvenient.

See also #14086 (comment) and following comments, and the short mention about this (second bullet of the section) in the Search & Find Julep.

mbauman · 2017-08-24T16:32:12Z

My gut reaction would be to:

Keep CartesianIndex objects. They're very handy.
Have keys always return integers for vectors.
Have keys always return CartesianIndices for multidimensional arrays.
Use eachindex to get the efficient index type for multidimensional arrays when you don't care what that index type is.

I think this sort of scheme would be okay propagating through to the find/etc APIs. You would opt-in to a linear index for a multidimensional array by simply doing a findmax(reshape(A, :)).

JeffBezanson · 2017-08-24T18:45:09Z

I would be ok with that.

We could also add something like Indexed(keys, vals), which lets you pair an index iterator with a value iterator for the purpose of e.g. calling indmin on an arbitrary iterator. Or this could be accomplished via

pairs(x::Zip2) = x

allowing zip to be used for this purpose.

Related: #20684

JeffBezanson · 2017-08-24T21:06:13Z

OK, I'm starting to get a handle on this. Here's what I did:

keys returns an integer range for vectors and a CartesianRange for n-d arrays
Make keys the basic function for getting the indices of something. eachindex falls back to it, with other methods as optimizations.
Use @timholy 's code for enumerate to implement pairs more efficiently for arrays, and make enumerate(::IndexStyle, a) a method of pairs instead.
keys can also accept an IndexStyle argument, in which case it's the same as eachindex.
findmin, findmax, indmin, and indmax return the same values as keys

However, I have not tackled the findmin(A, region) case yet. I assume this should change too?

timholy · 2017-08-24T21:12:55Z

You could probably get rid of linearindices in favor of keys. And I favor changes to findmin to return the same thing as eachindex.

JeffBezanson · 2017-08-24T21:20:30Z

You could probably get rid of linearindices in favor of keys.

They don't do the same thing --- linearindices always returns an iterator that yields integers.

And I favor changes to findmin to return the same thing as eachindex.

This is a possible design, but it conflicts with a generic definition of findmin in terms of keys and values (or pairs), unless we fully merge eachindex and keys. It would be nice to have fewer functions, but I worry about using linear indices too much, and getting different types of indices if you change the representation of an array.

timholy · 2017-08-24T21:55:19Z

Right on linearindices. I had someone missed that keys would return CartesianRange for n-d.

While you're at this and thinking about enumerate, any thoughts on the annoyance that arose from the fact that enumerate once returned something that was both a counter and a valid index, but now that array indexing is more flexible you have to distinguish between these? #17978 and linked PR/issue therein. It seems terrible to have two different functions that do almost the same thing, yet the current state is a little weird, too.

JeffBezanson · 2017-08-24T22:06:43Z

I think enumerate(x) is just zip(countfrom(1), x), and should perhaps be deprecated to that (except that it's convenient and quite heavily used). Callers that use the integer as an index should use pairs instead.

andyferris · 2017-08-25T11:48:08Z

To offer a dissenting voice here - have we considered going the other direction, making indices better and removing keys? I mean, we have getindex on an Associative, so why don't Associatives have indices?

Or, another way to think of this is that If the set of things that I can index with are called keys, then surely I would get/set a value via getkey and setkey! rather than getindex/setindex!. However, I feel that "indices" are a better generic term to describe the indices of multidimensional arrays, associatives, and other containers. For whatever reason, "keys" gives me the distinct feel of hash maps and database tables, or else encryption.

Note that I do strongly support that we unify associatives and arrays and other containers that use getindex in some way. I also feel that indices(multidimensionalarray) should have always been some iterable of actual indices like CartesianRange.

StefanKarpinski · 2017-08-25T15:00:20Z

I think enumerate(x) is just zip(countfrom(1), x), and should perhaps be deprecated to that (except that it's convenient and quite heavily used).

Please don't do that – enumerate is so useful in working with collections, and it's a nice little bit of familiarity for Python users that I think it would would be really sad to lose. Aside: I've started using enumerate in a handy pattern for printing separators between collection elements:

for (i, x) in enumerate(itr)
    i > 1 && print(io, ", ")
    print(io, x)
end

It doesn't require the length of the iterable, which the way I was doing this before did.

JeffBezanson · 2017-08-25T15:01:12Z

I agree, I would be perfectly happy to rename keys to indices. (see #23434)

Especially with dictionaries, people very often use "key/value" terminology, so it's nice to have keys and values as a pair of functions. However, keys could just be an alias for indices. But of course, if dictionaries iterated their values then we wouldn't need values at all.

JeffBezanson · 2017-08-25T15:02:08Z

I agree re: enumerate as well; I use it a lot and would not want to actually deprecate it.

iamed2 · 2017-08-25T15:08:48Z

Idea: move enumerate to Base.Iterators. That way it's clear that it's a generic iterator function and not related to indices.

JeffBezanson · 2017-09-01T16:24:10Z

OK, I think this is ready to go. The function is still called keys, but we can decide separately on renaming it.

I've updated all findmin and findmax methods to return CartesianIndex for non-vectors.

StefanKarpinski · 2017-09-01T19:04:35Z

What's up with the build failures?

JeffBezanson · 2017-09-06T04:45:54Z

In a way that's a good sign, since it means the new behavior gives what that code actually wanted, without needing to call ind2sub.

martinholters · 2017-09-11T06:44:53Z

Would it make sense to add a deprecated ind2sub(::Tuple, ::CartesianIndex) to make this less breaking?

index

Makes the return type change of #22907 less breaking by allowing the common pattern `ind2sum(size(a), indmax(a))` to still work.

Makes the return type change of #22907 less breaking by allowing the common pattern `ind2sub(size(a), indmax(a))` to still work.

…23708) Makes the return type change of #22907 less breaking by allowing the common pattern `ind2sub(size(a), indmax(a))` to still work.

Jutho · 2017-10-13T09:26:35Z

+1 for this change but type inference on (f)indmin for tuples is completely lost. This should clearly be fixed (and tested).
v0.6:

julia> @code_typed indmin((1,2,3,4))
CodeInfo(:(begin 
        SSAValue(0) = $(Expr(:invoke, MethodInstance for findmin(::NTuple{4,Int64}), :(Base.findmin), :(a)))
        return (Base.getfield)(SSAValue(0), 2)::Int64
    end))=>Int64

v0.7:

julia> @code_typed indmin((1,2,3,4))
CodeInfo(:(begin
        SSAValue(0) = $(Expr(:invoke, MethodInstance for findmin(::NTuple{4,Int64}), :(Base.findmin), :(a)))::Tuple{Any,Any}
        return (Base.getfield)(SSAValue(0), 2, true)
    end)) => Any

Jutho · 2017-10-13T09:38:59Z

Ok so I looked at the new implementation of this PR. The obvious solution is to also use the IndexValue iterator for tuples. Is the type bound on A in IndexValue{I,A<:AbstractArray} really necessary?

A second question about the following lines:

pairs(::IndexLinear,    A::AbstractArray) = IndexValue(A, linearindices(A))
pairs(::IndexCartesian, A::AbstractArray) = IndexValue(A, CartesianRange(indices(A)))

# faster than zip(keys(a), values(a)) for arrays
pairs(A::AbstractArray)  = pairs(IndexCartesian(), A)
pairs(A::AbstractVector) = pairs(IndexLinear(), A)

Why not just pairs(A::AbstractArray) = IndexValue(A, eachindex(A)). Or does the eachindex iterator need to go away in time?

Jutho · 2017-10-13T09:46:53Z

Actually also for Dicts the current behavior, while nicer than the previous, is completely type unstable. The use of generators for this does not seem like a good idea, unless the type inference of generators can be fixed. In general I find myself not using generators as much as I would have liked because of the associated type instabilities they bring along.

Jutho · 2017-10-13T10:31:00Z

Ok here are a few possible solutions that seem to work: I can prepare a PR for any of them, once a concensus is reached:

Solution 1:
Replace

pairs(collection) = Generator(=>, keys(collection), values(collection))

with

pairs(collection) = Generator(=>, zip(keys(collection), values(collection)))

and add a definition Pair(p::Tuple{Any,Any}) = Pair(p[1],p[2])

This seems to fix the type instability with dicts and tuples.

Solution 2:
Widen type restriction in IndexValue iterator and make pairs(::Tuple) use IndexValue. Solves the problem for tuples, not for dicts, unless ...
Solution 3

Replace IndexValue iterator with a KeyValue iterator that is used for AbstractArray, Tuple and Dict (which corresponds to the goal of this PR to make these more uniform and reduce the distinction between them).

Jutho · 2017-10-14T21:06:30Z

Actually, solution 4 is even simpler:

pairs(collection) = (k=>v for (k,v) in zip(keys(collection), values(collection))

This looks like it should be identical to the current definition

pairs(collection) = Generator(=>, keys(collection), values(collection))

yet the former is inferable, the latter is not. Not sure why that is, but I expect that changing the definition to the first one is something everybody could agree on?

JeffBezanson · 2017-10-14T21:54:37Z

I think this can be fixed by adding an extra Generator constructor. I'll try it.

Jutho · 2017-10-14T21:57:06Z

I've just tried adding Generator(f,I1,I2) = ... explicitly and didn't work.

pairs(collection)  = Generator(kv->(kv[1]=>kv[2]), zip(keys(collection), values(collection)))

seems to work though. I was just preparing a PR.

reported in #22907

JeffBezanson · 2017-10-14T22:08:06Z

Should be fixed by #24145.

reported in #22907

GiggleLiu · 2018-09-26T03:21:46Z

So what should I use instead of ind2sub to get the true index?

StefanKarpinski · 2018-09-26T03:32:57Z

Please ask questions on the Julia discourse discussion forum.

JeffBezanson added the collections Data structures holding multiple items, e.g. sets label Jul 21, 2017

timholy reviewed Jul 22, 2017

View reviewed changes

JeffBezanson mentioned this pull request Jul 27, 2017

findfirst and findnext for general iterables #15755

Closed

JeffBezanson force-pushed the jb/keyvalue branch from 51be6ac to 4931ded Compare August 24, 2017 04:39

JeffBezanson added the arrays [a, r, r, a, y, s] label Aug 24, 2017

JeffBezanson force-pushed the jb/keyvalue branch 2 times, most recently from 25915e7 to 68ed301 Compare August 24, 2017 20:58

JeffBezanson force-pushed the jb/keyvalue branch from 68ed301 to 7c1c298 Compare September 1, 2017 16:22

JeffBezanson force-pushed the jb/keyvalue branch from 7c1c298 to 8a7f46e Compare September 1, 2017 18:17

JeffBezanson force-pushed the jb/keyvalue branch from 8a7f46e to 5f7efae Compare September 1, 2017 19:08

ararslan added the breaking This change will break code label Sep 6, 2017

ararslan mentioned this pull request Sep 6, 2017

ind2sub doc example is broken #23600

Closed

martinholters added a commit to HSU-ANT/ACME.jl that referenced this pull request Sep 13, 2017

Workaround JuliaLang/julia#22907, indmax no longer returning a linear

63cd5ec

index

martinholters added a commit to HSU-ANT/ACME.jl that referenced this pull request Sep 13, 2017

Workaround JuliaLang/julia#22907, indmax no longer returning a linear

52bef2f

index

martinholters added a commit to HSU-ANT/ACME.jl that referenced this pull request Sep 14, 2017

Workaround JuliaLang/julia#22907, indmax no longer returning a linear

ea29dfc

index

martinholters added a commit that referenced this pull request Sep 14, 2017

Add deprecated ind2sub(::NTuple{N,Integer}, ::CartesianIndex{N})

ab28882

Makes the return type change of #22907 less breaking by allowing the common pattern `ind2sum(size(a), indmax(a))` to still work.

martinholters mentioned this pull request Sep 14, 2017

Add deprecated ind2sub(::NTuple{N,Integer}, ::CartesianIndex{N}) #23708

Merged

martinholters added a commit that referenced this pull request Sep 21, 2017

Add deprecated ind2sub(::NTuple{N,Integer}, ::CartesianIndex{N})

5d322a9

Makes the return type change of #22907 less breaking by allowing the common pattern `ind2sub(size(a), indmax(a))` to still work.

mbauman pushed a commit that referenced this pull request Sep 22, 2017

Add deprecated ind2sub(::NTuple{N,Integer}, ::CartesianIndex{N}) (#…

942b843

…23708) Makes the return type change of #22907 less breaking by allowing the common pattern `ind2sub(size(a), indmax(a))` to still work.

andyferris mentioned this pull request Oct 6, 2017

Julep: Generalize indexing with and by Associatives #24019

Open

JeffBezanson added a commit that referenced this pull request Oct 14, 2017

fix inference of e.g. Generator(=>, x, y)

755012d

reported in #22907

JeffBezanson mentioned this pull request Oct 14, 2017

fix inference of e.g. Generator(=>, x, y) #24145

Merged

JeffBezanson added a commit that referenced this pull request Oct 16, 2017

fix inference of e.g. Generator(=>, x, y) (#24145)

c7cbebf

reported in #22907

iamed2 mentioned this pull request Dec 5, 2017

Implement pairs like PR 22907 JuliaLang/Compat.jl#428

Merged

bkamins mentioned this pull request Dec 23, 2017

Fix failure due to calling pairs() on keyword arguments on Julia 0.6 JuliaData/DataFrames.jl#1329

Merged

mbauman mentioned this pull request Feb 7, 2018

keys, eachindex and linearindices consistency #25901

Open

drewrobson mentioned this pull request Jun 27, 2018

Add a 2-arg flavor, argmax(f, domain) #27613

Closed

tkf mentioned this pull request Aug 24, 2019

RFC: AbstractSparseMatrixCSC interface #33054

Closed

4 tasks

RFC: make indexable collection API more uniform (keys, values, pairs) #22907

RFC: make indexable collection API more uniform (keys, values, pairs) #22907

Conversation

JeffBezanson commented Jul 21, 2017

tkelman commented Jul 21, 2017

JeffBezanson commented Jul 21, 2017

timholy commented Jul 22, 2017

timholy Jul 22, 2017

Choose a reason for hiding this comment

tkelman Jul 22, 2017

Choose a reason for hiding this comment

timholy Jul 22, 2017

Choose a reason for hiding this comment

andyferris commented Jul 22, 2017

JeffBezanson commented Aug 24, 2017

nalimilan commented Aug 24, 2017

mbauman commented Aug 24, 2017

JeffBezanson commented Aug 24, 2017

JeffBezanson commented Aug 24, 2017

timholy commented Aug 24, 2017

JeffBezanson commented Aug 24, 2017

timholy commented Aug 24, 2017

JeffBezanson commented Aug 24, 2017

andyferris commented Aug 25, 2017 • edited Loading

StefanKarpinski commented Aug 25, 2017

JeffBezanson commented Aug 25, 2017

JeffBezanson commented Aug 25, 2017

iamed2 commented Aug 25, 2017

JeffBezanson commented Sep 1, 2017

StefanKarpinski commented Sep 1, 2017

JeffBezanson commented Sep 6, 2017

martinholters commented Sep 11, 2017

Jutho commented Oct 13, 2017

Jutho commented Oct 13, 2017

Jutho commented Oct 13, 2017 • edited Loading

Jutho commented Oct 13, 2017

Jutho commented Oct 14, 2017

JeffBezanson commented Oct 14, 2017

Jutho commented Oct 14, 2017

JeffBezanson commented Oct 14, 2017

GiggleLiu commented Sep 26, 2018

StefanKarpinski commented Sep 26, 2018

andyferris commented Aug 25, 2017 •

edited

Loading

Jutho commented Oct 13, 2017 •

edited

Loading