RFC: Change indexin sentinel to `nothing` #25662

garrison · 2018-01-20T22:31:04Z

This changes indexin's sentinel to nothing, following #25472 (comment). It also includes the improvements in #23845, which this pull request now supersedes.
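A minimal before/after sketch of the sentinel change, adapted from the diffs reviewed in this thread. The names `indexin_zero` and `indexin_nothing` are hypothetical stand-ins for the old and new `indexin` methods. It assumes Julia 1.x element-type widening. Like the code in this PR, both versions return the *last* matching index.

```julia
# Old behaviour (hypothetical name): 0 marks "not found".
function indexin_zero(a, b::AbstractArray)
    bdict = Dict(zip(b, 1:length(b)))   # value => linear index; later duplicates overwrite earlier ones
    [get(bdict, i, 0) for i in a]
end

# New behaviour (hypothetical name): `nothing` marks "not found".
function indexin_nothing(a, b::AbstractArray)
    bdict = Dict(zip(b, keys(b)))       # value => index; later duplicates overwrite earlier ones
    map(i -> get(bdict, i, nothing), a)
end

a = ['a', 'b', 'c', 'b', 'd', 'a']
b = ['a', 'b', 'c']
indexin_zero(a, b)      # [1, 2, 3, 2, 0, 1]
indexin_nothing(a, b)   # [1, 2, 3, 2, nothing, 1]
```

With `nothing` as the sentinel, a miss can no longer be confused with a valid index. This matters for arrays whose indices do not start at 1.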

~~At the moment, inference seems to be broken for this, as the first example in the doctest results in Array{Any,1}, hence the WIP title.~~

nalimilan · 2018-01-20T23:01:29Z

It's not an inference issue, it's due to #25553. I guess an intermediate solution would be to pre-allocate the resulting array with the correct type.
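The pre-allocation workaround can be sketched as follows, assuming Julia 1.x. `indexin_prealloc` is a hypothetical name. The element type is fixed up front as `Union{Nothing, valtype(bdict)}`, so it no longer depends on which elements happen to match. The trade-off is that the result is always a `Vector` of that `Union`, even when every lookup succeeds.

```julia
function indexin_prealloc(a::AbstractArray, b::AbstractArray)
    bdict = Dict(zip(b, keys(b)))
    # Fix the element type in advance instead of letting `map` widen it.
    result = Vector{Union{Nothing, valtype(bdict)}}(undef, length(a))
    for (k, x) in enumerate(a)          # k runs over 1:length(a)
        result[k] = get(bdict, x, nothing)
    end
    return result
end

r = indexin_prealloc([1, 2], [1, 2, 3])
eltype(r)   # Union{Nothing, Int64}, even though no `nothing` occurs
```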

nalimilan · 2018-01-20T23:11:32Z

BTW, regarding support for arbitrary iterators, raised in #23845, you could use the recently introduced _pairs function (see e.g. #25655).

garrison · 2018-01-21T03:07:57Z

This PR (and #23845) generalize indexin so the first argument can be any iterable. If I understand you correctly, you are suggesting using _pairs to also allow the second argument to be any iterable (i.e., bdict = Dict(j => i for (i, j) in _pairs(b))). This could indeed be a possibility. One thing to note is that the phrase "highest index" in the docstring should be changed to "last index" as well, since what really matters is the order the indices are iterated in, not the relative ordering of their values.
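The reversed-pairs idea can be sketched with the public `pairs` in place of the internal `_pairs` helper, which is not reproduced here. For an iterable that has no keys, `enumerate` serves as a stand-in. The sketch also shows why "last index" is the accurate wording: the `Dict` keeps whichever index was visited most recently, whatever its numeric value.

```julia
b = [10, 20, 10]
# `pairs(b)` yields index => value; reverse each pair to build value => index.
bdict = Dict(v => k for (k, v) in pairs(b))
bdict[10]   # 3: the later occurrence overwrites the earlier one

# For an iterable without indices, enumeration order defines "last".
s = (c for c in "abca")
sdict = Dict(c => k for (k, c) in enumerate(s))
sdict['a']  # 4
```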

nalimilan · 2018-01-21T15:02:54Z

> If I understand you correctly, you are suggesting using _pairs to also allow the second argument to be any iterable (i.e., bdict = Dict(j => i for (i, j) in _pairs(b))).

Yes, but the other/main advantage of using _pairs instead of eachindex would be that the most "natural" index type would be returned (e.g. cartesian indices for matrices and higher-dimensional arrays instead of linear indices). This would be consistent with what other find* functions now do.
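The following shows the difference in index type for a matrix, assuming Julia 1.x semantics for `eachindex`, `pairs`, and `findfirst` on a plain `Matrix`:

```julia
B = [1 3; 2 4]                      # 2×2 Matrix{Int}, stored column-major
# eachindex on a plain Matrix yields fast linear indices ...
linear = Dict(B[i] => i for i in eachindex(B))
linear[4]                           # 4
# ... while pairs yields CartesianIndex keys, matching what findfirst returns.
cart = Dict(v => k for (k, v) in pairs(B))
cart[4]                             # CartesianIndex(2, 2)
findfirst(==(4), B)                 # CartesianIndex(2, 2)
```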

nalimilan · 2018-01-21T15:10:18Z

Also, it would make sense to add a third argument indicating the value to return when there is no match. It would default to nothing, but it could be set e.g. to 0, which is sometimes useful. R's equivalent match function supports it (and defaults to NA, which is kind of equivalent to nothing in the present case given that R doesn't have a scalar nothing). That would also make the replacement trivial to write in Compat.
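The proposed third argument, analogous to the `nomatch` argument of R's `match`, can be sketched as below. `indexin_default` is a hypothetical signature, not one that Base provides. Passing `0` reproduces the old sentinel, which is what would make a Compat shim trivial.

```julia
# Hypothetical signature: `default` is returned for elements of `a` not found in `b`.
function indexin_default(a, b::AbstractArray, default = nothing)
    bdict = Dict(zip(b, keys(b)))
    map(x -> get(bdict, x, default), a)
end

indexin_default([3, 7], [1, 2, 3])      # [3, nothing]
indexin_default([3, 7], [1, 2, 3], 0)   # [3, 0], like the old sentinel
```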

While we're at it, I wonder whether returning the smallest/first index wouldn't be more natural than the highest/last. The only reason to return the last index seems to be that the implementation is somewhat easier (indeed it's really simple), but it shouldn't be hard or slow to keep the first index if it's already in the dict. Thoughts?
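Keeping the first index is indeed cheap. Here is a sketch using `get!`, which inserts a key only when it is absent. `first_index_dict` is a hypothetical helper and is not necessarily how the follow-up change implemented it.

```julia
# Hypothetical helper: record only the first index at which each value occurs.
function first_index_dict(b::AbstractArray)
    bdict = Dict{eltype(b), eltype(keys(b))}()
    for (k, v) in pairs(b)
        get!(bdict, v, k)   # stores k only if v is not already a key
    end
    return bdict
end

d = first_index_dict([10, 20, 10])
d[10]   # 1, whereas Dict(zip(b, keys(b))) would give 3
```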

garrison · 2018-01-21T16:46:28Z

It looks like indexin was originally meant to be a substitute for Matlab's ismember function, which itself returns the lowest index. I agree that it is more logical, the main downside being that it is an additional breaking change.

yurivish · 2018-01-21T17:02:31Z

Is it conceivable that an indexable container can be indexed by nothing?

garrison · 2018-01-21T17:38:23Z

> Is it conceivable that an indexable container can be indexed by nothing?

I believe this would be addressed by @nalimilan's proposed third argument.

nalimilan · 2018-01-21T18:12:56Z

A dict could have nothing as a valid key, but I'm not sure what the question is.

yurivish · 2018-01-21T18:51:37Z

Sorry, I should have been clearer but didn't want to come across as presumptuous. 😄 I was trying to understand if there was any reason why using an Optional (Union{Nothing, Some{T}}) wouldn't make sense.

The idea would be to force the user to handle the case where the value isn't present explicitly, via unwrapping, rather than implicitly, by getting an error at the point where a value they get is passed to a method that doesn't accept Nothing as an argument.

Unlike propagation with missing, the software engineer's null typically wants to be handled close to its origin, and forcing the user to unwrap a Some means that they've also explicitly considered what to do in the nothing case.

The idea of a third argument would also work, but it would be nice if all of the methods that have this kind of potential-for-null were handled in similar ways.

nalimilan · 2018-01-21T19:41:08Z

OK. The use of Some has been discussed at JuliaLang/Juleps#47, and we've decided against it because it would be too annoying when working with arrays. In general we're quite consistent in using simply Union{T, Nothing} everywhere, and Union{Some{T}, Nothing} only in cases where Nothing<:T.
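The following illustrates when the `Some` wrapper is actually needed, assuming Julia 1.x. `lookup` is a hypothetical helper, not a Base function. Once a `Dict` may store `nothing` as a value, a bare `nothing` can no longer distinguish "key absent" from "key present with value nothing".

```julia
# Values may legitimately be `nothing`, so Nothing <: valtype(d).
d = Dict{Symbol, Union{Nothing, Int}}(:a => nothing)

# Hypothetical helper: wrap hits in Some so a stored `nothing` differs from a miss.
lookup(d, k) = haskey(d, k) ? Some(d[k]) : nothing

lookup(d, :a)               # Some(nothing): key present, value is nothing
lookup(d, :b)               # nothing: key absent
something(lookup(d, :a))    # nothing (the unwrapped value)
```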

yurivish · 2018-01-21T21:31:17Z

The conversation in that issue was enlightening. Thanks for the explanation and the link!

JeffBezanson · 2018-01-22T20:09:12Z

Should #23845 be closed?

JeffBezanson · 2018-01-22T20:12:52Z

base/array.jl


-julia> indexin(a,b)
+julia> indexin(a, b)
 6-element Array{Int64,1}:


Element type needs to change.

JeffBezanson · 2018-01-22T20:13:20Z

base/array.jl

-    bdict = Dict(zip(b, 1:length(b)))
-    [get(bdict, i, 0) for i in a]
+function indexin(a, b::AbstractArray)
+    bdict = Dict(zip(b, eachindex(b)))


Should use keys(b).

Or even _pairs, see my comment above.

pairs returns index=>value pairs; this needs them the other way around.

Ah, yes. I guess we could define _keys similar to _pairs, but for now using keys should be enough; we can always support non-AbstractVector arguments later.

garrison · 2018-01-26T03:04:17Z

> Also, it would make sense to add a third argument indicating the value to return when there is no match.

Why do this for indexin but not for findfirst, findlast, findnext, and findprev? Perhaps it would make sense to add an optional final argument to all of these in a subsequent pull request?
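In Julia 1.x, `findfirst` returns `nothing` on no match. One alternative to an extra argument on every find* function is therefore to compose it with `something`, which returns its first non-`nothing` argument. This is a sketch of that option, not a decision from this thread.

```julia
v = [4, 5, 6]
# findfirst returns `nothing` when there is no match ...
findfirst(==(9), v)                 # nothing
# ... and `something` supplies a default without adding an argument to findfirst.
something(findfirst(==(9), v), 0)   # 0
something(findfirst(==(5), v), 0)   # 2
```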

> While we're at it, I wonder whether returning the smallest/first index wouldn't be more natural than the highest/last.

In the interest of keeping this PR non-breaking (and non-controversial), I'm going to keep the current behavior here. But I would have no objection to this in a follow-up pull request.

nalimilan

Sure, let's go with that and discuss other changes after. I think the first match vs. last match change has the highest priority since it's breaking.

nalimilan · 2018-01-26T09:55:44Z

base/array.jl

-    [get(bdict, i, 0) for i in a]
+function indexin(a, b::AbstractArray)
+    bdict = Dict(zip(b, keys(b)))
+    map(i -> get(bdict, i, nothing), a)


Until #25553 this is going to return an Array{Any}, but I guess it's OK as a temporary situation. We could specify the element type in advance, but then it would always be Union{T, Nothing} even in the absence of nothing.

One benefit of map is that it returns a scalar if passed a scalar argument (see here for previous discussion). So I think it makes sense to keep this as is and rely on #25553.

Also, I would have expected tests to fail due to the (currently) inaccurate types in the doctests, but it seems these doctests are not actually being tested.

> We could specify the element type in advance, but then it would always be Union{T, Nothing} even in the absence of nothing.

I (obviously) don't understand inference, but I would have expected/hoped that the return type would always be Union{T, Nothing}. (In particular, I changed the type of the second doctest as well so that this would be the case, despite the lack of nothing in the output.) If this is not expected, then the return type of this function will not be inferred, no?

Inference isn't involved here, we just use typejoin to widen the element type as new elements are encountered. So no, the element type will always reflect what the array actually contains. This decision has been made so that inference can change without affecting user-visible behavior. We could force always returning a Union{T, Nothing} array, but I'm not sure that's a good idea given that map behaves differently.
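The following shows the value-dependent widening described here, assuming Julia 1.x. There, the widening step yields `Union{Nothing, Int64}` rather than `Any` when `nothing` is mixed with integers. `lookup_or_nothing` is an illustrative name.

```julia
bdict = Dict(1 => 10, 2 => 20)
lookup_or_nothing(x) = get(bdict, x, nothing)

map(lookup_or_nothing, [1, 2])   # Vector{Int64}: every lookup hit
map(lookup_or_nothing, [1, 3])   # Vector{Union{Nothing, Int64}}: widened once `nothing` appears
map(lookup_or_nothing, [3, 4])   # Vector{Nothing}: no lookup hit
```

The element type therefore depends on the data rather than on inference, which is exactly why it cannot be guaranteed to be `Union{T, Nothing}`.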

Seems like this may be an appropriate place to type-hint it as Union{valuetype(bdict), Nothing}

You mean, just adding a type assertion? Wouldn't inference be able to figure that anyway?

Though @StefanKarpinski seemed to have a different opinion. Do you really want that behavior? At first returning a scalar made sense to me too, but thinking a bit more about the purpose of this function it's not so clear.

Anyway it would be easy to add a special method for Number if we want.
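A special method for `Number` could look like the sketch below. `indexin_sketch` is a hypothetical name, and Base does not define these methods. The generic method always builds an array, and the `Number` method unwraps the single result.

```julia
# Generic method: always returns an array of indices (or `nothing`).
function indexin_sketch(a, b::AbstractArray)
    bdict = Dict(zip(b, keys(b)))
    [get(bdict, x, nothing) for x in a]
end

# Special case: a scalar query returns a scalar index (or `nothing`).
indexin_sketch(a::Number, b::AbstractArray) = indexin_sketch((a,), b)[1]

indexin_sketch(2, [1, 2, 3])     # 2
indexin_sketch([2], [1, 2, 3])   # [2]
```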

Looking into this more, I'm honestly not sure why Number is treated differently than other data types for map, collect, etc. First, there is (at least in my opinion) an impedance mismatch between map and collect:

julia> map(i->i, 3)
3

julia> collect(3)
0-dimensional Array{Int64,0}:
3

Additionally, I find this funny because either of these operations on a Char would result in a 1-d Array.

Also pinging @mbauman, as he had the opinion as well that it should return a scalar. As for me, I'm no longer convinced that it should (or is worth worrying about).

Looks like Char is not HasShape even though ndims is defined for it.

> As for me, I'm no longer convinced that it should (or is worth worrying about).

After re-reading that issue and this one, I think I would agree. It stinks that we've never managed to remove iteration from numbers, but that's the real problem here. We've papered over that in piecemeal cases as we've noticed it. So my 👍 was just to continue that "papering over," but I'm becoming less convinced that's the direction we should go. And of course, this ties into the big "what's a scalar" broadcast issue.

Better discuss this in a separate issue? I'd rather do whatever sounds reasonable here and see how we can find a consistent general pattern later.

garrison · 2018-01-28T22:23:14Z

Sure, let's further discuss the above outdated-diff thread in a separate issue.

nalimilan · 2018-02-11T21:56:37Z

I've filed #25998 to return the first rather than the last matching index.

andreasnoack · 2018-02-15T08:08:08Z

We need a NEWS entry for this silently breaking change.

ararslan · 2018-02-15T19:45:58Z

#26067 for news

garrison added 3 commits January 20, 2018 17:12

Improve indexin docstring and tests

b73d122

Make indexin first argument accept any iterable

bf27146

Improve formatting of indexin doctest/example

8aca8ce

garrison added the search & find label (The find* family of functions) Jan 20, 2018

JeffBezanson reviewed Jan 22, 2018


JeffBezanson reviewed Jan 22, 2018


JeffBezanson added this to the 1.0 milestone Jan 23, 2018

garrison force-pushed the jrg/indexin-sentinel branch from 8d97a94 to ca1423c Compare January 26, 2018 03:01

nalimilan approved these changes Jan 26, 2018


garrison changed the title ~~WIP: Change indexin sentinel to nothing~~ RFC: Change indexin sentinel to nothing Jan 26, 2018

Change indexin sentinel to nothing

412dffa

garrison force-pushed the jrg/indexin-sentinel branch from ca1423c to 412dffa Compare January 28, 2018 22:04

JeffBezanson approved these changes Jan 28, 2018


JeffBezanson merged commit da3b862 into master Jan 29, 2018

JeffBezanson deleted the jrg/indexin-sentinel branch January 29, 2018 01:50

nalimilan mentioned this pull request Feb 2, 2018

API consistency review #20402

Closed

19 tasks

nalimilan mentioned this pull request Feb 11, 2018

Change indexin() to return first rather than last matching index #25998

Merged

andreasnoack added the needs news label (A NEWS entry is required for this change) Feb 15, 2018

andreasnoack mentioned this pull request Feb 15, 2018

Clean up the log JuliaLang/METADATA.jl#13367

Merged

ararslan mentioned this pull request Feb 15, 2018

Add a NEWS entry for find* sentinel change #26067

Merged

KristofferC removed the needs news label (A NEWS entry is required for this change) Nov 13, 2018
