-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Change indexin sentinel to nothing
#25662
Conversation
It's not an inference issue, it's due to #25553. I guess an intermediate solution would be to pre-allocate the resulting array with the correct type. |
This PR (and #23845) generalize |
Yes, but the other/main advantage of using |
Also, it would make sense to add a third argument indicating the value to return when there is no match. It would default to While we're at it, I wonder whether returning the smallest/first index wouldn't be more natural than the highest/last. The only reason to return the last index seems to be that the implementation is somewhat easier (indeed it's really simple), but it shouldn't be hard nor slow to keep the first index if it's already in the dict. Thoughts? |
It looks like |
Is it conceivable that an indexable container can be indexed by |
I believe this would be addressed by @nalimilan's proposed third argument. |
A dict could have |
Sorry, I should have been clearer but didn't want to come across as presumptuous. 😄 I was trying to understand if there was any reason why using an Optional ( The idea would be to force the user to handle the case where the value isn't present explicitly via unwrapping, rather implicitly by getting an error at the point when a value they get is passed to a method that doesn't accept Unlike propagation with The idea of a third argument would also work, but it would be nice if all of the methods that have this kind of potential-for-null were handled in similar ways. |
OK. The use of |
The conversation in that issue was enlightening. Thanks for the explanation and the link! |
Should #23845 be closed? |
base/array.jl
Outdated
|
||
julia> indexin(a,b) | ||
julia> indexin(a, b) | ||
6-element Array{Int64,1}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Element type needs to change.
base/array.jl
Outdated
bdict = Dict(zip(b, 1:length(b))) | ||
[get(bdict, i, 0) for i in a] | ||
function indexin(a, b::AbstractArray) | ||
bdict = Dict(zip(b, eachindex(b))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use keys(b)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even _pairs
, see my comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pairs
returns index=>value pairs; this needs them the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. I guess we could define _keys
similar to _pairs
, but for now using keys
should be enough, we can always support non-AbstractVector
arguments later.
8d97a94
to
ca1423c
Compare
Why do this for
In the interest of keeping this PR non-breaking (and non-controversial), I'm going to keep the current behavior here. But I would have no objection to this in a follow-up pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let's go with that and discuss other changes after. I think the first match vs. last match change has the highest priority since it's breaking.
base/array.jl
Outdated
[get(bdict, i, 0) for i in a] | ||
function indexin(a, b::AbstractArray) | ||
bdict = Dict(zip(b, keys(b))) | ||
map(i -> get(bdict, i, nothing), a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until #25553 this is going to return an Array{Any}
, but I guess it's OK as a temporary situation. We could specify the element type in advance, but then it would always be Union{T, Nothing}
even in the absence of nothing
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One benefit of map
is that it returns a scalar if passed a scalar argument (see here for previous discussion). So I think it makes sense to keep this as is and rely on #25553.
Also, I would have expected tests to fail due to the (currently) inaccurate types in the doctests, but it seems these doctests are not actually being tested.
We could specify the element type in advance, but then it would always be Union{T, Nothing} even in the absence of nothing.
I (obviously) don't understand inference, but I would have expected/hoped that the return type would always be Union{T, Nothing}
. (In particular, I changed the type of the second doctest as well so this would be the case, despite lack of nothing
in the output.) If this is not expected, then the return type of this function will not be inferred, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inference isn't involved here, we just use typejoin
to widen the element type as new elements are encountered. So no, the element type will always reflect what the array actually contains. This decision has been made so that inference can change without affecting user-visible behavior. We could force always returning a Union{T, Nothing}
array, but I'm not sure that's a good idea given that map
behaves differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this may be an appropriate place to type-hint it as Union{valuetype(bdict), Nothing}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean, just adding a type assertion? Wouldn't inference be able to figure that anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though @StefanKarpinski seemed to have a different opinion. Do you really want that behavior? At first returning a scalar made sense to me too, but thinking a bit more about the purpose of this function it's not so clear.
Anyway it would be easy to add a special method for Number
if we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into this more, I'm honestly not sure why Number
is treated differently than other data types for map
, collect
, etc. First, there is (at least in my opinion) an impedance mismatch between map
and collect
:
julia> map(i->i, 3)
3
julia> collect(3)
0-dimensional Array{Int64,0}:
3
Additionally, I find this funny because either of these operations on a Char
would result in a 1-d Array.
Also pinging @mbauman, as he had the opinion as well that it should return a scalar. As for me, I'm no longer convinced that it should (or is worth worrying about).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Char
is not HasShape
even though ndims
is defined for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for me, I'm no longer convinced that it should (or is worth worrying about).
After re-reading that issue and this one, I think I would agree. It stinks that we've never managed to remove iteration from numbers, but that's the real problem here. We've papered over that in piecemeal cases as we've noticed it. So my 👍 was just to continue that "papering over," but I'm becoming less convinced that's the direction we should go. And of course, this ties into the big "what's a scalar" broadcast issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better discuss this in a separate issue? I'd rather do whatever sounds reasonable here and see how we can find a consistent general pattern later.
nothing
nothing
ca1423c
to
412dffa
Compare
Sure, let's further discuss the above outdated-diff thread in a separate issue. |
I've filed #25998 to return the first rather than the last matching index. |
We need a NEWS entry for this silently breaking change |
#26067 for news |
This changes
indexin
's sentinel tonothing
, following #25472 (comment). Also includes the improvements in #23845 (which this pull request now supercedes).At the moment, inference seems to be broken for this, as the first example in the doctest results inArray{Any,1}
, hence the WIP title.