RFC: Add support for keys(g::Generator) (fixes #27612) #27640
This gives us pairs(g), argmin(g), argmax(g), findmin(g), and findmax(g)
Should we also define …
Yes, I feel this exact tension that you described well: #27612 (comment)
Also note that …
Maybe we could support keys and indexing just for …
Maybe I'm misunderstanding, but in every case I've tried (including the examples you wrote) I can do this:
Doesn't that mean … As for "well-ordered": if the order of iteration is not defined and/or not deterministic, then this would return the first maximizer encountered. I suppose the worry here is that it is hard to say what "first" means in this case? Edit: I was thinking this could work on …
I think, generally, if a method takes an iterable, why shouldn't it support …
By independently iterable, I mean that you can iterate …

```julia
julia> g = (x for x in eachline("NEWS.md"))
Base.Generator{Base.EachLine,getfield(Main, Symbol("##11#12"))}(getfield(Main, Symbol("##11#12"))(), Base.EachLine(IOStream(<file NEWS.md>), getfield(Base, Symbol("##295#296"))(Core.Box(IOStream(<file NEWS.md>))), false))

julia> first(g.iter)
"Julia v0.7.0 Release Notes"

julia> first(g)
"=========================="

julia> collect(g.iter);

julia> collect(g)
0-element Array{Any,1}
```
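The NEWS.md example needs that file to be present. Below is a self-contained sketch of the same effect. It assumes an IOBuffer behaves like the file stream for this purpose, which should hold because eachline accepts any IO. Since g.iter and g share one underlying stream, consuming one advances the other.

```julia
# An in-memory IOBuffer stands in for the NEWS.md file stream (assumption).
io = IOBuffer("line 1\nline 2\nline 3\n")
g = (x for x in eachline(io))

a = first(g.iter)        # reads "line 1" from the shared stream
b = first(g)             # the generator now sees "line 2", not "line 1"
rest = collect(g.iter)   # drains what is left: ["line 3"]
leftover = collect(g)    # the stream is exhausted, so nothing remains for g
```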
Aha! Thanks for clarifying.
Maybe …
An alternative would be to define … Edit: In other words, don't define …
I'm torn on this one. The functionality seems very useful, but I'm not sure whether generators have indices. I'm also not sure …
All … We could imagine having a special case for iterators which are based on … A more limited approach would be to add a convenience wrapper iterator …
…tion of pairs(g::Generator)
The discussion here has prompted me to think about what a generator is. Here's an attempt (sorry it's so long!):
I briefly thought a generator is a lazy Dict, but that's wrong. Iterating a Dict yields Pair{K, V} … iterates, and … So a generator is more like an array. I think …
Unlike in #25999, where find* functions had to be restricted because not all collections have keys/indices, there is no need to restrict find* functions to a "safe" subset of generators because we always have …
As an experiment, I have updated this PR so that …
*1: @mbauman made a good point that some iterables modify underlying state and are not great candidates to be a stable keys collection. It would be great to have a way to restrict the keys collection to collections that are independently iterable, but I think this is currently hard to specify. FWIW, one can also get into trouble even applying …
*2: @nalimilan's suggestions (e.g. "inherit the indices of the original collection" and inventing linear indices like before #25999) remind me of the alternative way that some people were interpreting a 2-arg argmax discussed in #27613. Perhaps some people regard the expression …
That's a really interesting point.
I came here to mention that the indices might also be … However, that re-indexing operation is perhaps already expressed by …
Yes, it's interesting. Right now, I am worrying about the fact that …
We could punt the …

```julia
Base.keys(g::Base.Generator) = keys(g.iter)
Base.getindex(g::Base.Generator, i) = g.f(g.iter[i])
```

This fits quite nicely with the design we're trying out in #27038.
That's not right, though, since for the OP use case the goal here is to treat …
After the discussion, I believe …
I think … One follow-on observation derived from this is that lazy transforms may always be able to preserve their original key set (Generator -> g.iter, Pairs -> p.itr, Enumerate -> Count, Zip -> Zip(keys), ValueIterator -> keys(iter.dict)), but can't be certain to provide random access. Eager transforms (map, collect, for(each), tuple-splat / Core._apply), by contrast, always re-index, guaranteeing efficient random access as 1:n (sometimes also preserving the shape of higher-order dimensions, if defined and applicable), but no longer providing the same … (And there always seem to be counter-examples to any theory. For example, …
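Here is a small sketch of the lazy/eager distinction using only Base. The Dict and the matrix are illustrative choices and do not come from the discussion. The lazy pairs view keeps the Dict's Symbol keys, while collect re-indexes the elements as 1:n. Collecting a generator over a matrix, however, keeps its 2-D shape and therefore its Cartesian keys.

```julia
d = Dict(:a => 1, :b => 2)

# Lazy view: pairs(d) is keyed by the Dict's own keys (Symbols here).
lazy_keys = Set(keys(pairs(d)))

# Eager transform: collect produces a Vector of pairs, re-indexed from 1.
v = collect(d)
eager_keys = keys(v)

# Eager collect of a generator over a matrix preserves the 2-D shape,
# and it matches the equivalent comprehension.
A = [1 2; 3 4]
B = collect(x + 10 for x in A)
```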
It's not really a question of right or wrong; it's simply a matter of deciding how we define … Given its multidimensional behavior and symmetry with comprehensions, I'd say it's more array-like.
Yes, definitely array-like. But since dictionaries and arrays both have keys, I've been looking for a definitive statement about their differences. So far, I know that a dictionary has eltype Pair{K, V}, whereas an array keeps its keys separate from its values (e.g. it iterates only its values even though it also has keys). And arrays have dimensions. But none of this precludes arrays having nonstandard keys (e.g. OffsetArray), although there does seem to be a desire to keep indexing as predictable and well-behaved as possible. Perhaps this PR violates the spirit, if not the letter, of the dict/array divide. Edit: OK, spirit and letter, as pointed out below.
Arrays require their … My definition in #27640 (comment) is indeed wrong, though, because (as you note) dictionaries iterate over their pairs whereas arrays iterate over their values:

```julia
Base.keys(g::Base.Generator) = keys(g.iter)
Base.getindex(g::Base.Generator, i) = g.f(g.iter[i])

julia> g = (i for i in Dict((:a=>1,:b=>2)))
Base.Generator{Dict{Symbol,Int64},getfield(Main, Symbol("##7#8"))}(getfield(Main, Symbol("##7#8"))(), Dict(:a=>1,:b=>2))

julia> g[:a]
1

julia> g[:b]
2

julia> collect(g)
2-element Array{Pair{Symbol,Int64},1}:
 :a => 1
 :b => 2
```
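The sketch below makes the mismatch concrete without redefining Base methods. It restates the two definitions as plain functions; genkeys and genindex are hypothetical illustrative names, not Base API. Indexing by :a returns the Dict's value, 1, but iterating g never produces that value, because iteration yields pairs.

```julia
# Hypothetical stand-ins for the keys/getindex definitions quoted above.
genkeys(g::Base.Generator) = keys(g.iter)
genindex(g::Base.Generator, k) = g.f(g.iter[k])

g = (i for i in Dict(:a => 1, :b => 2))

ks = Set(genkeys(g))       # the Dict's keys, :a and :b
v = genindex(g, :a)        # goes through getindex on the Dict, giving 1
elems = Set(collect(g))    # iteration yields the pairs :a => 1 and :b => 2
```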
Thanks, that was very helpful! Well, in that case a generator isn't guaranteed to be array-like, because it is possible that neither …
What about just pointing people to packages like MappedArrays (or ZippedArrays, or ShiftedArrays, etc.) …
I think, most nearly, the difference is that for an Array, the key (aka index) associated with an element is defined by its order within that container (and thus it's sensible to say the array has dimensions, since the keys are ordered), whereas with a Dict, the key is instead a computation on the element. Like the definition for …
A good rule of thumb is that queries about an object should only deal with the semantics of that specific object, and not look too deeply inside its constituent objects. So if …
Another interesting hint:
If … With this PR, it would make sense to write … In summary: as a lazy object, a generator is a precursor to a collection, but we are afforded the choice of collection (e.g. …
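As a rough illustration of that choice, consider a lazy generator of pairs (an assumed example, not taken from the thread). It can be materialized as a Vector, whose keys are the positions 1:n, or as a Dict, whose keys come from the elements themselves.

```julia
g = (x => x^2 for x in 2:4)

as_vector = collect(g)   # keys are positions 1, 2, 3; elements are the pairs
as_dict = Dict(g)        # keys are the x values 2, 3, 4
```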
I don't think it's that simple. Currently …
Triage agrees that generators don't and shouldn't have indices. Regarding the …
Thanks everyone for weighing in, I learned a lot! The way forward is clear, and the 2-arg form will satisfy my use case very nicely.
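For reference, my understanding is that the 2-arg form later landed in Base in Julia 1.7. There, argmax(f, domain) and argmin(f, domain) return an element x of domain that maximizes (or minimizes) f(x). A minimal check, assuming Julia 1.7 or later:

```julia
# Assumes Julia >= 1.7, where argmax/argmin accept a function as the first argument.
itr = -3:2
xmax = argmax(x -> x^2, itr)          # returns the element x, not f(x)
xmin = argmin(x -> abs(x - 1), itr)
```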
As discussed here and here, this PR proposes that

argmax(f(x) for x ∈ itr)

should work and return an x in itr that maximizes f(x). This is closely related to the mathematical definition of argmax, and consistent with other current 1-arg usages of argmax, such as argmax(A::AbstractArray) and argmax(d::Dict), as pointed out here.

*: As pointed out by @Nosferican, this PR is not precisely the same as the mathematical definition of argmax, because it follows the existing pragmatic convention of returning a single maximizer instead of all of them. Discussion of this point can continue here.

For context, I also considered whether it would make sense to have a 2-arg flavor argmax(f, itr) that returns an x in itr that maximizes f(x). Helpful feedback from @ararslan and @martinholters persuaded me that this 2-arg syntax should be reserved for a different purpose, explained here.

The strategy taken here is to define keys(g::Generator). This gives us pairs(g), argmin(g), argmax(g), findmin(g), and findmax(g) for free. Here's a demo:

If there is support for this PR, then I will add tests and update the docs as well. I should probably also check the performance of this solution.
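The following sketch shows why keys is enough. Once a generator has a key => value view, findmax, findmin, argmax, and argmin are all reductions over that view. It assumes, as in the OP use case, that the keys of (f(x) for x in itr) are the elements x of itr. The names genpairs, genfindmax, and so on are hypothetical and were chosen to avoid overriding Base methods. This is not the PR's actual implementation.

```julia
# Hypothetical key => value view: each x in g.iter is the key of g.f(x).
genpairs(g::Base.Generator) = (x => g.f(x) for x in g.iter)

# Reduce over key => value pairs, returning (value, key) like findmax.
# isless is strict, so on ties the first maximizer encountered is kept.
function genfindmax(g::Base.Generator)
    best = nothing
    for (k, v) in genpairs(g)
        if best === nothing || isless(best[1], v)
            best = (v, k)
        end
    end
    best === nothing && throw(ArgumentError("collection must be non-empty"))
    return best
end

# Same reduction with the comparison flipped, returning (value, key) like findmin.
function genfindmin(g::Base.Generator)
    best = nothing
    for (k, v) in genpairs(g)
        if best === nothing || isless(v, best[1])
            best = (v, k)
        end
    end
    best === nothing && throw(ArgumentError("collection must be non-empty"))
    return best
end

# argmax/argmin just keep the key part of the reduction.
genargmax(g::Base.Generator) = genfindmax(g)[2]
genargmin(g::Base.Generator) = genfindmin(g)[2]
```

Ties resolve to the first maximizer encountered, which matches the single-maximizer convention mentioned above.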