Separate AbstractString interface from iteration protocol #26133
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -247,6 +247,60 @@ setindex!(v::Pairs, value, key) = (v.data[key] = value; v) | |
get(v::Pairs, key, default) = get(v.data, key, default) | ||
get(f::Base.Callable, v::Pairs, key) = get(f, v.data, key) | ||
|
||
""" | ||
Iterators.Next(values::A, idx::eltype(I)=firstindex(values), itr::I=eachindex(values)) where {A,I} | ||
|
||
Returns a tuple of a value and the subsequent index. This iterator is useful for the | ||
implementation of variable-length encoded arrays where decoding the element and | ||
obtaining the offset or index of the next element generally involve the same computation. | ||
|
||
A default implementation is provided that simply iterates over `eachindex` and uses | ||
`getindex` to obtain the value corresponding to the index. It is allowed (and encouraged) | ||
to overload iteration for a specific `Next{A}` in order to provide a more efficient | ||
implementation that computes both in one step. | ||
|
||
The index in the last tuple will generally be equivalent to `lastindex(values)+1` | ||
though users should only rely on the fact that it is `> lastindex(values)` to allow | ||
implementations the flexibility to choose a different value. | ||
|
||
The `idx` argument provides a means by which to resume this iterator from a given index. | ||
The first value returned by the `Next` iterator should correspond to the element at `idx`. | ||
Please note that if you override iteration for `Next{A}` and your iteration state is not | ||
the next index, you will have to additionally overload `Next(data::A, idx, itr::I)` for | ||
four `A`. | ||
|
||
# Examples: | ||
|
||
julia> first(Next(['a','b','c'])) | ||
('a', 2) | ||
|
||
julia> first(Next(['a','b','c'], 3)) | ||
('c', 4) | ||
This seems like a somewhat confusing example. Would a more typical usage example be possible?
what confuses you about it? The purpose of the second example was to show that the last tuple will have an out-of-bounds index. Would adding |
||
""" | ||
struct Next{A, I} | ||
data::A | ||
itr::I | ||
Next{A}(data::A, itr::I) where {A, I} = new{A, I}(data, itr) | ||
end | ||
Next(data, idx, itr) = Rest(Next{typeof(data)}(data, itr), idx) | ||
Next(data, idx) = Next(data, idx, eachindex(data)) | ||
Next(data) = Next{typeof(data)}(data, eachindex(data)) | ||
|
||
start(lip::Next) = start(lip.itr) | ||
done(lip::Next, state) = done(lip.itr, state) | ||
function next(lip::Next, state) | ||
nidx = ns = next(lip.itr, state) | ||
# A bit awkward now, done for consistency with the new iteration protocol | ||
done(lip.itr, ns) && (nidx = lastindex(lip.itr)+1) | ||
(lip.data[ns], nidx), ns | ||
end | ||
|
||
length(lip::Next) = length(lip.itr) | ||
eltype(::Type{Next{A, I}}) where {A, I} = Tuple{eltype(A), eltype(I)} | ||
|
||
IteratorSize(::Type{<:Next{A, I}}) where {A, I} = IteratorSize(I) | ||
IteratorEltype(::Type{<:Next{A, I}}) where {A, I} = IteratorEltype(I) | ||
|
||
# zip | ||
|
||
abstract type AbstractZipIterator end | ||
|
@@ -1070,6 +1124,7 @@ end | |
function fixpoint_iter_type(itrT::Type, valT::Type, stateT::Type) | ||
nextvalstate = Base._return_type(next, Tuple{itrT, stateT}) | ||
nextvalstate <: Tuple{Any, Any} || return Any | ||
nextvalstate === Union{} && return Union{} | ||
Unrelated?
Mostly? Prevents things from erroring in the wrong place if you mess up how to do iteration. |
||
nextvalstate = Tuple{ | ||
typejoin(valT, fieldtype(nextvalstate, 1)), | ||
typejoin(stateT, fieldtype(nextvalstate, 2))} | ||
|
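The "value plus subsequent index" idea described in the `Next` docstring can be sketched in current Julia, which uses `iterate` rather than the `start`/`next`/`done` protocol shown in this diff. This is only an illustration under that assumption: `ValueAndNextIndex` is a hypothetical name, not the PR's `Iterators.Next`, and it relies on `nextind` being defined for the wrapped collection.

```julia
# Hypothetical illustration type; NOT the `Iterators.Next` from the diff.
# Each element is a tuple (value, index of the following element).
struct ValueAndNextIndex{A}
    data::A
    start::Int   # index at which iteration begins (allows resuming)
end
ValueAndNextIndex(data) = ValueAndNextIndex(data, firstindex(data))

function Base.iterate(it::ValueAndNextIndex, i::Int = it.start)
    i > lastindex(it.data) && return nothing
    nidx = nextind(it.data, i)          # index of the following element
    return (it.data[i], nidx), nidx     # the next index doubles as the state here
end

# The number of elements is not known up front for variable-length encodings.
Base.IteratorSize(::Type{<:ValueAndNextIndex}) = Base.SizeUnknown()
Base.eltype(::Type{ValueAndNextIndex{A}}) where {A} = Tuple{eltype(A), Int}

first(ValueAndNextIndex(['a', 'b', 'c']))      # ('a', 2)
first(ValueAndNextIndex(['a', 'b', 'c'], 3))   # ('c', 4)
collect(ValueAndNextIndex("aé"))               # [('a', 2), ('é', 4)]
```

In the string case the final index is 4 while `lastindex("aé")` is 2, which matches the docstring's advice to rely only on the last index being `> lastindex(values)`.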
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
# A specialized iterator for EachIndex of strings | ||
struct EachStringIndex{T<:AbstractString} | ||
s::T | ||
Somewhat tangential, but while we're moving this type, let's spell the field out as
Can we have |
||
end | ||
keys(s::AbstractString) = EachStringIndex(s) | ||
|
||
length(e::EachStringIndex) = length(e.s) | ||
first(::EachStringIndex) = 1 | ||
last(e::EachStringIndex) = lastindex(e.s) | ||
eltype(::Type{<:EachStringIndex}) = Int | ||
|
||
# Iteration over StringNext | ||
# | ||
# Any new subtype of AbstractString should override | ||
# | ||
# next(::StringNext{MyString}, state) | ||
# | ||
# to provide iteration over the string and its indices. All other iteration methods, | ||
# including iteration over strings, iteration over pairs, indexing into string, | ||
# iteration over indices alone are derived from this method. | ||
|
||
const StringNext{T<:AbstractString} = Iterators.Next{T, EachStringIndex{T}} | ||
StringNext(x::T) where {T<:AbstractString} = Next(x) | ||
StringNext(x::T, idx) where {T<:AbstractString} = Next(x, idx) | ||
StringNext(x::T, idx, itr) where {T<:AbstractString} = Next(x, idx, itr) | ||
|
||
start(sp::StringNext) = 1 | ||
Should it be The more places the
Part of the goal of this change is to separate the assumption currently baked into the string code that indices and iteration state of strings are the same thing. We want indices to always be
I've been trying out various string implementations as a way to understand what the abstract interface contract is. If Is it also to be assumed that I have a
julia> s = LazyJSON.SplicedString("Foo", " ", "Bar")
"Foo Bar"
julia> [keys(s)...]
7-element Array{Int64,1}:
1
2
3
1099511627777
2199023255553
2199023255554
2199023255555
julia> [(i >> 40, i & (2^40-1)) for i in keys(s)]
7-element Array{Tuple{Int64,Int64},1}:
(0, 1)
(0, 2)
(0, 3)
(1, 1)
(2, 1)
(2, 2)
(2, 3)
I'd be fine relaxing this constraint, but I don't care too much.
This one I do care about.
I mean, you can do various kinds of encoding here, this one included, but I'd rather go the direction where strings can have non-standard index kinds, but all still need to support linear indexing no matter how slow it is. |
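The index encoding discussed in this thread packs a fragment number into the high bits of an `Int` and a within-fragment offset into the low 40 bits. A minimal sketch of that packing follows. It assumes a 64-bit `Int`, and the helper names `pack_index` and `unpack_index` are hypothetical; they are not part of LazyJSON or Base.

```julia
# Hypothetical helpers illustrating the (fragment, offset) index encoding
# from the comment above. Assumes Int is 64 bits wide.
const OFFSET_BITS = 40
const OFFSET_MASK = (1 << OFFSET_BITS) - 1

# Combine a fragment number and an offset into one integer index.
pack_index(fragment::Int, offset::Int) = (fragment << OFFSET_BITS) | offset

# Recover (fragment, offset); same arithmetic as `(i >> 40, i & (2^40-1))`.
unpack_index(i::Int) = (i >> OFFSET_BITS, i & OFFSET_MASK)

pack_index(1, 1)               # 1099511627777
unpack_index(2199023255555)    # (2, 3)
```

Indices produced this way are valid `Int`s but are not contiguous code-unit offsets, which is the point of contention in the replies: such a string still has to answer for plain linear indices.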
||
function done(s::StringNext, i) | ||
if isa(i, Integer) | ||
return i > ncodeunits(s.data) | ||
else | ||
throw(MethodError(done, (s, i))) | ||
end | ||
end | ||
function next(s::StringNext, i) | ||
if isa(i, Integer) && !isa(i, Int) | ||
return next(s, Int(i)) | ||
else | ||
throw(MethodError(next, (s, i))) | ||
end | ||
end | ||
|
||
# Derive iteration over pairs from `StringNext` | ||
const StringPairs{T<:AbstractString} = Iterators.Pairs{Int, Char, EachStringIndex{T}, T} | ||
According to @Keno this is:
There should probably be a comment to that effect.
Can it be written something like
No, I don't think that will work since there is no actual |
||
StringPairs{T}(x::T) where {T<:AbstractString} = Iterators.Pairs(x, eachindex(x)) | ||
StringPairs(x::T) where {T<:AbstractString} = StringPairs{T}(x) | ||
|
||
Iterators.pairs(s::AbstractString) = StringPairs(s) | ||
|
||
start(e::StringPairs) = (firstindex(e.data), start(StringNext(e.data))) | ||
done(e::StringPairs, (idx, state)) = done(StringNext(e.data), state) | ||
function next(s::StringPairs, (idx, state)) | ||
((c, nidx), state) = next(StringNext(s.data), state) | ||
Pair(idx, c), (nidx, state) | ||
end | ||
|
||
# Derive reverse pair iteration. | ||
# N.B. String implementers may wish to override | ||
# | ||
# next(s::Iterators.Reverse{<:StringPairs}, idx) | ||
# | ||
# to provide efficient variable-length reverse decoding | ||
Iterators.reverse(s::StringPairs) = Iterators.Reverse(s) | ||
|
||
start(e::Iterators.Reverse{<:StringPairs}) = ncodeunits(e.itr.data)+1 | ||
done(e::Iterators.Reverse{<:StringPairs}, idx) = idx == firstindex(e.itr.data) | ||
function next(s::Iterators.Reverse{<:StringPairs}, idx) | ||
tidx = thisind(s.itr.data, idx-1) | ||
(c, nidx) = first(Next(s.itr.data, tidx)) | ||
Pair(tidx, c), tidx | ||
end | ||
|
||
function prev(s::AbstractString, idx) | ||
(i, c), _ = next(Iterators.Reverse(StringPairs(s)), idx) | ||
(c, i) | ||
end | ||
|
||
|
||
# Derive iteration over strings from `StringNext` | ||
start(s::AbstractString) = start(StringNext(s)) | ||
done(s::AbstractString, state) = done(StringNext(s), state) | ||
function next(s::AbstractString, state) | ||
((c, _), state) = next(StringNext(s), state) | ||
(c, state) | ||
end | ||
|
||
eltype(::Type{<:AbstractString}) = Char | ||
sizeof(s::AbstractString) = ncodeunits(s) * sizeof(codeunit(s)) | ||
firstindex(s::AbstractString) = 1 | ||
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)) | ||
|
||
function getindex(s::AbstractString, i::Integer) | ||
@boundscheck checkbounds(s, i) | ||
@inbounds return isvalid(s, i) ? first(first(Next(s, i))) : string_index_err(s, i) | ||
end | ||
|
||
getindex(s::AbstractString, i::Colon) = s | ||
# TODO: handle other ranges with stride ±1 specially? | ||
# TODO: add more @propagate_inbounds annotations? | ||
getindex(s::AbstractString, v::AbstractVector{<:Integer}) = | ||
sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v)) | ||
getindex(s::AbstractString, v::AbstractVector{Bool}) = | ||
throw(ArgumentError("logical indexing not supported for strings")) | ||
|
||
function get(s::AbstractString, i::Integer, default) | ||
# TODO: use ternary once @inbounds is expression-like | ||
if checkbounds(Bool, s, i) | ||
@inbounds return s[i] | ||
else | ||
return default | ||
end | ||
end | ||
|
||
# Derive iteration over indices from `StringNext` | ||
start(e::EachStringIndex) = start(StringPairs(e.s)) | ||
done(e::EachStringIndex, state) = done(StringPairs(e.s), state) | ||
function next(e::EachStringIndex, state) | ||
((idx, _), state) = next(StringPairs(e.s), state) | ||
(idx, state) | ||
end |
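The layering in this file (characters, index/character pairs, and indices all derived from a single "decode and advance" method) can be sketched in current Julia as follows. This is an illustration only: `char_and_next`, `foreach_char`, and the `derived_*` functions are hypothetical names, and the sketch uses plain loops instead of the `start`/`next`/`done` methods in the diff.

```julia
# Hypothetical primitive: decode the character at index `i` and report the
# index of the following character. A custom string type would specialize this.
char_and_next(s::AbstractString, i::Int) = (s[i], nextind(s, i))

# Generic driver: call `f(index, char)` for each character using only the primitive.
function foreach_char(f, s::AbstractString)
    i = firstindex(s)
    while i <= ncodeunits(s)
        c, nidx = char_and_next(s, i)
        f(i, c)
        i = nidx
    end
    return nothing
end

# Everything else is derived from the driver.
function derived_chars(s::AbstractString)
    out = Char[]
    foreach_char((i, c) -> push!(out, c), s)
    return out
end

function derived_indices(s::AbstractString)
    out = Int[]
    foreach_char((i, c) -> push!(out, i), s)
    return out
end

function derived_pairs(s::AbstractString)
    out = Pair{Int,Char}[]
    foreach_char((i, c) -> push!(out, i => c), s)
    return out
end

derived_chars("aé!")     # ['a', 'é', '!']
derived_indices("aé!")   # [1, 2, 4]
derived_pairs("aé!")     # [1 => 'a', 2 => 'é', 4 => '!']
```

The design choice mirrored here is that a new string type only has to provide the one primitive; the three derived views then agree with each other by construction.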
What does "for four `A`" mean here?
typo