length() isn't required for iterators #42

arghhhh · 2015-03-27T16:43:53Z

According to the Julia language docs, iterators do not need to support the length() method. The length() method is expected of "General Collections". The biggest problem with length() is that some iterators have infinite length() and this cannot be represented by any Integer.

Often iterators are functions of other iterators, and so their length is a function of other iterators. The technique of selectively declaring the length() method when it is known, and using applicable() to test for its presence doesn't work well in this case - how can you selectively define length() dependent on whether an argument type has it declared?

The cleanest easy solution to this is not to declare length for any iterators. A better solution would be to allow length() to return :unknown or :infinite for cases where the length cannot be determined without iterating through the entire sequence (eg reading tokens from a file), or when it is known to be unbounded. A set of basic math operators would need to be provided for combining finite lengths with either unknown or infinite lengths. But this would affect Julia beyond just this package.

julia> collect( product( chain( [1], [2] ), [ 3 ] ) )
ERROR: length has no method matching length(::Chain)

The text was updated successfully, but these errors were encountered:

simonster · 2015-03-27T18:48:23Z

Iterators that are functions of other iterators could be parametrized by whether they have a known length (e.g. Chain{false} has unknown length and Chain{true} has known length), and then we could define length only for Chain{true}. I believe we could do this without introducing type instability if we use a staged function to test the arguments for applicability of length when the iterator is constructed.

arghhhh · 2015-03-27T20:12:49Z

Interesting. This is a different solution to the one I was thinking of. The disadvantage is the baggage of having to define true and false versions etc. (I define a lot of iterators - they represent signals in signal processing applications) As for type stability, I believe that in some ways this is just moving the problem around - the test for length being available still needs to be made and this isn't that different that code the compiler put put in testing the type. My assumption is that length can be hoisted out of loops, so efficiency shouldn't be affected much. I use multiple return types a lot - outside of loops, and only for "good" reasons - I figure that the compiler generated code for testing this is as least as efficient as the code I would write to effect the same thing - effectively tagged unions - and I can have smaller and neater code exploiting this.

For the record, this is where I was going. It makes a lot more sense to use singleton types than the symbols I originally suggested - and then I can add specializations for the + operator.

type UnknownLength
end
+( lhs::UnknownLength, rhs ) = UnknownLength()
+( lhs, rhs::UnknownLength ) = UnknownLength()
+( lsh::UnknownLength, rhs::UnknownLength ) = UnknownLength()

Then chain() could just use sum() over map() of length() - and if any were UnknownLength() then the result is UnknownLength(). I have also in the past had the concept of infinite length which could be handled the same way.

However, I'm modifying my application code to not assume the presence of length() to avoid all these issues.

simonbyrne · 2015-07-01T15:40:05Z

A quick thought: since infinite length will typically be determined for a whole iterator type, what if we were to define length(::MyInfiniteIterator) = Inf?

Although it might seem a bit odd at first to use a float in this context, there are some advantages:

it would always compare longer than any finite-length chain, so the definition of Zip2 would work as expected, and we could avoid the need for TakeStrict by defining

length(t::Take) = min(length(t.xs), n)

Similarly, we could have

length(c::Chain) = sum([length(i) for i in c.xss])

We could distinguish between infinite-length iterators (whose behaviour is fairly predictable), and unknown-length iterators (whose behaviour can cause problems if they finish too early), which would continue to throw a MethodError.

stevengj · 2015-07-01T15:48:17Z

length(::MyInfiniteIterator) = Inf seems sensible to me.

simonbyrne · 2015-07-01T15:53:00Z

Also, since Inf - n == Inf for any finite n, this would also work for Drop:

length(d::Drop) = max(length(d.xs) - d.n, 0)

…for infinite iterators. See JuliaCollections/Iterators.jl#42

ScottPJones · 2015-07-01T17:30:10Z

I'd worry that the type instability this introduces would cause more problems than this really solves.

stevengj · 2015-07-01T18:16:46Z

This doesn't neecessarily introduce any type instability. length(::MyInfiniteIterator) would always return a Float64 (Inf), and length(::Array) would still always return an Int.

Type instability doesn't mean that a given function (length) must always return the same type. To be stable, you only need to return the same type for the same argument types.

It is true that length(::Any) would now return an unknown type, but that would only matter when type inference has already failed (so that the compiler doesn't know the type of the object you are taking the length of).

ScottPJones · 2015-07-01T18:22:02Z

Wouldn't we want to check if there are any places where code is using length(::Any)?
I have a suspicion that it might show up in places like ODBC, or DataFrames, where it is frequent to have rows with many different types, so you end up with a Vector{::Any}.

…for infinite iterators. See JuliaCollections/Iterators.jl#42

jdlangs · 2015-07-03T16:05:48Z

I think there should just be two different interfaces: an AbstractFiniteIterator where length is defined and AbstractIterator where its just start, next, and done. "Composite" iterators like Chain and Product should be parameterized on the class of iterator they hold and define length{I<:AbstractFiniteIterator}(c::Chain{I}) = ....

Is it possible to determine a single type parameter from a typejoin of the parameters in the constructor arguments?

ScottPJones · 2015-07-03T16:28:54Z

@phobon's suggestion sounds reasonable to me

ScottPJones · 2015-07-04T02:25:05Z

I've had to design iterators that could handle unknown or essentially infinite streams of data, for example,
you don't want to have to iterate over terabytes of a B+tree to find a length, and by the time you've done that, the length might not even be accurate any longer, if other processes are adding/deleting to/from the database.

simonbyrne · 2015-07-04T09:56:08Z

@ScottPJones I don't think anyone is proposing to make length iterate over the collection (for one thing, this would cause all sorts of problems with streaming data where you only get to iterate once).

ScottPJones · 2015-07-04T12:29:59Z

My point, muddled as it was, was just that in some cases, like the one I mentioned, it was impossible to determine in advance whether the iteration was of unknown length, or infinite, even for the same structure. Same thing is true for iterating over a filesystem. I didn't think people were proposing making length iterate of the entire collection, just that conceptually, to me, @phobon's idea seems correct, not changing length to try to return an out-of-band value. If you really want length to return an out-of-band value, why not just return -1 (since length returns an Int usually instead of a UInt which in and of itself is conceptually strange)

oxinabox · 2016-02-02T02:56:57Z

I think returing Inf would be unsemantic,
because you can have an iterator of unknown (and unknowable) length that is certainly finite:
For example

type ValuesThatSumToAtLeast100
end

Base.start(b::ValuesThatSumToAtLeast100) = 1
Base.done(b::ValuesThatSumToAtLeast100,state) = state>=100
function Base.next(b::ValuesThatSumToAtLeast100,state)
    value = rand(1:10)
    value, state+value
end

ValuesThatSumToAtLeast100() |> collect

This iterators length is always between 10 and 100 elements long

Better than returning some out of band value like Inf or -1
is having different parameterised types, as suggested in #42 (comment)
, or different interfaces as suggested in #42 (comment)

This is also more julian, since it makes it easy to define things using multiple dispatch later.

Keno · 2016-05-18T05:17:37Z

This is fixed on 0.5, since I declared Product as SizeUnknown, which is the only thing we can do, since the iteratorsize is a type property and we do no parameterize on types.

simonbyrne added a commit to JuliaLang/julia that referenced this issue Jul 1, 2015

Define length methods for various iterators, using Inf as length …

298204e

…for infinite iterators. See JuliaCollections/Iterators.jl#42

simonbyrne mentioned this issue Jul 1, 2015

RFC: Define length methods for various iterators, using Inf as length for infinite iterators JuliaLang/julia#11977

Closed

simonbyrne added a commit to JuliaLang/julia that referenced this issue Jul 1, 2015

Define length methods for various iterators, using Inf as length …

2369580

…for infinite iterators. See JuliaCollections/Iterators.jl#42

simonbyrne added a commit to JuliaLang/julia that referenced this issue Jul 1, 2015

Define length methods for various iterators, using Inf as length …

a1ce5b0

…for infinite iterators. See JuliaCollections/Iterators.jl#42

lmshk mentioned this issue Apr 17, 2016

no method matching length(::Iterators.Chain) #56

Closed

Keno closed this as completed May 18, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

length() isn't required for iterators #42

length() isn't required for iterators #42

arghhhh commented Mar 27, 2015

simonster commented Mar 27, 2015

arghhhh commented Mar 27, 2015

simonbyrne commented Jul 1, 2015

stevengj commented Jul 1, 2015

simonbyrne commented Jul 1, 2015

ScottPJones commented Jul 1, 2015

stevengj commented Jul 1, 2015

ScottPJones commented Jul 1, 2015

jdlangs commented Jul 3, 2015

ScottPJones commented Jul 3, 2015

ScottPJones commented Jul 4, 2015

simonbyrne commented Jul 4, 2015

ScottPJones commented Jul 4, 2015

oxinabox commented Feb 2, 2016

Keno commented May 18, 2016

length() isn't required for iterators #42

length() isn't required for iterators #42

Comments

arghhhh commented Mar 27, 2015

simonster commented Mar 27, 2015

arghhhh commented Mar 27, 2015

simonbyrne commented Jul 1, 2015

stevengj commented Jul 1, 2015

simonbyrne commented Jul 1, 2015

ScottPJones commented Jul 1, 2015

stevengj commented Jul 1, 2015

ScottPJones commented Jul 1, 2015

jdlangs commented Jul 3, 2015

ScottPJones commented Jul 3, 2015

ScottPJones commented Jul 4, 2015

simonbyrne commented Jul 4, 2015

ScottPJones commented Jul 4, 2015

oxinabox commented Feb 2, 2016

Keno commented May 18, 2016