RFC: Define length methods for various iterators, using Inf as length for infinite iterators #11977

Closed
wants to merge 3 commits into from


simonbyrne
Contributor

See JuliaCollections/Iterators.jl#42 for some discussion.

@@ -134,6 +135,7 @@ end
take(xs, n::Int) = Take(xs, n)

eltype(it::Take) = eltype(it.xs)
length(it::Take) = min(length(it.xs), n)
Member

Should this be

length(it::Take) = min(length(it.xs), it.n)

?

@quinnj
Member

quinnj commented Jul 1, 2015

Is it weird that length() normally returns an Int, but Inf is a float?

@simonbyrne simonbyrne force-pushed the sb/iterator-length branch from 298204e to 2369580 Compare July 1, 2015 17:01
@simonbyrne
Contributor Author

@quinnj Yes, I agree it's a bit weird (that's why I made it an RFC), but it does seem to work well for these cases. It would be good to know of possible downsides.

@quinnj
Member

quinnj commented Jul 1, 2015

I'd worry about cases where one might do

x = 0
for i = 1:length(itr)  # --> creates float range
    x += i  # --> type instabilities
end

Not sure if that's a huge case or even a plausible workflow with some iterators, but just comes to mind.
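A minimal sketch of the concern above, written in current Julia (1.x) syntax rather than the 0.4 syntax of the thread. The function name `sum_indices` is hypothetical; a finite `Float64` bound stands in for a float-valued length so that the loop terminates.

```julia
# Hypothetical helper: sums 1:n, where n plays the role of length(itr).
function sum_indices(n)
    x = 0            # x starts out as an Int
    for i in 1:n     # a Float64 bound turns this into a Float64 range
        x += i       # x is rebound to a Float64 on the first iteration
    end
    return x
end

@assert sum_indices(3) === 6      # Int bound: stays an Int
@assert sum_indices(3.0) === 6.0  # Float64 bound: result is a Float64
@assert sum_indices(0.0) === 0    # empty float range: x is never rebound
```

With a float bound, the type of `x` depends on whether the loop body ever runs, which is the kind of instability being described.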

@ScottPJones
Contributor

Would it work to use typemax instead?
For floats, that gives Inf, and has no type instability problems.

@jdlangs
Contributor

jdlangs commented Jul 1, 2015

I think having any sort of length definition for infinite iterators is a potentially very messy area.

In the motivating example of collect( product( chain( [1], [2] ), [ 3 ] ) ), why is length(::Chain) not just defined using the definition @simonbyrne provided later? Then it would error out if you created a chain with anything infinite.

@mbauman
Member

mbauman commented Jul 1, 2015

I agree that there needs to be some solution here, but I'm not sure this is it. min(Inf, 1) -> 1.0. Perhaps some type IntegerInfinity <: Integer? It'll still have some instability problems, but at least we could try to do a better job of keeping integers where they're expected.

@simonbyrne
Contributor Author

typemax has problems with BigInts, but is also deceptive in general.

The min(1,Inf) === 1.0 is a problem, but we could use x < y ? x : y instead: technically, it is type unstable, but it should be optimised out in almost all cases.

I do think this is a better idea than a separate integer infinity, but if we did want to go down that route, we could use ℵ₀?
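The difference between `min` and the comparison form can be sketched as follows, in current Julia syntax. The helper name `pick_smaller` is hypothetical and is not part of the PR.

```julia
# min promotes both arguments to a common type, so the Int comes back as a Float64.
@assert min(1, Inf) === 1.0

# Hypothetical helper: returns one of its arguments unchanged, without promotion.
pick_smaller(x, y) = x < y ? x : y

@assert pick_smaller(1, Inf) === 1   # the Int is returned as an Int
@assert pick_smaller(Inf, 1) === 1
@assert pick_smaller(2.5, 7) === 2.5
```

The value keeps its type, but the return type now depends on which branch is taken, which is the sense in which the comparison form is "technically type unstable".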

@jdlangs
Contributor

jdlangs commented Jul 1, 2015

Note that if length can return some form of infinity, this collect method will have to also be changed.

@ScottPJones
Contributor

IEEE decimal floating point formats also have their own "Inf" and "NaN"s, just to be aware of.

@simonbyrne simonbyrne force-pushed the sb/iterator-length branch from 2369580 to a1ce5b0 Compare July 1, 2015 21:10
@simonbyrne
Contributor Author

@phobon Actually, collect is a good point: if the length truly is infinite, we can throw an error straight away, otherwise if it is simply unknown we can apply the push! based collect method.

So it turns out 0.4 can't optimise out x::Int < Inf: 0.3 could, but this was broken by #9030 when I changed the functions to optimise 0 < x::Float.

@StefanKarpinski
Member

LOL @ ℵ₀

I think this change kind of makes sense. A good example is zipping a finite iterable with an infinite one, whose length can be computed as the min over the zipped iterables, giving the correct answer naturally.
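For illustration, this is how the zip case behaves in current Julia (1.x), which reaches the same answer without an infinite length value. This is not the code from this PR; the variable names are hypothetical.

```julia
finite = 1:3
infinite = Iterators.countfrom(10)   # 10, 11, 12, ... without end

zipped = zip(finite, infinite)

# The zipped length is determined by the finite iterator alone.
@assert length(zipped) == 3
@assert collect(zipped) == [(1, 10), (2, 11), (3, 12)]
```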

@JeffBezanson
Member

JeffBezanson commented Jul 2, 2015 via email

@ScottPJones
Contributor

I think @JeffBezanson is on the right track there. Having any length return a Float just didn't seem right.

@garborg
Contributor

garborg commented Jul 2, 2015

I'd guess it's not worth the hassle of dealing with, but since traits, etc., are on the table for dealing with length properties, I'll bring it up in case someone thinks it's important:

Iterators often have a known minlength or maxlength when they're passed an iterable of undefined length, and like with Inf (but less commonly), access to that property enables downstream Iterators to know their length (take on something with a minlength, drop on something with a maxlength, etc.).

@simonbyrne simonbyrne force-pushed the sb/iterator-length branch from a977e34 to 6914d68 Compare July 3, 2015 10:06
@simonbyrne
Contributor Author

Okay, I realised another deficiency with this approach: type inference is only run before the branch elimination, so julia can't tell that f(x::Int) = x < Inf ? x : Inf can only return an Int.

The only way around this would seem to be make this infinity its own type. So in that vein I've created an AlephNull type. I've intentionally limited the available operations, as my intention is that this be its only real use (e.g. there is no -ℵ₀ for example).

The only deficiency that comes to mind is that it is impossible to define a type stable *(::Integer, ::AlephNull) operation, which would complicate its use in the Product iterator in Iterators.jl.
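A rough sketch of what such a type could look like, in current Julia syntax. The PR's actual definitions are not shown in this thread, so everything here is an assumption: the `min` methods, the `cardinal_product` helper (hypothetical, used instead of extending `*`), and the convention that zero times ℵ₀ is zero.

```julia
# Assumed definition: a singleton type standing for a countably infinite length.
struct AlephNull end
const ℵ₀ = AlephNull()

# min against an integer is type stable: the integer always wins.
Base.min(x::Integer, ::AlephNull) = x
Base.min(::AlephNull, x::Integer) = x
Base.min(a::AlephNull, ::AlephNull) = a

# Hypothetical product helper. Assuming 0 * ℵ₀ == 0, the return type depends
# on the value of n, so it cannot be made type stable.
cardinal_product(n::Integer, a::AlephNull) = iszero(n) ? zero(n) : a

@assert min(5, ℵ₀) === 5
@assert min(ℵ₀, ℵ₀) === ℵ₀
@assert cardinal_product(0, ℵ₀) === 0
@assert cardinal_product(3, ℵ₀) === ℵ₀
```

Because the `min` methods dispatch on the argument types, a `Take`-style length such as `min(length(xs), n)` would return an integer whenever `n` is an integer.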

@simonbyrne simonbyrne added the maths Mathematical functions label Jul 3, 2015
@simonbyrne
Contributor Author

As it includes a cardinal number, I feel that this PR now truly deserves the "maths" label.

@IainNZ
Member

IainNZ commented Jul 3, 2015

-1000 to this. If I tried to ask the length of an infinite thing I'd just like to get an error thrown straight away. Returning Inf is just being sneaky with semantics, and this new type requires so much deeper understanding to reason about.

@jdlangs
Contributor

jdlangs commented Jul 3, 2015

I still am not clear why infinite iterators need a length method at all. As Jeff implied, the concept of length doesn't mean anything for infinite collections. The motivating example given in the Iterators.jl issue is that Chain doesn't have a length method, but I don't see why it couldn't be added. What other problems arise from not having infinite length methods?

In short, it seems to me that returning infinity and throwing a MethodError are both runtime errors in some sense. However, the former error can continue to spread around the system while the latter immediately forces you to correct the problem, so it seems like the correct solution to me.

@jdlangs
Contributor

jdlangs commented Jul 3, 2015

Another issue is that this seems to be moving towards requiring length for any iterable. However, there will always be examples where the length is totally indeterminate and there is no value that can be returned for those except some sort of label which is just a replacement for an error.

@simonbyrne
Contributor Author

The main purpose of this PR is to draw a distinction between:

  • iterators with known infinite length
  • iterators with unknown (but typically finite) length

The reason is that they act in a very different way: when used in zip, take, etc, infinite length iterators can have a known finite length, and so can be used in comprehensions. To take an example: suppose that convergents(pi) is an iterator over the convergents of pi. Then I can calculate the error of the first 10 convergents via

[abs(c - big(pi)) for c in take(convergents(pi),10)]

On the other hand the output of filter has an unknown length, and so applying take to this will still have an unknown length, and hence can't be used in a comprehension.
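The distinction drawn above can be seen in current Julia (1.x), which records it through the `Base.IteratorSize` trait rather than through the value returned by `length`. This sketch uses `Iterators.countfrom` as a stand-in for the `convergents(pi)` iterator, which is not defined in this thread.

```julia
naturals = Iterators.countfrom(1)          # known to be infinite
evens = Iterators.filter(iseven, 1:10)     # finite, but length not known up front

@assert Base.IteratorSize(naturals) isa Base.IsInfinite
@assert Base.IteratorSize(evens) isa Base.SizeUnknown

# take on a known-infinite iterator has a known finite length,
# so it can be used directly in a comprehension.
@assert length(Iterators.take(naturals, 10)) == 10
@assert [n^2 for n in Iterators.take(naturals, 3)] == [1, 4, 9]

# take on an unknown-length iterator still has an unknown length.
@assert Base.IteratorSize(Iterators.take(evens, 2)) isa Base.SizeUnknown
@assert collect(Iterators.take(evens, 2)) == [2, 4]
```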

@jdlangs
Contributor

jdlangs commented Jul 3, 2015

Interesting example there. But since there's no possibility of an infinite comprehension, you could argue that you are required to call collect on your iterator to make it work in that situation.

@simonbyrne
Contributor Author

I'm not sure I understand: once I apply take, the iterator is finite with known length, so I should be able to use a comprehension without collect.

@jdlangs
Contributor

jdlangs commented Jul 3, 2015

Right, what I was suggesting is that there's no problem as things currently are where take doesn't have a length. Since the comprehension needs to visit every element anyway, and it has to end at some point, it doesn't seem unreasonable to me that you have to add the collect call to make it work in the comprehension.

@JeffBezanson
Member

I really really must insist that Inf is not useful as a length. take is more of a corner case; imagine all code that says something like Array(Int, length(x)). Such code will throw an error anyway. Is the solution to add a method for Array that can somehow construct infinite arrays, or to go back and use a different code path that doesn't require length (or throw a meaningful error if impossible)?
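One way to picture the "different code path" option is a collect-like function that picks its strategy from a size trait: preallocate when the length is known, grow with `push!` when it is not, and throw a meaningful error when the iterator is infinite. The sketch below uses the `Base.IteratorSize` trait from current Julia (1.x); `my_collect` and `_my_collect` are hypothetical names, and this is not how `Base.collect` is implemented.

```julia
# Hypothetical entry point: dispatch on the size trait of the iterator.
my_collect(itr) = _my_collect(Base.IteratorSize(itr), itr)

# Known to be infinite: fail immediately with a clear message.
_my_collect(::Base.IsInfinite, itr) =
    throw(ArgumentError("cannot collect an infinite iterator"))

# Known length: allocate once and fill in place.
function _my_collect(::Union{Base.HasLength, Base.HasShape}, itr)
    out = Vector{eltype(itr)}(undef, length(itr))
    for (i, x) in enumerate(itr)
        out[i] = x
    end
    return out
end

# Unknown length: never call length, grow the result with push!.
function _my_collect(::Base.SizeUnknown, itr)
    out = eltype(itr)[]
    for x in itr
        push!(out, x)
    end
    return out
end

@assert my_collect(1:3) == [1, 2, 3]
@assert my_collect(Iterators.filter(iseven, 1:6)) == [2, 4, 6]
```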

@jdlangs
Contributor

jdlangs commented Jul 3, 2015

I'll copy-paste my comment from Iterators.jl here since I think it's actually more relevant to this discussion:


I think there should just be two different interfaces: an AbstractFiniteIterator where length is defined and AbstractIterator where it's just start, next, and done. "Composite" iterators like Chain and Product should be parameterized on the class of iterator they hold and define length{I<:AbstractFiniteIterator}(c::Chain{I}) = ....

Is it possible to determine a single type parameter from a typejoin of the parameters in the constructor arguments?

@simonbyrne
Contributor Author

@phobon Which type would Filter(x -> x>0, randn(10)) be?

@simonbyrne
Contributor Author

@JeffBezanson I disagree that take is a corner case: it's essential when working with infinite length iterators, and in fact, that was my whole motivation for this. I wanted to be able to use take(::MyInfiniteIterator) in a comprehension, and I didn't see why that should be a problem.

I don't quite see the problem with the Array(Int, length(x)) code: with the current PR, it will just throw a MethodError:

julia> Array(Int,AlephNull())
ERROR: MethodError: `convert` has no method matching convert(::Type{Array{T,N}}, ::Type{Int64}, ::AlephNull)

which is presumably what you would want: it certainly beats the current behaviour of collect(countfrom(1)).

I concede that AlephNull type is a bit esoteric, but it does seem to me like an elegant solution to the problem.

@jdlangs
Contributor

jdlangs commented Jul 9, 2015

In all cases, what you want to use is not a length defined for the infinite iterator itself, but for the wrapper iterator that will end up being finite, right? That seems to me like something that should be done with type parameterizations, since the AlephNull is more or less only being used as a label for a special type.

Your point about infinite iterators vs unknown iterators is well-taken though. Maybe there should only be a type distinction for infinite iterators that would let you define, e.g., length{I<:AbstractInfInterator}(t::Take{I}) = t.n. I also briefly suggested this in #11749 and maybe should have pushed for it a bit more.
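The type-parameter idea can be sketched in current Julia syntax (the `length{I<:...}(...)` form above is 0.4 syntax). All names below are hypothetical: `AbstractInfiniteIterator`, `CountFrom`, and `MyTake` are stand-ins and are not Base or Iterators.jl types.

```julia
# Hypothetical abstract type marking iterators known to be infinite.
abstract type AbstractInfiniteIterator end

# Hypothetical infinite iterator: start, start + 1, start + 2, ...
struct CountFrom <: AbstractInfiniteIterator
    start::Int
end
Base.iterate(c::CountFrom, state = c.start) = (state, state + 1)

# Hypothetical stand-in for Take, parameterized on the wrapped iterator type.
struct MyTake{I}
    xs::I
    n::Int
end

# length is defined only when the wrapped iterator is known to be infinite,
# in which case exactly n elements are produced.
Base.length(t::MyTake{<:AbstractInfiniteIterator}) = t.n

@assert length(MyTake(CountFrom(1), 5)) == 5
```

Wrapping anything else, such as `MyTake([1, 2], 5)`, has no `length` method under this sketch, so the caller gets a `MethodError` instead of a sentinel value.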

@mschauer
Contributor

I also think using Inf or AlephNull is not the direction to go. It is expected that length fails sometimes, for the simple reason that Uints have 2^64 possible values, but a set of Uints can have 2^64+1 possible lengths. This number is not representable as a float at all, so falling back to float does not help. It is also not equal to AlephNull. The solution so far is to fail

julia> length(1:typemax(UInt64))
0xffffffffffffffff

julia> length(0:typemax(UInt64))
ERROR: OverflowError()
 in length at range.jl:289

and to not call length if not really needed, for example by writing isempty(r) instead of length(r) == 0. But I also think that infinite iterators deserve a trait or an abstract type.
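A small sketch of the `isempty` advice, in current Julia syntax. It deliberately avoids calling `length` on the full range, since the result of that call is exactly what is in question here.

```julia
r = 0:typemax(UInt64)    # every UInt64 value: 2^64 elements

# Emptiness is decided from the endpoints alone, so no length is computed.
@assert !isempty(r)
@assert isempty(1:0)
@assert first(r) == 0 && last(r) == typemax(UInt64)
```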

@ScottPJones
Contributor

I thought length() in Julia always returned an Int.
It seems a bit strange, because Julia could, for example, return a UInt128 instead of getting an OverflowError, i.e. length(typemin(Int):typemax(Int)) gives an error, instead of Int128(2)^64.

@hayd
Member

hayd commented Sep 23, 2015

Similar to the other issue's example you may (reasonably?) want to do:

[i for i in zip(1:10, typemin(Int64):typemax(Int64))]

@eschnett
Contributor

In addition to "infinity", you may want to have a notion of "indeterminate or unknown" length, i.e. iterators where the number of iterations is not or cannot be known beforehand. For example, you may be modelling the interaction between two threads, or reading characters from a socket, or a data structure where determining the length is as expensive as iterating over it and thus not worthwhile.

@hayd
Member

hayd commented Oct 23, 2015

Is the conclusion: we should flesh this out with traits once we have #13222 ?

@JeffBezanson
Member

Yes I think so; I don't want lengths to be infinite.
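For readers wondering what the trait route looks like in practice, here is a sketch using the `Base.IteratorSize` trait available in current Julia (1.x). The `Powers` iterator is hypothetical, and no claim is made here about how this trait relates to the PR referenced above.

```julia
# Hypothetical infinite iterator: 1, base, base^2, base^3, ...
struct Powers
    base::Int
end
Base.iterate(p::Powers, state = 1) = (state, state * p.base)
Base.eltype(::Type{Powers}) = Int

# Declare the iterator infinite through a trait; no length method is defined.
Base.IteratorSize(::Type{Powers}) = Base.IsInfinite()

# Wrappers such as take can then report a finite length.
first_five = Iterators.take(Powers(2), 5)
@assert length(first_five) == 5
@assert collect(first_five) == [1, 2, 4, 8, 16]
```

The infinite iterator itself never has a length; only the finite wrapper does.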

@tkelman tkelman deleted the sb/iterator-length branch February 22, 2016 04:14
Labels
maths Mathematical functions