Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should eltype(1..2) be Float64? (currently it is Int) #115

Open
zsunberg opened this issue Aug 18, 2022 · 33 comments · May be fixed by #150
Open

Should eltype(1..2) be Float64? (currently it is Int) #115

zsunberg opened this issue Aug 18, 2022 · 33 comments · May be fixed by #150

Comments

@zsunberg
Copy link
Contributor

It seems quite strange to me that eltype(1..2) is Int because the elements in the mathematical set [1,2] are real numbers, and real numbers are usually represented by Float64.

Happy to make a PR if people support this.

@zsunberg
Copy link
Contributor Author

This has caused confusion in JuliaReinforcementLearning/CommonRLSpaces.jl#12.

@dlfivefifty
Copy link
Member

Why Float64?

julia> big(π) in 0..4
true

I think eltype is just a poorly defined concept here....

@hyrodium
Copy link
Collaborator

I'm not sure we need eltype function for non-iterable objects such as ClosedInterval.

The most consistent way would be rename eltype(::Interval) to boundstype(::Interval), I guess.

@dlfivefifty
Copy link
Member

I once had a crazy idea of implementing iterate using nextfloat. This would make sense of eltype (so iterate(0..1) would be equivalent to [0,1]). Unfortunately there are a lot of floats so it was a completely useless idea!

So I think removing eltype would be wise. But this also effects DomainSets.jl. @daanhb thoughts?

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

I think the ship has sailed, up to some details. The choice in this package is that an interval is defined by the membership function a <= x <= b, regardless of what the types of the endpoints and of x are. This is the most sensible mathematical definition of the continuous interval. In developing DomainSets, so far anything that would lead to restrictions on types over mathematical flexibility or correctness has turned out to be a bad idea. So, first of all, I'm happy with an interval defined by integers containing real numbers (and anything else that can be compared to an integer).

Personally, I'm also happy with having eltype. Consider this example:

julia> A = [1,2,3]
3-element Vector{Int64}:
 1
 2
 3

julia> eltype(A)
Int64

julia> 2.0  A
true

What IntervalSets does is just the continuous analogue of that, largely. For the example above, nothing else would make sense. It would be weird to make the eltype be Float64. It would also be bizarre if 2.0 isn't a member, since after all 2.0 == 2.

Of course it is annoying that 1..2 has eltype set to Int by default, because usually one wants it to be Float64. From a user-friendliness perspective I could imagine changing that default, as long as the construction with Ints also remains possible, given that the focus of this package is really on continuous sets. It is safe to guess every user runs into this issue at some point.

In DomainSets the T in Domain{T} is crucial in many ways and I'd prefer to just call it what it is, the eltype :-)

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

Perhaps we could have 1..2 turn into 1.0..2.0, because the notation implies the intention of a continuous set, while leaving Interval(1,2) as it is?

@hyrodium
Copy link
Collaborator

Perhaps we could have 1..2 turn into 1.0..2.0, because the notation implies the intention of a continuous set, while leaving Interval(1,2) as it is?

We currently support Date type, so converting Int to Float64 seems inconsistent.

julia> using Dates, IntervalSets

julia> Date(2022,08,12) .. Date(2022,09,12)
2022-08-12..2022-09-12

julia> Date(2022,09,10) in ans
true

@zsunberg
Copy link
Contributor Author

We currently support Date type, so converting Int to Float64 seems inconsistent.

One possible rule would be: if the bounds are <: Real, call float on them.

@hyrodium
Copy link
Collaborator

hyrodium commented Aug 18, 2022

How about Rational? I think this should not be converted with float.

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

Perhaps we could have 1..2 turn into 1.0..2.0, because the notation implies the intention of a continuous set, while leaving Interval(1,2) as it is?

We currently support Date type, so converting Int to Float64 seems inconsistent.

julia> using Dates, IntervalSets

julia> Date(2022,08,12) .. Date(2022,09,12)
2022-08-12..2022-09-12

julia> Date(2022,09,10) in ans
true

Thanks, that's a cool example.

@zsunberg
Copy link
Contributor Author

zsunberg commented Aug 18, 2022

I'm not sure we need eltype function for non-iterable objects such as ClosedInterval.

I think this is the core issue. eltype is part of the iteration interface and continuous sets don't have iteration. Perhaps we need a new function like element_type for continuous sets. Or maybe people should just use Random.gentype.

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

I'm not sure we need eltype function for non-iterable objects such as ClosedInterval.

I think this is the core issue. eltype is part of the iteration interface. Perhaps we need a new function like element_type for continuous sets. Or maybe people should just use Random.gentype.

I somewhat agree, but there is a dual issue to this one in which people discuss why element_type is not just called eltype :-)
In practice, the worst offender really is just 1.5 in 1..2, because it is so counterintuitive to have eltype(1..2) be Int.

@zsunberg
Copy link
Contributor Author

How about Rational? I think this should not be converted with float.

Agreed. How about if the bounds are <: Integer, call float.

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

For some perspective, in DomainSets we settled on the following. A Domain{T}, which is the supertype of Interval{T}, has eltype T. The statement x::S in d::Domain{T} can be true if:

  • (i) S and T promote to a type U,
  • (ii) x can be converted to U and
  • (iii) d can be converted to Domain{U}.

That is more or less what happens here, as the inequality a <= x would typically promote a and x to U anyway. That scheme seems to work well. The counterintuitive aspect, with respect to the name eltype, is (iii). That is where an Interval{Int} might behave like an Interval{Float64}. In other circumstances it could also behave like Interval{BigFloat} or even something else. That does not mean it does not have an element type, it just possibly has multiple ones and there will never be a single right choice.

@dlfivefifty
Copy link
Member

What about boundary_eltype ?

@hyrodium
Copy link
Collaborator

hyrodium commented Aug 18, 2022

Agreed. How about if the bounds are <: Integer, call float.

We still have counterexamples HalfInteger and N0f8. 😁
We don't have an abstract type for a dense subset of Real, so we don't have a proper way to check the given type should be converted to float.

What about boundary_eltype ?

Or boundstype for short? (#115 (comment))

@zsunberg
Copy link
Contributor Author

We still have counterexamples HalfInteger and N0f8

Those are not Integers (unless I am mistaken). Keep in mind that this is only for the syntactic sugar of ... You could always construct Interval(1,2) manually to avoid any conversion.

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

I don't think that, as a user of IntervalSets, I would ever care for the type of the endpoints of the domain. I want to know what the elements are like - it is a set after all - and the first thing I'd do is abuse boundstype for that purpose :-)

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

On second thought, something like boundstype probably does make sense for this package - at least it is well defined. In that case, if an eltype is to remain (it seems entirely unused within IntervalSets itself...) perhaps it is more free to differ from T in cases where it is convenient.

@dlfivefifty
Copy link
Member

Note replacing 1..2 with float(1)..float(2) is a very bad idea because it changes the definition of the set:

julia> b = typemax(Int); a = b-1;

julia> a..b
9223372036854775806..9223372036854775807

julia> float(a)..float(b)
9.223372036854776e18..9.223372036854776e18

julia> a in (float(a)..float(b))
false

ApproxFun has something called prectype which is the only place eltype is usedL:

https://github.com/JuliaApproximation/ApproxFunBase.jl/blob/7f435e277c1bd0d8222e6defd51ddbb7e0df3e53/src/Domain.jl#L12

To me the T in Domain{T} is dictating the "precision" of the definition of a set, in the case of the interval the precision of the endpoints. Which ApproxFun then uses to assume the precision of functions on the set.

@daanhb
Copy link
Contributor

daanhb commented Aug 18, 2022

(sidenote, DomainSets also has a prectype, might it be the same? I probably did copy the name from ApproxFun. It is defined in general here and for domains here. There is also a numtype with a similar role, the basis numeric type used in T.)

@zsunberg
Copy link
Contributor Author

zsunberg commented Aug 18, 2022

I think it would make a lot of sense to introduce boundstype, and, if eltype is defined, it should be the type that would naturally be returned if one were to write an iterator. (which we know from @dlfivefifty's comment above would be Float64 😄 )

@hyrodium
Copy link
Collaborator

I want to know what the elements are like - it is a set after all - and the first thing I'd do is abuse boundstype for that purpose :-)

In what practical situations is the interval "eltype" used?

This has caused confusion in JuliaReinforcementLearning/CommonRLSpaces.jl#12.

This example is just around tests, and it can be replaced with

@test eltype(tp) <: Tuple{Real, Real}

@zsunberg
Copy link
Contributor Author

zsunberg commented Aug 19, 2022

In what practical situations is the interval "eltype" used?

Whenever an algorithm needs to deal with elements of some set but is only provided with the set itself, i.e.

function f(set)
    d = Dict{eltype(set), Float64}()
    # fill up d
end

A concrete example is in reinforcement learning, a user might specify the action space as -1..1, and then the algorithm needs to deal with actions. For instance, here, I infer the action type based on the action set: https://github.com/JuliaPOMDP/QuickPOMDPs.jl/blob/master/src/quick.jl#L166 (sorry, that code is pretty hard to read out of context with _call)

It is true that most of the time you can avoid needing eltype, but it sometimes requires alot more thinking.

@zsunberg
Copy link
Contributor Author

(which we know from @dlfivefifty's #115 (comment) would be Float64 smile )

Oh shoot, I misinterpreted the comment above, so it wouldn't necessarily be Float64. But I think Float64 would be better than Int, because in my use case above, every element of 1..2 can be converted to a Float64, but convert(Int, 1.5) will error.

@daanhb
Copy link
Contributor

daanhb commented Aug 19, 2022

Note replacing 1..2 with float(1)..float(2) is a very bad idea because it changes the definition of the set:

This is a fair point. However, through promotion, the package currently does similar conversions anyway:

julia> (1..2)  (1.5..2.5)
1.0..2.5

And Base does it too for ranges between integers, if it is clear that elements will be reals:

julia> range(0, 1, length=100)
0.0:0.010101010101010102:1.0

julia> typeof(ans)
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}

@daanhb
Copy link
Contributor

daanhb commented Aug 19, 2022

In what practical situations is the interval "eltype" used?

As @dlfivefifty also mentioned, one important goal is to convey the expected numerical accuracy (which leads to prectype). When working with domains in function approximation, I typically want to discretize them. In order to pre-allocate memory for that, I would like to know the type of the grid points. Finally, in more complicated domains of DomainSets, the T also conveys information about the structure (of product domains, for example, where T might be a vector or a tuple).

To answer the original question of how this issue affects DomainSets: I think it doesn't. We just define T to be the eltype of Domain{T} anyway, which would cover intervals as a special case even if it is removed here. Based on the arguments here I am not currently inclined to change it (though I'm always up for a nice conceptual debate of course).

I think a counter question might be in which kind of situations it matters that 1..2 has integer endpoints. If I wanted an interval of integers, I would use 1:2.

@dlfivefifty
Copy link
Member

But I think Float64 would be better than Int, because in my use case above, every element of 1..2 can be converted to a Float64

But in my example above b = typemax(Int); a = b - 1; a..b converting to Float64 gives only a single element...

@zsunberg
Copy link
Contributor Author

But in my example above b = typemax(Int); a = b - 1; a..b converting to Float64 gives only a single element...

Well, if you construct a set like that, maybe you deserve to get only one element 😜 ! Just kidding.

@dlfivefifty
Copy link
Member

I know you are just kidding (and so am I!) but it is important to be consistent: arbitrarily converting to floats will just add unnecessary confusion because at some point someone will want an interval containing large ints.

I'm wondering if you would be better off just calling float(eltype(d)) in the offending code that triggered the issue?

@daanhb
Copy link
Contributor

daanhb commented Aug 21, 2022

It's not really an arbitrary conversion though. Nobody argues for disallowing Interval{T} with integer T. Instead there is a question of how easy that should be (and in particular whether it should be the default).
Suggestions here are:

  • remove eltype, introduce boundstype, which returns the T of Interval{T}. The reasoning is that boundstype is precise, whereas the eltype of a continuous set is ambiguous
  • continue to have eltype of Interval{T} return T, but make it easier to construct an interval of floats by default. In particular, let a..b and Interval(a,b) promote a and b to floats if they are <: Integer, but allow Interval{Int}(a,b). The reasoning is that, if eltype continues to be defined, it is frequently more useful for it to be Float64 than Integer.

In the second scenario there would be a visual clue that something happened to inform the user:

julia> 1..2
1.0..2.0

This is comparable to:

julia> 1:0.01:2
1.0:0.01:2.0

You don't want to have the endpoints of a range have a different type from the elements of the range.

Since this package aims to be a "minimal foundation", the first option may make more sense. In that case I would define eltype in DomainSets. On the other hand, quoting from the first line of the readme: "This package represents intervals of an ordered set." In Julia I'd expect any kind of set to implement eltype.

Finally, it makes little sense for eltype to return anything other than T. So the third option is to do nothing. The user has control over choosing T when making the interval, and the meaning of T is that it is the intended element type of the interval for the use cases of that user.

(Sidenote, if it stays, perhaps eltype can be defined for Domain{T} rather than only for intervals. Again, per the docstring of Domain{T}, it is "a subset of type T".)

@zsunberg
Copy link
Contributor Author

zsunberg commented Aug 21, 2022

After working on JuliaApproximation/DomainSets.jl#112, here are my thoughts:

  1. You guys really like abstract types 😂
  2. eltype(::Type{<:Domain{T}}) = T is a foundational concept in this ecosystem, as @daanhb said above and we probably should not change that.
  3. Given (2) Interval{Int}(1,2) is a very strange object. rand(Interval{Int}(1,2)) should should probably return an Int. There are some other weird things like isempty(Interval(typemax(Int), (typemax(Int)-1))) currently returns true.
  4. Given (3) most people are probably thinking Interval(1.0, 2.0) when they type 1..2
  5. Given (4), The syntactic sugar 1..2 should expand to Interval(1.0, 2.0). If people want to go for a wild ride, they should still be able to construct Interval{Int}(1, 2).

Thus I think I agree with @daanhb 's second suggestion above.

@hyrodium
Copy link
Collaborator

Btw, Intervals.jl has the same eltype problem.

julia> using Intervals

julia> 1..2
Interval{Int64, Closed, Closed}(1, 2)

julia> eltype(1..2)
Int64

julia> eltype(1..2.)
Float64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants