Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is length defined on numbers? #11769

Closed
samuela opened this issue Jun 19, 2015 · 37 comments
Closed

Why is length defined on numbers? #11769

samuela opened this issue Jun 19, 2015 · 37 comments
Milestone

Comments

@samuela
Copy link
Contributor

samuela commented Jun 19, 2015

I discovered from a typo today that the length method is defined on numbers:

julia> length(5)
1

julia> length(pi)
1

It's defined here: https://github.com/JuliaLang/julia/blob/master/base/number.jl#L12. Why is length defined on numbers? This seems counterintuitive. Numbers aren't iterable in any way after all.

@joehuchette
Copy link
Member

They are in Julia:

julia> for a in 1
       @show a
       end
a = 1

@samuela
Copy link
Contributor Author

samuela commented Jun 19, 2015

AHH, BUT WHYYY?

@nalimilan
Copy link
Member

It's considered practical by some people. The question is rather whether this has drawbacks or not. Else there's no reason to remove that.

FWIW, please use the mailing list for questions in the future, and keep GitHub for bug reports.

@Keno
Copy link
Member

Keno commented Jun 19, 2015

FWIW, I consider this a mistake, it has bitten me way too many times.

@carnaval
Copy link
Contributor

I also find it a bad idea, but github may not be the place to revive the debate

@samuela
Copy link
Contributor Author

samuela commented Jun 19, 2015

This just reeks of MATLAB to me.

@jakebolewski
Copy link
Member

@samuela the supported iteration / indexing interface is to basically treat scalars as zero dimensional arrays.

@Keno
Copy link
Member

Keno commented Jun 19, 2015

It is way too easy to miss a colon in a for loop and only do one iteration - I remember vividly how I spent a week debugging why my locality sensitive hash wasn't local - I had for i = x instead of for i = 1:x for the number of rounds boom.

@nalimilan
Copy link
Member

Yeah, I'm not a great fan of this either. :-/

@JeffBezanson
Copy link
Member

Yes I really think we should experiment with removing iteration of numbers. I think the array indexing redesign might make this possible.

@johnmyleswhite
Copy link
Member

It is way too easy to miss a colon in a for loop and only do one iteration

This is the bug I most often hit these days.

@mbauman
Copy link
Member

mbauman commented Jun 19, 2015

Here was my last 15-minute attempt at making numbers non-iterable: #10331 (comment)

@carlobaldassi
Copy link
Member

Just to balance the opinions here, I'll mention I use this feature a lot and find it quite handy.

@joehuchette
Copy link
Member

@carlobaldassi what does a typical use look like?

@carlobaldassi
Copy link
Member

@joehuchette : for example: I have functions which explore ranges of parameters for collecting data. Most of the time I need to change one parameter or two, and keep the rest at some value. So I just have function signatures like f(A, B, C) and in the code there are loops which look like for a in A, b in B, c in C. Then I can call either f(1.0, 2.0, 0.1:0.1:1.0) or f(1:0.5:10, 2.0, 1.0) etc.
This is just the first example which comes to mind and may not be the best; in general I have lots of places in the code I've written which use the iterable interface with the assumption that I can pass scalars to them.

@samuela
Copy link
Contributor Author

samuela commented Jun 19, 2015

Why not f([1], [2], 0.1:0.1:1.0) or f(1:0.5:10, [2.0], [1.0]) then?

@ScottPJones
Copy link
Contributor

@carlobaldassi What does that do to the code generated? Doesn't that cause problems of type instability, if the argument could either be a scalar or a range? Instead of [1], which I would think would also have type instability problems, would 1.0:1.0 work? A little more typing, but if it makes the code run faster... (I guess you'd need to look at @code_native to see for sure). Of course, I may be all wet on this!

@carlobaldassi
Copy link
Member

@samuela; Here's a line I just entered in the REPL minutes ago:

julia> span(α=0.75,S=0.991:0.001:0.999, x=5.935451778194593, q0=0.653270041192449,q1=0.9806204541529127,δq=0.0007726138979231578,q̂0=0.140152147115123,q̂1=0.5386512290281629,δq̂=0.030249748497618967,Ŝ=0.48377607373470927)

I suppose you can see how it might become annoying to add [] around each entry which might be iterated over. Of course, this is not "nice" code which would get into julia base or a public package — it's a rather dirty and hands-on data exploration task. The code is also kind of fluid and follows the results and ideas which come up as the task proceeds, so what's iterated may change (often).

Anyway, as I said, it's probably not the best example, just one possible way to show how my life is a little easier because numbers are iterable.

@ScottPJones : this is not a hot loop, it's just the interface. Each iteration takes at least tens of seconds, at worst hours (and I can assure you I've been optimizing that quite aggressively). As I said, it's basically an API for me interacting with the REPL, and you may see that I already type plenty.

@ScottPJones
Copy link
Contributor

OK, thanks for the explanation... with all this math stuff, I feel it's way over my head sometimes! 😀

@timholy
Copy link
Member

timholy commented Jun 20, 2015

The biggest advantage of our current system is that it allows you to use a single algorithm for vectors or scalars without having to allocate an array to pass a scalar. There are many instances where simply wrapping that scalar as [s] would cause an unacceptable performance hit (we're not talking 20% here, it's many-fold slower execution). For what it's worth, here's a trivial demo:

julia> function sumelts(A)
           s = 0.0
           for a in A
               s += a
           end
           s
       end
sumelts (generic function with 1 method)

julia> function timeit1(n)
           S = 0.0
           for i = 1:n
               S += sumelts(i)
           end
           S
       end
timeit1 (generic function with 1 method)

julia> function timeit2(n)
           S = 0.0
           for i = 1:n
               S += sumelts([i])
           end
           S
       end
timeit2 (generic function with 1 method)

# Warmup suppressed

julia> @time timeit1(10^6)
  12.690 milliseconds (6 allocations: 176 bytes)
5.000005e11

julia> @time timeit2(10^6)
  94.928 milliseconds (1999 k allocations: 93742 KB, 21.42% gc time)
5.000005e11

While on balance I favor this feature, at the same time I am aware that it can cause problems. If folks decide it's simply too bug-prone, I will learn to live with its removal and write two implementations of algorithms where I am exploiting this feature.

@Keno
Copy link
Member

Keno commented Jun 20, 2015

I know it's not idea, but what about an iterator that for scalars makes them iterable and for everything else just passes through. That way in the limited cases where this is useful, it is still possible (albeit a little ugly), while avoiding the trap in the common case.

@timholy
Copy link
Member

timholy commented Jun 20, 2015

Like this?

julia> immutable Container{T}
           val::T
       end

julia> Base.start(c::Container) = 1
start (generic function with 47 methods)

julia> Base.next(c::Container, index::Int) = (c.val, 2)
next (generic function with 60 methods)

julia> Base.done(c::Container, index::Int) = index != 1
done (generic function with 49 methods)

julia> function timeit3(n)
           S = 0.0
           for i = 1:n
               S += sumelts(Container(i))
           end
           S
       end
timeit3 (generic function with 1 method)

julia> @time timeit3(10^6)
  75.680 milliseconds (3000 k allocations: 46875 KB, 13.54% gc time)
5.000005e11

Sniff.

@johnmyleswhite
Copy link
Member

What about a range? Is that going to cause numeric problems?

function timeit4(n)
    S = 0.0
    for i in 1:n
        S += sumelts(i:i)
    end
    S
end

@Keno
Copy link
Member

Keno commented Jun 20, 2015

immutable Container{T}
   val::T
end
Base.start{T<:Number}(c::Container{T}) = false
Base.next{T<:Number}(c::Container{T}, index::Bool) = (c.val, true)
Base.done{T<:Number}(c::Container{T}, index::Bool) = index
Base.start(c::Container) = start(v.cal)
Base.next(c::Container,i) = next(v.cal,i)
Base.done(c::Container,i) = done(v.cal,i)
julia> @time timeit1(10^6)
   5.942 milliseconds (6 allocations: 176 bytes)
5.000005e11
julia> @time timeit3(10^6)
   4.753 milliseconds (6 allocations: 176 bytes)
5.000005e11

It is possible that we have a codegen regression within the past week, there have been a number of changes to codegen, but on my week old version of julia the generated code is identical.

@timholy
Copy link
Member

timholy commented Jun 20, 2015

Much better---amazing what a difference a small tweak makes. @johnmyleswhite, your version is also faster than what I posted, though worse than @Keno's if you use Float64 rather than Int.

Assuming this solution generalizes to less-trivial functions, I have no substantive objections to changing this.

@mbauman
Copy link
Member

mbauman commented Jun 20, 2015

This is effectively what I did to simplify setindex! with broadcasting scalars (except instead of only iterating once they repeat forever): multidimensional.jl:304-307.

@JeffBezanson
Copy link
Member

I tried @timholy 's code and I don't see any allocation. Maybe a slow definition was polluting the method cache somehow?

Another good option is to use a 1-tuple (S += sumelts((i,))). It's just as fast and doesn't require a new type.

@Keno
Copy link
Member

Keno commented Jun 20, 2015

@JeffBezanson well, there's the concern on not wanting to type those three extra characters at the REPL every time.

@JeffBezanson
Copy link
Member

Libraries could also use

containerize(x::Number) = (x,)
containerize(y) = y

instead of a new type.

@nalimilan
Copy link
Member

FWIW, one reason not to treat scalars as collections is the inconsistency with indexing dropping or not (trailing) dimensions. For example, if x is a matrix, x[:,i] can be either a vector or a matrix depending on whether i is a scalar, which goes against the idea of treating and 1 the same [1] in other places.

@JeffBezanson
Copy link
Member

Numbers are actually 0-dimensional, so they have a single element without being vectors.

@yuyichao yuyichao mentioned this issue Jul 15, 2015
@matthieugomez
Copy link
Contributor

I also lost a couple of hours due to

for i in length(x)

It'd be great to catch this in some way (which I now do via syntax highlighting, at least).

@malmaud
Copy link
Contributor

malmaud commented Jul 15, 2015

While I still support making scalars non-iterable, the rise of the eachindex(X) paradigm instead of 1:length(X) has made it less pressing.

@rofinn
Copy link
Contributor

rofinn commented Aug 18, 2015

+1 for making scalars non-iterable. I think this would help solve several other consistency issues. For example findin(array, x) works fine when x is scalar, but you need to do findin(array, [x]) for everything else.

@KristofferC
Copy link
Member

If something is messed up with my code, it feels like it is 90% of the times because I have written for i in length(x). Guess I just gotta get used to eachindex.

@ScottPJones
Copy link
Contributor

Maybe make scalars non-iterable first thing in 0.5? If eachindex is a better replacement, why keep something causing so much grief?

@samuela
Copy link
Contributor Author

samuela commented Aug 19, 2015

@ScottPJones +1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests