Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type inference infinite loop #10340

Closed
wbhart opened this issue Feb 26, 2015 · 21 comments
Closed

type inference infinite loop #10340

wbhart opened this issue Feb 26, 2015 · 21 comments
Assignees
Labels
needs tests Unit tests are required for this change types and dispatch Types, subtyping and method dispatch

Comments

@wbhart
Copy link

wbhart commented Feb 26, 2015

I have encountered what I believe to be a bug in Julia's dispatch, which I haven't seen before.

When I add a completely unrelated method for the + operator, code that can't ever call the new + operator hangs. Remove the new + operator, everything is fine.

I've cut the issue down to a semiminimal example in:
https://github.com/wbhart/Nemo.jl/tree/polyaddbug

Note that the only files actually loaded are Nemo.jl, Rings.jl, Poly.jl, fmpz_poly.jl. I've moved the entire definition of the Fraction type into Rings.jl so that no addition files are actually loaded when using Nemo.

To build the cut down Nemo exhibiting the bug,

Pkg.clone("https://github.com/wbhart/Nemo.jl")
Pkg.checkout("Nemo","polyaddbug")
Pkg.build("Nemo")

The following code will work:

using Nemo
R, x = PolynomialRing(ZZ, "x")
S, y = PolynomialRing(R, "y")
T, z = PolynomialRing(S, "z")
U, t = PolynomialRing(T, "t")
V, u = PolynomialRing(U, "u")
t + u

However, the following code will not work (restart the REPL first):

using Nemo
R, x = PolynomialRing(ZZ, "x")
S, y = PolynomialRing(R, "y")
T, z = PolynomialRing(S, "z")
U, t = PolynomialRing(T, "t")
V, u = PolynomialRing(U, "u")
+{T <: RingElem}(a::T, b::Fraction{T}) = b + a  # this is the added defn of +
t + u

You can inspect the types of t and u and they are most certainly not Fraction's, so t + u most definitely should not call the Fraction + operator.

The + it is supposed to call is the generic catchall in Rings.jl:

function +{S <: RingElem, T <: RingElem}(x::S, y::T) 
   T1 = promote_type(S, T)
   if S == T1
      +(x, parent(x)(y))
   elseif T == T1
      +(parent(y)(x), y)
   else
      error("Unable to promote ", S, " and ", T, " to common type")
   end
end

I can confirm that this generic catchall is called if you don't add the additional Fraction + operator above and it is not called if you do.

A hint is the fact that this behaviour only shows up 5 levels deep. Replacing t + u (types 5 levels deep) with z + t (types four levels) works just fine.

I'm using Julia Version 0.4.0-dev+2924.

@simonbyrne
Copy link
Contributor

@wbhart Can you create a minimal example that doesn't require building an external library? Preferably one that would fit in a single file?

@simonbyrne
Copy link
Contributor

Also, why not use the julia promotion machinery?
http://julia.readthedocs.org/en/latest/manual/conversion-and-promotion/

You could then use something like,

+(x::RingElem, y::RingElem) =  +(promote(x,y)...)

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

Regarding the dependency on flint, this is a pretty complex library, and I don't think I can remove the dependency easily. However, from the point that t and u are created, at which point you can introspect the values and types easily, there is only two lines of code. So I don't think me spending a few days removing that dependency actually buys you anything.

Were there problems building the external library on your machine? Note that Pkg.build("Nemo") should do all that automatically for you, at least on Linux.

An important point I should have mentioned in the original bug report is this: Fraction is not used ever in any of the files loaded when you do using Nemo, except for where the type is (currently) defined in Rings.jl and in the single line of code in the failing example. In other words, the bug is this. I have working code, add a single new type definition, which is never used anywhere. Define a single addition operator for that type, and existing code breaks. That should not be possible under any circumstances.

Regarding your question about why we don't use the promotion mechanism, we can't do that. A Julia type can't contain all the information we need. That's how the old Nemo worked, but people on the Julia list complained that we were using the type parameter and dispatch mechanisms for things that it wasn't designed for (which we were).

Using parent objects, as opposed to Julia types is the only other way to model fully recursive parameterised mathematical types that I am aware of. This allows us to use Julia in a completely standard way whilst still retaining extra information about the "mathematical types" (parents) of objects that should not be pushed into the type parameters in Julia.

@simonbyrne
Copy link
Contributor

The problem is that as the amount of effort required to reproduce the problem increases, the less likely that other people will be willing to put the effort into resolving the problem.

I did actually get some sort of build error:

==========================================================================
Shall we try to build pari 2.8.0 (development 17566-be4cfd0) now (y/n)? [n]
Ok. Type "make install" when you are ready
Bye !
Making gp in Odarwin-i386
../config/genkernel ../src/kernel/x86_64/asm0.h > parilvl0.h
if test -r ./tune.h; then d=.; else d=../src/kernel/gmp; fi;          cat $d/tune.h ../src/kernel/gmp/int.h ../src/kernel/none/level1.h > parilvl1.h
cat ../src/mt/pthread.h > parimt.h
cat ../src/kernel/gmp/mp.c ../src/kernel/none/cmp.c ../src/kernel/none/gcdll.c ../src/kernel/none/ratlift.c  ../src/kernel/none/invmod.c ../src/kernel/gmp/gcd.c ../src/kernel/gmp/gcdext.c ../src/kernel/none/mp_indep.c ../src/kernel/none/add.c > mpker.c
bison -d ../src/language/parse.y -o ../src/language/parse.c
../src/language/parse.y:25.14-20: syntax error, unexpected string, expecting =
make[1]: *** [../src/language/parse.h] Error 1
make[1]: *** Waiting for unfinished jobs....
File ../funclist updated.
make: *** [gp] Error 2
================================[ ERROR: Nemo ]=================================

LoadError: failed process: Process(`make -j4 gp`, ProcessExited(2)) [2]
while loading /Users/simon/.julia/v0.4/Nemo/deps/build.jl, in expression starting on line 121

================================================================================

Using parent objects, as opposed to Julia types is the only other way to model fully recursive parameterised mathematical types that I am aware of.

Ah, I see.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

Sure, I understand that.

Fortunately, Pari is not needed. I will commit the following changes which will remove that dependency:

  1. remove lines 111-124 of deps/build.jl
  2. remove line 13 of src/Nemo.jl

I'll check this works and commit in a minute or two. Hopefully you will at least be able to build it then. It would at least be good to have an independent replication of the bug. The alternative is probably me spending 2-3 days rewriting a large amount of code. I will if necessary, but just hoping I can possibly avoid it.

There is another dispatch bug related to ccall anyway that I will report separately. So the effort of building the external library won't be wasted.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

I have now removed the pari dependency and verified the bug still exists.

However, this was a very worthwhile exercise because I now have a stack trace, which will probably be a very big clue as to where the bug is in Julia:

in typeinf at ./inference.jl:1625
in typeinf at ./inference.jl:1344
in abstract_call_gf at ./inference.jl:772
in abstract_call at ./inference.jl:925
in abstract_eval_call at ./inference.jl:977
in abstract_eval at ./inference.jl:1004
in abstract_eval_arg at ./inference.jl:941
in typeinf at ./inference.jl:1625
in typeinf at ./inference.jl:1344
in abstract_call_gf at ./inference.jl:772
in abstract_call at ./inference.jl:925
in abstract_eval_call at ./inference.jl:977
in abstract_eval at ./inference.jl:1004
in abstract_eval_arg at ./inference.jl:941
in typeinf at ./inference.jl:1625
in typeinf at ./inference.jl:1344
in abstract_call_gf at ./inference.jl:772
in abstract_call at ./inference.jl:925
in abstract_eval_call at ./inference.jl:977
in abstract_eval at ./inference.jl:1004
in abstract_interpret at ./inference.jl:1167
in typeinf at ./inference.jl:1553
in typeinf at ./inference.jl:1344
in abstract_call_gf at ./inference.jl:772
in abstract_call at ./inference.jl:925
in abstract_eval_call at ./inference.jl:977
in abstract_eval at ./inference.jl:1004
in abstract_eval_arg at ./inference.jl:941
in typeinf at ./inference.jl:1625
in typeinf at ./inference.jl:1344
in abstract_call_gf at ./inference.jl:772
in abstract_call at ./inference.jl:925
in abstract_eval_call at ./inference.jl:977
in abstract_eval at ./inference.jl:1004
in abstract_eval_arg at ./inference.jl:941
in typeinf at ./inference.jl:1625
in typeinf_ext at ./inference.jl:1338
in != at ./operators.jl

This is the tail end of a few hundred pages of repetitious stack trace which just repeats over and over when I press CTRL-C to break out of the hang. Unfortunately there's more than 9999 lines of it, so I can't capture the start of the trace.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

Another perhaps relevant/interesting fact is that after breaking out of the hang, with that stack trace, issuing t + u again works fine. But then u + t fails to work, until I press CTRL-C, then both t + u and u + t work.

The following may also be useful information:

julia> typeof(t)
Nemo.Poly{Nemo.Poly{Nemo.Poly{Nemo.fmpz_poly{:x},:y},:z},:t}

julia> typeof(u)
Nemo.Poly{Nemo.Poly{Nemo.Poly{Nemo.Poly{Nemo.fmpz_poly{:x},:y},:z},:t},:u}

julia> isa(t, Fraction)
false

julia> isa(u, Fraction)
false

julia> typeof(t) <: Fraction
false

julia> typeof(u) <: Fraction
false

That is all as I expect it to be, but it may be useful in tracking down the bug.

I will shortly write a summary of the relevant parts of the type hierarchy that is being used here, so that it is perhaps easier to diagnose the problem.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

Here is a simplified summary of the type hierarchy being used here. I don't know if it is useful on its own without the supporting code to create values, but it might help in understanding what is actually being created here:

abstract Ring

abstract RingElem

abstract PolyElem <: RingElem

type FmpzPolyRing{S} <: Ring
    base_ring::Ring
end

# for an external flint polynomial type
type fmpz_poly{S} <: PolyElem
    coeffs::Ptr{Void}
    alloc::Int
    length::Int
    parent::FmpzPolyRing{S}
    # constructors omitted
end

type PolynomialRing{T <: RingElem, S} <: Ring
    base_ring :: Ring
    # constructors etc
end

type Poly{T <: RingElem, S} <: PolyElem
    coeffs::Array{T, 1}
    length::Int
    parent::PolynomialRing{T, S}
    # constructors omitted
end

Note that FmpzPolyRing and PolynomialRing are for parent objects, which contain extra data we don't want to foist onto the Julia dispatch and dependent type mechanisms. Each polynomial object contains a "pointer" to its parent object, which gives rich mathematical type information (in this case just the ring on which it depends, which is another parent object for that ring).

The definition of Fraction is as follows:

abstract Field <: Ring

abstract FieldElem <: RingElem

type FractionField{T <:RingElem} <: Field
    base_ring::Ring
    # constructors omitted
end

type Fraction{T <: RingElem} <: FieldElem
    num::T
    den::T
    parent::FractionField{T}
    # constructors omitted
end

The FractionField type is for parent objects of an object of type Fraction.

@ihnorton
Copy link
Member

A hint is the fact that this behaviour only shows up 5 levels deep. Replacing t + u (types 5 levels deep) with z + t (types four levels) works just fine.

This has come up in a number of issues. The only one I can quickly find is #10230, but see some of the issues linked to/from #8974 as well.

@simonbyrne
Copy link
Contributor

A hint is the fact that this behaviour only shows up 5 levels deep. Replacing t + u (types 5 levels deep) with z + t (types four levels) works just fine.

It seems to be more complicated than that: mine also hangs for y+z, which is only 3 levels (though unlike t+u, I can't Ctrl-C out of it).

@ihnorton ihnorton added the types and dispatch Types, subtyping and method dispatch label Feb 27, 2015
@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

That MAX_TYPE_DEPTH is the scariest thing I have seen in a very long time.
Please tell me that is not a permanent feature or that it is not what I
think it is.

By the way, the case where this was hit was a standard benchmark, which we
call the Pearce30 benchmark after it appeared in a paper by someone called
Pearce. From a computer algebra point of view, it's a completely routine
example. It's not designed to tax the type system!

On 27 February 2015 at 14:33, Isaiah notifications@github.com wrote:

A hint is the fact that this behaviour only shows up 5 levels deep.
Replacing t + u (types 5 levels deep) with z + t (types four levels) works
just fine.

This has come up in a number of issues. The only one I can quickly find is
#10230 #10230, but see some of
the issues linked to/from #8974
#8974 as well.


Reply to this email directly or view it on GitHub
#10340 (comment).

@jiahao
Copy link
Member

jiahao commented Feb 27, 2015

No one can tell you if it is what you think it is without reading your mind, or at the very least more explanation of what you think it is.

The general type inference problem is undecidable, since Julia's type system allows for the description of infinitely nested types. Any practical algorithm must necessarily give up somewhere, and MAX_TYPE_DEPTH is one of these truncation parameters whereby the algorithm decides to give up looking for the least upper bound on the correct type and finds a wider upper bound.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

On 27 February 2015 at 16:27, Jiahao Chen notifications@github.com wrote:

No one can tell you if it is what you think it is without reading your
mind.

You can't read minds either? :-)

I was really worried that this represents a hard limit on the depth of Julia types.

I know type inference can hang if it doesn't give up. But I'm really hoping that "give up" doesn't entail not working at all.

I guess I was also reacting to the fact that this must be set very low to
cause problems at types 5 levels deep. C++ easily handles types much deeper
than that. We are going to routinely hit this. It's also not been a problem
before 0.4.

The general type inference problem is undecidable. Any practical algorithm
must necessarily give up somewhere, and MAX_TYPE_DEPTH is one of these
truncation parameters.

I didn't realise this would affect Julia.

I was aware that global type inference is undecidable in the presence of
recursive types (?). But local type inference should be decidable, right?

And Julia knows the types of arguments to a function at runtime, in the
worst case, due to type inference being dynamic. Which I thought
effectively bought you local type inference in the worst case.

Also inference in the presence of dependent types is undecidable if those
types are dynamic, i.e. depend on runtime values. But Julia only allows
static dependent types doesn't it (modulo the fact that the user may have
just defined the type in the REPL)?

So whilst I can understand Julia first trying to do some global type
inference, it should surely bail if it fails, and fall back to something
else?

Sorry if this is all just completely wrong. I would actually really like to
understand this issue more fully. It's really critical to our project that
we are going to be able to work with dependent types quite a few levels
deep fairly reliably.

Please tell me I am worrying for nothing and that this behaviour is
considered a real bug which will eventually be fixed.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

@simonbyrne I can confirm y + x does not work for me either, and I cannot even break out of the loop with CTRL-C.

I am now actually a bit happier, as this is clearly not intended behaviour, as we aren't even two levels deep. It's a clear bug -- my guess is in Julia's type inference itself.

@wbhart
Copy link
Author

wbhart commented Feb 27, 2015

Ok, I think the below is a minimal example that reproduces the bug. No external libraries are required. You can plug this code straight into Julia.

abstract Ring

abstract RingElem

abstract PolyElem <: RingElem

type IntegerRing <: Ring
end

ZZ = IntegerRing()

call(::IntegerRing, a::Integer) = a

type FmpzPolyRing{S} <: Ring
    base_ring::Ring
end

type fmpz_poly{S} <: PolyElem
    length::Int
    parent::FmpzPolyRing{S}
end

function +{S}(a::fmpz_poly{S}, b::fmpz_poly{S})
   z = fmpz_poly{S}(a.length + b.length, a.parent)
   return z
end

type PolynomialRing{T <: RingElem, S} <: Ring
    base_ring :: Ring
end

type Poly{T <: RingElem, S} <: PolyElem
    length::T
    parent::PolynomialRing{T, S}
end

function +{T <: RingElem, S}(a::Poly{T, S}, b::Poly{T, S})
   z = Poly{T, S}(a.length + b.length, a.parent)
   return z
end

function Base.call{T <: RingElem, S}(a::PolynomialRing{T, S}, b::T)
   z = Poly{T, S}(b, a)
   z.parent = a
   return z
end

abstract Field <: Ring

abstract FieldElem <: RingElem

type FractionField{T <:RingElem} <: Field
    base_ring::Ring
end

type Fraction{T <: RingElem} <: FieldElem
    num::T
    den::T
    parent::FractionField{T}
end

Base.promote_rule{T <: RingElem, S, V <: Integer}(::Type{Poly{T, S}}, ::Type{V}) = Poly{T, S}

Base.promote_rule{T <: RingElem, S}(::Type{Poly{T, S}}, ::Type{T}) = Poly{T, S}

function +{S <: RingElem, T <: RingElem}(x::S, y::T)
   T1 = promote_type(S, T)
   if S == T1
      +(x, x.parent(y))
   elseif T == T1
      +(y.parent(x), y)
   else
      error("Unable to promote ", S, " and ", T, " to common type")
   end
end

ZZx = FmpzPolyRing{:x}(ZZ)

x = fmpz_poly{:x}(1, ZZx)

z = fmpz_poly{:x}(1, ZZx)

ZZyx = PolynomialRing{fmpz_poly{:x}, :y}(ZZx)

y = Poly{fmpz_poly{:x}, :y}(z, ZZyx)

+{T <: RingElem}(a::T, b::Fraction{T}) = b + a

y + x

x + y

@JeffBezanson
Copy link
Member

Thanks for the reduced test case. I'll work on this.

it should surely bail if it fails

Quite right, but deciding when to bail is tricky, and here we are probably hanging on past the point when we should have bailed :)

@JeffBezanson JeffBezanson added the bug Indicates an unexpected problem or unintended behavior label Mar 3, 2015
@JeffBezanson JeffBezanson self-assigned this Mar 3, 2015
@JeffBezanson
Copy link
Member

In trying to fix this I ended up fixing #1631 instead.

@JeffBezanson JeffBezanson changed the title Dispatch bug for nested types type inference infinite loop Mar 4, 2015
@wbhart
Copy link
Author

wbhart commented Mar 4, 2015

That's really great news. I had been planning on reporting that one next if
it wasn't fixed. I hadn't realised it had been reported.

On 4 March 2015 at 22:21, Jeff Bezanson notifications@github.com wrote:

In trying to fix this I ended up fixing #1631
#1631 instead.


Reply to this email directly or view it on GitHub
#10340 (comment).

@JeffBezanson
Copy link
Member

I believe this has the same root cause as #7534.

@ihnorton ihnorton added needs tests Unit tests are required for this change and removed bug Indicates an unexpected problem or unintended behavior labels Mar 6, 2016
@ihnorton
Copy link
Member

ihnorton commented Mar 6, 2016

@wbhart's test case appears be be fixed on latest master. Does this need a test before closing?

@vtjnash vtjnash closed this as completed Mar 25, 2016
@tkelman
Copy link
Contributor

tkelman commented Mar 28, 2016

need a test before closing

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs tests Unit tests are required for this change types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests

7 participants