Segfault while compiling package using a foreign type #700

Closed
fingolfin opened this issue Jun 17, 2022 · 1 comment

@fingolfin

fingolfin commented Jun 17, 2022

tl;dr: it seems that when using PackageCompiler.create_sysimage, if module A depends on module B, then it can happen that the precompiled data for A is deserialized before B.__init__ was run. This causes a crash when A is referencing a foreign type which is provided by B.__init__ at runtime only.


(Disclaimer: I realize the following may be "my fault", and I am willing to work on fixing it, but I simply don't understand the various moving parts and components well enough to do it on my own, so I hope to get some help here.)

The following issue happens with all Julia versions I tested from 1.6 to current master. This is with PackageCompiler 2.0.6 in case it matters.

The GAP.jl package is a wrapper around the computer algebra system GAP. For various reasons, people are interested in running PackageCompiler.jl on code using GAP.jl (among many other things). Unfortunately, this segfaults (see here for the initial report). A minimal reproducer: add PackageCompiler and GAP to a fresh project, then:

julia> using PackageCompiler ; PackageCompiler.create_sysimage([:GAP], sysimage_path="bla")
⣠ [00m:04s] PackageCompiler: compiling incremental system image
signal (11): Segmentation fault: 11
in expression starting at /var/folders/d_/1zss2fnd6xdgclqnj8jj_27m0000gp/T/jl_EhKTEwUAk6:18
jl_unwrap_unionall at /Users/mhorn/Applications/Julia-1.8.0-rc1.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
jl_deserialize_value at /Users/mhorn/Applications/Julia-1.8.0-rc1.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
Allocations: 931831 (Pool: 930917; Big: 914); GC: 1
✖ [00m:05s] PackageCompiler: compiling incremental system image
...

Analyzing this further, I've determined that it crashes deserializing the type GAP_jll.GapObj (which was precisely what I expected to see once I saw the initial report...). I'll get into background in a moment, let me just quote the code in function jl_deserialize_value_any in which the crash happens:

    if (dt == jl_typename_type) {
        int internal = read_uint8(s->s);
        ...
        jl_module_t *m = (jl_module_t*)jl_deserialize_value(s, NULL);
        jl_sym_t *sym = (jl_sym_t*)jl_deserialize_value(s, NULL);
        if (internal) {
            tn->module = m;
            tn->name = sym;
            ...
        else {
            jl_datatype_t *dt = (jl_datatype_t*)jl_unwrap_unionall(jl_get_global(m, sym)); // CRASHES HERE
            assert(jl_is_datatype(dt));
            tn = dt->name;
            backref_list.items[pos] = tn;
        }
        return (jl_value_t*)tn;

It crashes because jl_get_global(m, sym) (with m = GAP_jll, sym = :GapObj) returns a NULL pointer, which jl_unwrap_unionall then dereferences.


Some background: that type is a "foreign type", a very special kind of type that as far as I know is used only by GAP.jl and GAP_jll.jl and nothing else. As such, it is not surprising to find things that don't work well with such types, as they were never written with them in mind (indeed, see this issue for some discussions on issues with precompilation etc.).

Aaaanyway: instances of such foreign types cannot be (de)serialized; in fact there is a check in dump.c that prevents serializing such instances.

However, so far serializing the foreign type itself worked fine. Indeed, I've inserted the following right before the crashing line with the jl_unwrap_unionall call, in Julia master:

  fprintf(stderr, " jl_get_global(%s, %s) = %p\n", jl_symbol_name(m->name), jl_symbol_name(sym), jl_get_global(m, sym));

If I just do using GAP in a Julia session, I see this, among many other things:

...
 jl_get_global(Core, Tuple) = 0x11bbd69e0
 jl_get_global(GAP_jll, GapObj) = 0x10cb5c010
 jl_get_global(Base, #getproperty) = 0x11bfa4c30
...

But with the above PackageCompiler call, I see this:

...
 jl_get_global(Base, #steprange_last_empty) = 0x145b1d5b0
 jl_get_global(Core, TypeMapEntry) = 0x143a89d40
 jl_get_global(Core, Tuple) = 0x14397a9e0
 jl_get_global(GAP_jll, GapObj) = 0x0

signal (11): Segmentation fault: 11

So, this looks to me as if it is trying to deserialize the type before the __init__ method of GAP_jll has been run, which is responsible for initializing GAP_jll.GapObj.
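To make the suspected ordering problem concrete, here is a minimal toy model in C. It is only a sketch: none of the names below (toy_module_t, toy_get_global, toy_resolve_type, toy_module_init) exist in Julia, and the real binding machinery is far more involved. The assumption it encodes is that a binding exists in the module but holds NULL until the module's __init__ has run, so a guarded lookup can report the problem instead of segfaulting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a module and its global bindings. */
typedef struct {
    const char *name;
    void *value; /* NULL until the module's init hook assigns it */
} toy_binding_t;

typedef struct {
    const char *name;
    toy_binding_t bindings[4];
    size_t nbindings;
} toy_module_t;

/* Find a binding by name; returns NULL if the module has no such binding. */
static toy_binding_t *toy_find(toy_module_t *m, const char *sym)
{
    assert(m != NULL && sym != NULL);
    for (size_t i = 0; i < m->nbindings; i++) {
        if (strcmp(m->bindings[i].name, sym) == 0)
            return &m->bindings[i];
    }
    return NULL;
}

/* Models the lookup: yields NULL for a missing or unassigned binding. */
static void *toy_get_global(toy_module_t *m, const char *sym)
{
    toy_binding_t *b = toy_find(m, sym);
    return b ? b->value : NULL;
}

/* Guarded resolution: returns 0 on success, -1 if the global is not
 * available yet, instead of dereferencing a NULL pointer. */
static int toy_resolve_type(toy_module_t *m, const char *sym, void **out)
{
    void *v = toy_get_global(m, sym);
    if (v == NULL) {
        fprintf(stderr, "cannot resolve %s.%s: binding not initialized\n",
                m->name, sym);
        return -1;
    }
    *out = v;
    return 0;
}

/* Models what a module init hook does: assign the type at runtime.
 * Returns 0 on success, -1 if the binding does not exist. */
static int toy_module_init(toy_module_t *m, const char *sym, void *value)
{
    toy_binding_t *b = toy_find(m, sym);
    if (b == NULL)
        return -1;
    b->value = value;
    return 0;
}
```

In this model, calling toy_resolve_type before toy_module_init fails cleanly, and calling it afterwards succeeds. A guard like this would only turn the segfault into an error; it would not by itself make the sysimage build work.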

--

So far, so good. But now how to fix this?

Any pointers would be greatly appreciated!

@fingolfin

I wonder why the difference: why is the type being deserialized here before __init__ is run, while it works fine in a regular session? Presumably this is because here, with a sysimage, everything is deserialized at once, before any __init__ methods are called? In that case, I guess I am out of luck to fix this without a major change... E.g. one could consider putting all foreign types being deserialized into a list, and then properly initialize them at the end -- or something like that. Doesn't sound very appealing.

Am I right in this guess about how things work?
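For what it's worth, the "put them into a list and initialize them at the end" idea could look roughly like the sketch below. Again this is a self-contained toy in C with hypothetical names (pending_list_t, resolve_or_defer, run_fixups), not Julia's actual deserializer. It assumes that nothing reads a deferred slot before the fixup pass runs, which is exactly the part I am unsure about in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical global table entry: value stays NULL until init runs. */
typedef struct {
    const char *name;
    void *value;
} toy_global_t;

/* One deferred lookup: which symbol to resolve and where to store it. */
typedef struct {
    const char *sym;
    void **slot;
} pending_fixup_t;

typedef struct {
    pending_fixup_t items[MAX_PENDING];
    size_t count;
} pending_list_t;

static void *lookup(const toy_global_t *globals, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(globals[i].name, sym) == 0)
            return globals[i].value;
    }
    return NULL;
}

/* Resolve now if possible, otherwise remember the slot for later.
 * Returns 1 if resolved, 0 if deferred, -1 if the pending list is full. */
static int resolve_or_defer(pending_list_t *pending,
                            const toy_global_t *globals, size_t n,
                            const char *sym, void **slot)
{
    assert(pending != NULL && slot != NULL);
    void *v = lookup(globals, n, sym);
    if (v != NULL) {
        *slot = v;
        return 1;
    }
    if (pending->count == MAX_PENDING)
        return -1;
    pending->items[pending->count].sym = sym;
    pending->items[pending->count].slot = slot;
    pending->count++;
    return 0;
}

/* Run after all init hooks: fill every slot that can now be resolved.
 * Entries that are still unresolved stay in the list; their number is returned. */
static size_t run_fixups(pending_list_t *pending,
                         const toy_global_t *globals, size_t n)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < pending->count; i++) {
        void *v = lookup(globals, n, pending->items[i].sym);
        if (v == NULL) {
            pending->items[unresolved++] = pending->items[i];
            continue;
        }
        *pending->items[i].slot = v;
    }
    pending->count = unresolved;
    return unresolved;
}
```

The intended sequence is: call resolve_or_defer during deserialization, run all init hooks, then call run_fixups once and treat a non-zero return value as an error.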
