Allow custom (de)serialization when saving and restoring precompiled modules (for types containing pointers, or foreign types) #46214

Open
fingolfin opened this issue Jul 29, 2022 · 3 comments

Comments

@fingolfin
Contributor

fingolfin commented Jul 29, 2022

The code in src/dump.c and src/staticdata.c does not allow customization of how instances of specific types are serialized.

That is a problem for structs that store a Ptr to external data: if instances of such a struct are used at the global level of a package module, then the (de)serialized data contains garbage pointers. Fun crashes ensue. See e.g. Nemocas/Nemo.jl#810 for a real-life instance of this.

Indeed, the Julia stdlib is affected by this: BigInt wraps GMP, and it only "works" because src/dump.c and src/staticdata.c each contain custom code to (de)serialize BigInt.

It would be nice if packages could also hook into this serialization code. I realize this may be difficult to achieve. So as an intermediate stop-gap solution, it would be nice if one could at least mark types such that any attempt to serialize an instance of them leads to a precompilation error. That way, one can at least prevent accidental misuse. I would hope this is much simpler to implement.
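
The stop-gap described above could be sketched as a per-type flag that the serializer checks before writing anything. This is a standalone illustration, not Julia's internals: `type_info_t`, its `no_serialize` flag, and `check_serializable` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type descriptor. The no_serialize flag is an assumption
 * made for this sketch, not an existing field of any Julia type object. */
typedef struct {
    const char *name;
    int no_serialize;   /* nonzero: instances must never reach the cache */
} type_info_t;

/* Returns 0 if instances of t may be serialized. Returns -1 and fills err
 * with a message if the type was marked as non-serializable. */
static int check_serializable(const type_info_t *t, char *err, size_t errlen)
{
    if (t->no_serialize) {
        snprintf(err, errlen,
                 "cannot serialize instance of %s during precompilation",
                 t->name);
        return -1;
    }
    return 0;
}
```

A serializer would call such a check once per object and abort precompilation on failure, so the mistake surfaces at build time instead of as a crash in a later session.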

ADDED: The same API could presumably also be used to (de)serialize instances of "foreign types" created via jl_new_foreign_type.

@JeffBezanson
Sponsor Member

Currently we save all Ptr fields as NULL, in hopes that it can be recognized and fixed up (when the object is used) in the new process. Is that possible in your case?

@vchuravy
Member

vchuravy commented Aug 2, 2022

Something else to consider: for some data structures, namely ones that implement caches of some sort, it would be very interesting to have customized serialization. An example is the CodeCache in GPUCompiler.jl.

@fingolfin
Contributor Author

> Currently we save all Ptr fields as NULL, in hopes that it can be recognized and fixed up (when the object is used) in the new process. Is that possible in your case?

The problem with that is that it effectively means one has to insert checks for NULL pointers everywhere, which can hamper performance.
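
To make the cost concrete, here is a minimal standalone sketch of the "save as NULL, fix up on use" pattern. `handle_t`, `handle_reinit`, and `handle_get` are hypothetical names, and the external data is reduced to a single heap-allocated int so the example stays small.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around externally allocated data. */
typedef struct {
    int  value;     /* plain data, survives serialization unchanged */
    int *external;  /* restored as NULL, must be rebuilt in the new process */
} handle_t;

static int reinit_count = 0;  /* counts how often the fix-up actually ran */

/* Rebuild the external buffer from the plain fields. */
static void handle_reinit(handle_t *h)
{
    h->external = malloc(sizeof *h->external);
    if (h->external == NULL)
        abort();
    *h->external = h->value;
    reinit_count++;
}

/* Every accessor pays for this branch, even long after the fix-up ran. */
static int handle_get(handle_t *h)
{
    if (h->external == NULL)
        handle_reinit(h);
    return *h->external;
}

/* Release the external buffer and return to the "needs fix-up" state. */
static void handle_free(handle_t *h)
{
    free(h->external);
    h->external = NULL;
}
```

The fix-up itself runs only once per object, but the NULL test sits on every access path, which is the overhead the comment above is concerned about.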

The simplest "solution" (just to get started) that I can think of would be to allow registering C (?) callbacks for certain types that allow (de)serializing them.

Example for bigint (just copy & pasted the relevant code):

// to be called from `jl_serialize_value_` from `dump.c`
// TODO: maybe allow for an error return code?
// TODO: actually there is also serialization code for bigint in `jl_write_values`
//       (file `staticdata.c`), but I don't understand how they relate
void my_serialize_bigint(jl_serializer_state *s, jl_value_t *v)
{
        write_uint8(s->s, TAG_SHORT_GENERAL);
        write_uint8(s->s, jl_datatype_size(jl_bigint_type));
        jl_serialize_value(s, jl_bigint_type);
        jl_value_t *sizefield = jl_get_nth_field(v, 1);
        jl_serialize_value(s, sizefield);
        void *data = jl_unbox_voidpointer(jl_get_nth_field(v, 2));
        int32_t sz = jl_unbox_int32(sizefield);
        size_t nb = (sz == 0 ? 1 : (sz < 0 ? -sz : sz)) * gmp_limb_size;
        ios_write(s->s, (char*)data, nb);
}

// to be called from the deserialization counterpart of `jl_serialize_value_` in `dump.c`
void my_deserialize_bigint(jl_serializer_state *s, jl_value_t *v)
{
        jl_value_t *sizefield = jl_deserialize_value(s, NULL);
        int32_t sz = jl_unbox_int32(sizefield);
        int32_t nw = (sz == 0 ? 1 : (sz < 0 ? -sz : sz));
        size_t nb = nw * gmp_limb_size;
        void *buf = jl_gc_counted_malloc(nb);
        if (buf == NULL)
            jl_throw(jl_memory_exception);
        ios_readall(s->s, (char*)buf, nb);
        jl_set_nth_field(v, 0, jl_box_int32(nw));
        jl_set_nth_field(v, 1, sizefield);
        jl_set_nth_field(v, 2, jl_box_voidpointer(buf));
}
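
The two callbacks above depend on Julia's internal headers, so they cannot be compiled on their own. The following standalone sketch reproduces only the size arithmetic and the "write the limbs, allocate fresh memory on load" idea. `big_t`, `limb_count`, `big_save`, and `big_load` are hypothetical names, and 64-bit limbs are an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t limb_t;  /* assumption: 64-bit limbs */

/* Same field order as used by the callbacks above: alloc, size, data. */
typedef struct {
    int32_t alloc;
    int32_t size;   /* the sign of size carries the sign of the number */
    limb_t *d;
} big_t;

/* Limbs stored for a given size field; as in the code above, a size of
 * zero is still written as one limb. */
static int32_t limb_count(int32_t sz)
{
    return sz == 0 ? 1 : (sz < 0 ? -sz : sz);
}

/* Write the size field, then the limbs. Returns the number of bytes
 * written; buf must be large enough. */
static size_t big_save(const big_t *b, unsigned char *buf)
{
    size_t nb = (size_t)limb_count(b->size) * sizeof(limb_t);
    memcpy(buf, &b->size, sizeof b->size);
    memcpy(buf + sizeof b->size, b->d, nb);
    return sizeof b->size + nb;
}

/* Read the size field and copy the limbs into freshly allocated memory.
 * Returns 0 on success, -1 if the allocation fails. */
static int big_load(big_t *b, const unsigned char *buf)
{
    int32_t sz;
    memcpy(&sz, buf, sizeof sz);
    int32_t nw = limb_count(sz);
    size_t nb = (size_t)nw * sizeof(limb_t);
    limb_t *d = malloc(nb);
    if (d == NULL)
        return -1;
    memcpy(d, buf + sizeof sz, nb);
    b->alloc = nw;   /* the restored object owns exactly nw limbs */
    b->size = sz;
    b->d = d;
    return 0;
}
```

The point of the pattern is that the pointer value itself is never stored: only the data it refers to is written, and the loader creates a pointer that is valid in the new process.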

That would be just a quick & dirty solution, though... Of course the exact behavior of these callbacks would need to be specified, along with a way to store them (in the type object? in a global dictionary?). Also, it would perhaps be better if jl_serializer_state were an opaque type with accessor functions.
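
One possible shape for this, sketched as standalone C under several assumptions: callbacks are kept in a global table keyed by an integer type id, and the serializer state is only touched through an accessor. All names here (`ser_state_t`, `ser_write`, `register_serializer`, `serialize_with_hook`) are hypothetical and do not exist in Julia. Only the serializing direction is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writer state. In a real design the struct body would live
 * in a private source file so that callbacks see only an opaque type. */
typedef struct ser_state ser_state_t;
struct ser_state {
    unsigned char buf[256];
    size_t len;
};

/* Accessor: append n bytes. Returns 0 on success, -1 if the buffer is full. */
static int ser_write(ser_state_t *s, const void *p, size_t n)
{
    if (n > sizeof s->buf - s->len)
        return -1;
    memcpy(s->buf + s->len, p, n);
    s->len += n;
    return 0;
}

/* Callback signature with an error return code. */
typedef int (*serialize_fn)(ser_state_t *s, const void *obj);

/* Global table keyed by type id. Storing the function pointer in the
 * type object would be the alternative mentioned above. */
#define MAX_HOOKS 16
static struct { int type_id; serialize_fn fn; } hooks[MAX_HOOKS];
static size_t nhooks = 0;

static int register_serializer(int type_id, serialize_fn fn)
{
    if (nhooks == MAX_HOOKS)
        return -1;
    hooks[nhooks].type_id = type_id;
    hooks[nhooks].fn = fn;
    nhooks++;
    return 0;
}

/* Dispatch to a registered hook. Returns the hook's result, or 1 if no
 * hook exists and the caller should use the default path. */
static int serialize_with_hook(ser_state_t *s, int type_id, const void *obj)
{
    for (size_t i = 0; i < nhooks; i++)
        if (hooks[i].type_id == type_id)
            return hooks[i].fn(s, obj);
    return 1;
}

/* Example hook: obj points to a char pointer; write the string it refers
 * to (including the terminator) instead of the pointer value. */
static int serialize_cstring(ser_state_t *s, const void *obj)
{
    const char *str = *(const char *const *)obj;
    return ser_write(s, str, strlen(str) + 1);
}
```

A linear scan keeps the sketch short; a hash table, or a slot in the type object, would avoid the lookup cost when many types register hooks.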

Even nicer would be if one could write such (de)serialization functions in pure Julia...

@fingolfin changed the title from "Allow custom (de)serialization when saving and restoring precompiled modules" to "Allow custom (de)serialization when saving and restoring precompiled modules (for types containing pointers, or foreign types)" on Nov 29, 2022