Binary compatibility in case of separate compilation #102

Closed
skuzmich opened this issue Jul 9, 2020 · 19 comments

@skuzmich

skuzmich commented Jul 9, 2020

In some languages, including Kotlin, changing the set of private class fields is
considered a binary-compatible change, meaning that an app doesn't have to be
recompiled when a library changes some of its private fields.

What would be a suggested way to implement this kind of separate compilation in Wasm GC?

What I'm interested in is a solution that allows:

  • Child classes to be subtypes of their parent class.
  • Changing the list of private fields without changing the type.
  • Adding or reordering public fields without changing the type.
  • Adding superclasses without changing the type.

Seems like we can achieve that if we expose all object types as an array of anyref (or similar), where elements correspond to the classes in the inheritance chain and contain their own non-inherited fields.
This would, unfortunately, mean a lot of indirections and extra casts for classes that can be used across modules. Could there be a better way?
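
For illustration, here is a minimal TypeScript sketch of this layout (not actual Wasm): the class and field names are made up, and `unknown[]` stands in for an array of anyref.

```typescript
// Hypothetical model of the proposed layout: an object is an array whose
// element i holds the non-inherited fields of the i-th class in the
// inheritance chain (0 = the root superclass).

interface BaseFields { name: string }      // fields declared by a made-up class Base
interface DerivedFields { count: number }  // fields declared by Derived : Base

type ObjectRep = unknown[]; // one field group per class in the chain

function newDerived(name: string, count: number): ObjectRep {
  return [{ name }, { count }];
}

// Reading Derived.count: index by the class's position in the chain, then
// cast the field group. Adding or removing private fields of Base only
// changes BaseFields, not the index used here, so this access site would
// not need recompilation.
function getCount(obj: ObjectRep): number {
  const group = obj[1] as DerivedFields; // the cast models an rtt check
  return group.count;
}
```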

@RossTate
Contributor

RossTate commented Jul 9, 2020

I raised this issue at the in-person meeting. @rossberg suggested exactly what you propose. And it's actually worse than what you describe, because without knowing how many (private) fields of inherited classes there are you can't know the offsets of your own fields. Furthermore, @rossberg has insisted that imported values only be incorporated at instantiation time rather than at compile time, so every field access will involve a dynamic load of the imported i32 indicating the size of the inherited class, which is then added to a constant, which is then used to index into an array; the fetched value then requires another array lookup to rtt-cast it to the expected type, and finally has to be unboxed in the case of primitive types.

You should also be warned that the recent change to add an index to the type of rtt means you cannot introduce a superclass to an existing class without also recompiling every class that extends that existing class (because the change bumps up the indices of the rtt's for these classes).

@skuzmich
Author

skuzmich commented Jul 9, 2020

And it's actually worse than what you describe, because without knowing how many (private) fields of inherited classes there are you can't know the offsets of your own fields.

The method I described allocates inherited fields in a separate struct; this way, field offsets don't depend on the superclass size.

I haven't benchmarked it, but those field accesses don't look performant. But I guess we could ship a custom linker that links relocatable Wasm modules inside the browser, to keep field accesses fast while still using some of the browser cache for unchanged modules.

@RossTate
Contributor

RossTate commented Jul 9, 2020

Ah, sorry, I misunderstood your description. But if you want to be able to add superclasses without recompiling, how do you know which index in the array has the structure for your class’s fields?

@skuzmich
Author

skuzmich commented Jul 9, 2020

The number of superclasses for each class could be provided dynamically via an exported global or function.

@RossTate
Contributor

RossTate commented Jul 9, 2020

Then I agree that your strategy is likely better than mine for most programs. I also agree that it’s not very satisfactory. A stand-alone field access requires an offset fetch, then an array access, then an inefficient cast, then finally a field fetch. The potential saving grace is that most of those steps can be skipped for repeated accesses to fields defined within the same class (not including superclasses).
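
As a hedged sketch of what one such access looks like under this scheme (in TypeScript rather than Wasm, with hypothetical names): the superclass count is not known at the client's compile time, so it is modeled as an imported value.

```typescript
// One cross-module field access under the array-of-field-groups scheme.
declare const importedDepth: number;           // depth of the defining class, supplied by its library

interface PointFields { x: number; y: number } // made-up field group

function getX(obj: unknown[]): number {
  const index = importedDepth;          // 1. fetch the offset (dynamic, not a compile-time constant)
  const group = obj[index];             // 2. array access for the field group
  const fields = group as PointFields;  // 3. downcast (models the rtt cast)
  return fields.x;                      // 4. the actual field fetch (plus unboxing in real Wasm)
}
```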

To address public field reordering, you can order by name.

Something this still doesn’t address is changes to where fields are declared among superclasses. The same goes for non-private class methods. Are those changes uncommon enough that it’s okay to require recompilation?

@tebbi

tebbi commented Jul 10, 2020

Manual linking on the client would probably be pretty bad for startup, given that otherwise we can have streaming compilation and process modules in parallel. Given the big performance cost either way, how important is this kind of late linking on the client? Wouldn't it be better to do this on the server? Trading separate compilation for runtime performance, memory consumption, and startup doesn't sound like too bad a deal for the web (after all, that's why the JS ecosystem uses webpack). And splitting an app into modules for on-demand download would still be possible, just that you couldn't create these modules independently.

@skuzmich
Author

Sure, "server-side" linking is a great solution for a lot of people. It allows for intermodule DCE and other link-time optimizations. It would not suffer from above-mentioned issue because instead of Wasm binary it would use a library distribution format with a good level of private implementation encapsulation.

But this approach does not scale well with application size since you would invalidate cache for the whole mono-module during a partial app update.

@rossberg
Member

The idea of the GC proposal is to maintain the idea of a low-level VM and (ultimately) enable languages to implement whatever object layout they see fit on top of Wasm, not to build it into Wasm.

For that reason, representation-level Wasm types and their subtyping hierarchies should not be confused with source-level types and their subtyping hierarchies. There has to be some commuting mapping, but it cannot generally be the identity. The idea of being able to directly map every type in a source type system would immediately result in the sum-of-all-type-systems problem for Wasm, which obviously is intractable.

Instead, Wasm goes the opposite direction: provide as thin an abstraction over the hardware as we can get away with (in terms of being both efficient and safe, which obviously are conflicting goals), and accept a minimal level of runtime checks (including casts).

In this particular case, yes, this means that you will have to introduce extra indirections. But if not you, the VM would have to do it. However, with the GC MVP, these indirections are going to be somewhat more costly than the ideal. That is understood, of course, and the nature of an MVP.

To implement Kotlin-like subtyping (or similar schemes, like polymorphic records) more efficiently in the future, Wasm should probably add primitives that allow more fine-grained indirection. In particular, I think it would require typed field offsets (a.k.a. "member pointers") as a primitive in Wasm. Then, instead of having an indirection in the object representation itself, you just have an indirection in the accessed offset, and use some form of evidence passing for offset vectors. Polymorphic functions are another feature that will be needed.
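
A rough sketch of this evidence-passing idea, again in TypeScript rather than Wasm and with made-up names and shapes: the object stays flat, and separately compiled code receives the offsets of the fields it knows about as a parameter instead of hard-coding them.

```typescript
type FlatObject = unknown[];                    // flat object: one slot per field
interface PointOffsets { x: number; y: number } // "member pointers" for a made-up class Point

// Supplied by the defining library; it can change its private layout freely
// as long as it ships a matching offset vector.
const pointOffsets: PointOffsets = { x: 0, y: 1 };

// A separately compiled consumer makes no assumptions about layout; every
// access goes through the passed-in offsets (a parameter indirection).
function getY(obj: FlatObject, offsets: PointOffsets): number {
  return obj[offsets.y] as number;
}

// usage on the consumer side: getY(someObject, pointOffsets)
```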

I'm somewhat confident that much of this is possible to express in user space (eventually). The only big problem from my perspective is how to provide efficient casts without hardwiring too many assumptions about the shape of allowable type hierarchies into Wasm, as the current RTT mechanism unfortunately does...

Furthermore, @rossberg has insisted that imported values only be incorporated at instantiation time rather than at compile time

To be clear, that is not something I insist on, but that simply is inherent in Wasm's compilation model. We have talked about extending this to support pre-binding imports at compile time on a number of occasions, but there hasn't been any concrete proposal (and there might be some tricky details).

Even if there was such a mechanism, a language implementation might not want to depend on it. In general, you would hope to be able to map language-level separate compilation to Wasm-level separate compilation.

@skuzmich
Author

In this particular case, yes, this means that you will have to introduce extra indirections. But if not you, the VM would have to do it.

Could you please elaborate on why we have to have an indirection? There are a lot of programming systems that have modules with private field hiding, yet classes are still flat in memory.
At first glance, it looks like all we need is to be able to import fields by name (like we do with global fields) and to extend a struct without knowing all of its fields.

@rossberg
Member

I'm using "indirection" in a general sense. Basically, if a derived class cannot know at compile time how many fields are in the object before its own, then you have only two choices:

  1. Make sure that the number/size of parent fields does not affect how it accesses its own fields. That is achieved by the scheme you describe, where individual fields are stored in a "2nd dimension" that is reached through a pointer indirection.

  2. Pass the information of where its own fields are located as a parameter. That is the alternative I was alluding to: the class gets some offset into the object from somewhere, and each access to a field of its own needs to add that offset. In other words, the overall offset is computed via a parameter indirection. Clearly, that cannot be expressed with the MVP, and would require more complicated Wasm features.

Any implementation will have to use one of these mechanisms, whether they are built into the Wasm VM or expressed in user space. The only alternative is to delay compilation until after the offsets are known. That either means giving up some degree of separate compilation, committing to a less dynamic linking model, or requiring an additional JIT phase after type specialisation, like in the CLR.

@skuzmich
Author

Andreas, thanks a lot for the detailed explanation!

I hadn't taken into account the key piece: Wasm compiling modules independently of their imports. I'm wondering whether this is an important property of Wasm that people care about?

@binji
Member

binji commented Jul 20, 2020

@skuzmich This is implied but not required by Wasm (for example, JSC only compiles code at instantiation time, when imports are available, IIRC). But certainly some applications assume that compilation is expensive and instantiation is cheap (see my clang-in-wasm demo as an example: https://github.com/binji/wasm-clang). And, at least for now, using threads requires compiling a module, sending it to a Worker, and instantiating it there. If native code is compiled by WebAssembly.compile (or similar) then this process is relatively cheap.
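
A minimal TypeScript sketch of that compile-once, instantiate-per-worker pattern; the file names, URL, and import object are placeholders.

```typescript
// --- main thread (e.g. main.ts) ---
async function startWorker(): Promise<void> {
  const bytes = await fetch("app.wasm").then(r => r.arrayBuffer());
  const module = await WebAssembly.compile(bytes); // the expensive step, done once
  const worker = new Worker("worker.js");
  worker.postMessage(module);                      // a compiled Module can be posted to a Worker
}

// --- worker (e.g. worker.js) ---
onmessage = async (e: MessageEvent) => {
  const module = e.data as WebAssembly.Module;
  // the cheap step: only instantiation happens here
  const instance = await WebAssembly.instantiate(module, {} /* imports */);
  // ... use instance.exports
};
```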

@RossTate
Contributor

Yeah, it seems like there are both applications that would benefit substantially from compile-time imports and applications that are benefiting substantially from instantiation-time imports (and likely applications that would benefit from mixing both together). Similarly, some features seem better suited to compile-time versus instantiation-time imports/exports. So I've been pondering how to design a compilation model that serves both patterns well. I hope to have some thoughts up before too long.

@rossberg
Member

rossberg commented Jul 21, 2020

@skuzmich, it was the entire reason why the JS API separated compile from instantiate. In general, the idea was that compilation is expensive, and you may want to, e.g., programmatically cache compilation results, or instantiate the same module multiple times (e.g. for workers). Though on the Web, the API for manual caching -- essentially, the ability to store a compiled module in something like IndexedDB -- was pulled for various reasons.

The simplest way to relax this would be to optionally allow binding a subset of imports with compile, which was discussed a number of times. However, once we have type imports, there can be dependencies between imports, so there would have to be some dependency analysis and respective restrictions. It would probably be more future-proof to explicitly distinguish between regular imports and "preimports" that are staged earlier, and disallow the latter to depend on the former.

But what makes this all more tricky is that you will quickly discover the need for "preexports" as well to supply downstream preimports, at which point a more explicit staging mechanism may be desirable to keep it all sane.

@RossTate
Contributor

Awesome. Those are the same conclusions I came to. Sounds like we'll be on the same page 😃

@RossTate
Contributor

RossTate commented Aug 5, 2020

@skuzmich I think there's a problem with your current plan, though maybe you decided to intentionally disregard this consideration.

Suppose you compile class A, which extends class B which extends class C. When you compile it, some method of class A accesses a field declared in class B.

Next suppose that field is moved from class B up to class C. (Or, alternatively, that class C didn't exist before, but B was since refactored to have a superclass C that the field was moved to.) I don't think your current plan will handle this. Supposing B is defined in a library that A builds upon, that would mean libraries couldn't update without having all their clients recompile. Maybe you're fine with this; I just wanted to let you know.

Unfortunately, the only fix for this seems to be to treat objects simply as arrays of anyref, whose elements are the field values themselves (rather than class-grouped structures of fields). Then you could import the size of superclasses and the offsets of relevant fields. Of course, this means that all primitive fields will have to be boxed, and all field accesses will require casting the value fetched from the array using an rtt.
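
As a hedged sketch of that fully flattened fallback (TypeScript standing in for Wasm, names hypothetical): the object is a single array of boxed field values, and the offset of each field a client uses is imported from the library that declares it.

```typescript
type FlatObject = unknown[]; // models an array of anyref, one slot per field

// Imported at instantiation time; because the client never bakes the number
// in, the field can move between B and C without recompiling the client.
declare const importedOffsetOfF: number;

function readF(obj: FlatObject): number {
  const boxed = obj[importedOffsetOfF]; // array access at an imported offset
  return boxed as number;               // models the rtt cast plus unboxing of a primitive
}
```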

@skuzmich
Author

skuzmich commented Aug 5, 2020 via email

@RossTate
Contributor

RossTate commented Aug 6, 2020

Ah, cool. Thanks for the updates and insights!

@tlively
Member

tlively commented Feb 12, 2022

Closing this for now since it seems like the discussion has ended and is not actionable for the MVP, but please reopen if there is anything to add.

@tlively tlively closed this as completed Feb 12, 2022
rossberg added a commit that referenced this issue Sep 7, 2023
Use another name for registering a new module in test.