Binary compatibility in case of separate compilation #102
Comments
I raised this issue at the in-person meeting. @rossberg suggested exactly what you propose. And it's actually worse than what you describe, because without knowing how many (private) fields of inherited classes there are you can't know the offsets of your own fields. Furthermore, @rossberg has insisted that imported values only be incorporated at instantiation time rather than at compile time, so every field access will involve a dynamic load of the imported You should also be warned that the recent change to add an index to the type of |
The method I described allocates inherited fields in a separate struct, so field offsets don't depend on the superclass size. I haven't benchmarked it, but those field accesses don't look performant. But I guess we could ship a custom linker that links relocatable Wasm modules inside the browser, to keep field accesses fast and still use some of the browser cache for unchanged modules. |
Ah, sorry, I misunderstood your description. But if you want to be able to add superclasses without recompiling, how do you know which index in the array has the structure for your class’s fields? |
The number of superclasses for each class could be provided dynamically via an exported global or function. |
Then I agree that your strategy is likely better than mine for most programs. I also agree that it’s not very satisfactory. A stand-alone field access requires an offset fetch, then an array access, then an inefficient cast, then finally a field fetch. The potential saving grace is that most of those steps can be skipped for repeated accesses to fields defined within the same class (not including superclasses). To address public field reordering, you can order by name. Something this still doesn’t address is changes to where fields are declared among superclasses. The same goes for non-private class methods. Are those changes uncommon enough that it’s okay to require recompilation? |
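The access path described above (offset fetch, array access, cast, field fetch) can be modelled roughly as follows. This is only a toy sketch in Python, not Wasm GC code: `ClassInfo`, `new_object`, `get_field` and `set_field` are hypothetical names, dicts stand in for structs, and the owner check stands in for the runtime cast.

```python
# Toy model of the "array of per-class structs" layout discussed above.
# All names here are hypothetical; this is not Wasm GC code.

class ClassInfo:
    def __init__(self, name, superclass, field_names):
        self.name = name
        self.superclass = superclass
        # Ordering by name keeps slots stable when source order changes.
        self.field_names = sorted(field_names)

    def depth(self):
        # Number of superclasses. In the thread this would be read from an
        # exported global or function, not baked in at compile time.
        return 0 if self.superclass is None else self.superclass.depth() + 1

    def chain(self):
        # Classes from the root down to this class.
        prefix = [] if self.superclass is None else self.superclass.chain()
        return prefix + [self]


def new_object(cls):
    # The object is an array with one separate struct per class in the chain.
    return [{"owner": c.name, "fields": dict.fromkeys(c.field_names)}
            for c in cls.chain()]


def _own_struct(obj, cls):
    struct = obj[cls.depth()]          # offset fetch, then array access
    if struct["owner"] != cls.name:    # stands in for the runtime cast
        raise TypeError("object has no struct for " + cls.name)
    return struct


def get_field(obj, cls, field):
    return _own_struct(obj, cls)["fields"][field]   # final field fetch


def set_field(obj, cls, field, value):
    _own_struct(obj, cls)["fields"][field] = value
```

In this model, adding a private field to a superclass, or inserting a new superclass above it, changes only the values returned by `depth()` and the contents of other classes' structs; the code that reads a class's own fields stays the same.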
Manual linking on the client would probably be pretty bad for startup, given that we can have streaming compilation and processing modules in parallel otherwise. Given the big performance cost either way, how important is this kind of late linking on the client? Wouldn't it be better to do this on the server? Trading separate compilation for runtime performance, memory consumption and startup doesn't sound like too bad a deal for the web (after all, that's why the JS ecosystem uses webpack). And splitting an app into modules for on-demand download would still be possible, just that you can't create these modules independently. |
Sure, "server-side" linking is a great solution for a lot of people. It allows for intermodule DCE and other link-time optimizations. It would not suffer from the above-mentioned issue because, instead of a Wasm binary, it would use a library distribution format with a good level of private implementation encapsulation. But this approach does not scale well with application size, since a partial app update would invalidate the cache for the whole mono-module. |
The idea of the GC proposal is to maintain the idea of a low-level VM and (ultimately) enable languages to implement whatever object layout they see fit on top of Wasm, not to build it into Wasm. For that reason, representation-level Wasm types and their subtyping hierarchies should not be confused with source-level types and their subtyping hierarchies. There has to be some commuting mapping, but it cannot generally be the identity.
The idea of being able to directly map every type in a source type system would immediately result in the sum-of-all-type-systems problem for Wasm, which obviously is intractable. Instead, Wasm goes the opposite direction: provide as thin an abstraction over the hardware as we can get away with (in terms of being both efficient and safe, which obviously are conflicting goals), and accept a minimal level of runtime checks (including casts).
In this particular case, yes, this means that you will have to introduce extra indirections. But if not you, the VM would have to do it. However, with the GC MVP, these indirections are going to be somewhat more costly than the ideal. That is understood, of course, and the nature of an MVP.
To implement Kotlin-like subtyping (or similar schemes, like polymorphic records) more efficiently in the future, Wasm should probably add primitives that allow more fine-grained indirection. In particular, I think it would require typed field offsets (a.k.a. "member pointers") as a primitive in Wasm. Then, instead of having an indirection in the object representation itself, you just have an indirection in the accessed offset, and use some form of evidence passing for offset vectors. Polymorphic functions are another feature that will be needed. I'm somewhat confident that much of this is possible to express in user space (eventually).
The only big problem from my perspective is how to provide efficient casts without hardwiring too many assumptions about the shape of allowable type hierarchies into Wasm, as the current RTT mechanism unfortunately does...
To be clear, that is not something I insist on, but that simply is inherent in Wasm's compilation model. We have talked about extending this to support pre-binding imports at compile time on a number of occasions, but there hasn't been any concrete proposal (and there might be some tricky details). Even if there was such a mechanism, a language implementation might not want to depend on it. In general, you would hope to be able to map language-level separate compilation to Wasm-level separate compilation. |
Could you please elaborate on why we have to have indirection? There are a lot of programming systems that have modules with private field hiding, yet classes are still flat in memory. |
I'm using "indirection" in a general sense. Basically, if a derived class cannot know at compile time how many fields are in the object before its own, then you have only two choices:
Any implementation will have to use one of these mechanisms, whether they are built into the Wasm VM or expressed in user space. The only alternative is to delay compilation until after the offsets are known. That either means giving up some degree of separate compilation, committing to a less dynamic linking model, or requiring an additional JIT phase after type specialisation, like in the CLR. |
Andreas, thanks a lot for the detailed explanation! I hadn't taken into account the key point that Wasm compiles modules independently of their imports. I'm wondering whether this is an important property of Wasm that people care about? |
@skuzmich This is implied but not required by Wasm (for example, JSC only compiles code on |
Yeah, it seems like there are both applications that would benefit substantially from compile-time imports and applications that are benefitting substantially from instantiation-time imports (and likely applications that would benefit from mixing both together). Similarly, some features seem better fit to using compile-time versus instantiation-time imports/exports. So I've been pondering about how to design a compilation model that serves both patterns well. I hope to have some thoughts up in not too long. |
@skuzmich, it was the entire reason why the JS API separated The simplest way to relax this would be to optionally allow binding a subset of imports with But what makes this all more tricky is that you will quickly discover the need for "preexports" as well to supply downstream preimports, at which point a more explicit staging mechanism may be desirable to keep it all sane. |
Awesome. Those are the same conclusions I came to. Sounds like we'll be on the same page 😃 |
@skuzmich I think there's a problem with your current plan, though maybe you decided to intentionally disregard this consideration. Suppose you compile class A, which extends class B which extends class C. When you compile it, some method of class A accesses a field declared in class B. Next suppose that field is moved from class B up to class C. (Or, alternatively, that class C didn't exist before, but B was since refactored to have a superclass C that the field was moved to.) I don't think your current plan will handle this. Supposing B is defined in a library that A builds upon, that would mean libraries couldn't update without having all their clients recompile. Maybe you're fine with this; I just wanted to let you know. Unfortunately, the only fix for this seems to be to treat objects simply as arrays of anyref, whose elements are the field values themselves (rather than class-grouped structures of fields). Then you could import the size of superclasses and the offsets of relevant fields. Of course, this means that all primitive fields will have to be boxed, and all field accesses will require casting the value fetched from the array using an rtt. |
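The flat alternative (one array of references per object, with field offsets resolved from imported layout information) might look roughly like the sketch below. It is a Python model under stated assumptions: `BoxedInt`, `link_layout`, `new_flat_object`, `read_int`, `write_int` and `a_sum` are hypothetical names, field names are assumed unique along the inheritance chain, and the `isinstance` check stands in for the rtt-based cast.

```python
# Toy model of the flat "array of anyref" layout with imported offsets.
# All names are hypothetical; field names are assumed unique per chain.

class BoxedInt:
    """Stands in for a boxed primitive stored behind an anyref."""
    def __init__(self, value):
        self.value = value


def link_layout(chain):
    """chain: list of (class_name, [field names]) from root to leaf.

    Returns (offsets by field name, total object size). This plays the
    role of importing superclass sizes and field offsets at link time.
    """
    offsets = {}
    next_slot = 0
    for _class_name, field_names in chain:
        for field in field_names:
            offsets[field] = next_slot
            next_slot += 1
    return offsets, next_slot


def new_flat_object(size):
    return [None] * size


def write_int(obj, offsets, field, value):
    obj[offsets[field]] = BoxedInt(value)      # every primitive is boxed


def read_int(obj, offsets, field):
    ref = obj[offsets[field]]
    if not isinstance(ref, BoxedInt):          # stands in for the rtt cast
        raise TypeError(field + " does not hold a boxed int")
    return ref.value


def a_sum(obj, offsets):
    # A "compiled" method of class A. It never changes between library
    # versions; only the offsets it is linked against do.
    return read_int(obj, offsets, "count") + read_int(obj, offsets, "x")
```

If `count` moves from B up to a new superclass C, only the table returned by `link_layout` changes; `a_sum` is untouched, at the price of a box and a cast on every access.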
I'd like to clarify that this 2D approach is not the current plan for Kotlin. My intent was to point out the issue. The current short-term plan is to disallow separate Wasm compilation, use our custom library format for "server-side" linking, and use the Wasm module system for lazy loading and code reuse across different parts of the same application, with all of these modules produced by a single compilation.
Regarding moving a field from B to C, I had in mind exporting functions that encapsulate field access. We would need to add these functions for inherited fields too.
But I would not feel bad if we required recompilation in this case; it seems pretty rare compared to adding a private field.
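The accessor idea can be illustrated with a small Python model, offered only as a sketch: modules are plain dicts of exported functions, and `library_v1`, `library_v2`, `client`, `new_b` and `b_get_count` are hypothetical names rather than anything from Kotlin or Wasm.

```python
# Toy model of exported accessor functions that hide field placement.
# Modules are dicts of exports; all names are hypothetical.

def library_v1():
    # Version 1: class B declares `count` itself.
    def new_b(count):
        return {"b_count": count}              # layout is private to the library

    def b_get_count(obj):
        return obj["b_count"]

    return {"new_b": new_b, "b_get_count": b_get_count}


def library_v2():
    # Version 2: `count` has moved up to a new superclass C.
    def c_get_count(obj):
        return obj["c_fields"]["count"]

    def new_b(count):
        return {"c_fields": {"count": count}, "b_fields": {}}

    def b_get_count(obj):
        # B still exports an accessor for the now-inherited field.
        return c_get_count(obj)

    return {"new_b": new_b, "b_get_count": b_get_count}


def client(imports):
    # "Compiled" once against the export names only, never the layout.
    new_b = imports["new_b"]
    get_count = imports["b_get_count"]

    def doubled_count(count):
        obj = new_b(count)
        return 2 * get_count(obj)

    return doubled_count
```

The same `client` works against either library version because it depends only on the exported names. The cost is a function call per field access, unless the engine can inline across module boundaries.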
Wed, Aug 5, 2020, 16:33 Ross Tate <notifications@github.com>:
… @skuzmich <https://github.com/skuzmich> I think there's a problem with
your current plan, though maybe you decided to intentionally disregard this
consideration.
Suppose you compile class A, which extends class B which extends class C.
When you compile it, some method of class A accesses a field declared in
class B.
Next suppose that field is moved from class B up to class C. (Or,
alternatively, that class C didn't exist before, but B was since refactored
to have a superclass C that the field was moved to.) I don't think your
current plan will handle this. Supposing B is defined in a library that A
builds upon, that would mean libraries couldn't update without having all
their clients recompile. Maybe you're fine with this; I just wanted to let
you know.
Unfortunately, the only fix for this seems to be to treat objects simply
as arrays of anyref, whose elements are the field values themselves
(rather than class-grouped structures of fields). Then you could import the
size of superclasses and the offsets of relevant fields. Of course, this
means that all primitive fields will have to be boxed, and all field
accesses will require casting the value fetched from the array using an
rtt.
|
Ah, cool. Thanks for the updates and insights! |
Closing this for now since it seems like the discussion has ended and is not actionable for the MVP, but please reopen if there is anything to add. |
In some languages, including Kotlin, changing the set of private class fields is considered a binary-compatible change, meaning that apps don't have to be recompiled when a library changes some of its private fields.
What would be a suggested way to implement this kind of separate compilation in Wasm GC?
What I'm interested in is a solution that allows:
It seems we can achieve that if we expose all object types as an array of anyref (or similar), where the elements correspond to the classes in the inheritance chain and each contains that class's own non-inherited fields.
This would, unfortunately, mean a lot of indirections and extra casts for classes which can be used across modules. Could there be a better way?