Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate RTTs into the Type #119

Closed
RossTate opened this issue Aug 14, 2020 · 18 comments
Closed

Incorporate RTTs into the Type #119

RossTate opened this issue Aug 14, 2020 · 18 comments

Comments

@RossTate
Copy link
Contributor

Following up on #111, this is a suggestion focusing purely on reducing incompatibilies. So unlike #109, this will not add or alter any (important) functionality, it just provides a different way to express current MVP's key functionality in a way that is more extensible.

In the current MVP, every reference has an associated RTT. These RTTs are generally (always?) determinable at instantiation time. So the suggestion is

  1. to define RTTs not as first-class values but as special values like is done for exception-handling event tags, enabling WebAssembly modules to refer to them by index, e.g. $my_rtt_for_pairs
  2. to specify reference types as rttref $my_rtt_for_pairs, making explicit the most-precise statically known RTT for the value (which may have more precise RTTs that can be ascertained by downcasting)

This addresses the issue with unamortizable quadratic-time subtyping (#117) because we can now use the constant-time subtyping algorithm for RTTs.

This potentially addresses the incompatibility issue with bounded parametric polymorphism (#116) because it makes the static type system nominal and there are known decidable and expressive extensions to nominal subtyping incorporating both parametric polymorphism and existential types.

If we furthermore remove the requirement that all RTTs extend from a common base RTT, then this addresses the incompatiblity issue with pointer tagging (#118) because there is no longer a universal top type that all variants must be downcastable from.

Independent of that, if we furthermore remove the expectation that RTTs are derived canonically from their type signatures, then we also provide a mechanism for ensuring data abstraction without requiring dynamic boxing and unboxing as added in WebAssembly/proposal-type-imports#8.

@rossberg
Copy link
Member

RTTs are explicit in order to

  1. make all behaviour and operational cost explicit (no hidden allocations, parameters, etc);
  2. make runtime types optional and customisable;
  3. remain future-proof wrt type parameters, avoiding painting ourselves into a corner that would require reified generics.

They also are a possible extension hook for other features and host-specific customisation, e.g. attaching JS object prototypes or accessors through an external API.

These RTTs are generally (always?) determinable at instantiation time.

Not once we have type parameters, see the example in Post-MVP. Moreover, those are supposed to support polymorphic recursion (as allowed by languages like C# or OCaml), so there generally can't even be an instantiation-time bound on the number of RTTs needed.

@RossTate
Copy link
Contributor Author

RTTs are explicit

I'm not sure what the point of this comment is. The suggestion I gave them makes RTTs even more explicit.

Not once we have type parameters, see the example in Post-MVP. Moreover, those are supposed to support polymorphic recursion (as allowed by languages like C# or OCaml), so there generally can't even be an instantiation-time bound on the number of RTTs needed.

The design for type parameters in #114 has multiple problems. One of which is its misunderstanding of how casting works in languages with parametric polymorphism. #116 specifically discusses why the example your reference is not how pairs are implemented. Another problem not raised in #116 is that the RTT design fails to incorporate the covariance of pairs. These casting problems with the post-MVP are of course in addition to its undecidability problem. Given these problems, the post-MVP should not be used as an argument against considering alternatives until it has been revised and demonstrated to work for an actual language.

On the other hand, there are extensions to the suggestion I make here that have been implemented and demonstrated to work for a major industry language. So the claim of practical extensibility here is not a hypothesis—it is a proven fact.

@taralx
Copy link

taralx commented Aug 19, 2020

RTTs are explicit in order to ...
make all behaviour and operational cost explicit (no hidden allocations, parameters, etc);

Can you elaborate? I don't understand this part.

@taralx
Copy link

taralx commented Aug 25, 2020

@rossberg ^

@rossberg
Copy link
Member

rossberg commented Aug 25, 2020

@taralx, in production engines, type descriptors, as used by the GC and for other purposes, tend to be represented as separate heap objects. For example, in existing Web engines, these are known as hidden classes, maps (V8), or shapes (SpiderMonkey). They tend to be allocated and managed by fairly elaborate mechanisms, shared between objects, and used in a first-class manner inside the engine.

At first glance, making this explicit may seem like overkill for the MVP, but it becomes more relevant once we introduce type parameters:

  • If you allocate an object whose type involves a type parameter of the current function, then you generally have to allocate a new RTT based on that parameter. (This may be cached, but generally has to be dynamic, otherwise you cannot support first-class polymorphism or polymorphic recursion.)

  • To do so, RTTs for the respective type parameters need to be passed to the function.

  • If you call a function instantiated with types involving your own type parameter, and that function requires RTTs, then you also have to generally allocate a new RTT based on your own type parameters. The cost of constructing these for every call can accumulate along the call chain.

  • Clearly, we don't want to pay this cost for every call site with a type parameter -- that would strongly violate the pay-what-you-use principle, since generally only few functions perform allocations.

  • A producer OTOH may know upfront which functions and type parameters require runtime information, so should be enabled to implement respective calling conventions with RTT-passing based on its needs. One obvious way to enable that is by treating RTTs as the first-class values that they already are inside a typical engine.

  • That also decouples type checking (validation) from execution: type parameters would remain a purely validation-time mechanism that does not affect jitting in any way. That is a significant conceptual and practical plus over an entangled mechanism (reified generics).

  • In some parts of the literature, the step lowering implicit value arguments that back types to explicit arguments is known as an evidence translation. For a proclaimed low-level VM like Wasm, it only is natural (and potentially more effective) to expect that producers perform this lowering and that Wasm features an instruction set where everything that can have an operational cost or effect is explicit in the operations.

(An alternative would be to mark type parameters as "static" and "dynamic", but that is more coupled and has the disadvantage of not explicating the cost for allocating new RTTs.)

@RossTate
Copy link
Contributor Author

If you allocate an object whose type involves a type parameter of the current function, then you generally have to allocate a new RTT based on that parameter. (This may be cached, but generally has to be dynamic, otherwise you cannot support first-class polymorphism or polymorphic recursion.)

The techniques in #116 talking about how existing systems support parametric polymorphism, without allocating new RTTs dependent on RTTs for type parameters, extend to first-class polymorphism and polymorphic recursion.

I think the issue you are referring to is these features in combination with reified downcasting. But reified downcasting also does not need to allocate new RTTs dependent on RTTs for type parameters. That is a particular implementation strategy with alternatives that language implementations choose between depending on how they expect the tradeoffs to apply to their application. In particular, it is a space-saving technique, just like putting the v-table in the descriptor saves space. But unlike v-tables where objects of the same exact class will always have the same methods, objects of the same class will have different type arguments, and so putting those type arguments into the descriptor rather than the object can introduce time overhead and thwart a number of optimizations (e.g. caches). In some cases, like immutable pairs, the type arguments can be completely redundant because the values in the pair already have the relevant RTT information. Also, your use of a cache to ensure a canonical representative requires multithreaded coordination, introducing a synchronization point at every such allocation in multithreaded systems with sharing. All in all, there's a wide variety of implementation strategies for reified downcasting, and building in a particular one would seem to be counter to WebAssembly's design philosophy.

Furthermore, there are already published techniques that specifically support reified downcasting. Those techniques are expressive enough to let the application choose how it implements reified downcasting, including the implementation strategy you are suggesting as well as the others I mentioned. They are compatible with the suggestion made here—in fact, they rely on the suggestion made here. They also do not require all polymorphic functions to be reified—it is still the application that is in charge of explicitly passing around reification information. In fact, these techniques support optimizations/expressiveness that significantly reduce the amount of reification information that needs to be passed around.

The Post-MVP, on the other hand, does not support these optimizations/expressiveness. It also cannot express common reified generic data structures like Java/C# arrays, whereas the aforementioned techniques that build on the suggestion I made here can.

Altogether, I believe the suggestion made here address the important concerns you raise, and in a manner better than the Post-MVP.

@conrad-watt
Copy link
Contributor

conrad-watt commented Aug 29, 2020

This addresses the issue with unamortizable quadratic-time subtyping (#117) because we can now use the constant-time subtyping algorithm for RTTs.

This potentially addresses the incompatibility issue with bounded parametric polymorphism (#116) because it makes the static type system nominal and there are known decidable and expressive extensions to nominal subtyping incorporating both parametric polymorphism and existential types.

Could you expand on this? Even if RTTs were to be generated at instantiation-time and referenced statically, the act of creating an RTT would still require a full structural subtyping calculation (assuming no other changes).

to specify reference types as rttref $my_rtt_for_pairs, making explicit the most-precise statically known RTT for the value (which may have more precise RTTs that can be ascertained by downcasting)

Based on this quote, is the idea also to change the way (e.g.) struct types are declared, by requiring that they can only be shallow compositions of RTTs?

@RossTate
Copy link
Contributor Author

RossTate commented Aug 29, 2020

This addresses the issue with unamortizable quadratic-time subtyping (#117) because we can now use the constant-time subtyping algorithm for RTTs.

Could you expand on this? Even if RTTs were to be generated at instantiation-time and referenced statically, the act of creating an RTT would still require a full structural subtyping calculation (assuming no other changes).

My sentence is referring to problems with value-level subtyping, whereas your comment seems to be referring to type-declaration validation. Value-level subtyping is done many many times throughout module validation, whereas type-declaration validation is done once.

Also, "full structural sytyping calculation" is different for structural and nominal systems. With nominal systems, fixpoints in types are made explicit, as are the relationships between these fixpoints. So when we have to check if the bodies two nominal types are compatible (because we declared one nominal type to be a subtype of the other), we can do so without recursively checking the relationships between fixpoints (because they have been explicitly declared). On the other hand, a structural comparison has to recurse into these fixpoints and determine if they are compatible. This is why nominal systems can decidably express irregularly coinductive structures, whereas structural systems can only express regularly coinductive structures.

The Post-MVP mentions that "recursive type definitions with parameters must [be] uniformly recursive", thereby ensuring regularity. But nearly every polymorphic language does not abide by this restriction. Here's a minified ML example of a pattern that is known to occur in practice (see Section 3 of "What are principal typings and what are they good for?" by Trevor Jim):

datatype 'a T = EMPTY | NODE of 'a * ('a T) T

fun collect EMPTY = nil
  | collect (NODE(n, t)) = n :: flatmap collect (collect t)

This datatype is inexpressible in the Post-MVP due to the above restriction, and relaxing that restriction would make the Post-MVP undecidable (even without bounded polymorphism). On the other hand, it is known how to express this datatype in a nominal type system.

Based on this quote, is the idea also to change the way (e.g.) struct types are declared, by requiring that they can only be shallow compositions of RTTs?

Unfortunately, I do not understand your question.

@conrad-watt
Copy link
Contributor

The main thrust of my question is that just moving RTT generation to instantiation-time would not make the type system nominal, so I'm trying to understand how this is accomplished.

Is it still permitted for me to define something like

type $A = struct $B $C
type $B = struct i32
type $C = struct f64

or would I now write something like

type $A = struct $X $Y

rtt $X = rtt.get $B
rtt $Y = rtt.get $C

type $B = struct i32
type $C = struct f64

@RossTate
Copy link
Contributor Author

The suggestion specifies two steps, the second of which is to incorporate RTTs into the type.

@conrad-watt
Copy link
Contributor

The suggestion specifies two steps, the second of which is to incorporate RTTs into the type.

This is somewhat hard to engage with, given that I quoted that step, asked a question about it, and your response was that you didn't understand me.

Is my example accurate? "Incorporating RTTs into the type" doesn't provide a clear picture of how the MVP's existing mechanism for defining types would change in this proposal.

@RossTate
Copy link
Contributor Author

Oops, sorry. In switching back and forth between the two discussions I lost track of context here. Thanks for your patience.

For your example, you would only write something like your first snippet. Every type declaration would specify the structure of the type as well as associate an RTT for the type. So if you had something like

type $X = struct i32
type $Y = struct i32

they would be defining two distinct types because their values would have distinct RTTs.

(Meta note just for clarity: obviously we're brushing over details many here. These examples are conceptual.)

@taralx
Copy link

taralx commented Aug 31, 2020

@rossberg Thanks for the detailed reply. It's taking me a bit of time to digest; your patience is appreciated. :)

@conrad-watt
Copy link
Contributor

conrad-watt commented Sep 1, 2020

In this example:

type $A (+rtt...) = struct $C
type $B (+rtt...) = struct $D $D

type $C (+rtt...) = struct i32
type $D (+rtt...) = struct i32

$B is not able to be a subtype of $A?

Assuming this is correct, then this seems like a "true MVP" version of SOIL - the RTT is effectively the parent field (except maybe more explicitly keeping track of the transitive parents). This is probably a really good thing for focussing debate.

Am I correct in thinking with this version, there would be no reason for RTTs not to statically list the parent types they're tracking?

@RossTate
Copy link
Contributor Author

RossTate commented Sep 1, 2020

Am I correct in thinking with this version, there would be no reason for RTTs not to statically list the parent types they're tracking?

In fact, an RTT needs to explicitly specify its immediate parent (like in rtt.sub).

$B is not able to be a subtype of $A?

If $D explicitly declares $B to be its parent, that declaration will be accepted because the two have compatible body types and the two will be considered subtypes. If furthermore $B explicitly declares $A to its parent, then that declaration too will be accepted and the two will be considered subtypes. The following case demonstrating coinductive fixpoints works as well:

type $A (+rtt...) = struct $C
type $B (+rtt... extending $A) = struct $D $D

type $C (+rtt...) = struct $A
type $D (+rtt... extending $C) = struct $B

then this seems like a "true MVP" version of SOIL

See #73

This is probably a really good thing for focussing debate.

Woot! That's what I was aiming for.

@conrad-watt
Copy link
Contributor

Oh, my example should have been more specific.

type $A (parent: none) = struct $C
type $B (parent: $A) = struct $D $D

type $C (parent: none) = struct i32
type $D (parent: none) = struct i32

This should be a type error (sidestepping questions of whether any is a top type)?

See #73

Ah I missed this earlier! I can also make the positive statement that this issue's proposal seems to capture a particularly small delta between the current structural MVP and a sensible nominal MVP, assuming RTTs are appropriately renamed/resyntax'd in a "full" proposal of this.

@RossTate
Copy link
Contributor Author

RossTate commented Sep 1, 2020

This should be a type error (sidestepping questions of whether any is a top type)?

Correct. All subtyping is intentional in this suggestion, so $C and $D are unrelated by default, making struct $D $D not a structural subtype of struct $C, and consequently $B is not allowed to descend from $A.

rossberg added a commit that referenced this issue Feb 24, 2021
Fix a couple of parser omissions regarding elem & data syntax. Refactor grammar in spec slightly for more clarity. Adjust tests.
@tlively
Copy link
Member

tlively commented Feb 17, 2022

Closing this in favor of #275, since that is based on concrete implementer feedback and the details of the type system we have settled on for the MVP.

@tlively tlively closed this as completed Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants