-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive nominal types with limited structural types #232
Comments
IMO, the key constraint would be that structural types would not be permitted to be recursive. |
Consider a class
|
It's hard to offer an opinion or technical assessment without pinning down more details. It's hard to suggest how we should pin down those details without use cases. It's also hard to know if this overall approach would address @rossberg's concerns without use cases. So, I think we still need concrete examples from real languages to make progress on this issue. It sounded like @rossberg might now be able to offer some from OCaml, so I'll wait to see those. (Since the plan is for the MVP to not support generics, it would be good to have examples that do not rely on generics.) |
@tlively, can you clarify what you mean by "children"? Note that the assumption that non-recursive structural source types naturally map to non-recursive representation types is false. I showed a prominent counter-example in my presentation, namely closures: a source-level function type like (Int,Int)->Int is structural and non-recursive, yet has to be lowered to two mutually recursive types:
|
AFAICS, only if language semantics require that the function at runtime has access to itself; is that the case anywhere? Otherwise, you could lower the function type to a non-recursive triple of types (with "iii" abbreviating "i32_i32->i32"):
|
I mean the children in the (possibly infinite) syntax tree of the type definition. So a type's "children" would be the types its definition directly refers to. And thanks for the closure example; that's exactly the kind of counterexample I was looking for. I share @jakobkummerow question, and I also note that using |
This example is flawed for two reasons. The first reason is that the recursion has no utility here. The The second reason is that this is an unrealistic lowering of a surface-level function type. For real-world object-oriented languages, function types inevitably compile down to something more nominal (e.g. functional interfaces, For real-world functional languages, function types have to account for both polymorphism and currying. I'll focus on the latter issue. Values of type It's worthwhile considering how to solve this problem. There seem to be three solutions.
I went into all this detail to illustrate that, once one accounts for real-world considerations, none of the lowerings of function types really benefit from canonicalization of unbounded low-level types. The lowering that I expect would be most efficient does rely on canonicalization, but it is canonicalization of call tags—not types—and the call tags need to be created in a manner that is very application specific (in order to have the right default behavior). So, yes, most language runtimes (compiled to typed backends) rely on some form of canonicalization. But canonicalization of just low-level types is likely to be insufficient for these runtimes. Furthermore, the canonicalization process—both in terms of what the parameters are (e.g. arity number for call tags, or surface-level types for surface-level generics, as @titzer has mentioned) and in terms of how the canonical value gets constructed (e.g. application-specific default behavior for call tags)—generally needs to be application specific. These discussions have essentially amounted to observing that any embedder that wants to have support for separate compilation of realistic languages will need to support application-specified canonicalization in some way or another. In a JS embedder, one can do all this canonicalization in the JS code (i.e. the custom loader). In a string-based embedder, one has to resort to string mangling and module deduping (unless the embedder provides a better process). If we feel this is a widespread need that merits cross-platform support, then we can develop a standard for this and even make it have nice properties like being declarative and incremental (and much more efficient). Structural types barely make a dent on the canonicalization problem for real language runtimes, but they do create a whole bunch of problems for GC. On the other hand, nominal types are much more tractable for GC, and their (sole?) limitation (for separate compilation, not GC) seems to be addressed by having any sort of canonicalization infrastructure that any embedder that wants to support separate compilation for real languages will need to provide in one way or another anyways. So lets work on a proposal that focuses on the concerns of GC, recognizing that the need for canonicalization is valid but is better served (both for GC and for canonicalization) by being factored out. |
@jakobkummerow, yes, you could work around it the way you suggest, but that would prevent you from flattening the closure environment, as it's done in most implementations of closures. (V8 and possibly other JS engines make a different choice, but that is rather specific to JS and has a number of unfortunate consequences.) @tlively's, you are right: as with everything, it's possible to break recursion by weakening to a top type and casting back in case you need it concretely. But as you note, it has some disadvantages (another minor one being that you have to define all types in two versions). So yes, no doubt this could be worked around. Better if you don't have to. Another interesting case is recursion among just function types. @lukewagner encountered one recently in the context of a suggestion for expressing symmetric stack switching, where the representation of a suspended thread would have a type like
because each thread is passed the continuation of the previously active thread when woken up. In this case, working around with casts would be a bit trickier, since that example is unrelated to the GC proposal, and we haven't assumed any casting ability outside it so far, especially not on continuations. @RossTate, sigh, the representation I described is a direct rendering of the data structures that real-world compilers use when compiling to native code. The concrete types of individual closures will obviously be subtypes of it (requiring a downcast for the time being). The established technique for currying, e.g. used in OCaml and Haskell, is called eval/apply, and uses the same kind of closure representation, except that the function's arity is extractable from it in some way. And I actually went through the extensive effort of implementing all this very concretely in a full-fledged compiler. https://github.com/WebAssembly/gc/tree/waml/proposals/gc/waml#functions-and-closures (That was part of what I intended to present a few weeks ago when my presentation fell off the agenda. I hope there'll be space for it in some future meeting.) |
This recursion is because you have chosen to simulate symmetric stack switching with asymmetric stack switching. In addition to this simulation producing significant overhead (which is why no language implementation whose performance genuinely relies on efficient stack switching, e.g. Erlang, uses it), it also makes for unnecessarily complex types (as you have shown). The design we developed that natively supports symmetric stack switching does not need this (yes, that design was untyped, but the typed variant would also not need this recursion).
@rossberg You have been asked my me (and I believe by others as well) to stop doing this many times.
Your compiler uses a variant of strategy (1) that I listed above. (You call the bound on arity
I know (as hopefully the previous remark makes clear). But the point I made that all closures will either not use their environment or will need to downcast it still stands. That is, there is literally no overhead saved by making these types recursive.
I have spoken with implementers of these languages about this issue specifically in the context of WebAssembly and the constraints that a typed back-end imposes. I wanted to check if they thought the approach would work well, and they actually pointed out WebAssembly's constraints eliminate many of the economic and performance advantages of eval/apply, and suggested that strategy 2 might work better for WebAssembly. I then spoke with them about call tags, and they pointed out that call tags addressed many of the disadvantages of the traditional push/enter approach. As such, it was their estimation that call tags would outperform eval/apply on WebAssembly, possibly significantly. I will note that, in my discussion with implementers of functional languages hoping to target the web with WebAssembly, they expressed much more concerned about the performance characteristics of the current GC proposal than they did about linking. After all, all the concerns @rossberg has raised are easily addressed by a loader written in JavaScript or by string mangling. Meanwhile, @rossberg has readily disregarded the linking concerns actually raised by object-oriented implementers that cannot be addressed by loaders written in JavaScript (such as being able to efficiently access the fields of a class defined in another module without needing to have full knowledge of that class's private fields and memory layout), insisting that they are PostMVP issues. Meanwhile, it seems we cannot even get whole-program compilation performing well even for the languages best suited to the MVP, and there still is no decent JavaScript-interop story for the MVP. So @rossberg has inverted the priorities of this proposal, causing us to focus on problems no real language implementer has raised because they only (potentially) apply to embedders that are specifically not the web, rather than focusing on how to enable the functionality that would make the GC proposal turn WebAssembly into a great target for garbage-collected languages wanting to integrate into the web. We know that nominal types work for the language implementers currently trying to target the web through WebAssembly. We have known that for years. If a real language implementer comes along and can give a real story on why they really need structural types, along with specifically what kind of structural types, typing rules, and casting mechanisms would serve their needs (e.g. recursive? covariant? contravariant?), then we can add them. But for now let's stop spending substantial time, effort, and money implementing complex features that no real language implementer has actually asked for so that we can start supporting the features they actually are asking for. |
To me the key point in @RossTate's lengthy explanations :) We've come to similar conclusions many times before, yet we somehow keep returning to the idea that we absolutely cannot move forward because it would break separate compilation for some non-existing potential future language implementation. @tlively's direction seems entirely pragmatic for an MVP to me. |
Actually, I don't believe this is what @rossberg is getting at. I believe his point is that if we shipped structural types in the MVP (as opposed to some form of nominal types), we actually get some unplanned reuse in the same spirit as function types do now. One could argue that nominal types would actually be a regression w.r.t. function types in that regard, and they would require a non-MVP mechanism, namely type imports, plus a runtime system. I actually tend to agree with Andreas here. If we ship structural types in the MVP, then some (limited set of) things do just work. |
I've been operating under the assumption that type imports will be a part of the MVP no matter what the MVP looks like because they were split out into a prerequisite proposal. |
Ah, I see, that's true that type imports are listed as a prereq in the overview. However, I think the issue of import explosion (from @rossberg's presentation, both this week and more than a year ago), is a very serious issue. Structural types allow reuse without coordination through imports. Type imports is also incomplete without a mechanism to import field indexes, because even though you can import a type, you can't access any fields on it. As discussed also, the canonicalization of structural types needs JIT superpowers not yet designed. The list of things needed just keeps growing. |
In #128, @conrad-watt and I went through the examples from @rossberg's presentation and found that nominal types did not exhibit the type-import explosion he had portrayed. Similarly, in this week's presentation the type-import explosion was shown in a misunderstanding of how people were expecting nominal types to be used.
The type-import proposal already includes subtyping bounds so that you can access fields of imported types. However, as has been mentioned, this approach requires the importer to know the much more of the content of the imported type than you should need to. Instead of subtyping constraints, you can use field members, which also allow you to access the fields of imported types, but with the added benefit that you do not have to know about, say, the private fields of the imported type in order to do so. So this is more an issue of the MVP being designed around the wrong import/export constructs. (Note that there are uses of subtyping constraints, but those uses are generally for constraints between two abstract types. And, for many of those uses it's useful to be able to have multiple subtyping upper bounds—as I believe you used for Jawa—which had issues with exponential-time complexity for the structural approach due to field accesses, but which the nominal field-members approach can keep to linear time.)
As also discussed, you can simply ship your module alongside a module that defines the type. No jitting is necessary; you're simply pulling content that's meant to be shared with others out of your module. (That also connects to how @rossberg's solution to this problem relies heavily on violating abstraction.) Also, there are places where the structural approach requires jitting where the nominal approach does not. One example is subclassing. As you know, with the current MVP, if you want a class to be able to extend another class without knowing the full contents of the superclass, you have to resort to jitting. With a nominal approach, when you define the structure of a nominal type, you (optionally) specify a nominal type it extends, and then the fields you specify are simply in addition to the fields of the specified nominal type and you can do this without knowing the definition of the specified nominal type.
We are removing |
Yes, I agree that this means subtyping constraints aren't sufficient, but we can't wholly replace them with field member imports, because the engine does need to reason about subtyping. But after we solve that, the next problem is that source-level names and source level types are needed to make linking of real languages, like Java, work. There's no place to put field names now, so soon someone proposes a new section type for that, or we kludge it with a custom section. I took a stroll to the end of that road. Java modularity needs Java abstractions to be preserved. So the future would be an endless parade of insufficiently general Wasm mechanisms of increasing complexity until the full source abstractions of Java are encodable into Wasm in an inviolable way. (Or maybe Java producers will just ship a literal Java Look, I want to dive into all of these:
But I don't want to inflate the thread with the same old discussion. I haven't spent time here because the it always falls into the same grooves and then becomes adversarial to the point that I dread participating in details. (I inevitably, probably already, regret speaking up, as it just invites a metric shitton of criticism). There's a schism around structural types that is not resolved with technical discussion. Rather, it might even be exacerbated by it! Our best increments of progress reliably come from implementing and trying things out. I've been most heartened by the experimental work done on type canonicalization and measuring its costs. Frankly, that's been more useful than an entire year of these threads. (I can see why someone might let out a sigh or two.) @tlively Thanks for posting this issue, because I can see that you are striving to find a solution that involves the good aspects of structural types while specifically avoiding performance costs. We need to keep thinking of things like that. We absolutely must focus what is shippable and useful in an MVP for WebAssembly GC. Let me be even more clear. We must ship an MVP. Let's please avoid discussions that wander off into the next iteration of the schism, because that means we are not all pulling in the same direction. And we haven't been. Long and meandering replies, accusations, and personal attacks...these are negative progress. |
I also want to get an MVP shipped, and soon, and appreciate the work that @tlively, @jakobkummerow, and @askeksa-google have put into testing how even slight variations might address the technical issues we are collectively facing. Furthermore, (based on offline conversation following the comment above) I am glad that @titzer is taking significant steps to work out the non-technical issues we are collectively facing (including my own role in those issues). |
Closing this because we've settled on #243 as the MVP type system, obsoleting this idea. |
Structural types certainly make sharing tuple, array, and function type definitions simpler, whether they are necessary or not. We also need recursive types because they are so common in source languages. What if we supported recursive nominal types and some form of limited structural types that had negligible canonicalization overhead?
One possible limitation on structural types might be that they could only have nominal or primitive types as children.
AFAICT, this approach seems to have all the advantages of both nominal and equirecursive types. Tuples, arrays, and function types that are structural in source languages are typically not also recursive except indirectly through non-structural types like classes, so they could be lowered to structural types in this scheme.
For those who want nominal types in the first place, does this extension with limited structural types seem useful and acceptably simple?
For those who want structural types in the first place, are there important use cases that this idea still does not support without the central coordinating logic necessary with nominal types?
The text was updated successfully, but these errors were encountered: