-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Value types rather than int31 #53
Comments
A basic assumption of Wasm is that programs are lowered to the level of machine code as much as possible on the producer side. Language-level features like "value types" are hence assumed to be compiled away already and shouldn't require support from Wasm. (Although immutable structs are actually semantically equivalent to value types if reference equality is not available for them.) I don't see how they address the same problem as int31ref, though. Since pointer tagging is the preferred technique for implementing GC in all existing Wasm VMs -- and AFAIK proven to be the most efficient technique in general --, the GC design has to allow for this implementation while still supporting unboxed integers in places where references are stored (see the discussion of polymorphism and uniform representation in the overview). |
Another way to say this is that Wasm's design approach is to offer minimal mechanisms for high-level languages to compile to an abstract machine without (unreasonable) loss of performance. As such it exposes an abstraction that is an efficient reflection of today's CPU capabilities with minimal additional abstraction to make it safe (e.g. function boundaries, invisible callstack, indirect function tables, globals). For languages that implement polymorphism with a universal representation, like Ocaml, |
Andreas Rossberg wrote:
Perhaps we're talking at cross purposes but I'd have said the opposite: OCaml and Java are slow because they box types like Some relevant benchmarks: Hash table insertion on solutions using tagging (OCaml/Haskell) is 4-10x slower than on .NET (no tagging): http://flyingfrogblog.blogspot.com/2013/09/hash-table-insertion-performance-f-vs.html Another hash table benchmark showed that the tagging solution (JVM) was 17x slower than the non-tagging solution (.NET) http://fsharpnews.blogspot.com/2010/05/java-vs-f.html OCaml is ~3x slower than .NET on the Monte Carlo task (a PRNG) of the SciMark2 benchmark which uses 31- or 32-bit ints http://flyingfrogblog.blogspot.com/2009/01/mono-22.html On a hailstones benchmark that requires 64-bit ints the solutions using tagging (OCaml/Haskell) are 7-32x slower than those that do not (F#/HLVM) https://groups.google.com/forum/#!msg/fa.caml/yg3p8Txk4_o/7hDNY6ObLrYJ On the ray tracer benchmark the solutions using tagging (Java/Haskell/OCaml) are 1.4-3x slower than those that do not (MLton, HLVM, C++) http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html |
Thanks, those are some interesting numbers. Besides efficiency, to me the bigger reason why I don't see the point of a baked in int31 type is that I don't see it as that widely applicable. The amount of languages/systems/formats/protocols that specify an int as having exactly 32 (or 8/16/64/same_as_cpu) bits has to dwarf the ones that say it is implementation defined and can conveniently pick 31 without funny things happening to certain kinds of values. That said, I guess I am not against a feature that helps certain languages but is neutral for others, if its usefulness is sufficiently widespread. |
|
As I discussed before, this is missing the point. We don't provide it for the purpose of exposing it in a source language. We provide it so that producers can use the very common implementation technique of (predictably!) unboxing small integers when compiling to GC types. The vast majority of implementations of dynamic or polymorphic languages rely on this -- I think you happen to be working on one of them. ;)
I don't see how it does. An engine could still use a different scheme. Though there are reasons why so many GC'ed runtimes use pointer tagging. |
Here's a packing scheme we found worked really well for our language we just developed: if the low bits are 00, it's a ref; if they are 11, it's a signed integer constructed by bit-shifting right 2 bits and preserving the sign (with other signed integers represented using a ref); otherwise it's a (single word) float constructed by bit-wrapping right 3 bits (with other floats represented using a ref), unless the high 30 bits are 0 in which case it is constructed by big-wrapping right 2 bits. The reason for the float-packing design is that it is fast to pack and unpack and it ensures that all 32-bit floats representing numbers whose exponent's absolute value is small are packed. This enabled us to perform well on both integer-based numerical benchmarks and float-based numerical benchmarks rather than favoring one or the other (and of course there's a 64-bit version of this packing scheme as well). How do you propose that my 32-bit engine use this packing scheme and support |
@RossTate, that sounds like a plausible scheme for a native implementation of a specific language, but I think I'm missing how that matters for Wasm. How would a Wasm engine benefit from such a scheme? Custom tagging or packing schemes would perhaps be nice for certain producers, but there is no way that Wasm could expose a mechanism for that to producers in a platform-neutral manner. FWIW, extra tagging bits are compatible with i31ref. Bit 0 tells you whether it's an int or not. If not, it's a pointer that can use the next couple of bits for tagging, as alignment allows. V8 is one engine using this scheme (or once did). |
FWIW, we believed a tagged reference was important for the Mu micro vm (which has similar objectives to Wasm). We have
See the 'Tagged Reference' in the Mu spec or 5.2.4 of Kunshan's thesis. |
Ah, so you found nan boxing was a better packed representation than a simply bit flag. I suspect nan hacking wouldn't do so well in the 32-bit setting, but that's just a detail. The important thing is that both general-purpose virtual-machine engines and specialized language implementations have found it worthwhile to pack floating-point numbers at the expense of reducing the number of integers that can be packed. This is how a wasm engine could benefit from my scheme (to answer your question). It is plausible that wasm has some special properties that make that not the case, but wasm's implementation strategies certainly have not been explored enough at this point to know that, so adding something like
As for using And all of this complexity is pointless if the wasm module happens to be executed on a 64-bit machine, in which case many packing schemes can easily store a full 32-bit integer. So |
@steveblackburn, that's interesting. So this is another type in the spirit of i31ref, fixing on a specific encoding. That could be a post-MVP feature. We actually discussed the viability of such a type in the past. There are some issues for Wasm, though: such a type won't easily work on 32 bit platforms, and Wasm requires strict portability; (2) injection into such a type would be lossy for f64, and Wasm elsewhere guarantees to maintain all its bits. Also, we weren't aware of many impls using NaN boxing in practice, it seems to be interesting primarily for languages like JavaScript, where floats are the only number type (and even there most engines don't use it). Are you aware of other interesting examples?
Not sure that answers my question. ;) Concretely, which Wasm reference type would be represented as an unboxed float, and how would you make its performance predictable?
You are fixated on int32s, but that misses the point of this type, which is completely unrelated. It isn't meant to map native integer types from a source language. Nor is it limited to mapping integer types in the first place. Instead, a producer can use it for the representation of a broad range of scalar types, like booleans, enums, characters, small integers, etc. For these, they don't care whether the max range is 32, 31, or 28 bits, they are smaller anyway. But they do care that they are unboxed. So 31 bit is indeed a somewhat arbitrary choice, but we have to pick some portable max range, and it's the most common one by far.
Agreed with (a) on a more philosophical level, but I still haven't heard a suitable alternative given Wasm's portability constraint! Strongly disagreed with (b) and (c), see above. Why not be concrete: What would be your counter proposal for supporting portable unboxed scalars under the Wasm GC proposal? |
Does Wasm need to support dynamic linking or could it assume whole-program compilation? If the latter, types are known at compile time and there is no need for any boxing or tagging. |
@jdh30 what is the downside of the .NET approach, i.e. for what kind of code is the OCaml tagging strategy faster than .NET? (Or does the OCaml strategy, generally, just consume less memory at the cost of being slower? Or does the OCaml strategy enable something that is not possible under the .NET garbage collector?) I'd really like to understand how the .NET GC works. @rossberg I believe I understand this: We need This is my current mental model, I am not an expert at any of this, please try to correct where I am just plain wrong. Also, feel free to drop links to papers or anything that someone who wants to understand this should study. |
@sabine Not sure if this answers a concern, but for what it's worth, it's perfectly possible to use Problem is, you have to call out to the runtime to do type predicates and to tag/detag the value. The |
Yes, absolutely. Furthermore, whole-program monomorphisation approaches cannot handle first-class polymorphism or polymorphic recursion anyway, which are common today. @sabine, yes, that's right. Uniform representation is one important motivation for tagging schemes. That is not at all specific to OCaml, but used by many or most dynamically or polymorphically typed languages. The approach used by .NET is the only expressive enough alternative, but it is waaaay more complicated (in Wasm, it would require dynamic generation and compilation of Wasm code, i.e., shipping a Wasm code generator with your web page), and the type passing involved can also become a significant cost. |
Related discussion: WebAssembly/design#919 |
So you agree that 31 bits is an arbitrary choice that was made based on your understanding of what is common. What I am trying to let you know is that, because of this project, I have spent the last year asking people how their gc's work, and what I saw was that, while 31 bits was common, it was not as common as you seem to think it is. For 32-bit machines, 30 bits was quite common, and 29 bits was used a decent amount as well. For 64-bit machines, 63-bit, 62-bit, 61-bit, and (as in @steveblackburn's example) 52-bit were all common. I have seen experiments showing that many of these schemes perform better than 31 bits. I gave you a concrete example and the intuition as to why it performs better (i.e. floats can be packed in addition to ints - something that is perfectly plausible to be useful to wasm engines since at some point you'll have some subtype of
You keep bringing up "predictable" performance, but this is not the right measure. What someone compiling to gc-wasm wants (beyond generally good performance) is to know that they picked the implementation strategy for their language that will work best for each engine. That is, they want to be able to evaluate the tradeoffs of various implementation strategies. So something we want to avoid is for someone to compile to Predictable performance is just a proxy for this ability to evaluate tradeoffs. This is important because there are many wasm programs/operations that are going to perform differently on 32-bit versus 64-bit machines, and it's unreasonable to expect performance to be predictable across such significant differences in hardware. What's actually important is that the optimal way to compile a program to wasm running on a 32-bit machine is also the optimal way to compile a program to wasm running on a 64-bit machine.
I can think of many, but right now I'll start with the simplest, building upon the above principle. I'm also going to focus on values that belong to Add an |
I wouldn't even mind if it was an i30ref, it doesn't really matter. Though I'd want to see real evidence that that is a better choice, that it would benefit any of the existing engines (which already use 31 bits), and that it would benefit Wasm even hypothetically. In my mind, your example has not been evidence for that yet (see below).
That doesn't matter, because we cannot use any 64-bit-only limit without defeating the purpose.
Are you proposing to introduce an f32ref type? But your encoding is lossy, so this cannot be an f32, but rather an f30ref. Unless the engine is implicitly allocating and boxing in some cases, but see next point.
I think that is where the fundamental disagreement lies. You are talking about semantics that introduces implicit allocations. I would argue that that is a dubious thing to have in a low-level VM, and moreover, in many circumstances its use wouldn't be safe for space. So it would not be attractive for producers in many cases, and certainly not the ones I talked about.
You can do that, but that is a more expensive type (that has the aforementioned issues). It quite simply solves a different problem. If you go back to the older threads, we have already discussed all that. It is conceivable that we may want to add a type like i32ref or i64ref in the future. But it doesn't solve the problem that I was asking about and which i31ref solves. Specifically, on a 32 bit platform intref would make every injection and projection involve an extra check and branch, and in addition, for every injection the 32 bit jit would have to generate a code path for allocation. That is completely redundant overhead when the producer already knows that the value can never exceed the limit of an unboxed value -- why would you want to generate allocation code when storing a two-valued enum on the heap? But the Wasm engine can't know that it doesn't need that path, unless you give the producer a way to communicate that knowledge. And that is exactly what a type like i31ref does. |
Good point, although easily fixed. Add to the
You haven't talked about a single concrete producer in this thread. The only concrete example of how you expect this to be used is booleans in the Overview, so I'll demonstrate that use case. Suppose an OCaml or Haskell-like language is being compiled to wasm. It's compiling the expression In my proposal, the compilation would be (forgiving syntactic slip ups):
For any engine with at least a bit of packable (unsigned) integers, this will perform almost identically to Furthermore, for any engine that packs exactly 31-bit integers, So why do you insist that this would be unattractive for producers when it performs just as well and is more adaptable?
Let me know how the above doesn't address what you would argue for this use case. |
That sounds exactly like exposing it to me.. since either the source language needs to be aware of the fact that this is how small integers are represented (to be both efficient and know what values can be represented) or needs to pay the cost of dynamically checking for values that use the full 32-bits and "upgrading" them as need be (which seems very undesirable in an efficiency oriented system like Wasm).
I don't think I am. I guess you are referring to JS, and as far as I can tell, Wasm+GC as it currently stands would be a fairly unattractive target to directly compile JS to (much like other very dynamic languages), regardless of the presence of i31ref, so I don't see how that is relevant. When it comes to more statically typed languages, most of those I have worked on or am aware of are statically able to tell whether a given 32-bit quantity is all integer, float or pointer bits, and don't need this tagging to function. Your "The vast majority of implementations" seems to be my "niche", so I don't think we're going to resolve this :) |
I just remembered, one way to do weak references is to use a bit flag to distinguish between standard references and weak references. But this strategy also means you only have room to pack 30-bit integers. It also has the interesting property that two of the distinguished cases are references rather than packed values. |
No, that loses the information after injection. Consequently, projection still has to branch. Worse, it also has to do an additional range check now (another branch), so projection is even more expensive than with the plain instruction. No, the value range has to be evident in the type. Moreover, what is the maximum XX you are going to support? If you pick some N then what’s the point of supporting smaller ones? Or do you want to support all N and not say which ones are guaranteed to be unboxed? Then you have again defeated the purpose. (An obvious example where a producer needs to know a guaranteed unboxing limit is the efficient implementation of bignums. A performant implementation has to unbox small values. But if the producer can’t rely on a certain type being unboxed then whatever choice it makes may turn out to be worse than not trying to unbox at all, since now both the producer and the engine have to insert code for making the distinction and branching based on it.) Let me reiterate the requirements:
AFAICS, providing an iNref type for some N < 32 is the only way to meet these requirements. Whether N is 30 or 31 bits or something else is secondary, and I’m certainly open to picking another value if that was a clear win.
Upthread I listed a number of examples of source-level types that ask for unboxed representations:
This is pretty much universal to all producers.
Disagreed, see above.
Providing polymorphism does not solve this. Dynamic languages won’t benefit from it and will always require uniform representations; likewise, most polymorphic languages use uniform representations and tend to have forms of polymorphism that the Wasm type system will never be able to handle fully, so they won’t benefit either; and even if they could, they will certainly not want to redo their entire implementation for an efficient Wasm port. Uniform representation is a fact of life and a low-level VM like Wasm has to support it efficiently.
Okay, first, it does not mean that, because the second bit is only needed for pointers. Second, why would anybody possibly want to optimise their heap representation around weak references by putting an extra cost on all others??? |
The producer, i.e., the source language compiler certainly needs to be aware of such limits -- as with every architecture they target, physical or virtual. But it is in no way necessary to expose it to the programmer.
Only if you are talking about int32. Not in the case of all the other types I mentioned that are represented as small ints. You guys are way to fixated on int32 and keep overlooking everything else. ;) For int32, somebody has to do these checks, at least on 32 bit platforms. For Web assembly it wouldn't be unnatural to do that in user land. As I have said several times before, it is also conceivable that we introduce an i32ref type as well to defer that choice to the engine (such that it can be avoided on 64 bit platforms). But that is a different use case from i31ref, and a less fundamental one. (And it actually is unnecessary to make that a feature, as engines could simply unbox structs with a single immutable i32 field to achieve the exact same effect.)
Plenty of people have asked about compiling JS to Wasm. I agree with you that that is not an attractive thing to do, but for reasons that are mostly unrelated to GC. That does not change the general observation, though.
Let's see. For dynamic languages there are two possible heap representations:
Every serious dynamic language implementation uses (1). For polymorphic languages, there are these:
For example, pretty much all functional languages use (1). I am not aware of any major language using (2), since it's known to be fairly expensive. C# is the main example of (3), and that requires a jit (i.e., is incompatible with ahead-of-time compilation), obviously is very complicated to implement (even more so on top of Wasm), so will likely remain a rare technique. Java and JVM-languages effectively do (4), which is pretty terrible for the performance of polymorphism, but from what I heard there are plans to move away from it. C++ is an example of (5), but that solution only works for limited forms of 2nd-class polymorphism, so is not applicable to most modern languages (it's also inherently unmodular). |
In Lumen (Erlang/Elixir AOT compiler) for 32-bit (and therefore It uses 3 levels of 2-bit tags. Pointers (boxes and lists) and small integers use the lower 2 bits tags. That means small integers are stored as |
@KronicDeath, thanks for the data point. I think this would almost map to the GC proposal, except for the special cons tag (which is popular among Prolog and Lisp implementations as well). In particular, the Header tag wouldn't be needed when using GC types, since the respective information becomes abstracted away be the Wasm engine. Immediate would map to i31ref, with the secondary and tertiary tags being mapped into the int. I don't think much can be done about the cons tag. Supporting such a custom tag would require engines to reserve a "user bit" in the representation of all reference values. That is most likely incompatible with at least some existing engines. The closest we can do is introducing variant types in some future extension. Until then, cons will require a regular in-block tag. |
Erlang was originally implemented in Prolog, so that is Erlang showing its heritage. |
Not necessary, but practically required. Many languages (certainly the ones that map well to Wasm+GC) make guarantees about the number of bits in a type, or don't want performance to fall off a cliff. So the language hiding this from the programmer would be very undesirable. "Your int can fit 32bits.. but if you use them, we will transition to heap allocation!" ?
Because that is what languages need. What languages, ones that are a good fit for Wasm and the current GC proposal, are cool with.. you don't really need to know how big an int is. It may heap allocate at some random value?
Why? Knowing this statically doesn't seem impossible to me, in fact, common, and desirable.
They ask it because it is an obvious question, not because it has a sensible answer. Most people have no clue what magic V8 and friends do to their code, and why thus why compiling to Wasm likely will make it (a LOT) slower rather than faster. Either way, my point is in designing features like
Again, I am not considering dynamic languages here. If we wanted to support them realistically (e.g. competitively with their current implementations) we need a very different GC design and/or special Wasm features to support them (JIT?).
Yes, so 2/3/4/5 encompasses languages which can say " 1 then is all about FPLs, Ocaml, Haskell, and friends. Some of these languages probably could be using better implementation techniques like 3 or 5 but choose not to. So lets be clear here, |
What information? If the (static) integer parameter to the instruction is less than the number of packable bits, then it knows to trap if the value is a reference (something it has to check for with
You execute a bitwise-and and then trap if the result is non-zero. I've implemented this for other systems. The cost is trivial. If you really want to get rid of the branch, you can do a tiny more bitwise arithmetic to merge this with the check that it's not a reference.
I explicitly said that there is no guarantee how many bits will be boxed, and I gave a big argument about why this is useful without knowing that number. Take my example with the booleans. If the engine boxes the boolean, then it boxes the boolean. That engine happened to decide not to do packed integers, whether the reason be simplicity, confidence in correctness, ease of formal verification, or performance tradeoffs elsewhere. The point is that, for that engine, this is still the optimal way to support uniform representation of booleans. That is the purpose: to provide a way for wasm programs to uniformly represent small scalar values in a manner that will be implemented optimally for the engine at hand whatever it may be.
You explicitly told me to focus on small scalar values and not on
How about 16? Since you're insisting on not 32, that seems like the natural next step down. It's big enough to pack "booleans, enums, characters, small integers", so it satisfies your stated use cases. It's small enough to allow a lot of packing schemes.
Wait, what? A bunch of these languages have javascript implementations. They're redoing their entire implementations for wasm solely for efficiency. And that's a huge effort because wasm is so different! What makes you assume they wouldn't do a much smaller revision for a more efficient wasm port? (Language teams have already told me to expect to do this.)
I never said it wasn't. We're just disagreeing about how to enable uniform representation.
Ah, good point!
It doesn't impose an extra cost. The surface language doesn't let you dereference a pointer without knowing its static type (because after all you don't even know that it's a pointer without its static type in a uniformly represented language!) The static type tells you whether its a normal reference or a weak reference, so the compiler knows to apply the mask only in the latter case. Why producer-level packing is a bad ideaIn a lot of languages, something like |
@aardappel No, it isn't just "statically typed functional languages", there's also the wide range of dynamically typed languages, such as almost all Lisps, Python, Lua, Perl, Julia, Ruby, etc. They all need uniform representation, and they don't want to pay the cost of boxing small integers, so 31-bit ints are very common in compilers for those languages (with auto-promotion to boxed integers). Basically any language which isn't static will likely want 31-bit ints. I think you're focusing far too much on static languages while ignoring the large number of (very popular) dynamic languages. There's nothing wrong with Wasm including a feature which is useless for static languages but really beneficial for dynamic languages (or vice versa). |
@Pauan if you read my full posts above, you can see I actually address dynamic languages. I don't think the current GC proposal caters to them (they are currently much better served by linear memory or even JS objects), but feel free to prove me wrong. |
@aardappel That seems like an odd argument to me. You're essentially saying "we shouldn't have int31 because the GC MVP is currently sub-par for dynamic languages". Aren't those orthogonal concerns? And in any case, wouldn't the solution instead be to improve the GC so that it is suitable for dynamic languages (which is the long term plan)? So I don't see how removing int31 is a good solution. |
I'm starting to suspect that there is some misunderstanding about what boxing scenarios we are even talking about. The GC proposal readily supports unboxed native numbers in almost all situations, i.e., locally or when stored inside structs or arrays on the heap. Numeric values inside structs and arrays are not tagged! The only case where boxing would be necessary is when you needed to inject a standalone number into the heap, e.g., to pass it to a generic function as a reference. But that's exactly the situation where a language like Java will box its number types as well! So it is no worse than Java. You simply box native numbers in that situation and thereby avoid any of the more subtle performance cliffs you mention. It's even less of a problem if your language's generics are weak enough to allow monomorphisation, i.e., static specialisation a la C++ templates. Then you'll never run into the case at all. Does that alleviate your concerns?
As clarified above, I believe for the kinds of scenarios you have in mind that is certainly the case, whether Wasm GC uses tagging or not.
That is a very narrow set of languages, all from the same corner of the spectrum, and none of them is dynamic or handles polymorphism very efficiently. It's a stated goal of the GC work to take a broader perspective than that. The combination of heterogeneous structs/arrays and unboxed scalars in the proposal is meant to support both the Java-like use case and the FP use case equally well (except that the latter will still need lots of casts in the MVP). Makes sense? (Go is way tougher, due to its decision to make interior pointers statically indistinguishable from regular ones. I'm not sure what the chances are that all engines would be willing to support that efficiently. For the MVP, we certainly have to pass on that one.)
People are implementing languages like Scheme and Erlang in Wasm today, so I don't know what exactly you base that claim on. Fortunately, not all dynamic languages are even remotely as crazy as JavaScript. The GC proposal should be a sufficiently decent fit for the others -- that's the intention anyway. Do you care to elaborate why you think it isn't?
You seem to be implying two things: no GC'ed language besides the narrow breed of Java-like ones really matters (including dynamic ones). And that all these other languages picked their implementation approach either out of laziness or out of ignorance. Neither is true, and I would suggest to be more thoughtful with assumptions like that. As mentioned, (5) is not an option for almost any modern language. (3) relies on jitting, i.e., is inherently incompatible with AOT, and generally is the most expensive and complex approach (which I doubt would fare well with the double stage jitting it would induce on top of Wasm). To clarify, high-level languages like functional ones typically choose tagging because their use of polymorphism tends to be way more expressive and pervasive (as in: every other function call is likely polymorphic). That puts their profile much closer to dynamic languages. With such a profile, heavyweight techniques like (2) or (3) quickly turn out to be inferior, let alone (4). |
As you say, this implication only holds if the integer parameter is less than the unboxing limit. If the producer cannot know what the unboxing limit is then it generally has no way of picking an adequate parameter. It must assume the worst case for any choice it makes. That worst case is two checks and two branches on every access to an "unboxed" value on the heap -- as opposed to a mere shift with an explicit type like i31ref.
Sorry if I didn't bring up this particular example before. There is a lot of rationale behind the design, some implicit, some explicit. I cannot exhaustively repeat it in every comment, nor can I always guess which goals are self-evident to whom. Either way, I think we agree that this clearly is a relevant use case for unboxing (and representative of a larger class). (FWIW, a lot of the points I keep reiterating in this thread are already explained in the Overview, though I wouldn't claim that to be exhaustive either. I don't think it mentions the bignum example.)
Sorry, I don't follow. The anyref->intref test is the only branch, and always needed. After that, it should not be necessary to repeat the pointer test in the generated code.
In today's world of Unicode, characters are 21 bits. ;) Also, this obviously is very wasteful, see the second bullet. For example, every bignum would take up approximately twice as much memory. At the same time, there is no conceivable engine architecture that would align individual allocations to 64 K and benefit from the ability to use 16 tag bits, so this seems like a rather silly choice to make.
Building a compiler to Wasm off a JS backend (other than asm.js) is a terrible fit -- one is a very high-level target, the other low-level, and they have very little in common. Any language that has an existing native code or VM backend (i.e., most) will want to build their Wasm compiler off such a middle/backend. I am not aware of many counter-examples.
From what I have seen, this wouldn't be a small revision.
Always remember: knowledge of the engine =/= knowledge of the producer. The engine cannot know that, unless the producer can communicate that knowledge through the Wasm type system. But if the Wasm type system distinguishes weak references from others then there is no need to distinguish them dynamically anymore (unless you allow injecting them into anyref, but there is little point in allowing that, and even if we did, an extra indirection for weak refs in the engine solves that just fine). The extra cost with multiple pointer tags is that dereferencing always has to do masking instead of using static offsets. In case you haven't seen, one particularly cheap tagging scheme is to set the lsb to 1 for pointers and use offset -1 in all load/store instructions, which is essentially free on many architectures. But that only works if there is only a single pointer tag.
Sorry, but I really don't know what you are talking about. First, in a language where enums are equivalent up to reordering, the compiler can simply normalise the order lexicographically. Alternatively, it can use hashes of the constructor names as their runtime representatives (which even works with structural subtyping or row polymorphism). I can easily point you to examples of both techniques in existing compilers. Second, and more importantly, this has absolutely nothing to do with tagging or whether you map the enum to 31 or 32 bit integers in Wasm. I also don't follow what this has to do with hacks on the web, or why it is exposing anything to the "world" other than the engine, or in a way that is specific to enums... |
@aardappel writes:
I have worked on a Scheme implementation that used to use linear memory and currently uses JS objects via anyref. Linear memory is bad because you have to maintain a shadow stack to mark, and you cannot solve cycles between wasm and the host (JS). JS solves those problems but introduces more dynamic checks and run-time call-outs than would be necessary; it's not in the spirit of a low-level VM. The GC MVP solves all of these problems, quite neatly IMO. |
I am quite aware
I argued that the set of languages for which Not sure if mainstream-ness should be an argument. Maybe just a tie-breaker ;)
Sure..
Which are using linear memory and/or JS objects? I'm purely arguing in terms of what features make sense for the GC proposal.
Maybe I am missing something, but most of these languages represent all their data with something that looks like a dictionary + a variety of other, more basic types. I don't see how you'd represent any of that with this proposals typed structs and arrays, that can't be done better with whatever custom representation they are currently using in linear memory (or as JS object). Or rather, if the best way to represent a dynamic language in the GC proposal is significantly worse than those other options, that counts to me as not catered for.
I can't imagine where I said that. My mention of that set of Java-likes was merely as a counter to your assertion that pretty much "all languages" need And its not that narrow. Once you remove dynamic languages and "languages happier with linear memory anyway", these Java-likes cover the remaining mainstream, and more importantly, languages the GC proposal appeared to be targeting most.
I pointed out that their implementation technique is not set in stone, and that there are other options. You put the words "laziness" and "ignorance" in my mouth. I would suggest to be more thoughtful with characterizations like that.
I'd say most any (static) language can do this, if they don't mind link-time code generation (simulating a closed world model).
The argument was about existing implementation techniques, which so far have not been architected for Wasm. |
Sabine wrote:
F# is generally much faster than OCaml in practice, often several times faster. The only place where tagging is competitive is when the heap is composed almost entirely of pointers and enums which is common in pedagogical examples but extremely rare in industrial code.
OCaml's tagging strategy consumes more memory whilst running slower.
OCaml's strategy enables extremely simple implementation. The entire GC could be just ~100 lines of code. It is also trivially applicable to untyped languages. However, my bugbear is that OCaml's strategy fails to leverage static type information to recover C-like performance which is really frustrating because all of the information required to generate really fast code is right there and OCaml just throws it all away and generates (sometimes) really awful code instead. Andreas Rossberg wrote:
But .NET does whole-program monomorphisation and handles polymorphic recursion?
Where do Julia and Swift fit into that?
I think this is the killer argument: a uniform representation is vastly easier to implement. |
Okay, now your argument isn’t about the design but the design goal. Supporting a diverse range of languages and paradigms has been a stated goal for Wasm GC and its MVP from the beginning. It is also the stated goal for Wasm in general. And it has been agreed long ago that the GC design ought to be validated on representatives of at least OO, FP, and dynamic languages, which are the 3 dominant classes of GC’ed languages.
What else should they be using right now? Are you implying that they are satisfied with the status quo? I think two people on this thread already told you otherwise.
Dynamic languages are not all centered around dictionaries, where did you get that idea? Just consider classics like Lisp & Scheme, Erlang, Prolog and so on. And for the OO ones you are probably thinking of: uniform representation may or may not be sufficient, but it's still necessary. (I would hope that dictionaries can be implemented in Wasm user space, but I wouldn’t make any performance predictions. A workable implementation of, say, Lua seems totally feasible with this proposal, however.)
I believe I said the majority of dynamic or polymorphic languages. And I stand by that.
There is no valid reason to remove dynamic languages from the equation just to define the problem away. If the current Java-like mainstream was all we cared about, then we could just tweak the JVM and go home.
Well, you seemed quick to assert that they could make different choices. Why do you think they made the existing choices? Why do you think that practically all FP and dynamic languages end up with uniform representations if there are better alternatives? Did you try to understand what constraints and use cases they have and what cost trade-offs are involved? Are you aware that there has been quite a bit of research about efficient memory representations for such languages?
In the Wasm setting, linking is dynamic and link-time = runtime, so this becomes more or less indistinguishable from (3) and would e.g. require shipping a jit with each webpage. More importantly, though, as I said before, your guess is simply wrong for any sufficiently interesting type system. Simple (contrived) counter example in pseudo code: struct Pair<A, B> { x : A; y : B; } func bar(n : int, v : T) { There are many other, more involved and probably more practically interesting typing features (first-class polymorphism, existential types, GADTs, inductive type families, ...), this is just a particularly easy one to demonstrate.
Not sure how that addresses what I said. Either way, we want to allow porting native implementations to Wasm, not require redoing them from scratch with some yet to be invented techniques. For better or worse, pointer tagging is a common technique in GCs.
I question those claims, unless you equate industrial with numeric — which I know is your main use case, but not the typical use case for OCaml (or the web). And don't forget that F# threw out half of OCaml's features to be able to run efficiently on .NET. Also note that this proposal, unlike OCaml, supports heterogeneous structs, so should adapt reasonably well to both worlds.
.NET does do runtime code specialisation, I was referring to monomorphisation, i.e., static specialisation.
I don’t know enough about Julia, but Swift inherited reference counting from Objective-C, so is not in the same boat.
No, as mentioned before, the alternative requires a JIT, so is not possible, allowed, practical, affordable, or even performant in some environments. On the Web, for example, each web page would have to include a complete JIT for its language (on top of Wasm). I’m pretty sure the impact on both latency and performance would make that a rather unattractive solution. |
Numerics is certainly one place where .NET is often much faster than OCaml but there are many others. Here are some of them:
For example, this non-numerical program is over 3x faster in F# (0.8s) than OCaml (2.7s) if you unbox the variant types:
Ah, I see. If you make it support both unboxing/untagging and features like polymorphic recursion then you need JIT which would suck, agreed. I was suggesting it as a default that languages could use if they didn't support such features, i.e. when there is no need for a JIT. Languages that do could use a uniform representation. |
* Adjust element segment encoding to match the bulk-memory proposal * Update interpreter/binary/encode.ml Co-Authored-By: Andreas Rossberg <rossberg@mpi-sws.org> * Address comments
This discussion covered many different pointer tagging and value packing schemes and abstract discussion about related design principles for Wasm. Now that we have concrete implementations of the proposal, the best way to continue this discussion would be to raise concrete suggestions as new issues. |
An alternative to the i31ref would be to support 'value types' as well as 'reference types'.
Semantically, one of the differences is that instead of the tagged value determining that the value is 'flat', the owner of the reference would have to assert the type.
The big pay-off is a uniformity of representability of data types (the only required primitive type would be bit string(s))
You can reconcile value types with existentially quantified types - at the expense of a more elaborate system of allocating local variables.
My suggestion would be to supplement reference types with value types - I am not advocating C++-style specialization of generated code.
The text was updated successfully, but these errors were encountered: