Clarification on type rolling and unrolling #578

CharlieTap · 2025-01-12T18:20:04Z

CharlieTap
Jan 12, 2025

I'm trying to get an understanding of what state recursive types should be in at the various semantic phases. By state I mean whether they contain type indices, recursive type indices, or have had these swapped out with concrete defined type references.

Looking at the reference interpreter I can see that during validation, you roll up all the types such that they contain their recursive indices. My understanding is that types in the validation context are always in this state containing recursive type indices. Some type unrolling may happen during validation when matching defined types/recursive types but this would not alter the context and thus types in the context never contain type indexes or concrete defined types.

During execution I can see you you substitute the recursive type before rolling, I imagine this is to replace the type indices with concrete defined types and prevent them being replaced with recursive type indices during rolling?

So types in the module instance never contain type indices or recursive type indices only concrete types? Maybe so they they work across module boundaries?

If so could the defined type matcher skip substitution of the defined types during matching knowing this invariant?

I'm looking at ways to optimise my algorithm around type validation and matching and want to be sure I really understand the relationships and transformations

Answered by rossberg

Jan 13, 2025

It may be useful to think of it in terms of the difference between "syntactic" types, as they appear in the source program, and "semantic" types, which are used in most of the semantics (validation, linking, execution).

Syntactic types contain type indices from a module's type index space. Since these are module-local, they are not generally meaningful in any context where types can flow outside their original module, or where types from different modules mix, which is everywhere but validation. Hence, in the spec (and the interpreter), all original type indices are substituted away ("closed") before their use at link time or run time.

Validation is a bit different. While the types in the…

View full answer

jakobkummerow · 2025-01-12T19:40:21Z

jakobkummerow
Jan 12, 2025

I'm not sure there's an official statement on "should" (in part because it depends on engine design and constraints), but as a fellow implementer, what I can say is that this is super hard to get right, and we (V8) had several severe bugs in our type handling.

Keep in mind that there are three different kinds of type indices:

"regular" module-relative indices, same as what existed before recursion groups were introduced.
group-relative indices. You need these when comparing two recursion groups to determine whether they are canonically the same or not. I don't think you need them for anything else.
canonicalized indices. After matching up two recursion groups (which may or may not come from different modules), you'll likely want to assign "canonical" type indices to the types in them.

It is tempting, for example, to store group-relative indices in your storage of canonicalized types, because that's what you'll need when the next recursion group comes along and needs to be canonicalized. However, then you can't use these stored types directly for other purposes, because most other uses of types don't want group-relative indices (but instead, depending on your design choices, module-relative or canonical indices).

It is also tempting to think that you can just convert all types to group-relative types for recursion group canonicalization, but that's not the case: types in a recursion group can refer to both other types in the recgroup (e.g. group-relative index 1), and to other types defined in earlier recgroups (e.g. module-relative index 1). Obviously (struct (field (ref $group-relative-index-1))) and (struct (field (ref $module-relative-index-1))) aren't generally the same (unless the group happens to be the first group in the module).

It is also tempting to just use canonicalized types for everything (except canonicalization of new recursion groups). The biggest difficulty with that is code sharing/caching: statically pre-assigning canonicalized types isn't feasible because there are too many possible types, so you'll need to assign them on demand as you decode modules, so they'll depend on the order in which modules were decoded. You'll very likely need to embed some type indices into the code you compile (e.g. to implement ref.cast), so if you want to cache generated code across different possible decoding orders, you'll have to either embed module-relative indices (which are, of course, stable for each module), or devise a patching scheme (to patch the code you deserialize from your cache).

To make it easier to catch bugs, I would recommend to make use of your implementation language's type distinction facilities. For example, in C++, instead of using uint32_t for every kind of type index, use something like

struct ModuleRelativeTypeIndex {
  uint32_t index;
};
struct CanonicalTypeIndex {
  uint32_t index;
};
struct GroupRelativeTypeIndex {
  uint32_t index;
};

so your compiler warns you when you accidentally pass one of these to a function that expects one of the others. That's a bit annoying because it adds duplication or complication or templatization or something (because it doesn't end with those three structs: they'll trickle up into whatever classes you use to represent types), but in our experience it's worth that cost.

0 replies

rossberg · 2025-01-13T11:29:44Z

rossberg
Jan 13, 2025
Maintainer

It may be useful to think of it in terms of the difference between "syntactic" types, as they appear in the source program, and "semantic" types, which are used in most of the semantics (validation, linking, execution).

Syntactic types contain type indices from a module's type index space. Since these are module-local, they are not generally meaningful in any context where types can flow outside their original module, or where types from different modules mix, which is everywhere but validation. Hence, in the spec (and the interpreter), all original type indices are substituted away ("closed") before their use at link time or run time.

Validation is a bit different. While the types in the typing context are always semantic as well, as you note, that is not the case for the types occurring in instructions, which have to be inspected during validation. Take the typing rule for br_on_cast, for example: that has to compare the types rt1 and rt2, and both are syntactic, i.e., non-closed. There are two basic ways to deal with this: either "eagerly" close all types in every rule before invoking subtyping, or "lazily" look up their definition inside the subtype check. I had originally used the former formulation, but that lead to a lot of clutter (and bugs) in many rules, so I migrated to the latter solution, which factors that better. Hence the context is needed for the subtyping judgement during validation.

Of course, an implementation is free to employ the eager approach, which should be equivalent. Engines essentially do that when they canonicalise types in a pre-pass.

The other part of your question involves recursive types. When closing types, we have to distinguish two occurrences of type indices: those referring to types from other recursion groups, and those referring to types from their own recursion group. The former are substituted by deftypes, while the latter are substituted by "recursive type indices", courtesy of the roll operation. Hence, types in the module instance do contain recursive indices, but they are still always closed, that is, recursive indices only occur locally inside a rectype that binds them, never outside. To maintain this invariant, inspecting a recursive type requires unrolling, which replaces the internal indices with the actual type.

To some degree, these are just artefacts of a term-based presentation of types. Another, equivalent view is to think of semantic types as graphs. If we didn't have recursive types, then these graphs would always be acyclic, i.e., the usual expression trees representing type terms. In such a graph, every node corresponds to a type while every directed edge is a reference to another type. With recursive types, these graphs can now be cyclic; types are equivalent when they form identical graphs. In this representation, recursive type indices are not necessary, they become back-edges in the graph (from a type to its rec node). In a term-based representation, however, we need them to break up cycles, because terms have to be finite.

0 replies

CharlieTap · 2025-01-13T14:53:31Z

CharlieTap
Jan 13, 2025
Author

Thank you guys for the awesome replies, both were super helpful in different ways! The syntactic vs semantic concept is a really intuitive way of thinking about.

When comparing defined types for equivalence, for example in defined type matching after the pointer check but before the unrolling. Here they are completely closed, and contain no more type indices and only recursive type indices or references to defined types. How are your preventing infinite recursion when comparing the defined type heap type references, for example my HeapType encoding looks like this:

Kotlin encoding

sealed interface HeapType : Type

sealed interface AbstractHeapType : HeapType {

    data object Func : AbstractHeapType

    .... others

}

sealed interface ConcreteHeapType : HeapType {

     // module
    @JvmInline
    value class TypeIndex(val index: Index.TypeIndex) : ConcreteHeapType

    // group relative
    @JvmInline
    value class RecursiveTypeIndex(val index: Int) : ConcreteHeapType

    // canonical
    @JvmInline
   value class Defined(val definedType: DefinedType) : ConcreteHeapType
}

Looking at @jakobkummerow's example CanonicalTypeIndex is just an i32 so i guess its just a pointer to the actual defined type, and thus In my scenario I should override the equals and just compare pointers?

0 replies

rossberg · 2025-01-13T16:13:53Z

rossberg
Jan 13, 2025
Maintainer

One central property of the type system is that equality or subtype checks do not actually need to expand/recurse on recursive types. For two def types to be equivalent, they need to have the exact same recursive type indices in the same places. If they don't in just one place, a type equality check can immediately return false. Similarly, for subtyping, a rec index on the left can only match either the exact same rec index on the right, or it has a declared supertype (looked up in C.recs during the subtype test) that itself matches the rhs. That's the only place where you recurse when encountering a rec index, but you do so with a supertype on the left. Since the declared type hierarchy is checked to be acyclic, this can never enter a cycle, and is guaranteed to terminate.

0 replies

CharlieTap · 2025-01-13T21:19:36Z

CharlieTap
Jan 13, 2025
Author

So I figured out why my code was getting itself into a recursive loop and I think its worth outlining my issue because this might help someone in the future.

If you follow the algorithm in the reference interpreter verbatim everything works fine, but the performance is prohibitive when dealing with modules with large amounts of recursive types (In the thousands). This is because the the substitutors allocate rather than mutate at every stage of substitution, i.e allocate new rec type -> allocate new sub types -> allocate new superTypes and composite type etc etc until the type has been traversed. Substitutors will allocate even if the type is already closed, which results in recursive types being substituted over and over again during the validation and execution phases.

So you switch to mutation, however if you unroll a recursive type during matching, and mutate its heap types from recursive type indices to concrete defined type references, you will now have a defined type which contains references that point back to itself. Instead of this, always deep clone the recursive type during unrolling, and if you want good performance maintain these clones in a cache of unrolled sub types.

TLDR: Defined Types should never contain unrolled recursive types

0 replies

rossberg · 2025-01-14T15:22:32Z

rossberg
Jan 14, 2025
Maintainer

Yes, the implementation in the interpreter is intentionally naive, to represent the paper rules. I would not recommend copying it for a real engine. ;)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarification on type rolling and unrolling #578

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 6 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Clarification on type rolling and unrolling #578

CharlieTap Jan 12, 2025

Replies: 6 comments

jakobkummerow Jan 12, 2025

rossberg Jan 13, 2025 Maintainer

CharlieTap Jan 13, 2025 Author

rossberg Jan 13, 2025 Maintainer

CharlieTap Jan 13, 2025 Author

rossberg Jan 14, 2025 Maintainer

CharlieTap
Jan 12, 2025

jakobkummerow
Jan 12, 2025

rossberg
Jan 13, 2025
Maintainer

CharlieTap
Jan 13, 2025
Author

rossberg
Jan 13, 2025
Maintainer

CharlieTap
Jan 13, 2025
Author

rossberg
Jan 14, 2025
Maintainer