From 414746d479fccb8110e02d09a0cd484fc316d291 Mon Sep 17 00:00:00 2001 From: Chris Fallin Date: Tue, 17 Aug 2021 01:01:12 -0700 Subject: [PATCH 1/4] RFC: design of ISLE instruction-selector DSL. This is based on extensive discussions surrounding the ideas in bytecodealliance/rfcs#13 and is meant to be a refinement of earlier ideas that is compatible with our goals related to verification, and is able to build on top of Peepmatic's automaton abstraction. --- accepted/cranelift-isel-isle-peepmatic.md | 874 ++++++++++++++++++++++ 1 file changed, 874 insertions(+) create mode 100644 accepted/cranelift-isel-isle-peepmatic.md diff --git a/accepted/cranelift-isel-isle-peepmatic.md b/accepted/cranelift-isel-isle-peepmatic.md new file mode 100644 index 0000000..2618e21 --- /dev/null +++ b/accepted/cranelift-isel-isle-peepmatic.md @@ -0,0 +1,874 @@ +# Summary + +This RFC proposes a new instruction selection DSL for Cranelift, based +on initial discussions begun with a pre-RFC (bytecodealliance/rfcs#13). + +This RFC proposes the name "ISLE" (Instruction Selection/Lowering +Expressions), pronounced like the word for a small island (silent +'s'), which is conveniently also an anagram of "isel". + +This DSL combines aspects of an earlier prototype design that the +author has developed with ideas from term-rewriting systems, and in +particular plans to implement on top of Peepmatic's automata-based +compilation strategy with a new Rust-generating backend. + +The basic principles, in terms introduced/framed by the pre-RFC linked +above, are: + +- Expression/value-based design with term-rewriting-like semantics. +- Interaction with existing system, including rich query APIs, via + "virtual nodes" defined by external extractor functions. +- Typed terms. +- Rule application order guided by explicit priorities, but + semantically undefined (hence flexible). +- Single-pass lowering semantics via unrolling. + +We will dive into each of these principles in more detail below. + +# Intro: Term-Rewriting System + +The basic concept behind the sort of system that this DSL embodies is +that of *term-rewriting*. In brief, this means that the input and +output of the system are trees (or more general graphs) of *terms* +(symbolic nodes with arguments), and the system transforms input to +output by applying a set of *rewrite rules*. Each rule matches on some +*pattern* -- for example, a term of a particular type with an input +that matches some sub-pattern -- and replaces it with a new tree of +terms. + +This is a natural abstraction for transforming code. If we define a +term view of the IR on the input side and machine instructions on the +output side, such that a term represents an operator that consumes +values and produces a value, then the rewrite rules just define +*equivalences*. These rules are easier to validate than other sorts of +lowering logic because "all" one has to do is to show that two values +are equivalent. + +Most importantly, this foundational concept provides a *simple, +understandable semantics*. The importance of this cannot be +overstated: if the core of the DSL is complex because its design is +intertwined with a particular backend design or strategy, or other +incidental details, then one will not be able to use it effectively +without understanding implementation details. On the other hand, if we +have a clean rewrite-based semantics, one can easily convince oneself +that a new rule is safe to add as long as it denotes a true +equivalence. 
This is similar to how "equational reasoning", in +functional programming terms, makes a program's semantics easier to +understand (i.e., one can always substitute a function for its +definition). + +The order in which rewrite rules will be applied can be controlled by +user-specified priorities (attached to extractors; see +below). However, the language definition is careful to *not* specify +which rule must be applied first. This leaves room for the DSL +use-case to apply rules differently: for example, we might test/fuzz a +backend by applying rules in a different order. One should also be +free to tweak priorities to attempt to optimize, without fear of +breaking the semantics. + +Note that the ordering is only undefined when there are multiple legal +choices according to the types and available rules. In other words, +this does *not* mean that we will arbitrarily steer into a dead-end in +the graph of rewrite steps; an instruction that is lowerable should be +lowerable with any rule ordering heuristic. + +# Extractors: Programmable Matching on Virtual Nodes + +The first key innovation that this DSL introduces is in how it allows +patterns, or *left-hand sides* of rewrite rules, to *match* on input +terms. + +A vanilla term-rewriting system operates on trees of nodes and +transforms a single input tree (which is usually reified as an actual +in-memory data structure) into a single output tree. While this is an +easily-understood model, the requirement to define a *single* view of +the input can unnecessarily restrict expressivity. Said another way: +we sometimes want to use terms to represent certain properties of a +value, and it is difficult to arrange the terms into a tree and then +write the pattern-matching rules in a way that is flexible enough to +support multiple types of queries. + +### Motivation + +For a concrete example, consider if we had a term type representing a +constant integer value + +```plain + (iconst 42) +``` + +and we are performing instruction selection to lower to a CPU +instruction set that supports multiple different *integer immediate +formats* for its instructions. For example, AArch64 supports a 12-bit +immediate (unshifted or shifted left 12 bits) in some of its +arithmetic instructions, and a special "logic-instruction immediate" +with bit-pattern generation in other instructions. In a conventional +term-rewriting system, with helper functions defined appropriately, we +might have rules like + +```plain + (=> (iconst val) + (when (is-aarch64-imm12-value val)) + (aarch64-imm12 val)) + (=> (iconst val) + (when (is-aarch64-logical-imm-value val)) + (aarch64-logical-imm val)) +``` + +which turns the `iconst` into an `aarch64-imm12` node and/or an +`aarch64-logical-imm` node when the predicates allow. Then we might +have rules that pattern-match on `aarch64-imm12` or +`aarch64-logical-imm` terms in certain argument positions in certain +instructions: + +```plain + (=> (iadd ra imm@(aarch64-imm12 val)) ;; bind `imm` to subterm + (aarch64-add-reg-imm ra imm)) + (=> (ior ra imm@(aarch64-logical-imm val)) ;; bind `imm` to subterm + (aarch64-or-reg-imm imm)) +``` + +The question then becomes: how do we build a lowering engine that can +successfully use these rules? If we start with the semantics of "apply +any rule that fits", we might have a situation where an immediate +value (say, `1`) can be represented in *multiple* forms, so it might +be nondeterministically rewritten to either of the above terms. 
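+
+As a concrete illustration of this ambiguity, consider the shape such
+a helper predicate might take (a hypothetical sketch; the real check
+is up to the backend): the value `1` passes the 12-bit check below,
+and a single set bit is also a valid AArch64 logical immediate, so
+both rules above could fire.
+
+```rust
+
+  // Sketch of one plausible implementation of the
+  // `is-aarch64-imm12-value` predicate used above: a 12-bit value,
+  // either unshifted or shifted left by 12 bits.
+  fn is_aarch64_imm12_value(val: u64) -> bool {
+      (val & !0xfff) == 0 || (val & !(0xfff << 12)) == 0
+  }
+```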
+ +What we want instead is for a directed search of sorts: starting from +the top-level opcode match (`iadd` or `ior`), we try to rewrite into +only the appropriate immediate form. This is what the equivalent +handwritten backend code would do. More generally, we don't want to +require a *general graph-search problem*, where the backend has to +find a path that connects through the graph of possible rewrite states +from beginning to end. Rather, we want something that can be executed +in a more directed form. + +## Extractor (Left-hand Side) Functions + +The key idea that we introduce is that of the "extractor +function". This looks like a term on the left-hand side of a pattern +-- in fact, a rule that uses one will look identical to the above: + +```plain + (=> (iadd ra (aarch64-imm12 val)) + (aarch64-add-reg-imm ra val)) +``` + +The essential difference is that this is not matching on an +already-rewritten `aarch64-imm12` term. Rather, it is *invoking an +extractor* called `aarch64-imm12` that is a sort of programmable match +operator. It receives the value that it is meant to match (here, the +original `iconst` term) and produces either values for its subterms, +or fails to match. In other words, it is exactly the *reverse* of a +normal function application -- for the type + +``` + (declare-extractor aarch64-imm12 (AArch64Imm12) IConst) +``` + +the extractor is a (partial) function *from* the `IConst` type (what +would be the "return type" for an ordinary function) to the +`AArch64Imm12` type (an "argument"). More generally, the extractor can +extract one value into multiple values, just as a forward function can +have multiple arguments. + +This feature was largely inspired by the Scala `unapply` function +which can be defined on classes, as the dual of `apply`, to allow for +programmable `match` semantics in a very similar way. + +## Equivalence to Base Term-Rewriting Forms + +Note that so far, this is actually completely equivalent to a base +term-rewriting system with only tree pattern-matching. The only +distinction is that the extractor makes the attempt to perform a +particular rewrite *explicit*, or *subordinate* to another rule +application. It allows for natural structuring of rewrites through +intermediate steps so that we don't need a more general scheduling +algorithm. + +## External Extractor and Constructor Invocations + +The next conceptual leap is that the extractor functions can serve as +the *sole* way to access the input to the term-rewriting system if we +allow them to be *externally-defined*. In other words, the system does +not need to have a native data structure or abstraction for a tree of +terms/nodes. Rather, it can have *only* extractor functions that lower +to calls to external Rust functions. Every "match" operation in the +left-hand side of a pattern is a programmable operation that bottoms +out in "runtime" code of some sort, and the DSL compiler does not need +to have any application-specific implementation of match logic. (As an +optimization, we allow constant integer/enum value patterns directly, +but these could be custom extractor functions with no fundamental loss +of expressivity.) + +This is a sort of "foreign function interface" or "FFI" for ISLE: it +allows the backend author (or more likely, the author of a common +"prelude" that provides definitions for all machine backends) to build +abstractions in ISLE on top of the rest of the compiler's backend +APIs. 
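+
+For example (a sketch, not a prescribed interface), the
+`aarch64-imm12` extractor above might bottom out in an external Rust
+function that either produces the extracted immediate or fails the
+match:
+
+```rust
+
+  // Hypothetical external implementation backing the `aarch64-imm12`
+  // extractor: given the matched input value, return the extracted
+  // argument on success, or `None` to make the pattern fail.
+  // (Unshifted 12-bit form only, for brevity.)
+  fn extract_aarch64_imm12<C>(_ctx: &mut C, value: u64) -> Option<u16> {
+      if (value & !0xfff) == 0 {
+          Some(value as u16)
+      } else {
+          None // Not representable; matching falls through to another rule.
+      }
+  }
+```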
+ +So far we have seen a way to ingest the *input* tree (i.e., the IR), +but not to produce the *output* tree. To handle this in an analogous +way, we define "constructors" that are also defined externally and +invoked from the DSL. This allows slightly more flexibility than a +pure "output is a tree of Rust enums" design would allow: for example, +one can have a `(new-temp ty)` constructor that can be bound to a +variable (see `(let ...)` form below) and used in the output tree, +which in the generated Rust code becomes a call to +e.g. `LowerCtx::alloc_tmp()`. + +## Virtual Internal Term Types + +So far above, we have only discussed matching on the input tree (via +extractors) and producing the output tree (via constructors). However, +if a chain of rules applies in succession, with a rule matching on a +term that was produced by a prior rule application, we ideally should +allow the rules to "communicate" directly without calling out to +external constructors and extractors. Below we describe the +"single-pass" compilation strategy; for now we just note that in +addition to the above mechanisms, we allow for "virtual" terms that +are never actually constructed but only connect rules together. + +# Heterogeneous Term Types + +A traditional term-rewriting system has one "type", the expression +tree. However, when building instruction lowering, there are often +pieces of an instruction, or intermediate knowledge constructed about +a certain subexpression, that *should* have a more specific type. One +could imagine a series of rewrite rules that transform the address +input to a load instruction into an `(Amode ...)` term that encodes a +machine-specific addressing mode; but this is not a general +instruction, and should not be interchangeable with others. At an even +more basic level, a set of lowering rules from a machine-independent +IR to a machine instruction set deals with two disjoint sets of node +kinds (the input and the output). Ideally these would be typed +differently. + +To address this need, the proposed DSL assigns a type to every term. A +particular term symbol -- which may denote an extractor function, a +constructor function, or an internal term -- has a static type. This +allows "FFI" signatures to be statically typed as well. + +Concretely, the types correspond to Rust enums or primitive types in +the generated Rust code. When a term's type is a Rust enum, there are +implicit extractors and constructors corresponding to that enum's +variants. + +# Language Design: Types, Rules, and Export/Import Forms + +## Types + +First, we allow the user to define types. The type system permits two +kinds of types: primitives, which are always (in Rust terminology) +`Copy` integer-like types, and enums, which correspond to Rust enums +with variants that have named, typed fields and are semantically +passed-by-value (though passed by borrows whenever applicable in +generated code). + +A type declaration can take two forms: it can define a new type local +to the DSL definitions, used solely for internal terms (those that are +produced by some rules and consumed by others, but never present in +the final output of the system); or it can define a type that is also +present in existing Rust code that will be linked with the generated +code, so we merely refer to it. + +A type declaration is then: + +```plain + + ;; Enum with two variants. + ;; Field types can be primitives or other enums. 
+ ;; + ;; Note that types must be acyclic: an enum field + ;; cannot refer to the containing enum or any other enum + ;; that refers to it. (We may relax this later if needed + ;; but it requires reasoning about boxing so adds complexity.) + + (type A (enum (B (x u32) (y u32)) + (C (z u32) (w u32)))) + + ;; Same, but we do not emit a Rust `enum B { ... }` for it; + ;; we assume it has been declared elsewhere, and merely + ;; use it. Note that variants that are not matched on or + ;; constructed can be omitted. + (type B extern (enum ...)) + + ;; Particular example for `Opcode`: if variant types are declared + ;; without fields here then they will be matched on and constructed + ;; in field-less form (e.g. `Opcode::Iadd => { ... }`). + (type Opcode extern (enum Iadd Imul Isub Iconst ...)) + + ;; Primitive type. These are presumably included in a prelude + ;; and should not normally need to be declared by a compiler + ;; backend author using the DSL. + (type u32 primitive) +``` + +Once we have the value types defined, we define *type signatures for +terms*. Every term that is used -- as an internal node, as an +extractor or as a constructor -- needs types for its arguments, and +the whole expression is given a type (analogous to a return-value +type). + +```plain + + ;; `Iadd` is a term symbol, and `(Iadd ...)` terms take two arguments + ;; of type `Value` and produce a result of type `Value`. + (decl (Iadd Value Value) Value) + + (decl (Iconst u32) Value) +``` + +Now, given a proper type environment for values and terms, we need to +define ways to process them: with extractors and constructors, and +with internal rules. A particular term can *either* have an external +extractor and/or constructor function, or can have an internal rule, +but never both. If a term is to be defined by an extractor and/or +constructor, it must be declared with an `extern` like so: + +```plain + (decl extern (MyTerm u32) u32) +``` + +## Defining External Extractors and Constructors + +First, we define *extractors* and *constructors* for some terms. These +terms are those that represent the external interface to the +instruction selector rules. Extractors, as introduced above, provide a +view onto a virtual term tree at the input. + +One can think of an extractor like a "reverse function": it is given +the value of the whole term tree that the extractor is meant to match, +and if it matches successfully, it returns values for each of the +arguments. Extractors are attached to terms as follows: + +```plain + + (decl extern (concat u32 u32) u64) + (extractor concat "unconcat") +``` + +This indicates to the DSL compiler that the `concat` term can be used +in the left-hand side of a rule (as described below), and when a match +is evaluated, a Rust function `unconcat` will be called. This external +function is assumed to have the signature: + +```rust + + fn unconcat(context: &mut C, value: u64) -> Option<(u32, u32)>; +``` + +If the function returns a `Some((a, b))` value, the match was a +success and the arguments' values are recursively matched by +sub-patterns. + +Similarly, a constructor can be attached to a term as follows: + +```plain + + (decl extern (concat u32 u32) u64) + (constructor concat "concat") +``` + +which implies the existence of a Rust functioN: + +```rust + + fn concat(context: &mut C, val1: u32, val2: u32) -> u64; +``` + +Note that a constructor, unlike an extractor, cannot fail. More on the +execution model below. 
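+
+To make the contract concrete, here is a minimal sketch of external
+definitions matching the two signatures above (the context type `C`
+stands in for whatever state the backend threads through):
+
+```rust
+
+  // Extractor body: fallible, runs "in reverse" from a u64 to its
+  // two u32 halves. This particular match always succeeds.
+  fn unconcat<C>(_context: &mut C, value: u64) -> Option<(u32, u32)> {
+      Some(((value >> 32) as u32, value as u32))
+  }
+
+  // Constructor body: infallible, runs "forward" from argument
+  // values to the constructed value.
+  fn concat<C>(_context: &mut C, val1: u32, val2: u32) -> u64 {
+      ((val1 as u64) << 32) | (val2 as u64)
+  }
+```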
+ +## Defining Rules + +The final way to define a term -- and the most common -- is to provide +a set of pattern-matching rules for it. A rule consists of two parts, +the *left-hand side* and the *right-hand side*. The rule specifies +that whenever a term of form that matches the left-hand side exists +and is to be reduced, it can be rewritten as the right-hand side. + +Several examples of rule definitions follow: + +```plain + + (rule (A x y z) (B x y z)) ;; rewrite (A x y z) to (B x y z) + + (rule (A x @ (B 42 _)) ;; match (A (B 42 _)), binding x to the B subterm; + ;; the `42` matches a constant value; + ;; the `_` matches anything (wildcard). + (C x)) +``` + +Formally, a rule is defined as `(rule pattern expr)` where the grammar for +the left-hand side `pattern` is: + +```plain + + pattern := (extractor pattern*) + | (internal-rule-term pattern*) + | constant-integer + | constant-enum + | (enum-variant pattern*) + | variable @ pattern + | `=` variable ;; Match same value as variable bound + ;; earlier in the pattern. + | (and pattern*) ;; Match with multiple patterns. + | `_` ;; Wildcard. +``` + +and the right-hand side `expr` is: + +```plain + + expr := (constructor expr*) + | (internal-rule-term expr*) + | constant-integer + | constant-enum + | (enum-variant expr*) + | variable + | (let ((variable expr)*) expr) +``` + +The evaluation semantics are described in more detail below. Note that +while the invocations of extractors (in a pattern context) and +constructors (in an expression context), builtin enum destructuring +and building, and binding and uses of variables all behave in +relatively straightforward ways as described above, the use of terms +defined by other rules, in both patterns and in expressions, is +somewhat subtle and will be discussed further under "Single-Pass +Elaboration" below. + +## Exporting an Entry Point + +Finally, once we have a series of rules that define an instruction +lowering, we must provide an *entry point* by which an external user +can start the rule-matching process. + +This may seem a bit odd at first: isn't the concept of an "entry +point" somewhat tied to an imperative view of the world, as opposed to +a list of declarative expression-equivalence rules? The reason derives +from ISLE's evaluation process, which is somewhat different from a +vanilla "apply any applicable rule until fixpoint" approach. In +particular, the matching for any particular "root term" (at the root +of the input tree) can be fully computed as an automaton that combines +all patterns rooted with that symbol; and when we use another internal +term in an expression, we can immediately inline any rules that would +apply to *that* term. (More details are given below.) So the matching +process is a "push" rather than "pull" process: we invoke generated +matching code with a term, and it matches (via extractor calls) until +it produces a final output (via constructor calls). + +Because of this, the matching must be started (the initial "push") by +invoking an entry point corresponding to a particular root term, with +arguments for that term. This is the reason that the ISLE DSL code +needs to export a term that has been defined with rules as an "entry +point". To do so, we write: + +```plain + + ;; We have an internal term `LowerInst` that, when wrapping + ;; an `Inst` term, matches a list of rewrite rules that perform + ;; the lowering. In other words, we rewrite + ;; + ;; (LowerInst (Iadd ...)) + ;; to + ;; (X86Add ...) + ;; or similar. 
+  ;;
+  ;; This is a little different than a straight rewrite from
+  ;; `Iadd` to `X86Add`, but only superficially; it serves to
+  ;; give us one term at the root to which we can anchor
+  ;; all of our rules.
+
+  (decl (LowerInst Inst) MachineInst)
+
+  (rule (LowerInst (Iadd ...)) (X86Add ...))
+
+  (export LowerInst "backend_lower")
+```
+
+This will generate a callable entry point with a Rust function
+signature as follows:
+
+```rust
+
+  fn backend_lower(context: &mut C, inst: &Inst) -> bool;
+```
+
+that (i) produces the final output by invoking the declared
+constructors with the given `context`, and (ii) returns a boolean
+indicating whether the toplevel rule matched (because rules can be
+partial; no argument-coverage checking is done).
+
+# Evaluation Semantics
+
+The evaluation semantics of ISLE are carefully defined to (i) ensure
+that the effects of rules are simple to understand, (ii) ensure an
+efficient compilation of the matching rules is possible, and (iii)
+reserve enough freedom to consumers of the DSL that rules can be
+flexibly used in many ways.
+
+The overall evaluation process is driven by a toplevel call into an
+entry point, defined as above, and proceeds in two phases: the
+"pattern-match" phase and the "expression-build" phase.
+
+When evaluating a pattern, we always know (i) the root term's symbol
+and (ii) the value (of some type in ISLE's type system) of the root
+term's expression. At the entry point, these are explicitly given (the
+entry-point term is fixed, and the values of arguments are given). For
+sub-matching, we will know these by the time we recursively match.
+
+If the root term's symbol is `extern`, and has an extractor, we invoke
+the extractor with the given value, receiving the argument values in
+return. If the extractor fails the match, we return to the most nested
+choice-point (at which we have multiple applicable rules) and try the
+next rule.
+
+Otherwise, if the root term is defined by ISLE-internal rules, we
+first determine which rules are applicable: these are any with the
+root term as the root of their pattern. Record this as a "choice
+point" to which we can backtrack. Let us then arbitrarily pick one
+according to some prioritization scheme. The semantics are defined
+such that we try all applicable rules, in some arbitrary order, and
+the first that matches "wins". If more than one would match, results
+are determined only by the prioritization scheme. (Deterministic and
+understandable control of this scheme is thus important to allow the
+user to tweak matching behavior; see below for more.) For each
+applicable rule, we try to match the sub-patterns against the known
+argument values, recursing as appropriate.
+
+Note that there is a way to compile this recursive-backtracking match
+behavior into a set of linearized match operator sequences (where each
+operator is an extractor invocation or a constant-value match), which
+can then be combined into more efficient code (e.g. by use of an
+automaton abstraction). See below for more.
+
+When a rule's pattern (left-hand side) matches, we proceed to
+evaluation of the right-hand side. This occurs in the usual
+syntax-directed way: we evaluate argument values recursively, then
+evaluate constructors with those argument values. Variables are bound
+to the values encountered during pattern-matching.
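+
+To give a feel for this execution model, the generated matching code
+for a term with a couple of rules might have roughly the following
+shape (an informal sketch; the names are illustrative, not the actual
+generated interface, and it reuses the `unconcat`/`concat` externals
+from the earlier example):
+
+```rust
+
+  // Rules are tried in priority order. A rule's match phase is
+  // fallible (extractor calls return Option); once a full left-hand
+  // side has matched, the right-hand side runs to completion.
+  fn match_root<C>(ctx: &mut C, value: u64) -> bool {
+      // Rule 1: the pattern bottoms out in an external extractor.
+      if let Some((a, b)) = unconcat(ctx, value) {
+          // Right-hand side: constructor call(s) with bound values.
+          let _output = concat(ctx, b, a);
+          return true;
+      }
+      // Further applicable rules would be tried here in turn.
+      false
+  }
+```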
+ +An important invariant during evaluation is that for any given rule, +the pattern-matching phase is fallible, with failure causing the +matching process to progress to the next rule; but the +expression-evaluating phase is *infallible*, and must complete once +started. This is important because constructors (on the right-hand +side) may have side-effects, e.g. allocating temporary registers for +use; we do not want to call into the embedder to produce the final +term until we know we have the correct rule to do so. This strict +phase separation allows for cleaner reasoning about semantics in +general. + +In all of the above, we have glossed over the *chaining* of several +rules. What happens when a pattern on the left-hand side refers to an +internal term (produced by other ISLE rules), or when an expression on +the right-hand side produces an internal term? + +The key to understanding this behavior is to see that if we inline and +optimize all invoked rules together, we can cancel paired +internal-term expressions that construct the terms and internal-term +patterns that match the terms. Two cases are relevant: + +- When during expression evaluation we see a constructor call to an + internal term, we immediately look up all applicable rules that + match on that term at the root. We can do a sort of "constant + propagation" wherein we symbolically match the patterns against the + constructed internal-term tree. This may statically determine which + rule we invoke and whose expression we can immediately inline. Or, + the pattern-matching may "reach into" arguments that are ultimately + bound from the entry point's original pattern, and so imply more + (fallible) matching-phase execution. A key restriction is that this + fallible matching must not depend on the return values of any + constructor call, to permit the phase separation to remain. + + An example of both cases follows: + +```plain + + (rule (A (SubTerm x)) (B x)) + (rule (A (MyExtractor y)) (C y)) + + ;; We can inline `A` straightforwardly here, with the `SubTerm` + ;; introduced and eliminated internally, never exposed to the + ;; outside world: + (rule (EntryPoint (D x)) (A (SubTerm x))) + + ;; Here, inlining `A` requires us to do a further fallible + ;; match on the input argument, invoking `MyExtractor` (an external + ;; extractor) on `x`: + (rule (EntryPoint x) (A x)) +``` + +- When during pattern-matching we see an internal term that does not + statically resolve, we need to run any defined rules "in reverse". + An example might serve to illustrate: + +```plain + (rule (A x 1) (B x)) + (rule (A x 2) (C x)) + + ;; The `(B x)` pattern, because `B` is an internal term and not + ;; an extractor, will search for any rule that produces `B` at the + ;; root, then substitute the pattern for that rule (here `(A x 1)`) + ;; at that point in the pattern. + (rule (EntryPoint (B x)) x) +``` + + In essence, one can see an internal term as being equivalent to (i) + an extractor and (ii) a constructor definition, each of whose bodies + is composed of logic collected from (i) all rules that produce this + internal term, and (ii) all rules that match on this internal term, + respectively. In fact, this is a valid (though possibly inefficient) + compilation strategy. 
+ + Note that there is a slight asymmetry here: when running a rule "in + reverse" the dispatch is actually on the term that is produced (the + term at the root of the expression, or right-hand side), while + running a rule "forward" the dispatch is on the term that is matched + (the term at the root of the pattern, or left-hand side). The + opposite side of the rule actually does not need an internal term at + the root, and in fact when the inlining/recursion bottoms out, it + must be an external extractor or constructor. Hence when defining a + rule, we *may* use an extractor at the root, with the consequence + that the rule can only be evaluated in reverse (when its RHS is + invoked by the pattern of another rule), such as: + +```plain + + (decl extern (Extractor ...) ...) + (extractor Extractor "extractor") + + (decl (A ...) ...) + + (rule (Extractor a) (A a)) + + (decl (EntryPoint ...) ...) + + ;; This will inline `A` in the pattern-matching phase, running + ;; "in reverse", bottoming out in a call to `extractor`. + (rule (EntryPoint (A a)) a) +``` + +# Compilation Strategy + +In order to compile ISLE to Rust code, we perform several steps. + +## Single-Pass Elaboration + +A traditional term-rewriting system typically works in an iterative +fashion: rewrite rules are applied, editing the representation +(e.g. expression tree) in memory, until no more rules apply. This +*dynamic scheduling* of rewrite rules is costly, as is the actual +reification of the intermediate states. In contrast, we expect this +DSL to be compiled into a single-pass transform, and we add certain +restrictions to rules to enable this. + +The basic strategy is to "inline" the rule-matching that would apply +to the newly constructed value on the right-hand side of a rule. In +other words, if a rule transforms `(A x y)` to `(B x y)` and there is +another rule that matches on `(B x y)`, we generate code that chains +these two transforms together. We call this inlining "elaboration", in +analogy to a similar flattening step in other hierarchical-design DSLs +(in particular, hardware-description languages). + +To ensure that elaboration terminates, we disallow recursion in +rewrite rules. This implies a stratification of term constructors: +terms of one kind `A` can rewrite to `B`, but `B` cannot rewrite back +to `A` (or to another `B` expression). The backends will break +recursion that would otherwise occur in an isel context via external +constructors and references/names: e.g., "get input in reg" does not +immediately recursively lower, but just marks that instruction as used +and it will be lowered later in the scan. + +## Linearization into Match Operators + +Once we have completely elaborated a rule, we lower its actions into a +straight-line sequence of (i) match operators, in the matching phase, +and (ii) constructor calls, in the evaluation phase. + +Match operators consist of calls to external extractors, destructuring +of native Rust enum types, matches on constant values, and matches on +already-bound variable values. Each match operator may fail (causing +evaluation to proceed to another rule), or may succeed and optionally +bind variables. + +Once we have a linearized sequence of operators for a given root term, +we can combine these sequences into an automaton for that term. We +perform this lowering for the entry-point (and it will naturally +inline other rules as needed during elaboration). 
The combination of
+linearized operator sequences into an automaton will rely on
+Peepmatic's existing machinery.
+
+## Prioritization of Rules and Extractors
+
+We have so far not discussed how the matching phase chooses which rule
+to try first. This ordering is decided by a set of priorities assigned
+to (i) rules and (ii) extractors.
+
+Any rule for an internal term may have a priority attached, as follows:
+
+```plain
+
+  (rule (prio 10) (A x) (B x))
+```
+
+Likewise, any extractor may hvae a priority attached, as follows:
+
+```plain
+
+  (extractor (prio 10) MyTerm "my_term")
+```
+
+When dispatching on a certain root term during pattern-matching, we
+collect all priorities and sort in descending order. The default
+priority is zero for all term matchers and -100 for wildcards (this
+will naturally place more specific rules above more general ones in
+priority order). If two or more options have the same priority, we
+decide arbitrarily (but deterministically for a given version of the
+ISLE compiler, to avoid non-reproducible build issues).
+
+# Discussion: Comparison with Hand-Written Backends
+
+We discussed in more depth in bytecodealliance/rfcs#13 how the DSL
+approach in general compares to hand-written instruction lowering
+code. A few points are notable with specific reference to the design
+described in this RFC, however.
+
+First, the handwritten code in our existing `MachInst`-style backends
+is fundamentally an imperative design: the machine-independent driver
+invokes a backend function once per CLIF instruction to lower, and
+this driver in turn queries the IR and eventually performs actions to
+build the lowered code (allocate temps and emit machine instructions,
+mainly). The earlier pre-RFC contrasted this approach with a
+declarative DSL one, in which the execution semantics of the DSL were
+not prescribed. Note, however, that this DSL design still takes
+efficient compilation *into account*, even if it does not prescribe
+the actual execution sequence. Specifically, the design of the
+dispatch mechanism, in which the root term (determined first by the
+entry point, then by matches on or constructions of internal terms)
+statically dispatches to a set of rules that can be combined into an
+efficient automaton, enables a "single-pass" approach that is
+distinct from most iterative term-rewriting systems.
+
+Second, the DSL encourages a mode of thinking about instructions (both
+in the IR and at the machine level, in VCode) that is more
+value-oriented. The `LowerCtx` API is designed to be a sort of general
+query API, and instruction output registers are just attributes of an
+instruction like any other. In contrast, the ISLE DSL design
+privileges the notion of an instruction's "value". This makes the
+notation more natural when expressing algebraic-style reductions, and
+is consistent with many other instruction-selection systems. (As a
+particular implementation note, multiple-output instructions should
+work within this framework as well because one can treat the N-output
+tuple as a value with constructors and extractors as necessary.)
+However, ISLE does not go as far as an instruction-selector framework
+that is explicitly aware of, e.g., SSA values: rather, that is up to
+the constructors that have been defined. The ISLE system itself only
+understands terms and trees of terms; this just happens to be a
+natural abstraction with which to represent expression trees.
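+
+For the multiple-output case noted parenthetically above, a sketch of
+the idea -- with purely hypothetical types and names, not an existing
+Cranelift API -- is to give the output tuple its own type and a
+programmable extractor that takes it apart:
+
+```rust
+
+  // Illustrative types only.
+  #[derive(Clone, Copy)]
+  struct Value(u32);
+  #[derive(Clone, Copy)]
+  struct ValuePair { lo: Value, hi: Value }
+
+  // A two-output instruction's results modeled as one tuple-typed
+  // value; an extractor splits it, like any other programmable match.
+  fn unpair<C>(_ctx: &mut C, pair: ValuePair) -> Option<(Value, Value)> {
+      Some((pair.lo, pair.hi))
+  }
+```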
+ +# Discussion: Future Implications for Verification + +Though we have not yet worked out all the details, we are confident +that the translation of rules expressed in the ISLE DSL into some +machine-readable form for formal verification efforts should be +possible. This is primarily because of the "equivalent-value" +semantics that are inherent in a term-rewriting system. The +denotational value of a term is the symbolic or concrete value +produced by the instruction it represents (depending on the +interpretation); so "all" we have to do is to write, e.g., equations +for some SMT-solver or theorem-prover that describe the semantics of +instruction terms on either side of the translation. + +Note that while externally-defined extractors and constructors at +first glance may appear to make this more difficult, because they +define an "FFI" into imperative code, in actuality we can just treat +them as axiomatic building blocks. In essence, they are the pieces +that define the input and output instruction sets, and so are tied to +the definitions of the formal semantics for these two program +representations; we would start by formally describing the program +value represented by the result of an extractor on an instruction, +and/or the preconditions it implies, then carry through the +implications of this to intermediate internal terms and finally to the +tree of constructor calls that build the output instructions. + +# Example Sketch + + +``` + ;; --- "FFI" mapping of input instructions and lowering API --- + (type Inst extern (enum + (Add (a Input) (b Input)) + (Load (addr Input)) + (Const (val u64)) + ...) + + (type Reg primitive) + (type Insn primitive) + (type OwnedInsn primitive) + (type Value primitive) + (type usize primitive) + + ;; Extractor/constructor to go between an instruction reference and its produced value. + (decl InsnValue (Insn) Value) + (extractor InsnValue ...) + (constructor InsnValue ...) + ;; Extractor to get instruction that produces a value consumed *only* by + ;; the currently-lowering instruction (and nowhere else). + (decl OwnedInput (OwnedInsn) Value) + (extractor OwnedInput ...) + + (decl Opcode (Insn) Opcode) + (extractor Opcode ...) + + ;; TODO: build an `Add` convenience extractor with `Opcode`, `InsnValue`, + ;; and `InsnInput`. + + ;; --- x86 backend --- + + ; Note that while existing VCode instructions for x86 have explicit destination + ; registers, and largely have two-address form with "modify" semantics (mirroring + ; the actual machine instructions on x86), the terms produced by the x86 lowering + ; rules have two explicit inputs and an implicit output. The constructors can + ; insert the necessary move in a mechanical way so we can deal with purely + ; expression-tree-like representations here. + (type X86Inst + (Move (a Reg)) + (Add (a Reg) (b Reg)) + (AddMem (a Reg) (b X86AMode)) + (type X86AMode + (...)) + + ;; --- + + (decl LowerAMode (Input) X86AMode) + + ; Addressing mode: const + other + (rule (LowerAMode (InputInsn (Add (InputInsn (Const c)) other))) + (BasePlusOffset (InputToReg other) c)) + + ; Addressing mode: fallback + (rule (LowerAMode input) + (BaseReg (InputToReg input))) + + ; Main lowering entry point: this term reduces an `Inst` as an arg to an `X86Inst`. + (decl (Lower Inst) X86Inst) + + ; Add with constant on one input. + (rule (Lower (Add (FromInst (Const c)) b) out) + (Emit (Move out (InputReg b))) + (Emit (AddConst out c))) + + ; Add with load on one input. 
+ ; Note the `FromInstOnlyUse` which ensures we match only if + ; this is the *sole* use. + (rule (Lower (Add (OwnedInsn (Load addr)) b)) + (AddMem (InputReg b) (LowerAMode addr))) + + ; Fallback for Add. + (rule (Lower (Add a b)) + (Add a b)) ;; lookup of constructor name is type-dependent -- + ;; here we get X86Inst::Add. +``` From df4ca982412bd6de66f6e011d77986b6a1c2511a Mon Sep 17 00:00:00 2001 From: Chris Fallin Date: Tue, 17 Aug 2021 11:51:55 -0700 Subject: [PATCH 2/4] Typo fix from fitzgen (thanks!). Co-authored-by: Nick Fitzgerald --- accepted/cranelift-isel-isle-peepmatic.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/accepted/cranelift-isel-isle-peepmatic.md b/accepted/cranelift-isel-isle-peepmatic.md index 2618e21..005acd2 100644 --- a/accepted/cranelift-isel-isle-peepmatic.md +++ b/accepted/cranelift-isel-isle-peepmatic.md @@ -370,7 +370,7 @@ Similarly, a constructor can be attached to a term as follows: (constructor concat "concat") ``` -which implies the existence of a Rust functioN: +which implies the existence of a Rust function: ```rust From c171cc67b2e6003dbfc69f2127920ecfbca5b875 Mon Sep 17 00:00:00 2001 From: Chris Fallin Date: Wed, 18 Aug 2021 22:43:48 -0700 Subject: [PATCH 3/4] Edit pass; many clarifications. --- accepted/cranelift-isel-isle-peepmatic.md | 360 +++++++++++++--------- 1 file changed, 206 insertions(+), 154 deletions(-) diff --git a/accepted/cranelift-isel-isle-peepmatic.md b/accepted/cranelift-isel-isle-peepmatic.md index 005acd2..66ac653 100644 --- a/accepted/cranelift-isel-isle-peepmatic.md +++ b/accepted/cranelift-isel-isle-peepmatic.md @@ -17,7 +17,7 @@ above, are: - Expression/value-based design with term-rewriting-like semantics. - Interaction with existing system, including rich query APIs, via - "virtual nodes" defined by external extractor functions. + "virtual nodes" defined by external extractor functions. - Typed terms. - Rule application order guided by explicit priorities, but semantically undefined (hence flexible). @@ -41,30 +41,45 @@ term view of the IR on the input side and machine instructions on the output side, such that a term represents an operator that consumes values and produces a value, then the rewrite rules just define *equivalences*. These rules are easier to validate than other sorts of -lowering logic because "all" one has to do is to show that two values -are equivalent. - -Most importantly, this foundational concept provides a *simple, -understandable semantics*. The importance of this cannot be -overstated: if the core of the DSL is complex because its design is -intertwined with a particular backend design or strategy, or other -incidental details, then one will not be able to use it effectively -without understanding implementation details. On the other hand, if we -have a clean rewrite-based semantics, one can easily convince oneself -that a new rule is safe to add as long as it denotes a true -equivalence. This is similar to how "equational reasoning", in -functional programming terms, makes a program's semantics easier to -understand (i.e., one can always substitute a function for its -definition). - -The order in which rewrite rules will be applied can be controlled by -user-specified priorities (attached to extractors; see -below). However, the language definition is careful to *not* specify -which rule must be applied first. This leaves room for the DSL -use-case to apply rules differently: for example, we might test/fuzz a -backend by applying rules in a different order. 
One should also be
-free to tweak priorities to attempt to optimize, without fear of
-breaking the semantics.
+compiler-backend implementations because "all" one has to do is to
+show that the two sides of the equivalence are equal, separately for
+each rule (i.e., the validation is modular).
+
+An example of a rewrite rule might be:
+
+```plain
+
+  (rule (Iadd a b)
+        (X86Add a b))
+```
+
+which means that an `Iadd` "term" with two arguments `a` and `b` (the
+"left-hand side" or "pattern" of the rule) can be rewritten to an
+`X86Add` term with the same arguments (the "right-hand side" or
+"expression"). One could imagine many such patterns for particular
+combinations that the target machine supports.
+
+The concept of a rewrite rule, transforming one term into another
+equivalent one, provides a *simple, understandable semantics*. The
+importance of this cannot be overstated: if the core of the DSL is
+complex because its design is intertwined with a particular backend
+design or strategy, or other incidental details, then one will not be
+able to use it effectively without understanding implementation
+details. On the other hand, if we have a clean rewrite-based
+semantics, one can easily convince oneself that a new rule is safe to
+add as long as it denotes a true equivalence. This is similar to how
+"equational reasoning", in functional programming terms (i.e., that
+one can always substitute a function invocation with its definition),
+makes a program's semantics easier to understand.
+
+The order in which rewrite rules will be applied can be influenced by
+user-specified priorities. However, the language definition is careful
+to *not* specify which rule must be applied first; priorities are just
+heuristics, or hints. This leaves room for the DSL use-case to apply
+rules differently: for example, we might test/fuzz a backend by
+applying rules in a different order. One should also be free to tweak
+priorities to attempt to optimize, without fear of breaking the
+semantics.
 
 Note that the ordering is only undefined when there are multiple legal
 choices according to the types and available rules. In other words,
@@ -75,8 +90,7 @@ lowerable with any rule ordering heuristic.
 
 # Extractors: Programmable Matching on Virtual Nodes
 
 The first key innovation that this DSL introduces is in how it allows
-patterns, or *left-hand sides* of rewrite rules, to *match* on input
-terms.
+patterns to *match* on input terms.
 
 A vanilla term-rewriting system operates on trees of nodes and
 transforms a single input tree (which is usually reified as an actual
@@ -88,9 +102,9 @@ value, and it is difficult to arrange the terms into a tree and then
 write the pattern-matching rules in a way that is flexible enough to
 support multiple types of queries.
 
-### Motivation
+## Motivation
 
-For a concrete example, consider if we had a term type representing a
+For a concrete example, let us assume we have a term type representing a
 constant integer value
 
@@ -107,12 +121,15 @@ term-rewriting system, with helper functions defined appropriately, we
 might have rules like
 
 ```plain
-  (=> (iconst val)
-      (when (is-aarch64-imm12-value val))
-      (aarch64-imm12 val))
-  (=> (iconst val)
-      (when (is-aarch64-logical-imm-value val))
-      (aarch64-logical-imm val))
+  ;; 12-bit immediate form.
+  (rule (iconst val)
+        (when (is-aarch64-imm12-value val)) ;; Conditional predicate
+                                            ;; that evaluates custom logic.
+        (aarch64-imm12 val))
+  ;; Logical immediate form.
+ (rule (iconst val) + (when (is-aarch64-logical-imm-value val)) + (aarch64-logical-imm val)) ``` which turns the `iconst` into an `aarch64-imm12` node and/or an @@ -122,26 +139,26 @@ have rules that pattern-match on `aarch64-imm12` or instructions: ```plain - (=> (iadd ra imm@(aarch64-imm12 val)) ;; bind `imm` to subterm - (aarch64-add-reg-imm ra imm)) - (=> (ior ra imm@(aarch64-logical-imm val)) ;; bind `imm` to subterm - (aarch64-or-reg-imm imm)) + (rule (iadd ra imm@(aarch64-imm12 val)) ;; bind `imm` to subterm + (aarch64-add-reg-imm ra imm)) + (rule (ior ra imm@(aarch64-logical-imm val)) ;; bind `imm` to subterm + (aarch64-or-reg-imm imm)) ``` The question then becomes: how do we build a lowering engine that can successfully use these rules? If we start with the semantics of "apply any rule that fits", we might have a situation where an immediate value (say, `1`) can be represented in *multiple* forms, so it might -be nondeterministically rewritten to either of the above terms. +be nondeterministically rewritten to either of the above terms. This +is fundamentally a *search problem*: we need to know which rewrite +path to take on the immediate term before we see its use. -What we want instead is for a directed search of sorts: starting from +What we want instead is a goal-directed search of sorts: starting from the top-level opcode match (`iadd` or `ior`), we try to rewrite into only the appropriate immediate form. This is what the equivalent -handwritten backend code would do. More generally, we don't want to -require a *general graph-search problem*, where the backend has to -find a path that connects through the graph of possible rewrite states -from beginning to end. Rather, we want something that can be executed -in a more directed form. +handwritten backend code would do. This avoids the general +graph-search problem, instead allowing a more directed form of +execution. ## Extractor (Left-hand Side) Functions @@ -163,7 +180,7 @@ or fails to match. In other words, it is exactly the *reverse* of a normal function application -- for the type ``` - (declare-extractor aarch64-imm12 (AArch64Imm12) IConst) + (decl aarch64-imm12 (AArch64Imm12) IConst) ``` the extractor is a (partial) function *from* the `IConst` type (what @@ -172,9 +189,11 @@ would be the "return type" for an ordinary function) to the extract one value into multiple values, just as a forward function can have multiple arguments. -This feature was largely inspired by the Scala `unapply` function -which can be defined on classes, as the dual of `apply`, to allow for -programmable `match` semantics in a very similar way. +This language concept was largely inspired by the Scala [extractor +object](https://docs.scala-lang.org/tour/extractor-objects.html) +feature, which allows an `unapply` function to be defined on classes +(as the dual of `apply`) to allow for programmable `match` semantics +in a very similar way. ## Equivalence to Base Term-Rewriting Forms @@ -192,11 +211,12 @@ The next conceptual leap is that the extractor functions can serve as the *sole* way to access the input to the term-rewriting system if we allow them to be *externally-defined*. In other words, the system does not need to have a native data structure or abstraction for a tree of -terms/nodes. Rather, it can have *only* extractor functions that lower -to calls to external Rust functions. 
Every "match" operation in the -left-hand side of a pattern is a programmable operation that bottoms -out in "runtime" code of some sort, and the DSL compiler does not need -to have any application-specific implementation of match logic. (As an +terms/nodes at the input, with enumerable contents. Rather, it can +have *only* extractor functions that lower to calls to external Rust +functions. Every "match" operation in the left-hand side of a pattern +is a programmable operation that bottoms out in "runtime" code of some +sort, and the DSL compiler does not need to have any +application-specific implementation of match logic. (As an optimization, we allow constant integer/enum value patterns directly, but these could be custom extractor functions with no fundamental loss of expressivity.) @@ -215,29 +235,35 @@ pure "output is a tree of Rust enums" design would allow: for example, one can have a `(new-temp ty)` constructor that can be bound to a variable (see `(let ...)` form below) and used in the output tree, which in the generated Rust code becomes a call to -e.g. `LowerCtx::alloc_tmp()`. +e.g. `LowerCtx::alloc_tmp()`. More generally, this means that +constructors return values that are arbitrary, not just the +unevaluated literal term tree; e.g. `new-temp` could return a value of +type `Reg` that is actually a virtual register. So in a sense this +folds the code-editing/generating actions that the backend would +perform with the final rewritten term tree into the production of the +term tree itself. -## Virtual Internal Term Types +## Virtual Internal Terms So far above, we have only discussed matching on the input tree (via extractors) and producing the output tree (via constructors). However, if a chain of rules applies in succession, with a rule matching on a term that was produced by a prior rule application, we ideally should -allow the rules to "communicate" directly without calling out to -external constructors and extractors. Below we describe the -"single-pass" compilation strategy; for now we just note that in -addition to the above mechanisms, we allow for "virtual" terms that -are never actually constructed but only connect rules together. +allow the rules to chain directly without calling out to external +constructors and extractors. Below we describe the "single-pass" +compilation strategy; for now we just note that in addition to the +above mechanisms, we allow for "virtual" terms that are never actually +constructed but only connect rules together. # Heterogeneous Term Types -A traditional term-rewriting system has one "type", the expression -tree. However, when building instruction lowering, there are often -pieces of an instruction, or intermediate knowledge constructed about -a certain subexpression, that *should* have a more specific type. One -could imagine a series of rewrite rules that transform the address -input to a load instruction into an `(Amode ...)` term that encodes a -machine-specific addressing mode; but this is not a general +A traditional term-rewriting system has one "type": the expression +tree node. However, when performing instruction lowering, there are +often pieces of an instruction, or intermediate knowledge constructed +about a certain subexpression, that *should* have a more specific +type. 
One could imagine a series of rewrite rules that transform the +address input to a load instruction into an `(Amode ...)` term that +encodes a machine-specific addressing mode; but this is not a general instruction, and should not be interchangeable with others. At an even more basic level, a set of lowering rules from a machine-independent IR to a machine instruction set deals with two disjoint sets of node @@ -410,10 +436,10 @@ the left-hand side `pattern` is: | constant-integer | constant-enum | (enum-variant pattern*) - | variable @ pattern - | `=` variable ;; Match same value as variable bound - ;; earlier in the pattern. - | (and pattern*) ;; Match with multiple patterns. + | variable @ pattern ;; Match, and also bind this subtree. + | variable ;; Bind any subtree (first occurrence) or match + ;; same value as variable bound earlier. + | (and pattern*) ;; Match all subpatterns. | `_` ;; Wildcard. ``` @@ -431,13 +457,9 @@ and the right-hand side `expr` is: ``` The evaluation semantics are described in more detail below. Note that -while the invocations of extractors (in a pattern context) and -constructors (in an expression context), builtin enum destructuring -and building, and binding and uses of variables all behave in -relatively straightforward ways as described above, the use of terms -defined by other rules, in both patterns and in expressions, is -somewhat subtle and will be discussed further under "Single-Pass -Elaboration" below. +the use of terms defined by other rules, in both patterns and in +expressions, is somewhat subtle and will be discussed further under +"Single-Pass Elaboration" below. ## Exporting an Entry Point @@ -447,17 +469,18 @@ can start the rule-matching process. This may seem a bit odd at first: isn't the concept of an "entry point" somewhat tied to an imperative view of the world, as opposed to -a list of declarative expression-equivalence rules? The reason derives -from ISLE's evaluation process, which is somewhat different from a -vanilla "apply any applicable rule until fixpoint" approach. In -particular, the matching for any particular "root term" (at the root -of the input tree) can be fully computed as an automaton that combines -all patterns rooted with that symbol; and when we use another internal -term in an expression, we can immediately inline any rules that would -apply to *that* term. (More details are given below.) So the matching -process is a "push" rather than "pull" process: we invoke generated -matching code with a term, and it matches (via extractor calls) until -it produces a final output (via constructor calls). +a list of declarative expression-equivalence rules? The need for this +designation lies in ISLE's evaluation process, which is somewhat +different from a vanilla "apply any applicable rule until fixpoint" +approach. In particular, the matching procedure for any particular +"root term" (at the root of the input tree) can be statically built as +an automaton that combines all patterns rooted with that symbol; and +when we use another internal term in an expression, we can immediately +inline any rules that would apply to *that* term. (More details are +given below.) So the matching process is a "push" rather than "pull" +process: we invoke generated matching code with a term, and it matches +(via extractor calls) until it produces a final output (via +constructor calls). 
Because of this, the matching must be started (the initial "push") by invoking an entry point corresponding to a particular root term, with @@ -509,15 +532,16 @@ efficient compilation of the matching rules is possible, and (iii) reserve enough freedom to consumers of the DSL that rules can be flexibly used in many ways. -The overall evaluation process is driven by a toplevel call into an -entry point, defined as above, and proceeds in two phases: the -"pattern-match" phase and the "expression-build" phase. +Evaluation is initiated by a toplevel call into an entry point, +defined as above, and proceeds in two phases: the "pattern-match" +phase and the "expression-build" phase. When evaluating a pattern, we always know (i) the root term's symbol and (ii) the value (of some type in ISLE's type system) of the root term's expression. At the entry point, these are explicitly given (the -entry-point term is fixed, and the values of arguments are given). For -sub-matching, we will know these by the time we recursively match. +entry-point term is fixed, and the values of arguments are given as +arguments to the entry point). Then, as execution progresses and we +match on subterms, we have their root symbols and values as well. If the root term's symbol is `extern`, and has an extractor, we invoke the extractor with the given value, receiving the argument values in @@ -529,20 +553,18 @@ Otherwise, if the root term is defined by ISLE-internal rules, we first determine which rules are applicable: these are any with the root term as the root of their pattern. Record this as a "choice point" to which we can backtrack. Let us then arbitrarily pick one -according to some prioritization scheme. The semantics are defined -such that we try all applicable rules, in some arbitrary order, and -the first that matches "wins". If more than one would match, results -are determined only by the prioritization scheme. (Deterministic and -understandable control of this scheme is thus important to allow the -user to tweak matching behavior; see below for more.) For each +rule according to some prioritization scheme. The semantics are +defined such that we try all applicable rules, in some arbitrary +order, and the first that matches wins. If more than one would match, +results are determined only by the prioritization scheme. For each applicable rule, we try to match the sub-patterns against the known argument values, recursing as appropriate. Note that there is a way to compile this recursive-backtracking match -behavior into a set of linearized match operator sequences (where each -operator is an extractor invocation or a constant-value match), which -can then be combined into more efficient code (e.g. by use of an -automaton abstraction). See below for more. +behavior into a set of linearized match operator sequences, where each +operator is an extractor invocation or a constant-value match. This +can then be combined into more efficient code, e.g. by use of an +automaton abstraction. See below for more. When a rule's pattern (left-hand side) matches, we proceed to evaluation of the right-hand side. This occurs in the usual @@ -577,13 +599,14 @@ patterns that match the terms. Two cases are relevant: propagation" wherein we symbolically match the patterns against the constructed internal-term tree. This may statically determine which rule we invoke and whose expression we can immediately inline. 
Or,
-  the pattern-matching may "reach into" arguments that are ultimately
-  bound from the entry point's original pattern, and so imply more
-  (fallible) matching-phase execution. A key restriction is that this
-  fallible matching must not depend on the return values of any
-  constructor call, to permit the phase separation to remain.
+  multiple rules may potentially match, and so we need to attempt
+  fallible matching within an inner backtracking scope.
+
+  Note that though this inlining can result in further internal
+  backtracking, it will not cause the calling rule to fail; the rule
+  is committed once we reach the right-hand side.
 
-  An example of both cases follows:
+  An example of several cases follows:
 
   ```plain
 
@@ -623,18 +646,18 @@ patterns that match the terms. Two cases are relevant:
   respectively. In fact, this is a valid (though possibly
   inefficient) compilation strategy.
 
-  Note that there is a slight asymmetry here: when running a rule "in
-  reverse" the dispatch is actually on the term that is produced (the
-  term at the root of the expression, or right-hand side), while
-  running a rule "forward" the dispatch is on the term that is matched
-  (the term at the root of the pattern, or left-hand side). The
-  opposite side of the rule actually does not need an internal term at
-  the root, and in fact when the inlining/recursion bottoms out, it
-  must be an external extractor or constructor. Hence when defining a
-  rule, we *may* use an extractor at the root, with the consequence
-  that the rule can only be evaluated in reverse (when its RHS is
-  invoked by the pattern of another rule), such as:
-
+The above implies that defining a rule with `(rule ...)` may attach an
+"internal extractor" to an internal term, if one is produced at the
+root on the right-hand side, and/or an "internal constructor" to an
+internal term, if one is matched at the root on the left-hand side. If
+the left-hand side pattern's *root* symbol is an externally-defined
+extractor, then this means that it will only ever be evaluated in the
+reverse direction. Similarly, if the right-hand side expression's
+*root* symbol is an externally-defined constructor, then this means it
+will only ever be evaluated in the forward direction. An example of
+the former -- a rule that has an external extractor at the pattern
+root -- follows:
+
 ```plain
 
   (decl extern (Extractor ...) ...)
 
@@ -669,9 +692,11 @@ The basic strategy is to "inline" the rule-matching that would apply
 to the newly constructed value on the right-hand side of a rule. In
 other words, if a rule transforms `(A x y)` to `(B x y)` and there is
 another rule that matches on `(B x y)`, we generate code that chains
-these two transforms together. We call this inlining "elaboration", in
-analogy to a similar flattening step in other hierarchical-design DSLs
-(in particular, hardware-description languages).
+these two transforms together, effectively inlining the `(A x y)`
+extractor on the left-hand side of the latter rule. We call this
+inlining "elaboration", in analogy to a similar flattening step in
+other hierarchical-design DSLs (in particular, hardware-description
+languages).
 
 To ensure that elaboration terminates, we disallow recursion in
 rewrite rules. This implies a stratification of term constructors:
@@ -682,6 +707,12 @@ constructors and references/names: e.g., "get input in reg" does not
 immediately recursively lower, but just marks that instruction as
 used and it will be lowered later in the scan.
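+
+As a tiny illustration (hypothetical terms; assume `C` ultimately
+bottoms out in an external constructor), given the internal rules
+
+```plain
+
+  (rule (A x) (B x))
+  (rule (B x) (C x))
+```
+
+elaborating the first rule inlines the second, so the generated code
+effectively rewrites `(A x)` directly to `(C x)` in a single pass. By
+contrast, a rule such as `(rule (C x) (C (B x)))` is recursive and
+would be rejected, because its unrolling would never terminate.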
+It is an open question whether limited recursion could be allowed +either by (i) statically bounding the unrolling depth, or (ii) +requiring some sort of substructural recursion, in the same way that +some "total" functional languages used in theorem-proving (e.g. Coq) +ensure termination. + ## Linearization into Match Operators Once we have completely elaborated a rule, we lower its actions into a @@ -714,7 +745,7 @@ Any rule for an internal term may have a priority attached, as follows: (rule (prio 10) (A x) (B x)) ``` -Likewise, any extractor may hvae a priority attached, as follows: +Likewise, any extractor may have a priority attached, as follows: ```plain @@ -756,7 +787,10 @@ Second, the DSL encourages a mode of thinking about instructions (both in the IR and at the machine level, in VCode) that is more value-oriented. The `LowerCtx` API is designed to be a sort of general query API, and instruction output registers are just attributes of an -instruction like any other. In contrast, the ISLE DSL design +instruction like any other. Similarly, allocating temporary registers +and emitting instructions are just imperative actions taken by +lowering code; there is no notion of the emitted instructions being +"equal" to values on the input side. In contrast, the ISLE DSL design privileges the notion of an instruction's "value". This makes the notation more natural when expressing algebraic-style reductions, and is consistent with many other instruction-selection systems. (As a @@ -778,9 +812,10 @@ possible. This is primarily because of the "equivalent-value" semantics that are inherent in a term-rewriting system. The denotational value of a term is the symbolic or concrete value produced by the instruction it represents (depending on the -interpretation); so "all" we have to do is to write, e.g., equations -for some SMT-solver or theorem-prover that describe the semantics of -instruction terms on either side of the translation. +interpretation); so "all" we have to do is to write, e.g., +pre/post-conditions for some SMT-solver or theorem-prover that +describe the semantics of instruction terms on either side of the +translation. Note that while externally-defined extractors and constructors at first glance may appear to make this more difficult, because they @@ -805,27 +840,43 @@ tree of constructor calls that build the output instructions. (Const (val u64)) ...) - (type Reg primitive) - (type Insn primitive) - (type OwnedInsn primitive) - (type Value primitive) + (type Reg primitive) ;; virtual register number in output + (type Insn primitive) ;; instruction ID + (type OwnedInsn primitive) ;; instruction ID; type indicates we are the only user + (type Value primitive) ;; SSA value number in input (type usize primitive) ;; Extractor/constructor to go between an instruction reference and its produced value. + ;; We can use `InsnValue` as an extractor to go from `Value` arguments to the producing + ;; instructions, walking "up the operand tree" as needed to match trees of instructions. (decl InsnValue (Insn) Value) (extractor InsnValue ...) (constructor InsnValue ...) + + ;; Constructor to indicate that a value should be lowered into a register. + (decl ValueReg (Value) Reg) + (constructor ValueReg ...) + ;; Extractor to get instruction that produces a value consumed *only* by ;; the currently-lowering instruction (and nowhere else). - (decl OwnedInput (OwnedInsn) Value) - (extractor OwnedInput ...) + (decl OwnedValue (OwnedInsn) Value) + (extractor OwnedValue ...) 
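+
+    ;; For example (illustrative use), `(OwnedValue (Iload addr))` in a
+    ;; pattern matches a load whose result has no other uses, so the
+    ;; load can safely be merged into the consuming instruction -- see
+    ;; the x86 `AddMem` rule below.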
-    (decl Opcode (Insn) Opcode)
+    ;; Extractor that takes an instruction ID and provides its opcode.
+    (decl Opcode (Opcode) Insn)
     (extractor Opcode ...)
-
-    ;; TODO: build an `Add` convenience extractor with `Opcode`, `InsnValue`,
-    ;; and `InsnInput`.
 
+    ;; Convenience extractors/matchers for each defined instruction.
+    ;; These could be generated automatically in a build.rs step
+    ;; and then included in a prelude.
+    (decl Iadd (Value Value) Insn)
+    (rule (and (Opcode Iadd) (InsnInput 0 a) (InsnInput 1 b))
+          (Iadd a b))
+    (decl Iconst (u64) Insn)
+    (rule (and (Opcode Iconst) (InsnImmediate 0 imm))
+          (Iconst imm))
+    ; ...
+
     ;; --- x86 backend ---
 
     ; Note that while existing VCode instructions for x86 have explicit destination
@@ -838,6 +889,7 @@ tree of constructor calls that build the output instructions.
       (Move (a Reg))
       (Add (a Reg) (b Reg))
       (AddMem (a Reg) (b X86AMode))
+      (AddConst (a Reg) (b u32))
 
     (type X86AMode (...))
 
@@ -846,29 +898,29 @@ tree of constructor calls that build the output instructions.
-    (decl LowerAMode (Input) X86AMode)
+    (decl LowerAMode (Value) X86AMode)
 
     ; Addressing mode: const + other
-    (rule (LowerAMode (InputInsn (Add (InputInsn (Const c)) other)))
-          (BasePlusOffset (InputToReg other) c))
+    (rule (LowerAMode (InsnValue (Iadd (InsnValue (Iconst c)) other)))
+          (BasePlusOffset (ValueReg other) c))
    ; Addressing mode: fallback
    (rule (LowerAMode input)
-          (BaseReg (InputToReg input)))
+          (BaseReg (ValueReg input)))
 
-    ; Main lowering entry point: this term reduces an `Inst` as an arg to an `X86Inst`.
-    (decl (Lower Inst) X86Inst)
+    ; Main lowering entry point: this term reduces an `Inst` argument to an `X86Inst`.
+    (decl Lower (Inst) X86Inst)
 
-    ; Add with constant on one input.
-    (rule (Lower (Add (FromInst (Const c)) b) out)
-          (Emit (Move out (InputReg b)))
-          (Emit (AddConst out c)))
+    ; Add with constant on one input. `I32Value` extractor matches a `u64`
+    ; immediate that fits in an `i32` (signed 32-bit immediate field).
+    (rule (Lower (Iadd (InsnValue (I32Value (Iconst c))) b))
+          (AddConst (ValueReg b) c))
 
     ; Add with load on one input.
-    ; Note the `FromInstOnlyUse` which ensures we match only if
-    ; this is the *sole* use.
-    (rule (Lower (Add (OwnedInsn (Load addr)) b))
-          (AddMem (InputReg b) (LowerAMode addr)))
+    ; Note the `OwnedValue` extractor, which ensures we match only if
+    ; this is the *sole* use of the load's value.
+    (rule (Lower (Iadd (OwnedValue (Iload addr)) b))
+          (AddMem (ValueReg b) (LowerAMode addr)))
 
     ; Fallback for Add.
-    (rule (Lower (Add a b))
-          (Add a b))  ;; lookup of constructor name is type-dependent --
-                      ;; here we get X86Inst::Add.
+    (rule (Lower (Iadd a b))
+          (Add (ValueReg a) (ValueReg b)))  ;; lookup of constructor name is type-dependent --
+                                            ;; here we get X86Inst::Add.
 ```

From 6c64ea46b8b7d28263b6578a73dcb305b9c4de25 Mon Sep 17 00:00:00 2001
From: Chris Fallin
Date: Thu, 19 Aug 2021 12:33:48 -0700
Subject: [PATCH 4/4] Typo fix from afonso360 (thanks!).

Co-authored-by: Afonso Bordado
---
 accepted/cranelift-isel-isle-peepmatic.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accepted/cranelift-isel-isle-peepmatic.md b/accepted/cranelift-isel-isle-peepmatic.md
index 66ac653..03ca819 100644
--- a/accepted/cranelift-isel-isle-peepmatic.md
+++ b/accepted/cranelift-isel-isle-peepmatic.md
@@ -104,7 +104,7 @@ support multiple types of queries.
 
 ## Motivation
 
-For a concrete example, let us assume we a term type representing a
+For a concrete example, let us assume we have a term type representing a
 constant integer value
 
 ```plain