Since the RFC to use this DSL in Cranelift was accepted, we have merged the ISLE compiler into the Wasmtime/Cranelift repository. This repository's commit history was incorporated as part of that PR. It is now a read-only archive. Thank you for visiting and please send questions, comments, concerns, and suggestions for bikeshed colors 👉over that way👉!
ISLE is a DSL that allows one to write instruction-lowering rules for a compiler backend. It is based on a "term-rewriting" paradigm in which the input -- some sort of compiler IR -- is, conceptually, a tree of terms, and we have a set of rewrite rules that turn this into another tree of terms.
This repository contains a prototype meta-compiler that compiles ISLE rules down to an instruction selector implementation in generated Rust code. The generated code operates efficiently in a single pass over the input, and merges all rules into a decision tree, sharing work where possible, while respecting user-configurable priorities on each rule.
The ISLE language is designed so that the rules can both be compiled into an efficient compiler backend and can be used in formal reasoning about the compiler. The compiler in this repository implements the former. The latter use-case is future work and outside the scope of this prototype, but at a high level, the rules can be seen as simple equivalences between values in two languages, and so should be translatable to formal constraints or other logical specification languages.
Some more details and motivation are in BA RFC #15; additional documentation will eventually be added to carefully specify the language semantics.
Build islec
, the ISLE compiler:
$ cargo build --release
Compile a .isle
source file into Rust code:
$ target/release/islec -i isle_examples/test.isle -o isle_examples/test.rs
Include that Rust code in your crate and compile it:
$ rustc isle_examples/test_main.rs
This tutorial walks through defining an instruction selection and lowering pass for a simple, RISC-y, high-level IR down to low-level, CISC-y machine instructions. It is intentionally somewhat similar to CLIF to MachInst lowering, although it restricts the input and output languages to only adds, loads, and constants so that we can focus on ISLE itself.
The full ISLE source code for this tutorial is available at
isle_examples/tutorial.isle
.
The ISLE language is based around rules for translating a term (i.e. expression) into another term. Terms are typed, so before we can write rules for translating some type of term into another type of term, we have to define those types:
;; Declare that we are using the `i32` primitive type from Rust.
(type i32 (primitive i32))
;; Our high-level, RISC-y input IR.
(type HighLevelInst
(enum (Add (a Value) (b Value))
(Load (addr Value))
(Const (c i32))))
;; A value in our high-level IR is a Rust `Copy` type. Values are either defined
;; by an instruction, or are a basic block argument.
(type Value (primitive Value))
;; Our low-level, CISC-y machine instructions.
(type LowLevelInst
(enum (Add (mode AddrMode))
(Load (offset i32) (addr Reg))
(Const (c i32))))
;; Different kinds of addressing modes for operands to our low-level machine
;; instructions.
(type AddrMode
(enum
;; Both operands in registers.
(RegReg (a Reg) (b Reg))
;; The destination/first operand is a register; the second operand is in
;; memory at `[b + offset]`.
(RegMem (a Reg) (b Reg) (offset i32))
;; The destination/first operand is a register, second operand is an
;; immediate.
(RegImm (a Reg) (imm i32))))
;; The register type is a Rust `Copy` type.
(type Reg (primitive Reg))
Now we can start writing some basic lowering rules! We declare the top-level
lowering function (a "constructor term" in ISLE terminology) and attach rules to
it. The simplest case is matching a high-level Const
instruction and lowering
that to a low-level Const
instruction, since there isn't any translation we
really have to do.
;; Declare our top-level lowering function. We will attach rules to this
;; declaration for lowering various patterns of `HighLevelInst` inputs.
(decl lower (HighLevelInst) LowLevelInst)
;; Simple rule for lowering constants.
(rule (lower (HighLevelInst.Const c))
(LowLevelInst.Const c))
Each rule has the form (rule <left-hand side> <right-hand-side>)
. The
left-hand side (LHS) is a pattern and the right-hand side (RHS) is an
expression. When the LHS pattern matches the input, then we evaluate the RHS
expression. The LHS pattern can bind variables from the input that are then
available in the right-hand side. For example, in our Const
-lowering rule, the
variable c
is bound from the LHS and then reused in the RHS.
Now we can compile this code by running
$ islec isle_examples/tutorial.isle
and we'll get the following output (ignoring any minor code generation changes in the future):
// GENERATED BY ISLE. DO NOT EDIT!
//
// Generated automatically from the instruction-selection DSL code in:
// - isle_examples/tutorial.isle
// [Type and `Context` definitions removed for brevity...]
// Generated as internal constructor for term lower.
pub fn constructor_lower<C: Context>(ctx: &mut C, arg0: &HighLevelInst) -> Option<LowLevelInst> {
let pattern0_0 = arg0;
if let &HighLevelInst::Const { c: pattern1_0 } = pattern0_0 {
// Rule at isle_examples/tutorial.isle line 45.
let expr0_0 = LowLevelInst::Const {
c: pattern1_0,
};
return Some(expr0_0);
}
return None;
}
There are a few things to notice about this generated Rust code:
-
The
lower
constructor term becomes theconstructor_lower
function in the generated code. -
The function returns a value of type
Option<LowLevelInst>
and returnsNone
when it doesn't know how to lower an inputHighLevelInst
. This is useful for incrementally porting hand-written lowering code to ISLE. -
There is a helpful comment documenting where in the ISLE source code a rule was defined. The goal is to ISLE more transparent and less magical.
-
The code is parameterized by a type that implements a
Context
trait. Implementing this trait is how you glue the generated code into your compiler. Right now this is an empty trait; more onContext
later. -
Lastly, and most importantly, this generated Rust code is basically what we would have written by hand to do the same thing, other than things like variable names. It checks if the input is a
Const
, and if so, translates it into aLowLevelInst::Const
.
Okay, one rule isn't very impressive, but in order to start writing more rules
we need to be able to put the result of a lowered instruction into a Reg
. This
might internally have to do arbitrary things like update use counts or anything
else that Cranelift's existing LowerCtx::put_input_in_reg
does for different
target architectures. To allow for plugging in this kind of arbitrary logic,
ISLE supports external constructors. These end up as methods of the Context
trait in the generated Rust code, and you can implement them however you want
with custom Rust code.
Here is how we declare an external helper to put a value into a register:
;; Declare an external constructor that puts a high-level `Value` into a
;; low-level `Reg`.
(decl put_in_reg (Value) Reg)
(extern constructor put_in_reg put_in_reg)
If we rerun islec
on our ISLE source, instead of an empty Context
trait, now
we will get this trait definition:
pub trait Context {
fn put_in_reg(&mut self, arg0: Value) -> (Reg,);
}
With the put_in_reg
helper available, we can define rules for lowering loads
and adds:
;; Simple rule for lowering adds.
(rule (lower (HighLevelInst.Add a b))
(LowLevelInst.Add
(AddrMode.RegReg (put_in_reg a) (put_in_reg b))))
;; Simple rule for lowering loads.
(rule (lower (HighLevelInst.Load addr))
(LowLevelInst.Load 0 (put_in_reg addr)))
If we compile our ISLE source into Rust code once again, the generated code for
lower
now looks like this:
// Generated as internal constructor for term lower.
pub fn constructor_lower<C: Context>(ctx: &mut C, arg0: &HighLevelInst) -> Option<LowLevelInst> {
let pattern0_0 = arg0;
match pattern0_0 {
&HighLevelInst::Const { c: pattern1_0 } => {
// Rule at isle_examples/tutorial.isle line 45.
let expr0_0 = LowLevelInst::Const {
c: pattern1_0,
};
return Some(expr0_0);
}
&HighLevelInst::Load { addr: pattern1_0 } => {
// Rule at isle_examples/tutorial.isle line 59.
let expr0_0: i32 = 0;
let expr1_0 = C::put_in_reg(ctx, pattern1_0);
let expr2_0 = LowLevelInst::Load {
offset: expr0_0,
addr: expr1_0,
};
return Some(expr2_0);
}
&HighLevelInst::Add { a: pattern1_0, b: pattern1_1 } => {
// Rule at isle_examples/tutorial.isle line 54.
let expr0_0 = C::put_in_reg(ctx, pattern1_0);
let expr1_0 = C::put_in_reg(ctx, pattern1_1);
let expr2_0 = AddrMode::RegReg {
a: expr0_0,
b: expr1_0,
};
let expr3_0 = LowLevelInst::Add {
mode: expr2_0,
};
return Some(expr3_0);
}
_ => {}
}
return None;
}
As you can see, each of our rules was collapsed into a single, efficient match
expression. Just like we would have otherwise written by hand. And wherever we
need to get a high-level operand as a low-level register, there is a call to the
Context::put_in_reg
trait method, allowing us to hook whatever arbitrary logic
we need to when putting a value into a register when we implement the Context
trait.
Things start to get more interesting when we want to do things like sink a load into the add's addressing mode. This is only desirable when our add is the only use of the loaded value. Furthermore, it is only valid to do when there isn't any store that might write to the same address we are loading from in between the load and the add. Otherwise, moving the load across the store could result in a miscompilation where we load the wrong value to add:
x = load addr
store 42 -> addr
y = add x, 1
==/==>
store 42 -> addr
x = load addr
y = add x, 1
We can encode these kinds of preconditions in an external extractor. An extractor is like our regular constructor functions, but it is used inside LHS patterns, rather than RHS expressions, and its arguments and results flipped around: instead of taking arguments and producing results, it takes a result and (fallibly) produces the arguments. This allows us to write custom preconditions for matching code.
Let's make this more clear with a concrete example. Here is the declaration of
an external extractor to match on the high-level instruction that defined a
given operand Value
, along with a new rule to sink loads into adds:
;; Declare an external extractor for extracting the instruction that defined a
;; given operand value.
(decl inst_result (HighLevelInst) Value)
(extern extractor inst_result inst_result)
;; Rule to sink loads into adds.
(rule (lower (HighLevelInst.Add a (inst_result (HighLevelInst.Load addr))))
(LowLevelInst.Add
(AddrMode.RegMem (put_in_reg a)
(put_in_reg addr)
0)))
Note that the operand Value
passed into this extractor might be a basic block
parameter, in which case there is no such instruction. Or there might be a store
or function call instruction in between the current instruction and the
instruction that defines the given operand value, in which case we want to
"hide" the instruction so that we don't illegally sink loads into adds they
shouldn't be sunk into. So this extractor might fail to return an instruction
for a given operand Value
.
If we recompile our ISLE source into Rust code once again, we see a new
inst_result
method defined on our Context
trait, we notice that its
arguments and returns are flipped around from the decl
in the ISLE source
because it is an extractor, and finally that it returns an Option
because it
isn't guaranteed that we can extract a defining instruction for the given
operand Value
:
pub trait Context {
fn put_in_reg(&mut self, arg0: Value) -> (Reg,);
fn inst_result(&mut self, arg0: Value) -> Option<(HighLevelInst,)>;
}
And if we look at the generated code for our lower
function, there is a new,
nested case for sinking loads into adds that uses the Context::inst_result
trait method to see if our new rule can be applied:
// Generated as internal constructor for term lower.
pub fn constructor_lower<C: Context>(ctx: &mut C, arg0: &HighLevelInst) -> Option<LowLevelInst> {
let pattern0_0 = arg0;
match pattern0_0 {
&HighLevelInst::Const { c: pattern1_0 } => {
// [...]
}
&HighLevelInst::Load { addr: pattern1_0 } => {
// [...]
}
&HighLevelInst::Add { a: pattern1_0, b: pattern1_1 } => {
if let Some((pattern2_0,)) = C::inst_result(ctx, pattern1_1) {
if let &HighLevelInst::Load { addr: pattern3_0 } = &pattern2_0 {
// Rule at isle_examples/tutorial.isle line 68.
let expr0_0 = C::put_in_reg(ctx, pattern1_0);
let expr1_0 = C::put_in_reg(ctx, pattern3_0);
let expr2_0: i32 = 0;
let expr3_0 = AddrMode::RegMem {
a: expr0_0,
b: expr1_0,
offset: expr2_0,
};
let expr4_0 = LowLevelInst::Add {
mode: expr3_0,
};
return Some(expr4_0);
}
}
// Rule at isle_examples/tutorial.isle line 54.
let expr0_0 = C::put_in_reg(ctx, pattern1_0);
let expr1_0 = C::put_in_reg(ctx, pattern1_1);
let expr2_0 = AddrMode::RegReg {
a: expr0_0,
b: expr1_0,
};
let expr3_0 = LowLevelInst::Add {
mode: expr2_0,
};
return Some(expr3_0);
}
_ => {}
}
return None;
}
Once again, this is pretty much the code you would have otherwise written by hand to sink the load into the add.
At this point we can start defining a whole bunch of even-more-complicated lowering rules that do things like take advantage of folding static offsets into loads into adds:
;; Rule to sink a load of a base address with a static offset into a single add.
(rule (lower (HighLevelInst.Add
a
(inst_result (HighLevelInst.Load
(inst_result (HighLevelInst.Add
base
(inst_result (HighLevelInst.Const offset))))))))
(LowLevelInst.Add
(AddrMode.RegMem (put_in_reg a)
(put_in_reg base)
offset)))
;; Rule for sinking an immediate into an add.
(rule (lower (HighLevelInst.Add a (inst_result (HighLevelInst.Const c))))
(LowLevelInst.Add
(AddrMode.RegImm (put_in_reg a) c)))
;; Rule for lowering loads of a base address with a static offset.
(rule (lower (HighLevelInst.Load
(inst_result (HighLevelInst.Add
base
(inst_result (HighLevelInst.Const offset))))))
(LowLevelInst.Load offset (put_in_reg base)))
I'm not going to show the generated Rust code for these new rules here because
it is starting to get a bit too big. But you can compile
isle_examples/tutorial.isle
and verify yourself that it generates the code you
expect it to.
In conclusion, adding new lowering rules is easy with ISLE. And you still get
that efficient, compact tree of match
expressions in the generated Rust code
that you would otherwise write by hand.
This is an overview of islec
's passes and data structures:
+------------------+
| ISLE Source Text |
+------------------+
|
| Lex
V
+--------+
| Tokens |
+--------+
|
| Parse
V
+----------------------+
| Abstract Syntax Tree |
+----------------------+
|
| Semantic Analysis
V
+----------------------------+
| Term and Type Environments |
+----------------------------+
|
| Trie Construction
V
+-----------+
| Term Trie |
+-----------+
|
| Code Generation
V
+------------------+
| Rust Source Code |
+------------------+
Lexing breaks up the input ISLE source text into a stream of tokens. Our lexer is pull-based, meaning that we don't eagerly construct the full stream of tokens. Instead, we wait until the next token is requested, at which point we lazily lex it.
Relevant source files:
isle/src/lexer.rs
Parsing translates the stream of tokens into an abstract syntax tree (AST). Our parser is a simple, hand-written, recursive-descent parser.
Relevant source files:
isle/src/ast.rs
isle/src/parser.rs
Semantic analysis performs type checking, figures out which rules apply to which terms, etc. It creates a type environment and a term environment that we can use to get information about our terms throughout the rest of the pipeline.
Relevant source files:
isle/src/sema.rs
The trie construction phase linearizes each rule's LHS pattern and inserts them into a trie that maps LHS patterns to RHS expressions. This trie is the skeleton of the decision tree that will be emitted during code generation.
Relevant source files:
isle/src/ir.rs
isle/src/trie.rs
Code generation takes in the term trie and emits Rust source code that implements it.
Relevant source files:
isle/src/codegen.rs
Please see this Cranelift branch for an ongoing sketch of an instruction selector backend in Cranelift that uses ISLE.