RFC: design of ISLE instruction-selector DSL. #15
Conversation
This is based on extensive discussions surrounding the ideas in bytecodealliance#13 and is meant to be a refinement of earlier ideas that is compatible with our goals related to verification, and is able to build on top of Peepmatic's automaton abstraction.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
cc folks who participated in #13 and/or may be interested due to involvement in Cranelift backends: @fitzgen @sunfishcode @tschneidereit @abrown @jlb6740 @akirilov-arm @afonso360 @bjorn3
Looks nice and clean.
Sorry if I've missed it, but my main confusion is over how extractor functions are defined. Is the nitty-gritty matching done in Rust, and will it just reuse the existing code?
Also, would we be able to express instructions that have multiple outputs, such as pre/post indexed memory operations?
The basic strategy is to "inline" the rule-matching that would apply to the newly constructed value on the right-hand side of a rule. In other words, if a rule transforms `(A x y)` to `(B x y)` and there is another rule that matches on `(B x y)`, we generate code that chains
So does this mean we can write generic, and machine-level, pattern reducers as part of isel then? This suggests that 'B' could be a machine node and so we can use machine nodes as inputs. If so, could this be viewed as multiple peepmatic passes: (1) IR-to-IR reductions, (2) IR-to-MachInst ISel, (3) MachInst reductions? And then would having multiple passes have benefits over inlining while the tree is still being selected? Maybe splitting it into the two later phases would ease implementation and verification. And could we do some lowering with the same method - especially stuff like expanding unsupported SIMD operations?
Yes, the DSL itself is agnostic to whatever the input and output formats are (that depends on the "bindings" one writes with external extractor and constructor definitions), so in principle one could have all of the above.
Reductions in the legalization sense, e.g. lowering a 64-bit add to two 32-bit adds (to take a simple example), should be possible as well. This has implications for the term-stratification approach to ensuring termination, so we would need to expand that a bit (the RFC notes options such as allowing substructural recursion, or maybe just "recurse during inlining up to limit k") to allow the lowering code to be generated properly. But that's absolutely a goal -- writing fallback rules like this and then composing with machine-specific rules is much more powerful than the support-all-features-on-all-machines approach we currently have (e.g. adding i128 separately on x64 and aarch64).
We could also have multiple passes; @fitzgen and I discussed this a bit yesterday actually. For now I think it makes sense to start with the single lowering pass, because we have the existing machinery for that, but once the DSL compiler exists, it should be relatively straightforward to use it to generate another pass as well.
denotational value of a term is the symbolic or concrete value produced by the instruction it represents (depending on the interpretation); so "all" we have to do is to write, e.g., pre/post-conditions for some SMT-solver or theorem-prover that
Sounds cool, do you know if someone else is doing this? Sounds like it would be a hard problem for instructions that read/write global state.
There are various research folks working on compiler verification; e.g. this paper from ASPLOS 2021. The general term is "translation validation" when it involves proving a specific input and output are equivalent, I think. I don't see the core Cranelift team diving into this full-time, but enabling it is very important, and there are academics who might be interested!
Thanks for the paper - interesting stuff! From my brief interactions with Cranelift so far, the key difference I see between Cranelift and LLVM is that neither level of IR is as well defined in Cranelift, for example:
- I don't think that the aarch64 backend actually models pstate flag setting.
- The CLIF icmp has some ugly target-defined behaviour...
So I think having some verification would be an excellent driver for improving our internal semantics.
One question I have before reading this RFC: would it be possible to have backend-agnostic rules for things like 128-bit ints, or 64-bit ints on 32-bit systems? Kind of like the old legalization system, except without having to edit the CLIF IR in place. As different architectures work differently, it may make sense to have partially or fully different rule sets for architectures with and without flags. In addition, it should be possible to override rules for specific backends if they can do it more efficiently.
rules differently: for example, we might test/fuzz a backend by applying rules in a different order. One should also be free to tweak priorities to attempt to optimize, without fear of breaking the semantics.
It might make sense to detect when multiple rules can apply to the same input, and maybe allow hints to acknowledge the overlap. These hints could then also contain the advisory priority of the rules.
Yes, this gives me an idea for a DSL-compiler mode that produces some "explainer" output -- which overlaps occurred and which prioritization decisions were made. That could possibly be useful. Also if a rule is never used (because it's overshadowed by some other higher-priority rule that completely overlaps it) that is probably worth a warning.
```plain
(rule (Iadd a b)
      (X86Add a b))
```
A suggestion for an alternative syntax to consider:
(Iadd a b) => (X86Add a b);
I'm generally hoping to stay true to S-expression form here (so the toplevel forms/clauses are `(KEYWORD ...)`), but `=>` as the keyword (`(=> (A x) (B x))`), which is what Peepmatic does, could work. Others please let us know your favorite bikeshed color below!
One more suggestion: please enforce a specific formatting of the S-expressions on CI, in rustfmt style. That is, a 1-to-1 map from the parsed AST to the resulting string -- not like e.g. clang-format, which tries to preserve newlines and only normalizes things like spaces.
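To make the "1-to-1" property concrete, here is a minimal Rust sketch (hypothetical types; a real formatter would also need indentation and line-breaking rules). The printed form depends only on the parsed AST, never on the original whitespace, so reformatting is idempotent and easy to check on CI:

```rust
// Sketch: a printer that is a pure function of the S-expression AST.
// Types and names here are illustrative, not part of any real tool.
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

fn format_sexp(s: &Sexp) -> String {
    match s {
        Sexp::Atom(a) => a.clone(),
        Sexp::List(items) => {
            let inner: Vec<String> = items.iter().map(format_sexp).collect();
            format!("({})", inner.join(" "))
        }
    }
}

fn main() {
    let rule = Sexp::List(vec![
        Sexp::Atom("rule".into()),
        Sexp::List(vec![
            Sexp::Atom("Iadd".into()),
            Sexp::Atom("a".into()),
            Sexp::Atom("b".into()),
        ]),
        Sexp::List(vec![
            Sexp::Atom("X86Add".into()),
            Sexp::Atom("a".into()),
            Sexp::Atom("b".into()),
        ]),
    ]);
    // The output is fully determined by the AST.
    assert_eq!(format_sexp(&rule), "(rule (Iadd a b) (X86Add a b))");
}
```

A CI check could then parse each DSL file, re-print it, and fail if the result differs from the file on disk.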
```plain
;; 12-bit immediate form.
(rule (iconst val)
      (when (is-aarch64-imm12-value val)) ;; Conditional predicate
```
Might be nice to infer this using a "typechecker" that takes constraints from the output instructions and propagates them to the input.
That feels a bit too magic/implicit, but what can be done instead is that constructors that build the final instructions take strongly-typed values that can only be produced by extractors that check the relevant conditions. So the plan is to have e.g. an `Aarch64Imm12` value type, and this can only represent legal values. Similar type-encoded conditions can be used for other properties.
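As a loose illustration of that kind of wrapper, here is a Rust sketch with an illustrative name and a simplified range check (not the actual Cranelift definitions): the only way to obtain the value is through the check, so any constructor that takes it can rely on the precondition.

```rust
// Sketch of a type-encoded precondition. The name and constructor are
// hypothetical; the real aarch64 immediate type is more involved.
#[derive(Clone, Copy, Debug)]
pub struct Aarch64Imm12(u32);

impl Aarch64Imm12 {
    /// Succeeds only if `val` fits in an unshifted 12-bit immediate field.
    /// (A fuller version would also accept values shifted left by 12 bits.)
    pub fn maybe_from_u64(val: u64) -> Option<Self> {
        if val < 4096 {
            Some(Aarch64Imm12(val as u32))
        } else {
            None
        }
    }

    pub fn bits(self) -> u32 {
        self.0
    }
}

fn main() {
    assert!(Aarch64Imm12::maybe_from_u64(4095).is_some());
    assert!(Aarch64Imm12::maybe_from_u64(4096).is_none());
}
```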
We may also want to use this for things that are not typecheckable, such as a processor extension predicate.
```plain
(rule (prio 20) (iadd (i8x16 a) (i8x16 b))
      (when (has-avx512))
      (avx512-add a b))

(rule (prio 10) (iadd (i8x16 a) (i8x16 b))
      (when (has-3dnow))
      (3dnow-add a b))
```
Or would these cases need a custom extractor to match them, instead of a generic `i8x16`?
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Extractors are defined in Rust, yeah, so it's sort of an FFI to "runtime library code" (from the point of view of the DSL). But they should be low-level: I expect that we'll have extractors for the
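As a rough sketch of what that "FFI" might look like from the Rust side (made-up types and names, not the actual binding being proposed): an extractor is just a Rust function that either matches and returns the extracted pieces, or fails by returning `None`, letting the generated matcher fall through to the next rule.

```rust
// Illustrative stand-ins only; the real calling convention is not pinned
// down in this thread.
struct LowerCtx;           // stand-in for the lowering context
#[derive(Clone, Copy)]
struct Value(u32);         // stand-in for a CLIF value handle

impl LowerCtx {
    /// "Extractor": succeeds only if `val` is defined by an `iconst`,
    /// returning the constant. Real code would look up the defining
    /// instruction of `val` in the function body.
    fn iconst_value(&self, _val: Value) -> Option<u64> {
        None
    }
}

fn main() {
    let ctx = LowerCtx;
    let v = Value(0);
    match ctx.iconst_value(v) {
        Some(c) => println!("matched (iconst {c})"),
        None => println!("no match here; the generated code tries the next rule"),
    }
}
```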
This is an interesting one! In the most general form it needs multi-rooted DAG matching (i.e. we need to find the load/store and the add-to-pointer separately), which needs adjustments to our lowering loop as well. I don't see why a term-rewriting system in general couldn't express that, but it needs further thought.
Indeed! The hope is that we can get back to that universe where we have fallback rules. In particular this should make e.g. fuller
I really like this! I think that this would greatly simplify backends (at least the aarch64 backend).
I'm slightly concerned about compile times. Working on LLVM and editing a `.td` file in one of the targets is not a great experience: it usually involves at least a 1-minute build on my laptop, which is really disruptive. And while I now have a much better experience with a different machine, I would hate for us to find ourselves in the same situation.
I don't know if this is going to be an issue, but we may want to think about ways to mitigate it if it does.
Note that a constructor, unlike an extractor, cannot fail. More on the execution model below.

## Defining Rules
In the aarch64 backend we have `emit_shl_i128`, which emits 9 instructions and is used 4 times while lowering other instructions. In the rotr/rotl case we end up emitting 21 instructions (1 i128 shr + 1 i128 shl + 3 instructions); it would be nice not to have to repeat the shl/shr instructions 4 times in different orders.

Would we do this via a `(shl-i128 a b)` constructor? Or can we do this as a separate rule which gets matched as well?
This feels like an edge case, not sure if it happens elsewhere.
A similar, but somewhat different, case here would be `popcnt`, where we emit a number of instructions and then match a single instruction depending on the type of the input. But here, since the total is about 6 instructions across 5 different input types, it isn't as bad.
Re-reading the RFC now, it looks like the shl case would be a good use of Virtual Internal Terms. `popcnt` might be too small for this to make sense.
Yes, exactly, that's the intent of the ability to freely (in terms of runtime cost) create these virtual terms that chain rules together -- they allow additional factoring and abstraction.
I agree this could be a problem -- we'll have to see what it looks like when we have a prototype ready (it's coming soon-ish!). There are really three aspects to this, and we could make a difference to the first two with a certain change:

1. Building the DSL metacompiler itself.
2. Running the metacompiler over the DSL source to generate Rust code.
3. Compiling the generated Rust code as part of Cranelift.

At least the first two (building the metacompiler, and invoking the metacompiler to generate Rust code from the DSL) could be removed from the Cranelift build time when the DSL code is unchanged, by checking the generated Rust code into source control. I'll note that this is not the idiomatic Rust approach at all (that would be to regenerate the code from a `build.rs` script at build time).

The edit-run debug loop depends on the last two only (presumably the DSL metacompiler will be built once and then won't change if one is hacking on a backend); we get all three only when hacking on the metacompiler itself.

Anyway, whether or not to check in the generated artifacts is a question we can discuss here as well -- thoughts?
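For concreteness, here is a hedged sketch of what the `build.rs` alternative might look like; the file path and the metacompiler call are purely illustrative (the real prototype's API may differ), and only the Cargo plumbing is real.

```rust
// Hypothetical build.rs: regenerate the lowering code from the DSL source
// at build time, rebuilding only when the .isle file changes.
use std::env;
use std::path::PathBuf;

fn main() {
    // Rebuild only when the DSL source changes.
    println!("cargo:rerun-if-changed=src/isa/x64/lower.isle");

    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let generated = out_dir.join("lower_x64_generated.rs");

    // Hypothetical metacompiler invocation, e.g.:
    // isle::compile_file("src/isa/x64/lower.isle", &generated).unwrap();
    let _ = generated;
}
```

Checking the generated code in instead would drop this step from most builds, at the cost of needing a separate regeneration step (and presumably a CI check that the checked-in code is up to date).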
(AddConst (ValueReg b) c))

; Add with load on one input.
; Note the `FromInstOnlyUse` which ensures we match only if
Seems this `FromInstOnlyUse` has disappeared below?
; Add with constant on one input. `I32Value` extractor matches a `u64`
; immediate that fits in an `i32` (signed 32-bit immediate field).
(rule (Lower (Iadd (InsnValue (I32Value (Iconst c))) b))
      (AddConst (ValueReg b) c))
From looking at this, I'm thinking that constant propagation / strength reduction (and maybe other optimizations) could be carried out within this framework as well, but those would operate on higher-level IR types (CLIF, not VCode) that aren't tied to a specific architecture. If I understand correctly, one could have two sets of declared patterns (i.e. in two files), each declaring its IR type, the set of rules, etc., and we could call the two resulting lower functions at different times during the optimization passes.
(type Inst extern (enum
  (Add (a Input) (b Input))
  (Load (addr Input))
  (Const (val u64))
It's going to be harder to inspect some characteristics of the resulting Rust enum: what is its size? Which variant is the biggest (and thus determines the whole enum's size)? Reducing the memory footprint of IR data structures has been a fruitful exercise in the past, causing reductions in the number of malloc calls and thus compile-time speedups, and I'm a bit worried that this kind of optimization opportunity, which was very easy to act on in the past, will become impractical within this framework, except for the framework author(s).

On the other hand, one could argue that any improvement to the generic framework would benefit all the backends at once (though that cuts both ways: errors/bugs would be shared too!). Still, it doesn't seem to fit the initial goal of being welcoming and simple for newcomers.
Ah, so a key thing here is that this `extern` type decl is referring to an enum that's already defined in Rust code elsewhere. Perhaps we could find a better way of doing it later, but for now the idea is to declare the view of the types that we need to generate Rust code against them, but not to generate the type definitions themselves. So during the migration, we still have the `enum Inst` in `isa/*/inst/mod.rs`, and we can still e.g. write tests against its size and play with its packing.

One could argue that it'd be better to generate both from a single source of truth, and I think that has to be the eventual answer; in that case one could still write a unit test against `std::mem::size_of::<x64::Inst>()` or whatnot, but changing it would mean editing the DSL.
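To make the size-test idea concrete, here is a small sketch with a stand-in enum (not the real `Inst` definition, and with an illustrative bound); in the backend this would naturally live in a `#[test]` against the actual `isa::x64::Inst`.

```rust
// Stand-in instruction enum purely for illustration.
#[allow(dead_code)]
enum Inst {
    Add { dst: u8, src: u8 },
    Load { dst: u8, addr: u64 },
    Const { dst: u8, val: u64 },
}

fn main() {
    let size = std::mem::size_of::<Inst>();
    println!("Inst is {size} bytes");
    // The kind of guard a backend might keep to catch accidental growth.
    assert!(size <= 24);
}
```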
A key factor in approachability IMHO (this may be controversial; please discuss!) could be to include the generated code in the git repo. Aside from implications on build-time above, it would mean that (i) the actual Rust definitions, at the stage in our evolution when they are all generated from the DSL, should be easy to find; and (ii) we can add a note to the top of the file saying "autogenerated, please see file.isle" and maybe on the actual enum a comment like "autogenerated from definition Foo in file.isle line 123".
I do think that there is an interesting philosophical debate about the definition of "welcoming and simple": building some DSL framework means that if we're careful, the "happy path" of adding a new instruction or a new lowering rule is much easier, and also means that there is less cognitive load / fewer subtle invariants to keep in mind. That's pretty much the driving factor I had in mind, at least. But I agree that it can make systemic optimizations, like "make the `Inst` enum smaller", more difficult if it means one has to step up into a meta-level, and it also puts a lot of pressure on the framework to be correct, lest we have difficult-to-find bugs. IMHO the tradeoff is worth it and does make the codebase more welcoming for the most common sort of contribution we anticipate in the future, but... please do discuss!
Greetings all! I've been churning on a little prototype of the DSL described in this RFC -- or at least something very close to it, with tweaks as needed as I discovered interesting issues in implementation -- in order to get a feel for its semantics. For your consideration, a small sketch of an instruction selector in this commit. It doesn't actually do anything yet; it's not wired up to the rest of Cranelift; but it does successfully pass through this prototype ISLE compiler and generate some Rust code. The little sketch in the commit above has some "prelude" that gives definitions to glue the DSL onto Cranelift, but the interesting bit is in
This all can be bikeshedded to fine-tune the actual types and definitions as much as we want; the point is that we have a term-rewriting system, and descriptive terms in the "extractors" (LHS terms) and "constructors" (RHS terms). To head off any quick reactions to the verbose terms like

To the best of my thinking right now, all of these rules are compatible with our main goals -- both quick (single-pass) lowering as compiled by the prototype compiler above, and a value-equivalence-based view of the lowering to aid verification and other formal efforts. (To see how, think of e.g.

Let me know what you think! I am going to context-switch away from this for a little bit but (i) I will plan to braindump a good amount about the internals / implementation details / what I've learned while prototyping the compiler-compiler, and (ii) I do intend the above prototype to be a "real" starting point (modulo some cleanups) that we can develop into the actual implementation, if this RFC is accepted.
Is the idea that the

Either way, I think that either building the coloring logic into the
Yes, exactly, it's a wrapper (or will be once we write the trait impl) around the
While I think a better way to handle instruction selection and generation is necessary, I have a few concerns about this proposal, especially around it not reaching the ideal goal of simplicity we're looking for, and I am unsure about the use of a custom DSL for this task.

Moving the debate out of the comment, I think this doesn't get us to a better place in terms of simplicity. Simplicity/ease-of-use is largely subjective anyways, and creator bias/hindsight bias make it so that a system one creates likely appears simple to their own eyes. However, even after re-reading the proposal a few times, I'm still hesitant about a few concepts that seem to be key to knowing how to do things (extractors/entrypoints/...). Introducing concepts is inevitable, of course, but if these are hard to grasp for a majority of current and future contributors, the system wouldn't be very welcoming for users, becoming a large barrier to entry. I gather from other people asking for detailed examples and supplementary blog posts explaining this new ISLE framework that I might not be the only one having uncertainties here; I don't want to read too much into this, because I think I have been the only one expressing explicit concerns of this kind so far.

In terms of complexity, I see it like this: adding a new single instruction? Probably fine, I can just copy-paste some other ISLE code, stealing bits here and there. Adding a way to find a new kind of pattern? Unclear if I could do it. Adding a whole new target ISA? Sounds scary. All these things feel straightforward to me in the existing code base, but I am myself subject to creator bias, of course :-) and navigating the code base is pretty easy thanks to the great Rust developer tools.

And that relates to the second point: using a custom DSL for this kind of work would be a clear step backwards, in my opinion. We're talking about a small programming language here, whose implementation will be tweaked, and whose "programs" (ISLE definitions) will need to be debugged, optimized, etc. Moving the old meta framework from Python to Rust was a clear win, because we got static typing, which removed a whole range of error-proneness, and a better developer environment in general: being able to share code between codegen and meta, using Rust packages in the meta crate if need be, using all the developer tools included in an IDE (rust-analyzer ❤️). Getting back to a custom DSL, we'd lose debuggers and IDE functionality, namely jump-to-definition, i.e. the ability to look at how things are implemented ("what's an extractor? oh, I could just jump to its definition, owait no, it's done in a different language at a different level, might as well just grep around in the code base, hoping that I'll find where/how it's done"). This is not intractable of course, and all these tools could be reimplemented, but that might be a waste of resources. Any of Rust/Python/TypeScript would be a better fit at this point, since they now all have good development ecosystems (with a clear preference for Rust, of course!).

Please bear with me here: I am fully in favor of a change in the way we're doing instruction selection, understanding all the benefits, but I am not convinced the proposal in its current state would solve our complexity issues.
@bnjbvr thank you for taking the time to detail your concerns here!

First, at a high level: I'm glad that we agree that some sort of DSL or systematized approach is necessary. The recent string of SIMD-related fuzzbugs, and other difficulties we've had in the past in communicating some of the subtle requirements and invariants in the current handwritten backends (e.g., regalloc metadata, when it is safe to use registers, when it is safe to sink an instruction, and some of the subtle bugs that have resulted when we get this wrong), have convinced me pretty solidly that we need to automate; a handwritten approach, while tempting below a certain level of intrinsic complexity (number of lowering cases / instructions), is just too error-prone. There are also the things that we just can't do today, that a systematized DSL would enable: large-scale refactors that alter how the lowering works (e.g., switching to a new regalloc API, or a different emission strategy), combining multiple layers of rewrite rules or manipulating/processing them in some other way (e.g. Peepmatic-style generated simplification patterns), and consuming the lowering patterns for other uses, such as verification. So, in my mind, we need a DSL; the question is just whether this one is the right one.

If I'm reading your main concerns right, I think they distill down to these two -- please correct me if I mis-state anything:

1. The mental model the DSL introduces (extractors, entrypoints, single-pass lowering) may be too complex or hard to grasp for current and future contributors.
2. Replacing plain Rust with a custom DSL loses discoverability and developer tooling: debuggers, IDE support, and jump-to-definition in particular.
Both of these are real issues and I think that weighing these against the benefits is largely a subjective question. But I want to dive in a bit on each.

Complexity of mental model: it might be useful to start from a pure term-rewriting system as a neutral, fairly mainstream sort of approach (see e.g. the Go compiler's DSL, or tablegen's expression trees). In your view, is this sort of approach reasonably approachable, or does it suffer from some of the same issues? In other words, I'm trying to distill whether some of the new concepts in ISLE specifically (extractors, single-pass lowering) are the issue, or instead if the issue is just thinking in terms of layered rewrite rules. If the former, I think we can definitely try to refine the terminology, or give better examples. If the idea of a programmable extractor in particular is ultimately too weird or objectionable, I can imagine some alternative ways of getting the same thing -- maybe we have constructors on the right-hand side instead that can fail to match. If the latter, and all term-rewriting systems raise concerns, then I think we're at more of a philosophical impasse, since that way of representing instruction lowering is so powerful for all the other things we want to do, but I'm still happy to discuss more :-)

Discoverability: this one is a bit harder, and I agree that pure Rust code is the gold standard w.r.t. discoverable definitions and autocompletions. I do want to note re: your listed advantages above that ISLE, too, is strongly statically typed, unlike a lot of other past term-rewriting systems; so it will at least catch type errors, if not proactively help avoid them in an IDE. I'm curious, though, if you have better ideas to address it -- an EDSL in Rust somehow? If so, I'm not sure how we'd be able to do the transforms we want to do -- if you squint enough, the ISLE compiler is sort of doing a "transpose", in that it's weaving together a bunch of different lowering-pattern rules by their common prefixes. It's thus a kind of global transform, and would probably need a Rust compiler plugin, rather than just e.g. Rust macros that build data structures, as the cranelift-codegen-meta DSL does. I'm very open to suggestions here.

Anyway, that's my braindump for now; hopefully we can come to a better consensus here, or at least distill out the subjective decision points that underlie the space!
Hi @bnjbvr :) First off, I want to recognize that your concerns are valid, particularly around IDEs and developer tooling. I think that is just a fundamental trade-off of using a DSL vs. open coding, and something we need to explicitly acknowledge and accept if we are going to go down the DSL route.

Aside: embedded (within the "main" language) DSLs (EDSLs, as @cfallin mentioned; apologies if you are already familiar with this terminology) can sometimes have their cake and eat it too with regards to tooling vs. DSL. In Rust I think that would probably look like some kind of macro sugar language, because I don't think Rust's regular abstraction capabilities would allow us to express the kinds of transformations and combinations of rules that we want to express. However, (a) my experience has been that

All that said, as someone who hasn't contributed to the new backends, I personally find ISLE more inviting than the open-coded backends. Many grains of salt here, since I did some research into Go's/GCC's/etc. DSLs for instruction selection and peephole optimizations when designing and implementing Peepmatic, so I've already made myself familiar with this term-rewriting paradigm. It was also helpful to step through a few very small examples with @cfallin the other day. We started with just the rule to lower an arbitrary
Thanks. As a note, I won't be able to attend this particular meeting because of a conflict.
Just coordinated briefly with @bnjbvr and have now added an agenda item for the next (Mon Oct 4) Cranelift meeting -- anyone interested in continuing the discussion please feel free to join!
FYI, I have added a tutorial and implementation overview to ISLE's
Hi all -- thanks for the continuing input and patience on this long-running effort! @fitzgen gave an excellent update on his progress driving forward the prototype Cranelift integration for ISLE in the Cranelift biweekly today, and example implementations of several instruction lowerings. The general consensus from the meeting was that there are no strong objections to moving forward with this. So, with that in mind, I would like to make a...

Motion to finalize with a disposition to merge

Stakeholders sign-off:

- Arm
- DFINITY
- Embark Studios
- Fastly
- Google/Envoy
- Intel
- Microsoft
- Mozilla
- IBM
- wasmCloud
- Unaffiliated
Fwiw, here are the slides for the update I gave at today's cranelift meeting: https://docs.google.com/presentation/d/1b6psjaIKkVTZGaGzMcw9h_Ahgih-ctUcnz0Dbjn11GI/edit
(sorry @alexcrichton, just noticed your name was missing -- I had copied the checkbox list from your latest RFC's motion-to-finalize comment. added now!)
no objections
Now that we have signoffs from two separate groups, as per the RFC process, we are moving into the Final Comment Period. The FCP will end on Monday, November 15.
This is OK with me.
Hi all -- now that the Final Comment Period has elapsed and there are no pending objections, it is time to merge this RFC! The next step will be review and landing of the initial code itself. I plan to put together some language-reference-like documentation for the DSL itself, and I'm happy to help answer questions and onboard folks in general as we transition the codegen backends to this new approach!
Rendered
This RFC proposes a new DSL for the instruction selectors in Cranelift,
automating the generation of the backends based on a list of term-rewriting
rules provided by the backend author.