Pre-RFC: Instruction Selector DSL for Cranelift. #13
Now that we have established the need for some sort of DSL, let's
examine what requirements are imposed by the problem at hand.

## Requirement 1: First-Class Destructuring/Matching
Removing `InstructionFormat` in favor of a big enum with a single variant for each instruction would already help with this, even without a DSL, by making it easy to match on said enum instead of on the opcode, which allows binding the fields.
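To make the suggestion concrete, here is a minimal Rust sketch of matching on a per-instruction enum. The `Inst`, `Value`, and `MachInst` types and all variant and field names are hypothetical stand-ins, not Cranelift's actual definitions; the point is only that each `match` arm binds the operands directly.

```rust
/// Hypothetical SSA value reference (not Cranelift's real `Value` type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value(pub u32);

/// Hypothetical "big enum": one variant per instruction, operands as named fields.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Inst {
    Iadd { x: Value, y: Value },
    IaddImm { x: Value, imm: i64 },
    Iconst { imm: i64 },
    Load { addr: Value, offset: i32 },
}

/// Hypothetical machine-level output, used only for this example.
#[derive(Debug, PartialEq, Eq)]
pub enum MachInst {
    AddRR { rn: Value, rm: Value },
    AddRI { rn: Value, imm: i64 },
    MovImm { imm: i64 },
    LoadRI { rn: Value, offset: i32 },
}

/// Each arm binds the instruction's fields directly, instead of switching on an
/// opcode and then fetching operands through a separate format accessor.
pub fn lower(inst: &Inst) -> MachInst {
    match *inst {
        Inst::Iadd { x, y } => MachInst::AddRR { rn: x, rm: y },
        Inst::IaddImm { x, imm } => MachInst::AddRI { rn: x, imm },
        Inst::Iconst { imm } => MachInst::MovImm { imm },
        Inst::Load { addr, offset } => MachInst::LoadRI { rn: addr, offset },
    }
}
```

Because the compiler checks the `match` for exhaustiveness, adding a new instruction variant would also flag every lowering that does not yet handle it.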
Possibly, but that's probably an orthogonal change to consider; or rather, it would make more sense to think about concrete quality-of-life refactors like this if we decide not to do a DSL, since whether to do a DSL is the high-order bit for developer experience.
A note to frame things a bit: this pre-RFC contains what I think are a reasonable set of requirements to consider, but the requirements themselves are very much up for discussion, and I'm interested to hear what others think about their relative importance, or if this is missing any important requirements. The discussion questions in the last part are also what I believe to be relevant design axes to talk about, but there may be others as well; please discuss!

This is reminiscent of "unsafe" code in Rust: it allows one to build
axiomatic building blocks with flexibility, but it requires one to
carefully define *what* the building blocks do.
I think that even after the migration is complete, it should be possible to write arbitrary code for the output generation, but not necessarily for the input pattern matching. This would allow complex output code to handle, for example, turning divisions by constants into multiplications or shifts, while still preserving the analyzability and optimizability of the patterns if desired.
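As a minimal sketch of that split, assuming a hypothetical `lower_udiv_by_const` helper and `UdivLowering` result type (neither is an existing Cranelift API), the input pattern stays "unsigned divide by a constant" while ordinary Rust code decides what to emit. Only the power-of-two case is shown; a fuller version would likely also produce a multiply-by-magic-number sequence for other divisors.

```rust
/// Hypothetical lowering choice for `x / divisor` with unsigned `x`.
#[derive(Debug, PartialEq, Eq)]
pub enum UdivLowering {
    /// Division by 1 is the identity.
    Identity,
    /// For unsigned `x`, `x / 2^k == x >> k`.
    ShiftRight(u32),
    /// Fall back to a real divide instruction (a fuller sketch might emit a
    /// multiply-high by a magic constant here instead).
    Divide(u64),
}

/// Arbitrary Rust code on the output side of a rule: the matched pattern is
/// simple, and the replacement sequence is computed here.
pub fn lower_udiv_by_const(divisor: u64) -> Option<UdivLowering> {
    match divisor {
        // Division by zero traps; leave it to the generic lowering path.
        0 => None,
        1 => Some(UdivLowering::Identity),
        d if d.is_power_of_two() => Some(UdivLowering::ShiftRight(d.trailing_zeros())),
        d => Some(UdivLowering::Divide(d)),
    }
}
```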
Right, there are a number of cases where "little building block of arbitrary logic" is useful. Creation of immediate operands is another good example: aarch64 has a very interesting logical-immediate format that can support only some values, with a complex algorithm to derive it. We'd want to make that a "primitive" in some sense by calling out to the existing implementation.
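For illustration, the encodability test behind that format can be written as a small standalone function. This is a sketch, not the existing Cranelift implementation, and the name `is_logical_imm64` is hypothetical. A 64-bit value qualifies when it is a replication of a 2-, 4-, 8-, 16-, 32-, or 64-bit element whose set bits form a single (possibly wrapped-around) contiguous run, excluding all-zeros and all-ones. It only answers yes or no; deriving the actual `N:immr:imms` fields is left out.

```rust
/// Hypothetical helper: can `value` be encoded as an AArch64 logical
/// ("bitmask") immediate for a 64-bit AND/ORR/EOR?
pub fn is_logical_imm64(value: u64) -> bool {
    // All-zeros and all-ones are never encodable.
    if value == 0 || value == u64::MAX {
        return false;
    }
    // Shrink the element size while the value is a replication of a smaller pattern.
    let mut size: u32 = 64;
    while size > 2 {
        let half = size / 2;
        let mask = (1u64 << half) - 1;
        if (value & mask) != ((value >> half) & mask) {
            break;
        }
        size = half;
    }
    let mask = if size == 64 { u64::MAX } else { (1u64 << size) - 1 };
    let elem = value & mask;
    // Rotate the element right by one bit within `size` bits.
    let rotated = ((elem >> 1) | ((elem & 1) << (size - 1))) & mask;
    // Each bit of `elem ^ rotated` marks a cyclic 0/1 boundary; a single run of
    // ones (wrapped or not) has exactly two such boundaries.
    (elem ^ rotated).count_ones() == 2
}
```

A lowering rule would then call such a primitive as an opaque building block rather than expressing the bit manipulation in the DSL itself.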
Looks great, thanks @cfallin!
expressed in some sort of DSL-internal type system, so that we do not
have to hardcode lowering rules into the DSL design itself.

## Requirement 6: Helps to Advance Verification Efforts
Related to verification efforts: I think we should call out superoptimization as something we would like to support for our isel. We should be able to do the superoptimization offline, based on CLIF patterns that we've harvested from real world programs, and then take the learned CLIF->vcode pairs and dump them into our new DSL. We should be able to effectively fold the preopt pass (and many more peepholes!) into isel lowering.
This would give us:
- better compilation throughput, because we have fewer passes over the CLIF,
- correct-by-construction CLIF->vcode patterns, and
- optimal (according to some cost function) code generation for these CLIF->vcode patterns.
That's a good point, for sure!
I would hope that the format in which we express our basic lowering patterns is general enough to support arbitrary superoptimizer-derived patterns (i.e., if this is not even technically possible then something is very wrong), so in practice it seems this boils down to, I think:
- We should have or build a translator/bridge from a superoptimizer format into the lowering DSL (like `peepmatic-souper`).
- We should ensure that the whole infrastructure supports the "enormous pile of complex rules" case efficiently.
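One common way to keep a very large rule set cheap to evaluate is to index rules by the opcode at the root of their pattern, so that only the relevant rules are tried. The sketch below assumes hypothetical `Opcode`, `Rule`, and `RuleIndex` types, with a single immediate operand as the guard input; it is not how any particular DSL compiler works, and a real one might instead compile all patterns into a shared decision tree.

```rust
use std::collections::HashMap;

/// Hypothetical root opcodes for patterns.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Opcode {
    Iadd,
    Isub,
    Imul,
}

/// Hypothetical rule: a root opcode, a priority, and a guard on an immediate.
pub struct Rule {
    pub root: Opcode,
    pub priority: i32,
    pub name: &'static str,
    pub guard: fn(i64) -> bool,
}

/// Rules bucketed by root opcode, each bucket sorted by descending priority.
pub struct RuleIndex {
    by_root: HashMap<Opcode, Vec<Rule>>,
}

impl RuleIndex {
    pub fn new(rules: Vec<Rule>) -> Self {
        let mut by_root: HashMap<Opcode, Vec<Rule>> = HashMap::new();
        for rule in rules {
            by_root.entry(rule.root).or_default().push(rule);
        }
        for bucket in by_root.values_mut() {
            bucket.sort_by(|a, b| b.priority.cmp(&a.priority));
        }
        RuleIndex { by_root }
    }

    /// Only the rules rooted at `op` are examined; the first passing guard wins.
    pub fn first_match(&self, op: Opcode, imm: i64) -> Option<&'static str> {
        self.by_root
            .get(&op)?
            .iter()
            .find(|rule| (rule.guard)(imm))
            .map(|rule| rule.name)
    }
}
```

With this layout, adding thousands of rules for other opcodes does not slow down matching of a given instruction beyond its own bucket.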
Exactly, well said :)
My gut feeling is that using an existing DSL (peepmatic), and having a layer that tries the generated patterns first, falling back to the existing matchers, would provide the lowest barrier to entry for users as well as keeping the build system simple. Any finger-in-the-air guesses on how compilation time would be affected compared with the simple match strategy used currently? I also wonder whether a DSL for just isel would be a big enough carrot for getting people to port over... The amount of C++ in LLVM used for isel, even though TableGen has been there for as long as I know, should not be underestimated :) So I feel at least half of TableGen's value is the ease with which it enables the encodings to be described, and it would be great if we had something like that too.
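The layering described here could look roughly like the following sketch. `generated_lower` stands in for code emitted from DSL patterns and `handwritten_lower` for the existing open-coded matchers; both functions, and the `Op`/`Lowered` types, are hypothetical.

```rust
/// Hypothetical IR opcode, standing in for CLIF.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Iadd,
    Imul,
    Udiv,
}

/// Hypothetical lowering result that records which layer produced it.
#[derive(Debug, PartialEq, Eq)]
pub enum Lowered {
    FromGeneratedRule(&'static str),
    FromHandWritten(&'static str),
}

/// Stand-in for DSL-generated code: returns `None` when no pattern matches.
fn generated_lower(op: Op) -> Option<Lowered> {
    match op {
        Op::Iadd => Some(Lowered::FromGeneratedRule("add")),
        _ => None,
    }
}

/// Stand-in for the existing open-coded matcher, which covers every opcode.
fn handwritten_lower(op: Op) -> Lowered {
    match op {
        Op::Iadd => Lowered::FromHandWritten("add"),
        Op::Imul => Lowered::FromHandWritten("mul"),
        Op::Udiv => Lowered::FromHandWritten("udiv"),
    }
}

/// Try the generated patterns first, then fall back to the hand-written lowering.
pub fn lower(op: Op) -> Lowered {
    generated_lower(op).unwrap_or_else(|| handwritten_lower(op))
}
```

The fallback keeps every opcode covered while rules migrate, at the cost of one extra failed match attempt for opcodes the generated layer does not yet handle.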
This is the "horizontal" integration in the pre-RFC, and I am inclined to agree, unless the vertical integration happens to fall out of the DSL's compilation/execution model "for free".
I wouldn't expect any significant slowdowns, assuming that the DSL also compiles down to Rust code similar to what you would otherwise have written by hand (e.g. a
@sparker-arm the goal is certainly to generate code equivalent to what we have today, so ideally we have zero slowdown in the Cranelift compile time, and in the future, possibly improvements that are enabled by more centralized control of the backend code's idioms (i.e., right now if we come up with a new way of matching, we have to modify all of the open-coded use sites; but if we generate this from patterns then we can transition instantly). The latter is especially interesting to me as it will let us eventually move to the native SSA-based API of regalloc2, which should give some speedups. I've got a reasonable design down on paper now and am working on refining the writeup before posting the RFC -- hope to have it up in the next few days :-)
I will go ahead and close this pre-RFC, as I think it has served its purpose well in starting discussions and getting early feedback on ideas that have gone into a now more fully-formed RFC, #15. Thanks all for the input and please do give any thoughts you might have on the new RFC!
## Summary
This pre-RFC aims to describe the case for developing or adopting a DSL to write instruction selection/lowering rules in Cranelift backends, and to introduce discussion points.
The goal is not (yet) to design a concrete DSL and start work. Rather, the goal here is to collect requirements, discuss different design choices and how they might work in our context, and generally to see what the community thinks about this and what folks might prefer.