Pre-RFC: Instruction Selector DSL for Cranelift. #13

Closed
wants to merge 3 commits into from

cfallin
Member

@cfallin cfallin commented Aug 5, 2021

Rendered

Summary

This pre-RFC aims to describe the case for developing or adopting a DSL to write instruction selection/lowering rules in Cranelift backends, and to introduce discussion points.

The goal is not (yet) to design a concrete DSL and start work. Rather, the goal here is to collect requirements, discuss different design choices and how they might work in our context, and generally to see what the community thinks about this and what folks might prefer.

Now that we have established the need for some sort of DSL, let's
examine what requirements are imposed by the problem at hand.

## Requirement 1: First-Class Destructuring/Matching
Contributor

Removing InstructionFormat in favor of a big enum with a single variant for each instruction would already help with this even if there is no DSL by making it easy to match on said enum instead of the opcode, which allows binding the fields.
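
To illustrate the suggestion above, here is a minimal sketch of what matching on a per-instruction enum could look like. The `Value` and `Inst` types and the `describe` function are hypothetical stand-ins invented for this example; they are not Cranelift's actual data structures.

```rust
/// Hypothetical value handle; Cranelift's real value type is richer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value(u32);

/// Hypothetical per-instruction enum: one variant per instruction,
/// each carrying exactly the fields that instruction needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Inst {
    Iadd { x: Value, y: Value },
    IaddImm { x: Value, imm: i64 },
    Load { addr: Value, offset: i32 },
}

/// Matching on the variant binds the operands directly, with no separate
/// opcode check followed by format-specific field access.
fn describe(inst: Inst) -> String {
    match inst {
        Inst::Iadd { x, y } => format!("add v{}, v{}", x.0, y.0),
        Inst::IaddImm { x, imm } => format!("add v{}, #{}", x.0, imm),
        Inst::Load { addr, offset } => format!("load [v{} + {}]", addr.0, offset),
    }
}
```

Because the `match` is exhaustive, adding a new variant to the enum forces every such match to be revisited at compile time.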

Member Author

Possibly, but that's probably an orthogonal change to consider; or rather, it would make more sense to think about concrete quality-of-life refactors like this if we decide not to do a DSL, since whether to do a DSL is the high-order bit for developer experience.

@cfallin
Member Author

cfallin commented Aug 5, 2021

A note to frame things a bit: this pre-RFC contains what I think are a reasonable set of requirements to consider, but the requirements themselves are very much up for discussion, and I'm interested to hear what others think about their relative importance, or if this is missing any important requirements.

The discussion questions in the last part are also what I believe to be relevant design axes to talk about, but there may be others as well; please discuss!


This is reminiscent of "unsafe" code in Rust: it allows one to build
axiomatic building blocks with flexibility, but it requires one to
carefully define *what* the building blocks do.
Contributor

I think that even after the migration is complete, it should be possible to write arbitrary code for the output generation, but not necessarily for the input pattern matching. This would allow complex output code to handle, for example, turning fixed divisions into multiplications or shifts, while still preserving the analyzability and optimizability of the patterns if desired.
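
A rough sketch of that split, assuming a hypothetical `Pattern` type for the input side and a hypothetical `MachInst` type for the output side (neither is an existing Cranelift type): the pattern stays purely structural, while the emit function is ordinary Rust that is free to choose a shift when the divisor is a power of two.

```rust
/// Hypothetical machine instructions produced by the lowering.
#[derive(Debug, PartialEq, Eq)]
enum MachInst {
    ShiftRightImm { amount: u32 },
    Udiv { divisor: u64 },
}

/// Input side: a purely structural pattern, kept simple enough that a
/// DSL compiler could analyze and optimize the matching.
#[derive(Clone, Copy, Debug)]
enum Pattern {
    UdivByConst(u64),
}

/// Output side: arbitrary Rust code that decides what to emit.
fn emit(pattern: Pattern) -> Option<MachInst> {
    match pattern {
        // Division by zero needs trap handling; this sketch declines the rule.
        Pattern::UdivByConst(0) => None,
        // Unsigned division by 2^k is a logical shift right by k.
        Pattern::UdivByConst(d) if d.is_power_of_two() => Some(MachInst::ShiftRightImm {
            amount: d.trailing_zeros(),
        }),
        // Otherwise fall back to a real divide instruction.
        Pattern::UdivByConst(d) => Some(MachInst::Udiv { divisor: d }),
    }
}
```

The multiply-by-reciprocal strategy for other constants is left out here; it would slot in as one more arm of the output code without changing the pattern.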

Member Author

Right, there are a number of cases where "little building block of arbitrary logic" is useful. Creation of immediate operands is another good example: aarch64 has a very interesting logical-immediate format that can support only some values, with a complex algorithm to derive it. We'd want to make that a "primitive" in some sense by calling out to the existing implementation.
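
As an illustration of such a primitive, the sketch below checks whether a 64-bit value is representable as an aarch64 logical immediate at all. It is a brute-force validity test written for this example, not the existing Cranelift implementation, and it does not derive the actual encoding fields. The function names are hypothetical. It relies on the rule that a logical immediate is an element of 2, 4, 8, 16, 32, or 64 bits, replicated across the register, where the element is a rotated contiguous run of ones that is neither all zeros nor all ones.

```rust
/// Mask covering the low `size` bits (`size` is between 2 and 64).
fn low_mask(size: u32) -> u64 {
    if size == 64 { u64::MAX } else { (1u64 << size) - 1 }
}

/// True if `elem` (confined to the low `size` bits) is a rotated,
/// contiguous run of ones, excluding all-zeros and all-ones.
fn is_rotated_run_of_ones(elem: u64, size: u32) -> bool {
    let mask = low_mask(size);
    if elem == 0 || elem == mask {
        return false;
    }
    (0..size).any(|r| {
        // Rotate right by `r` within the `size`-bit element.
        let rotated = if r == 0 {
            elem
        } else {
            ((elem >> r) | (elem << (size - r))) & mask
        };
        // A run of ones anchored at bit 0 has the form 2^k - 1.
        rotated & rotated.wrapping_add(1) == 0
    })
}

/// True if `value` can be expressed as a 64-bit logical immediate.
fn is_logical_imm64(value: u64) -> bool {
    [2u32, 4, 8, 16, 32, 64].iter().any(|&size| {
        let elem = value & low_mask(size);
        // Replicate the element across all 64 bits.
        let mut replicated = 0u64;
        let mut shift = 0;
        while shift < 64 {
            replicated |= elem << shift;
            shift += size;
        }
        replicated == value && is_rotated_run_of_ones(elem, size)
    })
}
```

A lowering rule could then treat a predicate like `is_logical_imm64` as an opaque building block: the pattern only asks whether the constant qualifies, and the complex logic stays in ordinary Rust.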

Member

@fitzgen fitzgen left a comment

Looks great, thanks @cfallin!

expressed in some sort of DSL-internal type system, so that we do not
have to hardcode lowering rules into the DSL design itself.

## Requirement 6: Helps to Advance Verification Efforts
Member

Related to verification efforts: I think we should call out superoptimization as something we would like to support for our isel. We should be able to do the superoptimization offline, based on CLIF patterns that we've harvested from real world programs, and then take the learned CLIF->vcode pairs and dump them into our new DSL. We should be able to effectively fold the preopt pass (and many more peepholes!) into isel lowering.

This would give us

  • better compilation throughput, because we make fewer passes over the CLIF,
  • correct-by-construction CLIF->vcode patterns, and
  • optimal (according to some cost function) code generation for these CLIF->vcode patterns.

Member Author

That's a good point, for sure!

I would hope that the format in which we express our basic lowering patterns is general enough to support arbitrary superoptimizer-derived patterns (i.e., if this is not even technically possible then something is very wrong), so in practice it seems this boils down to, I think:

  1. We should have or build a translator/bridge from a superoptimizer format into the lowering DSL (like peepmatic-souper)
  2. We should ensure that the whole infrastructure supports the "enormous pile of complex rules" case efficiently

Member

Exactly, well said :)

@sparker-arm
Contributor

sparker-arm commented Aug 16, 2021

My gut feeling is that using an existing DSL (peepmatic), and having a layer that tries the generated patterns first, falling back to the existing matchers, would provide the lowest barrier to entry for users as well as keeping the build system simple. Any finger-in-the-air guesses as to how compilation time would be affected compared with the simple match strategy used currently? I also wonder whether a DSL for just isel would be a big enough carrot to get people to port over... The amount of C++ in LLVM used for isel, even though tablegen has been there for as long as I know, should not be underestimated :) So I feel at least half of tablegen's value is the ease with which it enables the encodings to be described, and it would be great if we had something like that too.

@fitzgen
Member

fitzgen commented Aug 17, 2021

> having a layer that tries the generated patterns first, falling back to the existing matchers, would provide the lowest barrier to entry for users as well as keeping the build system simple.

This is the "horizontal" integration in the pre-RFC, and I am inclined to agree, unless the vertical integration happens to fall out of the DSL's compilation/execution model "for free".

> Any finger-in-the-air guesses as to how compilation time would be affected compared with the simple match strategy used currently?

I wouldn't expect any significant slowdowns, assuming that the DSL also compiles down to Rust code similar to what you would otherwise have written by hand (e.g. a match that switches on opcode). Maaaayyyybe a little bit of overhead related to icache because there are now two code paths rather than one, but I wouldn't expect too much.
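
The layering discussed here could be sketched roughly as follows. Everything in this block is a hypothetical stand-in: `Opcode` is a toy enum, `lower_generated` plays the role of DSL-generated code that only covers rules ported so far, and `lower_handwritten` plays the role of the existing backend matcher.

```rust
/// Toy opcode set, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Opcode {
    Iadd,
    Imul,
    Load,
}

/// Stand-in for DSL-generated code: a plain `match` on the opcode that
/// returns `None` for anything not yet ported to the DSL.
fn lower_generated(op: Opcode) -> Option<&'static str> {
    match op {
        Opcode::Iadd => Some("add (generated)"),
        _ => None,
    }
}

/// Stand-in for the existing handwritten lowering, which covers every opcode.
fn lower_handwritten(op: Opcode) -> &'static str {
    match op {
        Opcode::Iadd => "add (handwritten)",
        Opcode::Imul => "mul (handwritten)",
        Opcode::Load => "ldr (handwritten)",
    }
}

/// Try the generated rules first; fall back to the handwritten path.
fn lower(op: Opcode) -> &'static str {
    lower_generated(op).unwrap_or_else(|| lower_handwritten(op))
}
```

With this shape, porting an opcode to the DSL only changes which path answers first, and the handwritten arm can be deleted once the generated rule is trusted.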

@cfallin
Member Author

cfallin commented Aug 17, 2021

@sparker-arm the goal is certainly to generate code equivalent to what we have today, so ideally we have zero slowdown in the Cranelift compile time, and in the future, possibly improvements that are enabled by more centralized control of the backend code's idioms (i.e., right now if we come up with a new way of matching, we have to modify all of the open-coded use sites; but if we generate this from patterns then we can transition instantly). The latter is especially interesting to me as it will let us eventually move to the native SSA-based API of regalloc2 which should give some speedups.

I've got a reasonable design down on paper now and am working on refining the writeup before posting the RFC -- hope to have it up in the next few days :-)

@cfallin
Member Author

cfallin commented Aug 19, 2021

I will go ahead and close this pre-RFC, as I think it has served its purpose well in starting discussions and getting early feedback on ideas that have gone into a now more fully-formed RFC, #15. Thanks all for the input and please do give any thoughts you might have on the new RFC!

@cfallin cfallin closed this Aug 19, 2021
4 participants