[Apr 26 Discussion] Superoptimizer: A Look at the Smallest Program #320
Replies: 11 comments 22 replies
-
I think this paper turns on an observation that is both liberating and frightening: our instruction sets are now (and by "now" I guess I mean 1987!) sufficiently complicated that we no longer know how best to use them. In some cases it just makes sense to define a slow reference implementation that we do understand, and then let a computer brute-force its way through the instruction set in search of the best equivalent version. The paper defines "best" as shortest, but I imagine it would be easy to retool it for the cost function of your choice. I'd be curious what folks thought of the Krumme & Ackley work (see the second half of §5). It is probably (discuss?) more mathematically elegant than the present work, but it has serious drawbacks of its own. The author proposes a combination of the superoptimizer and the Krumme & Ackley approach, and a quick search suggests that superoptimizers today are indeed used in that way.
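To make the "brute-force its way through an instruction set" part concrete, here is a toy sketch (my own, not Massalin's 68020 enumerator): it enumerates straight-line programs over a made-up accumulator ISA, shortest first, and keeps the first candidate that matches a slow reference implementation on a set of probe inputs. The op set, probes, and reference function are all illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy accumulator ISA; every "instruction" maps the accumulator to a new value. */
enum op { NEG, NOT, INC, DEC, SHL1, ASR1, NOPS };
static const char *names[] = { "neg", "not", "inc", "dec", "shl1", "asr1" };

static int32_t step(enum op o, int32_t a) {
    switch (o) {
    case NEG:  return -a;
    case NOT:  return ~a;
    case INC:  return a + 1;
    case DEC:  return a - 1;
    case SHL1: return (int32_t)((uint32_t)a << 1);
    default:   return a >> 1;   /* ASR1: arithmetic shift on mainstream compilers */
    }
}

/* Slow but obviously correct reference: f(x) = -x - 1. */
static int32_t reference(int32_t x) { return -x - 1; }

int main(void) {
    const int32_t probes[] = { 0, 1, -1, 2, -7, 100, -12345, 2147483647 };
    const int np = sizeof probes / sizeof probes[0];

    for (int len = 1; len <= 3; len++) {           /* shortest programs first */
        int total = 1;
        for (int i = 0; i < len; i++) total *= NOPS;
        for (int code = 0; code < total; code++) {  /* decode one op sequence */
            enum op prog[3];
            int c = code, ok = 1;
            for (int i = 0; i < len; i++) { prog[i] = (enum op)(c % NOPS); c /= NOPS; }
            for (int p = 0; ok && p < np; p++) {
                int32_t acc = probes[p];
                for (int i = 0; i < len; i++) acc = step(prog[i], acc);
                ok = (acc == reference(probes[p]));
            }
            if (ok) {  /* survives all probes; a real tool would then verify it */
                for (int i = 0; i < len; i++) printf("%s ", names[prog[i]]);
                putchar('\n');
                return 0;
            }
        }
    }
    puts("no equivalent found within 3 instructions");
    return 1;
}
```

The loop structure also makes the paper's scaling limits obvious: every extra instruction multiplies the work by the size of the op set.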
-
This paper tackles a very daunting task: "find the shortest version of the original program"! And the approach is an optimized search with equivalence checks, which also feels daunting (especially nowadays, when we probably have much more convoluted? complicated? large? programs than what scholars at the time were facing). They describe two approaches to checking whether a candidate optimized program is equivalent to the original: a Boolean Test that compares the two programs expressed as boolean formulas (which is extremely costly in terms of both space and time), and a Probabilistic Test (which sounds like it's basically testing?) that checks input/output behavior. I personally find the statement "It was found in practice that a program has a very low probability of [passing the probabilistic test unless it is actually equivalent]" fascinating.
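For what it's worth, the probabilistic test really is "basically testing," just used as a filter inside the search. A minimal sketch of the idea (names and signatures are mine, not the paper's):

```c
#include <stdint.h>
#include <stdlib.h>

/* Run the reference and a candidate on random inputs. One mismatch is a
 * definitive rejection; passing only means "probably equivalent", so
 * survivors still need stronger (boolean or manual) verification. */
static int probably_equivalent(int32_t (*ref)(int32_t),
                               int32_t (*cand)(int32_t),
                               int trials) {
    for (int i = 0; i < trials; i++) {
        int32_t x = (int32_t)(((uint32_t)rand() << 16) ^ (uint32_t)rand());
        if (ref(x) != cand(x)) return 0;  /* definitely not equivalent */
    }
    return 1;                             /* survived: worth verifying */
}
```

The asymmetry is what makes it cheap: almost every wrong candidate dies on the first input or two.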
-
I really like this short and concise paper. A really simple but powerful idea! The optimized programs in the appendix feel like magic; they would be hard for programmers to figure out in the first place. I wonder how a superoptimizer would perform on an emerging ISA like RISC-V. Do we still use this kind of brute-force method to find the best instruction implementation, or do we have other heuristics to speed up the process?
-
Pretty cool paper. I like the idea of building up a library of idioms for functions that are used often, which reminded me of the fast inverse square root, and the idea of using a superoptimizer as a benchmark for how succinct an instruction set is. I was a bit confused about how a compiler might actually use a superoptimizer, and when it would know to use one.
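Since the fast inverse square root came up: it is a nice example of exactly the kind of "library idiom" that looks superoptimized. The famous Quake III routine, lightly cleaned up (the bit-cast through a pointer technically violates strict aliasing, so treat this as illustrative):

```c
float Q_rsqrt(float number) {
    float x2 = number * 0.5F;
    float y  = number;
    long  i  = *(long *)&y;          /* reinterpret float bits as an integer */
    i = 0x5f3759df - (i >> 1);       /* magic-constant initial guess */
    y = *(float *)&i;
    y = y * (1.5F - (x2 * y * y));   /* one Newton-Raphson refinement step */
    return y;
}
```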
-
I am left wondering what impact the unenforced preconditions that programmers put in function specifications have on this idea. For example, suppose there is a comment like the following (a hypothetical illustration):
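```c
/* PRECONDITION: i must satisfy 0 <= i < len; behavior is unspecified
 * otherwise. (Hypothetical example; nothing enforces this.) */
int nth(const int *a, int len, int i) {
    (void)len;   /* the comment above is the only "enforcement" */
    return a[i];
}
```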
Programmers do this all the time, either because the language has no construct to express the constraint or because they were lazy and expected nobody to call the function with inputs that violate it. But I wonder what impact something like this would have on the size of the search for superoptimization. Could tight enough preconditions, by requiring equivalence only on valid inputs, cut the search down enough for the superoptimizer to become more than just a toy?
-
I'm not well versed in the status of modern superoptimizers, but it would be interesting to see machine learning or machine-learning-adjacent techniques applied in this context to try to reduce the search space. The simple search-space pruning they perform in this paper surely rules out a lot of the potential candidates, but I would hope the space could be narrowed further to improve the scalability of these techniques. I also find the unexpected behavior that arises out of superoptimization very interesting. It's particularly cool to see a sequence of instructions that is equivalent to a more complex implementation but looks like it shouldn't work at all. This reminds me of a relatively famous result in the FPGA world where machine learning was used for placement and routing: one generated design worked only because of electromagnetic coupling between adjacent wires in the FPGA, completely departing from the digital circuit that was intended to be implemented. In both cases, I question how practical the potentially outlandish solutions are, for debuggability and reuse reasons.
-
I am not familiar with how these techniques are used in modern programs, but it seems like they could be quite useful for compiling short library functions (like abs) that are frequently used and will likely not need to be rewritten/recompiled for a long time. I'm curious how many instructions this sort of compilation could actually save for these kinds of programs, and how these methods have been used in practice. It sadly seems like the sort of technique that requires too much labor and computational power for most people to bother with outside of academia, but saving a few instructions in such functions might be significant in ML/scientific-computing programs.
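For a sense of scale on the savings: the classic branch-free abs idiom, the kind of thing superoptimizers rediscover, replaces a compare-and-branch with three straight-line operations. (This sketch assumes two's complement and an arithmetic right shift of signed values, which is implementation-defined in standard C but universal in practice.)

```c
#include <stdint.h>

int32_t abs_branchless(int32_t x) {
    int32_t mask = x >> 31;      /* -1 if x is negative, 0 otherwise */
    return (x + mask) ^ mask;    /* negates x exactly when mask == -1 */
}
```

Three cheap ALU ops and no branch-predictor involvement, which is precisely why such idioms matter in hot inner loops.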
-
I think this paper is a delightful read. Finding the shortest instruction sequence that is functionally equivalent to the input program, given an instruction set, is indeed a challenging task. The paper limits its scope to a simple instruction set and register operations to make the search feasible. I find one of the applications very interesting: the author states that the superoptimizer can be leveraged to design RISC ISAs. Intuitively, before adding a new instruction, it would be helpful to know whether it can be replaced by a combination of existing ones. I am not sure such "superoptimization" is still practical on today's ISAs, since both search algorithms and ISAs have developed extensively, but if it is, the benefit of having the shortest instruction sequences would be huge, especially now that processors are all pipelined.
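A miniature version of that ISA-design question: before adding, say, a conditional-move instruction, one could ask the superoptimizer whether it is already expressible. A hand-written version of what such a search might find (assuming `cond` is exactly 0 or 1; this example is mine, not the paper's):

```c
#include <stdint.h>

/* r = cond ? a : b, using only existing logical/arithmetic ops. */
uint32_t select(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = 0u - cond;     /* all ones if cond == 1, else all zeros */
    return b ^ ((a ^ b) & mask);   /* picks a when mask is all ones */
}
```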
-
One thing I have been thinking about is how to decrease the size of the search space. I think it would be cool if the user could provide a sketch to the superoptimizer: maybe the programmer has an idea of an ideal ordering of instructions but isn't sure which arguments to pass. Then the superoptimizer could find a satisfying program, or fall back to the techniques described in the paper if the sketch is unsatisfiable. But after looking at some of these optimized programs, I think it would be highly unlikely for a programmer to come up with a sketch that produces the shortest program (at least I would not have been able to come up with these on my own).
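To make the sketch idea concrete, here is roughly what I mean, written in C rather than assembly (the sketch form and hole encoding are my own invention, in the spirit of sketch-based synthesis tools):

```c
#include <stdint.h>

/* The programmer fixes the instruction skeleton; the search only has to
 * fill the holes h1 and h2, a far smaller space than "all programs".
 * (Right-shifting a negative int is implementation-defined but is an
 * arithmetic shift on mainstream compilers.) */
int32_t sign_sketch(int32_t x, int h1, int h2) {
    int32_t  a = x >> h1;                   /* hole: shift amount */
    uint32_t b = (0u - (uint32_t)x) >> h2;  /* hole: shift amount */
    return a | (int32_t)b;
}
/* The search would discover h1 = h2 = 31, yielding the classic
 * branch-free sign(x) = (x >> 31) | ((unsigned)-x >> 31). */
```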
-
It's a very interesting paper with a concise and clear idea! Like most of you, I'm interested in how it is used in modern tools. I would think that many high-performance C library functions' assembly implementations were inspired by (similar ideas to) superoptimizers, just as the author said: he himself is the author of [...]. Of course, the long running time of exhaustive search limits the program size (only around 13 machine instructions), but it can still be helpful once we already know these optimal designs. We can store them in a table (keyed by some pattern), and the peephole optimizer can just look entries up in that table. When a pattern hits, we know the replacement is the shortest program.
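A sketch of that table idea (the entries are illustrative placeholders; this is essentially what later "peephole superoptimizer" work automates):

```c
/* Offline: run the superoptimizer over common idioms and record the
 * shortest equivalents. Online: the peephole pass just does lookups. */
struct peephole_entry {
    const char *pattern;       /* canonical form of the source idiom */
    const char *replacement;   /* precomputed shortest equivalent */
};

static const struct peephole_entry table[] = {
    { "abs via compare-and-branch", "mask = x >> 31; (x + mask) ^ mask" },
    { "sign via two compares",      "(x >> 31) | ((unsigned)-x >> 31)" },
};
/* A hit guarantees optimality, because optimality was established once,
 * offline, by exhaustive search. */
```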
-
Honestly, top-tier paper right here. Massalin had an idea, and she communicated it with startling efficiency. 1987 was a while ago; Moore (and Proebsting!) have been working hard for us since then, so I wonder what the limits are now. The nature of exponential things means it's probably not a lot better, but I wonder just how little of an effect all that progress has had. 15 machine instructions? Maybe 20? It seems like the technique might be applicable to more than it was when it was written.
-
This is the discussion thread for Superoptimizer: A Look at the Smallest Program by Henry Massalin. The discussion leaders are Victor Giannakouris and Jonathan Tran.