Replies: 12 comments 23 replies
-
I really like the idea that "partitioning a program can be thought of as type inference on the partition types." It is cool that simply by extending the type system with partition types, we can accomplish the task using existing algorithms. Another thing I'm wondering is why the authors chose GA144, which is pretty distinct in being stack-based instead of register-based. It is mentioned that it is really energy efficient, but is that efficiency a consequence of its stack-based design? It seems that this property has made the optimization more complex (e.g. many superoptimizer techniques are not applicable).
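To make the "partitioning as type inference" framing concrete, here's a minimal sketch (my own illustration, not Chlorophyll's code) of treating each expression's partition as a type variable that unification-style propagation fills in:

```python
# Hypothetical sketch: partition assignment as type inference. Each
# expression gets a "partition type variable"; annotations pin some
# variables, and constraints propagate the rest, much like unification
# in ordinary type inference.

class PartitionVar:
    """A type variable ranging over core IDs (partition types)."""
    def __init__(self, name, fixed=None):
        self.name = name
        self.value = fixed   # None = unannotated, to be inferred

def unify(a, b, comms):
    """Require a and b on the same core, or record a communication."""
    if a.value is None:
        a.value = b.value               # infer from the annotated side
    elif b.value is None:
        b.value = a.value
    elif a.value != b.value:
        comms.append((a.name, b.name))  # cross-partition edge => send/receive

# x is pinned to core 3 by an annotation; y and the result are inferred.
comms = []
x = PartitionVar("x", fixed=3)
y = PartitionVar("y")
r = PartitionVar("x*y")
unify(x, y, comms)   # infers y on core 3
unify(y, r, comms)   # infers the product on core 3, no communication
print(y.value, r.value, comms)   # -> 3 3 []
```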
-
As someone who was unfamiliar with spatial architectures prior to reading this paper, I really enjoyed learning about an approach that differs greatly from the typical register-based one. One concept that stood out to me was how different steps of an operation such as x*y can take place in different partitions and, after the QAP (quadratic assignment problem) is solved, possibly on different cores. The partition annotation language feature was also really neat, and it's interesting that Chlorophyll gives the programmer control over assigning data structures and code to specific partitions. It's intriguing to see how synthesis can be applied to solve each of the sub-problems of partitioning, layout, code separation, and code generation; if I had to write code for the GA144 architecture, I'd be interested in finding ways to apply program synthesis too. Usually a compiler implementation that results in 65% worse performance would seem like a red flag, but the sense I get is that showing a synthesis-aided compiler is in the ballpark of expert-written programs should be considered a success.
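For the layout step, the objective being minimized is roughly a quadratic assignment: message traffic between logical partitions times the grid distance between the physical cores they're assigned to. A toy version of that cost function (the flow numbers are made up for illustration):

```python
# Toy QAP objective for layout on the 8x18 GA144 grid: placing two
# chatty partitions on distant cores multiplies their traffic by the
# hop distance between the assigned cores.

def hops(a, b, width=18):
    """Manhattan distance between two cores on the 8x18 grid."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def layout_cost(placement, flow):
    """placement: partition -> core; flow: (p, q) -> message count."""
    return sum(n * hops(placement[p], placement[q])
               for (p, q), n in flow.items())

flow = {(0, 1): 10, (1, 2): 4}                  # partition 0 talks a lot to 1
print(layout_cost({0: 0, 1: 1, 2: 2}, flow))    # adjacent cores: 14
print(layout_cost({0: 0, 1: 17, 2: 35}, flow))  # far apart: 174
```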
-
Very interesting read. In a certain view, I wonder if we can classify a "traditional compiler" as a special case of a program synthesizer that just so happens to have a very small search space (whereas a "traditional synthesizer" has a broad one). The Chlorophyll compiler, with its superoptimizations, sits somewhere between the two. I was at first skeptical of the "comparable to highly-optimized expert-written programs" claim, since the gap between the expert-written programs and Chlorophyll's output was still notably larger than the gap between no-superopt and superopt. But this had me wondering: just how "good" are these expert programmers? Certainly they are writing better-optimized code than I could, but we can't rely on specialized software developers trained on specific hardware. This almost circles back to the original problem, where the paper notes that "it may take a decade to build a mature compiler with optimizations for the target hardware."
-
I very much enjoyed reading this paper, specifically learning more about spatial architectures; they were also new to me, and I found it interesting to learn more about their use cases. In addition, this paper provides insights into how program synthesis can be used in compilation tools themselves (through superoptimization). This paper does restrict the use of program synthesis to this architecture, though; I wonder whether there's potential for (or whether there already exist) program synthesizers in more mainstream compilers for more widely-used languages. For instance, I could imagine a compiler inferring a precondition and postcondition for a certain code snippet, and then synthesizing an efficient program to meet those conditions if the snippet is small enough.
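As a toy illustration of that idea (everything here is hypothetical), one could treat the inferred pre/postcondition as a black-box spec and do an enumerate-and-check search over a tiny grammar, which is essentially how superoptimization-style synthesis works; real systems verify candidates with a solver rather than by testing:

```python
# Toy sketch: infer a spec for a snippet, then search a tiny grammar
# for a cheaper program that meets it on all sample inputs.

import itertools

def synthesize(spec, inputs):
    """Return the first candidate agreeing with spec on all inputs."""
    # Candidate programs: x combined with a small constant by one operator.
    ops = [("x + %d", lambda x, c: x + c),
           ("x * %d", lambda x, c: x * c),
           ("x << %d", lambda x, c: x << c)]
    for (text, fn), c in itertools.product(ops, range(4)):
        if all(fn(x, c) == spec(x) for x in inputs):
            return text % c
    return None

# Spec extracted from the snippet `y = x * 8`; the search finds the
# cheaper shift form, which is what a superoptimizer would prefer.
print(synthesize(lambda x: x * 8, range(16)))   # -> "x << 3"
```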
-
It's really interesting to see how the different parts of Chlorophyll fit together. The constraints faced by Chlorophyll break many of our assumptions about writing a program: different cores don't have shared memory, the size of the code on each core is limited, and cores communicate by passing messages to their neighbours. It's especially interesting that the physical distance between cores matters in programming. Under such a different computation model, we have to rethink the optimizations we can make, and we also need to rethink what counts as an efficient algorithm.
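A toy model of those constraints might look like the following (my own sketch, not the paper's semantics; the 64-word code limit matches the F18A's RAM size as I understand it, and the instruction strings are invented):

```python
# Toy model of the constraints listed above: no shared memory, a
# per-core code limit, and message passing restricted to grid neighbours.

from collections import deque

class Core:
    CODE_LIMIT = 64                      # ~64 words of RAM per F18A core
    def __init__(self, pos):
        self.pos = pos                   # (row, col) on the 8x18 grid
        self.inbox = deque()             # private state; nothing is shared
        self.code = []

    def load(self, instrs):
        assert len(instrs) <= self.CODE_LIMIT, "program too big for one core"
        self.code = instrs

    def send(self, other, value):
        dr = abs(self.pos[0] - other.pos[0])
        dc = abs(self.pos[1] - other.pos[1])
        assert dr + dc == 1, "cores only talk to adjacent neighbours"
        other.inbox.append(value)        # a real core would block until read

a, b = Core((0, 0)), Core((0, 1))
a.load(["push 42", "send right"])        # fits well under the limit
a.send(b, 42)
print(b.inbox.popleft())                 # -> 42
```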
-
I wonder why STOKE "did not translate well from a register-based system to the stack-based GA144." Is it because it's more difficult to keep track of memory?
-
It wasn't clear to me how they chose their benchmarks. I see they chose from both multicore and single-core programs, but the selection methodology wasn't given to us.
-
This paper was eye-opening in that it uses solutions I didn't know about to solve problems I hadn't thought about. For instance, data and other programming constructs have to be partitioned on the GA144. Given a partially annotated or unannotated program, Chlorophyll can represent the communication count and partition space as a formula over symbolic variables, which becomes a constraint for Rosette's back-end solver; the solver iteratively searches with a decreasing upper bound on communication count until it can no longer find a solution (see the sketch below). This enables Chlorophyll to accept partial or no manual annotation, which I thought ties in nicely with the discussion of programmability. I also imagine that fully manually annotated programs might be unmaintainable and unportable if the architecture (e.g. storage per core) changed, rendering the partition annotations invalid.
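A schematic of that solve-with-shrinking-bound loop, written against Z3's Python bindings rather than Rosette (the program fragments, edges, and annotations below are all invented for illustration):

```python
# Partition variables are symbolic, communication count is a symbolic
# sum over dataflow edges, and we re-query with a shrinking upper
# bound until the solver says unsat.

from z3 import Int, If, Solver, sat

frags = [Int(f"part_{i}") for i in range(4)]   # partition of each fragment
edges = [(0, 1), (1, 2), (2, 3)]               # dataflow between fragments

comm = sum(If(frags[a] != frags[b], 1, 0) for a, b in edges)

def solve(bound):
    s = Solver()
    for p in frags:
        s.add(p >= 0, p < 2)                   # two cores available
    s.add(frags[0] == 0, frags[3] == 1)        # programmer annotations
    s.add(comm <= bound)
    return s.model() if s.check() == sat else None

bound, best = len(edges), None
while (m := solve(bound)) is not None:         # tighten until unsat
    best = m
    bound = m.eval(comm, model_completion=True).as_long() - 1
print(best)                                    # minimal-communication partitioning
```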
-
I really enjoyed this paper! I was really intrigued by the problems this architecture presents and the ways the authors chose to address them through the many phases of Chlorophyll. The amount of information the authors were able to pack into 7 pages was also really impressive! When reading the paper, I found myself thinking a lot about the engines behind declarative languages like SQL and GraphQL, and about the program synthesis we discussed in class. These engines "compile" high-level specifications (e.g. a SQL query) into efficient, lower-level "instructions" (e.g. the series of operations executed by a SQL engine), and this optimized path is chosen from many possible paths. At a high level, then, both program synthesis and these engines search over a large space to find an optimized solution that satisfies some requirements. However, while SQL engines and the like use relatively inexpensive heuristics and algorithms to optimize a query, program synthesizers rely on expensive SAT solvers. Is it possible that some program synthesis problems can be solved without SAT solvers, or is the problem space only amenable to them? Perhaps the "Partitioning Synthesizer" could be replaced with a faster heuristics-based algorithm, maybe based on an existing one from a processor that distributes work to resource-constrained cores (a sketch of this idea follows).
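As a sketch of what such a heuristic might look like (my own invention, loosely in the spirit of classic graph-partitioning heuristics like Kernighan–Lin), one could greedily co-locate fragments joined by heavy dataflow edges, subject to a per-core code budget:

```python
# Cheap heuristic stand-in for the solver-based partitioner: grow
# partitions along heavy dataflow edges, respecting a size budget.

def greedy_partition(sizes, edges, budget):
    """sizes: fragment -> code size; edges: [(u, v, traffic)]; budget per core."""
    part = {}                                      # fragment -> core id
    used = []                                      # code space used per core
    # Visit the heaviest-traffic edges first so they end up co-located.
    for u, v, _ in sorted(edges, key=lambda e: -e[2]):
        for f in (u, v):
            if f in part:
                continue
            mate = v if f == u else u
            core = part.get(mate)
            # Co-locate with the already-placed endpoint if it still fits.
            if core is not None and used[core] + sizes[f] <= budget:
                part[f] = core
                used[core] += sizes[f]
            else:
                part[f] = len(used)                # open a fresh core
                used.append(sizes[f])
    return part

sizes = {0: 30, 1: 30, 2: 30}
edges = [(0, 1, 10), (1, 2, 1)]                    # 0-1 traffic dominates
print(greedy_partition(sizes, edges, budget=64))   # 0 and 1 share a core
```

The trade-off is the usual one: the greedy pass runs in near-linear time but offers no optimality guarantee, whereas the solver-backed search can certify it found the minimum-communication partitioning.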
-
I do wonder if a 65% slowdown compared to hand-written programs is acceptable for the intended applications of GA144. I'd imagine that the hardware is fundamentally pretty slow due to the high synchronization cost and that every bit of performance is valuable, especially in embedded domains where the program might be running for years.
-
This paper kind of makes me realize how broadly the term "program synthesis" applies. Steps 1 and 2 use a solver to fill in type annotations or to infer an allocation of resources that minimizes communication. This sounds a lot like the research project I've been working on, where types include labels denoting security requirements and the compiler can infer the weakest necessary labels using a solver. The key difference I see is that in this system the user is allowed to leave things underspecified (which is where the synthesis comes in), whereas with more classic type inference any ambiguity must be resolved. Step 4 feels like the most "classic" use of program synthesis to me. The point brought up above, about how compilation might be seen as program synthesis with a significantly smaller search space due to a more detailed specification, made me think about this: inferring these type labels feels somewhere between "compilation," where the specification imposes quite a restriction on the search space, and synthesis problems like programming by example or functional synthesis with no syntax guidance, where the search space is nearly the space of all programs.
-
The main point that came to my mind is the similarity between the partitioning task the paper is trying to solve and the much more general task of partitioning a computation among a number of servers. Different criteria become important in that setting, e.g. performance and security. Interestingly, some solutions to the security concerns of such a partitioning also use type systems (via information flow) to ensure that the wrong servers do not see private data. Researchers have realized that there are information-theoretic limits to partitioning computation while minimizing communication, and drawing this analogy could give us a theoretical limit against which the performance of the synthesizer can be evaluated.
-
Hello everyone! Here is the discussion for the Chlorophyll paper on 11/14. Chlorophyll is a synthesis-aided compiler targeting GreenArrays GA144, a low-power processor array of 144 F18A Cores. Each core runs the colorForth ISA. You can click on the links to read more about this machine.