Replies: 11 comments 34 replies
-
MLIR seems to be finding success as a solution to these "post-Moore's Law" challenges of targeting heterogeneous hardware and building compilers for DSLs. By taking the startup-like approach of building a platform for others to build on, MLIR has seen adoption from leading ML companies rewriting ML framework backends as MLIR dialects, notably OpenAI's TritonGPU dialect. Recent funding for the Mojo language builds on this success, leveraging MLIR as a backend to surface performance optimizations and heterogeneous-hardware auto-tuning to higher-level application developers.
-
I have always been a huge fan of MLIR due to the pains one endures when compiling functional language features to LLVM IR. As noted in the paper, LLVM really is a backend designed for C-like languages: integrating dispatch and pattern matching is pretty difficult. This MLIR-based functional IR demonstrates how MLIR might address these issues. I'd be super excited to see someone come up with an MLIR-based IR for lazy languages.
-
As someone with past experience in ML who is interested in diving into the world of ML compilers, I found this quite an interesting paper to read. Intuitively, it makes sense that LLVM can be at too low a level to be immediately useful as an IR for many programming languages. As such, providing a unified, general framework for higher-level IRs that can eventually be lowered to LLVM seems incredibly useful, and of a similar philosophy to the unified frameworks for garbage collection and dataflow analysis we discussed previously (though this seems far more useful).
-
In the conclusion, the authors write:
It's only been 3 years since this paper, but I wonder how much the community has embraced this idea.
-
MLIR aims to address the challenges brought by heterogeneity by building an extremely extensible compiler infrastructure. It seems to me that MLIR can do everything related to programming. However, one key part of heterogeneous systems that is not well discussed in the paper is the orchestration of different compute units (CPUs, GPUs, accelerators, ...); inferring the scheduling and data movement may exceed the scope of a traditional compiler. I think MLIR has the potential to combine compiler technologies (tracing, program synthesis, ...) and eventually solve the problem.
-
I was wondering about DSLs that allow users to write DSLs. Is there some result that talks about the expressivity of DSLs with respect to the language they are written in? I assume a DSL can't be as expressive as the parent language, since then there would be a contradiction on the expressivity of the parent language.
-
This point is really interesting. It also reminds me of what we discussed last lecture about how compiler optimizations make it impossible to implement threading as a library. Such a "WYSINWYX" property makes it really hard to be sure that a piece of code is secure.
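A classic security instance of WYSINWYX is dead-store elimination removing a "scrub the secret" store. Here is a toy sketch (the mini-IR format, instruction tuples, and variable names are all invented for illustration) of a DSE pass doing exactly that: the source zeroes a secret before returning, but since nothing reads the variable afterwards, the optimizer deletes the store.

```python
def dead_store_elimination(instrs):
    """Remove stores to variables that are never read afterwards.
    Each instruction is (op, dst, srcs); walk backwards tracking liveness."""
    out = []
    live = set()  # variables read by some later instruction
    for op, dst, srcs in reversed(instrs):
        if op == "store" and dst not in live:
            continue  # dead store: target never read again -> dropped
        live.discard(dst)
        live.update(srcs)
        out.append((op, dst, srcs))
    out.reverse()
    return out

program = [
    ("store", "secret", ["input"]),   # secret = input
    ("store", "digest", ["secret"]),  # digest = hash(secret)
    ("store", "secret", ["zero"]),    # secret = 0  (intended scrub)
    ("ret",   None,     ["digest"]),
]

optimized = dead_store_elimination(program)
# The scrubbing store is gone: dead by the compiler's criteria,
# yet security-critical to the programmer.
```

What you see in the source (the zeroing) is not what you execute, which is why C eventually grew constructs like `memset_s` that the optimizer must not remove.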
-
A small detail, but I found it interesting that, when listing MLIR's motivations, the authors' first issue with current infrastructure was "poor error messages." It's nice knowing there are lots of these smaller quality-of-life improvements (i.e., not necessarily performance-related) happening for languages. I think a good reason people like certain languages is how clear their error messages are. On the other hand, I found Julia rather challenging to learn at first, since the error messages were mostly just stack-trace dumps (cite: Pragmatic Programmer on error messages).
-
I think it's fair to say that the multi-layered approach is the crux of MLIR: we get a compiler infrastructure that is very good at mixing different levels of abstraction, and that generally makes implementing optimization passes more idiomatic. We can now optimize around higher-level semantics instead of trying to extract them from a CFG: for example, in LLVM natural loops must be recovered through CFG analysis, whereas a structured IR can represent the loop directly as an op. On the flip side, I think final code generation can be made more difficult by MLIR, because you potentially have more Op types to lower and arbitrary Op nestings to deal with; LLVM CFGs, by contrast, provide a uniform way to generate code. This is purely a software engineering problem, not a limitation of the MLIR IR itself, but I do think it's a fair problem to point out that needs to be fixed eventually. There are so many dialects, and it's hard to manage so many interfaces while fighting code duplication.
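The loop-recovery contrast above can be sketched concretely. This is a toy version (in Python; the CFG, block names, and simplifications are all illustrative, not LLVM's actual implementation) of what LLVM-style analysis must do just to *find* a loop that a structured IR like MLIR's `scf.for` states outright: compute dominators, spot a back edge, then collect the natural loop body.

```python
def dominators(cfg, entry):
    """Iterative dataflow: dom[n] = {n} | intersection of dom over preds.
    Assumes every non-entry node is reachable (has at least one pred)."""
    nodes = set(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [p for p in nodes if n in cfg[p]]
            new = {n} | set.intersection(*(dom[p] for p in preds))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def natural_loop(cfg, tail, head):
    """Body of the natural loop induced by the back edge tail -> head:
    walk predecessors backwards from the tail, stopping at the header."""
    body = {head}
    work = [tail]
    while work:
        n = work.pop()
        if n not in body:
            body.add(n)
            work.extend(p for p in cfg if n in cfg[p])
    return body

# entry -> header -> {body, exit}; body -> header is the back edge
cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
```

All of this machinery exists only to rediscover structure the frontend already knew; a region-based loop op lets passes skip straight to the loop-level reasoning.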
-
I really enjoyed this paper, as it read almost like a design document. One thing that really struck me as impressive was MLIR's possible contributions to HPC and ML. MLIR is flexible enough to support different data models, and can infer specific "Op" implementations from particular data shapes, which is of course very important in ML computations. I thought this was cool because, looking at CUDA code, creating a generalizable framework that can support something like that is kind of a beautiful abstraction. Also, we learned in class how poorly parallelism and concurrency are supported in many compilers, yet how important they are for HPC, so it seemed really promising that MLIR offers the ability to represent parallel constructs as first-class ops, which could simplify the modeling and optimization of parallelism in ML and HPC applications.
-
The DRR DSL reminded me a bit of the Alive paper we read. The more general expressivity of MLIR's rewrite system seems like it could pose quite a challenge to any kind of verification; still, it would be really cool to see such a system in the future, at least to some extent. There's also the trouble of handling the extensibility of operations and dialects, but, like the optimization interfaces, I'd imagine there would need to be an opt-in way of providing formal semantics for custom operations. It's interesting to observe the return of nested IRs. The motivations for doing so make perfect sense, and the gradual lowering approach and the ability to define passes at different abstraction layers seem like a great way to get the best of both worlds. Yet it still feels... strange? going against more recent trends toward flat IRs. It makes sense: MLIR is trying to be one infrastructure for all levels of IR, not just the middle-end, where flat representations seem to have become popular. Regardless, I was reminded of this line in the Bril docs
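To show the flavor of the DRR/Alive connection, here is a toy version (in Python; the rule, bit width, and function names are chosen purely for illustration) of verifying one peephole rewrite rule. Alive discharges such checks symbolically with an SMT solver; at 8 bits we can simply enumerate every input.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # model fixed-width (wrapping) integer arithmetic

def source(x):
    return (x * 2) & MASK   # pattern:  mul x, 2

def target(x):
    return (x << 1) & MASK  # rewrite:  shl x, 1

# Exhaustive verification: feasible at 8 bits; real tools use SMT to
# cover 32/64-bit operations without enumerating inputs.
assert all(source(x) == target(x) for x in range(1 << WIDTH))
```

The hard part for an MLIR-wide version is exactly what the comment above anticipates: the checker needs formal semantics for every op a rule mentions, which only works if dialect authors opt in to providing them.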
-
Hi everyone! Here is the discussion for the MLIR paper on 11/28. MLIR is a novel approach to building reusable and extensible compiler infrastructure. It was developed as a solution to the problem of modern high-level languages often building their own high-level IRs in front of LLVM; in doing so, many of the same kinds of technologies were reinvented across different high-level IRs. Other motivations for the development of MLIR included improving compilation for heterogeneous hardware and reducing the cost of building compilers for DSLs. What do you think of MLIR as a solution to these "post-Moore's Law" challenges?