forked from GPUOpen-Drivers/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes Tags: #mlir Differential Revision: https://reviews.llvm.org/D83527
- Loading branch information
1 parent
553dbb6
commit c20c196
Showing
1 changed file
with
328 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,328 @@ | ||
# MLIR Python Bindings | ||
|
||
Current status: Under development and not enabled by default | ||
|
||
|
||
## Building | ||
|
||
### Pre-requisites | ||
|
||
* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to | ||
be located by CMake. | ||
* A relatively recent Python3 installation | ||
|
||
### CMake variables | ||
|
||
* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL` | ||
|
||
Enables building the Python bindings. Defaults to `OFF`. | ||
|
||
* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL` | ||
|
||
Links the native extension against the Python runtime library, which is | ||
optional on some platforms. While setting this to `OFF` can yield some greater | ||
deployment flexibility, linking in this way allows the linker to report | ||
compile time errors for unresolved symbols on all platforms, which makes for a | ||
smoother development workflow. Defaults to `ON`. | ||
|
||
* **`PYTHON_EXECUTABLE`**:`STRING` | ||
|
||
Specifies the `python` executable used for the LLVM build, including for | ||
determining header/link flags for the Python bindings. On systems with | ||
multiple Python implementations, setting this explicitly to the preferred | ||
`python3` executable is strongly recommended. | ||
|
||
|
||
## Design | ||
|
||
### Use cases | ||
|
||
There are likely two primary use cases for the MLIR python bindings: | ||
|
||
1. Support users who expect that an installed version of LLVM/MLIR will yield | ||
the ability to `import mlir` and use the API in a pure way out of the box. | ||
|
||
2. Downstream integrations will likely want to include parts of the API in their | ||
private namespace or specially built libraries, probably mixing it with other | ||
python native bits. | ||
|
||
|
||
### Composable modules | ||
|
||
In order to support use case #2, the Python bindings are organized into | ||
composable modules that downstream integrators can include and re-export into | ||
their own namespace if desired. This forces several design points: | ||
|
||
* Separate the construction/populating of a `py::module` from `PYBIND11_MODULE` | ||
global constructor. | ||
|
||
* Introduce headers for C++-only wrapper classes as other related C++ modules | ||
will need to interop with it. | ||
|
||
* Separate any initialization routines that depend on optional components into | ||
its own module/dependency (currently, things like `registerAllDialects` fall | ||
into this category). | ||
|
||
There are a lot of co-related issues of shared library linkage, distribution | ||
concerns, etc that affect such things. Organizing the code into composable | ||
modules (versus a monolithic `cpp` file) allows the flexibility to address many | ||
of these as needed over time. Also, compilation time for all of the template | ||
meta-programming in pybind scales with the number of things you define in a | ||
translation unit. Breaking into multiple translation units can significantly aid | ||
compile times for APIs with a large surface area. | ||
|
||
### Submodules | ||
|
||
Generally, the C++ codebase namespaces most things into the `mlir` namespace. | ||
However, in order to modularize and make the Python bindings easier to | ||
understand, sub-packages are defined that map roughly to the directory structure | ||
of functional units in MLIR. | ||
|
||
Examples: | ||
|
||
* `mlir.ir` | ||
* `mlir.passes` (`pass` is a reserved word :( ) | ||
* `mlir.dialect` | ||
* `mlir.execution_engine` (aside from namespacing, it is important that | ||
"bulky"/optional parts like this are isolated) | ||
|
||
In addition, initialization functions that imply optional dependencies should | ||
be in underscored (notionally private) modules such as `_init` and linked | ||
separately. This allows downstream integrators to completely customize what is | ||
included "in the box" and covers things like dialect registration, | ||
pass registration, etc. | ||
|
||
### Loader | ||
|
||
LLVM/MLIR is a non-trivial python-native project that is likely to co-exist with | ||
other non-trivial native extensions. As such, the native extension (i.e. the | ||
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol | ||
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py` | ||
and siblings which loads and re-exports it. This split provides a place to stage | ||
code that needs to prepare the environment *before* the shared library is loaded | ||
into the Python runtime, and also provides a place that one-time initialization | ||
code can be invoked apart from module constructors. | ||
|
||
To start with the `mlir/__init__.py` loader shim can be very simple and scale to | ||
future need: | ||
|
||
```python | ||
from _mlir import * | ||
``` | ||
|
||
### Limited use of globals | ||
|
||
For normal operations, parent-child constructor relationships are realized with | ||
constructor methods on a parent class as opposed to requiring | ||
invocation/creation from a global symbol. | ||
|
||
For example, consider two code fragments: | ||
|
||
```python | ||
|
||
op = build_my_op() | ||
|
||
region = mlir.Region(op) | ||
|
||
``` | ||
|
||
vs | ||
|
||
```python | ||
|
||
op = build_my_op() | ||
|
||
region = op.new_region() | ||
|
||
``` | ||
|
||
For tightly coupled data structures like `Operation`, the latter is generally | ||
preferred because: | ||
|
||
* It is syntactically less possible to create something that is going to access | ||
illegal memory (less error handling in the bindings, less testing, etc). | ||
|
||
* It reduces the global-API surface area for creating related entities. This | ||
makes it more likely that if constructing IR based on an Operation instance of | ||
unknown providence, receiving code can just call methods on it to do what they | ||
want versus needing to reach back into the global namespace and find the right | ||
`Region` class. | ||
|
||
* It leaks fewer things that are in place for C++ convenience (i.e. default | ||
constructors to invalid instances). | ||
|
||
### Use the C-API | ||
|
||
The Python APIs should seek to layer on top of the C-API to the degree possible. | ||
Especially for the core, dialect-independent parts, such a binding enables | ||
packaging decisions that would be difficult or impossible if spanning a C++ ABI | ||
boundary. In addition, factoring in this way side-steps some very difficult | ||
issues that arise when combining RTTI-based modules (which pybind derived things | ||
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM). | ||
|
||
|
||
## Style | ||
|
||
In general, for the core parts of MLIR, the Python bindings should be largely | ||
isomorphic with the underlying C++ structures. However, concessions are made | ||
either for practicality or to give the resulting library an appropriately | ||
"Pythonic" flavor. | ||
|
||
### Properties vs get*() methods | ||
|
||
Generally favor converting trivial methods like `getContext()`, `getName()`, | ||
`isEntryBlock()`, etc to read-only Python properties (i.e. `context`). It is | ||
primarily a matter of calling `def_property_readonly` vs `def` in binding code, | ||
and makes things feel much nicer to the Python side. | ||
|
||
For example, prefer: | ||
|
||
```c++ | ||
m.def_property_readonly("context", ...) | ||
``` | ||
|
||
Over: | ||
|
||
```c++ | ||
m.def("getContext", ...) | ||
``` | ||
|
||
### __repr__ methods | ||
|
||
Things that have nice printed representations are really great :) If there is a | ||
reasonable printed form, it can be a significant productivity boost to wire that | ||
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)). | ||
|
||
### CamelCase vs snake_case | ||
|
||
Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As | ||
a mechanical concession to Python style, this can go a long way to making the | ||
API feel like it fits in with its peers in the Python landscape. | ||
|
||
If in doubt, choose names that will flow properly with other | ||
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles). | ||
|
||
### Prefer pseudo-containers | ||
|
||
Many core IR constructs provide methods directly on the instance to query count | ||
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers. | ||
|
||
For example, a direct mapping of blocks within regions could be done this way: | ||
|
||
```python | ||
region = ... | ||
|
||
for block in region: | ||
|
||
pass | ||
``` | ||
|
||
However, this way is preferred: | ||
|
||
```python | ||
region = ... | ||
|
||
for block in region.blocks: | ||
|
||
pass | ||
|
||
print(len(region.blocks)) | ||
print(region.blocks[0]) | ||
print(region.blocks[-1]) | ||
``` | ||
|
||
Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate | ||
them to appropriate `__dunder__` methods and iterator wrappers in the bindings. | ||
|
||
Note that this can be taken too far, so use good judgment. For example, block | ||
arguments may appear container-like but have defined methods for lookup and | ||
mutation that would be hard to model properly without making semantics | ||
complicated. If running into these, just mirror the C/C++ API. | ||
|
||
### Provide one stop helpers for common things | ||
|
||
One stop helpers that aggregate over multiple low level entities can be | ||
incredibly helpful and are encouraged within reason. For example, making | ||
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly | ||
construct a SourceMgr can be quite nice. One stop helpers do not have to be | ||
mutually exclusive with a more complete mapping of the backing constructs. | ||
|
||
## Testing | ||
|
||
Tests should be added in the `test/Bindings/Python` directory and should | ||
typically be `.py` files that have a lit run line. | ||
|
||
While lit can run any python module, prefer to lay tests out according to these | ||
rules: | ||
|
||
* For tests of the API surface area, prefer | ||
[`doctest`](https://docs.python.org/3/library/doctest.html). | ||
* For generative tests (those that produce IR), define a Python module that | ||
constructs/prints the IR and pipe it through `FileCheck`. | ||
* Parsing should be kept self-contained within the module under test by use of | ||
raw constants and an appropriate `parse_asm` call. | ||
* Any file I/O code should be staged through a tempfile vs relying on file | ||
artifacts/paths outside of the test module. | ||
|
||
### Sample Doctest | ||
|
||
```python | ||
# RUN: %PYTHON %s | ||
|
||
""" | ||
>>> m = load_test_module() | ||
Test basics: | ||
>>> m.operation.name | ||
"module" | ||
>>> m.operation.is_registered | ||
True | ||
>>> ... etc ... | ||
Verify that repr prints: | ||
>>> m.operation | ||
<operation 'module'> | ||
""" | ||
|
||
import mlir | ||
|
||
TEST_MLIR_ASM = r""" | ||
func @test_operation_correct_regions() { | ||
// ... | ||
} | ||
""" | ||
|
||
# TODO: Move to a test utility class once any of this actually exists. | ||
def load_test_module(): | ||
ctx = mlir.ir.Context() | ||
ctx.allow_unregistered_dialects = True | ||
module = ctx.parse_asm(TEST_MLIR_ASM) | ||
return module | ||
|
||
|
||
if __name__ == "__main__": | ||
import doctest | ||
doctest.testmod() | ||
``` | ||
|
||
### Sample FileCheck test | ||
|
||
```python | ||
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck | ||
|
||
# TODO: Move to a test utility class once any of this actually exists. | ||
def print_module(f): | ||
m = f() | ||
print("// -----") | ||
print("// TEST_FUNCTION:", f.__name__) | ||
print(m.to_asm()) | ||
return f | ||
|
||
# CHECK-LABEL: TEST_FUNCTION: create_my_op | ||
@print_module | ||
def create_my_op(): | ||
m = mlir.ir.Module() | ||
builder = m.new_op_builder() | ||
# CHECK: mydialect.my_operation ... | ||
builder.my_op() | ||
return m | ||
``` |