Add Python bindings guide.

Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes Tags: #mlir Differential Revision: https://reviews.llvm.org/D83527
jaebaek · Jul 10, 2020 · c20c196 · c20c196
1 parent 553dbb6
commit c20c196
Showing 1 changed file with 328 additions and 0 deletions.
diff --git a/mlir/docs/Bindings/Python.md b/mlir/docs/Bindings/Python.md
@@ -0,0 +1,328 @@
+# MLIR Python Bindings
+
+Current status: Under development and not enabled by default
+
+
+## Building
+
+### Pre-requisites
+
+* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
+  be located by CMake.
+* A relatively recent Python3 installation
+
+### CMake variables
+
+* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`
+
+  Enables building the Python bindings. Defaults to `OFF`.
+
+* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`
+
+  Links the native extension against the Python runtime library, which is
+  optional on some platforms. While setting this to `OFF` can yield some greater
+  deployment flexibility, linking in this way allows the linker to report
+  compile time errors for unresolved symbols on all platforms, which makes for a
+  smoother development workflow. Defaults to `ON`.
+
+* **`PYTHON_EXECUTABLE`**:`STRING`
+
+  Specifies the `python` executable used for the LLVM build, including for
+  determining header/link flags for the Python bindings. On systems with
+  multiple Python implementations, setting this explicitly to the preferred
+  `python3` executable is strongly recommended.
+
+
+## Design
+
+### Use cases
+
+There are likely two primary use cases for the MLIR python bindings:
+
+1. Support users who expect that an installed version of LLVM/MLIR will yield
+   the ability to `import mlir` and use the API in a pure way out of the box.
+
+2. Downstream integrations will likely want to include parts of the API in their
+   private namespace or specially built libraries, probably mixing it with other
+   python native bits.
+
+
+### Composable modules
+
+In order to support use case #2, the Python bindings are organized into
+composable modules that downstream integrators can include and re-export into
+their own namespace if desired. This forces several design points:
+
+* Separate the construction/populating of a `py::module` from `PYBIND11_MODULE`
+  global constructor.
+
+* Introduce headers for C++-only wrapper classes as other related C++ modules
+  will need to interop with it.
+
+* Separate any initialization routines that depend on optional components into
+  its own module/dependency (currently, things like `registerAllDialects` fall
+  into this category).
+
+There are a lot of co-related issues of shared library linkage, distribution
+concerns, etc that affect such things. Organizing the code into composable
+modules (versus a monolithic `cpp` file) allows the flexibility to address many
+of these as needed over time. Also, compilation time for all of the template
+meta-programming in pybind scales with the number of things you define in a
+translation unit. Breaking into multiple translation units can significantly aid
+compile times for APIs with a large surface area.
+
+### Submodules
+
+Generally, the C++ codebase namespaces most things into the `mlir` namespace.
+However, in order to modularize and make the Python bindings easier to
+understand, sub-packages are defined that map roughly to the directory structure
+of functional units in MLIR.
+
+Examples:
+
+* `mlir.ir`
+* `mlir.passes` (`pass` is a reserved word :( )
+* `mlir.dialect`
+* `mlir.execution_engine` (aside from namespacing, it is important that
+  "bulky"/optional parts like this are isolated)
+
+In addition, initialization functions that imply optional dependencies should
+be in underscored (notionally private) modules such as `_init` and linked
+separately. This allows downstream integrators to completely customize what is
+included "in the box" and covers things like dialect registration,
+pass registration, etc.
+
+### Loader
+
+LLVM/MLIR is a non-trivial python-native project that is likely to co-exist with
+other non-trivial native extensions. As such, the native extension (i.e. the
+`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
+(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
+and siblings which loads and re-exports it. This split provides a place to stage
+code that needs to prepare the environment *before* the shared library is loaded
+into the Python runtime, and also provides a place that one-time initialization
+code can be invoked apart from module constructors.
+
+To start with the `mlir/__init__.py` loader shim can be very simple and scale to
+future need:
+
+```python
+from _mlir import *
+```
+
+### Limited use of globals
+
+For normal operations, parent-child constructor relationships are realized with
+constructor methods on a parent class as opposed to requiring
+invocation/creation from a global symbol.
+
+For example, consider two code fragments:
+
+```python
+
+op = build_my_op()
+
+region = mlir.Region(op)
+
+```
+
+vs
+
+```python
+
+op = build_my_op()
+
+region = op.new_region()
+
+```
+
+For tightly coupled data structures like `Operation`, the latter is generally
+preferred because:
+
+* It is syntactically less possible to create something that is going to access
+  illegal memory (less error handling in the bindings, less testing, etc).
+
+* It reduces the global-API surface area for creating related entities. This
+  makes it more likely that if constructing IR based on an Operation instance of
+  unknown providence, receiving code can just call methods on it to do what they
+  want versus needing to reach back into the global namespace and find the right
+  `Region` class.
+
+* It leaks fewer things that are in place for C++ convenience (i.e. default
+  constructors to invalid instances).
+
+### Use the C-API
+
+The Python APIs should seek to layer on top of the C-API to the degree possible.
+Especially for the core, dialect-independent parts, such a binding enables
+packaging decisions that would be difficult or impossible if spanning a C++ ABI
+boundary. In addition, factoring in this way side-steps some very difficult
+issues that arise when combining RTTI-based modules (which pybind derived things
+are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).
+
+
+## Style
+
+In general, for the core parts of MLIR, the Python bindings should be largely
+isomorphic with the underlying C++ structures. However, concessions are made
+either for practicality or to give the resulting library an appropriately
+"Pythonic" flavor.
+
+### Properties vs get*() methods
+
+Generally favor converting trivial methods like `getContext()`, `getName()`,
+`isEntryBlock()`, etc to read-only Python properties (i.e. `context`). It is
+primarily a matter of calling `def_property_readonly` vs `def` in binding code,
+and makes things feel much nicer to the Python side.
+
+For example, prefer:
+
+```c++
+m.def_property_readonly("context", ...)
+```
+
+Over:
+
+```c++
+m.def("getContext", ...)
+```
+
+### __repr__ methods
+
+Things that have nice printed representations are really great :)  If there is a
+reasonable printed form, it can be a significant productivity boost to wire that
+to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).
+
+### CamelCase vs snake_case
+
+Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
+a mechanical concession to Python style, this can go a long way to making the
+API feel like it fits in with its peers in the Python landscape.
+
+If in doubt, choose names that will flow properly with other
+[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).
+
+### Prefer pseudo-containers
+
+Many core IR constructs provide methods directly on the instance to query count
+and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.
+
+For example, a direct mapping of blocks within regions could be done this way:
+
+```python
+region = ...
+
+for block in region:
+
+  pass
+```
+
+However, this way is preferred:
+
+```python
+region = ...
+
+for block in region.blocks:
+
+  pass
+
+print(len(region.blocks))
+print(region.blocks[0])
+print(region.blocks[-1])
+```
+
+Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate
+them to appropriate `__dunder__` methods and iterator wrappers in the bindings.
+
+Note that this can be taken too far, so use good judgment. For example, block
+arguments may appear container-like but have defined methods for lookup and
+mutation that would be hard to model properly without making semantics
+complicated. If running into these, just mirror the C/C++ API.
+
+### Provide one stop helpers for common things
+
+One stop helpers that aggregate over multiple low level entities can be
+incredibly helpful and are encouraged within reason. For example, making
+`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
+construct a SourceMgr can be quite nice. One stop helpers do not have to be
+mutually exclusive with a more complete mapping of the backing constructs.
+
+## Testing
+
+Tests should be added in the `test/Bindings/Python` directory and should
+typically be `.py` files that have a lit run line.
+
+While lit can run any python module, prefer to lay tests out according to these
+rules:
+
+* For tests of the API surface area, prefer
+  [`doctest`](https://docs.python.org/3/library/doctest.html).
+* For generative tests (those that produce IR), define a Python module that
+  constructs/prints the IR and pipe it through `FileCheck`.
+* Parsing should be kept self-contained within the module under test by use of
+  raw constants and an appropriate `parse_asm` call.
+* Any file I/O code should be staged through a tempfile vs relying on file
+  artifacts/paths outside of the test module.
+
+### Sample Doctest
+
+```python
+# RUN: %PYTHON %s
+
+"""
+  >>> m = load_test_module()
+Test basics:
+  >>> m.operation.name
+  "module"
+  >>> m.operation.is_registered
+  True
+  >>> ... etc ...
+
+Verify that repr prints:
+  >>> m.operation
+  <operation 'module'>
+"""
+
+import mlir
+
+TEST_MLIR_ASM = r"""
+func @test_operation_correct_regions() {
+  // ...
+}
+"""
+
+# TODO: Move to a test utility class once any of this actually exists.
+def load_test_module():
+  ctx = mlir.ir.Context()
+  ctx.allow_unregistered_dialects = True
+  module = ctx.parse_asm(TEST_MLIR_ASM)
+  return module
+
+
+if __name__ == "__main__":
+  import doctest
+  doctest.testmod()
+```
+
+### Sample FileCheck test
+
+```python
+# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck
+
+# TODO: Move to a test utility class once any of this actually exists.
+def print_module(f):
+  m = f()
+  print("// -----")
+  print("// TEST_FUNCTION:", f.__name__)
+  print(m.to_asm())
+  return f
+
+# CHECK-LABEL: TEST_FUNCTION: create_my_op
+@print_module
+def create_my_op():
+  m = mlir.ir.Module()
+  builder = m.new_op_builder()
+  # CHECK: mydialect.my_operation ...
+  builder.my_op()
+  return m
+```