---
title: 'MOA: a theory for composable and verifiable tensor computations'
author: chris-ostrouchov
published: April 17, 2019
description: 'Python-moa (mathematics of arrays) is an approach to a high level tensor compiler that is based on the work of Lenore Mullin. The MOA principles allow for high level reductions that other compilers will miss, and for optimization of algorithms as a whole.'
category: [PyData ecosystem]
featuredImage:
src: /posts/python-moa-introduction/frontend.png
alt: 'Clip from a diagram of the python-moa compiler.'
hero:
imageSrc: /posts/python-moa-introduction/blog_hero_var1.svg
imageAlt: 'An illustration of a brown hand holding up a microphone, with some graphical elements highlighting the top of the microphone.'
---

Python-moa (mathematics of arrays) is an approach to a high level tensor
compiler that is based on the work of [Lenore
Mullin](https://www.albany.edu/ceas/lenore-mullin.php) and her
[dissertation](https://www.researchgate.net/publication/308893116_A_Mathematics_of_Arrays).
A high level compiler is necessary because there are many optimizations
that a low level compiler such as `gcc` will miss. Python-moa tries to
solve many of the same problems as other technologies such as the [taco
compiler](http://tensor-compiler.org/) and the [xla
compiler](https://www.tensorflow.org/xla). However, it takes a much
different approach, guided by the following principles:

1. What is the shape? Everything has a shape: scalars, vectors, arrays,
   operations, and functions.
2. What are the given indices and operations required to produce a
   given index in the result?
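
As a toy illustration of these two questions (my own sketch, not python-moa
code), consider the transpose of a matrix `X` with symbolic shape `<n m>`:
its shape is the reversed shape `<m n>`, and producing the result at index
`(i, j)` only requires reading `X[j, i]`.

```python
# Toy sketch of the two guiding questions applied to transpose
# (hypothetical illustration, not the python-moa implementation).
shape_X = ('n', 'm')  # symbolic shape <n m>

# 1. What is the shape? Transpose reverses the shape: <m n>.
shape_tran_X = tuple(reversed(shape_X))

# 2. Which input index produces a given output index (i, j)?
#    tran(X)[i, j] is defined as X[j, i]; indexing is pure bookkeeping.
def tran_index(i, j):
    return (j, i)

print(shape_tran_X)      # ('m', 'n')
print(tran_index(0, 5))  # (5, 0)
```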

Having a compiler that is guided by these principles allows for high
level reductions that other compilers will miss and allows for
optimization of algorithms as a whole. Keep in mind that MOA is **NOT**
a compiler. It is a theory that guides compiler development. Since
[python-moa](https://github.com/Quansight-Labs/python-moa) is based on
theory, we get unique properties that other compilers cannot guarantee:

- No out of bounds array accesses
- A computation is determined to be either valid or invalid at compile
  time
- The computation will always reduce to a deterministic minimal form
  (the DNF; see the
  [Church-Rosser](https://en.wikipedia.org/wiki/Church%E2%80%93Rosser_theorem)
  property)
- All MOA operations are composable (including black box functions and
  [gufuncs](https://docs.scipy.org/doc/numpy-1.13.0/reference/c-api.generalized-ufuncs.html))
- Arbitrary high level operations will compile down to a minimal
  backend instruction set. If the shape and indexing of a given
  operation are known, it can be added to python-moa.

## Frontend

Lenore Mullin originally developed a [moa
compiler](https://github.com/saulshanabrook/psi-compiler/) in the 90s
with programs that used a symbolic syntax heavily inspired by
[APL](<https://en.wikipedia.org/wiki/APL_(programming_language)>)
([example
program](https://github.com/saulshanabrook/psi-compiler/blob/master/examples/ex1.m)).
This work was initially carried into python-moa as a lex/yacc compiler;
an example program is shown below.

```python
from moa.frontend import parse

context = parse('<0> psi (tran (A ^ <n m> + B ^ <k l>))')
```

Upon pursuing this approach it became apparent that MOA should not
require that a new syntax be developed, since it is only a theory! So a
pythonic interface to MOA was developed that expresses the same ideas
and looks much like the current numeric python libraries. Ideally MOA
is hidden from the user. The python-moa compiler is broken into several
pieces, each with its own responsibility: shape, DNF, and ONF.

```python
from moa.frontend import LazyArray

A = LazyArray(shape=('n', 'm'), name='A')
B = LazyArray(shape=('k', 'l'), name='B')

expression = (A + B).T.reduce('+')[0]
expression
```

![Diagram of the python-moa compiler.](/posts/python-moa-introduction/frontend.png)

## Shape Calculation

The shape calculation is responsible for calculating the shape at every
step of the computation. This means that operations have a shape. Note
that the compiler handles symbolic shapes thus the exact shape does not
need to be known, only the dimension. After the shape calculation step
we can guarantee that an algorithm is a valid program and there will be
no out of bound memory accesses. Making MOA extremely compelling for
[FPGAs](https://en.wikipedia.org/wiki/Field-programmable_gate_array) and
compute units with a minimal OS. If an algorithm makes it past this
stage and fails then it is an issue with the compiler not the algorithm.

```python
expression.visualize(stage='shape')
```

![Diagram of shape calculation.](/posts/python-moa-introduction/shape_calculation.png)
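
To make the symbolic shape step more concrete, here is a minimal sketch (my
own illustration, not the python-moa API) of how the shape and compile time
conditions for `(A + B).T` could be derived from the symbolic shapes `<n m>`
and `<k l>`:

```python
# Hypothetical sketch of symbolic shape propagation, not python-moa itself.
def shape_add(a, b):
    # Elementwise + requires identical shapes; with symbolic shapes this
    # becomes a set of compile time conditions such as n == k and m == l.
    if len(a) != len(b):
        raise ValueError('dimension mismatch: invalid at compile time')
    conditions = [f'{x} == {y}' for x, y in zip(a, b) if x != y]
    return a, conditions

def shape_transpose(a):
    # Transpose reverses the shape and adds no conditions.
    return tuple(reversed(a)), []

shape, conditions = shape_add(('n', 'm'), ('k', 'l'))
shape, _ = shape_transpose(shape)
print(shape)       # ('m', 'n')
print(conditions)  # ['n == k', 'm == l']
```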

## Denotational Normal Form (DNF)

The DNF's responsibility is to reduce the high level MOA expression to
the minimal and optimal machine independent computation. This graph has
all of the indexing patterns of the computation and the resulting
shapes. Notice that several operations disappear in this stage, such as
transpose, because transpose is simply index manipulation.

```python
expression.visualize(stage='dnf')
```

![Diagram of Denotational Normal Form.](/posts/python-moa-introduction/denotational_normal_form.png)
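
For the expression above, the DNF tells us that only row `0` of `A` and `B`
is ever touched: `((A + B).T.reduce('+'))[0]` is the sum over `i` of
`(A + B).T[i, 0]`, which equals the sum over `i` of `A[0, i] + B[0, i]`. A
quick plain numpy check of that reduction (my own sanity check, not
python-moa output):

```python
import numpy as np

# Plain numpy sanity check of the DNF reduction (not python-moa output):
# summing (A + B).T over its first axis and taking element 0 only needs
# row 0 of A and B.
n, m = 5, 7
A = np.random.random((n, m))
B = np.random.random((n, m))

full = (A + B).T.sum(axis=0)[0]
reduced = sum(A[0, i] + B[0, i] for i in range(m))
assert np.isclose(full, reduced)
```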

## Operational Normal Form (ONF)

The ONF is the stage of the compiler that will have to be the most
flexible. At its current stage the ONF is a naive compiler that does not
perform many important optimizations, such as [Psi
reduction](https://www.researchgate.net/publication/264758384_Effective_data_parallel_computation_using_the_Psi_calculus)
(which reduces the number of loops in the calculation), loop reordering,
and minimizing the number of accumulators. MOA has ideas of dimension
lifting which make parallelization and optimizing for cache sizes much
easier. Additionally, algorithms must be implemented differently for
sparse, column major, and row major data. The ONF stage is responsible
for any "optimal" machine dependent implementation. "Optimal" will vary
from user to user and thus the ONF will have to allow for multiple
programs: optimal single core, optimal parallel, optimal GPU, optimal
low memory, etc.

```python
print(expression.compile(use_numba=True, include_conditions=False))
```

```python
@numba.jit
def f(A, B):
    n = A.shape[0]
    m = A.shape[1]
    k = B.shape[0]
    l = B.shape[1]
    _a21 = numpy.zeros(())
    _a19 = numpy.zeros(())
    _a21 = 0
    for _i10 in range(0, m, 1):
        _a21 = (_a21 + (A[(0, _i10)] + B[(0, _i10)]))
    _a19[()] = _a21
    return _a19
```
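
The ONF description above mentions dimension lifting. As I understand it, the
idea is to split an axis of size `m` into a pair of axes `<blocks block_size>`
so that each block can be reduced independently, which maps naturally onto
separate cores or cache tiles. A rough numpy sketch of the idea (my own
illustration, not ONF output):

```python
import numpy as np

# Rough sketch of dimension lifting (hypothetical illustration, not ONF
# output): lift an axis of size m into <blocks block_size> so each block
# gets its own accumulator and can be reduced independently.
m, block_size = 1000, 250
x = np.random.random(m)

lifted = x.reshape(m // block_size, block_size)  # shape <4 250>
partials = lifted.sum(axis=1)                    # one accumulator per block
assert np.isclose(partials.sum(), x.sum())
```
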
## Performance

MOA excels at performing reductions and reducing the amount of actual
work done. You will see that the following algorithm only requires the
first index of the computation, making the naive implementation `1000x`
more expensive for a `1000x1000` shaped array. The following benchmarks
have been performed on my laptop with an Intel i5-4200U. However, more
benchmarks are always available on the [Travis
CI](https://travis-ci.org/Quansight-Labs/python-moa) (these benchmarks
test python-moa's weaknesses). You will see from the benchmarks that if
**any** indexing is required MOA will be significantly faster unless you
hand-optimize the numerical computations.
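
A rough way to see where the `1000x` comes from (back of the envelope
accounting, not a measurement): the naive expression computes every element
of the `n x m` sum `A + B` and all of its column reductions before keeping a
single scalar, while the reduced form only touches row `0` of `A` and `B`.

```python
# Back of the envelope operation count (ignoring constants and the transpose):
n, m = 1000, 1000
naive_ops = n * m   # (A + B).T.sum(axis=0)[0] touches every element of A + B
moa_ops = m         # the reduced form only reads A[0, :] and B[0, :]
print(naive_ops // moa_ops)  # 1000
```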

```python
import numpy
import numba

n, m = 1000, 1000

exec(expression.compile(backend='python', use_numba=True, include_conditions=False))

A = numpy.random.random((n, m))
B = numpy.random.random((n, m))
```

Here we execute the MOA optimized code with the help of
[numba](https://github.com/numba/numba), an LLVM-based JIT compiler for
Python.

```python
%%timeit

f(A=A, B=B)
```

2.14 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The following numpy computation is obviously the worst case expression
that you could write, but it brings up the point that oftentimes the
algorithm is expressed differently than the implementation. This is one
of the problems that MOA hopes to solve.

```python
%%timeit

(A + B).T.sum(axis=0)[0]
```

2.74 ms ± 29.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We notice that MOA is faster even than the hand-optimized numpy version
below. This is mostly due to the transpose operation that numpy
performs, which we have no way around.

```python
%%timeit

(A[0] + B[0]).T.sum(axis=0)
```

6.67 µs ± 91.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

## Conclusions

I hope that this walkthrough has shown the promising results that the
MOA theory can bring to tensor computations and the python ecosystem as
a whole. Please feel free to try out the project at [Quansight
Labs/python-moa](https://github.com/Quansight-Labs/python-moa). I hope
that this work enables the analysis and optimization of algorithms in a
mathematically rigorous way, while allowing users to express their
algorithms in an implementation independent manner.