-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BLOG] Add python-moa-introduction #633
Open
MarsBarLee
wants to merge
7
commits into
develop
Choose a base branch
from
python-moa-introduction
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+234
−0
Open
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
fc2e349
Add ipynb to md conversion
MarsBarLee 7f1fd9b
Update image filepaths
MarsBarLee e7f4ce2
Update file paths
MarsBarLee 2d7ed4f
Update description
MarsBarLee fb1179b
Add images
MarsBarLee c9cc8cf
Update image file paths
MarsBarLee 67cd4d4
change blog thumbnail
gabalafou File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,233 @@ | ||
--- | ||
title: 'MOA: a theory for composable and verifiable tensor computations' | ||
author: chris-ostrouchov | ||
published: April 17, 2019 | ||
description: 'Python-moa (mathematics of arrays) is an approach to a high level tensor compiler that is based on the work of Lenore Mullin. The MOA principles allow for high level reductions, that other compilers will miss, and for optimization of algorithms as a whole.' | ||
category: [PyData ecosystem] | ||
featuredImage: | ||
src: /posts/python-moa-introduction/frontend.png | ||
alt: 'Clip from a diagram of the python-moa compiler.' | ||
hero: | ||
imageSrc: /posts/python-moa-introduction/blog_hero_var1.svg | ||
imageAlt: 'An illustration of a brown hand holding up a microphone, with some graphical elements highlighting the top of the microphone.' | ||
--- | ||
|
||
Python-moa (mathematics of arrays) is an approach to a high level tensor | ||
compiler that is based on the work of [Lenore | ||
Mullin](https://www.albany.edu/ceas/lenore-mullin.php) and her | ||
[dissertation](https://www.researchgate.net/publication/308893116_A_Mathematics_of_Arrays). | ||
A high level compiler is necessary because there are many optimizations | ||
that a low level compiler such as `gcc` will miss. It is trying to solve | ||
many of the same problems as other technologies such as the [taco | ||
compiler](http://tensor-compiler.org/) and the [xla | ||
compiler](https://www.tensorflow.org/xla). However, it takes a much | ||
different approach than others guided by the following principles. | ||
|
||
1. What is the shape? Everything has a shape. scalars, vectors, arrays, | ||
operations, and functions. | ||
2. What are the given indicies and operations required to produce a | ||
given index in the result? | ||
|
||
Having a compiler that is guided upon these principles allows for high | ||
level reductions that other compilers will miss and allows for | ||
optimization of algorithms as a whole. Keep in mind that MOA is **NOT** | ||
a compiler. It is a theory that guides compiler development. Since | ||
[python-moa](https://github.com/Quansight-Labs/python-moa) is based on | ||
theory we get unique properties that other compilers cannot guarantee: | ||
|
||
- No out of bounds array accesses | ||
- A computation is determined to be either valid or invalid at compile | ||
time | ||
- The computation will always reduce to a deterministic minimal form | ||
(dnf) (see | ||
[church-rosser](https://en.wikipedia.org/wiki/Church%E2%80%93Rosser_theorem) | ||
property) | ||
- All MOA operations are composable (including black box functions and | ||
[gufuncs](https://docs.scipy.org/doc/numpy-1.13.0/reference/c-api.generalized-ufuncs.html)) | ||
- Arbitrary high level operations will compile down to a minimal | ||
backend instruction set. If the shape and indexing of a given | ||
operation is known it can be added to python-moa. | ||
|
||
## Frontend | ||
|
||
Lenore Mullin originally developed a [moa | ||
compiler](https://github.com/saulshanabrook/psi-compiler/) in the 90s | ||
with programs that used a symbolic syntax heavily inspired by | ||
[APL](<https://en.wikipedia.org/wiki/APL_(programming_language)>) | ||
([example | ||
program](https://github.com/saulshanabrook/psi-compiler/blob/master/examples/ex1.m)). | ||
This work was carried into python-moa initially with a lex/yacc compiler | ||
with an example program below. | ||
|
||
```python | ||
from moa.frontend import parse | ||
|
||
context = parse('<0> psi (tran (A ^ <n m> + B ^ <k l>))') | ||
``` | ||
|
||
Upon pursuing this approach it became apparent that MOA should not | ||
require that a new syntax be developed since it is only a theory! So a | ||
pythonic interface to MOA was developed that expressed the same ideas | ||
which look much like the current numeric python libraries. Ideally MOA | ||
is hidden from the user. The python-moa compiler is broken into several | ||
pieces each which their own responsibilities: shape, DNF, and ONF. | ||
|
||
```python | ||
from moa.frontend import LazyArray | ||
|
||
A = LazyArray(shape=('n', 'm'), name='A') | ||
B = LazyArray(shape=('k', 'l'), name='B') | ||
|
||
expression = (A + B).T.reduce('+')[0] | ||
expression | ||
``` | ||
|
||
![Diagram of the python-moa compiler.](/posts/python-moa-introduction/frontend.png) | ||
|
||
## Shape Calculation | ||
|
||
The shape calculation is responsible for calculating the shape at every | ||
step of the computation. This means that operations have a shape. Note | ||
that the compiler handles symbolic shapes thus the exact shape does not | ||
need to be known, only the dimension. After the shape calculation step | ||
we can guarantee that an algorithm is a valid program and there will be | ||
no out of bound memory accesses. Making MOA extremely compelling for | ||
[FPGAs](https://en.wikipedia.org/wiki/Field-programmable_gate_array) and | ||
compute units with a minimal OS. If an algorithm makes it past this | ||
stage and fails then it is an issue with the compiler not the algorithm. | ||
|
||
```python | ||
expression.visualize(stage='shape') | ||
``` | ||
|
||
![Diagram of shape calculation.](/posts/python-moa-introduction/shape_calculation.png) | ||
|
||
## Denotational Normal Form (DNF) | ||
|
||
The DNF\'s responsibility is to reduce the high level MOA expression to | ||
the minimal and optimal machine independent computation. This graph has | ||
all of the indexing patterns of the computation and resulting shapes. | ||
Notice that several operations disappear in this stage such a transpose | ||
because transpose is simply index manipulation. | ||
|
||
```python | ||
expression.visualize(stage='dnf') | ||
``` | ||
|
||
![Diagram of Denotational Normal Form.](/posts/python-moa-introduction/denotational_normal_form.png) | ||
|
||
## Operational Normal Form (ONF) | ||
|
||
The ONF is the stage of the compiler that will have to be the most | ||
flexible. At its current stage the ONF is a naive compiler that does not | ||
perform many important optimizations such as [PSI | ||
reduction](https://www.researchgate.net/publication/264758384_Effective_data_parallel_computation_using_the_Psi_calculus) | ||
which reduces the number of loops in the calculation, loop ordering, and | ||
minimize the number of accumulators. MOA has ideas of dimension lifting | ||
which make parallization and optimizing for cache sizes much easier. | ||
Additionally algorithms must be implemented differently for sparse, | ||
column major, row major. The ONF stage is responsible for any | ||
\"optimal\" machine dependent implementation. \"optimal\" will vary from | ||
user to user and thus will have to allow for multiple programs: optimal | ||
single core, optimal parallel, optimal gpu, optimal low memory, etc. | ||
|
||
```python | ||
print(expression.compile(use_numba=True, include_conditions=False)) | ||
``` | ||
|
||
@numba.jit | ||
def f(A, B): | ||
|
||
|
||
n = A.shape[0] | ||
|
||
m = A.shape[1] | ||
|
||
k = B.shape[0] | ||
|
||
l = B.shape[1] | ||
|
||
_a21 = numpy.zeros(()) | ||
|
||
_a19 = numpy.zeros(()) | ||
|
||
_a21 = 0 | ||
|
||
for _i10 in range(0, m, 1): | ||
|
||
_a21 = (_a21 + (A[(0, _i10)] + B[(0, _i10)])) | ||
|
||
_a19[()] = _a21 | ||
return _a19 | ||
|
||
## Performance | ||
|
||
MOA excels at performing reductions and reducing the amount of actual | ||
work done. You will see that the following algorithm only requires the | ||
first index of the computation. Making the naive implementation `1000x` | ||
more expensive for `1000x1000` shaped array. The following benchmarks | ||
have been performed on my laptop with an intel i5-4200U. However, more | ||
benchmarks are always available on the [Travis | ||
CI](https://travis-ci.org/Quansight-Labs/python-moa) (these benchmarks | ||
test python-moa\'s weaknesses). You will see with the benchmarks that if | ||
**any** indexing is required MOA will be significantly faster unless you | ||
hand optimize the numerical computations. | ||
|
||
```python | ||
import numpy | ||
import numba | ||
|
||
n, m = 1000, 1000 | ||
|
||
exec(expression.compile(backend='python', use_numba=True, include_conditions=False)) | ||
|
||
A = numpy.random.random((n, m)) | ||
B = numpy.random.random((n, m)) | ||
``` | ||
|
||
Here we execute the MOA optimized code with the help of | ||
[numba](https://github.com/numba/numba) which is a JIT LLVM compiler for | ||
python. | ||
|
||
```python | ||
%%timeit | ||
|
||
f(A=A, B=B) | ||
``` | ||
|
||
2.14 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) | ||
|
||
The following numpy computation is obviously the worst case expression | ||
that you could write but this brings up the point that often times the | ||
algorithm is expressed differently than the implementation. This is one | ||
of the problems that MOA hopes to solve. | ||
|
||
```python | ||
%%timeit | ||
|
||
(A + B).T.sum(axis=0)[0] | ||
``` | ||
|
||
2.74 ms ± 29.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) | ||
|
||
We notice that even with the optimized version MOA is faster. This is | ||
mostly due to the transpose operation the numpy performs that we have no | ||
way around. | ||
|
||
```python | ||
%%timeit | ||
|
||
(A[0] + B[0]).T.sum(axis=0) | ||
``` | ||
|
||
6.67 µs ± 91.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) | ||
|
||
## Conclusions | ||
|
||
I hope that this walk through has shown the promising results that the | ||
MOA theory can bring to tensor computations and the python ecosystem as | ||
a whole. Please feel free to try out the project at [Quansight | ||
Labs/python-moa](https://github.com/Quansight-Labs/python-moa). I hope | ||
that this work can allow for the analysis and optimization of algorithms | ||
in a mathematically rigorous way which allows users to express their | ||
algorithms in an implementation independent manner. |
1 change: 1 addition & 0 deletions
1
apps/labs/public/posts/python-moa-introduction/blog_hero_var1.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+76.8 KB
apps/labs/public/posts/python-moa-introduction/denotational_normal_form.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indented code does not work in MDX.
Also, this looks like output to me, so we have the same issue here that we have described before: