Skip to content

Meeting Notes

Saul Shanabrook edited this page Jan 8, 2019 · 15 revisions

2019.01.08 Development Meeting

Hameer wrote up a design doc: https://github.com/Quansight-Labs/uarray/blob/master/docs/index.rst

Hameer explanation of it

Dictionary of backends. Registering them would register event handlers to backend.

Strict rules: shapes are tuple of indices. Getitem always takes integer.

ua.backend only affects creation routines. Dispatching only takes place based on types.

del x to remove it. Maybe look at locals to see what should be kept around.

All you need for computation is getitem, setitem, or construct.

Ragged arrays? ua_shape returns None if dimension is jagged.

Sparse arrays? Don't define setitem. Would define custom implementations of other things.

Pearu: User should be able to define backends to try. Have different layers of implementations so that we can select different backends.

Pearu: Small vs large arrays, different contexts. Maybe have costs associated?

Maybe arbitrary dispatching function...

User should be able to choose. Maybe they should specify list of backends.

Ragged arrays?? Let's just make a new shape method for ragged arrays. don't overload existing shape We need to understand more algorithms and descriptions of ragged arrays.

Use case for Dask?

In with statement use this particular replacement...

What order do we want? Want things to correspond constraints. When we have multiple how do we decide?

Most specific wins? Throw an error and make the user choose.

Match against subclasses against base classes.

What takes precedence when?

Context manager? global?

2018.12.11 Development meeting

This call is independent of Mathematics of Arrays work, to focus on implementation.

Should this call be held in the open? Yes.

For the array interface compatibility, it should probably coexist with the current approach: https://github.com/Quansight-Labs/uarray/pull/76

Should we have concat be a primary operator? If the goal is to avoid having conditional statement, to make things simpler, we could make concat not reduce down in the ONF. We could also have some sense of partial functions generate multiply loops, like is done in the current C compiler:

see this function in the psi compiler

test(array A^2 <5 4>, array C^2 <4 6>)
{
  const array B^2 <2 4>=<1 2 3 4 1 2 3 4>;

  A=B cat (<1 2> drop C);
}

Chris added graph visualizations for the ASTs and a MoA parser.

Next steps

  • Come up with plan for what we want to achieve during next meeting
  • Come up with some use cases on slack

2018.12.06 Team meeting

Saul's notes

  • Paper is in!!
  • Lenore is pursuing research on arrays of arrays
  • Lenore would like to give some background on her experience building this type of system over the next month or so
  • New people will come into work on this and they will have to get up to speed quickly. We need material to help educate people as they come in
  • Outline of material lenore would like to teach:
    • Given algorithm
    • express in MoA
    • derive
    • build ONF
    • predict performance based on machine
    • compare to other optimizing compilers
  • Pearu: We all have different backgrounds less. We should be able to do understand very basic definitions in the same way. Should provide these strict math definitions.
    • Give order of papers to read
  • DNF: Just the math, AST simplified
  • ONF: How to implement optimally, how to scale it properly based on hardware
  • Chris: Going over examples has been very helpful to understand MoA.
  • Lenore: Current uarray work is prototype. We need to reformulate all of it with more group consensus on how to implement it.

2018.11.10 Saul Notes

How to represent dataframe? We need primitives to get this like struct. Struct can be thought of as list but with labels. So instead of Sequence(length, getitem) you have a Mapping, which has keys()...

So how do we represent this? If we have some mapping form, we also need to register that is a Sequence/array type. Sort of like how we have different forms but all have Initializer.

This speaks to having better names. Look to category theory to help us here instead of relying on Python.

2018.11.09 Travis David Saul call

High level way: Everything is high level, non machine specific.

You can imagine a factory that takes high level syntax and produces bits that executes.

But this is a hard concept, since we have many users acting at different levels.

So today we need lots of language and lots of compilers and how do they work together?

The real world is we have runtime interpreter layers. Where we compile to graph/AST and that ends up being interpreted.

The runtime right now is ad hoc. Crazy scientists went out and built some objects, then built some functions and we end up with a runtime.

So in an ideal setting we define a nice crystalline structure. We need to support edge cases. So we need to have lots of escape hatches and lay breadcrumbs, to let these things connect.

So everything may not be complete, but we can have parts of it that are. We need something to put together these different parts and snap them together.

The use case is we want to enable pydata libraries to have an interface than code to that doesn't restrict them to implementations.

We need python level API as well C level API.

Release series of modules. ulinalg ustats upoly ufft

SciPy is the API

Cupy can be a backend

"You can write things and get it faster and run on anything." is the carrot. end result is we move to a world where we can reuse work more easily and allow greater interop.

How do we show reduction?

Calculated matrices: https://jacobgil.github.io/deeplearning/tensor-decompositions-deep-learning use uarray for this

TODO:

  • for talk:
    • Inner/outer product reduction example
  • uarray:
    • Clarify high level goal -> build scipy/numpy compat modules thatt support differentt backends
  • xnd
    • None

2018.11.08 David - Saul call

We really have three levels of systems, from low level to high level:

  • ./uarray/machinary.py: base algebra/replacement system (currently using pattern matching with matchpy for this)
  • ./uarray/core.py: categories/types in the algebra (these are things like Arrays, callables, contents, AST statements)
  • ./uarray/moa.py: Mathematics of arrays algebra
  • ./uarray/ast.py: compiling to backend algebra. right now we just have one for Python ASTs

We want a nice beautiful structure exposed, but we also need release valves at every level, when we need to break that for practical purposes. Right now, I am working on using Python mypy typing to harden this structure and precisely define the core and so on top of that moa and ast. But, in the end that is just a compile time understanding of what is going on for the developer to know what is allowed where. Under the covers, you can always slip down to "raw" matchpy finagling if need be.

For example, this is how we define the Sequence matchpy operation now:

@operation
def Sequence(length: CContent, getitem: CGetitem) -> CArray:
    ...

It is typed properly so that when you compose it, it checks to make sure it's args are the rights categories.

2018.11.08 Weekly Call

Saul's update

Last week, I was working on compiling uarray to Python AST. I made enough progress with this to get some example working.

This past week, I have been working on making the system easier to develop on. Work is active in this PR https://github.com/Quansight-Labs/uarray/pull/33

  • Adding tutorial to readme for MatchPy and our usage of it
  • Thinking about the "categories" of operations, describing those, and being able to type matchpy operations using Python's typing so that we have can have compile time verification that we are composing things correctly
  • Adding replace_scan function to give iterable to all replacements on expressions
  • Renaming operations according to more math-ey (lambda calculus) terms so that they are distinct from Python concepts
  • I also did some background reading on "Compiling to Categories" (http://conal.net/papers/compiling-to-categories/) which I think is parallel to our work. They compile haskell code to FPGA.

Once all that is done, I plan to continue work on inner and outer product, making some examples showing their replacement and understanding their execution in other forms. That's my goal to have these examples for the talk next week.

2018.11.02 David - Saul Call

Conclusion: I should specify how this Meta system works. What is required to define things correctly.

Category theory might yield helpful names.

declared categories

you must define functions over these

What i am doing

build general functor mapping between python expressions and backends

intermediate category that we transform python into

run that through functor to get target representation

category of categories: registry

replacement rule = functor

Category A

Category A' In space A:

f(a) = g(b)

then

f(a) = g(b) in category b

natural transformation

forgetful functor

2018.10.26 Travis - Saul call

2018.10.25 Meeting

Actions:

  • Going to bi-weekly meeting for the larger discussion

Other Decisions:

  • We need to produce a pattern for how to port the NumPy ecosystem to uarray -- show examples and lead
  • We will work to get that pattern defined by the end of the year
  • Going to have separate discussions for:
    • Relationship to category theory (David C and Tony F)
    • NumPy API --> Defined uarray subset (Travis and Hameer and Pearu)
    • NumPy Ecosystem (PyData, SciPy, etc) as Operations build-out (Travis, Saul, Pearu, Hameer)
    • MatchPy development (Saul, etc.)
  • Keep discussion public
  • Use XND has an example for how to port SciPy code to uarray

There was a lot of discussion about the generality of what we are working on and continuing work on helping each other with a common vocabulary.

We hypothesize that the concept of an array as "shape, psi-function, and mtype" with optimized implementations and generalized ufuncs is the foundation of an array protocol.

The "matchpy" rule-reduction mechanism with Operations serving as "QFunctions" continues to hold water.

2018.09.17 Raggedness in Psi Calculus

Question: Are we trying to map between ragged arrays and contiguous memory?

What if limit to fixed dimensionality but variable shapes? Vectors of vectors. This could cover most cases. If we do this, we could just store offsets.

But maybe there aren't applications of more general case, because software doesn't support it. So maybe if we create software that supports this then people will use it.

The most general case is where the arrangement of arrays is completely random. But in applications you usually have some order.

Types are like sub arrays. Integers with different bit widths are like sub arrays of bits.