
Framework for symbolic optimizations #122

Closed
ViralBShah opened this issue Jul 16, 2011 · 35 comments
Assignees
Labels
performance Must go faster speculative Whether the change will be implemented is speculative

Comments

@ViralBShah
Member

We need a framework to express certain mathematical optimizations in julia itself. These may be expressed as rules that are run after types have been inferred. Examples are:

  1. A' * B, A' \ B: Can be computed without computing the transpose by calling BLAS/LAPACK routines
  2. A - B + C .* D: Can be computed without temporaries
  3. A[m:n, p:q] + B: Avoid computing the ref.
  4. A[m:n, p:q] * B: Avoid computing the sref, and call DGEMM directly in cases where this is possible.
  5. [ a b c; d f; e]: Compute the entire concatenation in one shot.
  6. A[p,q] = B[r,s]: Temporary can be avoided by calling assign(A, sub(B, r, s), p, q)

In all cases, temporaries can be avoided. Either we can do it with compiler optimizations, or have the parser pass such expressions down to the runtime. If no matching runtime implementation for such a case is found, the existing base case should certainly be used.
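As an illustrative sketch of case 1 (in later Julia syntax; the helper name `at_mul_b` and the `Float64`-only signature are assumptions, not the eventual Base implementation): `A' * B` can be lowered to a single BLAS call with the transpose flag, so the transpose is never materialized.

```julia
using LinearAlgebra

# Hypothetical fused method: computes A' * B in one BLAS gemm call using
# the 'T' flag, instead of materializing the transpose first.
at_mul_b(A::Matrix{Float64}, B::Matrix{Float64}) = BLAS.gemm('T', 'N', 1.0, A, B)

A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]
at_mul_b(A, B) == A' * B   # true, with no explicit transpose of A
```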

@ghost ghost assigned JeffBezanson Jul 16, 2011
@JeffBezanson
Sponsor Member

#5 above is easy since it doesn't generalize (it only deals with 2 dimensions). How should it be called? Maybe

hvcat({a, b, c}, {d, f}, {e})

?

Of course a fallback definition can be provided so implementing it is only an optimization.
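A fallback along these lines, sketched against the `hvcat(rows, args...)` signature that was eventually committed (the name `hvcat_fallback` is illustrative), might look like:

```julia
# Fallback sketch: hcat each block row, then vcat the rows. `rows` gives
# the number of arguments in each block row, e.g. (3, 2, 1) for [a b c; d f; e].
function hvcat_fallback(rows::Tuple{Vararg{Int}}, xs...)
    i = 1
    blockrows = Any[]
    for n in rows
        push!(blockrows, hcat(xs[i:i+n-1]...))
        i += n
    end
    vcat(blockrows...)
end

hvcat_fallback((2, 2), [1 2], [3 4], [5 6], [7 8])  # == [1 2 3 4; 5 6 7 8]
```

With a generic path like this in place, a specialized one-shot `hvcat` is purely an optimization.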

Same idea applies to a'*b, a*b', etc. We could have aTb() and abT() methods.

For #2, we already parse runs of + and * to single calls, so +(as::Array...) can be implemented.
Is it worth having special methods for s*x+y etc.?
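For the n-ary elementwise case, a sketch of what `+(as::Array...)` buys (one output allocation instead of one temporary per pairwise `+`; the name `sum_arrays` is illustrative):

```julia
# n-ary elementwise sum: one pass, one output array, no intermediate temporaries.
function sum_arrays(as::Array{Float64}...)
    out = copy(as[1])
    for k in 2:length(as)
        for i in eachindex(out)
            out[i] += as[k][i]
        end
    end
    out
end

sum_arrays([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])  # == [9.0, 12.0], one allocation
```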

@ViralBShah
Member Author

Yes, #5 is easier - and the suggested syntax seems ok.

I guess aTb() and abT() are good starts for a first release, until a more general framework can be implemented. Ideally, BLAS allows you to efficiently do A[I,J]^T*B[I,J]^T + C[I,J]^T where I and J are ranges. I guess the indexing part can be handled through subarrays, but then the user has to use subarrays explicitly.

For #2, I was only thinking of element-wise operations for a start.

Think of a stencil operation. It will be essentially the element-wise stuff, but with some indexing thrown in. It would be nice to handle these cases with indexing. Perhaps we need to use subarrays more automatically in more places.

@JeffBezanson
Sponsor Member

Well I'm saying we can already handle element-wise a+b+c+d+....

What special methods do we want? I guess I could also make special versions of the operators where you know that the first and/or second argument can be overwritten.

@ViralBShah
Member Author

I was referring to element-wise a-b.*c+4*d.

@JeffBezanson
Sponsor Member

Seems to me passing this to something at run time amounts to handling arithmetic operators with an interpreter?

@ViralBShah
Member Author

Would be cool if there were a way to do this. I guess that for Dense Arrays, one can just hack it into the compiler as a special case.

@StefanKarpinski
Sponsor Member

The A*B' and A'*B cases could be handled by allowing cheap transposes, by storing strides for each dimension. That way A' would be a trivial operation that uses the same data as A and just puts a new matrix object around it with different strides. Then the multiply call could just pass dgemm the correct stride values and it would work.

Unfortunately, this presumes that you can do T = A' without any copying, which in the current implementation you can't. This brings us back to immutability vs. mutability issues...
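A minimal sketch of the zero-copy idea (the `StridedT` type here is illustrative only; Julia much later adopted a similar lazy wrapper, `LinearAlgebra.Transpose`):

```julia
# A transposed "view": shares the parent's data and swaps the index/stride
# metadata, so constructing it costs nothing and copies nothing.
struct StridedT{T} <: AbstractMatrix{T}
    parent::Matrix{T}
end
Base.size(t::StridedT) = reverse(size(t.parent))
Base.getindex(t::StridedT, i::Int, j::Int) = t.parent[j, i]

A = [1 2 3; 4 5 6]
At = StridedT(A)      # no copy of A's data
At[3, 2] == A[2, 3]   # true
```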

@ViralBShah
Member Author

I have often thought of this but am not sure the resulting complexity is worthwhile. Also, it annoys the user who knows what they are doing.

@StefanKarpinski
Sponsor Member

It really only annoys the user who knows what they're doing in Matlab and doesn't know what they're doing in Julia. There could be ways to force an actual transpose to be done. With the mutability vs. immutability stuff, this approach wouldn't apply just to transpose, but also to subarray refs — since those could safely be references into the same data as the original array with different strides — and many other cases too.

@ViralBShah
Member Author

Here's a suggestion for this: how about extending * to allow the following?

For A*B:
*(A,B)

For A'*B'+C', where each of the transposes may or may not happen.
*(A, transA, B, transB, C, transC)

We'd also want one more form for x'*A'*y'. Perhaps * can take a 7th argument that says what type of operation it is: a'b'+c' or x'A'y'.
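A hedged sketch of how such a flag-taking form could bottom out in a single BLAS call (handling only the transA/transB flags; `mul_trans` is a hypothetical name, and the transC case and the 7th argument are omitted):

```julia
using LinearAlgebra

# Proposed *(A, transA, B, transB, C) form mapped onto one gemm! call;
# the flags are 'N'/'T', exactly as in BLAS itself.
mul_trans(A, tA::Char, B, tB::Char, C) = BLAS.gemm!(tA, tB, 1.0, A, B, 1.0, copy(C))

A = [1.0 2.0; 3.0 4.0]
B = [1.0 0.0; 0.0 1.0]              # identity, so A' * B + 0 is just A'
mul_trans(A, 'T', B, 'N', zeros(2, 2)) == A'   # true
```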

@StefanKarpinski
Sponsor Member

Ahh! These all go away if we deal with mutability and allow cheap transposes.

@ViralBShah
Member Author

I don't like the idea of the cheap transpose. Cheap transposes are not always desirable, as they just move the problem elsewhere. Most fast matlab code is written with an assumption on stride and storage, and the cheap transpose just messes up that performance model. Also, every single operation has to be made stride-aware, and also will become a bit slower due to dealing with strides everywhere.

I'd be ok with a cheap transpose that is evaluated lazily, but not one that just changes the stride.

@JeffBezanson
Sponsor Member

We can have functions for A'*B'+C', x'*A'*y', etc., but I'd rather they not be part of * since they don't just multiply. We need other names for them.

@ViralBShah
Member Author

muladd() and bilinear()?

@ViralBShah
Member Author

I tried a simple experiment on randmatstat, minimizing the array allocations and concatenation with the following change:

    ab = randn(n, 2*n)
    cd = randn(n, 2*n)
    P = [ab cd]
    Q = [ab; cd]

It gives a 10% improvement right away, and we can perhaps get a bit more. Optimizing P'*P and Q'*Q will possibly give a bit more improvement.

JeffBezanson added a commit that referenced this issue Aug 11, 2011
  called as hvcat((rows...), arguments...) where the first argument is
  a tuple giving the number of arguments in each block row
addresses issue #122
also related to issue #38
@ViralBShah
Member Author

commit 361588d adds transposed multiplication.

@ViralBShah
Member Author

Things such as A[m:n, p:q] + B and A[m:n, p:q] * B can be done efficiently with SubArrays, but not automatically.

@o-jasper

Common Lisp has compiler macros; these should be fairly easy to implement. Basically, they are just macros, except that they may choose to do nothing, and they are intended for optimization.

If you do this, it might actually be more desirable to allow the compiler macros to run after type inference, so that that information is available. But then they might have to be careful not to break whatever has been inferred, or type inference would have to run again. The disadvantage, of course, is that such macros can do anything whatsoever, so care has to be taken in writing them. (And of course you can make a macro that turns conversion rules into compiler macros.)

Also, do you care about float-behavior specifics? If you wanted to, maybe using the types Real32/Real64 could 'mean' that something like log(1+x) is converted to log1p(x), or exp(x)-1 to expm1(x).
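The numerical point behind such a rewrite is real: for x below machine epsilon, forming 1 + x rounds away exactly the information log1p keeps.

```julia
# For tiny x, 1 + x rounds to exactly 1.0 in Float64, so log(1 + x)
# returns 0.0; log1p(x) computes log(1 + x) without forming 1 + x.
x = 1e-16
log(1 + x)   # 0.0: the small term is lost in the rounding
log1p(x)     # ≈ 1.0e-16, accurate to full precision
```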

@StefanKarpinski
Sponsor Member

I find it quite comical that no matter what language feature X is, the statement "Common Lisp has X" always holds.

@fph

fph commented Oct 28, 2012

We can have functions for A'*B'+C' [...] We need other names for them.

I think the standard name for this operation is axpy.

@tknopp
Contributor

tknopp commented Dec 29, 2013

Given that array views seem to be the way to go (see #5003), wouldn't it be consistent to also let the transpose operator return a view? I am not sure what to do when assigning the transpose to a new variable (i.e. x=y'), but this should be handled the same way as array views (which are the right choice IMHO).

@johnmyleswhite
Member

I believe there's another issue open for making transpose operations produce a new immutable Transposed type.

@jiahao
Member

jiahao commented Dec 29, 2013

#4774

@lindahua
Contributor

FWIW, I have been exploring something along these lines in Accelereval.jl. I am considering using it to replace Devectorize.jl. Progress is not very fast, as I only work on it when I have some free time.

This work is inspired by Theano, but takes advantage of Julia's multiple dispatch and metaprogramming to simplify the interface. The goal is to automatically devectorize expressions or turn them into BLAS calls whenever appropriate.

@jiahao
Member

jiahao commented Dec 29, 2013

We already have generic axpy! (#5189).

#4950 provides syr2k!/her2k! for rank-2k updates of the form C = a*A*B' + conj(a)*B*A' + b*C. The possibility of rewrite rules to turn expressions of this form into calls to syr2k!/her2k! was mentioned.

@tknopp
Contributor

tknopp commented Dec 29, 2013

#4774 is about how to handle transpose for 1d arrays. I would propose that transpose in general returns a view where the strides are reversed.

@lindahua your metaprogramming work is impressive!

@StefanKarpinski
Sponsor Member

Having slicing return views may actually help alleviate the issue that v'' is not of the same dimensionality as v'. If you follow along with my ArrayView pull request (#5003), when v::Array{T,1} you could have v'::ArrayView{T,2,1} and then know that since the original v is one-dimensional, v'' should also be one-dimensional. It's a bit sketchy and @lindahua's proposed contiguous array views may be necessary to make it work.

@malmaud
Contributor

malmaud commented Jan 5, 2014

Neal has been implementing some of these features in R recently: http://radfordneal.wordpress.com/2014/01/01/new-version-of-pqr-now-with-task-merging/. Might be worth taking some inspiration.

The new version extends use of this mechanism to merge transpose and matrix multiply operations — now t(A)%*%B does not actually compute the transpose of A, but instead calls a routine that directly computes the transpose of A times B.

@JeffBezanson
Sponsor Member

This is well-trod ground. I'm not sure I've heard this optimization called "task merging" before; it is usually called "stream fusion".

We could hack something into the compiler to recognize and replace certain patterns, I just don't find it very satisfying. Library-defined rewrite rules are probably the best option I'm aware of, but they still feel like a hack. An easier case might be high-latency operations like on DArrays, where you could do delayed evaluation at run time, all at the library level.

@mlubin
Member

mlubin commented Jan 6, 2014

What would these library-defined rewrite rules look like? Right now it's hard to implement something like this with a macro because inferred types aren't available at compile time. Solving this issue in a sufficiently general way will directly help JuMP.

@lindahua
Contributor

lindahua commented Jan 6, 2014

@StefanKarpinski mentioned the staged function idea on the mailing list.

@lindahua
Contributor

lindahua commented Jan 6, 2014

This might be a complicated issue: suppose the compiler managed to make the inferred types available to the "later-staged" macro; what if that macro somehow modifies the types?

@mlubin
Member

mlubin commented Jan 6, 2014

Operator overloading with lazy objects is a very intuitive way to do this "task merging". If only we could have some interface where these lazy objects can be computed once at compile time, so long as the compiler can guarantee the types of the objects from type inference (otherwise it's done at runtime).
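A toy sketch of that lazy-overloading approach (all names here are illustrative; elementwise * for vectors is assumed for the sketch, and a real design would fuse broadcast-style):

```julia
# Operators build a lazy expression tree; materialize() walks it once,
# fusing the whole expression into a single loop with no temporaries.
struct Lazy{F,A,B}
    f::F
    a::A
    b::B
end
Base.:*(a::Vector{Float64}, b::Vector{Float64}) = Lazy(*, a, b)  # elementwise, for this sketch
Base.:+(a::Vector{Float64}, b::Lazy) = Lazy(+, a, b)

ith(x::Vector{Float64}, i) = x[i]                 # leaf: plain element access
ith(x::Lazy, i) = x.f(ith(x.a, i), ith(x.b, i))   # node: apply op elementwise
len(x::Vector{Float64}) = length(x)
len(x::Lazy) = len(x.a)
materialize(x::Lazy) = [ith(x, i) for i in 1:len(x)]

a = [1.0, 2.0]; b = [3.0, 4.0]; c = [5.0, 6.0]
materialize(a + b * c)   # == [16.0, 26.0], computed in one fused loop
```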

@mlubin
Member

mlubin commented Feb 5, 2014

I keep coming back to this for JuMP. I essentially need a post-macro step that lets me write code based on the inferred types of objects in an expression tree. It's not feasible to write code that dispatches on the types of the objects or generates isa branches, because there's an exponential number of possible type combinations. How crazy is it to make this work in Julia?

@ViralBShah
Member Author

Much of this issue has been addressed in different ways: with staged functions, array views, and the various mul operators.

Keno pushed a commit that referenced this issue Oct 9, 2023
Reorganize the source files (fixes #79)