
implement Ac_mul_B(Vector, Matrix) to avoid temporary #360

Closed
goszlanyi opened this issue Aug 16, 2016 · 35 comments
Labels
help wanted (Extra attention is needed)

Comments

@goszlanyi

goszlanyi commented Aug 16, 2016

Let mx be a 3x3 matrix, and vv be an nx3 matrix.

Up to now, v=vv[row,:] was a row vector, i.e. a 1x3 matrix,
so the matrix multiplication v*mx worked and returned a 1x3 row vector.

In Julia 0.5.0 this has changed:
the row slice v=vv[row,:] is now a real 1-dimensional vector.

This is a welcome change.
But as a side effect, it has become more obvious and painful
that v*mx does not work for real 1-dimensional vectors.

I know that transpose(mx)*v is a workaround,
but I think requiring this code rewrite is unnecessary.

Considering the original definition of matrix multiplication,
summing over the inner index should naturally work in this case.
A good example is Fortran's matmul(v,mx): it just works.

I would greatly appreciate your opinion,
and any pointers to your future plans.
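
To illustrate the change described above, a minimal sketch (assuming Julia 0.5 semantics, with small example sizes):

mx = rand(3,3)      # 3x3 matrix
vv = rand(5,3)      # nx3 matrix (n = 5 here)

v = vv[1,:]         # now a 3-element Vector, no longer a 1x3 Matrix
v * mx              # errors: v is not treated as a row vector
transpose(mx) * v   # the workaround: returns a 3-element Vector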

@andreasnoack
Member

What about either doing

v'mx

or

v=vv[row:row,:]

?
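
For illustration (a sketch, with mx and vv as defined above), both forms keep the multiplication working, though each returns a 1x3 matrix rather than a 1-dimensional vector:

v = vv[1,:]
v' * mx           # 1x3 Array{Float64,2}
vv[1:1,:] * mx    # 1x3 Array{Float64,2}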

@goszlanyi
Author

goszlanyi commented Aug 16, 2016

Yes, these are all working tricks,
but conceptually they are not clean.

I think Julia is still struggling with some Matlab-like behaviour.
Having true 1-dimensional vectors is very important,
and with 0.5.0 Julia made a big step in this direction.
It would be nice to reach a really consistent state of its own.

@andreasnoack
Member

How does Fortran handle complex conjugation in matmul(Vector,Matrix)? Is x*A the same as A'x or conj(A'x)?

@ararslan
Member

Personally I don't see v'mx as a trick; it seems like the sensible way to go about this. If you think of two vectors in some space, the proper way to multiply them would be to multiply one by the transpose of the other, as vectors from a purely mathematical standpoint don't necessarily have an "orientation" (i.e. row or column).

I think Julia is still struggling with some Matlab-like behaviour.

Actually, what we had prior to APL-style slicing was more similar to Matlab, which makes a distinction between column and row vectors.

@goszlanyi
Author

goszlanyi commented Aug 16, 2016

@andreasnoack

As far as I see, there is no implicit complex conjugation involved.

This is what the Metcalf-book says:
matmul(matrix_a,matrix_b) performs matrix multiplication.
For numeric arguments three cases are possible:

i) matrix_a has shape (n,m) and matrix_b has shape (m,k).
The result has shape (n,k) and element (i,j) has the value
sum( matrix_a(i,:) * matrix_b(:,j) )

ii) matrix_a has shape (m) and matrix_b has shape (m,k).
The result has shape (k) and element (j) has the value
sum( matrix_a * matrix_b(:,j) )

iii) matrix_a has shape (n,m) and matrix_b has shape (m).
The result has shape (n) and element (i) has the value
sum( matrix_a(i,:) * matrix_b )
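
For case (ii), a rough Julia analogue of that definition (illustrative only; the function name is not from the thread):

# result has shape (k); element (j) is sum(matrix_a * matrix_b(:,j))
matmul_vec_mat(v, mx) = [sum(v .* mx[:, j]) for j in 1:size(mx, 2)]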

@andreasnoack
Member

Hm. Thanks. That was a surprising choice. In any case, I think this is very much #42 territory.

@goszlanyi
Author

Also note that in Fortran, transpose is defined only for arrays of rank two.
In Fortran-speak, creating a one-column matrix from a 1-dimensional vector is not a transpose.

@goszlanyi
Author

@ararslan
I agree that from the current options v'mx seems to be the best.
But is there a performance penalty?

@ararslan
Member

Performance penalty compared to what?

@goszlanyi
Author

goszlanyi commented Aug 16, 2016

Compared to storing predefined mxT=mx' and using mxT*v,
that also returns a nice 1-dimensional vector.

@ararslan
Member

I wouldn't think so offhand (though I'm certainly no expert), but you could check with @time. :)

@KristofferC
Member

No performance penalty, no.

@goszlanyi
Author

goszlanyi commented Aug 16, 2016

I have just tested.
There is a significant performance penalty.

n = Int(10e6)
vv = rand(n,3)
mx = rand(3,3)
mxT = mx'

f1(vv,mx) = (@time for i=1:n; y=vv[i,:]'*mx; end)
f2(vv,mxT) = (@time for i=1:n; y=mxT*vv[i,:]; end)

f1(vv,mx)
  4.843726 seconds (120.00 M allocations: 5.215 GB, 8.36% gc time)
  4.836846 seconds (120.00 M allocations: 5.215 GB, 8.33% gc time)

f2(vv,mxT)
  2.714492 seconds (80.00 M allocations: 3.278 GB, 9.10% gc time)
  2.709794 seconds (80.00 M allocations: 3.278 GB, 9.07% gc time)

@andreasnoack
Member

andreasnoack commented Aug 16, 2016

Just checked and v'A actually calls the fallback in operators.jl so it allocates an unnecessary temporary. That could easily be avoided though.
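
A minimal sketch of such a specialization, restricted to Float64 and not the actual Base implementation: dispatch v'*A straight to GEMV instead of building the row-matrix temporary and calling GEMM.

import Base: Ac_mul_B
# hypothetical sketch (real case only): compute A'v with GEMV and reshape
# the result into the 1xn matrix that v'*A currently returns
Ac_mul_B(v::Vector{Float64}, A::Matrix{Float64}) = reshape(BLAS.gemv('T', A, v), 1, size(A, 2))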

@goszlanyi
Author

I have made a mistake.
Of course n should also be passed as a function argument.

f1(n,vv,mx) = (@time for i=1:n; y=vv[i,:]'*mx; end)
f2(n,vv,mxT) = (@time for i=1:n; y=mxT*vv[i,:]; end)

It brings a speedup to both, but the relative performance penalty becomes even worse:

f1(n,vv,mx)
  3.529541 seconds (100.00 M allocations: 4.768 GB, 11.86% gc time)
  3.434869 seconds (100.00 M allocations: 4.768 GB, 8.80% gc time)

f2(n,vv,mxT)
  1.490498 seconds (60.00 M allocations: 2.831 GB, 11.13% gc time)
  1.507337 seconds (60.00 M allocations: 2.831 GB, 10.67% gc time)

@ararslan
Member

Dunno if it'll make a difference but you should try moving @time outside of the functions and using @inbounds on the loops.
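
Something along these lines (a sketch of that suggestion; the timings then go at the call site):

f1(n, vv, mx)  = (@inbounds for i = 1:n; y = vv[i,:]' * mx;  end)
f2(n, vv, mxT) = (@inbounds for i = 1:n; y = mxT * vv[i,:];  end)

@time f1(n, vv, mx)
@time f2(n, vv, mxT)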

@goszlanyi
Author

@ararslan
I have done it. The results are basically the same.

@goszlanyi
Author

@andreasnoack
Does it mean that you can cure the performance penalty while keeping the current syntax?

@andreasnoack
Member

If by y=vv[i,:]'*mx you mean the current syntax, then it should be a yes.

@goszlanyi
Author

Yes.
At present your suggestion seems to be the best,
and if there is no performance penalty involved, then I would happily use it.

@andreasnoack
Member

Hm. Maybe I spoke too soon. It seems like OpenBLAS' GEMM is slower than GEMV:

julia> A = randn(3,3);

julia> x = randn(3);

julia> xt = x';

julia> let A = A, x = x
       @time for i in 1:10^6; BLAS.gemv('T', 1.0, A, x); end
       end
  0.113836 seconds (2.00 M allocations: 122.070 MB, 8.78% gc time)

julia> let A = A, xt = xt
       @time for i in 1:10^6; BLAS.gemm('N', 'N', 1.0, xt, A); end
       end
  0.188009 seconds (2.00 M allocations: 137.329 MB, 7.56% gc time)

which I don't think is reasonable.

@goszlanyi
Author

Even so, your test code is instructive for me.

@andreasnoack changed the title from "vector-matrix product à la Fortran's matmul" to "Implement Ac_mul_B(Vector, Matrix) to avoid temporary ~~vector-matrix product à la Fortran's matmul~~" Aug 17, 2016
@andreasnoack changed the title to "Implement Ac_mul_B(Vector, Matrix) to avoid temporary (was "vector-matrix product à la Fortran's matmul")" Aug 17, 2016
@andreasnoack
Member

I've updated the title. I think the remaining aspects of this discussion are fully covered by #42.

@StefanKarpinski changed the title from "Implement Ac_mul_B(Vector, Matrix) to avoid temporary (was "vector-matrix product à la Fortran's matmul")" to "implement Ac_mul_B(Vector, Matrix) to avoid temporary" Aug 17, 2016
@goszlanyi
Author

goszlanyi commented Aug 17, 2016

@andreasnoack @StefanKarpinski
My original issue (and title) was about making v*mx work
when v is a real 1-dimensional vector and not a 1xn matrix,
and returning the result as a real 1-dimensional vector.

Is this your present target?

Or is it just making v'mx work efficiently (without creating a temporary)
but still returning the result as a 1xn matrix?

@goszlanyi
Author

goszlanyi commented Aug 17, 2016

It seems that this simple code does what I miss.
It is fast and it returns a real 1-dimensional vector.

import Base.*
*(v::Vector{Float64},mx::Matrix{Float64}) = BLAS.gemv('T',mx,v)

v = rand(3)
mx = rand(3,3)

v*mx
3-element Array{Float64,1}:
 0.786774
 0.821454
 0.444747

Update:
Unfortunately, this is not the general solution.
Integer vectors and matrices are not handled by BLAS.gemv.
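
A generic fallback could cover the element types BLAS does not, at the cost of a plain Julia loop (a sketch only; the function name is illustrative, not a proposed Base method):

function vecmat(v::AbstractVector, mx::AbstractMatrix)
    length(v) == size(mx, 1) || throw(DimensionMismatch("inner dimensions must match"))
    R = promote_type(eltype(v), eltype(mx))
    out = zeros(R, size(mx, 2))
    for j = 1:size(mx, 2), i = 1:size(mx, 1)
        out[j] += v[i] * mx[i, j]
    end
    return out
end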

@andreasnoack
Member

Or is it just making v'mx work efficiently (without creating a temporary) but still returning the result as a 1xn matrix?

Yes. The idea was that this issue tracks the problem with the temporary. For your other suggestion, please refer to #42.

@Jutho
Contributor

Jutho commented Aug 21, 2016

I don't understand this issue. v*mx is not standard mathematical notation, right? Why not just write it as you suggest to implement it: BLAS.gemv('T',mx,v) is obtained by writing mx.'*v, or just mx'*v if you're working over the reals.

@goszlanyi
Author

I respectfully disagree.

v*mx is standard mathematical notation that contracts
the last (and only) index of v and the first index of mx.

Otherwise we should also leave mx*v undefined
in the case when v is a true 1-d vector.

@Jutho
Contributor

Jutho commented Aug 21, 2016

I don't think there is one standard mathematical definition for *; it's highly context dependent. But you are right to suggest other definitions than the one currently adopted in Julia, which I agree is indeed a bit hard to capture in one "mathematical" operation.

Can I assume that your suggestion generalises to a multilinear algebra context as a kind of default tensor contraction operator: contract the last index of the first object with the first index of the second, irrespective of the rank of the two tensors involved?

@goszlanyi
Author

goszlanyi commented Aug 21, 2016

I did not dare to go as far as multilinear algebra.
I have just suggested clearly separating the three types of products in linear algebra.
See #42.

@Jutho
Contributor

Jutho commented Aug 22, 2016

What are the three types of product in linear algebra? I still don't see how vector times matrix is a standard mathematical product in any given context.

Either one works in MATLAB style, where everything is matrices and * always denotes matrix multiplication without discussion, or you work in some more abstract setting and you indeed need to define what * means. In abstract linear algebra, I know of inner and outer products, but most of the time * in code is neither of those two.

In an abstract setting, matrices correspond to linear mappings from vectors to vectors. So in mx*v I read * as function application. Composing such linear mappings gives rise to matrix-matrix multiplication. So in m1*m2 I read * as function composition. Given that these two operations are rather close, I think it is ok to use the same symbol for both.

There is also the concept of linear forms / covectors, which map vectors to numbers. These can be represented as one-dimensional objects and are isomorphic to vectors via the inner product: to any linear form f, there corresponds a vector w such that f(v) = dot(w,v) for any v. You can also compose such a linear form with a linear mapping (represented by a matrix), which is the operation I think you are referring to. But I don't see why this would be a standard use of * in mathematics. Such linear forms are isomorphic to vectors, but not identical, and thus Julia currently forces you to act with the transpose on a vector in order to get the corresponding covector (well, not exactly, but that is indeed the point of issue #42).

I am not defending this choice as ideal or optimal, but I also don't see how v*mx would in any way be more standard. If anything, I would argue that you actually need the dot product for this operation, so that it should be written as dot(v,mx). Maybe that's actually a possibility which is open, since currently dot is not defined for a combination of a vector and a matrix argument.
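
For the real case, that last suggestion could look roughly like this (purely illustrative, not an existing Base method):

import Base: dot
# hypothetical: treat dot(v, mx) as the covector-matrix contraction,
# returning a plain vector; real element types only
dot(v::Vector{Float64}, mx::Matrix{Float64}) = mx' * v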

@Jutho closed this as completed Aug 22, 2016
@Jutho reopened this Aug 22, 2016
@Jutho
Contributor

Jutho commented Aug 22, 2016

My apologies, wrong button.

@goszlanyi
Author

goszlanyi commented Aug 22, 2016

I find all that you wrote instructive.
Maybe it should be copied to #42, keeping the present issue to track
just the problem with the temporary, as suggested by @andreasnoack.

@Jutho
Contributor

Jutho commented Aug 22, 2016

Yes indeed. My apologies for leading the original discussion astray. I'll make a similar post there and agree that, for now, v'mx should work efficiently and produce a 1xn matrix.

@StefanKarpinski added and removed the help wanted (Extra attention is needed) label Oct 27, 2016
@andreasnoack
Member

Fixed by JuliaLang/julia#19670

@KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024