Staged bsxfun and other broadcast operations #3100
Conversation
Very very cool!! And at 200 lines of code, admirably compact. I think it's pretty likely that we will want this.
Yes, very nice. Thanks for tackling this – it's been hanging over our heads ever since you convinced us it was the right thing to do. How about calling the module and file something less inscrutable like
As a non-native speaker of Matlab and a native speaker of English, I really prefer the term broadcast over
At least you can read
I'm excited about this, and as usual you've done a remarkable job. My main concern was about performance with small-array inputs, where the overhead of computing sizes might be quite noticeable. I did some profiling, using
with the following highlights: out of 1100 total samples,
I tried writing an alternative version of
The main difference is that this goes to some effort to avoid allocation, but it's actually not that much longer or harder to understand. The results (after warm-up):
So this might be worthwhile. I suspected it was going to be harder to write a more efficient version of
This one is admittedly less pretty, but the timing results are surprising:
So this work seems to suggest we should be able to get a further, roughly 2x performance increase for small-array inputs.
This is a great piece of work!
Really happy about the great response, everyone! @StefanKarpinski: Yeah, I've been feeling kinda the same way :) About the naming, I'm not a big fan of
@timholy: Thanks for taking the time to go over those functions! I've incorporated your versions into the pull request. I've primarily been trying to get decent performance with big matrices so far, so there might be more places that could benefit from a gentle hand when it comes to small ones. One possibility to push that aspect even further could be to generate more of the code depending on the number of arrays and their dimensions, but at the same time that would probably make it less maintainable. It's probably better to see what we can get by simple means first.
It seems like everyone dislikes the name |
In MATLAB, If this function in Julia can accept more than two arguments, then |
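[Editor's note] For readers following along, the shape rule under discussion — each dimension's sizes must agree, except that a size of 1 expands to match, for any number of arguments — can be sketched in a few lines. This is a pure-Python sketch with a hypothetical helper name (`broadcast_shape`); the thread's actual code is Julia, and trailing singleton padding is assumed as in MATLAB/Julia:

```python
def broadcast_shape(*shapes):
    """Combine shapes bsxfun-style: along each dimension the sizes
    must agree, except that a size of 1 expands to match."""
    ndims = max(len(s) for s in shapes)
    # Pad each shape with trailing singleton dimensions (column-major
    # convention: missing trailing dimensions have size 1).
    padded = [s + (1,) * (ndims - len(s)) for s in shapes]
    result = []
    for dim_sizes in zip(*padded):
        non_singleton = {n for n in dim_sizes if n != 1}
        if len(non_singleton) > 1:
            raise ValueError("dimensions must match or be 1: %r" % (dim_sizes,))
        result.append(non_singleton.pop() if non_singleton else 1)
    return tuple(result)
```

Because the rule is associative over any number of operands, accepting more than two array arguments costs nothing extra, which is part of the argument against the two-argument name.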
+1 for
Ok, I can do the renaming and take one final look over it, then it should be ready to merge. One thought: As I wrote it, There's still a number of things that would be nice to add, but those could come in subsequent pull requests:
I'm also hoping to find a nice abstraction to the broadcasting machinery, so that users can implement things similar to
Another question: Is there a trick to make the makefile recognize new test scripts? I added
I'm also on board for renaming it to
I think the
Ah, right, I forgot the operator. How about
You want the operator first so that you can provide it using the do-block syntax. It also just seems better there, imo.
Ok, I think that this should be good to go now. I did the rename (haven't touched the original
Right. I always forget about that.
Now I'll have to change my handle to @broadcastfan. Jokes aside, I'm very happy to see this development. As an exercise in learning Julia, I have been playing around with some operator overloading implementations of forward and reverse-mode automatic differentiation. Broadcasting operators require some care when propagating derivatives through them: you end up having to either broadcast or sum the derivatives. I'm hoping to find a nice way to use @toivoh's solution to do this.
I thought about that. Still a good inside reference :-)
This is really nice, and yes +1 for calling it
It would be nice to also have an update to the manual and standard library reference as part of this pull request, so that new users can actually learn and start using this.
@bsxfan: It would definitely be nice if we can arrange things so that you can reuse the broadcast machinery for AD, and I think that it should be quite doable. At least for forward AD, I guess that if you can do AD on
The machinery already does support devectorization in the sense that you can pass custom elementwise functions such as
I hope that I can get some time soon to update the documentation.
@toivoh, to stimulate your thinking about updating (in-place) broadcasting operations, here is an example of the kind of thing I need for AD: https://github.com/bsxfan/ad4julia/blob/master/Procrustes.jl As mentioned, I sometimes need to broadcast and sometimes to reduce dimensionality by summing. But the thing that kept me busy for a while was when and how to do the in-place updates.
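[Editor's note] The "broadcast or sum" duality mentioned above has a simple core: in reverse mode, the adjoint of broadcasting an operand up to a larger shape is summing the incoming gradient back down over the expanded dimensions. A minimal pure-Python sketch for the 2-D case, with a hypothetical name (`sum_to_shape`) and the assumption that each target dimension is either 1 or equal to the gradient's:

```python
def sum_to_shape(grad, shape):
    """Reverse-mode adjoint of broadcasting: sum a 2-D gradient
    (list of lists) down to the operand's original shape.
    Each target dimension must be 1 (was broadcast) or full size."""
    rows, cols = len(grad), len(grad[0])
    trows, tcols = shape
    out = [[0.0] * tcols for _ in range(trows)]
    for i in range(rows):
        for j in range(cols):
            # A dimension that was broadcast (size 1) accumulates the
            # sum; a full-size dimension maps one-to-one.
            out[i % trows][j % tcols] += grad[i][j]
    return out
```

With this, propagating a gradient through an elementwise op whose operand was a row or column vector reduces to one `sum_to_shape` call per operand.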
I added |
… = f() Also provide separate broadcast!_function
Ok, I realized that it was too cute since |
So, what else should I do before this gets merged? Squash the commits again, anything else? |
I think it's fine as is. |
This is fun:

```julia
julia> rand(5,5) .+ [1:5]
5x5 Float64 Array:
 1.87304  1.60196  1.67241  1.29966  1.28922
 2.55065  2.6901   2.31729  2.18837  2.67263
 3.31203  3.44544  3.34805  3.19786  3.40516
 4.90888  4.3661   4.88753  4.14251  4.49846
 5.31663  5.58011  5.70678  5.36226  5.4865

julia> rand(5,5) .+ [1:5]'
5x5 Float64 Array:
 1.28892  2.52568  3.0037   4.18982  5.24107
 1.10544  2.91998  3.52396  4.34637  5.44386
 1.52712  2.58849  3.52277  4.30855  5.22099
 1.39967  2.66229  3.74268  4.64086  5.93869
 1.58634  2.55684  3.78382  4.36793  5.44376
```
Unfortunately the

```julia
julia> A = rand(5,5);

julia> A ./ sum(A,1)
ERROR: argument dimensions must match
 in promote_shape at operators.jl:161
 in ./ at array.jl:909

julia> A ./ sum(A,2)
ERROR: argument dimensions must match
 in promote_shape at operators.jl:161
 in ./ at array.jl:909
```

That would be extremely handy for matrix normalization.
Thanks! |
This can totally replace bsxfun, right? We should get rid of the old one. |
@StefanKarpinski: Ah, of course. There's an old method for
@JeffBezanson: I should probably write some kind of fallback to deal with non-strided arrays first. Then we can get rid of the old
EDIT: sorry for problem 1, that should not be a problem. |
Yeah, I don't see a problem with broadcasting for
About
It would be nice to have broadcast for the boolean operators as well. |
Definitely. I think that I need to figure out how to read and write BitArrays with reasonable efficiency in a broadcast loop first, though; I don't want to introduce any major performance regressions along with the broadcast stuff. Will have to look at the
My long-term plan is to have an abstract type
There could be different implementations for different array types, and perhaps for special cases, such as an
Then
Looking at the
@toivoh Here are two feature requests for your broadcasting tools:
You can already do this by e.g.
It's perfectly safe to use the destination as a source, but of course there will be some efficiency loss. I could provide a version of
I must admit that I am quite unfamiliar with BLAS, so I'm not sure in which kinds of cases it could be used. If someone who is more familiar wants to special-case some broadcast operations to call into BLAS, I could provide some guidance for the broadcast side of things.
@toivoh I've just compared your suggested broadcasting update solution,
Now define:
And now call into BLAS:
Notes:
This is probably getting killed by the use of a lambda in |
One can achieve a faster broadcast-based solution like this:
I think the main reason why this is still slower than the handcoded loop is that the latter avoids creating a temporary 500 by 1000 matrix to store the RHS. |
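[Editor's note] The handcoded loop being compared against is essentially what BLAS `ger` does: a rank-1 update `d += alpha * x * y'` that never materializes the outer-product temporary. A minimal pure-Python sketch of that fused loop (hypothetical name; `d` is a list of lists, updated in place):

```python
def ger_inplace(d, x, y, alpha=1.0):
    """Rank-1 update d += alpha * x * y' without building the
    len(x)-by-len(y) outer-product temporary."""
    for i, xi in enumerate(x):
        row = d[i]
        axi = alpha * xi       # hoist the per-row factor out of the inner loop
        for j, yj in enumerate(y):
            row[j] += axi * yj
    return d
```

The point of the comparison above: a generic broadcast still allocates the right-hand side before adding it, while this fusion touches `d` once per element and allocates nothing.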
@ViralBShah, @andreasnoackjensen I think the ger! function from https://github.com/bsxfan/ad4julia/blob/master/BlasX.jl should be added to Base.LinAlg.BLAS (and a syr! function too, even though it is not in the ad4julia package). Also, perhaps the
+1 for adding
@bsxfan: can you try
instead of
You are correct about
@toivoh Unfortunately, predeclaring |
For this case, unless
After several conference deadlines, I now have some time to spend on Julia. I am considering revisiting Devectorize.jl to let it support broadcasting, so that this can be written as @devec d + x .* y
I'm still surprised that it doesn't seem to get inlined. I could probably provide macro versions of
@lindahua: Do you think that it would make sense to collaborate on a common back end to generate the broadcast loops? I still want to extend the machinery in
This patch adds an implementation of `bsxfun(op::Function, As::Array...)` that generates inner loop functions depending on `op`, the number of arguments `As`, and the dimension of the result.

The machinery is also used to provide:

- `.+ .- .* ./` for `Array` arguments
- `bsxfun!(op::Function, As::Array...)`, which stores the result in `As[1]`. Edit: Removed this one, pending a rewrite with separate output argument.
- `get_bsxfun(op::Function)` and `get_bsxfun!(op::Function)` to instantiate a broadcasting operation from a given per-element operation
- `bsx_getindex(A::AbstractArray, inds::Array...)`, a pointwise broadcasting indexing operation as described in Broadcasting pointwise indexing operator #2591, and a matching `bsx_setindex!(A::AbstractArray, X::Array, inds::Array...)`

It should be straightforward to extend the functions to work with `StridedArray` arguments instead of just `Array`, but I'm uncertain of the proper interface to work with subarrays using manual linear indexing.

The code generates inner loop functions such as
The example above is for `bsxfun(+, A, B)` with a two-dimensional result. A linear index is updated using strides for each array, which should allow accommodating both broadcasting and subarrays.

On my machine, the timing example from #2991 gives
i.e. the generated `.-` is only 16% slower than the handcoded example in #2991.

It's a somewhat complex piece of code (that generates code that generates code), and I think that it warrants some discussion. I could polish this forever, but I think that I need some reactions before moving forward. What do you guys think? Would something like this be useful? What parts of it?

Edit: Renamed `bsxfun` and `bsx` to `broadcast`.
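[Editor's note] The "linear index updated using strides" idea from the patch description can be illustrated outside Julia: give each operand its own linear index, and assign stride 0 to any dimension of size 1 so the same element is reused across the broadcast. A pure-Python sketch for flat, column-major 2-D buffers (hypothetical function; the real generated code specializes such a loop per operation and dimensionality):

```python
def broadcast_add_2d(a, a_shape, b, b_shape):
    """a .+ b for flat, column-major 2-D buffers. Each operand keeps
    its own linear index, advanced by per-dimension strides; a
    broadcast (size-1) dimension gets stride 0, so its single
    element is revisited instead of advancing."""
    def strides(shape):
        s1 = 0 if shape[0] == 1 else 1          # element stride along dim 1
        s2 = 0 if shape[1] == 1 else shape[0]   # element stride along dim 2
        return s1, s2

    m = max(a_shape[0], b_shape[0])             # result rows
    n = max(a_shape[1], b_shape[1])             # result columns
    (a1, a2), (b1, b2) = strides(a_shape), strides(b_shape)
    out = []
    ia2 = ib2 = 0                # linear index at the start of each column
    for _ in range(n):
        ia, ib = ia2, ib2
        for _ in range(m):
            out.append(a[ia] + b[ib])
            ia += a1
            ib += b1
        ia2 += a2
        ib2 += b2
    return out
```

The same index-walking scheme also covers subarrays, since a SubArray view is just different (nonzero) strides over the parent's buffer.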