^ is slow #2741
Good point. This should be noted in the performance tips. This is clearly on the roadmap for improvement; it is not that the vectorized code performance is bad, but we need better memory allocation and GC. Also check out the …
Hi @jtravs, if you could show a specific code example, it would help me determine whether there is a performance bug that could be fixed.
Hi Jeff (and others), thanks for the responses. Here is the code I first noticed this issue on:

```julia
function f(in::Array{Float64,2}, out::Array{Float64,2})
    out = in.^3
end

function g(in::Array{Float64,2}, out::Array{Float64,2})
    for i=1:size(in,2)
        for j=1:size(in,1)
            out[j,i] = in[j,i]*in[j,i]*in[j,i]
        end
    end
end

A = rand(8192, 200);
P = similar(A)
@time f(A, P);
@time f(A, P);
@time g(A, P);
@time g(A, P);
Q = similar(A)
f(A,P);
g(A,Q);
println(all(Q == P));
```

I get: …
One source of your performance trouble is that your …
I realize my notion of what is or isn't intuitive in programming languages may not be typical, but I don't see how this could possibly do anything else. But, yes, John is totally right about these points.
The problem is that intuitively accessible and logically consistent are largely distinct properties: Julia might not be able to function in any other way, but the fact that …
The real problem is that taking a number to a general power is much slower than 3 multiplies.
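The gap can be sketched directly (a rough illustration only: exact timings depend on the libm in use, and the broadcast dot syntax below is modern Julia rather than what the thread's era supported):

```julia
# Cubing via the power operator historically went through the libm pow path,
# while explicit multiplies are just two floating-point multiplications each.
cube_pow(A) = A .^ 3
cube_mul(A) = A .* A .* A

A = rand(8192, 200)
@assert cube_pow(A) ≈ cube_mul(A)   # same values, different code paths

@time cube_pow(A);
@time cube_mul(A);
```

In current Julia, small integer literal exponents are already lowered to multiplies, so the two timings may be close; the point of the comment is that a general `pow` call is far more expensive than that lowering.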
@timholy, go to the head of the class! All of the time is in the … It turns out openlibm pow has a special case for …
I added cases for 3 and 4 in openlibm. Closing this as unrelated to array expressions. |
This is the Apple implementation: … In general, their libm seems to have a bunch of stuff that is done with SSE.
Maybe we should use the system libm on Mac? Can you try a quick performance comparison of …
Being able to choose the libm would be nice; then, for example, some users could use the optimized Intel libM: http://software.intel.com/en-us/articles/implement-the-libm-math-library
We already have the ability to choose between … Cc: @staticfloat
@JeffBezanson See the experiments in https://gist.github.com/ViralBShah/5309148. Basically, except for your performance hack for small integer powers, Apple's pow is faster.
APSL qualifies as a free software license according to the FSF: … It may be worth pulling a few of these functions into openlibm.
FSF-free doesn't necessarily mean license-compatible (for instance, the APSL is not GPL-compatible). I'm not sure what the rules are for MIT.
Ah yes, I forgot there are all those fine distinctions. It is such a shame that after all these decades of numerical computation, we do not have a single open-source, high-quality, high-performance libm implementation.
The main reason it's not GPL-compatible seems to be that you can't combine it with GPL code and simply release the whole thing as GPL: the APSL requires that you still include license notices, and the GPL wants to be the whole shebang. Compliance with the license mainly seems to entail …
It explicitly allows linking with other libraries, even non-free ones, as long as the license terms are followed. So linking and releasing don't seem to be an issue, as long as you're willing to comply with the license. Some obvious issues: …
The first two might be cleared up by contacting Apple and/or the original developers, if anyone cares to (and hey, maybe the developers would be interested in Julia).
I'd say GPL-incompatible is not acceptable.
Yes, incompatibility with the GPL is certainly a deal breaker.
No problem, of course, but I'm curious why being incompatible with the GPL … Don't get me wrong, I'm actually a fan of the GPL, though I don't think …
There are many GPL libraries everybody uses with Julia, so the base system has to be at least GPL-compatible.
Okay, so that would, e.g., allow Julia to be included/embedded in a GPL …
openlibm is also something that could potentially be used more widely, and it would be nice for it to be GPL-compatible.
Is the discussion in this thread …? Replacing factor = h/(norm(dr)^3) (where dr is a 3-element array) with denom = sqrt(dr[1]*dr[1] + dr[2]*dr[2] + dr[3]*dr[3]) gives a very big speed-up. Thanks,
It's worth noting that part of the speedup is due to replacing BLAS norm with the explicit formula, but the majority of the benefit comes from avoiding even the new-and-improved … Changing … to … leads to a substantial performance hit.
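The rewrite described above can be sketched as follows (a minimal sketch; the names `h` and `dr` follow the comment, and the helper function names here are made up for illustration):

```julia
using LinearAlgebra  # for norm (in the thread's era this lived in Base)

# Original form: general power applied to the 2-norm.
slow_factor(h, dr) = h / norm(dr)^3

# Rewritten form: explicit squared norm, one sqrt, explicit multiplies,
# avoiding both the BLAS norm call and the general pow path.
function fast_factor(h, dr)
    denom = sqrt(dr[1]*dr[1] + dr[2]*dr[2] + dr[3]*dr[3])
    return h / (denom * denom * denom)
end

h, dr = 2.0, [1.0, 2.0, 2.0]   # norm(dr) == 3, so both give 2/27
@assert fast_factor(h, dr) ≈ slow_factor(h, dr)
```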
This issue has a misleading title. Perhaps we should open a new issue.
I suppose another question is whether this optimization is safe. I tested …
That is a very good point.
That's interesting, because pairwise summation tends to be more accurate than iterative summation. This is a case where you've found pairwise multiplication of the same value to be less accurate than iterative multiplication. Clearly it's a rather different issue, since you're dealing with the same value over and over again, but I wonder whether it's really generalizable. Is iterative always worse than pairwise? Or are there situations where the pairwise optimization gives better results than iterative?
Actually, he compared pairwise against libm pow, not against iterated multiplication. I've just updated the gist with it.
Ah, sorry. I misunderstood that.
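The accuracy question can be made concrete with a small experiment (illustrative only; which grouping lands closer to the exact value depends on the base and the exponent):

```julia
# Compare groupings of x^6 against a high-precision reference.
x = 1.1
exact = Float64(big(x)^6)

iterated = x*x*x*x*x*x          # left-to-right multiplies
pairwise = (x*x*x)*(x*x*x)      # power-by-squaring-style grouping
via_pow  = x^6.0                # the general pow path

for (name, v) in [("iterated", iterated), ("pairwise", pairwise), ("pow", via_pow)]
    println(name, ": error = ", abs(v - exact))
end
```

All three agree to within a few ulps; the discussion above is about which one is systematically closest.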
I added a special case to openlibm for y == 4. Perhaps that was ill-advised. However, we are using the …
Yes, it seems this must be calling something besides openlibm, which would also explain why …
For me …
It's definitely not calling libm, because otherwise …
No, but LLVM does. It replaces …
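The replacement being discussed is presumably LLVM's strength reduction of small constant powers into multiplies (an assumption on my part, since the quoted text is truncated). A quick way to poke at it from Julia:

```julia
# Squaring via ^ and via an explicit multiply give identical results, and
# after optimization should compile to essentially the same machine code.
sq_pow(x) = x^2
sq_mul(x) = x * x

@assert sq_pow(3.0) == sq_mul(3.0) == 9.0

# To see what LLVM actually emits, compare the IR of the two definitions:
#   code_llvm(sq_pow, (Float64,))
#   code_llvm(sq_mul, (Float64,))
```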
Ah, didn't realize that. That is nice.
Any other work to be done here? Seems like this has evolved over a few different issues.
Still waiting on a fix for the LLVM bug so that we can use …
Yes, that is fixed in LLVM 3.6 |
So, once we upgrade to LLVM 3.6, we can re-enable the …
We can do it right now with …
This is my second day using Julia and I'm getting increasingly excited!
However, I almost didn't make it even this far: my early test programs had awful performance, and I nearly gave up with the common verdict "beautiful language, but the performance sucks". This was after carefully reading the documentation, especially the performance tips section.
Luckily, I happened across this discussion:
https://groups.google.com/d/topic/julia-dev/_UZ2A_Jp8Jc/discussion
and after changing one line of code from using A.^3, where A is a large two-dimensional array, to using a nested for loop, I got performance faster than my C++ code (in an almost fair comparison).
I am merely suggesting that a sentence be added to the performance tips section noting that, currently, explicit loops can be much faster than array operations. Such a note would have saved me time and kept me from almost abandoning Julia.
Thanks for making a truly beautiful (and fast) language,
John