Performance of bit shift #13
Comments
Thanks for the report! I had not noticed this; the shift operations used in this package are generated directly by LLVM. There might be some tweaking possible to improve the situation for "small" values in a "big" type, or even by converting to
Note that @btime $(Ref(Int1024(2131)))[] << $(Ref(Unsigned(1)))[] is much faster (but still not fast enough).
I got at least part of the issue. When the shift argument is signed, LLVM shifts both right and left and selects the result later, which seems to hurt performance for integers of 256 bits and more. I tried to change the
Other than that, there is another issue with shifting big integers. I believe we should expect some overhead for shifting many words. After all, there need to be at least 3 operations for each word (e.g. when shifting left, it needs to copy the upper portion of the first word, shift it right, and OR it with the second word shifted left). And the algorithm can get quite complex if you shift by more bits than the word size. However, the amount of overhead for 512-bit and 1024-bit seems insane (and I confess I know nothing about assembly to understand WTH LLVM is doing).
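To make the per-word cost described above concrete, here is a minimal sketch of a multi-word left shift written by hand. It is only an illustration, not the code LLVM emits or the package uses: the function name `shl_words` and the little-endian `Vector{UInt64}` layout are assumptions made for this example.

```julia
# Hypothetical helper: shift a multi-word unsigned integer left by n bits.
# Assumed layout: words[1] holds the least significant 64 bits.
function shl_words(words::Vector{UInt64}, n::Integer)
    n >= 0 || throw(ArgumentError("negative shift counts are not handled in this sketch"))
    nwords = length(words)
    out = zeros(UInt64, nwords)
    n < 64 * nwords || return out        # every bit is shifted out
    q, r = divrem(Int(n), 64)            # whole-word offset and in-word bit shift
    for i in (q + 1):nwords
        lo = words[i - q] << r           # this word's bits, moved up by r
        # carry: the top r bits of the next lower word, moved down into place
        hi = (r > 0 && i - q > 1) ? words[i - q - 1] >> (64 - r) : zero(UInt64)
        out[i] = lo | hi                 # two shifts and one OR per output word
    end
    return out
end

# Usage: 2131 << 1 on a 1024-bit value stored as 16 words.
x = zeros(UInt64, 16); x[1] = 2131
shl_words(x, 1)[1] == 4262   # true
```

The loop body is the "shift, shift, OR" pattern from the comment above; the `divrem` step is what handles shifts larger than one word, and it is the part that depends on a shift count only known at run time.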
That indeed seems to be the problem. When LLVM cannot tell how large the shift is, it emits code for several shift lengths (small shift, more than one word, more than two words...). For instance, if the second argument is a UInt8 instead of a UInt, the time for shifting a 1024-bit integer is roughly divided by two. If the shift length is constant, the time is again divided by 5. This sheds some light on how to handle the problem. I'll try to run some tests here and see if I can come up with a good solution.
Great!
julia> @btime $(Ref(Int1024(2131)))[] << $(Ref(1))[]
276.963 ns (0 allocations: 0 bytes)
4262
julia> @btime $(Ref(Int1024(2131)))[] << $(Ref(Unsigned(1)))[]
131.362 ns (0 allocations: 0 bytes)
4262
julia> @btime $(Ref(Int1024(2131)))[] << $(Ref(UInt8(1)))[]
51.531 ns (0 allocations: 0 bytes)
4262
julia> @btime $(Ref(Int1024(2131)))[] << $(Ref(true))[]
10.483 ns (0 allocations: 0 bytes)
4262
I didn't know that you can shift by a bool!
What do you mean by this?
The compiler optimizes
It seems the code generated by LLVM has been designed and optimized for integers of up to 128 bits, in which case emitting generic code and selecting the correct result (i.e. with I don't know any way to tell LLVM "this integer is between 0 and 63" (except, of course, with a code like
Ok, I think I found a solution and it seems to work. What I did was to shift by steps. One function That way, it may take several shifts for higher values of I'll run tests to see if it is working properly and, if everything is ok, I'll send a pull request.
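The fork's code is not shown in this thread, so the following is only one possible reading of "shift by steps", sketched on Base unsigned types rather than on the package's types. The names `STEP`, `shl_small` and `shl_stepped` are hypothetical, and the step size of 63 bits is an assumption chosen for illustration.

```julia
# Hypothetical per-step maximum (63 bits); the fork's actual choice is not shown in the thread.
const STEP = 0x3f

# Shift left by a count that is masked into 0..STEP, so the compiler
# can see that each individual shift is small.
@inline shl_small(x::T, n::UInt8) where {T<:Unsigned} = x << (n & STEP)

# Shift left by an arbitrary non-negative count as a sequence of small shifts.
function shl_stepped(x::T, n::Integer) where {T<:Unsigned}
    n >= 0 || throw(ArgumentError("negative shift counts are not handled in this sketch"))
    n >= 8 * sizeof(T) && return zero(T)   # every bit is shifted out
    remaining = Int(n)
    while remaining > Int(STEP)
        x = shl_small(x, STEP)             # one full step
        remaining -= Int(STEP)
    end
    return shl_small(x, remaining % UInt8) # final partial step
end

# Usage: the stepped shift agrees with the built-in operator.
shl_stepped(UInt128(2131), 100) == UInt128(2131) << 100   # true
```

Larger counts cost more iterations, which matches the remark that higher shift values may take several shifts. Whether this is actually faster than a single generic shift depends on the integer type and the LLVM version, so it should be benchmarked rather than assumed.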
I have made tests and the choices in my fork seem to be the best combination. These are the timings I'm getting with it:
I believe this performance is reasonable, considering that the shift becomes more complex as the length of
One issue with that code is that the compiler cannot optimize shifting with constant values:
The only resolution I can think of is to allow someone to write
One could introduce the same mechanism as for I would argue that Of course, an even better method would be to mark the function as "please inline if the second argument is a known integer", but Julia's JIT infrastructure doesn't have such tags...
In the case of a constant second argument, the
I realized that the compiler can understand an expression like Still not as fast as providing a constant value as the second argument to an intrinsic LLVM function, but it got closer:
A final note about performance. I noticed that inlining all shift functions makes them almost as fast as the intrinsic shift with constant In short, the only case where calling the LLVM intrinsic with a compile-time constant
Hello, great package!
Is this kind of performance regression expected?