@simd splits cartesian iteration into an inner loop over the fast index and an outer loop over all other indices. I haven't had time to look into your issue, but it might be that in the nested for loop you are seeing vectorization of the innermost loop and then unrolling of the second loop?
LLVM has trouble analysing our cartesian iteration, so that might be the problem.
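To make the split concrete, here is a rough sketch of the two loop shapes for a simple 3D reduction. The function names `sum_cartesian` and `sum_split` are made up for illustration, and the hand-written version only approximates what @simd does internally; it is not the code from the issue.

```julia
# Hypothetical example: a plain sum over a 3D array, written two ways.

# One loop over all cartesian indices; @simd is expected to split it into
# an inner loop over the first (fast) index and an outer loop over the rest.
function sum_cartesian(A::AbstractArray)
    s = zero(eltype(A))
    @simd for I in CartesianIndices(A)
        @inbounds s += A[I]
    end
    return s
end

# The same split written out by hand for the 3D case (an approximation of
# the structure described above, not the actual lowering).
function sum_split(A::AbstractArray{T,3}) where {T}
    s = zero(T)
    for k in axes(A, 3), j in axes(A, 2)   # outer loops: slow indices
        @simd for i in axes(A, 1)          # inner loop: fast index
            @inbounds s += A[i, j, k]
        end
    end
    return s
end

A = reshape(collect(1:24), 2, 3, 4)
V = view(A, 1:2, 2:3, 1:3)   # both versions also accept a view
```

Comparing `@code_llvm sum_cartesian(V)` with `@code_llvm sum_split(V)` is one way to see whether the two versions end up with the same vectorized inner loop.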
What you can try is to use @nloops in a @generated function instead of using ntuple with Val.
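For reference, a minimal sketch of the @nloops approach, modelled on the pattern in the Base.Cartesian documentation. The function name `sum_nloops` and the reduction it performs are assumptions for illustration; the kernel from the issue would go in the loop body instead.

```julia
using Base.Cartesian  # provides @nloops and @nref

# Hypothetical example: the generated function emits N explicitly nested
# for loops (i_1 innermost), so LLVM sees plain loops rather than
# cartesian iteration.
@generated function sum_nloops(A::AbstractArray{T,N}) where {T,N}
    quote
        s = zero(T)
        @nloops $N i A begin
            # @nref expands to A[i_1, i_2, ..., i_N]
            @inbounds s += @nref($N, A, i)
        end
        return s
    end
end

B = reshape(collect(1:24), 2, 3, 4)
W = view(B, 1:2, 2:3, 1:3)   # works on views as well as plain arrays
```

Because the loop nest is generated per dimension count `N`, the same definition covers the 2D and 3D cases without `ntuple` or `Val`.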
From what I could gather (don't trust me completely on this), the loops are unrolling/vectorizing correctly for 2D/3D without views.
With views, the 2D code unrolls/vectorizes, but the 3D code does not (I suspect it does not unroll). If you try to inline the code (f in the code above), it allocates and slows down further!
I can try @nloops; last time I checked it was slower than the obvious code even without views, but that was some time ago :)
Update: I just went back to some old @nloops code I had, and it is now as fast as the plain code with views, great! However, that code still looks like a DSL and is much less readable than CartesianIndices...
Hello,
Cross-posting since the post got no traction on Discourse, and I need this for some work I am doing :)
https://discourse.julialang.org/t/performance-of-cartesian-indices-and-views/34820
Is this optimization expected to happen or am I too greedy/doing something wrong?
P.S.: still a huge fan of CartesianIndices anyway!
Cheers!