Performance of CartesianIndices and Views #34838

Open
raminammour opened this issue Feb 21, 2020 · 3 comments
Labels
performance Must go faster

Comments

@raminammour
Contributor

Hello,

Cross posting as the post got no traction on Discourse, and I would need this for some work I am doing :)

https://discourse.julialang.org/t/performance-of-cartesian-indices-and-views/34820

Is this optimization expected to happen or am I too greedy/doing something wrong?

P.S. Still a huge fan of CartesianIndices anyway!

Cheers!

@vchuravy
Member

@simd splits Cartesian iteration into an inner loop over the fast index and an outer loop over all other indices. I haven't had time to look into your issue, but it might be the case that in the nested for loop you are seeing vectorization of the innermost loop and then unrolling of the second loop?
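
To make the split described above concrete, here is a minimal hand-written sketch of the same idea: an inner loop over the first (fastest) index and an outer loop over the remaining indices. It is not the code from the Discourse post and not what `@simd` literally generates; the function name `sum_split` is hypothetical, and it assumes the array has at least one dimension.

```julia
# Hypothetical helper: sum an array with an explicit inner/outer loop split.
# Assumes N >= 1. Illustrative only; not the code discussed in the issue.
function sum_split(A::AbstractArray{T,N}) where {T,N}
    s = zero(T)
    inds = axes(A)
    inner = inds[1]                            # fast (first) index
    outer = CartesianIndices(Base.tail(inds))  # all remaining indices
    for J in outer
        @simd for i in inner
            @inbounds s += A[i, J]             # mix an Int with a CartesianIndex
        end
    end
    return s
end

A = reshape(collect(1:24), 2, 3, 4)
V = view(A, 1:2, 2:3, :)

sum_split(A) == sum(A)   # same result on a plain array
sum_split(V) == sum(V)   # and on a view
```

Integer data is used here because `@simd` allows floating-point reductions to be reordered, so results on floats may differ from `sum` in the last digits.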

LLVM has trouble with analysing our cartesian iteration and as such that might be the problem.
What you can try is to use @nloops in a @generated function instead of using ntuple with Val.
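
For reference, the `@nloops` suggestion might look roughly like the sketch below, which follows the pattern shown in the `Base.Cartesian` documentation. The function name `sum_nloops` is hypothetical and the body is a simple reduction chosen for illustration, not the kernel from the original post.

```julia
using Base.Cartesian

# Hypothetical example: a @generated function that expands into N nested loops.
# $N is spliced in when the method is generated for a concrete array type.
@generated function sum_nloops(A::AbstractArray{T,N}) where {T,N}
    quote
        s = zero(T)
        # Expands to nested `for i_d in axes(A, d)` loops, with i_1 innermost.
        @nloops $N i A begin
            @inbounds s += (@nref $N A i)   # A[i_1, i_2, ..., i_N]
        end
        return s
    end
end

A = reshape(collect(1:24), 2, 3, 4)
sum_nloops(A) == sum(A)
sum_nloops(view(A, 1:2, 2:3, :)) == sum(view(A, 1:2, 2:3, :))
```

Because the loops are written out explicitly in the generated code, the compiler sees ordinary nested `for` loops rather than the iterator state of `CartesianIndices`, at the cost of the macro-heavy style.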

@raminammour
Contributor Author

raminammour commented Feb 21, 2020

From what I could gather (don't trust me completely on this), the loops are unrolling/vectorizing correctly for 2D/3D without views.
With views, the 2D code unrolls/vectorizes, but not in 3D (does not unroll, I suspect). If you try to inline the code (f in the code above), the code allocates and slows down further!
I can try @nloops; last time I checked it was slower than the obvious code even without views, but that was some time ago :)
Update: I just went back to some old @nloops code I had, and it is now as fast as the plain code with views, great! However, that code still looks like a DSL and is much less readable than the CartesianIndices version...

@timholy
Sponsor Member

timholy commented Feb 21, 2020

xref #9080. Given how long that has been open, I am beginning to wonder if we should special-case CartesianIndices iteration in the optimizer.

@brenhinkeller brenhinkeller added the performance Must go faster label Nov 20, 2022