reduction operations like sum(::Vec) generate scalar code #91
Comments
Right now, we simply call the respective vector reduction intrinsic from LLVM: https://llvm.org/docs/LangRef.html#vector-reduction-intrinsics. However, there is this note:
So maybe we should set that. Would be a nice first PR to add it.
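To illustrate why this flag matters: as I read the linked LangRef section, a floating-point add reduction stays strictly ordered unless reassociation is allowed, because floating-point addition is not associative and a tree-shaped reduction can give a different result. The sketch below is plain Julia and does not use this package; the names `sequential_sum` and `tree_sum` and the input values are made up for the illustration.

```julia
# Strictly ordered (left-to-right) sum. This is the result an ordered
# reduction has to reproduce, so it cannot be split into a tree.
function sequential_sum(x::NTuple{8,Float32})
    acc = 0.0f0
    for v in x
        acc += v
    end
    return acc
end

# Pairwise ("hierarchical") sum: 8 -> 4 -> 2 -> 1 partial sums.
# Only valid as a replacement if reassociation is permitted.
function tree_sum(x::NTuple{8,Float32})
    a = (x[1] + x[5], x[2] + x[6], x[3] + x[7], x[4] + x[8])
    b = (a[1] + a[3], a[2] + a[4])
    return b[1] + b[2]
end

# Hypothetical input chosen so that rounding makes the two orders disagree:
# 1.0f8 + 1.0f0 rounds back to 1.0f8 in Float32.
x = (1.0f8, 1.0f0, -1.0f8, 1.0f0, 0.0f0, 0.0f0, 0.0f0, 0.0f0)

println(sequential_sum(x))  # 1.0
println(tree_sum(x))        # 2.0
```

Since the two orders can disagree like this, the compiler is presumably not free to pick the tree form on its own, which would fit the scalar-looking code reported in this issue.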
For the PR: I wonder why this reassoc-behavior not triggered by adding
But
I would say this is closed by #92.
Thank you!
Hi,
I am learning about SIMD implementation. In the following example, an eight-component dot product is implemented:
Here, the multiplication generates a single AVX256 `vmulps` instruction, and a reduction needs to be performed for the "horizontal" sum operation. I expected a hierarchical approach, but the generated assembly shows that the sum is basically performed in a scalar manner:
However, when the reduction is typed out manually,
then two `vhaddps` instructions are generated and the performance increases a lot.
Have I misused the `sum` operation, or is it not (yet?) intended to generate that kind of "hierarchical" approach? Is there some other way to get the `vhaddps` (or similar) instructions here? (Is this a missing-`@fastmath` problem?)
Kind regards,
Christian