AMDGPU lowering of abs is bad for i16 vectors with more than 2 elements #94606
Hi! This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, or have any further questions about it, don't hesitate to ask via a comment in the thread below.
@llvm/issue-subscribers-good-first-issue Author: Matt Arsenault (arsenm)
We currently get a simple 2-instruction expansion in the v2i16 case, but larger vectors scalarize and produce bad code instead of vector splitting.
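A minimal IR reproducer for the comparison might look like the following (a sketch; function names are illustrative, assuming an AMDGPU subtarget with packed 16-bit support such as gfx9):

```llvm
declare <2 x i16> @llvm.abs.v2i16(<2 x i16>, i1)
declare <4 x i16> @llvm.abs.v4i16(<4 x i16>, i1)

; Lowers to a short packed sequence.
define <2 x i16> @abs_v2i16(<2 x i16> %v) {
  %r = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %v, i1 false)
  ret <2 x i16> %r
}

; Currently scalarized element by element instead of being split
; into two v2i16 operations.
define <4 x i16> @abs_v4i16(<4 x i16> %v) {
  %r = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %v, i1 false)
  ret <4 x i16> %r
}
```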
@arsenm Also, does the vector size affect the sequence size?
SelectionDAG makes this difficult because of how it assumes vectors work. If the wider vector types were illegal, the vector legalizer would produce the correct code. We usually work around this by custom lowering wider vector operations and then doing the split ourselves.
No, any 16-bit vector should be decomposed into 2-element sequences. 6 x i16 and 8 x i16 should just be decomposed into 2 x i16 pieces. Ideally the 3 x i16 case would be a 2 x i16 piece plus a scalar.
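The decomposition described above, written out at the IR level as a sketch of the desired shape (the backend would perform this split on SelectionDAG nodes, not on IR; this is only illustrative):

```llvm
declare <2 x i16> @llvm.abs.v2i16(<2 x i16>, i1)

define <4 x i16> @abs_v4i16_split(<4 x i16> %v) {
  ; Split the input into low and high v2i16 halves.
  %lo = shufflevector <4 x i16> %v, <4 x i16> poison, <2 x i32> <i32 0, i32 1>
  %hi = shufflevector <4 x i16> %v, <4 x i16> poison, <2 x i32> <i32 2, i32 3>
  ; Each half gets the cheap packed 2-instruction expansion.
  %alo = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %lo, i1 false)
  %ahi = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %hi, i1 false)
  ; Concatenate the halves back into a v4i16 result.
  %r = shufflevector <2 x i16> %alo, <2 x i16> %ahi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x i16> %r
}
```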