
Suboptimal codegen for abs-diff style functions #100810

Closed
Kmeakin opened this issue Jul 26, 2024 · 1 comment · Fixed by #102137

@Kmeakin
Contributor

Kmeakin commented Jul 26, 2024

https://godbolt.org/z/doWPnqfs6

#include <stdint.h>

typedef uint32_t u32;
typedef int32_t i32;

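/* static: a plain C99 inline definition provides no external symbol,
   so any call the compiler does not inline would fail to link. */
static inline u32 umax(u32 x, u32 y) { return x > y ? x : y; }
static inline u32 umin(u32 x, u32 y) { return x < y ? x : y; }

static inline u32 smax(i32 x, i32 y) { return x > y ? x : y; }
static inline u32 smin(i32 x, i32 y) { return x < y ? x : y; }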

u32 src1(u32 x, u32 y) { return umax(x, y) - umin(x, y); }
u32 tgt1(u32 x, u32 y) { return x > y ? x - y : y - x; }

u32 src2(u32 x, u32 y) { return umin(x, y) - umax(x, y); }
u32 tgt2(u32 x, u32 y) { return x < y ? x - y : y - x; }

u32 src3(i32 x, i32 y) { return smax(x, y) - smin(x, y); }
u32 tgt3(i32 x, i32 y) { return x > y ? (u32)x - (u32)y : (u32)y - (u32)x; } /* u32 math: no signed overflow */

u32 src4(i32 x, i32 y) { return smin(x, y) - smax(x, y); }
u32 tgt4(i32 x, i32 y) { return x < y ? (u32)x - (u32)y : (u32)y - (u32)x; }
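A quick way to confirm that each srcN/tgtN pair really is interchangeable (and therefore that rewriting one into the other is a legal fold) is to compare them on boundary inputs. The sketch below copies the functions from the report, marks them `static`, and does the signed tgt subtractions in `u32` so that inputs near INT32_MIN/INT32_MAX cannot overflow. `src_tgt_agree` is a hypothetical helper, and a fixed set of edge values is a spot check, not a proof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;
typedef int32_t i32;

static inline u32 umax(u32 x, u32 y) { return x > y ? x : y; }
static inline u32 umin(u32 x, u32 y) { return x < y ? x : y; }
static inline u32 smax(i32 x, i32 y) { return x > y ? x : y; }
static inline u32 smin(i32 x, i32 y) { return x < y ? x : y; }

/* Same src/tgt pairs as the report; all differences wrap modulo 2^32. */
static u32 src1(u32 x, u32 y) { return umax(x, y) - umin(x, y); }
static u32 tgt1(u32 x, u32 y) { return x > y ? x - y : y - x; }
static u32 src2(u32 x, u32 y) { return umin(x, y) - umax(x, y); }
static u32 tgt2(u32 x, u32 y) { return x < y ? x - y : y - x; }
static u32 src3(i32 x, i32 y) { return smax(x, y) - smin(x, y); }
static u32 tgt3(i32 x, i32 y) { return x > y ? (u32)x - (u32)y : (u32)y - (u32)x; }
static u32 src4(i32 x, i32 y) { return smin(x, y) - smax(x, y); }
static u32 tgt4(i32 x, i32 y) { return x < y ? (u32)x - (u32)y : (u32)y - (u32)x; }

/* Compares every srcN/tgtN pair on all combinations of boundary inputs;
   returns true only if all four pairs agree on every combination. */
static bool src_tgt_agree(void) {
    static const u32 uv[] = {0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u,
                             0xFFFFFFFEu, 0xFFFFFFFFu};
    static const i32 sv[] = {INT32_MIN, INT32_MIN + 1, -2, -1, 0,
                             1, 2, INT32_MAX - 1, INT32_MAX};
    const size_t nu = sizeof uv / sizeof uv[0];
    const size_t ns = sizeof sv / sizeof sv[0];

    for (size_t i = 0; i < nu; i++)
        for (size_t j = 0; j < nu; j++)
            if (src1(uv[i], uv[j]) != tgt1(uv[i], uv[j]) ||
                src2(uv[i], uv[j]) != tgt2(uv[i], uv[j]))
                return false;

    for (size_t i = 0; i < ns; i++)
        for (size_t j = 0; j < ns; j++)
            if (src3(sv[i], sv[j]) != tgt3(sv[i], sv[j]) ||
                src4(sv[i], sv[j]) != tgt4(sv[i], sv[j]))
                return false;

    return true;
}
```

The edge values are chosen to hit the wrap-around cases (0 against UINT32_MAX, INT32_MIN against INT32_MAX), which is where the two forms would most plausibly diverge if the equivalence were wrong.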

The tgt form saves one instruction in the u32 case (AArch64 output):

src1:
        cmp     w0, w1
        csel    w8, w0, w1, hi
        csel    w9, w0, w1, lo
        sub     w0, w8, w9
        ret

tgt1:
        sub     w8, w1, w0
        subs    w9, w0, w1
        csel    w0, w9, w8, hi
        ret

The difference is even more pronounced in the u128 case (10 instructions versus 7, excluding ret):

src1:
        cmp     x2, x0
        sbcs    xzr, x3, x1
        csel    x8, x1, x3, lo
        csel    x9, x0, x2, lo
        cmp     x0, x2
        sbcs    xzr, x1, x3
        csel    x10, x0, x2, lo
        csel    x11, x1, x3, lo
        subs    x0, x9, x10
        sbc     x1, x8, x11
        ret

tgt1:
        subs    x8, x0, x2
        sbc     x9, x1, x3
        subs    x10, x2, x0
        sbc     x11, x3, x1
        sbcs    xzr, x3, x1
        csel    x0, x8, x10, lo
        csel    x1, x9, x11, lo
        ret
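The 128-bit listings above presumably come from the same src1/tgt1 source instantiated with a 128-bit unsigned type; the godbolt link is the authority on exactly what was compiled. A minimal sketch, assuming GCC/Clang's `unsigned __int128` extension (not ISO C); the `_u128` function names and the `make_u128` helper are illustrative, not from the report.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
typedef unsigned __int128 u128;

static inline u128 umax128(u128 x, u128 y) { return x > y ? x : y; }
static inline u128 umin128(u128 x, u128 y) { return x < y ? x : y; }

/* max - min form (src1): both operands are selected before subtracting. */
static u128 src1_u128(u128 x, u128 y) { return umax128(x, y) - umin128(x, y); }

/* select form (tgt1): both differences are computed, then one is picked. */
static u128 tgt1_u128(u128 x, u128 y) { return x > y ? x - y : y - x; }

/* Builds a 128-bit value from its high and low 64-bit halves. */
static u128 make_u128(uint64_t hi, uint64_t lo) { return ((u128)hi << 64) | lo; }
```

On AArch64 each 128-bit value lives in a register pair (x0:x1 and x2:x3 in the listings), which is why every 128-bit compare or subtract above expands into a `cmp`/`sbcs` or `subs`/`sbc` pair.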
@RKSimon
Collaborator

RKSimon commented Jul 28, 2024

#92576 should help here

RKSimon added a commit that referenced this issue Aug 5, 2024
Extend test coverage for #92576 - copied from existing x86 tests
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Aug 6, 2024
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Aug 19, 2024
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Aug 20, 2024
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Aug 21, 2024
EugeneZelenko added the llvm:SelectionDAG label and removed the llvm:optimizations label Aug 21, 2024
RKSimon added a commit that referenced this issue Sep 4, 2024
…ubo x, y) -> abdu(x, y)" fold (and neg equivalent)

Handle cases where CGP has merged the CMP+SUB into a USUBO node - improves a few outstanding niggles from #100810
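The commit above handles the case where CodeGenPrepare (CGP) has already merged the compare and subtract into a single USUBO node, an unsigned subtract that also reports a borrow; the flag-setting `subs` in the tgt1 listing is the AArch64 analogue of such a combined subtract-and-compare. The C sketch below only models that pattern: `usubo`, `usubo_result` and the two `*_via_usubo` functions are hypothetical names, not LLVM APIs. Selecting on the borrow between the wrapped difference and its negation yields abdu(x, y); swapping the two arms gives the negated form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical model of an unsigned subtract-with-overflow result:
   the wrapped difference plus a borrow flag that is set when x < y. */
typedef struct {
    u32 diff;
    bool borrow;
} usubo_result;

static usubo_result usubo(u32 x, u32 y) {
    usubo_result r = { x - y, x < y };
    return r;
}

/* abdu: if x - y wrapped, negate it to recover y - x.
   Negation in u32 is well defined (0 - d wraps modulo 2^32). */
static u32 abdu_via_usubo(u32 x, u32 y) {
    usubo_result r = usubo(x, y);
    return r.borrow ? (u32)0 - r.diff : r.diff;
}

/* Negated form: min(x, y) - max(x, y), i.e. 0 - abdu(x, y). */
static u32 neg_abdu_via_usubo(u32 x, u32 y) {
    usubo_result r = usubo(x, y);
    return r.borrow ? r.diff : (u32)0 - r.diff;
}
```

Because both arms reuse the single `diff` and `borrow` produced by one subtraction, there is no separate compare left for a later fold to match, which is plausibly why the CMP+SUB-merged case needed its own handling.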