Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir][llvm] Return failure from type converter for n-D scalable vectors #65450

Merged
merged 1 commit into from
Sep 11, 2023

Conversation

c-rhodes
Copy link
Collaborator

@c-rhodes c-rhodes commented Sep 6, 2023

This patch changes vector type conversion to return failure on n-D scalable vector types instead of asserting.

This is an alternative approach to #65261 that aims to enable lowering of Vector ops directly to ArmSME intrinsics where possible, and seems more consistent with other type conversions. It's trivial to hit the assert at the moment and it could be interpreted as n-D scalable vector types being a bug, when they're valid types in the Vector dialect.

By returning failure it will generally fail more gracefully, particularly for release builds or other builds where assertions are disabled.

This patch changes vector type conversion to return failure on n-D
scalable vector types instead of asserting.

This is an alternative approach to llvm#65261 that aims to enable lowering
of Vector ops directly to ArmSME intrinsics where possible, and seems
more consistent with other type conversions. It's trivial to hit the
assert at the moment and it could be interpreted as n-D scalable vector
types being a bug, when they're valid types in the Vector dialect.

By returning failure it will generally fail more gracefully,
particularly for release builds or other builds where assertions are
disabled.
@banach-space
Copy link
Contributor

Thank you, this is far less intrusive and feels much more canonical than #65261 - this would be my preference :)

This should be sufficient for two types of lowerings from Vector to ArmSME:

  1. Vector -> ArmSME Ops -> LLVM SME intrinsics (with custom ops),
  2. Vector -> LLVM SME intrinsics (without custom ops).

ATM, we only use 1. I am guessing that you'd like to use 2. for vector.outerproduct to lower to SME's MOPA instructions, e.g. FMOPA? I am still wondering whether mixing 1. and 2. is the right approach.

But that's something we can discuss in a different patch,, this change makes sense regardless.

LGTM, but please wait for more reviews before landing this :)

@c-rhodes
Copy link
Collaborator Author

c-rhodes commented Sep 6, 2023

ATM, we only use 1. I am guessing that you'd like to use 2. for vector.outerproduct to lower to SME's MOPA instructions, e.g. FMOPA? I am still wondering whether mixing 1. and 2. is the right approach.

But that's something we can discuss in a different patch,, this change makes sense regardless.

That's correct, I want to lower vector.outerproduct directly to intrinsics, at least initially as this seems to work. But this would also open this up for other ops where possible.

LGTM, but please wait for more reviews before landing this :)

Thanks for reviewing!

c-rhodes added a commit to c-rhodes/llvm-project that referenced this pull request Sep 7, 2023
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to BFMOPA
(non-widening) [2].

Note at the ISA level these variants are implemented by different
architecture features, these are listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16

There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests but SME2 support isn't available yet, it's
targeted for 9.0, so integration tests for these variants excluded.

Masking is currently unsupported.

Depends on llvm#65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
Copy link
Contributor

@dcaballe dcaballe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Yeah, much better!

assert(
(!type.isScalable() || (type.getRank() == 1)) &&
"expected 1-D scalable vector (n-D scalable vectors are not supported)");
if (type.isScalable() && (type.getRank() > 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we handle the 0-D case gracefully here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 0-D it won't reach here having returned already

@c-rhodes c-rhodes merged commit 38eb55a into llvm:main Sep 11, 2023
@c-rhodes
Copy link
Collaborator Author

Thanks for reviewing!

c-rhodes added a commit that referenced this pull request Sep 14, 2023
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to
BFMOPA
(non-widening) [2].

Note at the ISA level these variants are implemented by different
architecture features, these are listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16

There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests but SME2 support isn't available yet, it's
targeted for 9.0, so integration tests for these variants excluded.

Masking is currently unsupported.

Depends on #65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Sep 14, 2023
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to
BFMOPA
(non-widening) [2].

Note at the ISA level these variants are implemented by different
architecture features, these are listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16

There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests but SME2 support isn't available yet, it's
targeted for 9.0, so integration tests for these variants excluded.

Masking is currently unsupported.

Depends on llvm#65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…ors (llvm#65450)

This patch changes vector type conversion to return failure on n-D
scalable vector types instead of asserting.

This is an alternative approach to llvm#65261 that aims to enable lowering
of Vector ops directly to ArmSME intrinsics where possible, and seems
more consistent with other type conversions. It's trivial to hit the
assert at the moment and it could be interpreted as n-D scalable vector
types being a bug, when they're valid types in the Vector dialect.

By returning failure it will generally fail more gracefully,
particularly for release builds or other builds where assertions are
disabled.
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to
BFMOPA
(non-widening) [2].

Note at the ISA level these variants are implemented by different
architecture features, these are listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16

There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests but SME2 support isn't available yet, it's
targeted for 9.0, so integration tests for these variants excluded.

Masking is currently unsupported.

Depends on llvm#65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants