Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for SGEMM_DIRECT Kernel based on SME1 #5084

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

vaiskv
Copy link

@vaiskv vaiskv commented Jan 19, 2025

This PR contains support for sgemm_direct kernel based on SME1 architecture.
sgemm_direct kernel handles a special case of cblas_sgemm() level 3 API where aplha =1 and beta=0.

@martin-frbg
Copy link
Collaborator

Thanks. Convenient that this special case does not imply debugging TRMM and SYMM as with the general case SGEMM kernel in #5011 that I still hope to get to soon. :/
Having unguarded -march flags that require a recent compiler is not going to work for all users however, I think we'll need the same kludge as used on x86_64 - ifdef the source on HAVE_SME and use an empty function declaration in the #else branch if not available, then rely on the required compiler flags being provided by Makefile.arm64 (or kernel/Makefile in the DYNAMIC_ARCH case, as currently for SkylakeX and newer) . Also I don't think one can use HWCAP on Apple, there should be a feature flag HAVE_SME provided by the build scripts (and "eventually" by the runtime detection code in dynamic_arm64.c too)

@martin-frbg
Copy link
Collaborator

HarmonyOS doesn't seem to support HWCAP either, and AppleClang balks at the "else if" introduced in common_s.h. I'll see if I can unravel and test locally.

@vaiskv
Copy link
Author

vaiskv commented Jan 20, 2025

Sure, Thanks!. I am working on restructuring the code to HAVE_SME flag as per your suggestion.

common_s.h Outdated
@@ -213,9 +213,9 @@
#ifdef ARCH_X86_64
#define SGEMM_DIRECT_PERFORMANT gotoblas -> sgemm_direct_performant
#define SGEMM_DIRECT gotoblas -> sgemm_direct
#else
#else if ARCH_ARM64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#elif ARCH_ARM64

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@martin-frbg
Copy link
Collaborator

Embarassingly it looks as if I nuked the HAVE_SME I had put in cpuid_arm64.c back in november (#4971) with some later change... bad things still sometimes happen when I jump between machines that are not always on the internet :(

@vaiskv
Copy link
Author

vaiskv commented Jan 20, 2025

I have a question:
I am compiling the library with the following command:

make BINARY=64 CC=aarch64-linux-android33-clang ONLY_CBLAS=1 HOSTCC=gcc TARGET=ARMV9SME DYNAMIC_ARCH=1

From the interface/gemm.c file, SGEMM_DIRECT kernel gets compiled only when DYNAMIC_ARCH=1. So when the library is compiled with DYNAMIC_ARCH=1 and if the TARGET is set to ARMV8 instead of AMRV9SME, does the function defined in kernel/arm64/sgemm_direct_arm64_sme1.c also be part of the library? I am assuming it won't be part of it as we have guarded the file with HAVE_SME. But then how can we ensure the library is supported on all Arm targets (Arm v8 , v9 etc)?

@martin-frbg
Copy link
Collaborator

With DYNAMIC_ARCH, TARGET is only used for the common code (interface/gemm.c and all the other interfaces, driver/level3 and so on), and the codes under kernel/arm64 are compiled in a loop with TARGET_CORE set to each of the individual models ARMV8, ARMV8SVE, etc. supported in DYNAMIC_ARCH configuration. So what I tried to express is that your kernel/arm64/sgemm_direct_arm64_sme1.c should look roughly like

#ifdef HAVE_SME
void CNAME (... 
{
your sme code goes here
}
#else 
void CNAME(...)
{}
#endif

so that the compiler finds something to compile (even if it is an empty function) whether it is running for a target with -march=armv9+sme... temporarily defined or not.
And maybe instead of string comparing the current cpu name in the DYNAMIC_ARCH situation at runtime, we should have a supports_sme() function in driver/others/dynamic_arm64.c that does the AT_HWCAP call (or its alternatives for other platforms) and returns 0 or 1, like the support_avx512 is used in interface/gemm.c for x86_64

@vaiskv
Copy link
Author

vaiskv commented Jan 20, 2025

Got it. I will update the code and push the updated patch. Thanks!

@martin-frbg
Copy link
Collaborator

Umm, TARGET=ARMV9SME tells me you're already building on AymenQ's unmerged #5011 ? In that case I might refrain from putting back the HAVE_SME and we could just live with the TARGET name(s) like in the Skylakex sgemm_direct kernel.
(I'm just unsure if the Apple M4 is going to be that representative of other SME-capable Arm64 targets to come)

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch 3 times, most recently from e875f09 to 3bce73c Compare January 23, 2025 05:37
@martin-frbg
Copy link
Collaborator

Needs

#ifndef HWCAP2_SME
#define HWCAP2_SME 1<<23
#endif

in dynamic_arm64.c

@martin-frbg
Copy link
Collaborator

Also, kernel/setparam-ref.c will need a small expansion of the ifdef block referring to sgemm_direct - currently this has only the "ifdef x86_64" part, which is why DYNAMIC_ARCH fails to build on all arm64 platforms in CI.

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from 3bce73c to d3ef3a4 Compare January 24, 2025 07:34
@martin-frbg
Copy link
Collaborator

Hmm, seems that your "conflicting" changes in cmake/system.cmake have already been in the develop branch since early december (PR 5003), that part can probably be removed from this PR ?
The other conflict that git cannot resolve automatically appears to be caused by my merging the "thread throttling profile for NEOVERSEV1" into interface/gemm.c - I think it is just the line differences that are throwing the automation off.

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from d3ef3a4 to da50c80 Compare January 27, 2025 05:52
@vaiskv
Copy link
Author

vaiskv commented Jan 27, 2025

Resolved merge conflicts in interface/gemm.c file. Also rebased cmake/system.cmake with develop branch.

@vaiskv vaiskv requested a review from martin-frbg January 27, 2025 06:02
@martin-frbg
Copy link
Collaborator

Can you please clarify if your PR is meant to depend on #5011 ?

@vaiskv
Copy link
Author

vaiskv commented Jan 27, 2025

This PR is independent of #5011. This PR has SME 1 implementation for SGEMM_DIRECT kernel for targets that support SME 1 feature whereas the former has SME 2 implementation for SGEMM kernel.

@martin-frbg
Copy link
Collaborator

OK, thanks, that's what I originally assumed - but then there must be some missed bits about the ARMV9SME target it introduces.

@vaiskv
Copy link
Author

vaiskv commented Jan 28, 2025

There are still few CI pipeline failures related mac OS and fortran. Can you please let me know how to resolve these failures?

OK, thanks, that's what I originally assumed - but then there must be some missed bits about the ARMV9SME target it introduces

I think I have added the necessary changes required for ARMV9SME target. Can you please help check if we are good here?

@martin-frbg
Copy link
Collaborator

That was why I was asking - seems the ARMV9SME dependencies for building the sgemm_direct kernels do not yet work in DYNAMIC_ARCH builds (at least).

@vaiskv
Copy link
Author

vaiskv commented Jan 28, 2025

I got your point now.
At my end I am using NDK toolchain and using DYNAMIC_ARCH=1 for building OpenBLAS and that passes.
Is there any way to check the CI builds logs before pushing the patch?

@martin-frbg
Copy link
Collaborator

I think you should be able to open the logs by clicking on the respective "Details" link, except for the "Jenkins" CI (which runs on IBM Power and Zarch) that requires a concurrent login to the CI host for some reason.
I have gotten a native build to pass just now (running on a NeoverseN1) and need to check that I made only the minimal changes necessary, but it is something about the HAVE_SME not being "seen" by kernel/Makefile.L3

@martin-frbg
Copy link
Collaborator

could you please check if it still builds for you with the NDK when you use this in place of kernel/Makefile.L3 ?

Makefile.L3.txt

@vaiskv
Copy link
Author

vaiskv commented Jan 29, 2025

Thanks for sharing the Makefile.L3 !

With this change, when DYNAMIC_ARCH=1 and TARGET=ARMV8 or ARMV8SVE, the build is successful. The generated library has the sgemm direct kernel with SME1 implementation.

But when we set TARGET= ARMV9SME DYNAMIC_ARCH=1, the build fails with following error:

../kernel/arm64/sgemm_direct_sme1.S:210:make[1]: *** [Makefile.L3:844: sgemm_direct_sme1_preprocess_ARMV8.o] Error 1 make[1]: *** Waiting for unfinished jobs.... 5: error: instruction requires: sme st1w {za3h.s[w13, #1]}, p7, [x10, x17, lsl #2]

It looks like it is building the SME source files for ARMV8 target when we are setting TARGET=ARMV9SME DYNAMIC_ARCH=1 .

Also, when we just set TARGET=ARMV9SME, build is successful and the library has the sgemm_direct kernel with SME1 implementation.

@martin-frbg
Copy link
Collaborator

Sorry, made a silly mistake there - it needs to be $(TARGET_CORE) instead of plain $(CORE) in line 105 of Makefile.L3

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from 21fdf9b to 78a9b8f Compare January 31, 2025 06:54
@vaiskv
Copy link
Author

vaiskv commented Jan 31, 2025

Thanks, with this build is passing with TARGET= ARMV9SME DYNAMIC_ARCH=1.

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from 9864754 to bdc8cae Compare January 31, 2025 09:21
@martin-frbg
Copy link
Collaborator

martin-frbg commented Jan 31, 2025

Getting closer.. failures appear to be from jobs that use gcc 11 (doesn't yet support sme in the -arch, so should not set HAVE_SME or add ARMV9SME to targets - i.e. the c_check script should check for and produce NO_SME=1 similar to how it does NO_SVE) and from cmake where cmake/arch.cmake does not yet have ARMV9SME in the DYNAMIC_CORE list

@martin-frbg
Copy link
Collaborator

btw I'm not sure I understand why your PR needs to disable SME on MacOS, is this not expected to work on the M4 ?

@vaiskv
Copy link
Author

vaiskv commented Feb 2, 2025

Added the compiler check in c_check to check if SME is available.

In cmake/arch.cmake, DYNAMIC_CORE, ARMV9SME target is already added if compiler is GCC >=14 or LLVM >= 19.
if (${CMAKE_C_COMPILER_VERSION} VERSION_GREATER_EQUAL 14) # SME ACLE supported in GCC >= 14 set(DYNAMIC_CORE ${DYNAMIC_CORE} ARMV9SME) endif()
if (${CMAKE_C_COMPILER_VERSION} VERSION_GREATER_EQUAL 19) # SME ACLE supported in LLVM >= 19 set(DYNAMIC_CORE ${DYNAMIC_CORE} ARMV9SME) endif()
Is it okay to add like this?

btw I'm not sure I understand why your PR needs to disable SME on MacOS, is this not expected to work on the M4 ?

are you referring to the change in Makefile.system :
export MACOSX_DEPLOYMENT_TARGET=11.0 ifeq ($(C_COMPILER), GCC) export NO_SVE = 1 export NO_SME = 1 endif
The sgemm direct kernel is based on SME 1, so there shouldn't be any issue in running it on M4. I actually referred to PR #5011 for the changes required to add new target. It looks like for the said conditions NO_SVE is also set to 0. Please let me know if the approach has to be different here.

* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from f5dcbf7 to c1d80e6 Compare February 2, 2025 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants