hip mfma tests #246
base: develop
Conversation
…lemented for gfx908 and gfx90a
… not available. Kernel is currently empty
…tructions are available
This is looking much better.
fix spacing
constexpr Index_type Ne = m_Ne;
constexpr Index_type NeNe = m_Ne * m_Ne;

dim3 gridDim (1, 1, 1);
Is this right?
The mfma instructions operate on a per-wavefront basis, as opposed to per-thread. We're using 4 groups of 16 threads for each outer product, so we only need a single block per grid.
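For reference, a minimal launch sketch of that configuration; the kernel name and argument list here are assumptions for illustration, not the PR's actual code:

// One wavefront (4 groups of 16 threads) cooperates on each mfma outer
// product, so a single block per grid is sufficient for this test.
dim3 blockDim(16, 4, 1);  // 64 threads == one wavefront on gfx908/gfx90a
dim3 gridDim (1, 1, 1);   // single block; the mfma op is wavefront-wide
// mat_fused_mul_add and its A, B, D arguments are hypothetical names.
hipLaunchKernelGGL(mat_fused_mul_add, gridDim, blockDim, 0, 0, A, B, D);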
I'm worried that once we saturate the flops of a single CU we'll be leaving the rest of the GPU's flops on the table.
src/common/HipDataUtils.hpp
Outdated
hipDeviceProp_t devProp;
hipGetDeviceProperties(&devProp, 0);
std::string gcnArchName(devProp.gcnArchName);
std::string hipArch = gcnArchName.substr(0, 6);
Is substr(0, 6) the right thing for all architectures? Aren't there 7-character gpu names like gfx10##?
Right. Currently we're only using it to test gfx908 and gfx90a features, but if we want to use this function more generally (e.g. for testing xnack-ness), I suppose we should really grab the entire string and then sub-select from it based on what we're going to query. The full names would look like:
gfx908:sramecc-:xnack-
gfx1010:sramecc-:xnack-
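A hedged sketch of that more general approach, splitting off the base architecture token instead of hard-coding a 6-character substring (getHipArch is a hypothetical helper name, not code from this PR):

#include <hip/hip_runtime.h>
#include <string>

// Return the base architecture token (e.g. "gfx908", "gfx90a", "gfx1010"),
// dropping feature flags such as ":sramecc-:xnack-" if present.
inline std::string getHipArch()
{
  hipDeviceProp_t devProp;
  hipGetDeviceProperties(&devProp, 0);
  std::string gcnArchName(devProp.gcnArchName);
  // substr up to the first ':'; returns the whole string if no flags follow.
  return gcnArchName.substr(0, gcnArchName.find(':'));
}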
#define MAT_FUSED_MUL_ADD_BODY \
  Real_type dot = 0; \
  for (Index_type k = 0; k < Ne; ++k) { \
    dot += A[row*Ne + k + ii*(Ne*Ne)] * B[k*Ne + col + ii*(Ne*Ne)]; \
  } \
  D[row*Ne + col + ii*(Ne*Ne)] = dot; \
Is this supposed to be doing D = A*B or D = A*B + C?
The mfma instructions compute D = A x B + C, but for this case we're assuming C is all zeros and can be ignored. It's defined this way so we can expand to future cases with a non-trivial C matrix.
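A possible future variant with a non-trivial C might look like the sketch below; the C pointer and the macro name are assumptions, since the PR's macro currently seeds dot with zero:

#define MAT_FUSED_MUL_ADD_FULL_BODY \
  Real_type dot = C[row*Ne + col + ii*(Ne*Ne)]; /* seed with C instead of 0 */ \
  for (Index_type k = 0; k < Ne; ++k) { \
    dot += A[row*Ne + k + ii*(Ne*Ne)] * B[k*Ne + col + ii*(Ne*Ne)]; \
  } \
  D[row*Ne + col + ii*(Ne*Ne)] = dot;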
The name of the kernel is confusing then if we're not actually doing the ADD part.
src/basic/MAT_FUSED_MUL_ADD-Seq.cpp
Outdated
startTimer();
for (Index_type irep = 0; irep < run_reps; ++irep) {
  for (Index_type ii = 0; ii != (N/(Ne*Ne)); ++ii) {
Is N/(Ne*Ne) the number of elements? Should we make it a named quantity?
Co-authored-by: Jason Burmark <MrBurmark@users.noreply.github.com>
src/apps/DEL_DOT_VEC_2D-Hip.cpp
Outdated
@@ -122,7 +122,7 @@ void DEL_DOT_VEC_2D::runHipVariantImpl(VariantID vid)

   const size_t grid_size = RAJA_DIVIDE_CEILING_INT(iend, block_size);

-  hipLaunchKernelGGL((lambda_hip_forall<block_size, decltype(deldotvec2d_lambda)>),
+  hipLaunchKernelGGL((lambda_hip_forall_1D<block_size, decltype(deldotvec2d_lambda)>),
Should this update be in another PR to keep this focused on the Kernel? I try to keep cuda and hip in sync.
I can revert it and/or move it to a new branch. It got pulled in from one of your review comments; I foresee adding a lambda_hip_forall variant using 2D thread indexing and wanted to get in front of it.
Let's not rename it here.
const Index_type N_Elem = N/(Ne*Ne);
for (Index_type ii = 0; ii != N_Elem; ++ii) {
Do we want to parallelize over elements?
Since this is meant to mirror an FE mass-matrix solve, we need to assume that each element is independent and shares no common data (they could share data, but this is the worst case).
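As a sketch of what parallelizing over those independent elements could look like in a host variant (the loop structure and OpenMP pragma here are assumptions, not code from this PR):

const Index_type N_Elem = N / (Ne * Ne);
// Each element owns its own Ne x Ne matrices, so the outer loop carries
// no dependences and can be split across threads.
#pragma omp parallel for
for (Index_type ii = 0; ii < N_Elem; ++ii) {
  for (Index_type row = 0; row < Ne; ++row) {
    for (Index_type col = 0; col < Ne; ++col) {
      MAT_FUSED_MUL_ADD_BODY;
    }
  }
}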
This PR adds a basic functionality test that leverages the matrix cores on AMD gfx908 and gfx90a hardware for dense matrix products.