Add cnm.compute to support transformation #8


oowekyala (Collaborator)

Just putting this up here for visibility... do not merge.

This is a draft of a new facility for scheduling on CNM, based on a discussion with Karl.

I added a new operation to CNM, cnm.compute, which basically does everything the existing cnm ops do today, but in a single operation. This makes it much easier to transform: you can rewrite one cnm.compute into another and iteratively refine the schedule of the program.

For instance, you can push one (parallel) dimension of the workgroup into the kernel with a simple transformation:

    %r = memref.expand_shape %arg1[[0, 1]] : memref<1024xi32> into memref<2x512xi32>
    cnm.compute
       ins(%arg0[(i, j) -> ()]: memref<1024xi32>)
       outs(%r[(i, j) -> (i, j)]: memref<2x512xi32>)
       on hierarchy<2x512>
       do (%a1: memref<1024xi32>, %o1: memref<i32>)  {
        affine.for %i = 0 to 1024 {
          %0 = memref.load %a1[%i] : memref<1024xi32>
          %1 = memref.load %o1[] : memref<i32>
          %2 = arith.addi %0, %1 : i32
          memref.store %2, %o1[] : memref<i32>
        }
      }

which turns into:

    %r = memref.expand_shape %arg1[[0, 1]] : memref<1024xi32> into memref<2x512xi32>
    cnm.compute
       ins(%arg0[(i) -> ()]: memref<1024xi32>)
       outs(%r[(i) -> (i)]: memref<2x512xi32>)
       on hierarchy<2>
       do (%a1: memref<1024xi32>, %o1: memref<512xi32>)  {
        affine.parallel (%j) = (0) to (512) {
            affine.for %i = 0 to 1024 {
              %0 = memref.load %a1[%i] : memref<1024xi32>
              %1 = memref.load %o1[%j] : memref<512xi32>
              %2 = arith.addi %0, %1 : i32
              memref.store %2, %o1[%j] : memref<512xi32>
            }
        }
      }

and when you're done scheduling, this translates directly to the cnm operations we're used to:

    %r = memref.expand_shape %arg1[[0, 1]] : memref<1024xi32> into memref<2x512xi32>
    %3 = cnm.workgroup : !cnm.workgroup<2>
    %4 = cnm.alloc() for %3 : !cnm.buffer<1024xi32 on 2, level 0>
    %5 = cnm.alloc() for %3 : !cnm.buffer<512xi32 on 2, level 0>
    cnm.scatter %arg0 into %4[#map] of %3 : memref<1024xi32> into !cnm.buffer<1024xi32 on 2, level 0>
    cnm.scatter %r into %5[#map1] of %3 : memref<2x512xi32> into !cnm.buffer<512xi32 on 2, level 0>
    cnm.launch %3 ins(%4 : !cnm.buffer<1024xi32 on 2, level 0>) outs(%5 : !cnm.buffer<512xi32 on 2, level 0>) on !cnm.workgroup<2> {
    ^bb0(%arg2: memref<1024xi32>, %arg3: memref<512xi32>):
      affine.parallel (%arg4) = (0) to (512) {
        affine.for %arg5 = 0 to 1024 {
          %6 = memref.load %arg2[%arg5] : memref<1024xi32>
          %7 = memref.load %arg3[%arg4] : memref<512xi32>
          %8 = arith.addi %6, %7 : i32
          memref.store %8, %arg3[%arg4] : memref<512xi32>
        }
      }
    }
    cnm.gather %5[#map1] of %3 into %r : !cnm.buffer<512xi32 on 2, level 0> into memref<2x512xi32>
    cnm.free_workgroup %3 : !cnm.workgroup<2>

The point of keeping this more granular representation around is to support hoisting the workgroup allocation and free ops out of enclosing loops (and to avoid throwing away existing code).
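
For instance, hoisting could produce something like this (a hypothetical sketch, not the output of an implemented pass; the loop and the SSA names are made up for illustration):

    %wg = cnm.workgroup : !cnm.workgroup<2>
    %in = cnm.alloc() for %wg : !cnm.buffer<1024xi32 on 2, level 0>
    %out = cnm.alloc() for %wg : !cnm.buffer<512xi32 on 2, level 0>
    affine.for %t = 0 to 8 {
      // reuse %wg, %in and %out on every iteration instead of
      // allocating and freeing a fresh workgroup in the loop body
      cnm.scatter %arg0 into %in[#map] of %wg : memref<1024xi32> into !cnm.buffer<1024xi32 on 2, level 0>
      // cnm.launch on %wg as above, elided for brevity
      cnm.gather %out[#map1] of %wg into %r : !cnm.buffer<512xi32 on 2, level 0> into memref<2x512xi32>
    }
    cnm.free_workgroup %wg : !cnm.workgroup<2>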

I'm currently writing transformations that apply to this IR, for instance turning a parallel WG dimension into enclosing loops. My goal is to do all scheduling at the CNM level, and eventually remove the CINM tiling pass, which knows too much about the backend.
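
To sketch what that rewrite could produce on the example above (hypothetical output; the subview-based slicing of %r is just one way to express the split, and the exact syntax is not settled), the leading hierarchy dimension becomes a host-side loop:

    affine.for %k = 0 to 2 {
      // rank-reducing subview: the row of %r owned by this iteration
      %slice = memref.subview %r[%k, 0] [1, 512] [1, 1]
          : memref<2x512xi32> to memref<512xi32, strided<[1], offset: ?>>
      cnm.compute
         ins(%arg0[(j) -> ()]: memref<1024xi32>)
         outs(%slice[(j) -> (j)]: memref<512xi32, strided<[1], offset: ?>>)
         on hierarchy<512>
         do (%a1: memref<1024xi32>, %o1: memref<i32>) {
          affine.for %i = 0 to 1024 {
            %0 = memref.load %a1[%i] : memref<1024xi32>
            %1 = memref.load %o1[] : memref<i32>
            %2 = arith.addi %0, %1 : i32
            memref.store %2, %o1[] : memref<i32>
          }
        }
    }

Each dimension removed from the hierarchy becomes one enclosing host loop.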
