Missing support for semi-affine modulo operations in Affine dialect #9279

dcaballe · 2022-06-02T07:05:34Z

After fixing 9244, the peeling transformation leads to semi-affine modulo operations (non-constant RHS) that are not already supported in Affine and can't be lowered to the Arithmetic dialect.

Reproducer:

func.func @simple_mul_dispatch_0() {
  %c4 = arith.constant 4 : index
  %c0 = arith.constant 0 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) offset(%c0) alignment(64) : memref<4xf32>
  memref.assume_alignment %0, 64 : memref<4xf32>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : memref<4xf32>
  memref.assume_alignment %1, 64 : memref<4xf32>
  %2 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) offset(%c0) alignment(64) : memref<4xf32>
  memref.assume_alignment %2, 64 : memref<4xf32>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %3 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_id_x]
  %4 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_count_x]
  %5 = affine.apply affine_map<()[s0, s1] -> (-((s0 * -4 + 4) mod (s1 * 4)) + 4)>()[%workgroup_id_x, %workgroup_count_x]
  cf.br ^bb1(%3 : index)
^bb1(%6: index):  // 2 preds: ^bb0, ^bb2
  %7 = arith.cmpi slt, %6, %5 : index
  cf.cond_br %7, ^bb2, ^bb3(%5 : index)
^bb2:  // pred: ^bb1
  %8 = memref.subview %2[%6] [4] [1] : memref<4xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
  %9 = memref.subview %0[%6] [4] [1] : memref<4xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
  %10 = memref.subview %1[%6] [4] [1] : memref<4xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
  %11 = vector.load %9[%c0] : memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>, vector<4xf32>
  %12 = vector.load %10[%c0] : memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>, vector<4xf32>
  %13 = arith.mulf %11, %12 : vector<4xf32>
  vector.store %13, %8[%c0] : memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>, vector<4xf32>
  %14 = arith.addi %6, %4 : index
  cf.br ^bb1(%14 : index)
^bb3(%15: index):  // 2 preds: ^bb1, ^bb4
  %16 = arith.cmpi slt, %15, %c4 : index
  cf.cond_br %16, ^bb4, ^bb5
^bb4:  // pred: ^bb3
  %17 = memref.subview %2[%15] [4] [1] : memref<4xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
  %18 = memref.subview %0[%15] [4] [1] : memref<4xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
  %19 = memref.subview %1[%15] [4] [1] : memref<4xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
  %20 = vector.load %18[%c0] : memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>, vector<4xf32>
  %21 = vector.load %19[%c0] : memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>, vector<4xf32>
  %22 = arith.mulf %20, %21 : vector<4xf32>
  vector.store %22, %17[%c0] : memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>, vector<4xf32>
  %23 = arith.addi %15, %4 : index
  cf.br ^bb3(%23 : index)
^bb5:  // pred: ^bb3
  return
}

iree-opt semi-affine-mod.mlir -lower-affine
semi-affine-mod.mlir:14:8: error: semi-affine expressions (modulo by non-const) are not supported                                                                                                                                         
  %5 = affine.apply affine_map<()[s0, s1] -> (-((s0 * -4 + 4) mod (s1 * 4)) + 4)>()[%workgroup_id_x, %workgroup_count_x

Same problematic affine map as in 9244. I guess we would need to add support for them but, based on the documentation (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Affine/Utils/Utils.cpp#L65), I'm not sure what should happen for potential non-constant negative RHSs.

I also wonder how is that peeling is hitting this limitation now. Shouldn't any dynamic shape scenario lead to semi-affine modulo ops?

@matthias-springer, @nicolasvasilache, @hanhanW, @MaheshRavishankar

The text was updated successfully, but these errors were encountered:

hanhanW · 2022-06-02T07:40:43Z

I feel that this is the issue about workgroup_count = 0 again... They are not compilation known for some cases (e.g., dynamic shapes, peeling, etc). In this case, they are set to zeros. If we try to simplify affine ops using zeros, it causes issues.

@dcaballe it looks like peeling on distributed loops needs more fixes. Would you like to try applying the transform on second level of tiling?

I'm curious why we want to peel on distributed loops. It makes more sense to me if we apply it in the second level of tiling. I feel that it also makes the work of each thread more balance. Is there a strong reason why we want to apply peeling on distributed loops?

dcaballe · 2022-06-02T18:41:22Z

We shouldn't be peeling distributed loops! Let me take a look at what is going on here but after talking to Nicolas, this seems to be a well-known limitation we will have to deal with for dynamic shapes at the very least.

dcaballe · 2022-06-03T02:29:08Z

I restricted the loops to peel a bit more (#9310). The problem happened because we may tile loops and then immediately remove them if they have a single iteration so the loop nest structure after tiling can't be inferred from the tile sizes itself. I hit a similar issue in the past and things can get very messy. IMO, we should be doing this kind of loop optimizations at the end and not in the middle of the strategy because we lose information when we remove loops. I hope the Transform dialect can give more flexibility here.

In any case, we can close this issue for now although we may need to fix it when we exercise dynamic shapes a bit more.

dcaballe added bug 🐞 Something isn't working help wanted Extra attention is needed codegen Shared code generation infrastructure and dialects codegen/llvm LLVM code generation compiler backend labels Jun 2, 2022

iree-github-actions-bot added this to IREE - ARCHIVED (do not update) Jun 2, 2022

dcaballe closed this as completed Jun 3, 2022

dcaballe moved this to Done in IREE - ARCHIVED (do not update) Jun 3, 2022

GMNGeoffrey moved this to Done in IREE Jun 27, 2022

GMNGeoffrey added this to IREE Jun 27, 2022

chhzh123 mentioned this issue Aug 8, 2022

[Op] Semi-affine modulo operation is not suppoted in MLIR cornell-zhang/hcl-dialect#127

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing support for semi-affine modulo operations in Affine dialect #9279

Missing support for semi-affine modulo operations in Affine dialect #9279

dcaballe commented Jun 2, 2022

hanhanW commented Jun 2, 2022

dcaballe commented Jun 2, 2022

dcaballe commented Jun 3, 2022

Missing support for semi-affine modulo operations in Affine dialect #9279

Missing support for semi-affine modulo operations in Affine dialect #9279

Comments

dcaballe commented Jun 2, 2022

hanhanW commented Jun 2, 2022

dcaballe commented Jun 2, 2022

dcaballe commented Jun 3, 2022