RemoveSingleIterationLoop crashes trying to simplify a mod 0 affine expression #9244

Closed
dcaballe opened this issue May 28, 2022 · 12 comments · Fixed by #9258
Labels: bug 🐞 (Something isn't working), help wanted (Extra attention is needed)

Comments

@dcaballe
Contributor

dcaballe commented May 28, 2022

#device_target_cpu = #hal.device.target<"cpu", {executable_targets = [#hal.executable.target<"llvm", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>]}>
#executable_layout = #hal.executable.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>, #hal.descriptor_set.binding<1, storage_buffer>, #hal.descriptor_set.binding<2, storage_buffer>]>]>
#executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
#map0 = affine_map<()[s0] -> (s0 ceildiv 4)>
#map1 = affine_map<()[s0] -> (s0 * 4)>
#map2 = affine_map<()[s0, s1] -> (-((s0 * -4 + 4) mod (s1 * 4)) + 4)>
#map3 = affine_map<(d0)[s0] -> (d0 + s0)>
#translation = #iree_codegen.translation_info<CPUDoubleTilingExpert workload_per_wg = [4]>
module attributes {hal.device.targets = [#device_target_cpu]} {
  hal.executable private @simple_mul_dispatch_0 {
    hal.executable.variant public @embedded_elf_x86_64, target = #executable_target_embedded_elf_x86_64_ {
      hal.executable.entry_point public @simple_mul_dispatch_0 ordinal(0) layout(#executable_layout) {translation_info = #translation} {
      ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3: index):
        %c1 = arith.constant 1 : index
        %0 = affine.apply #map0()[%arg1]
        hal.return %0, %c1, %c1 : index, index, index
      }
      builtin.module {
        func.func @simple_mul_dispatch_0() {
          %cst = arith.constant 0.000000e+00 : f32
          %c4 = arith.constant 4 : index
          %c0 = arith.constant 0 : index
          %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) offset(%c0) alignment(64) : memref<4xf32>
          memref.assume_alignment %0, 64 : memref<4xf32>
          %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : memref<4xf32>
          memref.assume_alignment %1, 64 : memref<4xf32>
          %2 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) offset(%c0) alignment(64) : memref<4xf32>
          memref.assume_alignment %2, 64 : memref<4xf32>
          %workgroup_id_x = hal.interface.workgroup.id[0] : index
          %workgroup_count_x = hal.interface.workgroup.count[0] : index
          %3 = affine.apply #map1()[%workgroup_id_x]
          %4 = affine.apply #map1()[%workgroup_count_x]
          %5 = affine.apply #map2()[%workgroup_id_x, %workgroup_count_x]
          scf.for %arg0 = %3 to %5 step %4 {
            %6 = memref.subview %2[%arg0] [4] [1] : memref<4xf32> to memref<4xf32, #map3>
            %7 = memref.subview %0[%arg0] [4] [1] : memref<4xf32> to memref<4xf32, #map3>
            %8 = memref.subview %1[%arg0] [4] [1] : memref<4xf32> to memref<4xf32, #map3>
            %9 = vector.transfer_read %7[%c0], %cst {in_bounds = [true]} : memref<4xf32, #map3>, vector<4xf32>
            %10 = vector.transfer_read %8[%c0], %cst {in_bounds = [true]} : memref<4xf32, #map3>, vector<4xf32>
            %11 = arith.mulf %9, %10 : vector<4xf32>
            vector.transfer_write %11, %6[%c0] {in_bounds = [true]} : vector<4xf32>, memref<4xf32, #map3>
          }
          scf.for %arg0 = %5 to %c4 step %4 {
            %6 = memref.subview %2[%arg0] [4] [1] : memref<4xf32> to memref<4xf32, #map3>
            %7 = memref.subview %0[%arg0] [4] [1] : memref<4xf32> to memref<4xf32, #map3>
            %8 = memref.subview %1[%arg0] [4] [1] : memref<4xf32> to memref<4xf32, #map3>
            %9 = vector.transfer_read %7[%c0], %cst {in_bounds = [true]} : memref<4xf32, #map3>, vector<4xf32>
            %10 = vector.transfer_read %8[%c0], %cst {in_bounds = [true]} : memref<4xf32, #map3>, vector<4xf32>
            %11 = arith.mulf %9, %10 : vector<4xf32>
            vector.transfer_write %11, %6[%c0] {in_bounds = [true]} : vector<4xf32>, memref<4xf32, #map3>
          }
          return
        }
      }
    }
  }
}

RemoveSingleIterationLoop crashes when analyzing the last loop in the function above (scf.for %arg0 = %5 to %c4 step %4). The alwaysRunsFirstIteration utility invokes substituteMin, which tries to simplify the affine expression () -> (8 mod 0). The mod-by-zero expression is not handled, and compilation crashes with:

.../llvm-project/mlir/lib/IR/AffineExpr.cpp:1186: void mlir::SimpleAffineExprFlattener::visitModExpr(mlir::AffineBinaryOpExpr): Assertion `rhsConst > 0 && "RHS constant has to be positive"' failed.
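The assertion fires because the affine flattener requires the divisor of mod to be a positive constant. As a minimal sketch of what "properly handled" could mean for constant operands, the hypothetical helper foldAffineMod below (not the change that landed in #9258) folds lhs mod rhs with affine mod semantics, where the result lies in [0, rhs) for rhs > 0, and returns no value instead of asserting when rhs <= 0, so a caller could give up conservatively:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not IREE/MLIR code: fold `lhs mod rhs` for constant
// operands. Affine `mod` with a positive divisor always yields a value in
// [0, rhs). For rhs <= 0 (e.g. the `8 mod 0` case) it reports "cannot fold"
// instead of asserting, so the caller can bail out conservatively.
std::optional<int64_t> foldAffineMod(int64_t lhs, int64_t rhs) {
  if (rhs <= 0)
    return std::nullopt;       // mod by zero/negative: leave it unsimplified
  int64_t r = lhs % rhs;       // C++ '%' truncates toward zero...
  return r < 0 ? r + rhs : r;  // ...so shift negative remainders into [0, rhs)
}
```

With such a guard, foldAffineMod(8, 0) has no value rather than aborting, while foldAffineMod(-1, 4) returns 3.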
@dcaballe dcaballe added bug 🐞 Something isn't working help wanted Extra attention is needed labels May 28, 2022
@hanhanW
Contributor

hanhanW commented May 31, 2022

  %5 = affine.apply affine_map<()[s0, s1] -> (-((s0 * -4 + 4) mod (s1 * 4)) + 4)>()[%workgroup_id_x, %workgroup_count_x]

It looks more complicated than any of the cases I've seen before. Could you elaborate on how the IR is generated? What's the original IR (maybe at the Linalg level), and what are the configurations?

@dcaballe
Contributor Author

That code is generated after applying peeling to the original loop. In particular, it's from iree/samples/simple_embedding/simple_embedding_test.mlir. Perhaps @matthias-springer can help clarify how that expression is generated.
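For context on where #map2 in the first dump comes from: peeling a loop for i = lb to ub step s splits it at ub - ((ub - lb) mod s), so the main loop covers [lb, split) and the peeled remainder loop covers [split, ub). The sketch below assumes that formula (it is not copied from the MLIR peeling code) and checks that substituting lb = 4 * workgroup_id, ub = 4 and step = 4 * workgroup_count reproduces -((s0 * -4 + 4) mod (s1 * 4)) + 4. Note that the divisor is 4 * workgroup_count, so the expression is only well defined for a positive workgroup count.

```cpp
#include <cassert>
#include <cstdint>

// Affine-style mod: result in [0, b) for b > 0 (C++ '%' truncates toward zero).
int64_t floorMod(int64_t a, int64_t b) {
  assert(b > 0 && "affine mod needs a positive divisor");
  int64_t r = a % b;
  return r < 0 ? r + b : r;
}

// Assumed split point when peeling `for i = lb to ub step step`: the main
// loop covers [lb, split), the peeled remainder loop covers [split, ub).
int64_t peeledSplitBound(int64_t lb, int64_t ub, int64_t step) {
  return ub - floorMod(ub - lb, step);
}

// Iterations of `for i = lb to ub step step` with step > 0.
int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  return ub > lb ? (ub - lb + step - 1) / step : 0;
}

// #map2 from the first dump, with s0 = workgroup_id, s1 = workgroup_count.
int64_t map2(int64_t s0, int64_t s1) {
  return -floorMod(s0 * -4 + 4, s1 * 4) + 4;
}
```

With a single workgroup (s0 = 0, s1 = 1) the split bound is 4, so the main loop runs once and the peeled loop is empty.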

@hanhanW
Contributor

hanhanW commented May 31, 2022

Peeling the wrong loops might break the assumptions in RemoveSingleIterationLoop. Do you have a full dump log (i.e., -mlir-print-ir-after-all)? How do we repro the issue? Are there patches to apply?

@dcaballe
Contributor Author

Sorry, I thought it was obvious but it's actually not! I tried iree-opt simplify_bug.mlir -iree-codegen-remove-single-iteration-loop on the initial code that I provided, and it looks like RemoveSingleIterationLoop is HAL dependent. I updated the code above to also include the module, but I'm still not able to reproduce it with iree-opt. I'll check tomorrow with more time.

The loop before peeling is the following. It looks like a valid loop to peel to me:

#config = #iree_codegen.lowering_config<tile_sizes = [[4], [4], [0]]>
#device_target_cpu = #hal.device.target<"cpu", {executable_targets = [#hal.executable.target<"llvm", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>]}>
#executable_layout = #hal.executable.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>, #hal.descriptor_set.binding<1, storage_buffer>, #hal.descriptor_set.binding<2, storage_buffer>]>]>
#executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
#map0 = affine_map<()[s0] -> (s0 ceildiv 4)>
#map1 = affine_map<()[s0] -> (s0 * 4)>
#map2 = affine_map<(d0) -> (d0)>
#translation = #iree_codegen.translation_info<CPUDoubleTilingExpert workload_per_wg = [4]>
module attributes {hal.device.targets = [#device_target_cpu]} {
  hal.executable private @simple_mul_dispatch_0 {
    hal.executable.variant public @embedded_elf_x86_64, target = #executable_target_embedded_elf_x86_64_ {
      hal.executable.entry_point public @simple_mul_dispatch_0 ordinal(0) layout(#executable_layout) {translation_info = #translation} {
      ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3: index):
        %c1 = arith.constant 1 : index
        %0 = affine.apply #map0()[%arg1]
        hal.return %0, %c1, %c1 : index, index, index
      }
      builtin.module {
        func.func @simple_mul_dispatch_0() {
          %c1 = arith.constant 1 : index
          %c4 = arith.constant 4 : index
          %c0 = arith.constant 0 : index
          %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<readonly:4xf32>
          %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<readonly:4xf32>
          %2 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<writeonly:4xf32>
          %workgroup_id_x = hal.interface.workgroup.id[0] : index
          %workgroup_count_x = hal.interface.workgroup.count[0] : index
          %3 = affine.apply #map1()[%workgroup_id_x]
          %4 = affine.apply #map1()[%workgroup_count_x]
          scf.for %arg0 = %3 to %c4 step %4 {
            %5 = flow.dispatch.tensor.load %2, offsets = [%arg0], sizes = [4], strides = [1] : !flow.dispatch.tensor<writeonly:4xf32> -> tensor<4xf32>
            %6 = flow.dispatch.tensor.load %0, offsets = [%arg0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readonly:4xf32> -> tensor<4xf32>
            %7 = flow.dispatch.tensor.load %1, offsets = [%arg0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readonly:4xf32> -> tensor<4xf32>
            %8 = linalg.generic {indexing_maps = [#map2, #map2, #map2], iterator_types = ["parallel"]} ins(%6, %7 : tensor<4xf32>, tensor<4xf32>) outs(%5 : tensor<4xf32>) attrs =  {__internal_linalg_transform__ = "1", lowering_config = #config, name = "mul.1"} {
            ^bb0(%arg1: f32, %arg2: f32, %arg3: f32):
              %9 = arith.mulf %arg1, %arg2 : f32
              linalg.yield %9 : f32
            } -> tensor<4xf32>
            flow.dispatch.tensor.store %8, %2, offsets = [%arg0], sizes = [4], strides = [%c1] : tensor<4xf32> -> !flow.dispatch.tensor<writeonly:4xf32>
          }
          return
        }
      }
    }
  }
}
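Before peeling, the distributed loop runs from 4 * workgroup_id to 4 with step 4 * workgroup_count, and the entry point launches ceildiv(4, 4) = 1 workgroup (#map0). The brute-force sketch below illustrates why each workgroup is then expected to execute the loop exactly once, which is the kind of fact RemoveSingleIterationLoop tries to establish. The helper names are hypothetical, and the pass itself reasons symbolically (via utilities such as alwaysRunsFirstIteration and substituteMin) rather than by enumeration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for a brute-force check; not IREE code.
// ceildiv as in #map0, valid for a >= 0 and b > 0.
int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Iterations of `for i = lb to ub step step` with step > 0.
int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  return ub > lb ? ceilDiv(ub - lb, step) : 0;
}

// Iterations executed by workgroup `id` for the distributed loop
// `scf.for %arg0 = tile * id to workload step tile * count`.
int64_t iterationsForWorkgroup(int64_t id, int64_t count, int64_t workload,
                               int64_t tile) {
  return tripCount(tile * id, workload, tile * count);
}

// True if, with count = ceildiv(workload, tile) workgroups, every workgroup
// id in [0, count) runs the loop exactly once.
bool isSingleIterationPerWorkgroup(int64_t workload, int64_t tile) {
  int64_t count = ceilDiv(workload, tile);
  for (int64_t id = 0; id < count; ++id)
    if (iterationsForWorkgroup(id, count, workload, tile) != 1)
      return false;
  return true;
}
```

For the workload of 4 in this dispatch there is one workgroup running one iteration; a workload of 10 would give 3 workgroups with one iteration each. If more workgroups than ceildiv(workload, 4) were launched, the extra ones would run zero iterations.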

@hanhanW
Contributor

hanhanW commented May 31, 2022

Sorry for the confusion. I meant whether it can be reproduced through iree-translate. I'd like to see a full dump before jumping into this specific issue. The actual issue could originate elsewhere, and we might want to fix it there first.

For example, what's the IR before and after peeling? Which loops are we targeting for peeling? It looks like the peeling transform is applied to the distributed loops, whereas I'd expect it to happen at the second tiling level. It would be clearer to me if you could provide the commit and IR dumps. We can also chat over VC if there are many details.

@MaheshRavishankar
Contributor

(Begin soap box) The use of RemoveSingleIterationLoop is a hack. We should really not be using it, but we have it for reasons that are not entirely technical. (End soap box)

For prototyping purposes, it might be worth just dropping this pass and seeing if things work. (Worst case, for development purposes, add a flag that guards the use of this pass and leave it enabled by default.)

@dcaballe
Contributor Author

iree-opt --split-input-file --pass-pipeline='hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-remove-single-iteration-loop))))' simplify_bug.mlir seems to repro it. I find it surprising that we can't run passes in isolation and need to run a "small pipeline" even for simple passes like this one. I guess that's because of dependencies on the HAL dialect? Anyway...

I'm also attaching the output of print-ir-after-all. If you want to reproduce it yourself, you can pull https://github.com/dcaballe/iree/tree/peeling for IREE and https://github.com/dcaballe/llvm-project/tree/peeling for third-party/llvm.

If I remove RemoveSingleIterationLoop from the double tiling expert and peeling is disabled, tests/e2e/linalg_transform/linalg_transform.mlir.test and iree/compiler/Codegen/LLVMCPU/test/linalg_transform.mlir.test crash with:

iree-run-mlir: /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h:134: mlir::transform::TransformState::RegionScope::RegionScope(mlir::transform::TransformState &, mlir::Region &): Assertion `state.regionStack.back()->isProperAncestor(&region) && "scope started at a non-nested region"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: iree-run-mlir /usr/local/google/home/diegocaballero/iree2/tests/e2e/linalg_transform/linalg_transform.mlir --iree-hal-target-backends=dylib-llvm-aot --iree-codegen-use-linalg-transform-interp --linalg-transform-file-name=/usr/local/google/home/diegocaballero/iree2/tests/e2e/linalg_transform/linalg_transform_spec.mlir
 #0 0x0000000004852d4a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x0000000004852efb PrintStackTraceSignalHandler(void*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x0000000004851596 llvm::sys::RunSignalHandlers() /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Signals.cpp:103:5
 #3 0x0000000004853625 SignalHandler(int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f2428dd7200 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12200)
 #5 0x00007f24289878a1 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:50:1
 #6 0x00007f2428971546 abort ./stdlib/abort.c:81:7
 #7 0x00007f242897142f get_sysdep_segment_value ./intl/loadmsgcat.c:509:8
 #8 0x00007f242897142f _nl_load_domain ./intl/loadmsgcat.c:970:34
 #9 0x00007f2428980222 (/lib/x86_64-linux-gnu/libc.so.6+0x31222)
#10 0x0000000007d98e84 mlir::transform::TransformState::RegionScope::RegionScope(mlir::transform::TransformState&, mlir::Region&) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h:135:7
#11 0x0000000007d9554b mlir::transform::TransformState::make_region_scope(mlir::Region&) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h:349:10
#12 0x0000000007e7cb90 mlir::transform::WithPDLPatternsOp::apply(mlir::transform::TransformResults&, mlir::transform::TransformState&) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Dialect/Transform/IR/TransformOps.cpp:313:22
#13 0x0000000007e6c2d6 mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::WithPDLPatternsOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformResults&, mlir::transform::TransformState&) /usr/local/google/home/diegocaballero/iree2/build/debug/third_party/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h.inc:55:56
#14 0x0000000007e6eb63 mlir::transform::TransformOpInterface::apply(mlir::transform::TransformResults&, mlir::transform::TransformState&) /usr/local/google/home/diegocaballero/iree2/build/debug/third_party/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.cpp.inc:10:14
#15 0x0000000007e6e768 mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Dialect/Transform/IR/TransformInterfaces.cpp:126:24
#16 0x0000000007d53523 (anonymous namespace)::LinalgTransformInterp::runOnOperation() /usr/local/google/home/diegocaballero/iree2/llvm-external-projects/iree-dialects/lib/Dialect/LinalgTransform/Passes/TransformInterpreter.cpp:97:24
#17 0x0000000004ac8cba mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:471:21
#18 0x0000000004ac92b4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:534:16
#19 0x0000000004acd571 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:456:12
#20 0x0000000004acd2f2 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4>(long, mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#21 0x0000000004d4b131 llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#22 0x0000000004d48785 mlir::Pass::runPipeline(mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Pass/Pass.h:195:12
#23 0x0000000007d0ab70 mlir::iree_compiler::(anonymous namespace)::LLVMCPULowerExecutableTargetPass::runOnOperation() /usr/local/google/home/diegocaballero/iree2/compiler/src/iree/compiler/Codegen/LLVMCPU/LLVMCPULowerExecutableTarget.cpp:236:14
#24 0x0000000004ac8cba mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:471:21
#25 0x0000000004ac92b4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:534:16
#26 0x0000000004acd571 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:456:12
#27 0x0000000004acd2f2 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4>(long, mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#28 0x0000000004d4b131 llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#29 0x0000000004d48785 mlir::Pass::runPipeline(mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Pass/Pass.h:195:12
#30 0x000000000782af37 mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass::runOnOperation() /usr/local/google/home/diegocaballero/iree2/compiler/src/iree/compiler/Dialect/HAL/Transforms/TranslateExecutables.cpp:67:16

If I enable peeling, more tests fail with:

FAILED: samples/static_library/simple_mul_c_module.h samples/static_library/simple_mul_c_module.o samples/static_library/simple_mul_emitc.h /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library/simple_mul_c_module.h /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library/simple_mul_c_module.o /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library/simple_mul_emitc.h
cd /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library && /usr/local/google/home/diegocaballero/iree2/build/debug/tools/iree-compile --iree-mlir-to-vm-c-module --iree-hal-target-backends=dylib-llvm-aot --iree-llvm-link-embedded=false --iree-llvm-link-static --iree-llvm-static-library-output-path=simple_mul_c_module.o /usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir -o simple_mul_emitc.h
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: error: semi-affine expressions (modulo by non-const) are not supported
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
       ^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:1:1: note: called from
func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32>
^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: error: failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal                                                                                                                                                                                                              
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>                                                                                                                                                                                                                                                                                                             
       ^                                                                                                                                                                                                                                                                                                                                                                                                         
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:1:1: note: called from                                                                                                                                                                                                                                                                                                        
func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32>                                                                                                                                                                                                                                                                                                                               
^                                                                                                                                                                                                                                                                                                                                                                                                                
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: note: see current operation: %67 = "builtin.unrealized_conversion_cast"(%66) : (i64) -> index                                                                                                                                                                                                                            
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>                                                                                                                                                                                                                                                                                                             
       ^                                                                                                                                                                                                                                                                                                                                                                                                         
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm", "static", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>                                                                                                                                                                                                                                                                                                             
       ^                                                                                                                                                                                                                                                                                                                                                                                                         
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:1:1: note: called from                                                                                                                                                                                                                                                                                                        
func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32>                                                                                                                                                                                                                                                                                                                               
^                                                                                

But I'm not sure if this happens because RemoveSingleIterationLoop is disabled or if this is a new, unrelated issue. It seems related to the mod operation as well, so before investigating this further, it would be great to have confirmation from @matthias-springer that the first example is a good candidate for peeling and peeling is generating the right code. Perhaps we are hitting a gap in the peeling implementation.

@MaheshRavishankar
Contributor

iree-opt --split-input-file --pass-pipeline='hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-remove-single-iteration-loop))))' simplify_bug.mlir seems to repro it. I find it surprising that we can't run passes in isolation and need a "small pipeline" even for simple passes like this one. I guess that's because we have dependencies on the HAL dialect? Anyway...

That particular pass needs to look at the hal.executable.entry_point op to remove the loop. That's not great, and it is also why you need the whole nesting...

I'm also attaching the output of print-ir-after-all. If you want to reproduce it yourself, you can pull https://github.com/dcaballe/iree/tree/peeling for IREE and https://github.com/dcaballe/llvm-project/tree/peeling for third-party/llvm.

If I remove RemoveSingleIterationLoop from the double tiling expert and peeling is disabled, tests/e2e/linalg_transform/linalg_transform.mlir.test and iree/compiler/Codegen/LLVMCPU/test/linalg_transform.mlir.test crash with:

Strange. I didn't know that linalg_transform relied on this. You can ignore these errors. If these are the only ones left, we can pull in Nicolas to help.

iree-run-mlir: /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h:134: mlir::transform::TransformState::RegionScope::RegionScope(mlir::transform::TransformState &, mlir::Region &): Assertion `state.regionStack.back()->isProperAncestor(&region) && "scope started at a non-nested region"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: iree-run-mlir /usr/local/google/home/diegocaballero/iree2/tests/e2e/linalg_transform/linalg_transform.mlir --iree-hal-target-backends=dylib-llvm-aot --iree-codegen-use-linalg-transform-interp --linalg-transform-file-name=/usr/local/google/home/diegocaballero/iree2/tests/e2e/linalg_transform/linalg_transform_spec.mlir
 #0 0x0000000004852d4a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x0000000004852efb PrintStackTraceSignalHandler(void*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x0000000004851596 llvm::sys::RunSignalHandlers() /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Signals.cpp:103:5
 #3 0x0000000004853625 SignalHandler(int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f2428dd7200 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12200)
 #5 0x00007f24289878a1 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:50:1
 #6 0x00007f2428971546 abort ./stdlib/abort.c:81:7
 #7 0x00007f242897142f get_sysdep_segment_value ./intl/loadmsgcat.c:509:8
 #8 0x00007f242897142f _nl_load_domain ./intl/loadmsgcat.c:970:34
 #9 0x00007f2428980222 (/lib/x86_64-linux-gnu/libc.so.6+0x31222)
#10 0x0000000007d98e84 mlir::transform::TransformState::RegionScope::RegionScope(mlir::transform::TransformState&, mlir::Region&) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h:135:7
#11 0x0000000007d9554b mlir::transform::TransformState::make_region_scope(mlir::Region&) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h:349:10
#12 0x0000000007e7cb90 mlir::transform::WithPDLPatternsOp::apply(mlir::transform::TransformResults&, mlir::transform::TransformState&) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Dialect/Transform/IR/TransformOps.cpp:313:22
#13 0x0000000007e6c2d6 mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::WithPDLPatternsOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformResults&, mlir::transform::TransformState&) /usr/local/google/home/diegocaballero/iree2/build/debug/third_party/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.h.inc:55:56
#14 0x0000000007e6eb63 mlir::transform::TransformOpInterface::apply(mlir::transform::TransformResults&, mlir::transform::TransformState&) /usr/local/google/home/diegocaballero/iree2/build/debug/third_party/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Transform/IR/TransformInterfaces.cpp.inc:10:14
#15 0x0000000007e6e768 mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Dialect/Transform/IR/TransformInterfaces.cpp:126:24
#16 0x0000000007d53523 (anonymous namespace)::LinalgTransformInterp::runOnOperation() /usr/local/google/home/diegocaballero/iree2/llvm-external-projects/iree-dialects/lib/Dialect/LinalgTransform/Passes/TransformInterpreter.cpp:97:24
#17 0x0000000004ac8cba mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:471:21
#18 0x0000000004ac92b4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:534:16
#19 0x0000000004acd571 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:456:12
#20 0x0000000004acd2f2 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4>(long, mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#21 0x0000000004d4b131 llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#22 0x0000000004d48785 mlir::Pass::runPipeline(mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Pass/Pass.h:195:12
#23 0x0000000007d0ab70 mlir::iree_compiler::(anonymous namespace)::LLVMCPULowerExecutableTargetPass::runOnOperation() /usr/local/google/home/diegocaballero/iree2/compiler/src/iree/compiler/Codegen/LLVMCPU/LLVMCPULowerExecutableTarget.cpp:236:14
#24 0x0000000004ac8cba mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:471:21
#25 0x0000000004ac92b4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:534:16
#26 0x0000000004acd571 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:456:12
#27 0x0000000004acd2f2 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_4>(long, mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#28 0x0000000004d4b131 llvm::function_ref<mlir::LogicalResult (mlir::OpPassManager&, mlir::Operation*)>::operator()(mlir::OpPassManager&, mlir::Operation*) const /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#29 0x0000000004d48785 mlir::Pass::runPipeline(mlir::OpPassManager&, mlir::Operation*) /usr/local/google/home/diegocaballero/iree2/third_party/llvm-project/mlir/include/mlir/Pass/Pass.h:195:12
#30 0x000000000782af37 mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass::runOnOperation() /usr/local/google/home/diegocaballero/iree2/compiler/src/iree/compiler/Dialect/HAL/Transforms/TranslateExecutables.cpp:67:16

If I enable peeling, more tests are crashing with:

FAILED: samples/static_library/simple_mul_c_module.h samples/static_library/simple_mul_c_module.o samples/static_library/simple_mul_emitc.h /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library/simple_mul_c_module.h /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library/simple_mul_c_module.o /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library/simple_mul_emitc.h
cd /usr/local/google/home/diegocaballero/iree2/build/debug/samples/static_library && /usr/local/google/home/diegocaballero/iree2/build/debug/tools/iree-compile --iree-mlir-to-vm-c-module --iree-hal-target-backends=dylib-llvm-aot --iree-llvm-link-embedded=false --iree-llvm-link-static --iree-llvm-static-library-output-path=simple_mul_c_module.o /usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir -o simple_mul_emitc.h
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: error: semi-affine expressions (modulo by non-const) are not supported
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
       ^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:1:1: note: called from
func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32>
^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: error: failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
       ^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:1:1: note: called from
func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32>
^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: note: see current operation: %67 = "builtin.unrealized_conversion_cast"(%66) : (i64) -> index
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
       ^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:3:8: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm", "static", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
  %0 = "arith.mulf"(%arg0, %arg1) {name = "mul.1"} : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
       ^
/usr/local/google/home/diegocaballero/iree2/samples/static_library/simple_mul.mlir:1:1: note: called from
func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32>
^

But I'm not sure if this happens because RemoveSingleIterationLoop is disabled or if this is a new, unrelated issue. It seems related to the mod operation as well, so before investigating this further, it would be great to have confirmation from @matthias-springer that the first example is a good candidate for peeling and peeling is generating the right code. Perhaps we are hitting a gap in the peeling implementation.

Yeah, this seems related to the mod 0 issue. By the way, the fact that a mod 0 is appearing at all is itself strange. Is n mod 0 supposed to simplify to n?

@matthias-springer
Contributor

it would be great to have confirmation from @matthias-springer that the first example is a good candidate for peeling and peeling is generating the right code. Perhaps we are hitting a gap in the peeling implementation.

The loop looks fine. There are actually no requirements on the loop itself: any loop can be peeled. (Unless every iteration is already a full iteration; in that case we may not do anything, I forget the details...) After peeling, we try to simplify affine.max and affine.min ops inside the loop. If there are invalid affine expressions somewhere inside the loop, this may fail.

An affine expression that has a mod 0 sounds ill-formed to me. It's like a division by zero, which is also invalid. I'm wondering where this is coming from. Are you sure it is generated during peeling?

Loop peeling replaces the upper bound of the loop with a new bound. If many parts of the computation are constant, this new bound can be very simple. In the general case, with an SSA value step (as we have here), it can get pretty complex. The upper bound is computed in SCF/Transforms/LoopSpecialization.cpp:

// New upper bound: %ub - (%ub - %lb) mod %step
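For readers less familiar with peeling, here is a minimal plain-C++ sketch of the transformation described above, assuming a positive step and lb <= ub. It splits the loop at the quoted bound `ub - (ub - lb) mod step` and shows why a tile size of the form `min(step, ub - iv)` (what an affine.min typically computes here) can be folded to `step` inside the main loop. The function name `peeledLoop` is hypothetical; this is an illustration, not the MLIR implementation in LoopSpecialization.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Sketch: peel `for (iv = lb; iv < ub; iv += step) body(iv, min(step, ub - iv))`
// into a main loop of full iterations plus a remainder loop.
// Assumes step > 0 and lb <= ub, so (ub - lb) % step is non-negative.
void peeledLoop(int64_t lb, int64_t ub, int64_t step,
                const std::function<void(int64_t iv, int64_t size)>& body) {
  assert(step > 0 && lb <= ub);
  // New upper bound: ub - (ub - lb) mod step.
  const int64_t splitBound = ub - (ub - lb) % step;

  // Main loop: every iteration is full, so min(step, ub - iv) folds to step.
  for (int64_t iv = lb; iv < splitBound; iv += step) {
    assert(std::min(step, ub - iv) == step);
    body(iv, step);
  }
  // Peeled remainder: at most one partial iteration still needs the min.
  for (int64_t iv = splitBound; iv < ub; iv += step)
    body(iv, std::min(step, ub - iv));
}
```

For example, `peeledLoop(3, 14, 4, body)` calls the body at iv = 3 and iv = 7 with size 4 in the main loop, and at iv = 11 with size 3 in the remainder loop.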

@dcaballe
Contributor Author

An affine expression that has a mod 0 sounds ill-formed to me.

The mod 0 seems to be a thing, based on this TODO comment. I'm not sure what it should resolve to, though.

Are you sure it is generated during peeling?

// New upper bound: %ub - (%ub - %lb) mod %step

You can search for 'Peel' in this dump. Peeling generates:

#map0 = affine_map<()[s0] -> (s0 ceildiv 4)>
#map1 = affine_map<()[s0] -> (s0 * 4)>
#map2 = affine_map<()[s0, s1, s2] -> (s1 - (s1 - s0) mod s2)>
#map3 = affine_map<(d0) -> (d0)>
...
          %3 = affine.apply #map1()[%workgroup_id_x]
          %4 = affine.apply #map1()[%workgroup_count_x]
          %5 = affine.apply #map2()[%3, %c4, %4]
          scf.for %arg0 = %3 to %5 step %4 {

Let's replace:

  • %c4 in map2: affine_map<()[s0, s1] -> (4 - (4 - s0) mod s1)>
  • %3 in map2: affine_map<()[s0, s1] -> (4 - (4 - s0 * 4) mod s1)>
  • %4 in map2: affine_map<()[s0, s1] -> (4 - (4 - s0 * 4) mod (s1 * 4))>
    which can be canonicalized to affine_map<()[s0, s1] -> (-((s0 * -4 + 4) mod (s1 * 4)) + 4)>. This looks good to me.
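A quick numeric check of this substitution, as a C++ sketch: `substitutedBound` is the map after replacing `%c4`, `%3`, and `%4`, and `canonicalBound` is the canonicalized form. Both take `s0 = workgroup_id_x` and `s1 = workgroup_count_x`; the helper names and the enumeration range are assumptions made for illustration. The divisor is `s1 * 4`, so it is positive exactly when `workgroup_count_x > 0`, and a count of 0 turns the expression into a mod 0.

```cpp
#include <cstdint>

// Floor-based modulo, mirroring the affine `mod` operator. Assumes rhs > 0.
int64_t floorMod(int64_t lhs, int64_t rhs) {
  int64_t r = lhs % rhs;
  return r < 0 ? r + rhs : r;
}

// lb = s0 * 4, ub = 4, step = s1 * 4 substituted into ub - (ub - lb) mod step.
int64_t substitutedBound(int64_t s0, int64_t s1) {
  return 4 - floorMod(4 - s0 * 4, s1 * 4);
}

// Canonical form: -((s0 * -4 + 4) mod (s1 * 4)) + 4
int64_t canonicalBound(int64_t s0, int64_t s1) {
  return -floorMod(s0 * -4 + 4, s1 * 4) + 4;
}

// Compares both forms for all 0 <= s0 < s1 <= maxCount.
// s1 == 0 is excluded on purpose: it would make the divisor zero.
bool formsAgree(int64_t maxCount) {
  for (int64_t s1 = 1; s1 <= maxCount; ++s1)
    for (int64_t s0 = 0; s0 < s1; ++s0)
      if (substitutedBound(s0, s1) != canonicalBound(s0, s1)) return false;
  return true;
}
```

For `workgroup_count_x = 2`, workgroup 0 gets bound 0 (its single iteration `[0, 4)` lands in the peeled loop) and workgroup 1 gets bound 4 with lower bound 4, i.e. an empty loop.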

However, when we instantiate the map, s0 = workgroup_id_x and s1 = workgroup_count_x. workgroup_count_x should always be > 0, right? Something weird seems to be happening here in the simplifyMin call. I don't see any 'min' operation in the IR...

@matthias-springer
Contributor

There are in fact no affine.min/affine.max ops in the IR. The loop peeling is actually pretty straightforward in that case. And looking at the IR in the dump, it looks like it does the right thing.

The mod 0 still looks very suspicious to me. I'd recommend looking into how it is generated. Maybe there is a bug in SimplifyTrivialLoops. I have no idea what SimplifyTrivialLoops does, but it looks like it is trying to compute something from the bounds of some value (GetMinMaxExprFn). With that in mind, the function name simplifyMin would make sense (as in "lower bound") and may not refer to an affine.min op at all.

@hanhanW
Contributor

hanhanW commented Jun 1, 2022

I think I know why. There are issues in getNumWorkgroup: IREE can't infer the bounds, so they are replaced with 0. (The same issue happens with dynamic shapes, but I don't know why unconditionally converting them to zeros works there.)

I haven't found the documentation yet, but I think setting the values to zero means "unknown". In that case, we should do nothing when they are zero.
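The proposed behavior (do nothing when the count is 0) can be sketched as a guard. `runsAtMostOnce` is hypothetical and only mirrors the shape of the loop in this issue, `for (iv = id * tile; iv < ub; iv += count * tile)`, assuming `tile > 0`. It treats an inferred count of 0 as unknown and returns `std::nullopt` instead of reasoning with a zero step; it is not necessarily how #9258 implements the fix.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical guard, not IREE's implementation. Decides whether
//   for (iv = id * tile; iv < ub; iv += count * tile)
// executes at most once for every workgroup id in [0, count).
// An inferred count of 0 means "unknown", so no decision is made.
std::optional<bool> runsAtMostOnce(int64_t ub, int64_t tile,
                                   int64_t inferredCount) {
  if (inferredCount <= 0) return std::nullopt;  // unknown bound: do nothing
  const int64_t step = inferredCount * tile;
  for (int64_t id = 0; id < inferredCount; ++id) {
    const int64_t lb = id * tile;
    // A second iteration exists iff the first one runs (lb < ub)
    // and the next induction value is still below ub.
    if (lb < ub && lb + step < ub) return false;
  }
  return true;
}
```

With `ub = 4` and `tile = 4` (as in this issue), any known count gives a single-iteration loop per workgroup, while a count of 0 yields no answer, so the loop is left untouched.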

This should fix the issue: #9258

For further exploration, I'd suggest disabling the pass in the pipeline.
