
[Bug] conv2d_nhwc_direct_simd.arm_cpu schedule has incorrect output with certain workloads #9226

Closed
mehrdadh opened this issue Oct 7, 2021 · 5 comments

Comments

mehrdadh (Member) commented Oct 7, 2021

Currently, if we define a relay.nn.conv2d operator with a specific shape (see below) and apply the passes needed to build with the SIMD schedule, it generates an error.

Expected behavior

It should compute without error.

Actual behavior

Error:

o<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > > const&, std::unordered_map<tvm::tir::IterVar, tvm::Range, std::hash<tvm::tir::IterVar>, std::equal_to<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > > const&, std::unordered_map<tvm::te::Tensor, tvm::runtime::Array<tvm::Range, void>, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::runtime::Array<tvm::Range, void> > > > const&, tvm::te::TensorIntrin const&)
E             File "/home/mhessar/mlperftiny/3rdparty/tvm/src/te/operation/tensorize.cc", line 339
E           TVMError:
E           ---------------------------------------------------------------
E           An error occurred during the execution of TVM.
E           For more information, please see: https://tvm.apache.org/docs/errors.html
E           ---------------------------------------------------------------
E             Check failed: (expr_equal(lhs, rhs)) is false: Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[(i*2), 0])*int32(b[j, 0]))], init=[], axis=[iter_var(k, range(min=0, ext=1))], where=(bool)1, value_index=0), intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[i, 0])*int32(b[j, 0]))], init=[], axis=[iter_var(k, range(min=0, ext=1))], where=(bool)1, value_index=0)

python/tvm/_ffi/_ctypes/packed_func.py:237: TVMError
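The check that fails compares the scheduled compute body against the expression declared by the TensorIntrin: the workload above reaches the input with a strided read, a[(i*2), 0] (the stride-2 convolution accesses padded_data through yy*2), while the intrinsic is declared over a contiguous a[i, 0], so expr_equal rejects the match. Below is a minimal sketch of the same kind of mismatch; the shapes, names, and the no-op intrinsic body are hypothetical and are not taken from the arm_cpu schedule, so it only illustrates the matching rule, not the actual SIMD kernel.

import tvm
from tvm import te

K = 8

# Intrinsic declaration: a dot product over contiguous rows, a[i, k] * b[j, k].
a = te.placeholder((4, K), dtype="int16", name="a")
b = te.placeholder((4, K), dtype="int16", name="b")
k = te.reduce_axis((0, K), name="k")
c = te.compute(
    (4, 4),
    lambda i, j: te.sum(a[i, k].astype("int32") * b[j, k].astype("int32"), axis=k),
    name="c",
)
# The intrinsic body does not matter for the pattern match, so a no-op stands in here.
intrin = te.decl_tensor_intrin(c.op, lambda ins, outs: tvm.tir.Evaluate(0))

# A compute whose inner access is strided, A[i*2, k], like the workload above.
A = te.placeholder((8, K), dtype="int16", name="A")
B = te.placeholder((4, K), dtype="int16", name="B")
kk = te.reduce_axis((0, K), name="k")
C = te.compute(
    (4, 4),
    lambda i, j: te.sum(A[i * 2, kk].astype("int32") * B[j, kk].astype("int32"), axis=kk),
    name="C",
)
s = te.create_schedule(C.op)
s[C].tensorize(C.op.axis[0], intrin)

# Lowering should hit the same check in tensorize.cc, because the strided read
# cannot be matched against the intrinsic's contiguous declaration.
try:
    tvm.lower(s, [A, B, C])
except tvm.TVMError as err:
    print(err)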

Script to reproduce

import logging

import tvm
from tvm import relay

# `model` (the micro target model) and `_apply_desired_layout_simd` are defined
# elsewhere in the original test; see the sketch after this snippet.

data = relay.var("data", relay.TensorType((1, 49, 10, 1), "int8"))
weight = relay.var("weight", relay.TensorType((10, 4, 1, 64), "int8"))
y = relay.nn.conv2d(
    data,
    weight,
    padding=(4, 1, 5, 1),
    strides=(2, 2),
    kernel_size=(10, 4),
    kernel_layout="HWIO",
    data_layout="NHWC",
    out_dtype="int32",
)
f = relay.Function([data, weight], y)
mod = tvm.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)

logging.info(mod)
relay_mod_simd = _apply_desired_layout_simd(mod)

target = tvm.target.target.micro(
    model,
    options=[
        "-keys=arm_cpu,cpu",
        "-link-params=1",
        "--executor=aot",
        "--unpacked-api=1",
        "--interface-api=c",
    ],
)

with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered_simd = relay.build(relay_mod_simd, target)
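The helper `_apply_desired_layout_simd` is not shown in the report; it presumably converts the module to the layouts the conv2d_direct_simd.arm_cpu schedule expects (NHWC data, HWOI kernels). A sketch along those lines, using the ConvertLayout pass (the exact pass list and the `model` value are assumptions, not taken from the report):

import tvm
from tvm import relay

def _apply_desired_layout_simd(relay_mod):
    # Assumed: convert conv2d data/kernel layouts to NHWC/HWOI, which the SIMD
    # (direct) conv2d schedule on arm_cpu operates on.
    desired_layouts = {"nn.conv2d": ["NHWC", "HWOI"], "qnn.conv2d": ["NHWC", "HWOI"]}
    seq = tvm.transform.Sequential(
        [
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(desired_layouts),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(relay_mod)

# `model` is whichever micro target model the test was built for, e.g.:
# model = "host"  # placeholder; the report does not say which board was used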
mehrdadh (Member, Author) commented Oct 7, 2021

cc @sergey-grovety

sergio-grovety (Contributor) commented

Hi @mehrdadh! Thank you for pointing out this bug. We were able to reproduce it and will try to resolve the issue as soon as possible.

mehrdadh (Member, Author) commented Oct 11, 2021

@sergey-grovety thanks for working on it. I'm adding other instances where this happens.

  • From the Visual Wake Words model:
Check failed: (expr_equal(lhs, rhs)) is false: Failed to match the compute with TensorIntrin tensor_intrin's declaration 
provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=
[(int32(a[(i*2), k])*int32(b[j, k]))], init=[], axis=[iter_var(k, range(min=0, ext=3))], where=(bool)1, value_index=0), intrin=  
reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[i, k])*int32(b[j, 
k]))], init=[], axis=[iter_var(k, range(min=0, ext=3))], where=(bool)1, value_index=0), running this stage: stage(conv2d, 
compute(conv2d, body=[reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), 
source=[(int32(padded_data[nn, ((yy*2) + ry), ((xx*2) + rx), rc])*int32(placeholder[ry, rx, ff, rc]))], init=[], axis=[iter_var(ry, 
range(min=0, ext=3)), iter_var(rx, range(min=0, ext=3)), iter_var(rc, range(min=0, ext=3))], where=(bool)1, 
value_index=0)], axis=[iter_var(nn, range(min=0, ext=1)), iter_var(yy, range(min=0, ext=48)), iter_var(xx, range(min=0, 
ext=48)), iter_var(ff, range(min=0, ext=8))], reduce_axis=[iter_var(ry, range(min=0, ext=3)), iter_var(rx, range(min=0, 
ext=3)), iter_var(rc, range(min=0, ext=3))], tag=conv2d_nhwc, attrs={"workload": ["conv2d_direct_simd.arm_cpu", 
["TENSOR", [1, 96, 96, 3], "int16"], ["TENSOR", [3, 3, 8, 3], "int16"], [2, 2], [0, 0, 1, 1], [1, 1], "int32"]}))

mehrdadh (Member, Author) commented
And this one is from the Image Classification model (one of the MLPerf Tiny models):

Check failed: (expr_equal(lhs, rhs)) is false: Failed to match the compute with TensorIntrin tensor_intrin's declaration  
provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=
[(int32(a[(i*2), k])*int32(b[j, k]))], init=[], axis=[iter_var(k, range(min=0, ext=32))], where=(bool)1, value_index=0), intrin=  
reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[i, k])*int32(b[j, k]))], 
init=[], axis=[iter_var(k, range(min=0, ext=32))], where=(bool)1, value_index=0), running this stage: stage(conv2d, 
compute(conv2d, body=[reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), 
source=[(int32(padded_data[nn, ((yy*2) + ry), ((xx*2) + rx), rc])*int32(placeholder[ry, rx, ff, rc]))], init=[], axis=[iter_var(ry, 
range(min=0, ext=1)), iter_var(rx, range(min=0, ext=1)), iter_var(rc, range(min=0, ext=32))], where=(bool)1, 
value_index=0)], axis=[iter_var(nn, range(min=0, ext=1)), iter_var(yy, range(min=0, ext=8)), iter_var(xx, range(min=0, 
ext=8)), iter_var(ff, range(min=0, ext=64))], reduce_axis=[iter_var(ry, range(min=0, ext=1)), iter_var(rx, range(min=0, 
ext=1)), iter_var(rc, range(min=0, ext=32))], tag=conv2d_nhwc, attrs={"workload": ["conv2d_direct_simd.arm_cpu", 
["TENSOR", [1, 16, 16, 32], "int16"], ["TENSOR", [1, 1, 64, 32], "int16"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"]}))

@areusch areusch changed the title [Bug] Direct SIMD Conv2d Bug [Bug] conv2d_nhwc_direct_simd.arm_cpu schedule has incorrect output with certain workloads Oct 14, 2021
sergio-grovety pushed a commit to sergio-grovety/tvm that referenced this issue Oct 14, 2021
mehrdadh (Member, Author) commented

This is resolved now that #9233 has been merged.
