
[Bug] conv2d_nhwc_direct_simd.arm_cpu schedule has incorrect output with certain workloads #9226

Closed
mehrdadh opened this issue Oct 7, 2021 · 5 comments

Comments

mehrdadh (Member) commented Oct 7, 2021

Currently, if we define a relay.nn.conv2d operator with a specific shape (see below) and apply the passes needed to build with the SIMD schedule, it generates an error.

Expected behavior

It should compute without error.

Actual behavior

Error:

o<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > > const&, std::unordered_map<tvm::tir::IterVar, tvm::Range, std::hash<tvm::tir::IterVar>, std::equal_to<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > > const&, std::unordered_map<tvm::te::Tensor, tvm::runtime::Array<tvm::Range, void>, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::runtime::Array<tvm::Range, void> > > > const&, tvm::te::TensorIntrin const&)
E             File "/home/mhessar/mlperftiny/3rdparty/tvm/src/te/operation/tensorize.cc", line 339
E           TVMError:
E           ---------------------------------------------------------------
E           An error occurred during the execution of TVM.
E           For more information, please see: https://tvm.apache.org/docs/errors.html
E           ---------------------------------------------------------------
E             Check failed: (expr_equal(lhs, rhs)) is false: Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[(i*2), 0])*int32(b[j, 0]))], init=[], axis=[iter_var(k, range(min=0, ext=1))], where=(bool)1, value_index=0), intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[i, 0])*int32(b[j, 0]))], init=[], axis=[iter_var(k, range(min=0, ext=1))], where=(bool)1, value_index=0)

python/tvm/_ffi/_ctypes/packed_func.py:237: TVMError
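The check that fails compares the scheduled compute body against the expression declared by the TensorIntrin: the workload above reaches the input with a strided read, a[(i*2), 0] (the stride-2 convolution accesses padded_data through yy*2), while the intrinsic is declared over a contiguous a[i, 0], so expr_equal rejects the match. Below is a minimal sketch of the same kind of mismatch; the shapes, names, and the no-op intrinsic body are hypothetical and are not taken from the arm_cpu schedule, so it only illustrates the matching rule, not the actual SIMD kernel.

import tvm
from tvm import te

K = 8

# Intrinsic declaration: a dot product over contiguous rows, a[i, k] * b[j, k].
a = te.placeholder((4, K), dtype="int16", name="a")
b = te.placeholder((4, K), dtype="int16", name="b")
k = te.reduce_axis((0, K), name="k")
c = te.compute(
    (4, 4),
    lambda i, j: te.sum(a[i, k].astype("int32") * b[j, k].astype("int32"), axis=k),
    name="c",
)
# The intrinsic body does not matter for the pattern match, so a no-op stands in here.
intrin = te.decl_tensor_intrin(c.op, lambda ins, outs: tvm.tir.Evaluate(0))

# A compute whose inner access is strided, A[i*2, k], like the workload above.
A = te.placeholder((8, K), dtype="int16", name="A")
B = te.placeholder((4, K), dtype="int16", name="B")
kk = te.reduce_axis((0, K), name="k")
C = te.compute(
    (4, 4),
    lambda i, j: te.sum(A[i * 2, kk].astype("int32") * B[j, kk].astype("int32"), axis=kk),
    name="C",
)
s = te.create_schedule(C.op)
s[C].tensorize(C.op.axis[0], intrin)

# Lowering should hit the same check in tensorize.cc, because the strided read
# cannot be matched against the intrinsic's contiguous declaration.
try:
    tvm.lower(s, [A, B, C])
except tvm.TVMError as err:
    print(err)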

Script to reproduce

import logging

import tvm
from tvm import relay

# `model` (the micro target model) and `_apply_desired_layout_simd` are defined
# elsewhere in the original test; see the sketch after this snippet.

data = relay.var("data", relay.TensorType((1, 49, 10, 1), "int8"))
weight = relay.var("weight", relay.TensorType((10, 4, 1, 64), "int8"))
y = relay.nn.conv2d(
    data,
    weight,
    padding=(4, 1, 5, 1),
    strides=(2, 2),
    kernel_size=(10, 4),
    kernel_layout="HWIO",
    data_layout="NHWC",
    out_dtype="int32",
)
f = relay.Function([data, weight], y)
mod = tvm.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)

logging.info(mod)
relay_mod_simd = _apply_desired_layout_simd(mod)

target = tvm.target.target.micro(
    model,
    options=[
        "-keys=arm_cpu,cpu",
        "-link-params=1",
        "--executor=aot",
        "--unpacked-api=1",
        "--interface-api=c",
    ],
)

with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered_simd = relay.build(relay_mod_simd, target)
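The helper `_apply_desired_layout_simd` is not shown in the report; it presumably converts the module to the layouts the conv2d_direct_simd.arm_cpu schedule expects (NHWC data, HWOI kernels). A sketch along those lines, using the ConvertLayout pass (the exact pass list and the `model` value are assumptions, not taken from the report):

import tvm
from tvm import relay

def _apply_desired_layout_simd(relay_mod):
    # Assumed: convert conv2d data/kernel layouts to NHWC/HWOI, which the SIMD
    # (direct) conv2d schedule on arm_cpu operates on.
    desired_layouts = {"nn.conv2d": ["NHWC", "HWOI"], "qnn.conv2d": ["NHWC", "HWOI"]}
    seq = tvm.transform.Sequential(
        [
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(desired_layouts),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(relay_mod)

# `model` is whichever micro target model the test was built for, e.g.:
# model = "host"  # placeholder; the report does not say which board was used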
mehrdadh (Member, Author) commented Oct 7, 2021

cc @sergey-grovety

sergio-grovety (Contributor) commented

Hi @mehrdadh! Thank you for pointing out this bug. We were able to reproduce it and will try to resolve the issue as soon as possible.

mehrdadh (Member, Author) commented Oct 11, 2021

@sergey-grovety thanks for working on it. I'm adding other instances where this happens.

  • From the Visual Wake Words model:
Check failed: (expr_equal(lhs, rhs)) is false: Failed to match the compute with TensorIntrin tensor_intrin's declaration 
provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=
[(int32(a[(i*2), k])*int32(b[j, k]))], init=[], axis=[iter_var(k, range(min=0, ext=3))], where=(bool)1, value_index=0), intrin=  
reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[i, k])*int32(b[j, 
k]))], init=[], axis=[iter_var(k, range(min=0, ext=3))], where=(bool)1, value_index=0), running this stage: stage(conv2d, 
compute(conv2d, body=[reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), 
source=[(int32(padded_data[nn, ((yy*2) + ry), ((xx*2) + rx), rc])*int32(placeholder[ry, rx, ff, rc]))], init=[], axis=[iter_var(ry, 
range(min=0, ext=3)), iter_var(rx, range(min=0, ext=3)), iter_var(rc, range(min=0, ext=3))], where=(bool)1, 
value_index=0)], axis=[iter_var(nn, range(min=0, ext=1)), iter_var(yy, range(min=0, ext=48)), iter_var(xx, range(min=0, 
ext=48)), iter_var(ff, range(min=0, ext=8))], reduce_axis=[iter_var(ry, range(min=0, ext=3)), iter_var(rx, range(min=0, 
ext=3)), iter_var(rc, range(min=0, ext=3))], tag=conv2d_nhwc, attrs={"workload": ["conv2d_direct_simd.arm_cpu", 
["TENSOR", [1, 96, 96, 3], "int16"], ["TENSOR", [3, 3, 8, 3], "int16"], [2, 2], [0, 0, 1, 1], [1, 1], "int32"]}))

mehrdadh (Member, Author) commented
And this one is from the Image Classification model (one of the MLPerf Tiny models):

Check failed: (expr_equal(lhs, rhs)) is false: Failed to match the compute with TensorIntrin tensor_intrin's declaration  
provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=
[(int32(a[(i*2), k])*int32(b[j, k]))], init=[], axis=[iter_var(k, range(min=0, ext=32))], where=(bool)1, value_index=0), intrin=  
reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(a[i, k])*int32(b[j, k]))], 
init=[], axis=[iter_var(k, range(min=0, ext=32))], where=(bool)1, value_index=0), running this stage: stage(conv2d, 
compute(conv2d, body=[reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), 
source=[(int32(padded_data[nn, ((yy*2) + ry), ((xx*2) + rx), rc])*int32(placeholder[ry, rx, ff, rc]))], init=[], axis=[iter_var(ry, 
range(min=0, ext=1)), iter_var(rx, range(min=0, ext=1)), iter_var(rc, range(min=0, ext=32))], where=(bool)1, 
value_index=0)], axis=[iter_var(nn, range(min=0, ext=1)), iter_var(yy, range(min=0, ext=8)), iter_var(xx, range(min=0, 
ext=8)), iter_var(ff, range(min=0, ext=64))], reduce_axis=[iter_var(ry, range(min=0, ext=1)), iter_var(rx, range(min=0, 
ext=1)), iter_var(rc, range(min=0, ext=32))], tag=conv2d_nhwc, attrs={"workload": ["conv2d_direct_simd.arm_cpu", 
["TENSOR", [1, 16, 16, 32], "int16"], ["TENSOR", [1, 1, 64, 32], "int16"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"]}))

@areusch areusch changed the title [Bug] Direct SIMD Conv2d Bug [Bug] conv2d_nhwc_direct_simd.arm_cpu schedule has incorrect output with certain workloads Oct 14, 2021
sergio-grovety pushed a commit to sergio-grovety/tvm that referenced this issue Oct 14, 2021
mehrdadh (Member, Author) commented

This is resolved now that #9233 has been merged.
