
[Frontend][TFLite] update logical_binary using new function #1

Closed
wants to merge 492 commits

Conversation

blackkker
Owner

When importing the MoveNet_MultiPose model, I hit a KeyError, as shown in the screenshot:
[screenshot of the KeyError traceback]
It appears to come from `self.get_expr()`.
However, I couldn't reproduce it with a test case for the `logical_and` op that is consistent with the model.
I also found that PR #7048 updated `get_tensor_expr`, and some ops already use the newer function.
So I updated the logical_binary conversion to use it as well, and my model now compiles successfully. A sketch of the change is below.
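
For reference, a hedged sketch of the change in `python/tvm/relay/frontend/tflite.py` (close to, but not necessarily identical to, the actual diff):

```python
# Sketch of the updated converter: look up operands via get_tensor_expr instead of
# get_expr, so constant inputs that are absent from the expression table no longer
# raise a KeyError.
def convert_logical_binary(self, relay_op, op):
    """Generic method to convert logical binary ops like logical_and / logical_or."""
    input_tensors = self.get_input_tensors(op)
    assert len(input_tensors) == 2, "input tensors length should be 2"

    lhs_expr = self.get_tensor_expr(input_tensors[0])  # handles constant tensors too
    rhs_expr = self.get_tensor_expr(input_tensors[1])
    return relay_op(lhs_expr, rhs_expr)
```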

mikepapadim and others added 30 commits March 25, 2022 13:55

Co-authored-by: Michalis Papapdimitriou <mpapapdimitriou@octoml.ai>
…10411)

Builds upon the work in #10254 to remove identity operations sandwiched
between two non-compute operations (reshape/strided slice - concatenate
is handled differently), under certain conditions. Specifically, an
identity operation is not removed when the dimensionality between the
two non-compute operations is reduced, due to non-congruent values
being accessed incorrectly. For example,

```
strided_slice(dims=4) -> identity -> reshape(dims=4)
```
becomes...
```
strided_slice -> reshape
```
but,
```
strided_slice(dims=4) -> identity -> reshape(dims=2)
```
remains as...
```
strided_slice -> identity -> reshape
```

Change-Id: Ie28ba384fcb3230d6f4651c0c19e2b9526ebcc42
Currently failing with `USE_CMSISNN` set to `OFF`
This mimics the behaviour of aot_test_utils.py to ensure the tests don't start running when the toolchain isn't available.
I believe the flakiness in #10748 is the small chance of generating a
1x1 or 1xn convolution which allows for a different buffer size:

https://github.com/apache/tvm/blob/63461c0c97c307e581271708c3490f5275675a1a/src/relay/backend/contrib/cmsisnn/buffer_size.cc#L38-L47

Therefore, careful selection of the distribution should alleviate
this issue.
* [Hexagon] Improved ergonomics of HexagonLauncher in unit tests.

The goal of this commit is to reduce/eliminate the common boilerplate required
across unit tests that interact with Hexagon hardware (a usage sketch follows the list below).

- New testing fixtures in `tests/python/contrib/test_hexagon`.  A test
  running on hexagon hardware should only need to use the
  `hexagon_session` fixture.

  - `rpc_server_port`: Iterates through port numbers, selecting an
    unused port for each unit test.  Avoids needing to explicitly
    specify unique ports for each unit test.

  - `tvm_tracker`: Starts a tracker on use, exits after test.  Avoids
    needing to manually start a tracker prior to running the unit
    test.

  - `hexagon_launcher`: Starts a `HexagonLauncher` server on use,
    stops server after test.  Avoids needing to call `start_server()`
    and `stop_server()` in each test.

  - `hexagon_session`: Starts a hexagon session using
    `hexagon_launcher.start_session()`, exits after test.

- Added `Session.upload` function, which delegates to
  `HexagonLauncher.upload`.  Avoids needing to interact with both the
  launcher and the session.

- Allowed `tvm.IRModule` as argument passed to `Session.load_module`,
  which will automatically save/upload the module, then load it.
  Avoids needing to handle save/upload of temporary files in each unit
  test.

* Added default port for tracker if not already set.

* Pass through None from hexagon_launcher to hexagon_session.

* Updated launcher to use external tracker if specified.

* Avoid setting up the local tracker unless required.

* Declare previous_port as global, instead of list.

* Corrected type hints.

* Docstring updates
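
A minimal sketch of how a test might consume these fixtures, assuming the fixture names and Session API described above; the target string, device handle, and workload are illustrative:

```python
import numpy as np
import tvm
from tvm import te


def test_add_one(hexagon_session):
    # hexagon_session transitively pulls in hexagon_launcher, tvm_tracker and
    # rpc_server_port, so no manual tracker/server management is needed here.
    A = te.placeholder((64,), dtype="int8", name="A")
    B = te.compute(A.shape, lambda i: A[i] + 1, name="B")
    target = tvm.target.Target(tvm.target.hexagon("v68"), host=tvm.target.hexagon("v68"))
    built = tvm.build(te.create_schedule(B.op), [A, B], target=target, name="add_one")

    mod = hexagon_session.load_module(built)  # save/upload/load handled by the session
    dev = hexagon_session.device              # assumed device handle on the session
    a = tvm.nd.array(np.arange(64, dtype="int8"), dev)
    b = tvm.nd.array(np.zeros(64, dtype="int8"), dev)
    mod["add_one"](a, b)
    np.testing.assert_array_equal(b.numpy(), a.numpy() + 1)
```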
As part of this, any failing tests have been marked for follow-up as part of #10673.

This depends on fixes in #10659, #10672 and #10674 to scope other tests correctly.
This PR added an analysis function `SuggestIndexMap` to analyze buffer access pattern and suggest index map for layout transformations.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
* [microNPU] Add a pass to move allocate nodes to the outer scope

Adds a pass called `HoistAllocates` to move allocate nodes to the top
of the body of the main function. In doing so, it opens the door to
other optimizations that need to swap the ordering of external calls.

Pass illustration:
(before)
```
allocate {
    extern_call {
        allocate {
            extern_call {

            }
        }
    }
}
```

(after)
```
allocate {
    allocate {
        extern_call
        extern_call
    }
}
```

Change-Id: Ibcfc3c75b15deebb5c6645a4923a6ddf683b37c4

* address comments

* uses prim func pass, rather than module pass.
* adds error message informing user to run this pass with LowerToTIR()
  pass for now.

Change-Id: I57757b9dc5bff0208034a974a341c09cce0294bc

* Support allocates when not followed by a sequence statement

With a test to back this case up.

Change-Id: I670809f5ee53b583a15d9b783852dda3089756e9

* Add new directory tir/contrib/ethosu to cmake build

Change-Id: I3e9f24adfe992ace4e03238a18a8378b03257e1a
* [ci] Generate Jenkinsfile from a template

This uses `jinja2` to generate the Jenkinsfile. This is useful since it lets us keep common functionality easy to define (i.e. iterate over all images and do something) while keeping the output easy to debug (you can look at the `Jenkinsfile` directly instead of trying to imagine what the Groovy interpreter will do). This will become more useful as we start to make CI more configurable, such as adding dynamic test sharding.

This mostly introduces the infrastructure and makes some token changes to demonstrate the generation process, but its usefulness is already apparent, since the parameters were missing an entry for the `ci_hexagon` image.
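
As an illustration of the mechanism (not the repository's actual template), a jinja2 loop over the CI images might look like:

```python
import jinja2

# Render a repeated stanza once per CI image; the stanza body is illustrative only.
template = jinja2.Template(
    "{% for image in images %}"
    "build_image('{{ image }}')\n"
    "{% endfor %}"
)
print(template.render(images=["ci_lint", "ci_cpu", "ci_gpu", "ci_hexagon"]))
```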

* Address comments, fix CI with temporary workaround

Co-authored-by: driazati <driazati@users.noreply.github.com>
* fix reduce crash on scalar inputs

* fix uncovered cases.

* fix on different opset to pass ci
…0792)

Co-authored-by: driazati <driazati@users.noreply.github.com>
The code should be checking DSPRPC_LIB_DIRS instead of REMOTE_DIR.
Renames variable from `runtime` to `executor` to better
reflect current terminology and reduce confusion.
## Context

When dealing with end-to-end models, we note that some tensors may have large shapes. Thus, when designing graph-level IR, we sometimes use `int64` instead of `int32` for the shape. Below is a dense GeMM example which has an `int64` input tensor shape:

```python
@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(rxplaceholder: T.Buffer[(1, 512), "float32"], rxplaceholder_1: T.Buffer[(T.int64(1000), T.int64(512)), "float32"], T_matmul_NT: T.Buffer[(1, T.int64(1000)), "float32"]) -> None:
        # function attr dict
        T.func_attr({"global_symbol": "dense", "tir.noalias": True, "op_pattern": 3})
        # body
        # with T.block("root")
        for i0_0, i1_0, i0_1, i1_1, i2_0, i0_2, i1_2, i2_1, i0_3, i1_3 in T.grid(1, 4, 1, 25, 8, 1, 10, 64, 1, 1):
            with T.block("T_matmul_NT"):
                i = T.axis.spatial(1, 0)
                j = T.axis.spatial(T.int64(1000), i1_0 * T.int64(250) + i1_1 * T.int64(10) + i1_2)
                k = T.axis.reduce(512, i2_0 * 64 + i2_1)
                T.reads(T_matmul_NT[i, j], rxplaceholder[i, k], rxplaceholder_1[j, k])
                T.writes(T_matmul_NT[i, j])
                T.block_attr({"layout_free_placeholders":[rxplaceholder_1], "meta_schedule.tiling_structure":"SSRSRS"})
                with T.init():
                    T_matmul_NT[i, j] = T.float32(0)
                T_matmul_NT[i, j] = T_matmul_NT[i, j] + rxplaceholder[i, k] * rxplaceholder_1[j, k]
```

## Problem

Though our TVMScript printer can easily print `int64` constants, the parser had poor support for `int64`. So this PR introduces parser support for `int64`, specifically for the data types of loop variables, block iterators, and block read/write regions.

Besides the parser, most of the TIR schedule primitives didn't take `int64` into account in their implementations. These schedule primitives will be fixed and updated in the near future, in follow-up PRs.
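
For illustration, a minimal sketch (not the PR's test case) of the kind of int64-typed loop extent and block iterator the parser now accepts:

```python
from tvm.script import tir as T


@T.prim_func
def copy(A: T.Buffer[(T.int64(64),), "float32"], B: T.Buffer[(T.int64(64),), "float32"]) -> None:
    # Loop extent, block iterator domain, and read/write regions are all int64-typed.
    for i in T.serial(T.int64(64)):
        with T.block("copy"):
            vi = T.axis.spatial(T.int64(64), i)
            T.reads(A[vi])
            T.writes(B[vi])
            B[vi] = A[vi]
```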
- Added device validity check in allocation. HexagonDeviceAPI should
  only be called for CPU/Hexagon types.

- Check for "global.vtcm" scope instead of "vtcm".  The ccope of N-d
  allocations produced by `LowerVtcmAlloc` should be `"global.vtcm"`.
  The previous check allowed unsupported scope such as `"local.vtcm"`.

- Remove `vtcmallocs` entry after calling free. Previously, the vtcm
  allocation map kept dangling pointers to `HexagonBuffer` objects
  after they had been freed.

- Rename N-d alloc and free packed functions.  Since most of the
  similar device functions use snake case, renaming `*.AllocND` to
  `*.alloc_nd` and `*.FreeND` to `*.free_nd`.

Co-authored-by: Adam Straw <astraw@octoml.ai>

Co-authored-by: Adam Straw <astraw@octoml.ai>
Found that the CPU affinity setting did not work in the pipeline executor;
the symptom is that there is no performance change after changing the CPU
affinity. The reason is that only the ConfigRuntime class stored the CPU
affinity setting, but the BackendRuntime class did not.
… non-int32 dtypes (#10795)

_This PR is a follow-up effort of #10789, which enables the `int64` support for TIR schedule primitive Cache-Read and Cache-Write._

Prior to this PR, the IterVars of the generated cache stage block are always `int32`-typed, which might conflict with the dtypes of the domains of the IterVars.

In this PR, the dtypes of the new IterVars are constructed according to the data types of their domains, so that the possible conflicts are resolved. Meanwhile, the data types of the read/write regions of the cache stage blocks are also constructed with the correct data types.
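
As an illustration, applying Cache-Read to a module like the int64 GeMM example above might look like the following sketch (the block name, function name, and buffer index are assumptions):

```python
import tvm
from tvm import tir

# Module: an IRModule with int64 shapes, e.g. the dense GeMM example shown earlier.
sch = tir.Schedule(Module)
block = sch.get_block("T_matmul_NT", func_name="main")
# The iter vars of the generated cache block now take their dtype (int64 here)
# from the domains they iterate over, instead of defaulting to int32.
cached = sch.cache_read(block, read_buffer_index=1, storage_scope="local")
```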
* use python3.7 install script in ci-qemu

* update python venv to 3.7

* setuptools is just python3...

* don't use apt-add-repository (breaks with python3.7 as python3 on ubuntu 18.04)
This is a follow up to #10695.

Change-Id: I7f2dc14826cefea81fe5ff69c6255cdb5dc7f5c0
* [LIBXSMM] add libxsmm to TVM CI.

* Config "make" thread number in a more flexible way.

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* Empty commit to trigger github CI.

* Update ubuntu_install_libxsmm.sh.

* Trigger CI tasks.

* Trigger CI tasks.

Co-authored-by: wenxizhu <wenxizhu@tencent.com>
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
…10762)

Uses reference kernels for the codegen tests when the TensorFlow version is
>= 2.5.0 (the first version in which this functionality was added).
Otherwise, fall back to running the tests without them.

Change-Id: I92b24ad259d2fda2fed497aa0fe6d7f11a0db85a
* [ARM] Fix NCHWc int8 dot product schedule lowering

* fix arm task extraction test not running

* skip test on i386
…ction (#10686)

Add support for LetStmts in feature extraction. A stack of variable definitions is maintained and added to the arithmetic analyzer at the appropriate points. The buffer access analysis now creates a new arithmetic analysis context per set of loops, to avoid redefining variables, which is unsafe in the presence of let statements.
* Make Schedule::Copy non-const, fork RND seed in Copy

* fork seed in traced schedule copy too

commit eeb4a6d
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Tue Mar 29 06:39:38 2022 +0900

    add more comment

commit 183b4cf
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Mon Mar 28 10:04:12 2022 +0900

    skip flaky vk test

commit c19ecc1
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Mon Mar 28 07:34:25 2022 +0900

    move intrin decl for vector type

commit 3dd7f04
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 09:40:29 2022 +0900

    disable default post processor, tuning now works with compactness check

commit 2f6fdae
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 08:08:35 2022 +0900

    more comment

commit c7ebfa9
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 07:42:46 2022 +0900

    add comment

commit 78400ba
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 07:40:28 2022 +0900

    disable tuning test for now

commit a33243f
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 07:30:03 2022 +0900

    remove annotation check in ir comparator

commit 105f98c
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 07:28:36 2022 +0900

    clean up

commit 8aa16f2
Author: Masahiro Masuda <masahi129@gmail.com>
Date:   Sat Mar 26 07:15:24 2022 +0900

    Add test

* add test case that hangs without forkseed
Krzysztof Parzyszek and others added 28 commits April 19, 2022 10:15
Increase the default stack size to 256kB, since this is the minimum
main thread stack size in QuRT on simulator.
* Refactor hexagon test scripts

* revert removing the script
* Attempt to prevent concurrent update in Map

Calling Map::Set invalidates existing iterators, to protect against
using already-deleted data after re-hashing.

Change-Id: Ib6b580758e74c8b77ed560932d87b643bd6c9402

* Migrated to using TVM_LOG_DEBUG

Now uses TVM_LOG_DEBUG
Map state_marker made atomic

Change-Id: I090c4b33e6edaa977cccba11f8d1c6ff3fbca430

* removed usage of atomics

Change-Id: I7bd930cb52d58ca10fd49a5fe8f5d48b3e955d0a
* [OpenCL] Fix type casting

The previous PR #11021 was reverted in #11035 because it affected
the performance of the generated OpenCL code.

This PR fixes the same issue but doesn't lead to performance
degradation. Tested on the Resnet50_v2 network.

* Implement using select built-in
The buggy script is as follows:
```python
import tvm
from tvm import relay
from tvm.contrib import graph_executor
x = relay.var("x", shape=[1, 3, 224, 224], dtype="float16")
y = relay.nn.lrn(x)
mod = tvm.IRModule.from_expr(relay.Function([x], y))
lib = relay.build(mod, target="llvm")
f = graph_executor.GraphModule(lib["default"](tvm.cpu()))
f.run()
```

The error I get is
```
Check failed: ret == 0 (-1 vs. 0) : Assert fail: (((tir.tvm_struct_get(arg.T_divide, 0, 5) == (uint8)2) && (tir.tvm_struct_get(arg.T_divide, 0, 6) == (uint8)32)) && (tir.tvm_struct_get(arg.T_divide, 0, 7) == (uint16)1)), arg.T_divide.dtype is expected to be float32
```
* [CMSIS-NN] Add Arm(R) Cortex(R)-M55 CPU and CMSIS-NN demo

- Downloads a quantized (int8) person detection model
- Uses tvmc to compile the model for Cortex(R)-M55 CPU and CMSIS-NN
- Downloads an image to run the model on
- Creates a C header file inputs.c containing the image data as a C array
- Builds the demo application
- Runs the demo application on the FVP
- Application reports whether a person was detected e.g. "Person detected"

Change-Id: If58d02ed0c4d2a85c0100398f65e6915a86f6546

* [CMSIS-NN] Add Arm(R) Cortex(R)-M55 CPU and CMSIS-NN demo
- Downloads a quantized (int8) person detection model
- Uses tvmc to compile the model for Cortex(R)-M55 CPU and CMSIS-NN
- Downloads an image to run the model on
- Creates a C header file inputs.c containing the image data as a C array
- Builds the demo application
- Runs the demo application on the FVP
- Application reports whether a person was detected e.g. "Person detected"

Change-Id: Ic20ceed80bc6e48d5c96ff0d5ca6c85e7f19174b
…11050)

* [TIR] Add TensorizeInfo and GetTensorizeLoopMapping

* expose PreOrderVisit to python

* add test case

* add conv2d nchwc test

* add mma test

* add arm nhwc conv2d test

* Revert "add arm nhwc conv2d test"

This reverts commit eb147f3.

* refine

* add doc

* update

* fixed condition

* black

* pylint

* Update python/tvm/tir/schedule/analysis.py

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

* run black

* bring back logic in original code to support loop permutation

* add comment

* simplify

* minor fix to test

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
* [microNPU] Performance model bugfixes

* Fixed incorrect num_blocks calculations for both BufferModes.
* Fixed similar issues with Read/Write byte calculations.
* Fixed an issue where the 'partkernel' flag was not propagated to
  the performance estimation code.
* Fixed the single-buffering check, which incorrectly used the output shape
  and block rather than the input shape and block.
* Fixed the block config not being aligned to the micro block for Elementwise.

Change-Id: Ide6b231bc1a17c65bed20129d2179a215ada14b2

* Address review comment

Changed incorrect usage of 'max_width' to 'max_depth'.
Add the `configs` directory to the installed version of TVM in the
setuptools configuration, and introduce a new function to load the
`configs` directory from the right path both when TVM is locally
installed for development and when it is installed as a package.
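
A sketch of what such a lookup could look like (the function name and relative paths are assumptions, not necessarily the PR's implementation):

```python
import os


def get_configs_dir():
    """Locate the 'configs' directory in a source checkout or an installed package."""
    candidates = [
        # Running from a source checkout during development.
        os.path.join(os.path.dirname(__file__), "..", "..", "configs"),
        # Running from an installed package, where configs ships alongside the module.
        os.path.join(os.path.dirname(__file__), "configs"),
    ]
    for path in candidates:
        if os.path.isdir(path):
            return os.path.abspath(path)
    raise FileNotFoundError("Cannot find the TVM 'configs' directory")
```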
* Fix a bug allowing microTVM to be used with Arduino version v0.20 and above (see changes to _parse_connected_boards) and add relevant unit tests.

* Only perform the version check when calling build or flash (things that actually require arduino-cli), and add relevant unit tests.

* Only raise a warning if the arduino-cli version present is below the minimum version (previously any version other than v0.18 would cause an error).

* Change the version comparison to use version.check, like the rest of TVM.
…anspose (#10952)

* Support input scale and zp of 1-element vector in qnn.conv2d_transpose

* Lint
* support CHW layout for Pool

* fix lint test

* change the if condition
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

* [RPC] Don't use existence of USE_HEXAGON_SDK as enablement check

Use USE_HEXAGON to check if Hexagon support is enabled or not.

This fixes #11059.

* Restart CI
Retrigger CI.

Address issues.

Retrigger CI.
Previously, the size of the memory to be allocated was calculated as width
multiplied by height. This doesn't work well when one texture is large in
height and the next one is large in width: we tried to reuse the allocated
memory, and every time the next large texture was used we reallocated the
previous one, which had a huge impact on performance. Now the two dimensions
are checked independently, which avoids this cyclic memory reallocation.
* unary op for resize2d and test

* renamed test

* added log in quantized form

* black'd some files

* changed suggested commentary
* [TVMScript] Allow `val = buf[index]` without type annotation

Other instances of `var = expr` were previously allowed without
requiring a type annotation, by using the dtype of the expression as
the dtype of `var`.  This behavior didn't work for `buf[index]`
expressions, which are internally represented as `BufferSlice` python
objects, and only converted to `BufferLoad` primexprs when used as an
expression.

This commit adds a `dtype` property to `BufferSlice`, allowing
`buf[index]` to be used in a let statement without a type annotation.
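
A minimal sketch of what this enables (illustrative only):

```python
from tvm.script import tir as T


@T.prim_func
def scale(A: T.Buffer[(16,), "float32"], B: T.Buffer[(16,), "float32"]) -> None:
    for i in T.serial(16):
        with T.block("scale"):
            vi = T.axis.spatial(16, i)
            val = A[vi]  # no type annotation needed; dtype comes from the BufferSlice
            B[vi] = val * T.float32(2)
```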

* Reverted a wider change

Automatically adding a type annotation to Var if it could be
determined from the dtype let the unit test directly compare the
annotated and unannotated versions of buffer load.  Unfortunately, it
also broke 54 unrelated tests, so that change is removed from this PR.
* [TIR] StmtFunctor RenewDefs

In this PR, I introduce a StmtFunctor `RenewDefs` that deep-copies all definition nodes in a PrimFunc (including Var, Buffer, and IterVar). This functor creates a new PrimFunc with the same behavior as the old one but containing different nodes.

This functor may help with TIR fusion or inlining multiple PrimFuncs.

* add ut

* address comments

* address comments

* lint

* lint
* [microNPU] Integrate rolling buffers in Arm(R) Ethos(TM)-U

Change-Id: Iede5e68981a063f6eb1e118433cc2c92e175af52

* Add documentation for create_tiles

* Fix linter issues

* Fix integration tests
* STM32: add as a new target

* STM32: Target takes a board ID rather than a series.

* STM32: target series.

* STM32: Fixed lint issues.
* AOT with LLVM Codegen on Hexagon

* Address comments
@blackkker blackkker closed this Apr 22, 2022
blackkker pushed a commit that referenced this pull request Sep 19, 2023
…pache#15483)

* [Script] Be more careful when generating ast.ExtSlice for Subscript

The ast.ExtSlice expects a non-empty list, otherwise evaluation
fails with "error: empty dims on ExtSlice". Also, each element
in "dims" list of ExtSlice must be either Slice or Index.

In Python 3.8, an expression A[()] is parsed (by ast) as a Subscript
with the slice being Index(value=Tuple(elts=[])). When we translate a
subscript from doc.AST to ast, we unconditionally convert every
tuple to ast.ExtSlice, which in this case is incorrect.

The fix is to map the empty tuple back to Index(Tuple([])) instead
of ExtSlice. In other cases, ensure that the members of ExtSlice are
of the correct types (a sketch follows below).

* Fix lint #1
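
A sketch of the mapping described above (the helper name is illustrative, not the PR's code):

```python
import ast


def doc_tuple_to_ast_slice(elts):
    """Translate the elements of a subscript tuple into an ast slice node (Python 3.8)."""
    if not elts:
        # A[()]: ExtSlice rejects an empty dims list, so keep Index(Tuple([])).
        return ast.Index(value=ast.Tuple(elts=[], ctx=ast.Load()))
    # Every dim of an ExtSlice must be a Slice or an Index, never a bare expression.
    dims = [e if isinstance(e, ast.Slice) else ast.Index(value=e) for e in elts]
    return ast.ExtSlice(dims=dims)
```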