[Frontend][TFLite] update logical_binary using new function #1
Closed
…10411) Builds upon the work in #10254 to remove identity operations sandwiched between two non-compute operations (reshape/strided slice; concatenate is handled differently), under certain conditions. Specifically, an identity operation is not removed when the dimensionality between the two non-compute operations is reduced, due to non-congruent values being accessed incorrectly. For example,
```
strided_slice(dims=4) -> identity -> reshape(dims=4)
```
becomes
```
strided_slice -> reshape
```
but
```
strided_slice(dims=4) -> identity -> reshape(dims=2)
```
remains as
```
strided_slice -> identity -> reshape
```
Change-Id: Ie28ba384fcb3230d6f4651c0c19e2b9526ebcc42
Currently failing with `USE_CMSISNN` set to `OFF`
This mimics the behaviour of aot_test_utils.py to ensure the tests don't start running when the toolchain isn't available.
I believe the flakiness in #10748 is the small chance of generating a 1x1 or 1xn convolution which allows for a different buffer size: https://github.com/apache/tvm/blob/63461c0c97c307e581271708c3490f5275675a1a/src/relay/backend/contrib/cmsisnn/buffer_size.cc#L38-L47 Therefore, careful selection of the distribution should alleviate this issue.
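As an illustration only (not the actual test code; the helper name and ranges below are made up), constraining the random kernel distribution so neither dimension can be 1 avoids the special-case buffer size:

```python
# Hypothetical sketch: draw kernel sizes from 2..5 so the generated convolution
# can never be 1x1 or 1xN, which would select a different buffer-size code path.
import numpy as np

rng = np.random.default_rng(seed=0)


def random_kernel_shape():
    kernel_h = int(rng.integers(2, 6))  # low inclusive, high exclusive
    kernel_w = int(rng.integers(2, 6))
    return kernel_h, kernel_w
```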
* [Hexagon] Improved ergonomics of HexagonLauncher in unit tests.

The goal of this commit is to reduce/eliminate common code required throughout unit tests that interact with Hexagon hardware.

- New testing fixtures in `tests/python/contrib/test_hexagon`. A test running on Hexagon hardware should only need to use the `hexagon_session` fixture.
  - `rpc_server_port`: Iterates through port numbers, selecting an unused port for each unit test. Avoids needing to explicitly specify unique ports for each unit test.
  - `tvm_tracker`: Starts a tracker on use, exits after the test. Avoids needing to manually start a tracker prior to running the unit test.
  - `hexagon_launcher`: Starts a `HexagonLauncher` server on use, stops the server after the test. Avoids needing to call `start_server()` and `stop_server()` in each test.
  - `hexagon_session`: Starts a Hexagon session using `hexagon_launcher.start_session()`, exits after the test.
- Added `Session.upload` function, which delegates to `HexagonLauncher.upload`. Avoids needing to interact with both the launcher and the session.
- Allowed `tvm.IRModule` as the argument passed to `Session.load_module`, which will automatically save/upload the module, then load it. Avoids needing to handle save/upload of temporary files in each unit test.

* Added default port for tracker if not already set.
* Pass through None from hexagon_launcher to hexagon_session.
* Updated launcher to use external tracker if specified.
* Avoid setting up the local tracker unless required.
* Declare previous_port as global, instead of list.
* Corrected type hints.
* Docstring updates
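As a rough sketch of the intended ergonomics (a hypothetical test; the schedule, function name, and assertions below are illustrative, not taken from the PR), a unit test only needs the `hexagon_session` fixture:

```python
import numpy as np
import tvm
from tvm import te


def test_elementwise_add(hexagon_session):
    # Build a trivial elementwise-add kernel targeting Hexagon.
    n = 16
    A = te.placeholder((n,), dtype="int8", name="A")
    B = te.placeholder((n,), dtype="int8", name="B")
    C = te.compute((n,), lambda i: A[i] + B[i], name="C")
    sched = te.create_schedule(C.op)
    target = tvm.target.hexagon("v68")
    func = tvm.build(sched, [A, B, C], target=tvm.target.Target(target, host=target), name="add")

    # load_module takes care of saving/uploading the built module.
    mod = hexagon_session.load_module(func)
    dev = hexagon_session.device
    a = tvm.nd.array(np.ones(n, dtype="int8"), dev)
    b = tvm.nd.array(np.ones(n, dtype="int8"), dev)
    c = tvm.nd.array(np.zeros(n, dtype="int8"), dev)
    mod["add"](a, b, c)
    np.testing.assert_equal(c.numpy(), np.full(n, 2, dtype="int8"))
```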
This PR added an analysis function `SuggestIndexMap` to analyze buffer access pattern and suggest index map for layout transformations. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
* [microNPU] Add a pass to move allocate nodes to the outer scope

Adds a pass called `HoistAllocates` to move allocate nodes to the top of the body of the main function. In doing so, it opens the door to other optimizations that need to swap the ordering of external calls.

Pass illustration (before):
```
allocate {
  extern_call {
    allocate {
      extern_call {
      }
    }
  }
}
```
(after):
```
allocate {
  allocate {
    extern_call
    extern_call
  }
}
```
Change-Id: Ibcfc3c75b15deebb5c6645a4923a6ddf683b37c4

* Address comments
  * Uses a prim func pass, rather than a module pass.
  * Adds an error message informing the user to run this pass with the LowerToTIR() pass for now.
Change-Id: I57757b9dc5bff0208034a974a341c09cce0294bc

* Support allocates when not followed by a sequence statement, with a test to back this case up.
Change-Id: I670809f5ee53b583a15d9b783852dda3089756e9

* Add new directory tir/contrib/ethosu to cmake build
Change-Id: I3e9f24adfe992ace4e03238a18a8378b03257e1a
* [ci] Generate Jenkinsfile from a template This uses `jinja2` to generate the Jenkinsfile. This is useful since it lets us keep common functionality easy to define (i.e. iterate over all images and do something) while keeping the output easy to debug (you can look at the `Jenkinsfile` directly instead of trying to imagine what the Groovy interpreter will do). This will become more useful as we start to make CI more configurable, such as adding dynamic test sharding. This mostly introduces the infrastructure and makes some token changes to demonstrate the generation process, but its usefulness is already shown, since the parameters list was missing an entry for the `ci_hexagon` image. * Address comments, fix CI with temporary workaround Co-authored-by: driazati <driazati@users.noreply.github.com>
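For context, a toy sketch of `jinja2`-based generation (the template text and image list here are invented, not the real Jenkinsfile contents):

```python
from jinja2 import Template

# Hypothetical fragment: iterate over all images and emit a stage for each.
TEMPLATE = """\
{% for image in images %}
stage('Build {{ image }}') {
  sh "docker build -t {{ image }} ."
}
{% endfor %}
"""

print(Template(TEMPLATE).render(images=["ci_cpu", "ci_gpu", "ci_hexagon"]))
```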
* fix reduce crash on scalar inputs * fix uncovered cases. * fix on different opset to pass ci
…0792) Co-authored-by: driazati <driazati@users.noreply.github.com>
The code should be checking DSPRPC_LIB_DIRS instead of REMOTE_DIR.
Renames variable from `runtime` to `executor` to better reflect current terminology and reduce confusion.
## Context

When dealing with end-to-end models, we note that some tensors may have large shapes. Thus, when designing graph-level IR, we sometimes use `int64` instead of `int32` for the shape. Below is a dense GeMM example which has an `int64` input tensor shape:

```python
@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(rxplaceholder: T.Buffer[(1, 512), "float32"], rxplaceholder_1: T.Buffer[(T.int64(1000), T.int64(512)), "float32"], T_matmul_NT: T.Buffer[(1, T.int64(1000)), "float32"]) -> None:
        # function attr dict
        T.func_attr({"global_symbol": "dense", "tir.noalias": True, "op_pattern": 3})
        # body
        # with T.block("root")
        for i0_0, i1_0, i0_1, i1_1, i2_0, i0_2, i1_2, i2_1, i0_3, i1_3 in T.grid(1, 4, 1, 25, 8, 1, 10, 64, 1, 1):
            with T.block("T_matmul_NT"):
                i = T.axis.spatial(1, 0)
                j = T.axis.spatial(T.int64(1000), i1_0 * T.int64(250) + i1_1 * T.int64(10) + i1_2)
                k = T.axis.reduce(512, i2_0 * 64 + i2_1)
                T.reads(T_matmul_NT[i, j], rxplaceholder[i, k], rxplaceholder_1[j, k])
                T.writes(T_matmul_NT[i, j])
                T.block_attr({"layout_free_placeholders": [rxplaceholder_1], "meta_schedule.tiling_structure": "SSRSRS"})
                with T.init():
                    T_matmul_NT[i, j] = T.float32(0)
                T_matmul_NT[i, j] = T_matmul_NT[i, j] + rxplaceholder[i, k] * rxplaceholder_1[j, k]
```

## Problem

Though our TVMScript printer can easily print `int64` constants, the parser had poor support for `int64`. So this PR introduces some parser support for `int64`, covering the data types of loop variables, block iterators, and block read/write regions. Besides the parser, most of the TIR schedule primitives did not take `int64` into account in their implementations. These schedule primitives will be fixed and updated in the near future, in follow-up PRs.
- Added device validity check in allocation. HexagonDeviceAPI should only be called for CPU/Hexagon types. - Check for "global.vtcm" scope instead of "vtcm". The scope of N-d allocations produced by `LowerVtcmAlloc` should be `"global.vtcm"`. The previous check allowed unsupported scopes such as `"local.vtcm"`. - Remove `vtcmallocs` entry after calling free. Previously, the vtcm allocation map kept dangling pointers to `HexagonBuffer` objects after they had been freed. - Rename N-d alloc and free packed functions. Since most of the similar device functions use snake case, renaming `*.AllocND` to `*.alloc_nd` and `*.FreeND` to `*.free_nd`. Co-authored-by: Adam Straw <astraw@octoml.ai>
Found that the CPU affinity setting does not work in the pipeline executor; the symptom is that there is no perf change after changing the CPU affinity. The reason is that only the ConfigRuntime class stored the CPU affinity setting, while the BackendRuntime class did not.
… non-int32 dtypes (#10795) _This PR is a follow-up effort of #10789, which enables `int64` support for the TIR schedule primitives Cache-Read and Cache-Write._ Prior to this PR, the IterVars of the generated cache stage block were always `int32`-typed, which might conflict with the dtypes of the domains of those IterVars. In this PR, the dtype of each new IterVar is constructed according to the data type of its domain, thereby resolving the possible conflicts. Meanwhile, the data types of the read/write regions of the cache stage blocks are also constructed according to the correct data types.
* use python3.7 install script in ci-qemu * update python venv to 3.7 * setuptools is just python3... * don't use apt-add-repository (breaks with python3.7 as python3 on ubuntu 18.04)
This is a follow up to #10695. Change-Id: I7f2dc14826cefea81fe5ff69c6255cdb5dc7f5c0
* [LIBXSMM] add libxsmm to TVM CI. * Config "make" thread number in a more flexible way. Co-authored-by: Cody Yu <comaniac0422@gmail.com> * Empty commit to trigger github CI. * Update ubuntu_install_libxsmm.sh. * Trigger CI tasks. * Trigger CI tasks. Co-authored-by: wenxizhu <wenxizhu@tencent.com> Co-authored-by: Cody Yu <comaniac0422@gmail.com>
…10762) Uses reference kernels for the codegen tests when the version of Tensorflow is >= 2.5.0 (the first version in which this functionality was added). Otherwise, fall back to running the tests without them. Change-Id: I92b24ad259d2fda2fed497aa0fe6d7f11a0db85a
* [ARM] Fix NCHWc int8 dot product schedule lowering * fix arm task extraction test not running * skip test on i386
…ction (#10686) Adds support for LetStmts in feature extraction. A stack of variable definitions is maintained and added to the arithmetic analyzer at the appropriate points. The buffer access analysis now creates a new arithmetic analysis context per set of loops to avoid redefining variables, which is unsafe in the presence of let statements.
* Make Schedule::Copy non-const, fork RND seed in Copy * fork seed in traced schedule copy too commit eeb4a6d Author: Masahiro Masuda <masahi129@gmail.com> Date: Tue Mar 29 06:39:38 2022 +0900 add more comment commit 183b4cf Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Mar 28 10:04:12 2022 +0900 skip flaky vk test commit c19ecc1 Author: Masahiro Masuda <masahi129@gmail.com> Date: Mon Mar 28 07:34:25 2022 +0900 move intrin decl for vector type commit 3dd7f04 Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 09:40:29 2022 +0900 disable default post processor, tuning now works with compactness check commit 2f6fdae Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 08:08:35 2022 +0900 more comment commit c7ebfa9 Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 07:42:46 2022 +0900 add comment commit 78400ba Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 07:40:28 2022 +0900 disable tuning test for now commit a33243f Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 07:30:03 2022 +0900 remove annotation check in ir comparator commit 105f98c Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 07:28:36 2022 +0900 clean up commit 8aa16f2 Author: Masahiro Masuda <masahi129@gmail.com> Date: Sat Mar 26 07:15:24 2022 +0900 Add test * add test case that hangs without forkseed
Increase the default stack size to 256kB, since this is the minimum main thread stack size in QuRT on simulator.
* Refactor hexagon test scripts * revert removing the script
* Attempt to prevent concurrent update in Map Calling Map::Set invalidates existing iterators, to protect against using already-deleted data due to re-hashing Change-Id: Ib6b580758e74c8b77ed560932d87b643bd6c9402 * Migrated to using TVM_LOG_DEBUG Now uses TVM_LOG_DEBUG; Map state_marker made atomic Change-Id: I090c4b33e6edaa977cccba11f8d1c6ff3fbca430 * removed usage of atomics Change-Id: I7bd930cb52d58ca10fd49a5fe8f5d48b3e955d0a
The buggy script is shown below:
```python
import tvm
from tvm import relay
from tvm.contrib import graph_executor

x = relay.var("x", shape=[1, 3, 224, 224], dtype="float16")
y = relay.nn.lrn(x)
mod = tvm.IRModule.from_expr(relay.Function([x], y))
lib = relay.build(mod, target="llvm")
f = graph_executor.GraphModule(lib["default"](tvm.cpu()))
f.run()
```
The error I get is
```
Check failed: ret == 0 (-1 vs. 0) : Assert fail: (((tir.tvm_struct_get(arg.T_divide, 0, 5) == (uint8)2) && (tir.tvm_struct_get(arg.T_divide, 0, 6) == (uint8)32)) && (tir.tvm_struct_get(arg.T_divide, 0, 7) == (uint16)1)), arg.T_divide.dtype is expected to be float32
```
* [CMSIS-NN] Add Arm(R) Cortex(R)-M55 CPU and CMSIS-NN demo - Downloads a quantized (int8) person detection model - Uses tvmc to compile the model for Cortex(R)-M55 CPU and CMSIS-NN - Downloads an image to run the model on - Creates a C header file inputs.c containing the image data as a C array - Builds the demo application - Runs the demo application on the FVP - Application reports whether a person was detected e.g. "Person detected" Change-Id: If58d02ed0c4d2a85c0100398f65e6915a86f6546 * [CMSIS-NN] Add Arm(R) Cortex(R)-M55 CPU and CMSIS-NN demo - Downloads a quantized (int8) person detection model - Uses tvmc to compile the model for Cortex(R)-M55 CPU and CMSIS-NN - Downloads an image to run the model on - Creates a C header file inputs.c containing the image data as a C array - Builds the demo application - Runs the demo application on the FVP - Application reports whether a person was detected e.g. "Person detected" Change-Id: Ic20ceed80bc6e48d5c96ff0d5ca6c85e7f19174b
…11050) * [TIR] Add TensorizeInfo and GetTensorizeLoopMapping * expose PreOrderVisit to python * add test case * add conv2d nchwc test * add mma test * add arm nhwc conv2d test * Revert "add arm nhwc conv2d test" This reverts commit eb147f3. * refine * add doc * update * fixed condition * black * pylint * Update python/tvm/tir/schedule/analysis.py Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * run black * bring back logic in original code to support loop permutation * add comment * simplify * minor fix to test Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Wuwei Lin <wuwei@apache.org>
* [microNPU] Performance model bugfixes * Fixed incorrect num_blocks calculations for both BufferModes. * Fixed similar issues with Read/Write byte calculations. * Fixed an issue where the 'partkernel' flag was not propagated to the performance estimation code. * Fixed a single-buffering check that incorrectly used the output shape and block rather than the input shape and block. * Fixed block config not aligned to micro block for Elementwise. Change-Id: Ide6b231bc1a17c65bed20129d2179a215ada14b2 * Address review comment Changed incorrect usage of 'max_width' to 'max_depth'.
Add the `configs` directory to be part of the installed version of TVM in the setuptools configuration, and introduce a new function to load the `configs` directory from the right paths both when TVM is locally installed for development and when it is installed as a package.
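A minimal sketch of that kind of lookup, assuming a helper that probes an installed-package location first and then a development checkout (the function name and relative paths are illustrative, not the actual TVM implementation):

```python
import os


def find_configs_dir():
    """Return the 'configs' directory for both development and installed layouts."""
    candidates = [
        # Installed as a package: configs shipped alongside the python module.
        os.path.join(os.path.dirname(__file__), "configs"),
        # Local development checkout: configs lives relative to the repository root.
        os.path.join(os.path.dirname(__file__), "..", "..", "configs"),
    ]
    for path in candidates:
        if os.path.isdir(path):
            return os.path.abspath(path)
    raise FileNotFoundError("Could not locate the 'configs' directory")
```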
* Fix a bug so that microTVM can be used with Arduino version v0.20 and above (see changes to _parse_connected_boards), and add relevant unit tests. * Only perform the version check when calling build or flash (things that actually require arduino-cli), and add relevant unit tests. * Only raise a warning if the arduino-cli version present is below the minimum version (previously any version other than v0.18 would cause an error). * Change version comparison to use version.check, like the rest of TVM
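A hedged sketch of the version-gating behaviour described above, using `packaging.version` for the comparison (the function and constant names are illustrative, not the actual microTVM code):

```python
import warnings
from packaging import version

MIN_ARDUINO_CLI_VERSION = version.parse("0.18.0")  # assumed minimum, for illustration


def check_arduino_cli_version(cli_version_str, command):
    """Only check when the command actually needs arduino-cli, and warn
    rather than error when the installed version is below the minimum."""
    if command not in ("build", "flash"):
        return
    if version.parse(cli_version_str) < MIN_ARDUINO_CLI_VERSION:
        warnings.warn(
            f"arduino-cli {cli_version_str} is older than the minimum supported "
            f"version {MIN_ARDUINO_CLI_VERSION}; some features may not work."
        )
```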
…anspose (#10952) * Support input scale and zp of 1-element vector in qnn.conv2d_transpose * Lint
* Support Pool with CHW layout * fix lint test * change the if condition
Refer to the issue in #10831 (comment)
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Wuwei Lin <wuwei@apache.org>
) * [RPC] Don't use existence of USE_HEXAGON_SDK as enablement check Use USE_HEXAGON to check if Hexagon support is enabled or not. This fixes #11059. * Restart CI
Retrigger CI. Address issues. Retrigger CI.
Previously, the size of the memory to be allocated was calculated as the product of width and height. This does not work well when one texture is large in height and the next one is large in width: we tried to reuse the allocated memory, and every time the next large texture was used we reallocated the previous one, which had a huge impact on performance. Now we check the two dimensions independently, so both dimensions are validated and the cyclic memory reallocation is avoided.
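An illustrative sketch of the reuse check (the real implementation is in the C++ texture allocator): reuse requires that each dimension fits independently, not merely that the total area is large enough.

```python
from collections import namedtuple

Texture = namedtuple("Texture", ["width", "height"])


def can_reuse(existing, requested):
    # Both dimensions must fit; comparing width * height alone would suggest a
    # "reuse" that immediately forces a reallocation on the next texture,
    # producing the cyclic reallocation described above.
    return existing.width >= requested.width and existing.height >= requested.height


# Equal areas, but neither dimension of the existing texture fits the request.
print(can_reuse(Texture(width=1024, height=8), Texture(width=8, height=1024)))  # False
```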
* unary op for resize2d and test * renamed test * added log in quantized form * black'd some files * changed suggested commentary
* [TVMScript] Allow `val = buf[index]` without type annotation Other instances of `var = expr` were previously allowed without requiring a type annotation, by using the dtype of the expression as the dtype of `var`. This behavior didn't work for `buf[index]` expressions, which are internally represented as `BufferSlice` python objects, and only converted to `BufferLoad` primexprs when used as an expression. This commit adds a `dtype` property to `BufferSlice`, allowing `buf[index]` to be used in a let statement without a type annotation. * Reverted a wider change Automatically adding a type annotation to Var if it could be determined from the dtype let the unit test directly compare the annotated and unannotated versions of buffer load. Unfortunately, it also broke 54 unrelated tests, so that change is removed from this PR.
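For illustration, a minimal TVMScript sketch of the now-accepted pattern (the prim func itself is made up):

```python
from tvm.script import tir as T


@T.prim_func
def add_one(A: T.Buffer[(16,), "float32"], B: T.Buffer[(16,), "float32"]) -> None:
    for i in T.serial(16):
        # Previously this required an explicit annotation: val: T.float32 = A[i]
        val = A[i]
        B[i] = val + T.float32(1)
```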
* [TIR] StmtFunctor RenewDefs In this PR, I introduce a StmtFunctor `RenewDefs` to deep-copy all definition nodes in a PrimFunc (including Var, Buffer, and IterVar). This functor can create a new PrimFunc with the same behavior as the old one but containing different nodes. This functor may help TIR fusion or inlining of multiple PrimFuncs * add ut * address comments * address comments * lint * lint
* [microNPU] Integrate rolling buffers in Arm(R) Ethos(TM)-U Change-Id: Iede5e68981a063f6eb1e118433cc2c92e175af52 * Add documentation for create_tiles * Fix linter issues * Fix integration tests
* STM32: add as a new target * STM32: Target takes a board ID rather than a series. * STM32: target series. * STM32: Fixed lint issues.
* AOT with LLVM Codegen on Hexagon * Address comments
blackkker pushed a commit that referenced this pull request on Sep 19, 2023:
…pache#15483) * [Script] Be more careful when generating ast.ExtSlice for Subscript The ast.ExtSlice expects a non-empty list, otherwise evaluation fails with "error: empty dims on ExtSlice". Also, each element in "dims" list of ExtSlice must be either Slice or Index. In python3.8 an expression A[()] is parsed (by ast) as Subscript with slice being Index(value=Tuple(elts=[])). When we translate a subscript from doc.AST to ast, we unconditionally convert every tuple to ast.ExtSlice, which in this case is incorrect. The fix is to map empty tuple back to the Index(Tuple[])) instead of ExtSlice. In other cases, ensure that members of ExtSlice are of correct types. * Fix lint #1
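A minimal paraphrase of the translation rule described above (the helper name is invented; this assumes Python 3.8 ast semantics, where `A[()]` parses as `Subscript(slice=Index(value=Tuple(elts=[])))`):

```python
import ast


def to_ast_slice(elements):
    """Map a tuple of subscript elements back to a pre-3.9 ast slice node."""
    if not elements:
        # Empty tuple must stay Index(Tuple([])): ExtSlice rejects empty dims.
        return ast.Index(value=ast.Tuple(elts=[], ctx=ast.Load()))
    if any(isinstance(e, ast.Slice) for e in elements):
        # Every member of ExtSlice must be either a Slice or an Index.
        dims = [e if isinstance(e, ast.Slice) else ast.Index(value=e) for e in elements]
        return ast.ExtSlice(dims=dims)
    return ast.Index(value=ast.Tuple(elts=list(elements), ctx=ast.Load()))
```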
When importing the MoveNet_MultiPose model, I hit a KeyError (as the attached picture shows); it appears to come from `self.get_expr()`. However, I can't reproduce a test case for the logical_and op that is consistent with the model. I also found that PR #7048 updated `get_tensor_expr`, and some ops already use the newer function. So I updated `logical_binary` to use it as well, and my model now compiles successfully. A sketch of the change is below.
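For reference, a hedged sketch of the updated conversion using `get_tensor_expr` (which also handles constant tensors) instead of `get_expr`; the method and helper names follow the TFLite frontend's style but are reproduced here as an approximation, not verbatim:

```python
def _convert_logical_binary(self, relay_op, op):
    """Generic method to convert logical binary ops such as LOGICAL_AND / LOGICAL_OR."""
    input_tensors = self.get_input_tensors(op)
    assert len(input_tensors) == 2, "input tensors length should be 2"

    # get_tensor_expr covers both previously-converted tensors and constants,
    # whereas get_expr raised a KeyError for constant inputs like those in
    # MoveNet_MultiPose.
    lhs = self.get_tensor_expr(input_tensors[0])
    rhs = self.get_tensor_expr(input_tensors[1])
    return relay_op(lhs, rhs)
```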