diff --git a/docs/demo_guides/amlogic_npu.md b/docs/demo_guides/amlogic_npu.md index 19396472882..8941b001cc0 100644 --- a/docs/demo_guides/amlogic_npu.md +++ b/docs/demo_guides/amlogic_npu.md @@ -223,12 +223,13 @@ Paddle Lite 已支持晶晨 NPU 的预测部署。 3)`build.sh` 根据入参生成针对不同操作系统、体系结构的二进制程序,需查阅注释信息配置正确的参数值。 4)`run_with_adb.sh` 入参包括模型名称、操作系统、体系结构、目标设备、设备序列号等,需查阅注释信息配置正确的参数值。 5)`run_with_ssh.sh` 入参包括模型名称、操作系统、体系结构、目标设备、ip地址、用户名、用户密码等,需查阅注释信息配置正确的参数值。 + 6)下述命令行示例中涉及的具体IP、SSH账号密码、设备序列号等均为示例环境,请用户根据自身实际设备环境修改。 在 ARM CPU 上运行 mobilenet_v1_int8_224_per_layer 全量化模型 $ cd PaddleLite-generic-demo/image_classification_demo/shell For C308X - $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer linux arm64 cpu + $ ./run_with_ssh.sh mobilenet_v1_int8_224_per_layer linux arm64 cpu 192.168.100.244 22 root 123456 (C308X) warmup: 1 repeat: 5, average: 167.6916 ms, max: 207.458000 ms, min: 159.823239 ms results: 3 @@ -240,7 +241,7 @@ Paddle Lite 已支持晶晨 NPU 的预测部署。 Postprocess time: 0.542000 ms For A311D - $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer linux arm64 cpu + $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer linux arm64 cpu 0123456789ABCDEF (A311D) warmup: 1 repeat: 15, average: 81.678067 ms, max: 81.945999 ms, min: 81.591003 ms results: 3 @@ -252,7 +253,7 @@ Paddle Lite 已支持晶晨 NPU 的预测部署。 Postprocess time: 0.407000 ms For S905D3(Android版) - $ ./run_with_ssh.sh mobilenet_v1_int8_224_per_layer android armeabi-v7a cpu + $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer android armeabi-v7a cpu c8631471d5cd (S905D3(Android版)) warmup: 1 repeat: 5, average: 280.465997 ms, max: 358.815002 ms, min: 268.549812 ms results: 3 @@ -269,7 +270,7 @@ Paddle Lite 已支持晶晨 NPU 的预测部署。 $ cd PaddleLite-generic-demo/image_classification_demo/shell For C308X - $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer linux arm64 amlogic_npu + $ ./run_with_ssh.sh mobilenet_v1_int8_224_per_layer linux arm64 amlogic_npu 192.168.100.244 22 root 123456 (C308X) warmup: 1 repeat: 5, average: 6.982800 ms, max: 7.045000 
ms, min: 6.951000 ms results: 3 @@ -281,7 +282,7 @@ Paddle Lite 已支持晶晨 NPU 的预测部署。 Postprocess time: 0.509000 ms For A311D - $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer linux arm64 amlogic_npu + $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer linux arm64 amlogic_npu 0123456789ABCDEF (A311D) warmup: 1 repeat: 15, average: 5.567867 ms, max: 5.723000 ms, min: 5.461000 ms results: 3 @@ -293,7 +294,7 @@ Paddle Lite 已支持晶晨 NPU 的预测部署。 Postprocess time: 0.411000 ms For S905D3(Android版) - $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer android armeabi-v7a amlogic_npu + $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer android armeabi-v7a amlogic_npu c8631471d5cd (S905D3(Android版)) warmup: 1 repeat: 5, average: 13.4116 ms, max: 15.751210 ms, min: 12.433400 ms results: 3 diff --git a/docs/demo_guides/rockchip_npu.md b/docs/demo_guides/rockchip_npu.md index 4906745c637..ae92f94442e 100644 --- a/docs/demo_guides/rockchip_npu.md +++ b/docs/demo_guides/rockchip_npu.md @@ -208,6 +208,7 @@ Paddle Lite 已支持 Rockchip NPU 的预测部署。 3)`build.sh` 根据入参生成针对不同操作系统、体系结构的二进制程序,需查阅注释信息配置正确的参数值。 4)`run_with_adb.sh` 入参包括模型名称、操作系统、体系结构、目标设备、设备序列号等,需查阅注释信息配置正确的参数值。 5)`run_with_ssh.sh` 入参包括模型名称、操作系统、体系结构、目标设备、ip 地址、用户名、用户密码等,需查阅注释信息配置正确的参数值。 + 6)下述命令行示例中涉及的具体IP、SSH账号密码、设备序列号等均为示例环境,请用户根据自身实际设备环境修改。 在 ARM CPU 上运行 mobilenet_v1_int8_224_per_layer 全量化模型 $ cd PaddleLite-generic-demo/image_classification_demo/shell diff --git a/docs/demo_guides/verisilicon_timvx.md b/docs/demo_guides/verisilicon_timvx.md new file mode 100644 index 00000000000..011fd95f7af --- /dev/null +++ b/docs/demo_guides/verisilicon_timvx.md @@ -0,0 +1,434 @@ +# 芯原 TIM-VX 部署示例 + +Paddle Lite 已支持通过 TIM-VX 的方式调用芯原 NPU 算力的预测部署。 +其接入原理与接入 Paddle Lite 的其他新硬件类似,即加载并分析 Paddle 模型,首先将 Paddle 算子转成 NNAdapter 标准算子,其次再通过 TIM-VX 的组网 API 进行网络构建,在线编译模型并执行模型。 + +需要注意的是,芯原(verisilicon)作为 IP 设计厂商,本身并不提供实体 SoC 产品,而是授权其 IP 给芯片厂商,如:晶晨(Amlogic),瑞芯微(Rockchip)等。因此本文适用于被芯原授权了 NPU IP 的芯片产品。只要芯片产品没有大幅修改芯原的底层库,该芯片就可以使用本文档作为 Paddle Lite
推理部署的参考和教程。在本文中,晶晨 SoC 中的 NPU 和瑞芯微 SoC 中的 NPU 统称为芯原 NPU。 + +本文档与[ 晶晨 NPU 部署示例 ](./amlogic_npu)和[ 瑞芯微 NPU 部署示例 ](./rockchip_npu)中所描述的部署示例相比,虽然涉及的部分芯片产品相同,但前者是通过 IP 厂商芯原的 TIM-VX 框架接入 Paddle Lite,后二者是通过各自芯片 DDK 接入 Paddle Lite。接入方式不同,支持的算子和模型范围也有所区别。 + +## 支持现状 + +### 已支持的芯片 + +- Amlogic A311D + +- Amlogic S905D3 + + 注意:理论上支持所有经过芯原授权了 NPU IP 的 SoC(须有匹配版本的 NPU 驱动,下文描述),上述为经过测试的部分。 + +### 已支持的 Paddle 模型 + +#### 模型 + +- [mobilenet_v1_int8_224_per_layer](https://paddlelite-demo.bj.bcebos.com/models/mobilenet_v1_int8_224_per_layer.tar.gz) +- [resnet50_int8_224_per_layer](https://paddlelite-demo.bj.bcebos.com/models/resnet50_int8_224_per_layer.tar.gz) +- [ssd_mobilenet_v1_relu_voc_int8_300_per_layer](https://paddlelite-demo.bj.bcebos.com/models/ssd_mobilenet_v1_relu_voc_int8_300_per_layer.tar.gz) + +#### 性能 + +- 测试环境 + - 编译环境 + - Ubuntu 16.04,GCC 5.4 for ARMLinux armhf and aarch64 + + - 硬件环境 + - Amlogic A311D + - CPU:4 x ARM Cortex-A73 \+ 2 x ARM Cortex-A53 + - NPU:5 TOPs for INT8 + - Amlogic S905D3(Android 版本) + - CPU:2 x ARM Cortex-A55 + - NPU:1.2 TOPs for INT8 + +- 测试方法 + - warmup=1, repeats=5,统计平均时间,单位是 ms + - 线程数为 1,`paddle::lite_api::PowerMode CPU_POWER_MODE` 设置为 `paddle::lite_api::PowerMode::LITE_POWER_HIGH` + - 分类模型的输入图像维度是{1, 3, 224, 224} + +- 测试结果 + + |模型 |A311D||S905D3(Android 版本)|| + |---|---|---|---|---| + | |CPU(ms) | NPU(ms) |CPU(ms) | NPU(ms) | + |mobilenet_v1_int8_224_per_layer| 81.632133 | 5.112500 | 280.465997 | 12.808100 | + |resnet50_int8_224_per_layer| 390.498300| 17.583200 | 787.532340 | 41.313999 | + |ssd_mobilenet_v1_relu_voc_int8_300_per_layer| 134.991560| 15.216700 | 295.48919| 40.108970 | + +### 已支持(或部分支持)NNAdapter 的 Paddle 算子 + +您可以查阅[ NNAdapter 算子支持列表](https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/lite/kernels/nnadapter/converter/all.h)获得各算子在不同新硬件上的最新支持信息。 + +## 参考示例演示 + +### 测试设备 + +- Khadas VIM3 开发板(SoC 为 Amlogic A311D) + + A311D + + + +- Khadas VIM3L 开发板(SoC 为 Amlogic S905D3) + + S905D3 + +### 准备设备环境 + +- A311D + + - 需要驱动版本为
6.4.4.3(下载驱动请联系开发板厂商)。 + + - 注意是 64 位系统。 + + - 提供了网络连接 SSH 登录的方式,部分系统提供了 adb 连接的方式。 + + - 可通过 `dmesg | grep Galcore` 查询系统版本: + + ```shell + $ dmesg | grep Galcore + [ 24.140820] Galcore version 6.4.4.3.310723AAA + ``` + +- S905D3(Android 版本) + + - 需要驱动版本为 6.4.4.3(下载驱动请联系开发板厂商)。 + - 注意是 32 位系统。 + - `adb root + adb remount` 以获得修改系统库的权限。 + + ```shell + $ dmesg | grep Galcore + [ 9.020168] <6>[ 9.020168@0] Galcore version 6.4.4.3.310723a + ``` + + - 示例程序和 Paddle Lite 库的编译需要采用交叉编译方式,通过 `adb` 或 `ssh` 进行设备的交互和示例程序的运行。 + + +### 准备交叉编译环境 + +- 为了保证编译环境一致,建议参考[ Docker 环境准备](../source_compile/docker_environment)中的 Docker 开发环境进行配置; +- 由于有些设备只提供网络访问方式(根据开发板的实际情况),需要通过 `scp` 和 `ssh` 命令将交叉编译生成的 Paddle Lite 库和示例程序传输到设备上执行,因此,在进入 Docker 容器后还需要安装如下软件: + + ``` + # apt-get install openssh-client sshpass + ``` + +### 运行图像分类示例程序 + +- 下载 Paddle Lite 通用示例程序 [PaddleLite-generic-demo.tar.gz](https://paddlelite-demo.bj.bcebos.com/devices/generic/PaddleLite-generic-demo.tar.gz),解压后目录主体结构如下: + + ```shell + - PaddleLite-generic-demo + - image_classification_demo + - assets + - images + - tabby_cat.jpg # 测试图片 + - tabby_cat.raw # 经过 convert_to_raw_image.py 处理后的 RGB Raw 图像 + - labels + - synset_words.txt # 1000 分类 label 文件 + - models + - mobilenet_v1_int8_224_per_layer + - __model__ # Paddle fluid 模型组网文件,可使用 netron 查看网络结构 + - conv1_weights # Paddle fluid 模型参数文件 + - batch_norm_0.tmp_2.quant_dequant.scale # Paddle fluid 模型量化参数文件 + - subgraph_partition_config_file.txt # 自定义子图分割配置文件 + ... + - shell + - CMakeLists.txt # 示例程序 CMake 脚本 + - build.linux.arm64 # arm64 编译工作目录 + - image_classification_demo # 已编译好的,适用于 arm64 的示例程序 + - build.linux.armhf # armhf 编译工作目录 + - image_classification_demo # 已编译好的,适用于 armhf 的示例程序 + - build.android.armeabi-v7a # Android armv7 编译工作目录 + - image_classification_demo # 已编译好的,适用于 Android armv7 的示例程序 + ...
+ - image_classification_demo.cc # 示例程序源码 + - build.sh # 示例程序编译脚本 + - run.sh # 示例程序本地运行脚本 + - run_with_ssh.sh # 示例程序 ssh 运行脚本 + - run_with_adb.sh # 示例程序 adb 运行脚本 + - libs + - PaddleLite + - linux + - arm64 # Linux 64 位系统 + - include # Paddle Lite 头文件 + - lib # Paddle Lite 库文件 + - verisilicon_timvx # 芯原 TIM-VX DDK、NNAdapter 运行时库、device HAL 库 + - libnnadapter.so # NNAdapter 运行时库 + - libGAL.so # 芯原 DDK + - libVSC.so # 芯原 DDK + - libOpenVX.so # 芯原 DDK + - libarchmodelSw.so # 芯原 DDK + - libNNArchPerf.so # 芯原 DDK + - libOvx12VXCBinary.so # 芯原 DDK + - libNNVXCBinary.so # 芯原 DDK + - libOpenVXU.so # 芯原 DDK + - libNNGPUBinary.so # 芯原 DDK + - libovxlib.so # 芯原 DDK + - libOpenCL.so # OpenCL + - libverisilicon_timvx.so # NNAdapter device HAL 库 + - libtim-vx.so # 芯原 TIM-VX + - libgomp.so.1 # GNU OpenMP 库 + - libpaddle_full_api_shared.so # 预编译 Paddle Lite full api 库 + - libpaddle_light_api_shared.so # 预编译 Paddle Lite light api 库 + ... + - android + - armeabi-v7a # Android 32 位系统 + - include # Paddle Lite 头文件 + - lib # Paddle Lite 库文件 + - verisilicon_timvx # 芯原 TIM-VX DDK、NNAdapter 运行时库、device HAL 库 + - libnnadapter.so # NNAdapter 运行时库 + - libGAL.so # 芯原 DDK + - libVSC.so # 芯原 DDK + - libOpenVX.so # 芯原 DDK + - libarchmodelSw.so # 芯原 DDK + - libNNArchPerf.so # 芯原 DDK + - libOvx12VXCBinary.so # 芯原 DDK + - libNNVXCBinary.so # 芯原 DDK + - libOpenVXU.so # 芯原 DDK + - libNNGPUBinary.so # 芯原 DDK + - libovxlib.so # 芯原 DDK + - libOpenCL.so # OpenCL + - libverisilicon_timvx.so # NNAdapter device HAL 库 + - libtim-vx.so # 芯原 TIM-VX + - libgomp.so.1 # GNU OpenMP 库 + - libc++_shared.so + - libpaddle_full_api_shared.so # 预编译 Paddle Lite full api 库 + - libpaddle_light_api_shared.so # 预编译 Paddle Lite light api 库 + - OpenCV # OpenCV 预编译库 + - ssd_detection_demo # 基于 ssd 的目标检测示例程序 + ``` + +- 按照以下命令分别运行转换后的 ARM CPU 模型和芯原 TIM-VX 模型,比较它们的性能和结果; + + ```shell + 注意: + 1)`run_with_adb.sh` 不能在 Docker 环境执行,否则可能无法找到设备,也不能在设备上运行。 + 2)`run_with_ssh.sh` 不能在设备上运行,且执行前需要配置目标设备的 IP 地址、SSH 账号和密码。 + 3)`build.sh`
根据入参生成针对不同操作系统、体系结构的二进制程序,需查阅注释信息配置正确的参数值。 + 4)`run_with_adb.sh` 入参包括模型名称、操作系统、体系结构、目标设备、设备序列号等,需查阅注释信息配置正确的参数值。 + 5)`run_with_ssh.sh` 入参包括模型名称、操作系统、体系结构、目标设备、ip地址、用户名、用户密码等,需查阅注释信息配置正确的参数值。 + 6)下述命令行示例中涉及的具体IP、SSH账号密码、设备序列号等均为示例环境,请用户根据自身实际设备环境修改。 + + 在 ARM CPU 上运行 mobilenet_v1_int8_224_per_layer 全量化模型 + $ cd PaddleLite-generic-demo/image_classification_demo/shell + + For A311D + $ ./run_with_ssh.sh mobilenet_v1_int8_224_per_layer linux arm64 cpu 192.168.100.30 22 khadas khadas + (A311D) + warmup: 1 repeat: 15, average: 81.678067 ms, max: 81.945999 ms, min: 81.591003 ms + results: 3 + Top0 Egyptian cat - 0.512545 + Top1 tabby, tabby cat - 0.402567 + Top2 tiger cat - 0.067904 + Preprocess time: 1.352000 ms + Prediction time: 81.678067 ms + Postprocess time: 0.407000 ms + + For S905D3(Android版) + $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer android armeabi-v7a cpu c8631471d5cd + (S905D3(Android版)) + warmup: 1 repeat: 5, average: 280.465997 ms, max: 358.815002 ms, min: 268.549812 ms + results: 3 + Top0 Egyptian cat - 0.512545 + Top1 tabby, tabby cat - 0.402567 + Top2 tiger cat - 0.067904 + Preprocess time: 3.199000 ms + Prediction time: 280.465997 ms + Postprocess time: 0.596000 ms + + ------------------------------ + + 在 芯原 NPU 上运行 mobilenet_v1_int8_224_per_layer 全量化模型 + $ cd PaddleLite-generic-demo/image_classification_demo/shell + + For A311D + $ ./run_with_ssh.sh mobilenet_v1_int8_224_per_layer linux arm64 verisilicon_timvx 192.168.100.30 22 khadas khadas + (A311D) + warmup: 1 repeat: 15, average: 5.112500 ms, max: 5.223000 ms, min: 5.009130 ms + results: 3 + Top0 Egyptian cat - 0.508929 + Top1 tabby, tabby cat - 0.415333 + Top2 tiger cat - 0.064347 + Preprocess time: 1.356000 ms + Prediction time: 5.112500 ms + Postprocess time: 0.411000 ms + + For S905D3(Android版) + $ ./run_with_adb.sh mobilenet_v1_int8_224_per_layer android armeabi-v7a verisilicon_timvx c8631471d5cd + (S905D3(Android版)) + warmup: 1 repeat: 5, average: 13.4116 ms, max: 14.7615 ms, min: 
12.80810 ms + results: 3 + Top0 Egyptian cat - 0.508929 + Top1 tabby, tabby cat - 0.415333 + Top2 tiger cat - 0.064347 + Preprocess time: 3.170000 ms + Prediction time: 13.4116 ms + Postprocess time: 0.634000 ms + ``` + +- 如果需要更改测试图片,可将图片拷贝到 `PaddleLite-generic-demo/image_classification_demo/assets/images` 目录下,然后调用 `convert_to_raw_image.py` 生成相应的 RGB Raw 图像,最后修改 `run_with_adb.sh`、`run_with_ssh.sh` 的 IMAGE_NAME 变量即可; +- 重新编译示例程序: + ```shell + 注意: + 1)请根据 `build.sh` 配置正确的参数值。 + 2)需在 Docker 环境中编译。 + + # 对于 A311D + ./build.sh linux arm64 + + # 对于 S905D3(Android版) + ./build.sh android armeabi-v7a + ``` + +### 更新模型 +- 通过 Paddle 训练或 X2Paddle 转换得到 MobileNetV1 float32 模型[ mobilenet_v1_fp32_224 ](https://paddlelite-demo.bj.bcebos.com/models/mobilenet_v1_fp32_224_fluid.tar.gz) +- 通过 Paddle+PaddleSlim 后量化方式,生成[ mobilenet_v1_int8_224_per_layer 量化模型](https://paddlelite-demo.bj.bcebos.com/devices/rockchip/mobilenet_v1_int8_224_fluid.tar.gz) +- 下载[ PaddleSlim-quant-demo.tar.gz ](https://paddlelite-demo.bj.bcebos.com/tools/PaddleSlim-quant-demo.tar.gz),解压后清单如下: + ```shell + - PaddleSlim-quant-demo + - image_classification_demo + - quant_post # 后量化 + - quant_post_rockchip_npu.sh # 一键量化脚本,Amlogic 和瑞芯微底层都使用芯原的 NPU,所以通用 + - README.md # 环境配置说明,涉及 PaddlePaddle、PaddleSlim 的版本选择、编译和安装步骤 + - datasets # 量化所需要的校准数据集合 + - ILSVRC2012_val_100 # 从 ImageNet2012 验证集挑选的 100 张图片 + - inputs # 待量化的 fp32 模型 + - mobilenet_v1 + - resnet50 + - outputs # 产出的全量化模型 + - scripts # 后量化内置脚本 + ``` +- 查看 `README.md` 完成 PaddlePaddle 和 PaddleSlim 的安装 +- 直接执行 `./quant_post_rockchip_npu.sh` 即可在 `outputs` 目录下生成 mobilenet_v1_int8_224_per_layer 量化模型 + ```shell + ----------- Configuration Arguments ----------- + activation_bits: 8 + activation_quantize_type: moving_average_abs_max + algo: KL + batch_nums: 10 + batch_size: 10 + data_dir: ../dataset/ILSVRC2012_val_100 + is_full_quantize: 1 + is_use_cache_file: 0 + model_path: ../models/mobilenet_v1 + optimize_model: 1 + output_path: ../outputs/mobilenet_v1 +
quantizable_op_type: conv2d,depthwise_conv2d,mul + use_gpu: 0 + use_slim: 1 + weight_bits: 8 + weight_quantize_type: abs_max + ------------------------------------------------ + quantizable_op_type:['conv2d', 'depthwise_conv2d', 'mul'] + 2021-08-30 05:52:10,048-INFO: Load model and set data loader ... + 2021-08-30 05:52:10,129-INFO: Optimize FP32 model ... + I0830 05:52:10.139564 14447 graph_pattern_detector.cc:91] --- detected 14 subgraphs + I0830 05:52:10.148236 14447 graph_pattern_detector.cc:91] --- detected 13 subgraphs + 2021-08-30 05:52:10,167-INFO: Collect quantized variable names ... + 2021-08-30 05:52:10,168-WARNING: feed is not supported for quantization. + 2021-08-30 05:52:10,169-WARNING: fetch is not supported for quantization. + 2021-08-30 05:52:10,170-INFO: Preparation stage ... + 2021-08-30 05:52:11,853-INFO: Run batch: 0 + 2021-08-30 05:52:16,963-INFO: Run batch: 5 + 2021-08-30 05:52:21,037-INFO: Finish preparation stage, all batch:10 + 2021-08-30 05:52:21,048-INFO: Sampling stage ... + 2021-08-30 05:52:31,800-INFO: Run batch: 0 + 2021-08-30 05:53:23,443-INFO: Run batch: 5 + 2021-08-30 05:54:03,773-INFO: Finish sampling stage, all batch: 10 + 2021-08-30 05:54:03,774-INFO: Calculate KL threshold ... + 2021-08-30 05:54:28,580-INFO: Update the program ... + 2021-08-30 05:54:29,194-INFO: The quantized model is saved in ../outputs/mobilenet_v1 + post training quantization finish, and it takes 139.42292165756226. 
+ + ----------- Configuration Arguments ----------- + batch_size: 20 + class_dim: 1000 + data_dir: ../dataset/ILSVRC2012_val_100 + image_shape: 3,224,224 + inference_model: ../outputs/mobilenet_v1 + input_img_save_path: ./img_txt + save_input_img: False + test_samples: -1 + use_gpu: 0 + ------------------------------------------------ + Testbatch 0, acc1 0.8, acc5 1.0, time 1.63 sec + End test: test_acc1 0.76, test_acc5 0.92 + --------finish eval int8 model: mobilenet_v1------------- + ``` + - 参考[模型转化方法](../user_guides/model_optimize_tool),利用 opt 工具转换生成 TIM-VX 模型,仅需要将 `valid_targets` 设置为 `verisilicon_timvx`, `arm` 即可。 + ```shell + $ ./opt --model_dir=mobilenet_v1_int8_224_per_layer \ + --optimize_out_type=naive_buffer \ + --optimize_out=opt_model \ + --valid_targets=verisilicon_timvx,arm + ``` +### 更新支持 TIM-VX 的 Paddle Lite 库 + +- 下载 Paddle Lite 源码 + + ```shell + $ git clone https://github.com/PaddlePaddle/Paddle-Lite.git + $ cd Paddle-Lite + $ git checkout + # 注意:编译中依赖的 verisilicon_timvx 相关代码和依赖项会在后续编译脚本中自动下载,无需用户手动下载。 + ``` + +- 编译并生成 `Paddle Lite+Verisilicon_TIMVX` 的部署库 + + - For A311D + - tiny_publish 编译方式 + ```shell + $ ./lite/tools/build_linux.sh --with_extra=ON --with_log=ON --with_nnadapter=ON --nnadapter_with_verisilicon_timvx=ON --nnadapter_verisilicon_timvx_src_git_tag=main --nnadapter_verisilicon_timvx_viv_sdk_url=http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_linux_arm64_6_4_4_3_generic.tgz + + ``` + - full_publish 编译方式 + ```shell + $ ./lite/tools/build_linux.sh --with_extra=ON --with_log=ON --with_nnadapter=ON --nnadapter_with_verisilicon_timvx=ON --nnadapter_verisilicon_timvx_src_git_tag=main --nnadapter_verisilicon_timvx_viv_sdk_url=http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_linux_arm64_6_4_4_3_generic.tgz full_publish + + ``` + - 替换头文件和库 + ```shell + # 替换 include 目录 + $ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/include/ 
PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/include/ + # 替换 NNAdapter 运行时库 + $ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libnnadapter.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/verisilicon_timvx/ + # 替换 NNAdapter device HAL 库 + $ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libverisilicon_timvx.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/verisilicon_timvx/ + # 替换 芯原 TIM-VX 库 + $ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libtim-vx.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/verisilicon_timvx/ + # 替换 libpaddle_light_api_shared.so + $ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libpaddle_light_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/ + # 替换 libpaddle_full_api_shared.so (仅在 full_publish 编译方式下) + $ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.nnadapter/cxx/lib/libpaddle_full_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/ + ``` + + - S905D3(Android 版) + - tiny_publish 编译方式 + ```shell + $ ./lite/tools/build_android.sh --arch=armv7 --toolchain=clang --android_stl=c++_shared --with_extra=ON --with_exception=ON --with_cv=ON --with_log=ON --with_nnadapter=ON --nnadapter_with_verisilicon_timvx=ON --nnadapter_verisilicon_timvx_src_git_tag=main --nnadapter_verisilicon_timvx_viv_sdk_url=http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_android_9_armeabi_v7a_6_4_4_3_generic.tgz + ``` + + - full_publish 编译方式 + ```shell + $ ./lite/tools/build_android.sh --arch=armv7 --toolchain=clang --android_stl=c++_shared --with_extra=ON --with_exception=ON --with_cv=ON --with_log=ON --with_nnadapter=ON --nnadapter_with_verisilicon_timvx=ON --nnadapter_verisilicon_timvx_src_git_tag=main 
--nnadapter_verisilicon_timvx_viv_sdk_url=http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_android_9_armeabi_v7a_6_4_4_3_generic.tgz full_publish + ``` + - 替换头文件和库 + ```shell + # 替换 include 目录 + $ cp -rf build.lite.android.armv7.clang/inference_lite_lib.android.armv7.nnadapter/cxx/include/ PaddleLite-generic-demo/libs/PaddleLite/android/armeabi-v7a/include/ + # 替换 NNAdapter 运行时库 + $ cp -rf build.lite.android.armv7.clang/inference_lite_lib.android.armv7.nnadapter/cxx/lib/libnnadapter.so PaddleLite-generic-demo/libs/PaddleLite/android/armeabi-v7a/lib/verisilicon_timvx/ + # 替换 NNAdapter device HAL 库 + $ cp -rf build.lite.android.armv7.clang/inference_lite_lib.android.armv7.nnadapter/cxx/lib/libverisilicon_timvx.so PaddleLite-generic-demo/libs/PaddleLite/android/armeabi-v7a/lib/verisilicon_timvx/ + # 替换 芯原 TIM-VX 库 + $ cp -rf build.lite.android.armv7.clang/inference_lite_lib.android.armv7.nnadapter/cxx/lib/libtim-vx.so PaddleLite-generic-demo/libs/PaddleLite/android/armeabi-v7a/lib/verisilicon_timvx/ + # 替换 libpaddle_light_api_shared.so + $ cp -rf build.lite.android.armv7.clang/inference_lite_lib.android.armv7.nnadapter/cxx/lib/libpaddle_light_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/android/armeabi-v7a/lib/ + # 替换 libpaddle_full_api_shared.so(仅在 full_publish 编译方式下) + $ cp -rf build.lite.android.armv7.clang/inference_lite_lib.android.armv7.nnadapter/cxx/lib/libpaddle_full_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/android/armeabi-v7a/lib/ + ``` + +- 替换头文件后需要重新编译示例程序 + +## 其它说明 + +- Paddle Lite 研发团队正在持续扩展基于 TIM-VX 的算子和模型。 diff --git a/docs/index.rst b/docs/index.rst index 37a923d3a1d..7bc950b092a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -78,6 +78,7 @@ Welcome to Paddle-Lite's documentation!
demo_guides/baidu_xpu demo_guides/rockchip_npu demo_guides/amlogic_npu + demo_guides/verisilicon_timvx demo_guides/mediatek_apu demo_guides/imagination_nna demo_guides/bitmain diff --git a/docs/performance/benchmark.md b/docs/performance/benchmark.md index 60f13c0c2e2..db19374b1f3 100644 --- a/docs/performance/benchmark.md +++ b/docs/performance/benchmark.md @@ -137,7 +137,10 @@ 请参考 [Paddle Lite 使用瑞芯微 NPU 预测部署](../demo_guides/rockchip_npu)的最新性能数据 ## 晶晨 NPU 的性能数据 -请参考 [Paddle Lite 使用晶晨NPU 预测部署](../demo_guides/amlogic_npu)的最新性能数据 +请参考 [Paddle Lite 使用晶晨 NPU 预测部署](../demo_guides/amlogic_npu)的最新性能数据 + +## 芯原 TIM-VX 的性能数据 +请参考 [Paddle Lite 使用芯原 TIM-VX 预测部署](../demo_guides/verisilicon_timvx)的最新性能数据 ## 联发科 APU 的性能数据 请参考 [Paddle Lite 使用联发科 APU 预测部署](../demo_guides/mediatek_apu)的最新性能数据 diff --git a/docs/quick_start/support_model_list.md b/docs/quick_start/support_model_list.md index fd24a5b5d78..5c65c3176ff 100644 --- a/docs/quick_start/support_model_list.md +++ b/docs/quick_start/support_model_list.md @@ -6,7 +6,7 @@ | 类别 | 类别细分 | 模型 | 支持平台 | |-|-|:-|:-| -| CV | 分类 | [MobileNetV1](https://paddlelite-demo.bj.bcebos.com/models/mobilenet_v1_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, RockchipNPU, MediatekAPU, KunlunxinXPU, HuaweiAscendNPU | +| CV | 分类 | [MobileNetV1](https://paddlelite-demo.bj.bcebos.com/models/mobilenet_v1_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, RockchipNPU, MediatekAPU, KunlunxinXPU, HuaweiAscendNPU, VerisiliconTIMVX | | CV | 分类 | [MobileNetV2](https://paddlelite-demo.bj.bcebos.com/models/mobilenet_v2_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, KunlunxinXPU, HuaweiAscendNPU | | CV | 分类 | [MobileNetV3_large](https://paddle-inference-dist.bj.bcebos.com/AI-Rank/mobile/MobileNetV3_large_x1_0.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiAscendNPU | | CV | 分类 | [MobileNetV3_small](https://paddle-inference-dist.bj.bcebos.com/AI-Rank/mobile/MobileNetV3_small_x1_0.tar.gz) | ARM, X86, 
GPU(OPENCL,METAL), HuaweiAscendNPU | @@ -19,8 +19,8 @@ | CV | 分类 | HRNet_W18_C | ARM, X86 | | CV | 分类 | RegNetX_4GF | ARM, X86 | | CV | 分类 | Xception41 | ARM, X86 | -| CV | 分类 | [ResNet18](https://paddlelite-demo.bj.bcebos.com/models/resnet18_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, RockchipNPU, KunlunxinXPU, HuaweiAscendNPU | -| CV | 分类 | [ResNet50](https://paddlelite-demo.bj.bcebos.com/models/resnet50_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, RockchipNPU, KunlunxinXPU, HuaweiAscendNPU | +| CV | 分类 | [ResNet18](https://paddlelite-demo.bj.bcebos.com/models/resnet18_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, RockchipNPU, KunlunxinXPU, HuaweiAscendNPU, VerisiliconTIMVX | +| CV | 分类 | [ResNet50](https://paddlelite-demo.bj.bcebos.com/models/resnet50_fp32_224_fluid.tar.gz) | ARM, X86, GPU(OPENCL,METAL), HuaweiKirinNPU, RockchipNPU, KunlunxinXPU, HuaweiAscendNPU, VerisiliconTIMVX | | CV | 分类 | [ResNet101](https://paddlelite-demo.bj.bcebos.com/NNAdapter/models/PaddleClas/ResNet101.tgz) | ARM, X86, HuaweiKirinNPU, RockchipNPU, KunlunxinXPU, HuaweiAscendNPU | | CV | 分类 | [ResNeXt50](https://paddlelite-demo.bj.bcebos.com/NNAdapter/models/PaddleClas/ResNeXt50_32x4d.tgz) | ARM, X86, HuaweiAscendNPU | | CV | 分类 | [MnasNet](https://paddlelite-demo.bj.bcebos.com/models/mnasnet_fp32_224_fluid.tar.gz)| ARM, HuaweiKirinNPU, HuaweiAscendNPU | @@ -32,7 +32,7 @@ | CV | 分类 | VGG16 | ARM, X86, GPU(OPENCL), KunlunxinXPU, HuaweiAscendNPU | | CV | 分类 | VGG19 | ARM, X86, GPU(OPENCL,METAL), KunlunxinXPU, HuaweiAscendNPU| | CV | 分类 | GoogleNet | ARM, X86, KunlunxinXPU, HuaweiAscendNPU | -| CV | 检测 | [SSD-MobileNetV1](https://paddlelite-demo.bj.bcebos.com/models/ssd_mobilenet_v1_pascalvoc_fp32_300_fluid.tar.gz) | ARM, HuaweiKirinNPU*, HuaweiAscendNPU* | +| CV | 检测 | [SSD-MobileNetV1](https://paddlelite-demo.bj.bcebos.com/models/ssd_mobilenet_v1_pascalvoc_fp32_300_fluid.tar.gz) | ARM, HuaweiKirinNPU*, 
HuaweiAscendNPU*, VerisiliconTIMVX* | | CV | 检测 | [SSD-MobileNetV3-large](https://paddle-inference-dist.bj.bcebos.com/AI-Rank/mobile/ssdlite_mobilenet_v3_large.tar.gz) | ARM, X86, GPU(OPENCL,METAL) | | CV | 检测 | [SSD-VGG16](https://paddlelite-demo.bj.bcebos.com/NNAdapter/models/PaddleDetection/ssd_vgg16_300_240e_voc.tgz) | ARM, X86, HuaweiAscendNPU* | | CV | 检测 | [YOLOv3-DarkNet53](https://paddlelite-demo.bj.bcebos.com/NNAdapter/models/PaddleDetection/yolov3_darknet53_270e_coco.tgz) | ARM, X86, HuaweiAscendNPU* | diff --git a/docs/quick_start/support_operation_list.md b/docs/quick_start/support_operation_list.md index a85d9b708ff..2c98239f2a6 100644 --- a/docs/quick_start/support_operation_list.md +++ b/docs/quick_start/support_operation_list.md @@ -10,99 +10,99 @@ Host 端 Kernel 是算子在任意 CPU 上纯 C/C++ 的具体实现,具有可 以 ARM CPU 为例,如果模型中某个算子没有 ARM 端 Kernel,但是有 Host 端 Kernel,那么模型优化阶段该算子会选择 Host 端 Kernel,该模型还是可以顺利部署。 -| OP_name| ARM | OpenCL | Metal | 昆仑芯XPU | Host | X86 | 比特大陆 | 英特尔FPGA | 寒武纪mlu | 华为昇腾NPU | 联发科APU | 瑞芯微NPU | 华为麒麟NPU | 颖脉NNA | 晶晨NPU | -|-:|-| -| -| -| -| -| -| -| -| -| -| -| -| -| -| -| affine_channel|Y| | | | | | | | | | | | | | | -| affine_grid|Y| | | | | | | | | | | | | | | -| arg_max|Y|Y| |Y|Y| | | |Y|Y| | | | | | -| assign_value| | | |Y|Y| |Y| | |Y| | | | | | -| batch_norm|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| bilinear_interp|Y|Y|Y|Y| |Y|Y| | |Y| | | | | | -| bilinear_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | | -| box_coder|Y|Y|Y|Y|Y|Y|Y| | | | | | | | | -| calib|Y| | |Y| | | | |Y| | | | | | | -| cast| | | |Y|Y| |Y| |Y|Y| | | | | | -| concat|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| -| conv2d_transpose|Y|Y|Y|Y| |Y|Y| | |Y| | | | |Y| -| density_prior_box| | | |Y|Y|Y|Y| | | | | | | | | -| depthwise_conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| -| depthwise_conv2d_transpose| |Y| | | | |Y| | | | | | | | | -| dropout|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| elementwise_add|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| elementwise_div|Y| | |Y| |Y|Y| | |Y|Y|Y|Y| |Y| 
-| elementwise_floordiv|Y| | | | |Y| | | | | | | | | | -| elementwise_max|Y| | |Y| |Y| | | |Y| | | | | | -| elementwise_min|Y| | | | |Y| | | |Y| | | | | | -| elementwise_mod|Y| | | | |Y| | | | | | | | | | -| elementwise_mul|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| elementwise_pow|Y| | | | |Y| | | |Y| | | | | | -| elementwise_sub|Y|Y|Y|Y| |Y|Y| | |Y|Y|Y|Y| |Y| -| elu|Y| | | |Y| | | | | | | | | | | -| erf|Y| | | | | | | | | | | | | | | -| expand| |Y| | |Y| | | | | | | | | | | -| expand_as| | | | |Y| | | | | | | | | | | -| fc|Y|Y|Y| | |Y| | |Y|Y|Y|Y|Y|Y|Y| -| feed| | |Y| |Y| | | | | | | | | | | -| fetch| | |Y| |Y| | | | | | | | | | | -| fill_constant| | | |Y|Y| |Y| | |Y| | | | | | -| fill_constant_batch_size_like| | | |Y|Y| | | | | | | | | | | -| flatten| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| flatten2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| flatten_contiguous_range| | | |Y|Y| | | | |Y|Y|Y|Y| |Y| -|fusion_elementwise_add_activation|Y|Y|Y| | |Y| | | |Y|Y|Y|Y| |Y| -|fusion_elementwise_div_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| -|fusion_elementwise_max_activation|Y| | | | |Y| | | |Y| | | | | | -|fusion_elementwise_min_activation|Y| | | | |Y| | | |Y| | | | | | -|fusion_elementwise_mul_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| -|fusion_elementwise_pow_activation| | | | | | | | | |Y| | | | | | -|fusion_elementwise_sub_activation|Y|Y| | | |Y| | | |Y|Y|Y|Y| |Y| -| grid_sampler|Y|Y| |Y| |Y| | | | | | | | | | -| instance_norm|Y|Y| |Y| |Y| | | |Y| | | | | | -| io_copy| |Y|Y|Y| | | | |Y| | | | | | | -| io_copy_once| |Y|Y|Y| | | | | | | | | | | | -| layout|Y|Y| | | |Y| | |Y| | | | | | | -| layout_once|Y|Y| | | | | | | | | | | | | | -| leaky_relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y| | | | | | -| lod_array_length| | | | |Y| | | | | | | | | | | -| matmul|Y|Y|Y|Y| |Y|Y| | |Y| | | | | | -| mul|Y| | |Y| |Y|Y| | | | | | | | | -| multiclass_nms|Y| | | |Y| |Y| | | | | | | | | -| multiclass_nms2|Y| | | |Y| |Y| | | | | | | | | -| multiclass_nms3|Y| | | |Y| | | | | | | | | | | -| nearest_interp|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | 
| -| nearest_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | | -| pad2d|Y|Y|Y|Y|Y| | | | |Y| | | | | | -| pool2d|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y| -| prelu|Y|Y| |Y|Y| | | | |Y| | | | | | -| prior_box|Y| | |Y|Y| |Y| | | | | | | | | -| range| | | | |Y| | | | |Y| | | | | | -| reduce_mean|Y|Y| |Y| |Y|Y| | |Y| | | | | | -| relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y| -| relu6|Y|Y|Y|Y|Y|Y| | |Y|Y|Y|Y|Y|Y|Y| -| reshape| |Y| |Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| reshape2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| scale|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| search_fc| | | |Y| | | | | | | | | | | | -| sequence_topk_avg_pooling| | | |Y| |Y| | | | | | | | | | -| shuffle_channel|Y|Y|Y| |Y| | | | | | | | | | | -| sigmoid|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y| |Y| -| slice|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| softmax|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y| -| softplus|Y| | | | | | | | | | | | | | | -| squeeze| |Y| |Y|Y| |Y| |Y|Y| | | | | | -| squeeze2| |Y| |Y|Y| |Y| |Y|Y| | | | | | -| stack| | | |Y|Y|Y| | | |Y| | | | | | -| subgraph| | | | | | | | |Y| | | | | | | -| sync_batch_norm|Y|Y| | | |Y| | | | | | | | | | -| tanh|Y|Y| |Y|Y|Y| | | |Y|Y|Y|Y| |Y| -| thresholded_relu|Y| | | |Y| | | | | | | | | | | -| transpose|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| transpose2|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| unsqueeze| |Y| |Y|Y| | | | |Y| | | | | | -| unsqueeze2| |Y| |Y|Y| | | | |Y| | | | | | -| write_back| | | | |Y| | | | | | | | | | | -| yolo_box|Y|Y|Y|Y|Y| |Y| | | | | | | | | +| OP_name| ARM | OpenCL | Metal | 昆仑芯XPU | Host | X86 | 比特大陆 | 英特尔FPGA | 寒武纪mlu | 华为昇腾NPU | 联发科APU | 瑞芯微NPU | 华为麒麟NPU | 颖脉NNA | 晶晨NPU | 芯原TIM-VX | +|-:|-| -| -| -| -| -| -| -| -| -| -| -| -| -| -| -| +| affine_channel|Y| | | | | | | | | | | | | | | | +| affine_grid|Y| | | | | | | | | | | | | | | | +| arg_max|Y|Y| |Y|Y| | | |Y|Y| | | | | | | +| assign_value| | | |Y|Y| |Y| | |Y| | | | | | | +| batch_norm|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| bilinear_interp|Y|Y|Y|Y| |Y|Y| | |Y| | | | | |Y| +| bilinear_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | |Y| +| box_coder|Y|Y|Y|Y|Y|Y|Y| | | | | | | | | | +| 
calib|Y| | |Y| | | | |Y| | | | | | | | +| cast| | | |Y|Y| |Y| |Y|Y| | | | | | | +| concat|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| +| conv2d_transpose|Y|Y|Y|Y| |Y|Y| | |Y| | | | |Y|Y| +| density_prior_box| | | |Y|Y|Y|Y| | | | | | | | | | +| depthwise_conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| +| depthwise_conv2d_transpose| |Y| | | | |Y| | | | | | | | | | +| dropout|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| elementwise_add|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| elementwise_div|Y| | |Y| |Y|Y| | |Y|Y|Y|Y| |Y|Y| +| elementwise_floordiv|Y| | | | |Y| | | | | | | | | | | +| elementwise_max|Y| | |Y| |Y| | | |Y| | | | | |Y| +| elementwise_min|Y| | | | |Y| | | |Y| | | | | |Y| +| elementwise_mod|Y| | | | |Y| | | | | | | | | | | +| elementwise_mul|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| elementwise_pow|Y| | | | |Y| | | |Y| | | | | |Y| +| elementwise_sub|Y|Y|Y|Y| |Y|Y| | |Y|Y|Y|Y| |Y|Y| +| elu|Y| | | |Y| | | | | | | | | | | | +| erf|Y| | | | | | | | | | | | | | | | +| expand| |Y| | |Y| | | | | | | | | | | | +| expand_as| | | | |Y| | | | | | | | | | | | +| fc|Y|Y|Y| | |Y| | |Y|Y|Y|Y|Y|Y|Y|Y| +| feed| | |Y| |Y| | | | | | | | | | | | +| fetch| | |Y| |Y| | | | | | | | | | | | +| fill_constant| | | |Y|Y| |Y| | |Y| | | | | | | +| fill_constant_batch_size_like| | | |Y|Y| | | | | | | | | | |Y| +| flatten| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| flatten2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| flatten_contiguous_range| | | |Y|Y| | | | |Y|Y|Y|Y| |Y|Y| +|fusion_elementwise_add_activation|Y|Y|Y| | |Y| | | |Y|Y|Y|Y| |Y| | +|fusion_elementwise_div_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| | +|fusion_elementwise_max_activation|Y| | | | |Y| | | |Y| | | | | | | +|fusion_elementwise_min_activation|Y| | | | |Y| | | |Y| | | | | | | +|fusion_elementwise_mul_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| | +|fusion_elementwise_pow_activation| | | | | | | | | |Y| | | | | | | +|fusion_elementwise_sub_activation|Y|Y| | | |Y| | | |Y|Y|Y|Y| |Y| | +| grid_sampler|Y|Y| |Y| |Y| | | | | | | | | | | +| 
instance_norm|Y|Y| |Y| |Y| | | |Y| | | | | | | +| io_copy| |Y|Y|Y| | | | |Y| | | | | | | | +| io_copy_once| |Y|Y|Y| | | | | | | | | | | | | +| layout|Y|Y| | | |Y| | |Y| | | | | | | | +| layout_once|Y|Y| | | | | | | | | | | | | | | +| leaky_relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y| | | | | |Y| +| lod_array_length| | | | |Y| | | | | | | | | | | | +| matmul|Y|Y|Y|Y| |Y|Y| | |Y| | | | | |Y| +| mul|Y| | |Y| |Y|Y| | | | | | | | | | +| multiclass_nms|Y| | | |Y| |Y| | | | | | | | | | +| multiclass_nms2|Y| | | |Y| |Y| | | | | | | | | | +| multiclass_nms3|Y| | | |Y| | | | | | | | | | | | +| nearest_interp|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| nearest_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | |Y| +| pad2d|Y|Y|Y|Y|Y| | | | |Y| | | | | | | +| pool2d|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y| +| prelu|Y|Y| |Y|Y| | | | |Y| | | | | | | +| prior_box|Y| | |Y|Y| |Y| | | | | | | | | | +| range| | | | |Y| | | | |Y| | | | | | | +| reduce_mean|Y|Y| |Y| |Y|Y| | |Y| | | | | | | +| relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y| +| relu6|Y|Y|Y|Y|Y|Y| | |Y|Y|Y|Y|Y|Y|Y|Y| +| reshape| |Y| |Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| reshape2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| scale|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| search_fc| | | |Y| | | | | | | | | | | | | +| sequence_topk_avg_pooling| | | |Y| |Y| | | | | | | | | | | +| shuffle_channel|Y|Y|Y| |Y| | | | | | | | | | |Y| +| sigmoid|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| slice|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| softmax|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y| +| softplus|Y| | | | | | | | | | | | | | | | +| squeeze| |Y| |Y|Y| |Y| |Y|Y| | | | | |Y| +| squeeze2| |Y| |Y|Y| |Y| |Y|Y| | | | | |Y| +| stack| | | |Y|Y|Y| | | |Y| | | | | | | +| subgraph| | | | | | | | |Y| | | | | | | | +| sync_batch_norm|Y|Y| | | |Y| | | | | | | | | | | +| tanh|Y|Y| |Y|Y|Y| | | |Y|Y|Y|Y| |Y| | +| thresholded_relu|Y| | | |Y| | | | | | | | | | | | +| transpose|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| transpose2|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| unsqueeze| |Y| |Y|Y| | | | |Y| | | | | |Y| +| unsqueeze2| |Y| |Y|Y| | | | |Y| | | | | |Y| +| 
write_back| | | | |Y| | | | | | | | | | | | +| yolo_box|Y|Y|Y|Y|Y| |Y| | | | | | | | | | ### 附加算子 @@ -110,273 +110,273 @@ Host 端 Kernel 是算子在任意 CPU 上纯 C/C++ 的具体实现,具有可 加上附加算子共计 269 个,需要在编译时打开 `--with_extra=ON` 开关才会编译,具体请参考[参数详情](../source_compile/compile_options)。 -| OP_name| ARM | OpenCL | Metal | 昆仑芯XPU | Host | X86 | 比特大陆 | 英特尔FPGA | 寒武纪mlu | 华为昇腾NPU | 联发科APU | 瑞芯微NPU | 华为麒麟NPU | 颖脉NNA | 晶晨NPU | -|-:|-| -| -| -| -| -| -| -| -| -| -| -| -| -| -| -| abs|Y|Y| |Y|Y| | | | |Y| | | | | | -| affine_channel|Y| | | | | | | | | | | | | | | -| affine_grid|Y| | | | | | | | | | | | | | | -| anchor_generator| | | |Y|Y| | | | | | | | | | | -| arg_max|Y|Y| |Y|Y| | | |Y|Y| | | | | | -| arg_min| | | | | | | | | |Y| | | | | | -| argsort| | | | |Y| | | | | | | | | | | -| assign| | | |Y|Y| | | | |Y| | | | | | -| assign_value| | | |Y|Y| |Y| | |Y| | | | | | -| attention_padding_mask| | | | | | | | | | | | | | | | -| axpy|Y| | | | | | | | | | | | | | | -| batch_norm|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| beam_search| | | | |Y| | | | | | | | | | | -| beam_search_decode| | | | |Y| | | | | | | | | | | -| bilinear_interp|Y|Y|Y|Y| |Y|Y| | |Y| | | | | | -| bilinear_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | | -| bmm| | | |Y| | | | | | | | | | | | -| box_clip| | | |Y|Y| | | | | | | | | | | -| box_coder|Y|Y|Y|Y|Y|Y|Y| | | | | | | | | -| calib|Y| | |Y| | | | |Y| | | | | | | -| calib_once|Y| | |Y| | | | |Y| | | | | | | -| cast| | | |Y|Y| |Y| |Y|Y| | | | | | -| clip|Y|Y| |Y| |Y| | | |Y| | | | | | -| collect_fpn_proposals| | | | |Y| | | | | | | | | | | -| concat|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| conditional_block| | | | |Y| | | | | | | | | | | -| conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| -| conv2d_transpose|Y|Y|Y|Y| |Y|Y| | |Y| | | | |Y| -| correlation| | | |Y|Y| | | | | | | | | | | -| cos| |Y| | |Y| | | | | | | | | | | -| cos_sim| | | | |Y| | | | | | | | | | | -| crf_decoding| | | | |Y| | | | | | | | | | | -| crop| | | | |Y| | | | | | | | | | | -| crop_tensor| | | | |Y| | | | | | | | | | | -| ctc_align| | | | 
|Y| | | | | | | | | | | -| cumsum| | | | |Y| | | | |Y| | | | | | -| decode_bboxes|Y| | | | | | | | | | | | | | | -| deformable_conv|Y| | | |Y| | | | |Y| | | | | | -| density_prior_box| | | |Y|Y|Y|Y| | | | | | | | | -| depthwise_conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| -| depthwise_conv2d_transpose| |Y| | | | |Y| | | | | | | | | -| distribute_fpn_proposals| | | | |Y| | | | | | | | | | | -| dropout|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| elementwise_add|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| elementwise_div|Y| | |Y| |Y|Y| | |Y|Y|Y|Y| |Y| -| elementwise_floordiv|Y| | | | |Y| | | | | | | | | | -| elementwise_max|Y| | |Y| |Y| | | |Y| | | | | | -| elementwise_min|Y| | | | |Y| | | |Y| | | | | | -| elementwise_mod|Y| | | | |Y| | | | | | | | | | -| elementwise_mul|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| elementwise_pow|Y| | | | |Y| | | |Y| | | | | | -| elementwise_sub|Y|Y|Y|Y| |Y|Y| | |Y|Y|Y|Y| |Y| -| elu|Y| | | |Y| | | | | | | | | | | -| equal| | | | |Y| | | | |Y| | | | | | -| erf|Y| | | | | | | | | | | | | | | -| exp|Y|Y|Y|Y|Y| | | | |Y| | | | | | -| expand| |Y| | |Y| | | | | | | | | | | -| expand_as| | | | |Y| | | | | | | | | | | -| expand_v2| | | |Y|Y| | | | |Y| | | | | | -| fake_channel_wise_dequantize_max_abs| | | | | | | | | | | | | | | | -| fake_channel_wise_quantize_dequantize_abs_max| | | | | | | | | | | | | | | | -| fake_dequantize_max_abs| | | | | | | | | | | | | | | | -| fake_quantize_abs_max| | | | | | | | | | | | | | | | -| fake_quantize_dequantize_abs_max| | | | | | | | | | | | | | | | -|fake_quantize_dequantize_moving_average_abs_max| | | | | | | | | | | | | | | | -| fake_quantize_moving_average_abs_max| | | | | | | | | | | | | | | | -| fake_quantize_range_abs_max| | | | | | | | | | | | | | | | -| fc|Y|Y|Y| | |Y| | |Y|Y|Y|Y|Y|Y|Y| -| feed| | |Y| |Y| | | | | | | | | | | -| fetch| | |Y| |Y| | | | | | | | | | | -| fill_any_like| | | |Y|Y| | | | |Y| | | | | | -| fill_constant| | | |Y|Y| |Y| | |Y| | | | | | -| fill_constant_batch_size_like| | | |Y|Y| | | | | | | | | | | -| 
fill_zeros_like| | | |Y|Y| | | | | | | | | | | -| flatten| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| flatten2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| flatten_contiguous_range| | | |Y|Y| | | | |Y|Y|Y|Y| |Y| -| flip| | | | |Y| | | | | | | | | | | -| floor|Y| | | |Y| | | | | | | | | | | -| fusion_elementwise_add_activation|Y|Y|Y| | |Y| | | |Y|Y|Y|Y| |Y| -| fusion_elementwise_div_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| -| fusion_elementwise_max_activation|Y| | | | |Y| | | |Y| | | | | | -| fusion_elementwise_min_activation|Y| | | | |Y| | | |Y| | | | | | -| fusion_elementwise_mul_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| -| fusion_elementwise_pow_activation| | | | | | | | | |Y| | | | | | -| fusion_elementwise_sub_activation|Y|Y| | | |Y| | | |Y|Y|Y|Y| |Y| -| gather|Y| | |Y|Y|Y| | |Y| | | | | | | -| gather_nd| | | | |Y| | | | | | | | | | | -| gather_tree| | | | |Y| | | | | | | | | | | -| gelu|Y| | |Y| |Y| | | |Y| | | | | | -| generate_proposals| | | |Y|Y| | | | | | | | | | | -| generate_proposals_v2|Y| | | | | | | | | | | | | | | -| greater_equal| | | | |Y| | | | |Y| | | | | | -| greater_than| |Y| | |Y| | | | |Y| | | | | | -| grid_sampler|Y|Y| |Y| |Y| | | | | | | | | | -| group_norm|Y| | | | |Y| | | | | | | | | | -| gru|Y| | |Y| |Y| | | | | | | | | | -| gru_unit|Y| | |Y| |Y| | | | | | | | | | -| hard_sigmoid|Y|Y|Y|Y|Y| |Y| | |Y| | | | | | -| hard_swish|Y|Y|Y|Y|Y|Y|Y| | |Y| | | | | | -| im2sequence|Y| | |Y| | |Y| | | | | | | | | -| increment| | | |Y|Y| | | | | | | | | | | -| index_select| | | | |Y| | | | | | | | | | | -| instance_norm|Y|Y| |Y| |Y| | | |Y| | | | | | -| inverse| | | | |Y| | | | | | | | | | | -| io_copy| |Y|Y|Y| | | | |Y| | | | | | | -| io_copy_once| |Y|Y|Y| | | | | | | | | | | | -| is_empty| | | |Y|Y| | | | | | | | | | | -| layer_norm|Y| | |Y| |Y| | | |Y| | | | | | -| layout|Y|Y| | | |Y| | |Y| | | | | | | -| layout_once|Y|Y| | | | | | | | | | | | | | -| leaky_relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y| | | | | | -| less_equal| | | | |Y| | | | |Y| | | | | | -| less_than| | | |Y|Y| | | | |Y| 
| | | | | -| linspace| | | | |Y| | | | | | | | | | | -| lod_array_length| | | | |Y| | | | | | | | | | | -| lod_reset| | | | |Y| | | | | | | | | | | -| log|Y| | |Y|Y| | | | |Y| | | | | | -| logical_and| | | |Y|Y| | | | | | | | | | | -| logical_not| | | |Y|Y| | | | | | | | | | | -| logical_or| | | | |Y| | | | | | | | | | | -| logical_xor| | | | |Y| | | | | | | | | | | -| lookup_table|Y| | |Y| |Y| | | | | | | | | | -| lookup_table_dequant|Y| | | | | | | | | | | | | | | -| lookup_table_v2|Y| | |Y| |Y| | | |Y| | | | | | -| lrn|Y|Y| |Y| | | | |Y| | | | | | | -| lstm|Y| | | | | | | | | | | | | | | -| match_matrix_tensor| | | |Y| |Y| | | | | | | | | | -| matmul|Y|Y|Y|Y| |Y|Y| | |Y| | | | | | -| matmul_v2|Y|Y| |Y| | | | | |Y| | | | | | -| matrix_nms| | | | |Y| | | | | | | | | | | -| max_pool2d_with_index| | | |Y| | |Y| | | | | | | | | -| mean|Y| | | | | | | | | | | | | | | -| merge_lod_tensor|Y| | | | | | | | | | | | | | | -| meshgrid| | | | |Y| | | | | | | | | | | -| mish|Y| | | | |Y| | | | | | | | | | -| mul|Y| | |Y| |Y|Y| | | | | | | | | -| multiclass_nms|Y| | | |Y| |Y| | | | | | | | | -| multiclass_nms2|Y| | | |Y| |Y| | | | | | | | | -| multiclass_nms3|Y| | | |Y| | | | | | | | | | | -| nearest_interp|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| nearest_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | | -| negative|Y| | | | | | | | | | | | | | | -| norm|Y| | |Y|Y| |Y| |Y|Y| | | | | | -| not_equal| | | | |Y| | | | |Y| | | | | | -| one_hot| | | | |Y| | | | | | | | | | | -| one_hot_v2| | | | |Y| | | | | | | | | | | -| p_norm|Y| | | |Y| | | | |Y| | | | | | -| pad2d|Y|Y|Y|Y|Y| | | | |Y| | | | | | -| pad3d| | | | |Y| | | | |Y| | | | | | -| pixel_shuffle|Y|Y| |Y|Y| | | | | | | | | | | -| polygon_box_transform| | | | |Y| | | | | | | | | | | -| pool2d|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y| -| pow|Y| | |Y| |Y| | | |Y| | | | | | -| prelu|Y|Y| |Y|Y| | | | |Y| | | | | | -| print| | | | |Y| | | | | | | | | | | -| prior_box|Y| | |Y|Y| |Y| | | | | | | | | -| range| | | | |Y| | | | |Y| | | | | | -| read_from_array| 
| | |Y|Y| | | | | | | | | | | -| reciprocal|Y| | |Y|Y| | | | | | | | | | | -| reduce_all| | | |Y|Y| | | | | | | | | | | -| reduce_any| | | |Y|Y| | | | | | | | | | | -| reduce_max|Y|Y| |Y| |Y|Y| | | | | | | | | -| reduce_mean|Y|Y| |Y| |Y|Y| | |Y| | | | | | -| reduce_min|Y| | |Y| |Y| | | | | | | | | | -| reduce_prod|Y| | |Y| |Y| | | | | | | | | | -| reduce_sum|Y| | |Y| |Y|Y| | | | | | | | | -| relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y| -| relu6|Y|Y|Y|Y|Y|Y| | |Y|Y|Y|Y|Y|Y|Y| -| relu_clipped|Y| | | |Y| | | | | | | | | | | -| reshape| |Y| |Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| reshape2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y| -| retinanet_detection_output| | | | |Y| | | | | | | | | | | -| reverse| | | | |Y| | | | | | | | | | | -| rnn|Y| | |Y| |Y| | | | | | | | | | -| roi_align| | | |Y|Y| | | | | | | | | | | -| roi_perspective_transform| | | | |Y| | | | | | | | | | | -| rsqrt|Y|Y| |Y|Y|Y| | | | | | | | | | -| scale|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| scatter|Y| | | | | | | | | | | | | | | -| scatter_nd_add| | | | |Y| | | | | | | | | | | -| search_aligned_mat_mul| | | | | |Y| | | | | | | | | | -| search_attention_padding_mask| | | | | |Y| | | | | | | | | | -| search_fc| | | |Y| | | | | | | | | | | | -| search_grnn| | | |Y| |Y| | | | | | | | | | -| search_group_padding| | | | | |Y| | | | | | | | | | -| search_seq_arithmetic| | | |Y| |Y| | | | | | | | | | -| search_seq_depadding| | | | | |Y| | | | | | | | | | -| search_seq_fc| | | | | |Y| | | | | | | | | | -| search_seq_softmax| | | | | |Y| | | | | | | | | | -| select_input| | | | |Y| | | | | | | | | | | -| sequence_arithmetic| | | |Y| |Y| | | | | | | | | | -| sequence_concat| | | |Y| |Y| | | | | | | | | | -| sequence_conv|Y| | | | |Y| | | | | | | | | | -| sequence_expand| | | | |Y| | | | | | | | | | | -| sequence_expand_as|Y| | | | |Y| | | | | | | | | | -| sequence_mask| | | |Y|Y| | | | | | | | | | | -| sequence_pad| | | |Y|Y| | | | | | | | | | | -| sequence_pool|Y| | |Y| |Y| | | | | | | | | | -| sequence_reshape| | | | | |Y| | | | | | | | | | -| 
sequence_reverse| | | |Y| |Y| | | | | | | | | | -| sequence_softmax| | | | |Y| | | | | | | | | | | -| sequence_topk_avg_pooling| | | |Y| |Y| | | | | | | | | | -| sequence_unpad| | | |Y|Y| | | | | | | | | | | -| shape| |Y| |Y|Y| |Y| | |Y| | | | | | -| shuffle_channel|Y|Y|Y| |Y| | | | | | | | | | | -| sigmoid|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y| |Y| -| sign|Y| | |Y| | | | | | | | | | | | -| sin| |Y| | |Y| | | | | | | | | | | -| slice|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | | -| softmax|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y| -| softplus|Y| | | | | | | | | | | | | | | -| softsign| | | |Y| |Y| | | | | | | | | | -| sparse_conv2d|Y| | | | | | | | | | | | | | | -| split| |Y|Y|Y|Y| |Y| |Y|Y| | |Y| | | -| split_lod_tensor|Y| | | | | | | | | | | | | | | -| sqrt|Y|Y| |Y| |Y|Y| | | | | | | | | -| square|Y|Y| |Y|Y|Y|Y| | | | | | | | | -| squeeze| |Y| |Y|Y| |Y| |Y|Y| | | | | | -| squeeze2| |Y| |Y|Y| |Y| |Y|Y| | | | | | -| stack| | | |Y|Y|Y| | | |Y| | | | | | -| strided_slice| | | | |Y| | | | | | | | | | | -| subgraph| | | | | | | | |Y| | | | | | | -| sum|Y| | |Y| | | | | | | | | | | | -| swish|Y|Y|Y|Y|Y| |Y| | |Y| | | | | | -| sync_batch_norm|Y|Y| | | |Y| | | | | | | | | | -| tanh|Y|Y| |Y|Y|Y| | | |Y|Y|Y|Y| |Y| -| tensor_array_to_tensor| | | | |Y| | | | | | | | | | | -| thresholded_relu|Y| | | |Y| | | | | | | | | | | -| tile| | | | |Y| | | | | | | | | | | -| top_k| | | |Y|Y| | | | |Y| | | | | | -| top_k_v2| | | | |Y| | | | |Y| | | | | | -| transpose|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| transpose2|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y| -| tril_triu| | | | |Y| | | | | | | | | | | -| uniform_random| | | | |Y| | | | | | | | | | | -| unique_with_counts| | | | |Y| | | | | | | | | | | -| unsqueeze| |Y| |Y|Y| | | | |Y| | | | | | -| unsqueeze2| |Y| |Y|Y| | | | |Y| | | | | | -| unstack| | | |Y|Y| | | | | | | | | | | -| var_conv_2d| | | |Y| |Y| | | | | | | | | | -| where| | | | |Y| | | | | | | | | | | -| where_index| | | | |Y| | | | | | | | | | | -| while| | | | |Y| | | | | | | | | | | -| write_back| | | | |Y| | | | | | | | | | | 
-| write_to_array| | | |Y|Y| | | | | | | | | | | -| yolo_box|Y|Y|Y|Y|Y| |Y| | | | | | | | | -| __xpu__bigru| | | |Y| | | | | | | | | | | | -| __xpu__conv2d| | | |Y| | | | | | | | | | | | -| __xpu__dynamic_lstm_fuse_op| | | |Y| | | | | | | | | | | | -| __xpu__embedding_with_eltwise_add| | | |Y| | | | | | | | | | | | -| __xpu__fc| | | |Y| | | | | | | | | | | | -| __xpu__generate_sequence| | | |Y| | | | | | | | | | | | -| __xpu__logit| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_bid_emb_att| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_bid_emb_grnn_att| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_bid_emb_grnn_att2| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_match_conv_topk| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_merge_all| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_search_attention| | | |Y| | | | | | | | | | | | -| __xpu__mmdnn_search_attention2| | | |Y| | | | | | | | | | | | -| __xpu__multi_encoder| | | |Y| | | | | | | | | | | | -| __xpu__multi_softmax| | | |Y| | | | | | | | | | | | -| __xpu__resnet50| | | |Y| | | | | | | | | | | | -| __xpu__resnet_cbam| | | |Y| | | | | | | | | | | | -| __xpu__sfa_head| | | |Y| | | | | | | | | | | | -| __xpu__softmax_topk| | | |Y| | | | | | | | | | | | -| __xpu__squeeze_excitation_block| | | |Y| | | | | | | | | | | | +| OP_name| ARM | OpenCL | Metal | 昆仑芯XPU | Host | X86 | 比特大陆 | 英特尔FPGA | 寒武纪mlu | 华为昇腾NPU | 联发科APU | 瑞芯微NPU | 华为麒麟NPU | 颖脉NNA | 晶晨NPU | 芯原TIM-VX | +|-:|-| -| -| -| -| -| -| -| -| -| -| -| -| -| -| -| +| abs|Y|Y| |Y|Y| | | | |Y| | | | | | | +| affine_channel|Y| | | | | | | | | | | | | | | | +| affine_grid|Y| | | | | | | | | | | | | | | | +| anchor_generator| | | |Y|Y| | | | | | | | | | | | +| arg_max|Y|Y| |Y|Y| | | |Y|Y| | | | | | | +| arg_min| | | | | | | | | |Y| | | | | | | +| argsort| | | | |Y| | | | | | | | | | | | +| assign| | | |Y|Y| | | | |Y| | | | | | | +| assign_value| | | |Y|Y| |Y| | |Y| | | | | | | +| attention_padding_mask| | | | | | | | | | | | | | | | | +| axpy|Y| | | | | | | | | | | | | | 
| | +| batch_norm|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| beam_search| | | | |Y| | | | | | | | | | | | +| beam_search_decode| | | | |Y| | | | | | | | | | | | +| bilinear_interp|Y|Y|Y|Y| |Y|Y| | |Y| | | | | |Y| +| bilinear_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | |Y| +| bmm| | | |Y| | | | | | | | | | | | | +| box_clip| | | |Y|Y| | | | | | | | | | | | +| box_coder|Y|Y|Y|Y|Y|Y|Y| | | | | | | | | | +| calib|Y| | |Y| | | | |Y| | | | | | | | +| calib_once|Y| | |Y| | | | |Y| | | | | | | | +| cast| | | |Y|Y| |Y| |Y|Y| | | | | | | +| clip|Y|Y| |Y| |Y| | | |Y| | | | | | | +| collect_fpn_proposals| | | | |Y| | | | | | | | | | | | +| concat|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| conditional_block| | | | |Y| | | | | | | | | | | | +| conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| +| conv2d_transpose|Y|Y|Y|Y| |Y|Y| | |Y| | | | |Y|Y| +| correlation| | | |Y|Y| | | | | | | | | | | | +| cos| |Y| | |Y| | | | | | | | | | | | +| cos_sim| | | | |Y| | | | | | | | | | | | +| crf_decoding| | | | |Y| | | | | | | | | | | | +| crop| | | | |Y| | | | | | | | | | | | +| crop_tensor| | | | |Y| | | | | | | | | | | | +| ctc_align| | | | |Y| | | | | | | | | | | | +| cumsum| | | | |Y| | | | |Y| | | | | | | +| decode_bboxes|Y| | | | | | | | | | | | | | | | +| deformable_conv|Y| | | |Y| | | | |Y| | | | | | | +| density_prior_box| | | |Y|Y|Y|Y| | | | | | | | | | +| depthwise_conv2d|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y| +| depthwise_conv2d_transpose| |Y| | | | |Y| | | | | | | | | | +| distribute_fpn_proposals| | | | |Y| | | | | | | | | | | | +| dropout|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| elementwise_add|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| elementwise_div|Y| | |Y| |Y|Y| | |Y|Y|Y|Y| |Y|Y| +| elementwise_floordiv|Y| | | | |Y| | | | | | | | | | | +| elementwise_max|Y| | |Y| |Y| | | |Y| | | | | |Y| +| elementwise_min|Y| | | | |Y| | | |Y| | | | | |Y| +| elementwise_mod|Y| | | | |Y| | | | | | | | | | | +| elementwise_mul|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| elementwise_pow|Y| | | | |Y| | | |Y| | | | | |Y| +| elementwise_sub|Y|Y|Y|Y| 
|Y|Y| | |Y|Y|Y|Y| |Y|Y| +| elu|Y| | | |Y| | | | | | | | | | | | +| equal| | | | |Y| | | | |Y| | | | | | | +| erf|Y| | | | | | | | | | | | | | | | +| exp|Y|Y|Y|Y|Y| | | | |Y| | | | | | | +| expand| |Y| | |Y| | | | | | | | | | | | +| expand_as| | | | |Y| | | | | | | | | | | | +| expand_v2| | | |Y|Y| | | | |Y| | | | | | | +| fake_channel_wise_dequantize_max_abs| | | | | | | | | | | | | | | | | +| fake_channel_wise_quantize_dequantize_abs_max| | | | | | | | | | | | | | | | | +| fake_dequantize_max_abs| | | | | | | | | | | | | | | | | +| fake_quantize_abs_max| | | | | | | | | | | | | | | | | +| fake_quantize_dequantize_abs_max| | | | | | | | | | | | | | | | | +|fake_quantize_dequantize_moving_average_abs_max| | | | | | | | | | | | | | | | | +| fake_quantize_moving_average_abs_max| | | | | | | | | | | | | | | | | +| fake_quantize_range_abs_max| | | | | | | | | | | | | | | | | +| fc|Y|Y|Y| | |Y| | |Y|Y|Y|Y|Y|Y|Y|Y| +| feed| | |Y| |Y| | | | | | | | | | | | +| fetch| | |Y| |Y| | | | | | | | | | | | +| fill_any_like| | | |Y|Y| | | | |Y| | | | | |Y| +| fill_constant| | | |Y|Y| |Y| | |Y| | | | | | | +| fill_constant_batch_size_like| | | |Y|Y| | | | | | | | | | |Y| +| fill_zeros_like| | | |Y|Y| | | | | | | | | | | | +| flatten| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| flatten2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| flatten_contiguous_range| | | |Y|Y| | | | |Y|Y|Y|Y| |Y|Y| +| flip| | | | |Y| | | | | | | | | | | | +| floor|Y| | | |Y| | | | | | | | | | | | +| fusion_elementwise_add_activation|Y|Y|Y| | |Y| | | |Y|Y|Y|Y| |Y| | +| fusion_elementwise_div_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| | +| fusion_elementwise_max_activation|Y| | | | |Y| | | |Y| | | | | | | +| fusion_elementwise_min_activation|Y| | | | |Y| | | |Y| | | | | | | +| fusion_elementwise_mul_activation|Y| | | | |Y| | | |Y|Y|Y|Y| |Y| | +| fusion_elementwise_pow_activation| | | | | | | | | |Y| | | | | | | +| fusion_elementwise_sub_activation|Y|Y| | | |Y| | | |Y|Y|Y|Y| |Y| | +| gather|Y| | |Y|Y|Y| | |Y| | | | | | | | +| 
gather_nd| | | | |Y| | | | | | | | | | | | +| gather_tree| | | | |Y| | | | | | | | | | | | +| gelu|Y| | |Y| |Y| | | |Y| | | | | | | +| generate_proposals| | | |Y|Y| | | | | | | | | | | | +| generate_proposals_v2|Y| | | | | | | | | | | | | | | | +| greater_equal| | | | |Y| | | | |Y| | | | | | | +| greater_than| |Y| | |Y| | | | |Y| | | | | | | +| grid_sampler|Y|Y| |Y| |Y| | | | | | | | | | | +| group_norm|Y| | | | |Y| | | | | | | | | | | +| gru|Y| | |Y| |Y| | | | | | | | | | | +| gru_unit|Y| | |Y| |Y| | | | | | | | | | | +| hard_sigmoid|Y|Y|Y|Y|Y| |Y| | |Y| | | | | |Y| +| hard_swish|Y|Y|Y|Y|Y|Y|Y| | |Y| | | | | |Y| +| im2sequence|Y| | |Y| | |Y| | | | | | | | | | +| increment| | | |Y|Y| | | | | | | | | | | | +| index_select| | | | |Y| | | | | | | | | | | | +| instance_norm|Y|Y| |Y| |Y| | | |Y| | | | | | | +| inverse| | | | |Y| | | | | | | | | | | | +| io_copy| |Y|Y|Y| | | | |Y| | | | | | | | +| io_copy_once| |Y|Y|Y| | | | | | | | | | | | | +| is_empty| | | |Y|Y| | | | | | | | | | | | +| layer_norm|Y| | |Y| |Y| | | |Y| | | | | | | +| layout|Y|Y| | | |Y| | |Y| | | | | | | | +| layout_once|Y|Y| | | | | | | | | | | | | | | +| leaky_relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y| | | | | |Y| +| less_equal| | | | |Y| | | | |Y| | | | | | | +| less_than| | | |Y|Y| | | | |Y| | | | | | | +| linspace| | | | |Y| | | | | | | | | | | | +| lod_array_length| | | | |Y| | | | | | | | | | | | +| lod_reset| | | | |Y| | | | | | | | | | | | +| log|Y| | |Y|Y| | | | |Y| | | | | | | +| logical_and| | | |Y|Y| | | | | | | | | | | | +| logical_not| | | |Y|Y| | | | | | | | | | | | +| logical_or| | | | |Y| | | | | | | | | | | | +| logical_xor| | | | |Y| | | | | | | | | | | | +| lookup_table|Y| | |Y| |Y| | | | | | | | | | | +| lookup_table_dequant|Y| | | | | | | | | | | | | | | | +| lookup_table_v2|Y| | |Y| |Y| | | |Y| | | | | | | +| lrn|Y|Y| |Y| | | | |Y| | | | | | | | +| lstm|Y| | | | | | | | | | | | | | | | +| match_matrix_tensor| | | |Y| |Y| | | | | | | | | | | +| matmul|Y|Y|Y|Y| |Y|Y| | |Y| | | | | |Y| +| 
matmul_v2|Y|Y| |Y| | | | | |Y| | | | | |Y| +| matrix_nms| | | | |Y| | | | | | | | | | | | +| max_pool2d_with_index| | | |Y| | |Y| | | | | | | | | | +| mean|Y| | | | | | | | | | | | | | | | +| merge_lod_tensor|Y| | | | | | | | | | | | | | | | +| meshgrid| | | | |Y| | | | | | | | | | | | +| mish|Y| | | | |Y| | | | | | | | | | | +| mul|Y| | |Y| |Y|Y| | | | | | | | | | +| multiclass_nms|Y| | | |Y| |Y| | | | | | | | | | +| multiclass_nms2|Y| | | |Y| |Y| | | | | | | | | | +| multiclass_nms3|Y| | | |Y| | | | | | | | | | | | +| nearest_interp|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| nearest_interp_v2|Y|Y|Y|Y| |Y| | | |Y| | | | | |Y| +| negative|Y| | | | | | | | | | | | | | | | +| norm|Y| | |Y|Y| |Y| |Y|Y| | | | | | | +| not_equal| | | | |Y| | | | |Y| | | | | | | +| one_hot| | | | |Y| | | | | | | | | | | | +| one_hot_v2| | | | |Y| | | | | | | | | | | | +| p_norm|Y| | | |Y| | | | |Y| | | | | | | +| pad2d|Y|Y|Y|Y|Y| | | | |Y| | | | | | | +| pad3d| | | | |Y| | | | |Y| | | | | | | +| pixel_shuffle|Y|Y| |Y|Y| | | | | | | | | | | | +| polygon_box_transform| | | | |Y| | | | | | | | | | | | +| pool2d|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y| +| pow|Y| | |Y| |Y| | | |Y| | | | | | | +| prelu|Y|Y| |Y|Y| | | | |Y| | | | | | | +| print| | | | |Y| | | | | | | | | | | | +| prior_box|Y| | |Y|Y| |Y| | | | | | | | | | +| range| | | | |Y| | | | |Y| | | | | | | +| read_from_array| | | |Y|Y| | | | | | | | | | | | +| reciprocal|Y| | |Y|Y| | | | | | | | | | | | +| reduce_all| | | |Y|Y| | | | | | | | | | | | +| reduce_any| | | |Y|Y| | | | | | | | | | | | +| reduce_max|Y|Y| |Y| |Y|Y| | | | | | | | | | +| reduce_mean|Y|Y| |Y| |Y|Y| | |Y| | | | | | | +| reduce_min|Y| | |Y| |Y| | | | | | | | | | | +| reduce_prod|Y| | |Y| |Y| | | | | | | | | | | +| reduce_sum|Y| | |Y| |Y|Y| | | | | | | | | | +| relu|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y| +| relu6|Y|Y|Y|Y|Y|Y| | |Y|Y|Y|Y|Y|Y|Y|Y| +| relu_clipped|Y| | | |Y| | | | | | | | | | | | +| reshape| |Y| |Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| reshape2| |Y|Y|Y|Y| |Y| |Y|Y|Y|Y|Y| |Y|Y| +| 
retinanet_detection_output| | | | |Y| | | | | | | | | | | | +| reverse| | | | |Y| | | | | | | | | | | | +| rnn|Y| | |Y| |Y| | | | | | | | | | | +| roi_align| | | |Y|Y| | | | | | | | | | | | +| roi_perspective_transform| | | | |Y| | | | | | | | | | | | +| rsqrt|Y|Y| |Y|Y|Y| | | | | | | | | | | +| scale|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| scatter|Y| | | | | | | | | | | | | | | | +| scatter_nd_add| | | | |Y| | | | | | | | | | | | +| search_aligned_mat_mul| | | | | |Y| | | | | | | | | | | +| search_attention_padding_mask| | | | | |Y| | | | | | | | | | | +| search_fc| | | |Y| | | | | | | | | | | | | +| search_grnn| | | |Y| |Y| | | | | | | | | | | +| search_group_padding| | | | | |Y| | | | | | | | | | | +| search_seq_arithmetic| | | |Y| |Y| | | | | | | | | | | +| search_seq_depadding| | | | | |Y| | | | | | | | | | | +| search_seq_fc| | | | | |Y| | | | | | | | | | | +| search_seq_softmax| | | | | |Y| | | | | | | | | | | +| select_input| | | | |Y| | | | | | | | | | | | +| sequence_arithmetic| | | |Y| |Y| | | | | | | | | | | +| sequence_concat| | | |Y| |Y| | | | | | | | | | | +| sequence_conv|Y| | | | |Y| | | | | | | | | | | +| sequence_expand| | | | |Y| | | | | | | | | | | | +| sequence_expand_as|Y| | | | |Y| | | | | | | | | | | +| sequence_mask| | | |Y|Y| | | | | | | | | | | | +| sequence_pad| | | |Y|Y| | | | | | | | | | | | +| sequence_pool|Y| | |Y| |Y| | | | | | | | | | | +| sequence_reshape| | | | | |Y| | | | | | | | | | | +| sequence_reverse| | | |Y| |Y| | | | | | | | | | | +| sequence_softmax| | | | |Y| | | | | | | | | | | | +| sequence_topk_avg_pooling| | | |Y| |Y| | | | | | | | | | | +| sequence_unpad| | | |Y|Y| | | | | | | | | | | | +| shape| |Y| |Y|Y| |Y| | |Y| | | | | | | +| shuffle_channel|Y|Y|Y| |Y| | | | | | | | | | |Y| +| sigmoid|Y|Y|Y|Y|Y|Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| sign|Y| | |Y| | | | | | | | | | | | | +| sin| |Y| | |Y| | | | | | | | | | | | +| slice|Y|Y|Y|Y| |Y|Y| |Y|Y| | | | | |Y| +| softmax|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y|Y|Y|Y| +| softplus|Y| | | | | | | | | | 
| | | | | | +| softsign| | | |Y| |Y| | | | | | | | | | | +| sparse_conv2d|Y| | | | | | | | | | | | | | | | +| split| |Y|Y|Y|Y| |Y| |Y|Y| | |Y| | |Y| +| split_lod_tensor|Y| | | | | | | | | | | | | | | | +| sqrt|Y|Y| |Y| |Y|Y| | | | | | | | | | +| square|Y|Y| |Y|Y|Y|Y| | | | | | | | | | +| squeeze| |Y| |Y|Y| |Y| |Y|Y| | | | | |Y| +| squeeze2| |Y| |Y|Y| |Y| |Y|Y| | | | | |Y| +| stack| | | |Y|Y|Y| | | |Y| | | | | | | +| strided_slice| | | | |Y| | | | | | | | | | | | +| subgraph| | | | | | | | |Y| | | | | | | | +| sum|Y| | |Y| | | | | | | | | | | | | +| swish|Y|Y|Y|Y|Y| |Y| | |Y| | | | | | | +| sync_batch_norm|Y|Y| | | |Y| | | | | | | | | | | +| tanh|Y|Y| |Y|Y|Y| | | |Y|Y|Y|Y| |Y| | +| tensor_array_to_tensor| | | | |Y| | | | | | | | | | | | +| thresholded_relu|Y| | | |Y| | | | | | | | | | | | +| tile| | | | |Y| | | | | | | | | | | | +| top_k| | | |Y|Y| | | | |Y| | | | | | | +| top_k_v2| | | | |Y| | | | |Y| | | | | | | +| transpose|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| transpose2|Y|Y|Y|Y| |Y|Y| |Y|Y|Y|Y|Y| |Y|Y| +| tril_triu| | | | |Y| | | | | | | | | | | | +| uniform_random| | | | |Y| | | | | | | | | | | | +| unique_with_counts| | | | |Y| | | | | | | | | | | | +| unsqueeze| |Y| |Y|Y| | | | |Y| | | | | |Y| +| unsqueeze2| |Y| |Y|Y| | | | |Y| | | | | |Y| +| unstack| | | |Y|Y| | | | | | | | | | | | +| var_conv_2d| | | |Y| |Y| | | | | | | | | | | +| where| | | | |Y| | | | | | | | | | | | +| where_index| | | | |Y| | | | | | | | | | | | +| while| | | | |Y| | | | | | | | | | | | +| write_back| | | | |Y| | | | | | | | | | | | +| write_to_array| | | |Y|Y| | | | | | | | | | | | +| yolo_box|Y|Y|Y|Y|Y| |Y| | | | | | | | | | +| __xpu__bigru| | | |Y| | | | | | | | | | | | | +| __xpu__conv2d| | | |Y| | | | | | | | | | | | | +| __xpu__dynamic_lstm_fuse_op| | | |Y| | | | | | | | | | | | | +| __xpu__embedding_with_eltwise_add| | | |Y| | | | | | | | | | | | | +| __xpu__fc| | | |Y| | | | | | | | | | | | | +| __xpu__generate_sequence| | | |Y| | | | | | | | | | | | | +| __xpu__logit| | | |Y| | 
| | | | | | | | | | | +| __xpu__mmdnn_bid_emb_att| | | |Y| | | | | | | | | | | | | +| __xpu__mmdnn_bid_emb_grnn_att| | | |Y| | | | | | | | | | | | | +| __xpu__mmdnn_bid_emb_grnn_att2| | | |Y| | | | | | | | | | | | | +| __xpu__mmdnn_match_conv_topk| | | |Y| | | | | | | | | | | | | +| __xpu__mmdnn_merge_all| | | |Y| | | | | | | | | | | | | +| __xpu__mmdnn_search_attention| | | |Y| | | | | | | | | | | | | +| __xpu__mmdnn_search_attention2| | | |Y| | | | | | | | | | | | | +| __xpu__multi_encoder| | | |Y| | | | | | | | | | | | | +| __xpu__multi_softmax| | | |Y| | | | | | | | | | | | | +| __xpu__resnet50| | | |Y| | | | | | | | | | | | | +| __xpu__resnet_cbam| | | |Y| | | | | | | | | | | | | +| __xpu__sfa_head| | | |Y| | | | | | | | | | | | | +| __xpu__softmax_topk| | | |Y| | | | | | | | | | | | | +| __xpu__squeeze_excitation_block| | | |Y| | | | | | | | | | | | | diff --git a/docs/source_compile/include/multi_device_support/nnadapter_support_verisilicon_timvx.rst b/docs/source_compile/include/multi_device_support/nnadapter_support_verisilicon_timvx.rst new file mode 100644 index 00000000000..fcb3c51d3bb --- /dev/null +++ b/docs/source_compile/include/multi_device_support/nnadapter_support_verisilicon_timvx.rst @@ -0,0 +1,27 @@ +NNAdapter 支持芯原 TIM-VX +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
list-table:: + + * - 参数 + - 说明 + - 可选范围 + - 默认值 + * - nnadapter_with_verisilicon_timvx + - 是否编译芯原 TIM-VX 的 NNAdapter HAL 库 + - OFF / ON + - OFF + * - nnadapter_verisilicon_timvx_src_git_tag + - 设置芯原 TIM-VX 的代码分支 + - TIM-VX repo 分支名 + - main + * - nnadapter_verisilicon_timvx_viv_sdk_url + - 设置芯原 TIM-VX SDK 的下载链接 + - 用户自定义 + - Android系统:http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_android_9_armeabi_v7a_6_4_4_3_generic.tgz + Linux系统:http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_linux_arm64_6_4_4_3_generic.tgz + * - nnadapter_verisilicon_timvx_viv_sdk_root + - 设置芯原 TIM-VX 的本地路径 + - 用户自定义 + - 空值 +详细请参考 `芯原 TIM-VX 部署示例 `_ diff --git a/docs/source_compile/linux_x86_compile_android.rst b/docs/source_compile/linux_x86_compile_android.rst index e32c4ff45d1..c69e0e7e358 100644 --- a/docs/source_compile/linux_x86_compile_android.rst +++ b/docs/source_compile/linux_x86_compile_android.rst @@ -242,3 +242,5 @@ Paddle Lite 仓库中\ ``/lite/tools/build_android.sh``\ 脚本文件用于构 .. include:: include/multi_device_support/nnadapter_support_mediatek_apu.rst .. include:: include/multi_device_support/nnadapter_support_amlogic_npu.rst + +.. include:: include/multi_device_support/nnadapter_support_verisilicon_timvx.rst \ No newline at end of file diff --git a/docs/source_compile/linux_x86_compile_arm_linux.rst b/docs/source_compile/linux_x86_compile_arm_linux.rst index 136309891ad..4b0635827b1 100644 --- a/docs/source_compile/linux_x86_compile_arm_linux.rst +++ b/docs/source_compile/linux_x86_compile_arm_linux.rst @@ -186,3 +186,5 @@ Paddle Lite 仓库中 \ ``./lite/tools/build_linux.sh``\ 脚本文件用于构 .. include:: include/multi_device_support/nnadapter_support_rockchip_npu.rst .. include:: include/multi_device_support/nnadapter_support_amlogic_npu.rst + +.. include:: include/multi_device_support/nnadapter_support_verisilicon_timvx.rst \ No newline at end of file
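
For reference, the TIM-VX parameters documented in the new `nnadapter_support_verisilicon_timvx.rst` above would typically be passed to Paddle Lite's build script. The sketch below is an assumption modeled on how the sibling NNAdapter options (e.g. the Amlogic NPU ones included alongside it) are wired into `./lite/tools/build_linux.sh`; the flag names mirror the parameter table, the SDK URL is the table's Linux default, and the architecture value is illustrative:

```shell
# Hedged sketch (not a verified invocation): enable the VeriSilicon TIM-VX
# NNAdapter HAL when building Paddle Lite for ARM Linux. Flag names follow
# the parameter table above; --arch is an illustrative choice.
./lite/tools/build_linux.sh \
  --arch=armv8 \
  --with_extra=ON \
  --with_nnadapter=ON \
  --nnadapter_with_verisilicon_timvx=ON \
  --nnadapter_verisilicon_timvx_src_git_tag=main \
  --nnadapter_verisilicon_timvx_viv_sdk_url=http://paddlelite-demo.bj.bcebos.com/devices/verisilicon/sdk/viv_sdk_linux_arm64_6_4_4_3_generic.tgz
```

Per the table's `nnadapter_verisilicon_timvx_viv_sdk_root` entry, a locally unpacked SDK path may presumably be supplied via that flag in place of the download URL.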