tvm-build

1. Pull

docker pull pinto0309/ubuntu2004-cuda114-cudnn8-tensorrt823-tvm:09dev0

2. Build

docker build -t pinto0309/ubuntu2004-cuda114-cudnn8-tensorrt823-tvm:09dev0 .

3. Run

docker run --rm -it --gpus all \
-v "$(pwd)":/home/user/workdir \
pinto0309/ubuntu2004-cuda114-cudnn8-tensorrt823-tvm:09dev0

4. TVM Summary

Build summary

--   ---------------- Summary ----------------
--   CMake version         : 3.16.3
--   CMake executable      : /usr/bin/cmake
--   Generator             : Ninja
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler ID       : GNU
--   C++ compiler version  : 9.3.0
--   CXX flags             : -std=c++14 -faligned-new -O2 -Wall -fPIC 
--   Build type            : 
--   Compile definitions   : TVM_INDEX_DEFAULT_I64=1;USE_PROFILER=1;TVM_THREADPOOL_USE_OPENMP=0;DMLC_USE_FOPEN64=0;NDEBUG=1;_GNU_SOURCE;__STDC_CONSTANT_MACROS;__STDC_FORMAT_MACROS;__STDC_LIMIT_MACROS;TVM_LLVM_VERSION=140;USE_DNNL=1;TF_TVMDSOOP_ENABLE_GPU;PT_TVMDSOOP_ENABLE_GPU;TVM_GRAPH_EXECUTOR_TENSORRT
--   Options:
--    BUILD_STATIC_RUNTIME               : ON
--    COMPILER_RT_PATH                   : 3rdparty/compiler-rt
--    DLPACK_PATH                        : 3rdparty/dlpack/include
--    DMLC_PATH                          : 3rdparty/dmlc-core/include
--    HIDE_PRIVATE_SYMBOLS               : OFF
--    INDEX_DEFAULT_I64                  : ON
--    INSTALL_DEV                        : OFF
--    PICOJSON_PATH                      : 3rdparty/picojson
--    RANG_PATH                          : 3rdparty/rang/include
--    ROCM_PATH                          : /opt/rocm
--    SUMMARIZE                          : ON
--    USE_ARM_COMPUTE_LIB                : OFF
--    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR : OFF
--    USE_BLAS                           : none
--    USE_BNNS                           : OFF
--    USE_BYODT_POSIT                    : OFF
--    USE_CMSISNN                        : OFF
--    USE_COREML                         : OFF
--    USE_CPP_RPC                        : ON
--    USE_CUBLAS                         : OFF
--    USE_CUDA                           : ON
--    USE_CUDNN                          : ON
--    USE_CUTLASS                        : OFF
--    USE_DNNL_CODEGEN                   : OFF
--    USE_ETHOSN                         : OFF
--    USE_FALLBACK_STL_MAP               : OFF
--    USE_GRAPH_EXECUTOR                 : ON
--    USE_GRAPH_EXECUTOR_CUDA_GRAPH      : ON
--    USE_GTEST                          : AUTO
--    USE_HEXAGON_DEVICE                 : OFF
--    USE_HEXAGON_RPC                    : OFF
--    USE_HEXAGON_SDK                    : /path/to/sdk
--    USE_IOS_RPC                        : OFF
--    USE_LIBBACKTRACE                   : ON
--    USE_LLVM                           : ON
--    USE_METAL                          : OFF
--    USE_MICRO                          : OFF
--    USE_MICRO_STANDALONE_RUNTIME       : OFF
--    USE_MIOPEN                         : OFF
--    USE_MKL                            : OFF
--    USE_MKLDNN                         : ON
--    USE_MSVC_MT                        : OFF
--    USE_NNPACK                         : ON
--    USE_OPENCL                         : OFF
--    USE_OPENMP                         : ON
--    USE_PAPI                           : OFF
--    USE_PROFILER                       : ON
--    USE_PT_TVMDSOOP                    : ON
--    USE_RANDOM                         : ON
--    USE_RELAY_DEBUG                    : OFF
--    USE_ROCBLAS                        : OFF
--    USE_ROCM                           : OFF
--    USE_RPC                            : ON
--    USE_RTTI                           : ON
--    USE_RUST_EXT                       : OFF
--    USE_SORT                           : ON
--    USE_STACKVM_RUNTIME                : OFF
--    USE_TARGET_ONNX                    : ON
--    USE_TENSORFLOW_PATH                : none
--    USE_TENSORRT_CODEGEN               : ON
--    USE_TENSORRT_RUNTIME               : ON
--    USE_TFLITE                         : OFF
--    USE_TF_TVMDSOOP                    : ON
--    USE_THREADS                        : ON
--    USE_THRUST                         : OFF
--    USE_VITIS_AI                       : OFF
--    USE_VULKAN                         : OFF
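
To check quickly which backends this image was built with, the summary text can be parsed into a dictionary. The following is a minimal sketch, not part of the image: `parse_summary` and `enabled_options` are hypothetical helper names, and the embedded sample holds only a few lines copied from the summary above.

```python
import re

# A few lines copied from the build summary; in practice, pass the full text.
SUMMARY = """\
--    USE_CUDA                           : ON
--    USE_CUDNN                          : ON
--    USE_OPENCL                         : OFF
--    USE_BLAS                           : none
--    USE_TENSORRT_RUNTIME               : ON
"""

# Matches option lines such as "--    USE_CUDA      : ON" (upper-case name, colon, value).
OPTION_LINE = re.compile(r"^--\s+([A-Z0-9_]+)\s*:\s*(.*?)\s*$")


def parse_summary(text):
    """Return a dict mapping option names to their string values."""
    options = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def enabled_options(options):
    """Return the sorted names of options whose value is exactly 'ON'."""
    return sorted(name for name, value in options.items() if value == "ON")


options = parse_summary(SUMMARY)
print(enabled_options(options))
```

Header lines such as "CMake version" are skipped because the pattern accepts only upper-case option names.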

5. Environment

List of installed tools

TVM v0.9.dev0
Python 3.8+
TensorFlow v2.8.0+
PyTorch v1.10.0+
TorchVision
TorchAudio
OpenVINO 2021.4.582+
TensorRT 8.2+
trtexec
pycuda 2021.1
tensorflowjs
coremltools
paddle2onnx
onnx
onnxruntime
onnxruntime-extensions
onnx_graphsurgeon
onnx-simplifier
onnxconverter-common
onnxmltools
onnx-tensorrt
onnx2json
json2onnx
tf2onnx
torch2trt
onnx-tf
tensorflow-datasets
tf_slim
edgetpu_compiler
tflite2tensorflow
openvino2tensorflow
gdown
pandas
matplotlib
paddlepaddle
pycocotools
scipy
Intel-Media-SDK
Intel iHD GPU (iGPU) support
OpenCL

6. Tutorial

6-1. tvmc

https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html#compiling-and-optimizing-a-model-with-tvmc

$ python -m tvm.driver.tvmc
usage: tvmc [-v] [--version] [-h] {run,tune,compile} ...

TVM compiler driver

optional arguments:
  -v, --verbose       increase verbosity
  --version           print the version and exit
  -h, --help          show this help message and exit.

commands:
  {run,tune,compile}
    run               run a compiled module
    tune              auto-tune a model
    compile           compile a model.

TVMC - TVM driver command-line interface

$ wget https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx

$ python -m tvm.driver.tvmc compile \
--target "llvm" \
--output resnet50-v2-7-tvm.tar \
resnet50-v2-7.onnx

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
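
When the compile step needs to be scripted, the same command line can be assembled in Python and passed to `subprocess`. This is a rough sketch under stated assumptions: `tvmc_command` is a hypothetical helper, it only handles simple long flags such as `--target` and `--output` shown above, and the actual call is left commented out because it requires TVM to be installed.

```python
import subprocess
import sys


def tvmc_command(subcommand, model_path, **options):
    """Build the argument list for `python -m tvm.driver.tvmc <subcommand>`.

    Keyword names become long flags, e.g. target="llvm" -> --target llvm.
    Flags that contain underscores are not handled by this sketch.
    """
    argv = [sys.executable, "-m", "tvm.driver.tvmc", subcommand]
    for name, value in options.items():
        argv += ["--" + name, str(value)]
    argv.append(model_path)
    return argv


compile_cmd = tvmc_command(
    "compile",
    "resnet50-v2-7.onnx",
    target="llvm",
    output="resnet50-v2-7-tvm.tar",
)
print(" ".join(compile_cmd))

# Uncomment inside the container, where TVM is installed, to run the compile:
# subprocess.run(compile_cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names.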

$ mkdir model
$ tar -xvf resnet50-v2-7-tvm.tar -C model
$ ls -l model

total 100496
-rw-r--r-- 1 user user     89142 Feb 23 15:41 mod.json
-rw-r--r-- 1 user user 102125470 Feb 23 15:41 mod.params
-rwxr-xr-x 1 user user    685072 Feb 23 15:41 mod.so

mod.so is the model, represented as a C++ library, that can be loaded by the TVM runtime.
mod.json is a text representation of the TVM Relay computation graph.
mod.params is a file containing the parameters for the pre-trained model.
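
The package can also be inspected without extracting it. The sketch below uses only the Python standard library; `missing_members` is a hypothetical helper, and the expected file names are taken from the `ls -l model` listing above.

```python
import os
import tarfile

# File names taken from the directory listing of the extracted package.
EXPECTED_MEMBERS = {"mod.so", "mod.json", "mod.params"}


def missing_members(package_path, expected=EXPECTED_MEMBERS):
    """Return the expected file names that are absent from the tar package."""
    with tarfile.open(package_path) as package:
        # Strip a leading "./" so "./mod.so" and "mod.so" compare equal.
        present = {
            name[2:] if name.startswith("./") else name
            for name in package.getnames()
        }
    return sorted(set(expected) - present)


package_path = "resnet50-v2-7-tvm.tar"
if os.path.exists(package_path):
    print("missing files:", missing_members(package_path))
```

An empty list means all three files are present in the archive.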

preprocess.py

from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# This ONNX model expects NCHW input, so transpose the HWC array to CHW
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize according to ImageNet
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i,:,:] = (img_data[i,:,:] / 255 - imagenet_mean[i]) / imagenet_stddev[i]

# Add batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)

# Save to .npz (outputs imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
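
The per-channel loop in preprocess.py can also be written with NumPy broadcasting. The following is a sketch of that alternative, not a replacement for the script: `normalize_chw` is a hypothetical name, and a random array stands in for the kitten image so the example runs without downloading anything.

```python
import numpy as np

# ImageNet statistics copied from preprocess.py.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
IMAGENET_STDDEV = np.array([0.229, 0.224, 0.225], dtype="float32")


def normalize_chw(img_chw):
    """Scale a CHW image to [0, 1], then normalize each channel."""
    img = img_chw.astype("float32") / 255.0
    # Reshape (3,) -> (3, 1, 1) so the statistics broadcast over H and W.
    mean = IMAGENET_MEAN.reshape(-1, 1, 1)
    stddev = IMAGENET_STDDEV.reshape(-1, 1, 1)
    return (img - mean) / stddev


# Stand-in for the transposed image: random pixel values in CHW layout.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(3, 224, 224)).astype("float32")

# Add the batch dimension, as preprocess.py does before saving.
batch = np.expand_dims(normalize_chw(fake_image), axis=0)
print(batch.shape, batch.dtype)
```

The result has the same NCHW shape, (1, 3, 224, 224), that preprocess.py saves to imagenet_cat.npz.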

kitten.jpg

$ python preprocess.py

$ python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm.tar

postprocess.py

import os.path
import numpy as np

from scipy.special import softmax
from tvm.contrib.download import download_testdata

# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

output_file = "predictions.npz"

# Open the output and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]

        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

$ python postprocess.py

class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
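
The ranking step in postprocess.py is a softmax followed by a sort. A minimal NumPy-only sketch of the same idea is shown here; `softmax` and `top_k` are illustrative helpers and the logits are made up, so the printed values do not correspond to the ResNet output above. Note that this softmax works over the last axis, which matches the script's result for a single (1, N) output tensor.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k(scores, k=5):
    """Return (index, probability) pairs for the k highest scores."""
    flat = np.squeeze(scores)
    order = np.argsort(flat)[::-1][:k]
    return [(int(i), float(flat[i])) for i in order]


# Made-up logits for a 4-class model, with a batch dimension of 1.
logits = np.array([[2.0, 1.0, 0.1, -1.0]])
for index, prob in top_k(softmax(logits), k=2):
    print("class index=%d with probability=%f" % (index, prob))
```

Subtracting the maximum before exponentiating avoids overflow for large logits and does not change the result.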

$ python -m tvm.driver.tvmc tune --help

parameters

usage: tvmc tune [-h]
--target TARGET
-o OUTPUT
[--early-stopping EARLY_STOPPING]
[--min-repeat-ms MIN_REPEAT_MS]
[--model-format {keras,onnx,pb,tflite,pytorch,paddle}]
[--number NUMBER]
[--parallel PARALLEL]
[--repeat REPEAT]
[--rpc-key RPC_KEY]
[--rpc-tracker RPC_TRACKER]
[--target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE]
[--target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS]
[--target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL]
[--target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG]
[--target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE]
[--target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS]
[--target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE]
[--target-ext_dev-libs TARGET_EXT_DEV_LIBS]
[--target-ext_dev-model TARGET_EXT_DEV_MODEL]
[--target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB]
[--target-ext_dev-tag TARGET_EXT_DEV_TAG]
[--target-ext_dev-device TARGET_EXT_DEV_DEVICE]
[--target-ext_dev-keys TARGET_EXT_DEV_KEYS]
[--target-llvm-fast-math TARGET_LLVM_FAST_MATH]
[--target-llvm-opt-level TARGET_LLVM_OPT_LEVEL]
[--target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API]
[--target-llvm-from_device TARGET_LLVM_FROM_DEVICE]
[--target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF]
[--target-llvm-mattr TARGET_LLVM_MATTR]
[--target-llvm-num-cores TARGET_LLVM_NUM_CORES]
[--target-llvm-libs TARGET_LLVM_LIBS]
[--target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ]
[--target-llvm-link-params TARGET_LLVM_LINK_PARAMS]
[--target-llvm-interface-api TARGET_LLVM_INTERFACE_API]
[--target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT]
[--target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB]
[--target-llvm-tag TARGET_LLVM_TAG]
[--target-llvm-mtriple TARGET_LLVM_MTRIPLE]
[--target-llvm-model TARGET_LLVM_MODEL]
[--target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI]
[--target-llvm-mcpu TARGET_LLVM_MCPU]
[--target-llvm-device TARGET_LLVM_DEVICE]
[--target-llvm-runtime TARGET_LLVM_RUNTIME]
[--target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP]
[--target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC]
[--target-llvm-mabi TARGET_LLVM_MABI]
[--target-llvm-keys TARGET_LLVM_KEYS]
[--target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN]
[--target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE]
[--target-hybrid-libs TARGET_HYBRID_LIBS]
[--target-hybrid-model TARGET_HYBRID_MODEL]
[--target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB]
[--target-hybrid-tag TARGET_HYBRID_TAG]
[--target-hybrid-device TARGET_HYBRID_DEVICE]
[--target-hybrid-keys TARGET_HYBRID_KEYS]
[--target-aocl-from_device TARGET_AOCL_FROM_DEVICE]
[--target-aocl-libs TARGET_AOCL_LIBS]
[--target-aocl-model TARGET_AOCL_MODEL]
[--target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB]
[--target-aocl-tag TARGET_AOCL_TAG]
[--target-aocl-device TARGET_AOCL_DEVICE]
[--target-aocl-keys TARGET_AOCL_KEYS]
[--target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS]
[--target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE]
[--target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE]
[--target-nvptx-libs TARGET_NVPTX_LIBS]
[--target-nvptx-model TARGET_NVPTX_MODEL]
[--target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB]
[--target-nvptx-mtriple TARGET_NVPTX_MTRIPLE]
[--target-nvptx-tag TARGET_NVPTX_TAG]
[--target-nvptx-mcpu TARGET_NVPTX_MCPU]
[--target-nvptx-device TARGET_NVPTX_DEVICE]
[--target-nvptx-keys TARGET_NVPTX_KEYS]
[--target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS]
[--target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE]
[--target-opencl-from_device TARGET_OPENCL_FROM_DEVICE]
[--target-opencl-libs TARGET_OPENCL_LIBS]
[--target-opencl-model TARGET_OPENCL_MODEL]
[--target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB]
[--target-opencl-tag TARGET_OPENCL_TAG]
[--target-opencl-device TARGET_OPENCL_DEVICE]
[--target-opencl-keys TARGET_OPENCL_KEYS]
[--target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS]
[--target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE]
[--target-metal-from_device TARGET_METAL_FROM_DEVICE]
[--target-metal-libs TARGET_METAL_LIBS]
[--target-metal-keys TARGET_METAL_KEYS]
[--target-metal-model TARGET_METAL_MODEL]
[--target-metal-system-lib TARGET_METAL_SYSTEM_LIB]
[--target-metal-tag TARGET_METAL_TAG]
[--target-metal-device TARGET_METAL_DEVICE]
[--target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS]
[--target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS]
[--target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE]
[--target-webgpu-libs TARGET_WEBGPU_LIBS]
[--target-webgpu-model TARGET_WEBGPU_MODEL]
[--target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB]
[--target-webgpu-tag TARGET_WEBGPU_TAG]
[--target-webgpu-device TARGET_WEBGPU_DEVICE]
[--target-webgpu-keys TARGET_WEBGPU_KEYS]
[--target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS]
[--target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE]
[--target-rocm-from_device TARGET_ROCM_FROM_DEVICE]
[--target-rocm-libs TARGET_ROCM_LIBS]
[--target-rocm-model TARGET_ROCM_MODEL]
[--target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB]
[--target-rocm-mtriple TARGET_ROCM_MTRIPLE]
[--target-rocm-tag TARGET_ROCM_TAG]
[--target-rocm-mcpu TARGET_ROCM_MCPU]
[--target-rocm-device TARGET_ROCM_DEVICE]
[--target-rocm-keys TARGET_ROCM_KEYS]
[--target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS]
[--target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE]
[--target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z]
[--target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER]
[--target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION]
[--target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE]
[--target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER]
[--target-vulkan-libs TARGET_VULKAN_LIBS]
[--target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS]
[--target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION]
[--target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE]
[--target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE]
[--target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR]
[--target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64]
[--target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32]
[--target-vulkan-model TARGET_VULKAN_MODEL]
[--target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X]
[--target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB]
[--target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y]
[--target-vulkan-tag TARGET_VULKAN_TAG]
[--target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8]
[--target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION]
[--target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION]
[--target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER]
[--target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE]
[--target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32]
[--target-vulkan-device TARGET_VULKAN_DEVICE]
[--target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME]
[--target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16]
[--target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS]
[--target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64]
[--target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE]
[--target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME]
[--target-vulkan-keys TARGET_VULKAN_KEYS]
[--target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK]
[--target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16]
[--target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS]
[--target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE]
[--target-cuda-from_device TARGET_CUDA_FROM_DEVICE]
[--target-cuda-arch TARGET_CUDA_ARCH]
[--target-cuda-libs TARGET_CUDA_LIBS]
[--target-cuda-shared_memory_per_block TARGET_CUDA_SHARED_MEMORY_PER_BLOCK]
[--target-cuda-model TARGET_CUDA_MODEL]
[--target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB]
[--target-cuda-tag TARGET_CUDA_TAG]
[--target-cuda-device TARGET_CUDA_DEVICE]
[--target-cuda-mcpu TARGET_CUDA_MCPU]
[--target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK]
[--target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK]
[--target-cuda-keys TARGET_CUDA_KEYS]
[--target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE]
[--target-sdaccel-libs TARGET_SDACCEL_LIBS]
[--target-sdaccel-model TARGET_SDACCEL_MODEL]
[--target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB]
[--target-sdaccel-tag TARGET_SDACCEL_TAG]
[--target-sdaccel-device TARGET_SDACCEL_DEVICE]
[--target-sdaccel-keys TARGET_SDACCEL_KEYS]
[--target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE]
[--target-composite-libs TARGET_COMPOSITE_LIBS]
[--target-composite-devices TARGET_COMPOSITE_DEVICES]
[--target-composite-model TARGET_COMPOSITE_MODEL]
[--target-composite-tag TARGET_COMPOSITE_TAG]
[--target-composite-device TARGET_COMPOSITE_DEVICE]
[--target-composite-keys TARGET_COMPOSITE_KEYS]
[--target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE]
[--target-stackvm-libs TARGET_STACKVM_LIBS]
[--target-stackvm-model TARGET_STACKVM_MODEL]
[--target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB]
[--target-stackvm-tag TARGET_STACKVM_TAG]
[--target-stackvm-device TARGET_STACKVM_DEVICE]
[--target-stackvm-keys TARGET_STACKVM_KEYS]
[--target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE]
[--target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS]
[--target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL]
[--target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB]
[--target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG]
[--target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE]
[--target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS]
[--target-c-unpacked-api TARGET_C_UNPACKED_API]
[--target-c-from_device TARGET_C_FROM_DEVICE]
[--target-c-libs TARGET_C_LIBS]
[--target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT]
[--target-c-executor TARGET_C_EXECUTOR]
[--target-c-link-params TARGET_C_LINK_PARAMS]
[--target-c-model TARGET_C_MODEL]
[--target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT]
[--target-c-system-lib TARGET_C_SYSTEM_LIB]
[--target-c-tag TARGET_C_TAG]
[--target-c-interface-api TARGET_C_INTERFACE_API]
[--target-c-mcpu TARGET_C_MCPU]
[--target-c-device TARGET_C_DEVICE]
[--target-c-runtime TARGET_C_RUNTIME]
[--target-c-keys TARGET_C_KEYS]
[--target-c-march TARGET_C_MARCH]
[--target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE]
[--target-hexagon-libs TARGET_HEXAGON_LIBS]
[--target-hexagon-mattr TARGET_HEXAGON_MATTR]
[--target-hexagon-model TARGET_HEXAGON_MODEL]
[--target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS]
[--target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE]
[--target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB]
[--target-hexagon-mcpu TARGET_HEXAGON_MCPU]
[--target-hexagon-device TARGET_HEXAGON_DEVICE]
[--target-hexagon-tag TARGET_HEXAGON_TAG]
[--target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS]
[--target-hexagon-keys TARGET_HEXAGON_KEYS]
[--target-host TARGET_HOST]
[--timeout TIMEOUT]
[--trials TRIALS]
[--tuning-records PATH]
[--desired-layout {NCHW,NHWC}]
[--enable-autoscheduler]
[--cache-line-bytes CACHE_LINE_BYTES]
[--num-cores NUM_CORES]
[--vector-unit-bytes VECTOR_UNIT_BYTES]
[--max-shared-memory-per-block MAX_SHARED_MEMORY_PER_BLOCK]
[--max-local-memory-per-block MAX_LOCAL_MEMORY_PER_BLOCK]
[--max-threads-per-block MAX_THREADS_PER_BLOCK]
[--max-vthread-extent MAX_VTHREAD_EXTENT]
[--warp-size WARP_SIZE]
[--include-simple-tasks]
[--log-estimated-latency]
[--tuner {ga,gridsearch,random,xgb,xgb_knob,xgb-rank}]
[--input-shapes INPUT_SHAPES]
FILE

positional arguments:
    FILE
        path to the input model file

optional arguments:
    -h, --help
        show this help message and exit
    --early-stopping EARLY_STOPPING
        minimum number of trials before early stopping
    --min-repeat-ms MIN_REPEAT_MS
        minimum time to run each trial, in milliseconds.
        Defaults to 0 on x86 and 1000 on all other targets
    --model-format {keras,onnx,pb,tflite,pytorch,paddle}
        specify input model format
    --number NUMBER
        number of runs a single repeat is made of.
        The final number of tuning executions is: (1 + number * repeat)
    -o OUTPUT, --output OUTPUT
        output file to store the tuning records for the tuning process
    --parallel PARALLEL
        the maximum number of parallel devices to use when tuning
    --repeat REPEAT
        how many times to repeat each measurement
    --rpc-key RPC_KEY
        the RPC tracker key of the target device.
        Required when --rpc-tracker is provided.
    --rpc-tracker RPC_TRACKER
        hostname (required) and port (optional, defaults to 9090) of the RPC tracker,
        e.g. '192.168.0.100:9999'
    --target TARGET
        compilation target as plain string, inline JSON or path to a JSON file
    --target-host TARGET_HOST
        the host compilation target, defaults to 'llvm'
    --timeout TIMEOUT
        compilation timeout, in seconds
    --trials TRIALS
        the maximum number of tuning trials to perform
    --tuning-records PATH
        path to an auto-tuning log file by AutoTVM.
    --desired-layout {NCHW,NHWC}
        change the data layout of the whole graph
    --enable-autoscheduler
        enable tuning the graph through the autoscheduler
    --input-shapes INPUT_SHAPES
        specify non-generic shapes for model to run,
        format is "input_name:[dim1,dim2,...,dimn] input_name2:[dim1,dim2]"

    target example_target_hook:
    --target-example_target_hook-from_device TARGET_EXAMPLE_TARGET_HOOK_FROM_DEVICE
        target example_target_hook from_device
    --target-example_target_hook-libs TARGET_EXAMPLE_TARGET_HOOK_LIBS
        target example_target_hook libs options
    --target-example_target_hook-model TARGET_EXAMPLE_TARGET_HOOK_MODEL
        target example_target_hook model string
    --target-example_target_hook-tag TARGET_EXAMPLE_TARGET_HOOK_TAG
        target example_target_hook tag string
    --target-example_target_hook-device TARGET_EXAMPLE_TARGET_HOOK_DEVICE
        target example_target_hook device string
    --target-example_target_hook-keys TARGET_EXAMPLE_TARGET_HOOK_KEYS
        target example_target_hook keys options

    target ext_dev:
    --target-ext_dev-from_device TARGET_EXT_DEV_FROM_DEVICE
        target ext_dev from_device
    --target-ext_dev-libs TARGET_EXT_DEV_LIBS
        target ext_dev libs options
    --target-ext_dev-model TARGET_EXT_DEV_MODEL
        target ext_dev model string
    --target-ext_dev-system-lib TARGET_EXT_DEV_SYSTEM_LIB
        target ext_dev system-lib
    --target-ext_dev-tag TARGET_EXT_DEV_TAG
        target ext_dev tag string
    --target-ext_dev-device TARGET_EXT_DEV_DEVICE
        target ext_dev device string
    --target-ext_dev-keys TARGET_EXT_DEV_KEYS
        target ext_dev keys options

    target llvm:
    --target-llvm-fast-math TARGET_LLVM_FAST_MATH
        target llvm fast-math
    --target-llvm-opt-level TARGET_LLVM_OPT_LEVEL
        target llvm opt-level
    --target-llvm-unpacked-api TARGET_LLVM_UNPACKED_API
        target llvm unpacked-api
    --target-llvm-from_device TARGET_LLVM_FROM_DEVICE
        target llvm from_device
    --target-llvm-fast-math-ninf TARGET_LLVM_FAST_MATH_NINF
        target llvm fast-math-ninf
    --target-llvm-mattr TARGET_LLVM_MATTR
        target llvm mattr options
    --target-llvm-num-cores TARGET_LLVM_NUM_CORES
        target llvm num-cores
    --target-llvm-libs TARGET_LLVM_LIBS
        target llvm libs options
    --target-llvm-fast-math-nsz TARGET_LLVM_FAST_MATH_NSZ
        target llvm fast-math-nsz
    --target-llvm-link-params TARGET_LLVM_LINK_PARAMS
        target llvm link-params
    --target-llvm-interface-api TARGET_LLVM_INTERFACE_API
        target llvm interface-api string
    --target-llvm-fast-math-contract TARGET_LLVM_FAST_MATH_CONTRACT
        target llvm fast-math-contract
    --target-llvm-system-lib TARGET_LLVM_SYSTEM_LIB
        target llvm system-lib
    --target-llvm-tag TARGET_LLVM_TAG
        target llvm tag string
    --target-llvm-mtriple TARGET_LLVM_MTRIPLE
        target llvm mtriple string
    --target-llvm-model TARGET_LLVM_MODEL
        target llvm model string
    --target-llvm-mfloat-abi TARGET_LLVM_MFLOAT_ABI
        target llvm mfloat-abi string
    --target-llvm-mcpu TARGET_LLVM_MCPU
        target llvm mcpu string
    --target-llvm-device TARGET_LLVM_DEVICE
        target llvm device string
    --target-llvm-runtime TARGET_LLVM_RUNTIME
        target llvm runtime string
    --target-llvm-fast-math-arcp TARGET_LLVM_FAST_MATH_ARCP
        target llvm fast-math-arcp
    --target-llvm-fast-math-reassoc TARGET_LLVM_FAST_MATH_REASSOC
        target llvm fast-math-reassoc
    --target-llvm-mabi TARGET_LLVM_MABI
        target llvm mabi string
    --target-llvm-keys TARGET_LLVM_KEYS
        target llvm keys options
    --target-llvm-fast-math-nnan TARGET_LLVM_FAST_MATH_NNAN
        target llvm fast-math-nnan

    target hybrid:
    --target-hybrid-from_device TARGET_HYBRID_FROM_DEVICE
        target hybrid from_device
    --target-hybrid-libs TARGET_HYBRID_LIBS
        target hybrid libs options
    --target-hybrid-model TARGET_HYBRID_MODEL
        target hybrid model string
    --target-hybrid-system-lib TARGET_HYBRID_SYSTEM_LIB
        target hybrid system-lib
    --target-hybrid-tag TARGET_HYBRID_TAG
        target hybrid tag string
    --target-hybrid-device TARGET_HYBRID_DEVICE
        target hybrid device string
    --target-hybrid-keys TARGET_HYBRID_KEYS
        target hybrid keys options

    target aocl:
    --target-aocl-from_device TARGET_AOCL_FROM_DEVICE
        target aocl from_device
    --target-aocl-libs TARGET_AOCL_LIBS
        target aocl libs options
    --target-aocl-model TARGET_AOCL_MODEL
        target aocl model string
    --target-aocl-system-lib TARGET_AOCL_SYSTEM_LIB
        target aocl system-lib
    --target-aocl-tag TARGET_AOCL_TAG
        target aocl tag string
    --target-aocl-device TARGET_AOCL_DEVICE
        target aocl device string
    --target-aocl-keys TARGET_AOCL_KEYS
        target aocl keys options

    target nvptx:
    --target-nvptx-max_num_threads TARGET_NVPTX_MAX_NUM_THREADS
        target nvptx max_num_threads
    --target-nvptx-thread_warp_size TARGET_NVPTX_THREAD_WARP_SIZE
        target nvptx thread_warp_size
    --target-nvptx-from_device TARGET_NVPTX_FROM_DEVICE
        target nvptx from_device
    --target-nvptx-libs TARGET_NVPTX_LIBS
        target nvptx libs options
    --target-nvptx-model TARGET_NVPTX_MODEL
        target nvptx model string
    --target-nvptx-system-lib TARGET_NVPTX_SYSTEM_LIB
        target nvptx system-lib
    --target-nvptx-mtriple TARGET_NVPTX_MTRIPLE
        target nvptx mtriple string
    --target-nvptx-tag TARGET_NVPTX_TAG
        target nvptx tag string
    --target-nvptx-mcpu TARGET_NVPTX_MCPU
        target nvptx mcpu string
    --target-nvptx-device TARGET_NVPTX_DEVICE
        target nvptx device string
    --target-nvptx-keys TARGET_NVPTX_KEYS
        target nvptx keys options

    target opencl:
    --target-opencl-max_num_threads TARGET_OPENCL_MAX_NUM_THREADS
        target opencl max_num_threads
    --target-opencl-thread_warp_size TARGET_OPENCL_THREAD_WARP_SIZE
        target opencl thread_warp_size
    --target-opencl-from_device TARGET_OPENCL_FROM_DEVICE
        target opencl from_device
    --target-opencl-libs TARGET_OPENCL_LIBS
        target opencl libs options
    --target-opencl-model TARGET_OPENCL_MODEL
        target opencl model string
    --target-opencl-system-lib TARGET_OPENCL_SYSTEM_LIB
        target opencl system-lib
    --target-opencl-tag TARGET_OPENCL_TAG
        target opencl tag string
    --target-opencl-device TARGET_OPENCL_DEVICE
        target opencl device string
    --target-opencl-keys TARGET_OPENCL_KEYS
        target opencl keys options

    target metal:
    --target-metal-max_num_threads TARGET_METAL_MAX_NUM_THREADS
        target metal max_num_threads
    --target-metal-thread_warp_size TARGET_METAL_THREAD_WARP_SIZE
        target metal thread_warp_size
    --target-metal-from_device TARGET_METAL_FROM_DEVICE
        target metal from_device
    --target-metal-libs TARGET_METAL_LIBS
        target metal libs options
    --target-metal-keys TARGET_METAL_KEYS
        target metal keys options
    --target-metal-model TARGET_METAL_MODEL
        target metal model string
    --target-metal-system-lib TARGET_METAL_SYSTEM_LIB
        target metal system-lib
    --target-metal-tag TARGET_METAL_TAG
        target metal tag string
    --target-metal-device TARGET_METAL_DEVICE
        target metal device string
    --target-metal-max_function_args TARGET_METAL_MAX_FUNCTION_ARGS
        target metal max_function_args

    target webgpu:
    --target-webgpu-max_num_threads TARGET_WEBGPU_MAX_NUM_THREADS
        target webgpu max_num_threads
    --target-webgpu-from_device TARGET_WEBGPU_FROM_DEVICE
        target webgpu from_device
    --target-webgpu-libs TARGET_WEBGPU_LIBS
        target webgpu libs options
    --target-webgpu-model TARGET_WEBGPU_MODEL
        target webgpu model string
    --target-webgpu-system-lib TARGET_WEBGPU_SYSTEM_LIB
        target webgpu system-lib
    --target-webgpu-tag TARGET_WEBGPU_TAG
        target webgpu tag string
    --target-webgpu-device TARGET_WEBGPU_DEVICE
        target webgpu device string
    --target-webgpu-keys TARGET_WEBGPU_KEYS
        target webgpu keys options

    target rocm:
    --target-rocm-max_num_threads TARGET_ROCM_MAX_NUM_THREADS
        target rocm max_num_threads
    --target-rocm-thread_warp_size TARGET_ROCM_THREAD_WARP_SIZE
        target rocm thread_warp_size
    --target-rocm-from_device TARGET_ROCM_FROM_DEVICE
        target rocm from_device
    --target-rocm-libs TARGET_ROCM_LIBS
        target rocm libs options
    --target-rocm-model TARGET_ROCM_MODEL
        target rocm model string
    --target-rocm-system-lib TARGET_ROCM_SYSTEM_LIB
        target rocm system-lib
    --target-rocm-mtriple TARGET_ROCM_MTRIPLE
        target rocm mtriple string
    --target-rocm-tag TARGET_ROCM_TAG
        target rocm tag string
    --target-rocm-mcpu TARGET_ROCM_MCPU
        target rocm mcpu string
    --target-rocm-device TARGET_ROCM_DEVICE
        target rocm device string
    --target-rocm-keys TARGET_ROCM_KEYS
        target rocm keys options

    target vulkan:
    --target-vulkan-max_num_threads TARGET_VULKAN_MAX_NUM_THREADS
        target vulkan max_num_threads
    --target-vulkan-thread_warp_size TARGET_VULKAN_THREAD_WARP_SIZE
        target vulkan thread_warp_size
    --target-vulkan-max_block_size_z TARGET_VULKAN_MAX_BLOCK_SIZE_Z
        target vulkan max_block_size_z
    --target-vulkan-max_per_stage_descriptor_storage_buffer TARGET_VULKAN_MAX_PER_STAGE_DESCRIPTOR_STORAGE_BUFFER
        target vulkan max_per_stage_descriptor_storage_buffer
    --target-vulkan-driver_version TARGET_VULKAN_DRIVER_VERSION
        target vulkan driver_version
    --target-vulkan-from_device TARGET_VULKAN_FROM_DEVICE
        target vulkan from_device
    --target-vulkan-supports_16bit_buffer TARGET_VULKAN_SUPPORTS_16BIT_BUFFER
        target vulkan supports_16bit_buffer
    --target-vulkan-libs TARGET_VULKAN_LIBS
        target vulkan libs options
    --target-vulkan-supported_subgroup_operations TARGET_VULKAN_SUPPORTED_SUBGROUP_OPERATIONS
        target vulkan supported_subgroup_operations
    --target-vulkan-supports_dedicated_allocation TARGET_VULKAN_SUPPORTS_DEDICATED_ALLOCATION
        target vulkan supports_dedicated_allocation
    --target-vulkan-max_storage_buffer_range TARGET_VULKAN_MAX_STORAGE_BUFFER_RANGE
        target vulkan max_storage_buffer_range
    --target-vulkan-max_push_constants_size TARGET_VULKAN_MAX_PUSH_CONSTANTS_SIZE
        target vulkan max_push_constants_size
    --target-vulkan-supports_push_descriptor TARGET_VULKAN_SUPPORTS_PUSH_DESCRIPTOR
        target vulkan supports_push_descriptor
    --target-vulkan-supports_int64 TARGET_VULKAN_SUPPORTS_INT64
        target vulkan supports_int64
    --target-vulkan-supports_float32 TARGET_VULKAN_SUPPORTS_FLOAT32
        target vulkan supports_float32
    --target-vulkan-model TARGET_VULKAN_MODEL
        target vulkan model string
    --target-vulkan-max_block_size_x TARGET_VULKAN_MAX_BLOCK_SIZE_X
        target vulkan max_block_size_x
    --target-vulkan-system-lib TARGET_VULKAN_SYSTEM_LIB
        target vulkan system-lib
    --target-vulkan-max_block_size_y TARGET_VULKAN_MAX_BLOCK_SIZE_Y
        target vulkan max_block_size_y
    --target-vulkan-tag TARGET_VULKAN_TAG
        target vulkan tag string
    --target-vulkan-supports_int8 TARGET_VULKAN_SUPPORTS_INT8
        target vulkan supports_int8
    --target-vulkan-max_spirv_version TARGET_VULKAN_MAX_SPIRV_VERSION
        target vulkan max_spirv_version
    --target-vulkan-vulkan_api_version TARGET_VULKAN_VULKAN_API_VERSION
        target vulkan vulkan_api_version
    --target-vulkan-supports_8bit_buffer TARGET_VULKAN_SUPPORTS_8BIT_BUFFER
        target vulkan supports_8bit_buffer
    --target-vulkan-device_type TARGET_VULKAN_DEVICE_TYPE
        target vulkan device_type string
    --target-vulkan-supports_int32 TARGET_VULKAN_SUPPORTS_INT32
        target vulkan supports_int32
    --target-vulkan-device TARGET_VULKAN_DEVICE
        target vulkan device string
    --target-vulkan-driver_name TARGET_VULKAN_DRIVER_NAME
        target vulkan driver_name string
    --target-vulkan-supports_float16 TARGET_VULKAN_SUPPORTS_FLOAT16
        target vulkan supports_float16
    --target-vulkan-supports_storage_buffer_storage_class TARGET_VULKAN_SUPPORTS_STORAGE_BUFFER_STORAGE_CLASS
        target vulkan supports_storage_buffer_storage_class
    --target-vulkan-supports_float64 TARGET_VULKAN_SUPPORTS_FLOAT64
        target vulkan supports_float64
    --target-vulkan-max_uniform_buffer_range TARGET_VULKAN_MAX_UNIFORM_BUFFER_RANGE
        target vulkan max_uniform_buffer_range
    --target-vulkan-device_name TARGET_VULKAN_DEVICE_NAME
        target vulkan device_name string
    --target-vulkan-keys TARGET_VULKAN_KEYS
        target vulkan keys options
    --target-vulkan-max_shared_memory_per_block TARGET_VULKAN_MAX_SHARED_MEMORY_PER_BLOCK
        target vulkan max_shared_memory_per_block
    --target-vulkan-supports_int16 TARGET_VULKAN_SUPPORTS_INT16
        target vulkan supports_int16

    target cuda:
    --target-cuda-max_num_threads TARGET_CUDA_MAX_NUM_THREADS
        target cuda max_num_threads
    --target-cuda-thread_warp_size TARGET_CUDA_THREAD_WARP_SIZE
        target cuda thread_warp_size
    --target-cuda-from_device TARGET_CUDA_FROM_DEVICE
        target cuda from_device
    --target-cuda-arch TARGET_CUDA_ARCH
        target cuda arch string
    --target-cuda-libs TARGET_CUDA_LIBS
        target cuda libs options
    --target-cuda-shared_memory_per_block TARGET_CUDA_SHARED_MEMORY_PER_BLOCK
        target cuda shared_memory_per_block
    --target-cuda-model TARGET_CUDA_MODEL
        target cuda model string
    --target-cuda-system-lib TARGET_CUDA_SYSTEM_LIB
        target cuda system-lib
    --target-cuda-tag TARGET_CUDA_TAG
        target cuda tag string
    --target-cuda-device TARGET_CUDA_DEVICE
        target cuda device string
    --target-cuda-mcpu TARGET_CUDA_MCPU
        target cuda mcpu string
    --target-cuda-max_threads_per_block TARGET_CUDA_MAX_THREADS_PER_BLOCK
        target cuda max_threads_per_block
    --target-cuda-registers_per_block TARGET_CUDA_REGISTERS_PER_BLOCK
        target cuda registers_per_block
    --target-cuda-keys TARGET_CUDA_KEYS
        target cuda keys options

    target sdaccel:
    --target-sdaccel-from_device TARGET_SDACCEL_FROM_DEVICE
        target sdaccel from_device
    --target-sdaccel-libs TARGET_SDACCEL_LIBS
        target sdaccel libs options
    --target-sdaccel-model TARGET_SDACCEL_MODEL
        target sdaccel model string
    --target-sdaccel-system-lib TARGET_SDACCEL_SYSTEM_LIB
        target sdaccel system-lib
    --target-sdaccel-tag TARGET_SDACCEL_TAG
        target sdaccel tag string
    --target-sdaccel-device TARGET_SDACCEL_DEVICE
        target sdaccel device string
    --target-sdaccel-keys TARGET_SDACCEL_KEYS
        target sdaccel keys options

    target composite:
    --target-composite-from_device TARGET_COMPOSITE_FROM_DEVICE
        target composite from_device
    --target-composite-libs TARGET_COMPOSITE_LIBS
        target composite libs options
    --target-composite-devices TARGET_COMPOSITE_DEVICES
        target composite devices options
    --target-composite-model TARGET_COMPOSITE_MODEL
        target composite model string
    --target-composite-tag TARGET_COMPOSITE_TAG
        target composite tag string
    --target-composite-device TARGET_COMPOSITE_DEVICE
        target composite device string
    --target-composite-keys TARGET_COMPOSITE_KEYS
        target composite keys options

    target stackvm:
    --target-stackvm-from_device TARGET_STACKVM_FROM_DEVICE
        target stackvm from_device
    --target-stackvm-libs TARGET_STACKVM_LIBS
        target stackvm libs options
    --target-stackvm-model TARGET_STACKVM_MODEL
        target stackvm model string
    --target-stackvm-system-lib TARGET_STACKVM_SYSTEM_LIB
        target stackvm system-lib
    --target-stackvm-tag TARGET_STACKVM_TAG
        target stackvm tag string
    --target-stackvm-device TARGET_STACKVM_DEVICE
        target stackvm device string
    --target-stackvm-keys TARGET_STACKVM_KEYS
        target stackvm keys options

    target aocl_sw_emu:
    --target-aocl_sw_emu-from_device TARGET_AOCL_SW_EMU_FROM_DEVICE
        target aocl_sw_emu from_device
    --target-aocl_sw_emu-libs TARGET_AOCL_SW_EMU_LIBS
        target aocl_sw_emu libs options
    --target-aocl_sw_emu-model TARGET_AOCL_SW_EMU_MODEL
        target aocl_sw_emu model string
    --target-aocl_sw_emu-system-lib TARGET_AOCL_SW_EMU_SYSTEM_LIB
        target aocl_sw_emu system-lib
    --target-aocl_sw_emu-tag TARGET_AOCL_SW_EMU_TAG
        target aocl_sw_emu tag string
    --target-aocl_sw_emu-device TARGET_AOCL_SW_EMU_DEVICE
        target aocl_sw_emu device string
    --target-aocl_sw_emu-keys TARGET_AOCL_SW_EMU_KEYS
        target aocl_sw_emu keys options

    target c:
    --target-c-unpacked-api TARGET_C_UNPACKED_API
        target c unpacked-api
    --target-c-from_device TARGET_C_FROM_DEVICE
        target c from_device
    --target-c-libs TARGET_C_LIBS
        target c libs options
    --target-c-constants-byte-alignment TARGET_C_CONSTANTS_BYTE_ALIGNMENT
        target c constants-byte-alignment
    --target-c-executor TARGET_C_EXECUTOR
        target c executor string
    --target-c-link-params TARGET_C_LINK_PARAMS
        target c link-params
    --target-c-model TARGET_C_MODEL
        target c model string
    --target-c-workspace-byte-alignment TARGET_C_WORKSPACE_BYTE_ALIGNMENT
        target c workspace-byte-alignment
    --target-c-system-lib TARGET_C_SYSTEM_LIB
        target c system-lib
    --target-c-tag TARGET_C_TAG
        target c tag string
    --target-c-interface-api TARGET_C_INTERFACE_API
        target c interface-api string
    --target-c-mcpu TARGET_C_MCPU
        target c mcpu string
    --target-c-device TARGET_C_DEVICE
        target c device string
    --target-c-runtime TARGET_C_RUNTIME
        target c runtime string
    --target-c-keys TARGET_C_KEYS
        target c keys options
    --target-c-march TARGET_C_MARCH
        target c march string

    target hexagon:
    --target-hexagon-from_device TARGET_HEXAGON_FROM_DEVICE
        target hexagon from_device
    --target-hexagon-libs TARGET_HEXAGON_LIBS
        target hexagon libs options
    --target-hexagon-mattr TARGET_HEXAGON_MATTR
        target hexagon mattr options
    --target-hexagon-model TARGET_HEXAGON_MODEL
        target hexagon model string
    --target-hexagon-llvm-options TARGET_HEXAGON_LLVM_OPTIONS
        target hexagon llvm-options options
    --target-hexagon-mtriple TARGET_HEXAGON_MTRIPLE
        target hexagon mtriple string
    --target-hexagon-system-lib TARGET_HEXAGON_SYSTEM_LIB
        target hexagon system-lib
    --target-hexagon-mcpu TARGET_HEXAGON_MCPU
        target hexagon mcpu string
    --target-hexagon-device TARGET_HEXAGON_DEVICE
        target hexagon device string
    --target-hexagon-tag TARGET_HEXAGON_TAG
        target hexagon tag string
    --target-hexagon-link-params TARGET_HEXAGON_LINK_PARAMS
        target hexagon link-params
    --target-hexagon-keys TARGET_HEXAGON_KEYS
        target hexagon keys options

    Autoscheduler options:
    Autoscheduler options, used when --enable-autoscheduler is provided

    --cache-line-bytes CACHE_LINE_BYTES
        the size of cache line in bytes.
        If not specified, it will be autoset for the current machine.
    --num-cores NUM_CORES
        the number of device cores.
        If not specified, it will be autoset for the current machine.
    --vector-unit-bytes VECTOR_UNIT_BYTES
        the width of vector units in bytes.
        If not specified, it will be autoset for the current machine.
    --max-shared-memory-per-block MAX_SHARED_MEMORY_PER_BLOCK
        the max shared memory per block in bytes.
        If not specified, it will be autoset for the current machine.
    --max-local-memory-per-block MAX_LOCAL_MEMORY_PER_BLOCK
        the max local memory per block in bytes.
        If not specified, it will be autoset for the current machine.
    --max-threads-per-block MAX_THREADS_PER_BLOCK
        the max number of threads per block.
        If not specified, it will be autoset for the current machine.
    --max-vthread-extent MAX_VTHREAD_EXTENT
        the max vthread extent.
        If not specified, it will be autoset for the current machine.
    --warp-size WARP_SIZE
        the thread numbers of a warp.
        If not specified, it will be autoset for the current machine.
    --include-simple-tasks
        whether to extract simple tasks that do not include complicated ops
    --log-estimated-latency
        whether to log the estimated latency to the file after tuning a task

    autotvm options:
    autotvm options, used when the autoscheduler is not enabled

    --tuner {ga,gridsearch,random,xgb,xgb_knob,xgb-rank}
        type of tuner to use when tuning with autotvm.
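
The two option groups above are alternatives: the autoscheduler flags apply only with `--enable-autoscheduler`, and `--tuner` applies only without it. A minimal sketch of that rule, assuming a hypothetical helper `build_tune_args` (not part of tvmc) that only assembles the argument list and never runs TVM. Rejecting `--tuner` together with the autoscheduler is a choice made in this sketch, not documented tvmc behaviour.

```python
# Tuner names copied from the --tuner choices in the help text above.
AUTOTVM_TUNERS = ("ga", "gridsearch", "random", "xgb", "xgb_knob", "xgb-rank")


def build_tune_args(model, target, output, tuner=None, autoscheduler=False):
    """Hypothetical helper: assemble the arguments for `tvmc tune`.

    It only builds the list; nothing is executed.
    """
    args = ["python", "-m", "tvm.driver.tvmc", "tune",
            "--target", target, "--output", output]
    if autoscheduler:
        if tuner is not None:
            # Sketch-level policy: treat the two option groups as exclusive.
            raise ValueError("--tuner is an autotvm option; omit it with the autoscheduler")
        args.append("--enable-autoscheduler")
    elif tuner is not None:
        if tuner not in AUTOTVM_TUNERS:
            raise ValueError(f"unknown tuner {tuner!r}; choose one of {AUTOTVM_TUNERS}")
        args += ["--tuner", tuner]
    args.append(model)  # the model path is the positional argument
    return args


print(build_tune_args("resnet50-v2-7.onnx", "llvm", "records.json", tuner="xgb"))
```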

6-2. -march=x86 -mcpu=xxx

$ llc-14 -march=x86 -mattr=help

CPU list

Available CPUs for this target:

  alderlake      - Select the alderlake processor.
  amdfam10       - Select the amdfam10 processor.
  athlon         - Select the athlon processor.
  athlon-4       - Select the athlon-4 processor.
  athlon-fx      - Select the athlon-fx processor.
  athlon-mp      - Select the athlon-mp processor.
  athlon-tbird   - Select the athlon-tbird processor.
  athlon-xp      - Select the athlon-xp processor.
  athlon64       - Select the athlon64 processor.
  athlon64-sse3  - Select the athlon64-sse3 processor.
  atom           - Select the atom processor.
  barcelona      - Select the barcelona processor.
  bdver1         - Select the bdver1 processor.
  bdver2         - Select the bdver2 processor.
  bdver3         - Select the bdver3 processor.
  bdver4         - Select the bdver4 processor.
  bonnell        - Select the bonnell processor.
  broadwell      - Select the broadwell processor.
  btver1         - Select the btver1 processor.
  btver2         - Select the btver2 processor.
  c3             - Select the c3 processor.
  c3-2           - Select the c3-2 processor.
  cannonlake     - Select the cannonlake processor.
  cascadelake    - Select the cascadelake processor.
  cooperlake     - Select the cooperlake processor.
  core-avx-i     - Select the core-avx-i processor.
  core-avx2      - Select the core-avx2 processor.
  core2          - Select the core2 processor.
  corei7         - Select the corei7 processor.
  corei7-avx     - Select the corei7-avx processor.
  generic        - Select the generic processor.
  geode          - Select the geode processor.
  goldmont       - Select the goldmont processor.
  goldmont-plus  - Select the goldmont-plus processor.
  haswell        - Select the haswell processor.
  i386           - Select the i386 processor.
  i486           - Select the i486 processor.
  i586           - Select the i586 processor.
  i686           - Select the i686 processor.
  icelake-client - Select the icelake-client processor.
  icelake-server - Select the icelake-server processor.
  ivybridge      - Select the ivybridge processor.
  k6             - Select the k6 processor.
  k6-2           - Select the k6-2 processor.
  k6-3           - Select the k6-3 processor.
  k8             - Select the k8 processor.
  k8-sse3        - Select the k8-sse3 processor.
  knl            - Select the knl processor.
  knm            - Select the knm processor.
  lakemont       - Select the lakemont processor.
  nehalem        - Select the nehalem processor.
  nocona         - Select the nocona processor.
  opteron        - Select the opteron processor.
  opteron-sse3   - Select the opteron-sse3 processor.
  penryn         - Select the penryn processor.
  pentium        - Select the pentium processor.
  pentium-m      - Select the pentium-m processor.
  pentium-mmx    - Select the pentium-mmx processor.
  pentium2       - Select the pentium2 processor.
  pentium3       - Select the pentium3 processor.
  pentium3m      - Select the pentium3m processor.
  pentium4       - Select the pentium4 processor.
  pentium4m      - Select the pentium4m processor.
  pentiumpro     - Select the pentiumpro processor.
  prescott       - Select the prescott processor.
  rocketlake     - Select the rocketlake processor.
  sandybridge    - Select the sandybridge processor.
  sapphirerapids - Select the sapphirerapids processor.
  silvermont     - Select the silvermont processor.
  skx            - Select the skx processor.
  skylake        - Select the skylake processor.
  skylake-avx512 - Select the skylake-avx512 processor.
  slm            - Select the slm processor.
  tigerlake      - Select the tigerlake processor.
  tremont        - Select the tremont processor.
  westmere       - Select the westmere processor.
  winchip-c6     - Select the winchip-c6 processor.
  winchip2       - Select the winchip2 processor.
  x86-64         - Select the x86-64 processor.
  x86-64-v2      - Select the x86-64-v2 processor.
  x86-64-v3      - Select the x86-64-v3 processor.
  x86-64-v4      - Select the x86-64-v4 processor.
  yonah          - Select the yonah processor.
  znver1         - Select the znver1 processor.
  znver2         - Select the znver2 processor.
  znver3         - Select the znver3 processor.
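
The listing above can be turned into a set of valid `-mcpu` values before a target string is built. A minimal sketch, assuming the listing has been captured as text; `parse_cpu_names` and `llvm_target` are hypothetical helpers, not part of LLVM or TVM.

```python
import re

# One entry per line: indentation, the CPU name, padding,
# then "- Select the <name> processor."
CPU_LINE = re.compile(r"^\s+(\S+)\s+- Select the \S+ processor\.$")


def parse_cpu_names(listing):
    """Return the CPU names found in an `llc -mattr=help` style listing."""
    names = []
    for line in listing.splitlines():
        match = CPU_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def llvm_target(cpu, known_cpus):
    """Build a tvmc target string, refusing names that are not in the listing."""
    if cpu not in known_cpus:
        raise ValueError(f"{cpu!r} is not in the CPU list")
    return f"llvm -mcpu={cpu}"


# A three-line excerpt of the listing above.
sample = """Available CPUs for this target:

  haswell        - Select the haswell processor.
  skylake-avx512 - Select the skylake-avx512 processor.
  x86-64-v3      - Select the x86-64-v3 processor.
"""

cpus = parse_cpu_names(sample)
print(cpus)
print(llvm_target("x86-64-v3", cpus))
```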

6-3. -march=aarch64 -mcpu=xxx

$ llc-14 -march=aarch64 -mattr=help

CPU list

Available CPUs for this target:

  a64fx           - Select the a64fx processor.
  apple-a10       - Select the apple-a10 processor.
  apple-a11       - Select the apple-a11 processor.
  apple-a12       - Select the apple-a12 processor.
  apple-a13       - Select the apple-a13 processor.
  apple-a14       - Select the apple-a14 processor.
  apple-a7        - Select the apple-a7 processor.
  apple-a8        - Select the apple-a8 processor.
  apple-a9        - Select the apple-a9 processor.
  apple-latest    - Select the apple-latest processor.
  apple-m1        - Select the apple-m1 processor.
  apple-s4        - Select the apple-s4 processor.
  apple-s5        - Select the apple-s5 processor.
  carmel          - Select the carmel processor.
  cortex-a34      - Select the cortex-a34 processor.
  cortex-a35      - Select the cortex-a35 processor.
  cortex-a510     - Select the cortex-a510 processor.
  cortex-a53      - Select the cortex-a53 processor.
  cortex-a55      - Select the cortex-a55 processor.
  cortex-a57      - Select the cortex-a57 processor.
  cortex-a65      - Select the cortex-a65 processor.
  cortex-a65ae    - Select the cortex-a65ae processor.
  cortex-a710     - Select the cortex-a710 processor.
  cortex-a72      - Select the cortex-a72 processor.
  cortex-a73      - Select the cortex-a73 processor.
  cortex-a75      - Select the cortex-a75 processor.
  cortex-a76      - Select the cortex-a76 processor.
  cortex-a76ae    - Select the cortex-a76ae processor.
  cortex-a77      - Select the cortex-a77 processor.
  cortex-a78      - Select the cortex-a78 processor.
  cortex-a78c     - Select the cortex-a78c processor.
  cortex-r82      - Select the cortex-r82 processor.
  cortex-x1       - Select the cortex-x1 processor.
  cortex-x1c      - Select the cortex-x1c processor.
  cortex-x2       - Select the cortex-x2 processor.
  cyclone         - Select the cyclone processor.
  exynos-m3       - Select the exynos-m3 processor.
  exynos-m4       - Select the exynos-m4 processor.
  exynos-m5       - Select the exynos-m5 processor.
  falkor          - Select the falkor processor.
  generic         - Select the generic processor.
  kryo            - Select the kryo processor.
  neoverse-512tvb - Select the neoverse-512tvb processor.
  neoverse-e1     - Select the neoverse-e1 processor.
  neoverse-n1     - Select the neoverse-n1 processor.
  neoverse-n2     - Select the neoverse-n2 processor.
  neoverse-v1     - Select the neoverse-v1 processor.
  saphira         - Select the saphira processor.
  thunderx        - Select the thunderx processor.
  thunderx2t99    - Select the thunderx2t99 processor.
  thunderx3t110   - Select the thunderx3t110 processor.
  thunderxt81     - Select the thunderxt81 processor.
  thunderxt83     - Select the thunderxt83 processor.
  thunderxt88     - Select the thunderxt88 processor.
  tsv110          - Select the tsv110 processor.

Tuning example: pass one of the CPU names listed above through -mcpu (here x86-64-v3 from 6-2).

$ sudo pip3 install xgboost
$ python -m tvm.driver.tvmc tune \
--target "llvm -mcpu=x86-64-v3" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx

$ python -m tvm.driver.tvmc compile \
--target "llvm" \
--tuning-records resnet50-v2-7-autotuner_records.json  \
--output resnet50-v2-7-tvm_autotuned.tar \
resnet50-v2-7.onnx

$ python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm_autotuned.tar
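
The three commands above hand files to one another: `tune` writes the records file, `compile` reads it and writes the package, and `run` loads the package. A minimal sketch of that chain as argument lists, assuming a hypothetical helper `pipeline_commands`; it derives the file names from the model name the same way the commands above do and executes nothing.

```python
import shlex

TVMC = ["python", "-m", "tvm.driver.tvmc"]


def pipeline_commands(model, tune_target, compile_target, inputs):
    """Hypothetical helper: the tune, compile and run calls as argument lists."""
    stem = model.rsplit(".", 1)[0]
    records = f"{stem}-autotuner_records.json"  # written by tune, read by compile
    package = f"{stem}-tvm_autotuned.tar"       # written by compile, read by run
    tune = TVMC + ["tune", "--target", tune_target, "--output", records, model]
    compile_ = TVMC + ["compile", "--target", compile_target,
                       "--tuning-records", records, "--output", package, model]
    run = TVMC + ["run", "--inputs", inputs, "--output", "predictions.npz", package]
    return [tune, compile_, run]


commands = pipeline_commands("resnet50-v2-7.onnx", "llvm -mcpu=x86-64-v3",
                             "llvm", "imagenet_cat.npz")
for argv in commands:
    print(shlex.join(argv))  # shell-quoted, ready to paste
```

Each list can be passed to `subprocess.run(argv, check=True)` inside the container if the steps should be scripted.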

$ python postprocess.py

class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
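
The output format above is consistent with a softmax over the raw model scores followed by a top-5 ranking. `postprocess.py` itself is not reproduced here, so the following is only a sketch of that idea under this assumption, written with the standard library and made-up scores and labels.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores and labels, for illustration only.
labels = ["tabby", "tiger cat", "radiator"]
scores = [2.0, 1.5, -3.0]
for label, prob in top_k(scores, labels, k=2):
    print(f"class='{label}' with probability={prob:f}")
```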

Inference performance of tuned models

$ python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz  \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm_autotuned.tar

Execution time summary:
mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
29.6162      29.6069      33.3455      28.5231       0.6250

Inference performance of untuned models

$ python -m tvm.driver.tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz  \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm.tar

Execution time summary:
mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
36.8816      36.5966      43.1287      35.5101       1.1949
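
Comparing the two summaries, the tuned model is roughly 1.25x faster on mean latency, a reduction of about 20%. The arithmetic, using only the mean values reported above:

```python
tuned_mean_ms = 29.6162    # mean from the tuned run above
untuned_mean_ms = 36.8816  # mean from the untuned run above

# Speedup: how many times faster the tuned model is.
speedup = untuned_mean_ms / tuned_mean_ms
# Reduction: fraction of the untuned latency that tuning removed.
reduction = (untuned_mean_ms - tuned_mean_ms) / untuned_mean_ms

print(f"speedup: {speedup:.2f}x, latency reduction: {reduction:.1%}")
```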
