MicroTVM is an effort to run TVM on bare-metal microcontrollers. You can read more about the current design in the original MicroTVM RFC. This repo shows you how to run CIFAR10-CNN on the host machine and on an STM Nucleo-F746ZG development board.
- Linux machine (OS X is also unofficially supported, and will be officially supported in the future)
- STM Nucleo-F746ZG development board
- Autotuning can be sped up by adding more of these development boards.
- micro USB cable
- A recent version of Linux (TVM tests against Ubuntu 18.04).
- Python 3.6+
- Clone this repository (use
git clone --recursive
to clone submodules). - Install TVM.
- NOTE: Ensure you enable MicroTVM by setting
set(USE_MICRO ON)
inbuild/config.cmake
. - NOTE: Ensure you have
0884659eb8c5fe51cc4cac9f2f8b6400f47fdee6
plus PR 5648.
-
(maybe optional, required to repro performance) Install GNU ARM Embedded Toolchain 9.2.1.
-
Build OpenOCD. Here we have chosen a specific commit that works with the Nucleo board, but
$ tools/patch-openocd.sh # If using clang, fix a compiler error.
$ cd 3rdparty/openocd
# First, install dependencies listed in the README for your platform.
$ ./bootstrap
$ ./configure --prefix=$(pwd)/prefix
$ make && make install
-
Install prerequisites:
$ apt-get install gcc-arm-none-eabi
-
Configure hardware and external binaries.
Copy
env-config.json.template
toenv-config.json
and modify the values called out in that file. Be sure you remove the# comments
, which aren't valid JSON. -
Setup virtualenv. Use
requirements.txt
andconstraints.txt
.$ python3 -mvenv _venv $ . _venv/bin/activate $ pip install -r requirements.txt -c constraints.txt $ export PYTHONPATH=$(pwd)/python:$PYTHONPATH
-
Install mBED CLI.
You need to do this once per dev board.
$ mbed import https://github.com/areusch/utvm-mbed-runtime utvm-mbed-runtime
$ cd utvm-mbed-runtime
$ mbed compile -m NUCLEO_F746ZG -t GCC_ARM --flash
Image: ./BUILD/NUCLEO_F746ZG/GCC_ARM/utvm-mbed-runtime.bin
--- Terminal on /dev/ttyACM0 - 9600,8,N,1 ---
Once you see --- Terminal ...
on the console, you can press Ctrl+C
. You know that flashing
has completed. If multiple boards are connected, you may not see the Terminal.
Ensure that this command successfully flashes the device.
You can use python -m micro_eval.bin.eval
to evaluate models on the host or device. The simplest
invocation is as follows:
$ python -m micro_eval.bin.eval cifar10_cnn:interp
Try it now. You should see the results of running 10 iterations of model evaluation through the TVM host interpreter:
INFO eval.py:108 got result: {'label': array([-33, 1, -38, -36, -43, -66, -38, -58, -44, -6], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ -70, -46, -4, -89, -37, -63, -110, -108, 17, -6], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ 9, 7, 4, -41, -46, -51, -49, -19, -45, -26], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ 1, -6, 8, -21, -25, -5, -20, -3, -4, 1], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ -2, 5, 5, -5, -21, -11, -10, -5, -9, -10], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ 1, -4, 3, -14, -14, 1, -19, -9, 0, -13], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ 14, -23, 30, -28, -33, -1, -46, -36, -33, -45], dtype=int8)}
INFO eval.py:108 got result: {'label': array([-10, -19, -4, 5, -5, -5, -19, -29, -18, -21], dtype=int8)}
INFO eval.py:108 got result: {'label': array([-49, -28, 28, -2, -18, -58, -48, -73, -32, 5], dtype=int8)}
INFO eval.py:108 got result: {'label': array([ 15, 10, 0, -17, -42, -25, -23, -39, -44, 0], dtype=int8)}
You can use the TVM interpreter as a source of correctness for checking other invocations of the same model. Try checking against the x86 runtime:
$ python -m micro_eval.bin.eval cifar10_cnn:cpu --validate-against=cifar10_cnn:interp
INFO eval.py:370 model_name setting config 1 2 3 4 5 6 7 8 9
ERROR eval.py:376 cifar10_cnn cpu +005 -016 -027 -011 +059 -033 -032 -033 -015
ERROR eval.py:376 cifar10_cnn interp -030 +023 -027 -010 -003 +010 +013 -023 +023
...
This didn't work, because the cifar10_cnn model uses random parameters by default. Model configuration can be passed as JSON. Now try using the configuration for validating output, which loads specific parameters:
$ python -m micro_eval.bin.eval cifar10_cnn:cpu:data/ --validate-against=cifar10_cnn:interp
INFO eval.py:370 model_name setting config 1 2 3 4 5 6 7 8 9
INFO eval.py:376 cifar10_cnn interp data/cifar10-config-validate.json -021 +017 +022 +018 +004 +007 -013 -008 -013
INFO eval.py:376 cifar10_cnn cpu data/cifar10-config-validate.json -021 +017 +022 +018 +004 +007 -013 -008 -013
Now try running on the attached STM-Nucleo board:
$ python -m micro_eval.bin.eval cifar10_cnn:micro_dev
INFO eval.py:184 got prediction after 293.912 ms: {'label': array([ 1, -16, -2, -13, -8, 17, -10, -15, -17, -14], dtype=int8)}
...
See if you can verify the output against the TVM interpreter runtime.
The previous result didn't run very quickly--294 ms vs 100 ms in the equivalent CMSIS model. TVM can improve the runtime by automatically reorganizing and optimizing the computation, a process known as autotuning. First, let's see the result:
$ python -m micro_eval.bin.eval cifar10_cnn:micro_dev --use-tuned-schedule
INFO eval.py:184 got prediction after 157.354 ms: {'label': array([ -32, -58, -50, -48, 103, 36, -110, 54, 66, 94], dtype=int8)}
This gave us about a 47% improvement in runtime.
You can run and time CMSIS-NN's version of this CIFAR10-CNN network:
$ python -m micro_eval.bin.eval cmsis_cifar10_cnn:micro_dev
INFO eval.py:189 got prediction after 105.403 ms: {'label': array([-27, 11, -14, -16, -35, -35, -35, -11, 27, 127], dtype=int8)}
...
There are a couple of points to note about this runtime:
- It uses ARM's quantization scheme, which isn't TFLite-compatible.
- The output from this model will differ from the TVM-generated model because they aren't quite identical. See our blog post for more on this.
You can also run and time a CMSIS-NN model using TFLite-compatible quantization. While this model doesn't output accurate results, it runs the same operations and differs only in weights, so the runtime should roughly match what you would expect running CMSIS-CNN as a TFLite model:
$ python -m micro_eval.bin.eval cmsis_cifar10_cnn:micro_dev:data/cmsis-config-tflite.json
INFO eval.py:189 got prediction after 135.856 ms: {'label': array([ -44, -128, -128, 127, -128, -128, -128, -128, 113, 127], dtype=int8)}
...
NOTE: Autotuning takes a while on hardware--the above result was achieved by running autotuning for hundreds of iterations. However, you can still try it and see some speedup without waiting as long. Autotuning is a parallel process and can be sped up by adding more development boards.
$ python -m micro_eval.bin.autotune cifar10_cnn:micro_dev tune --num-iterations 20
When the script exits successfully, you'll notice that
autotune-logs/eval/cifar10_cnn-micro_dev-HWOI-HWOI-HWOI.log
has been changed. This is a symlink
that points to the default tuned schedule when --use-tuned-schedule
is passed. A new file now
also exists in its directory--this is the autotuning log. It contains the fastest configurations
found during autotuning for each tuned task in the graph.
There are a lot of moving pieces here and it's easy for the system to fail. Here I've tried to document some of the problems you can run into, and how to solve them.
-
Make sure the board is plugged in with a data cable :)
-
Double-check
hla_serial
inenv-config.json
(an empty string matches all boards). -
Try running openocd separately:
-
First, generate the transport config:
$ python -m micro_eval.bin.autotune cifar10_cnn:micro_dev rpc_dev_config
-
Now, try launching openocd:
$ 3rdparty/openocd/prefix/bin/openocd -f microrpc-dev-config/dev-0/openocd.cfg
Troubleshoot this until it connects to the device succesfully.
-
-
Double check a tracker/rpc server is not still running:
$ ps ax | grep tvm.exec.rpc_tracker $ ps ax | grep tvm.exec.rpc_server
Kill them if so.
Sometimes openocd, the tracker, or the RPC server need to be launched separately. A script
microrpc-dev-config/launch-openocds.sh
shows you how to do this, but you can also launch
them yourself. Here is how to use that script with autotuing and eval:
# adjust task-index or omit for evaluating the whole model
$ python -m micro_eval.bin.autotune cifar10_cnn:micro_dev rpc_dev_config --task-index=2
# In another terminal:
$ cd microrpc-dev-config
$ ./launch-openocds.sh
# In original terminal
$ python -m micro_eval.bin.autotune --pre-launched-tracker-hostport 127.0.0.1:9190 \
--single-task-index=2...
$ python -m micro_eval.bin.eval --openocd-server-hostport 127.0.0.1:6666 ...
While rare with this demo, there's currently no protection included if the STM CPU enters an exception handler. In this case, it might not be possible for the debugger to reset the pending exception, and execution might start to produce obviously-wrong or unpredictable results. You're more likely to encounter this if you experiment with the model. Hard-reset the board or power it off in this case. We're working to address this issue in a future PR.