While @Mohit0928 and I were playing around with the efficientdet_lite4_detection Rune, we noticed inference would take significantly longer than expected (the model says it can run in 0.7s while we were seeing run times of 30s+).
After some investigation, we found that almost all of the run time is spent inside TensorFlow Lite's CPU backend.
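For context on where the 0.7s figure comes from: it is the model's advertised per-inference latency, which one might sanity-check directly against the TensorFlow Lite Python interpreter. A minimal timing sketch follows; the model filename `efficientdet_lite4.tflite` and the use of the `tflite_runtime` package are assumptions for illustration, not part of this repository.

```python
import time


def time_invocations(invoke, runs=10):
    """Call `invoke` `runs` times and return the per-run durations in ms."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def mean_ms(durations):
    """Arithmetic mean of a list of millisecond durations."""
    return sum(durations) / len(durations)


if __name__ == "__main__":
    # Assumed setup: tflite_runtime installed and a local model file present.
    from tflite_runtime.interpreter import Interpreter

    interp = Interpreter(model_path="efficientdet_lite4.tflite")
    interp.allocate_tensors()
    durations = time_invocations(interp.invoke, runs=5)
    print(f"mean invoke latency: {mean_ms(durations):.1f} ms")
```

Comparing this number against the in-Rune run time separates the cost of the model itself from overhead added by the runtime.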
Steps to reproduce:
```console
$ cd ~/Documents/hotg-ai/test-runes
$ git fetch && git checkout 3667f8ed096adfe63096a432f6d7b971c93f8d8a
$ cd image/efficientdet_lite4_detection
$ rune version
rune 0.8.0 (face129 2021-10-05)
# patch that zero/one-based indexing thing we talked about
$ sed -i 's/one-based/zero-based/g' Runefile.yml
$ rune build Runefile.yml
    Updating crates.io index
   Compiling efficientdet_lite4_detection v0.0.0 (/home/michael/.cache/runes/efficientdet_lite4_detection)
    Finished release [optimized] target(s) in 4.40s
$ RUST_LOG=debug sudo -E perf record --call-graph dwarf rune run efficientdet_lite4_detection.rune --image image.jpeg && \
    sudo perf script | inferno-collapse-perf > stacks.folded && \
    cat stacks.folded | inferno-flamegraph > flamegraph.svg
[2021-10-05T08:34:59.236Z INFO  hotg_rune_cli::run::command] Running rune: efficientdet_lite4_detection.rune
[2021-10-05T08:34:59.306Z DEBUG hotg_rune_wasmer_runtime] Loading image
[2021-10-05T08:34:59.307Z DEBUG hotg_rune_wasmer_runtime] Instantiating the WebAssembly module
[2021-10-05T08:34:59.332Z DEBUG hotg_rune_wasmer_runtime] Loaded the Rune
[2021-10-05T08:34:59.332Z DEBUG hotg_rune_wasmer_runtime] Running the rune
[2021-10-05T08:34:59.335Z DEBUG hotg_rune_cli::run::multi] Initializing the "hotg_rune_cli::run::image::Image" with ImageSettings { pixel_format: Some(RGB), width: Some(640), height: Some(640) } and ImageSource { dimensions: (1024, 768), pixel_type: "Rgb8", .. }
{"type_name":"&str","channel":2,"elements":["horse","horse","skateboard","bird","bird","bird","horse","skateboard","bird","horse","horse","horse","kite","bird","bird","bird","bottle","horse","horse","person","bird","horse","horse","bird","bird"],"dimensions":[25]}
[ perf record: Woken up 5977 times to write data ]
Warning:
Processed 192707 events and lost 1 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 1494.947 MB perf.data (185775 samples) ]
RUST_LOG=debug sudo -E perf record --call-graph dwarf rune run --image 48.10s user 1.61s system 100% cpu 49.320 total
Warning:
Processed 192707 events and lost 1 chunks!
Check IO/CPU overload!
sudo perf script 6.35s user 1.13s system 99% cpu 7.554 total
inferno-collapse-perf > stacks.folded 2.17s user 0.24s system 31% cpu 7.556 total
```
Generated flamegraph: