While @Mohit0928 and I were playing around with the efficientdet_lite4_detection Rune, we noticed inference would take significantly longer than expected (the model says it can run in 0.7s while we were seeing run times of 30s+).
After some investigation, we found that almost all of the run time is spent inside TensorFlow Lite's CPU backend.
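For context on where the 0.7s figure comes from: it is the model's advertised per-inference latency, which one might sanity-check directly against the TensorFlow Lite Python interpreter. A minimal timing sketch follows; the model filename `efficientdet_lite4.tflite` and the use of the `tflite_runtime` package are assumptions for illustration, not part of this repository.

```python
import time


def time_invocations(invoke, runs=10):
    """Call `invoke` `runs` times and return the per-run durations in ms."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def mean_ms(durations):
    """Arithmetic mean of a list of millisecond durations."""
    return sum(durations) / len(durations)


if __name__ == "__main__":
    # Assumed setup: tflite_runtime installed and a local model file present.
    from tflite_runtime.interpreter import Interpreter

    interp = Interpreter(model_path="efficientdet_lite4.tflite")
    interp.allocate_tensors()
    durations = time_invocations(interp.invoke, runs=5)
    print(f"mean invoke latency: {mean_ms(durations):.1f} ms")
```

Comparing this number against the in-Rune run time separates the cost of the model itself from overhead added by the runtime.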
Steps to reproduce:
```console
$ cd ~/Documents/hotg-ai/test-runes
$ git fetch && git checkout 3667f8ed096adfe63096a432f6d7b971c93f8d8a
$ cd image/efficientdet_lite4_detection
$ rune version
rune 0.8.0 (face129 2021-10-05)
# patch that zero/one-based indexing thing we talked about
$ sed -i 's/one-based/zero-based/g' Runefile.yml
$ rune build Runefile.yml
    Updating crates.io index
   Compiling efficientdet_lite4_detection v0.0.0 (/home/michael/.cache/runes/efficientdet_lite4_detection)
    Finished release [optimized] target(s) in 4.40s
$ RUST_LOG=debug sudo -E perf record --call-graph dwarf rune run efficientdet_lite4_detection.rune --image image.jpeg && \
    sudo perf script | inferno-collapse-perf > stacks.folded && \
    cat stacks.folded | inferno-flamegraph > flamegraph.svg
[2021-10-05T08:34:59.236Z INFO  hotg_rune_cli::run::command] Running rune: efficientdet_lite4_detection.rune
[2021-10-05T08:34:59.306Z DEBUG hotg_rune_wasmer_runtime] Loading image
[2021-10-05T08:34:59.307Z DEBUG hotg_rune_wasmer_runtime] Instantiating the WebAssembly module
[2021-10-05T08:34:59.332Z DEBUG hotg_rune_wasmer_runtime] Loaded the Rune
[2021-10-05T08:34:59.332Z DEBUG hotg_rune_wasmer_runtime] Running the rune
[2021-10-05T08:34:59.335Z DEBUG hotg_rune_cli::run::multi] Initializing the "hotg_rune_cli::run::image::Image" with ImageSettings { pixel_format: Some(RGB), width: Some(640), height: Some(640) } and ImageSource { dimensions: (1024, 768), pixel_type: "Rgb8", .. }
{"type_name":"&str","channel":2,"elements":["horse","horse","skateboard","bird","bird","bird","horse","skateboard","bird","horse","horse","horse","kite","bird","bird","bird","bottle","horse","horse","person","bird","horse","horse","bird","bird"],"dimensions":[25]}
[ perf record: Woken up 5977 times to write data ]
Warning:
Processed 192707 events and lost 1 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 1494.947 MB perf.data (185775 samples) ]
RUST_LOG=debug sudo -E perf record --call-graph dwarf rune run --image 48.10s user 1.61s system 100% cpu 49.320 total
Warning:
Processed 192707 events and lost 1 chunks!
Check IO/CPU overload!
sudo perf script 6.35s user 1.13s system 99% cpu 7.554 total
inferno-collapse-perf > stacks.folded 2.17s user 0.24s system 31% cpu 7.556 total
```
Generated flamegraph: