Performance issues with inference on CPU backend #39

Open
Michael-F-Bryan opened this issue Oct 5, 2021 · 0 comments

Michael-F-Bryan commented Oct 5, 2021

While @Mohit0928 and I were playing around with the efficientdet_lite4_detection Rune, we noticed that inference takes significantly longer than expected: the model claims to run in 0.7s, but we were seeing run times of 30s+.

After some investigation, we found that almost all of the run time is spent inside TensorFlow Lite's CPU backend.

Steps to reproduce:

$ cd ~/Documents/hotg-ai/test-runes
$ git fetch && git checkout 3667f8ed096adfe63096a432f6d7b971c93f8d8a
$ cd image/efficientdet_lite4_detection

$ rune version
rune 0.8.0 (face129 2021-10-05)

# patch that zero/one-based indexing thing we talked about
$ sed -i 's/one-based/zero-based/g' Runefile.yml

$ rune build Runefile.yml
    Updating crates.io index
   Compiling efficientdet_lite4_detection v0.0.0 (/home/michael/.cache/runes/efficientdet_lite4_detection)
    Finished release [optimized] target(s) in 4.40s

$ RUST_LOG=debug sudo -E perf record --call-graph dwarf rune run efficientdet_lite4_detection.rune --image image.jpeg && \
    sudo perf script | inferno-collapse-perf > stacks.folded && \
    cat stacks.folded | inferno-flamegraph > flamegraph.svg
[2021-10-05T08:34:59.236Z INFO  hotg_rune_cli::run::command] Running rune: efficientdet_lite4_detection.rune
[2021-10-05T08:34:59.306Z DEBUG hotg_rune_wasmer_runtime] Loading image
[2021-10-05T08:34:59.307Z DEBUG hotg_rune_wasmer_runtime] Instantiating the WebAssembly module
[2021-10-05T08:34:59.332Z DEBUG hotg_rune_wasmer_runtime] Loaded the Rune
[2021-10-05T08:34:59.332Z DEBUG hotg_rune_wasmer_runtime] Running the rune
[2021-10-05T08:34:59.335Z DEBUG hotg_rune_cli::run::multi] Initializing the "hotg_rune_cli::run::image::Image" with ImageSettings { pixel_format: Some(RGB), width: Some(640), height: Some(640) } and ImageSource { dimensions: (1024, 768), pixel_type: "Rgb8", .. }
{"type_name":"&str","channel":2,"elements":["horse","horse","skateboard","bird","bird","bird","horse","skateboard","bird","horse","horse","horse","kite","bird","bird","bird","bottle","horse","horse","person","bird","horse","horse","bird","bird"],"dimensions":[25]}
[ perf record: Woken up 5977 times to write data ]
Warning:
Processed 192707 events and lost 1 chunks!

Check IO/CPU overload!

[ perf record: Captured and wrote 1494.947 MB perf.data (185775 samples) ]
RUST_LOG=debug sudo -E perf record --call-graph dwarf rune run  --image   48.10s user 1.61s system 100% cpu 49.320 total
Warning:
Processed 192707 events and lost 1 chunks!

Check IO/CPU overload!

sudo perf script  6.35s user 1.13s system 99% cpu 7.554 total
inferno-collapse-perf > stacks.folded  2.17s user 0.24s system 31% cpu 7.556 total

Generated flamegraph:

flamegraph
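
To put a rough number on how much of the run time falls inside TensorFlow Lite, the stacks.folded file from the steps above can be summarised directly. This is only a sketch: folded_share is a hypothetical helper, and the tflite pattern assumes the backend's frames show up under TensorFlow Lite's C++ tflite namespace in the perf output, which may not hold if the symbols are stripped or named differently.

```shell
# Hypothetical helper: percentage of samples whose stack mentions a pattern.
# Each line of a folded-stack file is "frame;frame;...;frame COUNT", so the
# sample count is always the last whitespace-separated field.
folded_share() {
    awk -v pat="$1" '
        { n = $NF; total += n }
        index($0, pat) { hit += n }
        END { if (total > 0) printf "%.1f\n", 100 * hit / total }
    ' "$2"
}

# Usage (assumed pattern; adjust to whatever the flamegraph shows):
#   folded_share tflite stacks.folded
```

The pattern is matched as a plain substring rather than a regex, so Rust or C++ symbol names containing characters like `<`, `>` or `::` can be passed as-is.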
