
Build for TARGET_ARCH=fusion_f1 via reference implementation fallback. #45464

Merged
merged 3 commits into tensorflow:master from xtensa-fusion-f1
Dec 9, 2020

Conversation

advaitjain
Member

This change adds reference fallbacks to the optimized xtensa kernels for the case when TARGET_ARCH is anything other than hifimini.

This sets the stage for a baseline from which we can incrementally optimize for architectures other than hifimini.

The goal is to have a starting point where all the unit tests pass for TARGET_ARCH=hifimini (which will use the optimized implementations) or any other TARGET_ARCH (with reference fallback).

Tested for `TARGET_ARCH=fusion_f1` with:

```
make -f tensorflow/lite/micro/tools/make/Makefile -j8 TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=Google_F1 test
```

With the following profiling results:

```
make -f tensorflow/lite/micro/tools/make/Makefile -j8 TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=Google_F1 test_keyword_benchmark

InitializeKeywordRunner() took 239061 ticks (239 ms)
KeywordRunNIerations(1) took 168564 ticks (168 ms)
KeywordRunNIerations(10) took 1685111 ticks (1685 ms)
```

```
make -f tensorflow/lite/micro/tools/make/Makefile -j8 TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=Google_F1 keyword_benchmark BUILD_TYPE=release
xt-size tensorflow/lite/micro/tools/make/gen/xtensa_fusion_f1/bin/keyword_benchmark

   text	   data	    bss	    dec	    hex	filename
  48256	  40132	  24952	 113340	  1babc	tensorflow/lite/micro/tools/make/gen/xtensa_fusion_f1/bin/keyword_benchmark
```

After this change, we can:

  • add a continuous build for Hifi4
  • add optimizations for Hifi4 on a per-kernel basis and keep profiling the impact of these optimizations on the keyword benchmark cycles and binary size.

Also tested that `TARGET_ARCH=hifimini` is unaffected:

```
make -f tensorflow/lite/micro/tools/make/Makefile -j8 TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=hifimini XTENSA_CORE=mini1m1m_RG test_keyword_benchmark

InitializeKeywordRunner() took 1392788 ticks (1392 ms)
KeywordRunNIerations(1) took 89195 ticks (89 ms)
KeywordRunNIerations(10) took 891509 ticks (891 ms)
```

```
make -f tensorflow/lite/micro/tools/make/Makefile -j8 TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=hifimini XTENSA_CORE=mini1m1m_RG keyword_benchmark BUILD_TYPE=release
xt-size tensorflow/lite/micro/tools/make/gen/xtensa_hifimini/bin/keyword_benchmark

   text	   data	    bss	    dec	    hex	filename
  46080	  40204	  24952	 111236	  1b284	tensorflow/lite/micro/tools/make/gen/xtensa_hifimini/bin/keyword_benchmark
```

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Dec 8, 2020
@google-ml-butler

Thanks for contributing to TensorFlow Lite Micro.

To keep this process moving along, we'd like to make sure that you have completed the items on this list:

We would like to have a discussion on the Github issue first to determine the best path forward, and then proceed to the PR review.

@advaitjain
Member Author

tagging @pnikam-cad @nyadla-sys @kpraving

@gbaned gbaned self-assigned this Dec 8, 2020
@gbaned gbaned added the comp:micro Related to TensorFlow Lite Microcontrollers label Dec 8, 2020
@advaitjain advaitjain added the kokoro:force-run Tests on submitted change label Dec 8, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Dec 8, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Dec 9, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Dec 9, 2020
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Dec 9, 2020
@advaitjain advaitjain added the kokoro:force-run Tests on submitted change label Dec 9, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Dec 9, 2020
@advaitjain
Member Author

Internal checks were failing (while the external build was OK) because there is an automatic clang-format step before the code is imported into the Google codebase, and my original commit was missing the clang-format formatting and an associated header.

d2fd64f fixes the issue.

@advaitjain advaitjain added the ready to pull PR ready for merge process label Dec 9, 2020
@copybara-service copybara-service bot merged commit 07208f7 into tensorflow:master Dec 9, 2020
copybara-service bot pushed a commit that referenced this pull request Dec 9, 2020
#45464 added a new file but did not add the Apache header. Instead the internal change was force submitted. This resulted in breaking all sync between internal and external.

PiperOrigin-RevId: 346613912
Change-Id: I078c18f677dcf05be01966b2277f28b4ef42ad68
@advaitjain advaitjain deleted the xtensa-fusion-f1 branch December 9, 2020 23:23
Labels
cla: yes comp:micro Related to TensorFlow Lite Microcontrollers ready to pull PR ready for merge process size:L CL Change Size: Large
4 participants