esp32s3_eye_kws_demo

The speech recognition is based on this architecture and on examples from the same repository. The model uses the FastGRNN cell type. A more detailed view of the data flow through the network, with specific vector/matrix sizes:

Inference runs nine times a second, and the CPU utilization due to inference is only ~24%. The FastRNN cell is also supported (the cell type can be changed via menuconfig).

A bigger, LSTM-based model with ~550 ms inference time can be found here. It is slightly more accurate, especially for the "up" label.

demo_3.mov

Notes

A number of TinyML model conversion frameworks were tested, but none gave satisfactory results. The main problem seems to be that graphs exported from PyTorch (or other training-oriented NN frameworks) contain a lot of additional information that is needed only for training and that obscures the essential structure needed for inference. Here, for example, is an ONNX graph exported directly from PyTorch:

and this is all the "manually-transpiled" code needed for inference (~170 lines of C) ...

mryndzionek/esp32s3_eye_kws_demo
