# esp32s3_eye_kws_demo

Speech recognition is based on this architecture and on examples from the same repository. The cell type in this model is FastGRNN. A more detailed view of the data flow through the network, with specific vector/matrix sizes:

*(figure: sharnn — data flow through the network with vector/matrix sizes)*
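For reference, the FastGRNN recurrence computed at each timestep is shown below (this mirrors the definition in the EdgeML/FastGRNN paper; the exact nonlinearities and dimensions used by this particular trained model may differ):

$$z_t = \sigma(W x_t + U h_{t-1} + b_z)$$

$$\tilde{h}_t = \tanh(W x_t + U h_{t-1} + b_h)$$

$$h_t = \big(\zeta\,(1 - z_t) + \nu\big) \odot \tilde{h}_t + z_t \odot h_{t-1}$$

Here $W$ and $U$ are shared between the gate $z_t$ and the candidate state $\tilde{h}_t$, and $\zeta$, $\nu$ are two trained scalars, which is what keeps the cell so small compared to a GRU or LSTM.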

Inference is run nine times per second, and the CPU utilization due to inference is only ~24%. The FastRNN cell is also supported (it can be selected via menuconfig).
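As a rough illustration of that duty cycle, a fixed-rate inference task on ESP-IDF/FreeRTOS could look like the minimal sketch below. This is not the actual firmware: `run_kws_inference()` is a hypothetical placeholder for the feature-extraction plus FastGRNN forward pass, and the real code may instead be driven directly by incoming audio frames.

```c
/* Minimal sketch, not the actual firmware: run inference at ~9 Hz. */
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

void run_kws_inference(void);   /* hypothetical: one forward pass over the latest audio window */

void kws_task(void *arg)
{
    const TickType_t period = pdMS_TO_TICKS(111);  /* ~9 inferences per second */
    TickType_t last_wake = xTaskGetTickCount();

    for (;;) {
        run_kws_inference();                       /* takes roughly a quarter of the period (~24% CPU) */
        vTaskDelayUntil(&last_wake, period);       /* sleep until the next slot */
    }
}
```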

A bigger, LSTM-based model with ~550 ms inference time can be found here. It is slightly more accurate, especially for the "up" label.

*(video: demo_3.mov)*

## Notes

A number of TinyML model-conversion frameworks were tested, but none gave satisfactory results. The main problem seems to be that graphs exported from PyTorch (or other training-oriented NN frameworks) contain a lot of additional information that is needed only for training, and which obscures the much simpler structure needed for inference. Here is, for example, an ONNX graph exported directly from PyTorch:

*(figure: ONNX graph exported directly from PyTorch)*

and this is all the "manually transpiled" code needed for inference (~170 lines of C) ...
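To give a flavor of what such hand-written inference code looks like, here is a minimal sketch of a single FastGRNN timestep in plain C. The dimensions, parameter names and float arithmetic are placeholders for illustration only; they are not taken from the actual ~170-line implementation, which may use different layouts or quantized weights.

```c
#include <math.h>

#define IN_DIM   32   /* placeholder input feature size */
#define HID_DIM  16   /* placeholder hidden state size  */

static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* One FastGRNN timestep: x is the current feature vector, h is the hidden
 * state (updated in place). W, U, bz, bh, zeta and nu are trained parameters. */
void fastgrnn_step(const float W[HID_DIM][IN_DIM], const float U[HID_DIM][HID_DIM],
                   const float bz[HID_DIM], const float bh[HID_DIM],
                   float zeta, float nu,
                   const float x[IN_DIM], float h[HID_DIM])
{
    float h_new[HID_DIM];
    for (int i = 0; i < HID_DIM; i++) {
        float pre = 0.0f;                          /* shared pre-activation: (W x + U h)_i */
        for (int j = 0; j < IN_DIM; j++)  pre += W[i][j] * x[j];
        for (int j = 0; j < HID_DIM; j++) pre += U[i][j] * h[j];
        float z  = sigmoidf(pre + bz[i]);          /* gate      */
        float hc = tanhf(pre + bh[i]);             /* candidate */
        h_new[i] = (zeta * (1.0f - z) + nu) * hc + z * h[i];
    }
    for (int i = 0; i < HID_DIM; i++) h[i] = h_new[i];
}
```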