mryndzionek/esp32s3_eye_kws_demo


esp32s3_eye_kws_demo

Speech recognition is based on this architecture and examples from the same repository. The cell type in this model is FastGRNN. A more detailed view of the data flow through the network, with specific vector/matrix sizes:

[Figure: sharnn, data flow through the network with vector/matrix sizes]

Inference runs nine times per second, and the CPU utilization due to inference is only ~24%. The FastRNN cell is also supported (selectable via menuconfig).

A bigger, LSTM-based model with ~550 ms inference time can be found here. It is slightly more accurate, especially for the up label.

[Video: demo_3.mov]

Notes

A number of TinyML model conversion frameworks were tested, but none gave satisfactory results. The main problem seems to be that graphs exported from PyTorch (or other training-oriented NN frameworks) contain a lot of additional information that is needed only for training and that obscures the essential structure needed for inference. Here, for example, is an ONNX graph exported directly from PyTorch:

[Figure: ONNX graph exported directly from PyTorch]

and this is all the "manually transpiled" code needed for inference (~170 LoC of C) ...