Acoustic feature camera (STM32L4 with one MEMS microphone)

This device works like a simplified human ear: it applies a Fourier transform with Mel scaling and log-scale compression to the sound, and outputs the result as features for training a neural network (the "brain"). Connected to Keras/TensorFlow, it mimics the human auditory system.

The STM32L476RG seems a reasonable choice as the core of this device, since STMicro's SensorTile is also built around the STM32L476.

STM32L4 configuration

The configuration below assumes my original "Knowles MEMS mic Arduino shield".

CubeMX report for STM32L476RG and the Arduino shield

Making use of DMA

STMicro's HAL library provides the "HAL_DFSDM_FilterRegConvHalfCpltCallback" callback, which is very useful for implementing ring-buffer-like buffering for real-time processing.

I split buffers for DMA into two segments: segment A and segment B.

                                                  Interrupt
                          Clock                 ..............
                      +--------------+          : .......... :
                      |              |          : :        V V
                      V              |          : :   +-------------+
Sound/voice ))) [MEMS mic]-+-PDM->[DFSDM]-DMA->[A|B]->|             |->[A|B]->DMA->[DAC] --> Analog filter->head phone ))) Sound/Voice
                                                      |ARM Cortex-M4|->[Feature]->DMA->[UART] --> Oscilloscope on PC or RasPi3
                                                      |             |
                                                      +-------------+

All the DMAs are synchronized, because their master clock is the system clock.
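
The A/B split can be sketched in host-side C as follows. This is only an illustration of the idea, not the code in this repository: SEGMENT_LEN and the helper names on_half_complete(), on_full_complete() and take_ready_segment() are hypothetical. On the target, the two on_*() functions would be called from HAL_DFSDM_FilterRegConvHalfCpltCallback() and HAL_DFSDM_FilterRegConvCpltCallback().

```c
#include <stddef.h>
#include <stdint.h>

#define SEGMENT_LEN 512  /* samples per segment (assumed value for this sketch) */

/* One circular DMA buffer: segment A is the first half, segment B the second. */
static int32_t dma_buf[2 * SEGMENT_LEN];

/* Segment that is completely filled and waiting for processing (NULL if none). */
static const int32_t *volatile ready_segment = NULL;

/* Half-transfer event: DMA has filled segment A and is now writing segment B. */
void on_half_complete(void)
{
    ready_segment = &dma_buf[0];
}

/* Transfer-complete event: DMA has filled segment B and wraps back to segment A. */
void on_full_complete(void)
{
    ready_segment = &dma_buf[SEGMENT_LEN];
}

/* Main-loop side: fetch the pending segment, or NULL when nothing is pending. */
const int32_t *take_ready_segment(void)
{
    const int32_t *seg = ready_segment;
    ready_segment = NULL;
    return seg;
}
```

The CPU always works on the segment that the DMA is not writing to, so processing of one segment has to finish before the DMA completes the other one.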

Sampling frequency

The highest note on a piano is 4186Hz, but it generates overtones up to ~10kHz.
The human voice also generates overtones up to ~10kHz.

So the sampling frequency of the MEMS mic should be around 20kHz: 20kHz/2 = 10kHz (Nyquist frequency)

Parameters of DFSDM (digital filter for sigma-delta modulators) on STM32L4

System clock: 80MHz
Clock divider: 32
FOSR/decimation: 128
sinc filter: sinc3
right bit shift: 6 (the sinc3 output range is 2 * 128^3 = 2^22, i.e. 22 bits, so a 6-bit right shift is required to output 16bit PCM)
Sampling frequency: 80_000_000/32/128 = 19.5kHz
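
The arithmetic behind these parameters can be checked with a few lines of C. This is a sketch with hypothetical function names, and it assumes FOSR is a power of two; the unrounded sampling frequency is 19531.25Hz.

```c
#include <stdint.h>

/* Output sample rate in Hz: system clock / clock divider / FOSR. */
double dfsdm_sample_rate_hz(uint32_t sysclk_hz, uint32_t clk_div, uint32_t fosr)
{
    return (double)sysclk_hz / clk_div / fosr;
}

/* Bits needed for the sinc filter output range 2 * FOSR^order
 * (FOSR is assumed to be a power of two). */
unsigned sinc_output_bits(unsigned order, uint32_t fosr)
{
    unsigned log2_fosr = 0;
    while ((1u << log2_fosr) < fosr) {
        log2_fosr++;
    }
    return order * log2_fosr + 1;
}

/* Right shift that reduces the filter output to 16bit PCM. */
unsigned right_shift_for_pcm16(unsigned order, uint32_t fosr)
{
    unsigned bits = sinc_output_bits(order, fosr);
    return bits > 16 ? bits - 16 : 0;
}
```

With the values above, dfsdm_sample_rate_hz(80000000, 32, 128) gives 19531.25 and right_shift_for_pcm16(3, 128) gives 6.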

Pre-processing on STM32L4/CMSIS-DSP

   << MEMS mic >>
         |
         V
   DFSDM w/ DMA
         |
  [16bit PCM data] --> DAC w/ DMA for monitoring the sound with a headset
         |
  float32_t data
         |
         |                .... CMSIS-DSP APIs() .........................................
  [ AC coupling  ]-----+  arm_mean_f32(), arm_offset_f32()
         |             |
  [ Pre-emphasis ]-----+  arm_fir_f32()
         |             |
[Overlapping frames]   |  arm_copy_f32()
         |             |
  [Windowing(hann)]    |  arm_mult_f32()
         |             |
  [   Real FFT   ]     |  arm_rfft_fast_f32()
         |             |
  [     PSD      ]-----+  arm_cmplx_mag_f32(), arm_scale_f32()
         |             |
  [Filterbank(MFSCs)]--+  arm_dot_prod_f32()
         |             |
     [Log scale]-------+  arm_scale_f32() with log10 approximation
         |             |
 [DCT Type-II(MFCCs)]  |  my original "dct_f32()" function based on CMSIS-DSP
         |             |
         +<------------+
         |
 data the size of int8_t or int16_t (i.e., quantization)
         |
         V
    UART w/ DMA
         |
         V
<< Oscilloscope GUI >>

Frame/stride/overlap

number of samples per frame: 512
length: 512/19.5kHz = 26.3msec
stride: 13.2msec
overlap: 50%(13.2msec)

  26.3msec          stride 13.2msec
  --- overlap dsp -------------
  [b0|a0]            a(1/2)
     [a0|a1]         a(2/2)
  --- overlap dsp -------------
        [a1|b0]      b(1/2)
           [b0|b1]   b(2/2)
  --- overlap dsp -------------
              :
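
The 50% overlap can be sketched as a small state machine that keeps the previous half frame. The names below are hypothetical and the sketch is not the code in this repository. Note that with the unrounded sampling frequency of 19531.25Hz, 512 samples last about 26.2msec and the 256-sample stride about 13.1msec.

```c
#include <stddef.h>
#include <string.h>

#define FRAME_LEN 512              /* samples per frame */
#define STRIDE    (FRAME_LEN / 2)  /* 50% overlap */

typedef struct {
    float prev[STRIDE];            /* second half of the previous frame */
} overlap_state_t;

/* Build one frame as [previous half | new half], then remember the new
 * half so that it becomes the first half of the next frame. */
void next_frame(overlap_state_t *st, const float *new_half, float *frame)
{
    memcpy(frame, st->prev, sizeof st->prev);
    memcpy(frame + STRIDE, new_half, STRIDE * sizeof(float));
    memcpy(st->prev, new_half, sizeof st->prev);
}
```

Calling next_frame() once per 256 new samples yields the sequence [b0|a0], [a0|a1], [a1|b0], ... shown in the diagram.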

Mel filter bank

The number of filters is 40, because most of the technical papers I have read use 40 filters.
The filter bank is applied to the spectrogram to extract MFSCs and MFCCs for training a neural network.
I have developed a DCT Type-II function in C, based on CMSIS-DSP, to calculate MFCCs on the STM32 in real time.

log10 processing time issue

The PSD calculation uses the log10 math function, but CMSIS-DSP does not support log10, and log10 from the standard "math.h" is too slow: when I tried it, calculating log10(x) did not fit into the time slot of one sound frame. So I decided to adopt a log10 approximation, which has worked well so far.

Processing time (actual measurement)

In case of 1024 samples per frame:

fir (cfft/mult/cifft/etc * 2 times): 17msec
log10: 54msec
log10 fast approximation: 1msec
atan2: 53msec

Note: log10(x) = log10(2) * log2(x)
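
One common way to build such an approximation from the identity above is to read the exponent directly from the IEEE 754 bits and approximate log2 of the mantissa with a small polynomial. The sketch below illustrates that idea; it is not necessarily the approximation used in this repository, the function names are hypothetical, and it is valid only for positive, normal float inputs. The quadratic used here keeps the log2 error at roughly 0.01.

```c
#include <stdint.h>
#include <string.h>

/* Approximate log2(x) for positive, normal floats. */
float fast_log2f(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    /* Unbiased exponent e, so that x = m * 2^e with m in [1, 2). */
    int exponent = (int)((bits >> 23) & 0xFFu) - 127;

    /* Force the exponent field to 0 (biased 127) to recover m as a float. */
    bits = (bits & 0x007FFFFFu) | 0x3F800000u;
    float m;
    memcpy(&m, &bits, sizeof m);

    /* Quadratic fit of log2(m) on [1, 2), exact at m = 1 and m = 2. */
    float log2_m = (-1.0f / 3.0f) * m * m + 2.0f * m - 5.0f / 3.0f;
    return (float)exponent + log2_m;
}

/* log10(x) = log10(2) * log2(x) */
float fast_log10f(float x)
{
    return 0.30102999566f * fast_log2f(x);
}
```

The function needs only a few integer operations and multiplications per sample, and the polynomial order can be raised if more accuracy is required.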

Reference: https://community.arm.com/tools/f/discussions/4292/cmsis-dsp-new-functionality-proposal

Command over UART (USB-serial)

UART baudrate: 460800bps


        Sequence over UART(USB-serial)

    ARM Cortex-M4                     PC
           |                          |
           |<-------- cmd ------------|
           |                          |
           |------ data output ------>|
           |                          |


Data is sent as int8_t.
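
Sending int8_t means that the float32_t results are quantized before transmission. A minimal sketch of such a quantizer with rounding and saturation is shown below. The function name and the scale parameter are assumptions, since the actual scaling used by the firmware is not described here.

```c
#include <stdint.h>

/* Scale a float value, round to nearest (half away from zero)
 * and saturate to the int8_t range [-128, 127]. */
int8_t quantize_i8(float v, float scale)
{
    float s = v * scale;
    s += (s >= 0.0f) ? 0.5f : -0.5f;
    if (s > 127.0f) return 127;
    if (s < -128.0f) return -128;
    return (int8_t)s;  /* the cast truncates toward zero */
}
```

Saturation matters here: without it, a value slightly above the range would wrap around and appear as a large negative sample on the PC side.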

Output

cmd	description	output size	purpose	transfer mode
1	RAW_WAVE	N x 1	Input to oscilloscope	one frame
2	FFT	N/2 x 1	Input to oscilloscope	one frame
3	SPECTROGRAM	N/2 x 200	Input to oscilloscope	streaming
4	FEATURES	NUM_FILTERS x 400	Input to ML	buffered

Pre-emphasis

cmd	description	output size	purpose
P	Enable pre-emphasis
p	Disable pre-emphasis
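
The two command tables can be summarized as a small dispatcher. The sketch below only computes how many int8_t values each command returns. It assumes that commands arrive as single ASCII characters ('1' to '4', 'P', 'p') and that pre-emphasis is initially disabled; both are assumptions, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

static bool pre_emphasis_enabled = false;  /* initial state is an assumption */

bool pre_emphasis_is_enabled(void)
{
    return pre_emphasis_enabled;
}

/* Returns the number of int8_t values sent in response to cmd,
 * or 0 for commands that only change a setting (and unknown commands).
 * n is the number of samples per frame. */
size_t handle_command(char cmd, size_t n, size_t num_filters)
{
    switch (cmd) {
    case '1': return n;                  /* RAW_WAVE:    N x 1 */
    case '2': return n / 2;              /* FFT:         N/2 x 1 */
    case '3': return (n / 2) * 200;      /* SPECTROGRAM: N/2 x 200 */
    case '4': return num_filters * 400;  /* FEATURES:    NUM_FILTERS x 400 */
    case 'P': pre_emphasis_enabled = true;  return 0;
    case 'p': pre_emphasis_enabled = false; return 0;
    default:  return 0;
    }
}
```

For N = 512 and NUM_FILTERS = 40, the FEATURES command returns 16000 values, i.e. 200 frames of MFSCs followed by 200 frames of MFCCs.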

Data format of features

The PC issues the "FEATURES" command to the device to fetch the features buffered in memory: the MFSCs and MFCCs of the last 2.6sec.

      shape: (200, 40, 1)       shape: (200, 40, 1)
   +------------------------+------------------------+
   |    MFSCs (40 * 200)    |    MFCCs (40 * 200)    |
   +------------------------+------------------------+

The GUI flattens the features and saves them as a CSV file in a dataset folder.
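
Indexing into the received buffer can be sketched as follows. The layout assumed here is the one drawn above, with the MFSC block followed by the MFCC block, and each block stored frame by frame (row-major, matching the shape (200, 40, 1)). The row-major order and the accessor names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_FRAMES  200
#define NUM_FILTERS 40
#define BLOCK_SIZE  (NUM_FRAMES * NUM_FILTERS)  /* 8000 values per block */

/* MFSC value of a given frame and filter (first block of the buffer). */
int8_t mfsc_at(const int8_t *buf, size_t frame, size_t filter)
{
    return buf[frame * NUM_FILTERS + filter];
}

/* MFCC value of a given frame and coefficient (second block of the buffer). */
int8_t mfcc_at(const int8_t *buf, size_t frame, size_t coeff)
{
    return buf[BLOCK_SIZE + frame * NUM_FILTERS + coeff];
}
```

The whole buffer holds 2 * 8000 = 16000 int8_t values, which matches the NUM_FILTERS x 400 output size of the FEATURES command.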

Beam forming

Although I developed beam forming, tuning it took too much effort. So I removed it; the code remains in this "old" folder.

Note on enabling AI inference

[Step 1] Uncomment #define INFERENCE

"ai.h"

      :
/**
 * Enable inference by X-CUBE-AI
 */
#define INFERENCE   <== Uncomment this line.
      :
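
A macro such as INFERENCE typically gates code through conditional compilation. The following stand-alone sketch shows the mechanism only. The function build_mode() and its return strings are hypothetical and do not come from this repository.

```c
/* Comment this line out to build without inference. */
#define INFERENCE

/* Reports which variant was compiled in. */
const char *build_mode(void)
{
#ifdef INFERENCE
    return "inference enabled";
#else
    return "feature output only";
#endif
}
```

Because the decision is made at compile time, a build without INFERENCE contains none of the gated code.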

[Step 2] Manual modification

"app_x-cube-ai.c"

       :
/* Includes ------------------------------------------------------------------*/
#include <string.h>
#include "app_x-cube-ai.h"
#include "bsp_ai.h"
#include "ai_datatypes_defines.h"

#include "ai.h"   <== Add this line manually after every code generation by CubeMX/X-CUBE-AI.
        :
/*************************************************************************
