Low signal level distorts MFCC/GFCC values #543

dbogdanov · 2016-12-22T17:07:12Z

Ideally we would expect to have identical MFCC coefficients, except for the 0th coefficient, on different input levels for the same input signal frame. However, in the case when the input signal level is very low, the MFCC values get distorted.

Low signal level leads to small spectrum values. Using the power spectrum to compute mel bands reduces these values further. When taking the log to compute log-energies, we apply a threshold to truncate very quiet bands (currently we truncate to -90dB).

For some signal frames, some bands may be truncated because they fall below the threshold while others are not. This leads to MFCC values that differ from the expected ones.

When all band values are truncated, the resulting MFCC vector contains zeros except for the 0th coefficient, which takes its minimum negative value. Avoiding distortion by lowering the silence threshold comes at the cost of more frames containing non-zero MFCC vectors. The appropriate threshold might depend on the application.
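The three cases described above (no truncation, partial truncation, full truncation) can be reproduced with a small sketch. It is not Essentia's implementation: the band energies are made-up values, the natural log stands in for the dB conversion, the DCT is an unnormalized DCT-II, and the helper names are hypothetical.

```python
import math

def dct2(values):
    """Unnormalized DCT-II of a list of floats."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def cepstrum(band_energies, floor=None):
    """Log band energies (optionally clamped to `floor`) followed by a DCT."""
    if floor is not None:
        band_energies = [max(e, floor) for e in band_energies]
    return dct2([math.log(e) for e in band_energies])

def max_diff_above_c0(a, b):
    """Largest absolute difference between two vectors, ignoring coefficient 0."""
    return max(abs(x - y) for x, y in zip(a[1:], b[1:]))

# Hypothetical mel-band energies for one frame (illustrative values only).
bands = [4e-2, 1e-2, 3e-3, 5e-4, 2e-4, 6e-5]

def scaled(gain):
    return [gain * e for e in bands]

reference = cepstrum(bands)
# Same frame, much quieter, no clamping: only coefficient 0 changes.
no_floor = cepstrum(scaled(1e-6))
# Same quiet frame with a 1e-9 floor: the three weakest bands are clamped.
partly_floored = cepstrum(scaled(1e-6), floor=1e-9)
# Even quieter frame: every band is clamped, so the log spectrum is flat.
fully_floored = cepstrum(scaled(1e-12), floor=1e-9)

print(max_diff_above_c0(reference, no_floor))        # numerically negligible
print(max_diff_above_c0(reference, partly_floored))  # clearly non-zero
print(fully_floored)                                 # zeros (up to rounding) after c0
```

A gain change adds the same constant to every log band, and the DCT maps a constant onto coefficient 0 only. Clamping breaks that property as soon as it affects some bands and not others.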

Solutions:

- disable truncation when computing dbamp/dbpow log in MFCC/GFCC
- lower silence threshold (-180dB) for MFCC/GFCC
- truncate bands at -90dB if magnitude spectrum was used, at -180dB if power spectrum was used
- implement silenceThreshold parameter
- print a warning on every truncated frame
- leave as it is, but note this issue in documentation (will affect some tasks)

dbogdanov · 2016-12-22T17:07:42Z

@georgid

georgid · 2016-12-23T19:03:14Z

> Ideally we would expect to have identical MFCC coefficients, except for the 0th coefficient, on different input levels for the same input signal frame.

> For some signal frames it may occur that some bands truncated being below the threshold, while others are not. This lead to MFCC values different from expected.

In the MFCC literature, normalization for different signal levels is handled by cepstral mean normalization. It is usually applied after taking the log of the mel-scale filterbank and before the DCT:

http://dsp.stackexchange.com/questions/19564/cepstral-mean-normalization
I think it is a good idea to implement that.

Also, I think that taking dB instead of log is one reason the spectral energy values are even lower. Unless somebody points me to an MFCC reference implementation that uses dB, I suggest we stick to log.

dbogdanov · 2016-12-27T15:23:13Z

We won't be able to implement this normalization in the MFCC algorithm. It is a very application-specific post-processing step that can be done on a Pool containing MFCC frames.
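As a rough illustration of such a post-processing step, the sketch below subtracts the per-coefficient mean over all frames. It assumes the MFCC frames have already been pulled out of the Pool into a plain list of lists; the function name and the sample values are hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean, computed over all frames."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

# Hypothetical MFCC frames (rows = frames, columns = coefficients).
mfcc_frames = [
    [-600.0, 12.0, -3.0],
    [-580.0, 10.0, -1.0],
    [-620.0, 14.0, -5.0],
]
normalized = cepstral_mean_normalize(mfcc_frames)
print(normalized)
```

After normalization every coefficient has zero mean across the frames, so a constant offset shared by all frames no longer shows up in the features.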

Log type does not affect inconsistencies in MFCC values. If we disable thresholding, any log will work equally well. Perhaps we can have a parameter that enables/disables threshold clipping of energy bands before taking the log.

dbogdanov · 2016-12-27T15:26:43Z

@edufonseca

dbogdanov · 2016-12-28T13:54:08Z

We can't disable thresholding completely, otherwise we'll get NaN values. Therefore the easiest solution will be:

- lower hardcoded silence threshold in MFCC and GFCC (create new versions of lin2db, pow2db, amp2db with a threshold argument in essentiamath)
- explain this issue in documentation
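A minimal sketch of what a conversion with a threshold argument could look like, written in Python for brevity. The actual essentiamath functions are C++, and the signature below (including the `silence_cutoff` name) is an assumption made for illustration.

```python
import math

def pow2db(power, silence_cutoff):
    """Convert a power value to dB, clamping to `silence_cutoff` first.

    The clamp keeps the argument of the logarithm strictly positive,
    so a zero-energy band maps to a finite floor value.
    """
    return 10.0 * math.log10(max(power, silence_cutoff))

# Floor reached by a completely silent band for two candidate cutoffs.
floor_1e9 = pow2db(0.0, silence_cutoff=1e-9)    # about -90 dB
floor_1e10 = pow2db(0.0, silence_cutoff=1e-10)  # about -100 dB
print(floor_1e9, floor_1e10)
```

With a cutoff of 1e-9 the floor is about -90 dB, which matches the truncation level mentioned at the top of this issue. Each factor of 10 in the cutoff moves the floor by 10 dB.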

If one wants to avoid distorting MFCC values on low-level signals, one will need to lower the threshold and/or post-process the resulting values. One can define a threshold on the 0th coefficient to filter out silent/distorted MFCC frames.
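Filtering on the 0th coefficient could be sketched as follows. The function name, the sample frames and the threshold of -1000 are all hypothetical; a usable threshold depends on the number of bands, the log type and the DCT normalization, so it would have to be chosen per setup.

```python
def drop_silent_frames(frames, c0_threshold):
    """Keep only frames whose 0th coefficient is above `c0_threshold`."""
    return [f for f in frames if f[0] > c0_threshold]

# Hypothetical MFCC frames (rows = frames, columns = coefficients).
frames = [
    [-1100.0, 0.0, 0.0],   # fully truncated frame: only c0 is non-zero
    [-650.0, 15.0, -4.0],
    [-900.0, 3.0, 1.0],
]
kept = drop_silent_frames(frames, c0_threshold=-1000.0)
print(kept)
```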

Add silenceThreshold parameter default to 1e-9. Get rid of pointer to compressor function because it is not flexible as soon as one needs to run functions with different number of parameters (amp2db vs linear) or different threshold log values (amp2db db threshold vs log)

dbogdanov · 2016-12-28T17:22:37Z

@georgid @pabloEntropia
I've made some changes in the new mfcc_thresholding branch. If you like the idea, the same should be done for GFCC.

georgid · 2016-12-29T11:49:57Z

OK, lowering the threshold makes sense. 1e-9 seems a reasonable value for a threshold.

dbogdanov · 2016-12-29T13:59:40Z

It is not clear to me which default value is best. 1e-9 is the default value we had before. librosa uses 1e-10. Maybe @edufonseca can test a few different ones.

Add silenceThreshold parameter default to 1e-9. Modified unit test to fit the new results.

edufonseca · 2017-01-04T23:38:15Z

It is hard to define the "best" default threshold (i.e., best in what terms, and for what application?). As we found out, the distortion occurs when EXTREME signal attenuation takes place, e.g.:

…

- dealing with environmental sounds (with a few scenes belonging to particular classes like 'home' or 'library', where frames of near silence may appear)
- applying window normalization in Essentia with a relatively long window length (2048 samples) (the longer the window, the higher the attenuation, so that area = 1)

Then the Mel Frequency Spectrum (MFS) is computed and, in this domain, values less than 1e-9 are clamped to that constant. It appears (@dbogdanov correct me if I'm wrong) that some audio frames ended up with such severe signal attenuation that their MFS had values around the threshold, hence distorting the energies in some (or all) mel bands. The consequence was lower performance on scene classification when using MFCCs computed by Essentia compared to those computed by librosa.

Instead of figuring out an optimal threshold, perhaps we could be more pragmatic: why did this not happen with librosa? Because:

- There was no severe attenuation by window normalization.
- The MFS minimum threshold is 1e-10.

What could we do about it? The following are just some ideas:

a) Set our threshold to 1e-10. In this way, both libraries (Essentia and librosa) are (a bit more) comparable.
b) Most importantly, explain briefly in the documentation:
b1) the potential issues of dealing with very low signal levels, with the specific example of background noise from environmental sounds.
b2) add to the window normalization parameter a sentence warning that it involves (serious?) signal attenuation (depending on window length).
b3) allow users to define the MFS minimum threshold as an input parameter? (with a brief explanation of its purpose and repercussions; maybe b1) could fit here)
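To get a feel for how much attenuation unit-area window normalization implies, the sketch below scales a Hann window so that its samples sum to 1 and reports the largest coefficient in dB. The Hann shape and the helper names are assumptions made for illustration; this only models the "area = 1" scaling described above, not the exact behaviour of Essentia's windowing.

```python
import math

def unit_area_hann(size):
    """Symmetric Hann window scaled so that its samples sum to 1."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (size - 1)) for i in range(size)]
    total = sum(window)
    return [w / total for w in window]

def peak_gain_db(size):
    """Largest window coefficient, expressed as an amplitude gain in dB."""
    return 20.0 * math.log10(max(unit_area_hann(size)))

gain_512 = peak_gain_db(512)
gain_2048 = peak_gain_db(2048)
print(gain_512, gain_2048)  # roughly -48 dB and -60 dB
```

Under these assumptions, even the least attenuated sample of a 2048-sample window is scaled down by roughly 60 dB, and every quadrupling of the window length costs about 12 dB more. A quiet recording therefore has far less headroom above a fixed silence floor.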


Add silenceThreshold parameter default to 1e-9. Get rid of pointer to compressor function because it is not flexible as soon as one needs to run functions with different number of parameters (amp2db vs linear) or different threshold log values (amp2db db threshold vs log). Update test to the new results of the `testZero`

GFCC: Add silenceThreshold parameter #543

dbogdanov · 2017-01-09T12:30:17Z

I conclude that we should set the threshold to 1e-10 for better consistency with librosa.

dbogdanov added the algorithms QA label Dec 22, 2016

dbogdanov added this to the 2.1 milestone Dec 22, 2016

dbogdanov mentioned this issue Dec 22, 2016

comparison of MFCC computation between librosa and essentia, for acoustic scene classification #525

Open

dbogdanov assigned palonso Dec 28, 2016

palonso pushed a commit to palonso/essentia that referenced this issue Jan 4, 2017

GFCC: Added thresholding parameter MTG#543

27a7757

Add silenceThreshold parameter default to 1e-9. Modified unit test to fit the new results.

dbogdanov added a commit that referenced this issue Jan 5, 2017

Merge pull request #550 from pabloEntropia/gfcc-thresholding

fe011a0

GFCC: Add silenceThreshold parameter #543

dbogdanov closed this as completed in bfe09ef Jun 6, 2019
