Low signal level distorts MFCC/GFCC values #543
In the MFCC literature, normalization for different signal levels is handled by cepstral mean normalization. It is usually applied after taking the log of the mel-scale filterbank and before the DCT: http://dsp.stackexchange.com/questions/19564/cepstral-mean-normalization Also, I think that taking dB instead of log is one reason the spectral energy values end up even lower. Unless somebody points me to an MFCC reference implementation that uses dB, I suggest we stick to log. |
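For readers unfamiliar with cepstral mean normalization, here is a minimal sketch of the idea, assuming log mel energies stored as plain lists of floats. The function name `cepstral_mean_normalize` and the sample values are hypothetical and not part of Essentia. Because the DCT is linear, subtracting the mean before or after it gives the same result.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames (lists of equal length)."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

# Made-up log mel energies for three frames.
log_mel = [[-4.0, -6.0, -9.0], [-5.0, -6.5, -8.0], [-4.5, -7.0, -8.5]]

# A constant gain change adds the same offset to every log energy.
offset = -11.5
quiet = [[v + offset for v in f] for f in log_mel]

normalized = cepstral_mean_normalize(log_mel)
normalized_quiet = cepstral_mean_normalize(quiet)
```

Here `normalized` and `normalized_quiet` agree up to rounding, so a pure gain change is removed. This only holds while no band has been clamped to the silence threshold.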
We won't be able to implement this normalization in the MFCC algorithm. This is a very application-specific post-processing step that can be done on a Pool containing MFCC frames. The log type does not affect the inconsistencies in MFCC values. If we disable thresholding, any log will work equally well. Perhaps we can have a parameter that enables/disables threshold clipping of energy bands before taking the log. |
We can't disable thresholding completely, otherwise we'll get NaN values. Therefore the easiest solution will be:
If one wants to avoid distorting MFCC values on low-level signals, one will need to lower the threshold and/or post-process the resulting values. One can define a threshold on the 0th coefficient to filter out silent/distorted MFCC frames. |
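The post-processing idea above (a threshold on the 0th coefficient) could look roughly like this. It is a sketch only: `drop_silent_frames`, the threshold of -100.0, and the frame values are all made up for illustration, and a real threshold would depend on the log type and the application.

```python
def drop_silent_frames(mfcc_frames, c0_threshold):
    """Keep only frames whose 0th MFCC coefficient is above `c0_threshold`."""
    return [f for f in mfcc_frames if f[0] > c0_threshold]

# Hypothetical MFCC frames: the 1st and 3rd look like near-silent frames.
frames = [
    [-120.0, 0.0, 0.0],
    [-35.2, 4.1, -1.3],
    [-118.5, 0.2, 0.1],
    [-40.7, 3.8, -0.9],
]

kept = drop_silent_frames(frames, c0_threshold=-100.0)
```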
Add silenceThreshold parameter default to 1e-9. Get rid of pointer to compressor function because it is not flexible as soon as one needs to run functions with different number of parameters (amp2db vs linear) or different threshold log values (amp2db db threshold vs log)
@georgid @pabloEntropia |
OK, decreasing the threshold makes sense. 1e-9 seems a reasonable value for a threshold. |
For me it is not clear which default value is best. 1e-9 is the default value we had before. librosa is using 1e-10. Maybe @edufonseca can test a few different ones. |
Add silenceThreshold parameter default to 1e-9. Modified unit test to fit the new results.
It is hard to define the "best" default threshold (i.e., best in what terms,
and for what application?). As we found out, the distortion occurs when
EXTREME signal attenuation takes place, e.g.:
- dealing with environmental sounds (with a few scenes belonging to particular
classes like 'home' or 'library', where frames of near silence may
appear)
- applying window normalization in Essentia with a relatively long window
length (2048 samples) (the longer the window, the higher the attenuation,
so that area = 1)
Then, the Mel Freq. Spectrum (MFS) is computed and, in this domain, values
below 1e-9 are clamped to that constant. It appears (@dbogdanov
correct me if I'm wrong) that some audio frames ended up with such
severe signal attenuation that their MFS had values around the threshold,
hence distorting the energies in some (or all) mel bands. The consequence
was lower performance on scene classification when using MFCCs
computed by Essentia compared to those computed by librosa.
Instead of figuring out an optimal threshold, perhaps we could be more
pragmatic: why did this not happen with librosa? Because:
- There was no severe attenuation by window normalization.
- The MFS minimum threshold is 1e-10.
What could we do about it? The following are just some ideas:
a) Set our threshold to 1e-10. In this way, both libraries (Essentia and
librosa) are (a bit more) comparable.
b) Most importantly: explain briefly in the documentation:
b1) the potential issues of dealing with very low signal levels, with the
specific example of background noise from environmental sounds.
b2) add to the window normalization parameter a sentence warning
that it involves (serious?) signal attenuation (depending on window length)
b3) allow users to define the MFS minimum threshold as an input parameter?
(with a brief explanation of its purpose and repercussions; maybe b1) could
fit here)
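To get a feel for the attenuation mentioned under window normalization, here is a rough sketch. It assumes a symmetric Hann window scaled to unit area; the exact window type and scaling used by Essentia may differ, and `hann` and `peak_gain_db` are hypothetical helper names.

```python
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def peak_gain_db(n):
    """Peak amplitude gain (in dB) of a Hann window normalized to unit area."""
    w = hann(n)
    area = sum(w)
    peak = max(w) / area
    return 20.0 * math.log10(peak)

# Longer windows give a lower peak gain, i.e. stronger attenuation.
gains = {n: peak_gain_db(n) for n in (512, 2048, 4096)}
```

Under these assumptions the peak gain for 2048 samples is around -60 dB in amplitude, and the power spectrum squares it, which makes it plausible that quiet frames end up near a 1e-9 floor.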
|
Add silenceThreshold parameter default to 1e-9. Get rid of pointer to compressor function because it is not flexible as soon as one needs to run functions with different number of parameters (amp2db vs linear) or different threshold log values (amp2db db threshold vs log). Update test to the new results of the `testZero`
GFCC: Add silenceThreshold parameter #543
I can conclude we should set the threshold to 1e-10 for better consistency with librosa. |
Ideally we would expect to have identical MFCC coefficients, except for the 0th coefficient, at different input levels for the same input signal frame. However, when the input signal level is very low, the MFCC values get distorted.
Low signal level leads to small spectrum values. Using the power spectrum to compute mel bands reduces these values further. When taking the log to compute log-energies, we apply thresholding to truncate very silent bands (currently we truncate to -90 dB).
For some signal frames, some bands may be truncated for being below the threshold while others are not. This leads to MFCC values that differ from those expected.
When all band values are truncated, the resulting MFCC vector contains zeros except for the 0th coefficient, which will receive its minimum negative value. Avoiding distortion by lowering the silence threshold comes at the cost of more frames containing non-zero MFCC vectors. The right threshold might depend on the application.
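The effect described above can be reproduced with a small self-contained sketch. It is not the Essentia implementation: the band energies are made up, the natural log is used, the DCT-II is unnormalized, and `dct2` and `log_cepstrum` are hypothetical helpers.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a list of floats."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n)]

def log_cepstrum(band_energies, floor):
    """Clamp band energies to `floor`, take the natural log, then apply the DCT."""
    return dct2([math.log(max(e, floor)) for e in band_energies])

bands = [1e-2, 3e-3, 5e-4, 8e-5, 2e-5, 6e-6]  # made-up mel band energies
gain = 1e-5                                   # power gain of -50 dB
attenuated = [e * gain for e in bands]

ref = log_cepstrum(bands, 1e-9)
# Floor 1e-9: the three weakest attenuated bands get clamped, the others do not.
quiet = log_cepstrum(attenuated, 1e-9)
# A much lower floor: nothing is clamped, so only coefficient 0 changes.
quiet_low_floor = log_cepstrum(attenuated, 1e-20)
# All bands below the floor: coefficients 1..n-1 collapse to (numerically) zero.
silent = log_cepstrum([1e-15] * 6, 1e-9)
```

Comparing `ref[1:]` with `quiet[1:]` shows the distortion from partial clamping, while `ref[1:]` and `quiet_low_floor[1:]` match up to rounding.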
Solutions: