Overfitting? #11

Closed
alezenonos opened this issue May 4, 2020 · 10 comments

@alezenonos

alezenonos commented May 4, 2020

This is an inspiring piece of work and thank you for keeping it open source. I was just wondering whether it demonstrates an overfitting situation. Specifically, the audio extracted from the videos is the same as the speech files, which contaminates the test set with examples from the training set.

For example, say the training set contains the data points X1, X2, X3, X4, X5 and the test set contains X2, X4, X5, X6. The model is more likely to get X2, X4, X5 right because it has already seen them during training, so the results do not reflect the true predictive power of your models.

To address this, we can either remove the audio extracted from the videos or make sure that the extracted audio and the corresponding speech file always end up in the same split, i.e. both in train or both in test. A sketch of such a grouped split is below.
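
As an illustration only, here is a minimal sketch of that grouped split using scikit-learn's GroupShuffleSplit. The names `features`, `labels` and `source_ids` are hypothetical, not variables from this repository; `source_ids` is assumed to tag the speech .wav and the audio extracted from the matching .mp4 with the same id.

```python
# Hypothetical leakage-free split: samples that come from the same recording
# (speech .wav and the audio track extracted from the matching .mp4) share a
# group id, so they can never land on opposite sides of the split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def grouped_split(features, labels, source_ids, test_size=0.25, seed=42):
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(features, labels, groups=source_ids))
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])
```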

@marcogdepinto
Owner

Hi, this model is built only to solve this specific problem: predicting emotions on the RAVDESS dataset. Overfitting is expected in this case because it is what I wanted to achieve. If you want a more generalized model, feel free to reduce the number of layers in the neural network and/or remove/add different training data. This is not an issue but expected behaviour, hence I am closing it.

@EnisBerk

I wish I had seen this last week, OMG. I came here to open exactly the same issue.

Dear @marcogdepinto, @alezenonos is right. You are training and testing on the same files, so your model is not learning anything about emotions; it is just memorizing files. You could feed it filenames instead of sound and you would get the same results.

You are extracting audio from the videos for no reason, because the speech audio files already come from those same recordings. Please state this in big letters at the top of your README, because it cost me hours of wondering why I could not reproduce your results.

@marcogdepinto
Owner

marcogdepinto commented May 13, 2020

@EnisBerk I am afraid that was not clear. The name of the project is "Emotion Classification RAVDESS" for a reason.

I have added the following sentence on top of the README:
"Please note this project is not made for generalization: it is built to work only with the files of the RAVDESS dataset, not for any audio file".

Sorry again for the misunderstanding.

@alezenonos
Author

Hi @marcogdepinto. This is not about generalising to other datasets; it is about learning features from this dataset. By contaminating the test set with data from the training set you would not need an ML algorithm at all; it almost becomes a deterministic problem. Nevertheless, your code is really useful and thank you for it. The only change I would make to deal with this issue is to remove the audio extracted from the videos, so that the ML algorithm actually learns the features from the audio. I understand this is not an active project, so this might help others as well. Accuracy will not be as good, but it will be more realistic. This problem is also known as data leakage.

@marcogdepinto
Owner

marcogdepinto commented May 13, 2020

Hi @alezenonos, thanks for the hints, very much appreciated!

One thing I noticed reading previous issues is that the features extracted from the video audio should have some noise within them (#6). The FFMPEG call that extracts the additional data from the videos (https://github.com/marcogdepinto/Emotion-Classification-Ravdess/blob/master/Mp4ToWav.py) sets a sample rate of 44100 (the ffmpeg -ar option, more on https://ffmpeg.org/ffmpeg-all.html), which corresponds to 44.1 kHz. The original sample rate of the audio files in the dataset is 48 kHz (source: https://zenodo.org/record/1188976#.XYendZMzZN1), so the MFCC features created by librosa should be different. Unfortunately I have never had time to compare two files and check whether the generated arrays differ from the original ones. If the values are different, this could be considered a data augmentation approach (e.g. like rotating pictures in computer vision problems); correct me if I am wrong here. On the other hand, if the arrays are the same, you are right: when I have some time I will re-train the model without the audio extracted from the videos and review the changes (if you want to run the test yourself, that would be great!). I may also build a test set from the files of the last 3 actors and exclude those from training. The only issue is that I honestly do not have time to do this now; hopefully in the next months I will be able to code a different approach and compare results.
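
To make that check concrete, a rough sketch of the comparison could look like the following. The file paths and loading parameters are placeholders, not the repository's exact ones: load the original 48 kHz speech file and the 44.1 kHz file extracted by ffmpeg, compute a mean MFCC vector for each with librosa, and look at the difference.

```python
# Placeholder paths: an original RAVDESS speech file and the corresponding
# wav that Mp4ToWav.py would extract from the video at 44.1 kHz.
import librosa
import numpy as np

def mfcc_vector(path, sr=22050):
    # librosa resamples both files to `sr` on load; the question is whether
    # the 48 kHz -> 44.1 kHz detour still changes the resulting features.
    y, _ = librosa.load(path, sr=sr, duration=2.5, offset=0.5)
    return np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).T, axis=0)

original = mfcc_vector("Actor_01/03-01-01-01-01-01-01.wav")
from_video = mfcc_vector("Actor_01/03-01-01-01-01-01-01_from_video.wav")
print(np.max(np.abs(original - from_video)))
```

A result close to zero would mean the extracted files behave as near-duplicates rather than augmented samples.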

@alezenonos
Author

Hi @marcogdepinto, you are right! Data augmentation would help a lot, as long as the values are different and the transformation does not fundamentally change how the emotion is conveyed through the audio. So one should still consider carefully which augmentations to apply. In this particular case, as long as no data from the training set is duplicated in the test set, it should be fine.
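
As a hedged example of augmentations that should leave the emotion intact, something like a small pitch shift and low-level additive noise with librosa could work; the parameters below are illustrative, not tuned for this project, and each augmented copy must stay in the same split as its source file.

```python
# Illustrative, mild augmentations; each returned copy should inherit the
# source file's group id so it lands in the same split as the original.
import numpy as np
import librosa

def augment(y, sr, noise_level=0.005, n_steps=1):
    pitch_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)  # +1 semitone
    noisy = y + noise_level * np.random.randn(len(y)).astype(y.dtype)       # light white noise
    return [pitch_shifted, noisy]
```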

@EnisBerk

Hi @alezenonos @marcogdepinto ,
Thanks for your quick responses on an inactive repo. I agree that the data augmentation approach is a good idea. You just need to make sure that augmentations of the same file do not end up in the training and test sets at the same time, unless the augmentation actually changes the emotion in the audio.

@marcogdepinto
Owner

@EnisBerk @alezenonos no worries. Again, I have no bandwidth to work on this now, especially considering it will require a huge refactoring (it was written in 2018 with VERY BAD code style, and it also needs to be migrated from Jupyter to proper .py files, classes, etc.). I hope I will be able to pick it up at some point in the next months. Thanks both for the valuable input.

@marcogdepinto
Owner

marcogdepinto commented May 16, 2020

@EnisBerk @alezenonos hey both, just a heads-up. I found the time to do all of the above: this project just received a major refactoring. I have

  1. moved everything from Jupyter to proper Python files;
  2. refactored the code (still some work to do there) and added docstrings/comments;
  3. removed the audio features extracted from the videos;
  4. added a pipeline to include a new set of features extracted from the TESS dataset;
  5. reviewed the README to explain better how everything works.

Points 3-4 reduced accuracy to 80% but the model should be able to generalize better.
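
For anyone who wants to try the actor-held-out evaluation mentioned earlier in this thread, a rough sketch could look like the following, assuming the standard RAVDESS filename convention where the last two-digit field is the actor id (e.g. 03-01-06-01-02-01-12.wav is actor 12); the function name and split choice are illustrative.

```python
# Rough sketch of a speaker-independent split: the last 3 of the 24 RAVDESS
# actors are held out, so the test set contains voices never seen in training.
import os

def split_by_actor(wav_paths, test_actors=(22, 23, 24)):
    train, test = [], []
    for path in wav_paths:
        actor = int(os.path.basename(path).split(".")[0].split("-")[-1])
        (test if actor in test_actors else train).append(path)
    return train, test
```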

Thank you both for having inspired the revamping of this project!

@EnisBerk

Thank you for taking the time to improve the repository. I am sure it will be helpful to others.
