Overfitting? #11
Comments
Hi, this model is built only to solve this specific problem: predicting emotions on the RAVDESS dataset. Overfitting is acceptable in this case because it is what I wanted to achieve. If you want a more generalized model, feel free to reduce the number of layers in the neural network and/or remove/add different training data. This is not an issue but expected behaviour, hence I am resolving it.
I wish I had seen this last week. OMG. I came here to open the exact same issue. Dear @marcogdepinto, @alezenonos is right. You are training and testing on the same files, so your model is not learning anything about emotions; it is just memorizing files. You could feed it filenames rather than sound and you would get the same results. Extracting the audio from the videos adds nothing, because that audio duplicates recordings that are already in the dataset as audio files. Please state this in big letters at the top of your README, because it just cost me hours of wondering why I could not reproduce your results.
@EnisBerk I am afraid that was not clear. The name of the project is "Emotion Classification RAVDESS" for a reason. I have added a clarifying sentence at the top of the README. Sorry again for the misunderstanding.
Hi @marcogdepinto. This is not about generalising to other datasets; this is about learning features from this dataset. By contaminating the test set with data from the training set you would not need an ML algorithm at all: it almost becomes a deterministic problem. Nevertheless, your code is really useful, and thank you for it. The only change I would make to deal with this issue is to remove the video-extracted audio, so that the ML algorithm actually learns the features from the audio. I understand this is not an active project, so this might help others as well. Accuracy won't be as good, but it will be more realistic. This issue is also called data leakage.
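To make that fix concrete, here is a minimal sketch of filtering out the video-extracted WAVs before feature extraction. It assumes the RAVDESS naming convention (the first field of the filename is the modality: "03" = audio-only, "01"/"02" = video) and that the files produced by Mp4ToWav.py keep the video filename; the directory layout shown is illustrative, not the repository's actual one.

```python
import glob
import os

# Minimal sketch: keep only the original audio-only recordings ("03-...")
# and drop the WAVs extracted from the videos, which would otherwise
# duplicate the same speech in the dataset.
# The "features/Actor_*" layout is an illustrative assumption.
all_wavs = glob.glob(os.path.join("features", "Actor_*", "*.wav"))
speech_only = [p for p in all_wavs if os.path.basename(p).startswith("03-")]
print(f"Kept {len(speech_only)} of {len(all_wavs)} WAV files")
```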
Hi @alezenonos, thanks for the hints, very much appreciated! One thing I noticed reading previous issues is that the features extracted from the video audio should contain some noise (#6). The FFMPEG call that extracts the additional audio from the videos (https://github.com/marcogdepinto/Emotion-Classification-Ravdess/blob/master/Mp4ToWav.py) sets a sample rate of 44100 Hz via the -ar option (more at https://ffmpeg.org/ffmpeg-all.html), i.e. 44.1 kHz. The original sample rate of the audio files in the dataset is 48 kHz (source: https://zenodo.org/record/1188976#.XYendZMzZN1), so the features created by the librosa MFCC should differ. Unfortunately I have never had time to test two files and check whether the generated arrays differ from the original ones. If the values are different, this could be considered a data augmentation approach (e.g. similar to rotating pictures in computer vision problems); correct me if I am wrong here. On the other hand, if the arrays are the same, you are right: when I have some time I will re-train the model without the audio extracted from the videos and review the changes (if you want to run the test yourself, that would be great!). I may also work on a test set built from the files of the last 3 actors, excluding those from training. The only issue is that honestly I do not have time to do this now; hopefully in the next months I'll be able to code a different approach and compare results.
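To check whether the 44.1 kHz extraction actually changes the features, one could compare the MFCC vectors of the same recording in its original 48 kHz form and in the version extracted from the video. A minimal sketch, assuming hypothetical file paths and an MFCC setup similar to the one used in the project:

```python
import librosa
import numpy as np

# Hypothetical paths: the original 48 kHz audio-only file and the
# 44.1 kHz WAV extracted from the matching video with Mp4ToWav.py.
original_path = "03-01-06-01-02-01-12.wav"
extracted_path = "01-01-06-01-02-01-12.wav"

def mfcc_mean(path):
    # Load at the file's native sample rate (sr=None) and average the
    # MFCCs over time, similar to the feature extraction in the notebook.
    y, sr = librosa.load(path, sr=None, res_type="kaiser_fast")
    return np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).T, axis=0)

diff = np.abs(mfcc_mean(original_path) - mfcc_mean(extracted_path))
print("max absolute MFCC difference:", diff.max())
```

If the difference is non-negligible, the extracted files carry slightly different feature values (the augmentation argument); if it is essentially zero, they are plain duplicates and the leakage concern stands.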
Hi @marcogdepinto, you are right! Data augmentation would help a lot, as long as the values are different and as long as it does not fundamentally change how the emotion is conveyed through the audio, so one should still consider which augmentations to apply. In this particular case, as long as no data from the training set is duplicated in the test set, it should be fine.
Hi @alezenonos @marcogdepinto,
@EnisBerk @alezenonos no worries. Again, I have no bandwidth to work on this now, especially considering it will require a huge refactoring (written in 2018 with VERY BAD code style; it is also necessary to migrate from Jupyter notebooks to proper .py files, classes, etc.). I hope I'll be able to pick it up at some point in the next months. Thanks both for the valuable input.
@EnisBerk @alezenonos hey both, just a heads up. I found the time to do all the stuff above. This project just received a major refactoring. I have
Points 3-4 reduced accuracy to 80%, but the model should be able to generalize better. Thank you both for having inspired the revamp of this project!
Thank you for taking the time to improve the repository. I am sure it will be helpful to others.
Original post:
This is an inspiring piece of work, and thank you for keeping it open source. I was just wondering whether it is demonstrating an overfitting situation. Specifically, the audio extracted from the videos is the same as the speech files, which contaminates the test set with examples from the training set.
For example, let's say you have data points X1, X2, X3, X4, X5 in the training set and X2, X4, X5, X6 in the test set. X2, X4 and X5 are more likely to be predicted correctly, since they were already seen during training, and thus do not reflect the true predictive power of your models.
To address this, we can either remove the audio extracted from the videos, or make sure that the audio extracted from a video and the corresponding speech audio file always end up in the same set (both in train or both in test).
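The second option (keeping both versions of a recording on the same side of the split) can be enforced with a group-aware split. A minimal sketch using scikit-learn's GroupShuffleSplit; the filenames, grouping rule, and placeholder features are illustrative assumptions, not the repository's actual pipeline:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy setup: one feature row per WAV file. The original speech
# audio and the audio extracted from the matching video share a group id,
# so they can never be split across train and test.
filenames = [
    "03-01-06-01-02-01-12.wav",  # audio-only speech file
    "01-01-06-01-02-01-12.wav",  # same recording, extracted from the video
    "03-01-03-01-01-01-05.wav",
    "01-01-03-01-01-01-05.wav",
]
features = np.random.rand(len(filenames), 40)  # placeholder MFCC vectors
labels = np.array([6, 6, 3, 3])                # emotion codes

# Group by everything except the modality field (first token of the name),
# so both versions of a clip always land on the same side of the split.
groups = ["-".join(name.split("-")[1:]) for name in filenames]

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(features, labels, groups=groups))
print("train:", [filenames[i] for i in train_idx])
print("test: ", [filenames[i] for i in test_idx])
```

The same mechanism also covers an actor-level hold-out (e.g. keeping the last 3 actors for testing): use the actor field of the filename as the group instead.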