The aim of this project is to build and compare several models that classify a given audio input as either music or speech. The dataset was obtained from Marsyas and contains 64 samples each of speech and music. The features listed below were extracted from the audio files using librosa, and the models were built with scikit-learn. The parameters of each model were fine-tuned to get the best results.
Dataset used: "http://marsyas.info/downloads/datasets.html".
Research paper referenced: "https://link.springer.com/article/10.1155/2009/239892".

Features extracted:
- Standard deviation of energy.
- Mean value and standard deviation of difference energy.
- Standard deviation of autocorrelation.
- Standard deviation of autocorrelation difference.
- Mean and standard deviation of the difference of the 9th, 7th, and 4th Mel-frequency cepstral coefficients (MFCCs).
- Low short-time energy ratio.
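
The energy-based features in the list above can be sketched as follows. This is a minimal illustration, not the project's actual extraction code: the project used librosa, while this sketch uses plain numpy on a synthetic signal so that it runs without audio files. The function names, the frame and hop sizes, and the 0.5 threshold in the low short-time energy ratio are all assumptions made for the example. The autocorrelation and MFCC features are not covered here.

```python
import numpy as np


def frame_signal(y, frame_length=1024, hop_length=512):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_length)[None, :]
    return y[starts + offsets]


def energy_features(y, frame_length=1024, hop_length=512):
    """Compute energy-based features (hypothetical helper for illustration)."""
    frames = frame_signal(y, frame_length, hop_length)
    energy = np.sum(frames ** 2, axis=1)   # short-time energy per frame
    diff = np.diff(energy)                 # frame-to-frame energy difference
    # Assumed definition: fraction of frames whose energy falls below
    # half of the mean short-time energy of the clip.
    low_ratio = np.mean(energy < 0.5 * energy.mean())
    return {
        "energy_std": float(np.std(energy)),
        "diff_energy_mean": float(np.mean(diff)),
        "diff_energy_std": float(np.std(diff)),
        "low_energy_ratio": float(low_ratio),
    }


# Synthetic stand-ins for real clips: a steady tone, and a tone followed by silence.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
gappy = np.concatenate([tone, np.zeros(sr)])

print(energy_features(tone))
print(energy_features(gappy))
```

A steady tone has almost no low-energy frames, while the clip with a long pause has roughly half of its frames below the threshold. Speech tends to behave more like the second case because of the pauses between words, which is the intuition behind using this ratio.
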

Models built:
- K-Nearest Neighbour
- Decision Tree
- SVC (kernel: linear)
- SVC (kernel: rbf)
- Logistic Regression
- Naive Bayes
- Random Forest (ensemble)
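
The models listed above can be compared with a short scikit-learn loop. The sketch below is illustrative only: it uses synthetic data from `make_classification` as a stand-in for the real feature table (128 rows and 12 columns, mirroring the dataset size and the number of features listed earlier), default hyperparameters rather than the tuned ones, and `GaussianNB` as an assumed Naive Bayes variant. The accuracies it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted features (NOT the Marsyas data).
X, y = make_classification(
    n_samples=128, n_features=12, n_informative=6, random_state=0
)

# Distance- and margin-based models are wrapped with a scaler,
# since they are sensitive to feature scale.
models = {
    "K-Nearest Neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVC (linear)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "SVC (rbf)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single train/test split because the dataset is small, so a single split would give a noisy estimate.
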

Libraries and tools used:
- numpy (for array-related operations) and pandas.
- scikit-learn for built-in models.
- librosa for audio feature extraction.
- Spyder (IDE).

Contributors:
- Bhargav S (Mentor)
- Skanda U
- Rahul Gite
- Abhishek Ranjan