Deep Learning Audio Clustering

This project experiments with different audio feature extraction techniques and builds a deep-learning-based feature extraction method to cluster audio samples by their features.

Introduction

Deep neural networks are popular for image processing and NLP tasks. Recently, however, research applying deep learning to audio tasks has surged. Some of these techniques were adopted from image processing, but audio is quite different: it is a one-dimensional time-series signal, whereas images are two-dimensional. Deep learning methods that take audio as input matter because audio is a prevalent medium in daily life. The main objective of this project was to train a deep neural network for feature extraction, clustering, or both. The aim was to use this network to extract features from audio samples and cluster them into 20 clusters. Clustering could either be part of the network or be handled by a separate function. For this project, I converted the audio samples into spectrograms and saved them as images. A sample spectrogram can be seen below.
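As an illustration of the audio-to-spectrogram step, here is a minimal sketch using only NumPy. The README does not say which library or parameters were used, so the function name `log_spectrogram`, the frame length, the hop size, and the synthetic 440 Hz test tone are all assumptions made for this example.

```python
import numpy as np


def log_spectrogram(signal, frame_len=1024, hop=256):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative defaults, not the project's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short signals so that at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the 1-D signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame: (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude.T + 1e-10)


# Synthetic stand-in for an audio sample: one second of a 440 Hz tone.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
```

The resulting 2-D array can then be rendered and saved as an image with any plotting library, which turns the one-dimensional signal into the two-dimensional input that image-style networks expect.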

Model Structure

I then used an autoencoder to compress and reconstruct the spectrogram images. The autoencoder consists of an encoder and a decoder. The model structure is shown below.

The encoder and decoder are block-based, i.e. I split the original image (spectrogram) into equal-sized blocks of size (144, 144, 3) and feed those to the encoder. The encoder compresses each block to size (9, 9, 1), a compression ratio of 81 / 62,208 ≈ 0.0013. The following figure shows an image of size (288, 432, 3) being split into 6 equal-sized blocks.
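The block splitting and the compression ratio can be checked with a short NumPy sketch. The helper name `split_into_blocks` is hypothetical and the image here is random data standing in for a spectrogram; only the shapes come from the text above.

```python
import numpy as np


def split_into_blocks(image, block=144):
    """Split an (H, W, C) image into non-overlapping (block, block, C) tiles."""
    h, w, c = image.shape
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of the block size")
    rows, cols = h // block, w // block
    # Separate row/column tile indices from within-tile pixel indices,
    # then bring the two tile indices next to each other.
    tiles = image.reshape(rows, block, cols, block, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, block, block, c)


# Random stand-in for one saved spectrogram image of size (288, 432, 3).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(288, 432, 3), dtype=np.uint8)
blocks = split_into_blocks(image)

# Elements after encoding divided by elements before encoding, per block.
compression_ratio = (9 * 9 * 1) / (144 * 144 * 3)

# 671 samples, each yielding 6 blocks.
total_blocks = 671 * blocks.shape[0]
```

Blocks are ordered row by row, so `blocks[1]` is the tile to the right of `blocks[0]` in the top row of the image.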

Clustering

Since there were 671 samples, each split into 6 blocks, my dataset consisted of 4026 blocks in total. I then trained the autoencoder to compress and reconstruct the spectrograms. After training, I ran the encoder on the original spectrograms; its output is a compressed version of each sample that retains the important features. I used these features to cluster the audio samples into 20 clusters with the K-Means algorithm.
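The clustering step might look roughly like the sketch below. It is a plain NumPy version of Lloyd's K-Means iteration, not the notebook's code, and in practice a library implementation would likely be used instead. Two things are assumptions: the encoded blocks are random stand-ins for real encoder output, and the six (9, 9, 1) block encodings of each sample are concatenated into one 486-dimensional vector, which the README does not specify.

```python
import numpy as np


def kmeans(features, k, n_iter=100, seed=0):
    """Minimal Lloyd's K-Means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct, randomly chosen samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)]

    def assign(cents):
        # Squared Euclidean distance via ||x||^2 - 2 x.c + ||c||^2.
        d = (
            (features ** 2).sum(axis=1)[:, None]
            - 2.0 * features @ cents.T
            + (cents ** 2).sum(axis=1)[None, :]
        )
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids


# Random stand-in for encoder output: 671 samples x 6 blocks x (9, 9, 1).
rng = np.random.default_rng(42)
encoded = rng.normal(size=(671, 6, 9, 9, 1))

# Assumption: flatten all six block encodings into one vector per sample.
features = encoded.reshape(len(encoded), -1)

labels, centroids = kmeans(features, k=20)
```

Each entry of `labels` is the cluster index (0 to 19) assigned to the corresponding audio sample.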



ridakn/Deep-Learning-Audio-Clustering
