In this project, we introduce an approach to emotion detection in conversational speech. Using a multi-modal framework that integrates audio and textual data, our system discerns emotions within multi-speaker dialogues: audio features extracted with pre-trained models are combined with text embeddings and passed to an attentive bi-directional GRU network, which dynamically captures conversational context and inter-speaker emotional influences.
- Multi-modal emotion recognition leveraging audio and text data.
- Utilization of pre-trained models for robust audio feature extraction.
- Implementation of an attentive bi-directional GRU for contextual understanding.
- Evaluation on the MELD dataset, demonstrating effective emotion detection.
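The pipeline above can be sketched as a small PyTorch module. This is a minimal illustration only: the feature dimensions, the fusion of audio and text into one utterance vector, and the way attention is combined with the GRU states are all assumptions, not the exact model defined in model.py.

```python
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    """Illustrative sketch: bi-directional GRU over utterance features with
    additive attention pooling over the dialogue (dims are assumptions)."""
    def __init__(self, input_dim: int, hidden_dim: int, n_classes: int):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)        # score each utterance
        self.classifier = nn.Linear(4 * hidden_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_utterances, input_dim) — fused audio+text features
        h, _ = self.gru(x)                              # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)    # attention over dialogue
        context = (weights * h).sum(dim=1, keepdim=True)  # dialogue summary
        fused = torch.cat([h, context.expand_as(h)], dim=-1)
        return self.classifier(fused)                   # per-utterance logits

model = AttentiveBiGRU(input_dim=768, hidden_dim=128, n_classes=7)
logits = model(torch.randn(2, 10, 768))  # 2 dialogues, 10 utterances each
print(tuple(logits.shape))               # (2, 10, 7)
```

Here each utterance is classified against the 7 MELD emotion classes using both its own bi-directional state and an attention-weighted summary of the whole dialogue.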
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Python 3.x
- Pip (Python Package Installer)
- Clone the repository
git clone https://github.com/siddhantpathakk/emotion-rnn.git
- Navigate to the project directory:
cd emotion-rnn
- Install the required packages (preferably using Conda/Miniconda):
conda env create -f environment.yml
To see the training notebook, please refer to ./audio/train.ipynb
This project uses the MELD conversational dataset for training and testing. Ensure that you have the dataset downloaded and placed in the `audio` directory. We used the preprocessed features available here.
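As a hypothetical sketch of working with preprocessed features, the snippet below writes and re-loads a per-utterance feature dictionary with pickle. The file name, keys, and layout are illustrative assumptions, not the exact format of the repository's preprocessed MELD features.

```python
import pickle

# Toy per-utterance feature vectors, keyed by (dialogue, utterance).
# This layout is an assumption for illustration only.
features = {
    ("dia0", "utt0"): [0.12, -0.40, 0.88],
    ("dia0", "utt1"): [0.05, 0.33, -0.21],
}

# Serialize, then re-load the way a dataloader might.
with open("toy_meld_features.pkl", "wb") as f:
    pickle.dump(features, f)
with open("toy_meld_features.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded[("dia0", "utt0")])  # [0.12, -0.4, 0.88]
```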
- `audio/`: Contains audio data and extracted features.
- `src/`: Source code for the project, including model definitions and training scripts.
- `inference.py`: Script for performing inference.
- `model.py`: Defines the bi-directional GRU model.
- `dataloader.py`: Code for loading and preprocessing data.
- `attention.py`: Implementation of the attention mechanisms.
- `train.ipynb`: Jupyter notebook for training the model.
- `infer.ipynb`: Jupyter notebook for using the model for inference.
- `exploratory_data_analysis.ipynb`: Jupyter notebook for performing standard EDA on the MELD dataset.
- `environment.yml`: Conda environment file for setting up the Python environment.
- `LICENSE`: The license under which this project is distributed.
- `README.md`: This file, describing the project and how to use it.
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
- Siddhant Pathak - Initial work - @siddhantpathakk
This project is licensed under the MIT License - see the LICENSE file for details.
Our methodology opens new avenues in speech emotion recognition by focusing on the nuances of conversational context and speaker interactions. By implementing dual attention mechanisms and a bi-directional GRU, our system adeptly identifies emotional cues across utterances and speaker turns.
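To illustrate the attention idea in isolation, here is a minimal additive-style attention computation in NumPy: each utterance state receives a score, the scores are normalized with a softmax over the dialogue, and a weighted sum yields a context vector. The scoring function and shapes are assumptions and may differ from what attention.py implements.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 utterance states, each of dimension 8
w = rng.normal(size=(8,))     # scoring vector (stand-in for a learned parameter)

scores = H @ w                        # one relevance score per utterance
weights = np.exp(scores - scores.max())
weights /= weights.sum()              # softmax over the dialogue
context = weights @ H                 # attention-weighted context vector, shape (8,)

print(context.shape)
```

The context vector summarizes the dialogue with emphasis on the most relevant utterances, which is what lets the classifier condition each prediction on surrounding turns.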