
Deep Audiobook Tuner (DAT)

A system that generates apt, emotionally pertinent, unique sequences of music for audiobooks based on the current narrative, with the aim of improving the user experience while remaining accurate, cost-efficient, and time-saving.

This repository covers the inner workings of DAT. Check out the Flask application built for this project at https://github.com/jendcruz22/DeepAudiobookTunerApp

This project was made in collaboration with:

Table of Contents:

  1. About
  2. Folder Structure
  3. Installation
  4. Setup
  5. Datasets used
  6. Notebooks
  7. Results
  8. References

About

Audiobooks are used regularly by a large audience. However, most audiobooks have no background music or, in some cases, only very generic soundtracks. This system aims to generate unique and emotionally relevant soundtracks for audiobook recordings.

To extract sentiments from the audiobook, we use a hybrid sentiment analysis approach consisting of both text and audio sentiment analysis. The text sentiment model is a product of transfer learning on Google's BERT language model. Both the text and audio models have been trained on four emotions: Anger, Happiness, Neutral, and Sadness.
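How the two models' predictions are fused is not detailed here; below is a minimal sketch in Python, assuming both models output softmax probabilities over the four emotions and that fusion is a simple weighted average (the function name, weights, and example values are illustrative, not DAT's actual implementation).

import numpy as np

EMOTIONS = ["Anger", "Happiness", "Neutral", "Sadness"]

def fuse_sentiments(text_probs, audio_probs, text_weight=0.5):
    # Weighted average of the two models' softmax outputs;
    # the weighting scheme is an illustrative assumption.
    fused = text_weight * np.asarray(text_probs) + (1 - text_weight) * np.asarray(audio_probs)
    return EMOTIONS[int(np.argmax(fused))]

# Both models lean towards "Sadness", so the fused label is "Sadness".
print(fuse_sentiments([0.1, 0.1, 0.2, 0.6], [0.05, 0.15, 0.3, 0.5]))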

To perform text sentiment analysis, we require transcripts of the audiobook. We use IBM's Watson Speech to Text to transcribe the audiobooks.
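As a rough sketch, transcription with the ibm-watson Python SDK looks like the following (the audio path is a placeholder; the credentials come from the .env file described in the Setup section):

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("your_api_key")
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("your_url")

# Send an audio clip to the service and join the per-segment transcripts.
with open("assets/audiobooks/sample_clip.mp3", "rb") as audio:
    response = stt.recognize(audio=audio, content_type="audio/mp3").get_result()

transcript = " ".join(
    segment["alternatives"][0]["transcript"] for segment in response["results"]
)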

The audio sentiment model is a fully connected dense neural network with four hidden layers. Its input is a set of audio features extracted from the audiobooks using Librosa.
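A minimal sketch of this pipeline is shown below, assuming MFCCs averaged over time as the input features and illustrative layer sizes (the actual feature set and architecture details live in the audio sentiment notebooks):

import librosa
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def extract_features(path, n_mfcc=40):
    # Load the clip and average its MFCCs over time into a single vector.
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)

# Fully connected network with four hidden layers and a four-way
# softmax over Anger, Happiness, Neutral, and Sadness.
model = Sequential([
    Dense(256, activation="relu", input_shape=(40,)),
    Dense(128, activation="relu"),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])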

For music generation, we've implemented bearpelican's approach. They created a music generation model using transformers, built with the fastai library. We use their MusicTransformer model, which uses Transformer-XL to take a sequence of music notes and predict the next note. A huge thank you to bearpelican; do check out their project.
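To steer the pretrained model towards an emotion, a seed clip matching the detected emotion can be fed to it. The sketch below shows only the seed-selection step, assuming the hand-labeled MIDI files (see Datasets used) are grouped into per-emotion subfolders; the actual generation call goes through musicautobot's MusicTransformer.

import random
from pathlib import Path

SEED_DIR = Path("assets/music_generation_data/datasets/vg-midi-annotated")

def pick_seed(emotion):
    # Assumes one subfolder per emotion, e.g. vg-midi-annotated/sadness/;
    # the real folder layout may differ.
    candidates = list((SEED_DIR / emotion.lower()).glob("*.mid"))
    return random.choice(candidates)

seed_midi = pick_seed("Sadness")
# seed_midi is then passed to the pretrained MusicTransformer, which
# extends the note sequence one predicted note at a time.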

Given below is the workflow of our system:

[workflow diagram]

Folder Structure

deep-audiobook-tuner
├───assets
│   ├───audiobooks
│   ├───audio_sentiment_data_v1
│   ├───audio_sentiment_data_v2
│   │   ├───datasets
│   │   ├───data_features
│   │   ├───models
│   │   └───pickles
│   ├───music_generation_data
│   │   ├───datasets
│   │   │   └───vg-midi-annotated
│   │   ├───models
│   │   └───pickles
│   ├───temp
│   └───text_sentiment_data
│       ├───datasets
│       └───models
│
├───deepaudiobooktuner
│   ├───music_generation
│   │   └───music_transformer
│   ├───sentiment_analysis
│   └───utils
│
├───examples
│
├───images
│
├───notebooks
│   ├───demo
│   ├───music_generation
│   └───sentiment_analysis
│       ├───audio_segmentation
│       ├───audio_sentiment_analysis_v1
│       │   └───feature_ext_and_dataprep
│       ├───audio_sentiment_analysis_v2
│       │   └───feature_ext_and_dataprep
│       ├───audio_transcription
│       ├───text_sentiment_analysis
│       └───text_sentiment_analysis_v2
│
└───tests

Installation

Install the requirements for TensorFlow before running the following commands.

Run pip install -r requirements.txt to install all the required libraries (Python 3.7).

Or

Create a Conda environment: conda env create -f environment.yml
(This method requires TensorFlow 2.4 to be installed separately in the environment.
Run conda activate deepaudiobooktuner and then pip install tensorflow==2.4.1.)

Additional requirements:

  • FFmpeg is available here.
  • The package midi2audio requires a sound font, which can be downloaded here. The sound font should be placed in deep-audiobook-tuner/assets/music_generation_data/soundfont/ (refer to the folder structure). A usage sketch follows this list.
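Once the sound font is in place, rendering a generated MIDI file to audio with midi2audio is a one-liner (the sound font and file names below are placeholders):

from midi2audio import FluidSynth

# Point FluidSynth at the sound font placed in the step above.
fs = FluidSynth("assets/music_generation_data/soundfont/sound_font.sf2")
fs.midi_to_audio("generated_track.mid", "generated_track.wav")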

Setup

To run this project, the following API key and models are required.

Transcription API key

The transcription process is done using a cloud service, specifically IBM's Watson Speech to Text. To use this service, an API key is required. Create a free account and obtain your API key and URL. Save these values in a file called .env in the root directory, as shown below:

api_key = 'your_api_key'
url = 'your_url'
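One way to read these values at runtime is with the python-dotenv package (whether DAT loads them this way or parses the file directly is an implementation detail):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
api_key = os.getenv("api_key")
url = os.getenv("url")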

Music generation model

The music generation model trained by bearpelican is available here. This model is to be placed in deep-audiobook-tuner/assets/music_generation_data/models/ (refer to the folder structure).

Text sentiment analysis model

A pre-trained text sentiment analysis model is available here. This model is to be placed in deep-audiobook-tuner/assets/text_sentiment_data/models/neubias_bert_model/ (refer to the folder structure).

Datasets used

  • Text Sentiment Analysis

    The DailyDialog, ISEAR, and Emotion-Stimulus datasets were combined to create a single dataset with four labels: Anger, Happiness, Neutrality, and Sadness (a sketch of this combination step follows this list). We trained, validated, and tested our model on this dataset; the accuracy obtained is discussed in the Results section.

  • Audio Sentiment Analysis

    A combination of three datasets was used: the TESS, RAVDESS, and SAVEE datasets. We trained our model on these datasets for the following emotions: Anger, Happiness, Neutral, and Sadness. The model was then validated and tested; the accuracy obtained is discussed in the Results section.

  • Music Generation

    We used a pre-trained model for music generation, but we required the model to generate music conditioned on emotion. For this, we built a small dataset of video-game piano music, hand-labeled according to the emotions our system uses. The music generation model uses this dataset as its seed input. The dataset is located at deep-audiobook-tuner/assets/music_generation_data/datasets/vg-midi-annotated (refer to the folder structure).
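As referenced under Text Sentiment Analysis above, combining the three text datasets amounts to mapping each source's labels onto the four shared classes and concatenating the rows. A hedged pandas sketch, with hypothetical file and column names:

import pandas as pd

LABELS = {"anger": 0, "happiness": 1, "neutrality": 2, "sadness": 3}

# Hypothetical CSVs: each source dataset exported to (text, emotion) rows,
# with its original labels already renamed to the four shared classes.
frames = [
    pd.read_csv("dailydialog.csv"),
    pd.read_csv("isear.csv"),
    pd.read_csv("emotion_stimulus.csv"),
]

data = pd.concat(frames, ignore_index=True)
data = data[data["emotion"].isin(LABELS)]   # drop any other emotion classes
data["label"] = data["emotion"].map(LABELS)
data.to_csv("combined_text_sentiment.csv", index=False)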

Some examples of our system are available in the examples directory.

Results

Given below are the accuracy metrics of our sentiment analysis models.

[Accuracy plots: Text-Based Sentiment Analysis and Audio-Based Sentiment Analysis]

References

[1] Google's BERT model
[2] Ktrain wrapper for Keras
[3] Speech-Emotion-Analyzer
[4] Musicautobot
