
Attention-based End-to-End Neural Diarization

This repository contains the source code for two main research objectives:

  1. Combining various attention mechanisms to obtain a better model for two-speaker overlapping-speech speaker diarization than current state-of-the-art approaches.
    The following combined attention mechanisms are employed in this work. Combined as well as single attention mechanisms can be selected by commenting out the respective lines in eend/pytorch_backend/models.py (an illustrative sketch follows this list).
  • Self Attention + Local Dense Synthesizer Attention (HA-EEND)
  • External Attention + Local Dense Synthesizer Attention
  • Relative Attention + Local Dense Synthesizer Attention
  2. Experiments on the language dependency of EEND-based speaker diarization, including tests on combined datasets in English and Sinhala.
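
The actual model definitions live in eend/pytorch_backend/models.py. Purely as an illustration of the idea (not the repository's code; all class and parameter names below are hypothetical), a hybrid encoder block can fuse a global multi-head self-attention branch with a local dense synthesizer attention branch roughly as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalDenseSynthesizerAttention(nn.Module):
    """Synthesizes attention weights over a local window directly from each frame
    (no query-key dot products), as a simplified stand-in for LDSA."""
    def __init__(self, d_model, context=15):
        super().__init__()
        assert context % 2 == 1, "context must be odd"
        self.context = context
        self.weight_proj = nn.Linear(d_model, context)  # frame -> local attention weights
        self.value_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, time, d_model)
        w = F.softmax(self.weight_proj(x), dim=-1)                           # (b, t, context)
        v = F.pad(self.value_proj(x), (0, 0, self.context // 2, self.context // 2))
        v = v.unfold(1, self.context, 1).permute(0, 1, 3, 2)                 # (b, t, context, d)
        return torch.matmul(w.unsqueeze(-2), v).squeeze(-2)                  # (b, t, d)

class CombinedAttentionBlock(nn.Module):
    """Encoder block fusing global self-attention with local synthesizer attention."""
    def __init__(self, d_model=256, n_heads=4, context=15):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads)
        self.ldsa = LocalDenseSynthesizerAttention(d_model, context)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        q = x.transpose(0, 1)                        # MultiheadAttention expects (time, batch, d)
        global_out, _ = self.self_attn(q, q, q)      # global context branch
        local_out = self.ldsa(x)                     # local context branch
        return self.norm(x + global_out.transpose(0, 1) + local_out)

feats = torch.randn(2, 500, 256)                     # (batch, frames, feature dim)
print(CombinedAttentionBlock()(feats).shape)         # torch.Size([2, 500, 256])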

The repository largely builds on code from existing open-source EEND implementations.

Directory Structure

├── egs : Kaldi-style experiment recipes
│   ├── asr-sinhala/v1 : modelling on Sinhala ASR and CALLSINHALA
│   │   ├── conf : configuration files
│   │   ├── local : locally used scripts and other files
│   │   ├── cmd.sh : specifies the job scheduling system
│   │   ├── path.sh : path settings
│   │   ├── run.sh : train, infer, and score the model
│   │   └── run_prepare_shared.sh : prepare the data
│   ├── callhome/v1 : CALLHOME test set
│   ├── combined/v1 : combined modelling on Sinhala ASR/LibriSpeech, tested on CALLHOME
│   └── librispeech/v1 : modelling on LibriSpeech and CALLHOME
├── eend : backend code
│   └── pytorch_backend/models.py : model definitions (select the attention variant here)
└── tools : Kaldi setup

Installing Requirements and Setting Up

The research was conducted in the following environment:

  • OS : Ubuntu 18.04 LTS
  • Compute:
    • Encoder blocks with a single multi-head attention layer: 8 CPUs, 32 GB RAM
    • Encoder blocks with two multi-head attention layers: 16 CPUs, 64 GB RAM
  • Storage : 150-200 GB

The following requirements need to be installed:

  • Anaconda
  • CUDA Toolkit
  • SoX tool

Follow the steps below to install the requirements and get the project running.

1. Install Anaconda

sudo apt-get update 
sudo apt-get install bzip2 libxml2-dev -y 
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh   # or the latest Anaconda release
bash Anaconda3-2020.11-Linux-x86_64.sh
rm Anaconda3-2020.11-Linux-x86_64.sh
source ~/.bashrc

2. Install the required libraries

sudo apt install nvidia-cuda-toolkit -y
sudo apt-get install unzip gfortran python2.7 -y
sudo apt-get install automake autoconf sox libtool subversion -y
sudo apt-get update -y
sudo apt-get install -y flac

3. Clone the Git repository

git clone https://github.com/Sachini-Dissanayaka/HA-EEND.git 

4. Install Kaldi and Python environment

cd HA-EEND/tools/ 
make 

5. Install PyTorch

~/HA-EEND/tools/miniconda3/envs/eend/bin/pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
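
After installation, a quick sanity check (illustrative only, not part of the repository) confirms that the GPU build of PyTorch is usable:

import torch                          # run inside the eend environment
print(torch.__version__)              # expected: 1.6.0+cu101
print(torch.cuda.is_available())      # should be True if the CUDA toolkit and driver match
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))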

6. Add paths

export PYTHONPATH="${PYTHONPATH}:$HOME/HA-EEND/"
export PATH=~/HA-EEND/tools/miniconda3/envs/eend/bin/:$PATH
export PATH=~/HA-EEND/eend/bin:~/HA-EEND/utils:$PATH
export KALDI_ROOT=~/HA-EEND/tools/kaldi
export PATH=~/HA-EEND/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sph2pipe_v2.5:$KALDI_ROOT/tools/sctk/bin:~/HA-EEND:$PATH
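
A small check (again illustrative only) that the exported paths are picked up correctly:

import importlib.util, os                                # verify PYTHONPATH and KALDI_ROOT
print(importlib.util.find_spec("eend") is not None)      # True once HA-EEND is on PYTHONPATH
print(os.path.isdir(os.environ.get("KALDI_ROOT", "")))   # True once Kaldi is built under tools/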

Configuration

Modify egs/librispeech/v1/cmd.sh according to your job scheduler.

Data Preparation

The following datasets were used in the experiments: LibriSpeech and CALLHOME for English, and the Sinhala ASR corpus and CALLSINHALA for Sinhala.

For tests with English data, move the datasets (LibriSpeech and CALLHOME) into egs/librispeech/v1/data/local and run the following commands:

cd egs/librispeech/v1
./run_prepare_shared.sh
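
run_prepare_shared.sh produces Kaldi-style data directories. The snippet below is a rough, hypothetical way to verify one of them; the directory name is only an example and depends on your configuration:

import os

data_dir = "data/simu/data/train_clean_100_ns2_beta2_500"   # hypothetical output directory
for name in ("wav.scp", "segments", "utt2spk", "rttm"):     # standard Kaldi-style list files
    path = os.path.join(data_dir, name)
    if os.path.exists(path):
        with open(path) as f:
            print(name, sum(1 for _ in f), "lines")
    else:
        print(name, "is missing")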

Run training, inference, and scoring

./run.sh
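
Inference and scoring operate on diarization results in the standard RTTM format (SPEAKER <recording> <channel> <start> <duration> <NA> <NA> <speaker> <NA> <NA>). As a hypothetical example of inspecting such output (the file path is an assumption), per-speaker speech time can be summed like this:

from collections import defaultdict

totals = defaultdict(float)
with open("exp/diarize/infer/hyp.rttm") as f:        # hypothetical hypothesis file
    for line in f:
        fields = line.split()
        if fields and fields[0] == "SPEAKER":
            totals[fields[7]] += float(fields[4])    # duration column summed per speaker label
for speaker, seconds in sorted(totals.items()):
    print(speaker, round(seconds, 1), "seconds of speech")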

Please reach out to us for any further clarifications.
