This repository is the official implementation of "Towards Voice Reconstruction from EEG during Imagined Speech," published at AAAI 2023.

Y.-E. Lee, S.-H. Lee, S.-H. Kim, and S.-W. Lee, "Towards Voice Reconstruction from EEG during Imagined Speech," AAAI Conference on Artificial Intelligence (AAAI), 2023.

All algorithms are implemented in Python 3.8.
To install requirements:

```bash
pip install -r requirements.txt
```
To train the model for spoken EEG in the paper, run this command:

```bash
python train.py --vocoder_pre pretrained_model/UNIVERSAL_V1/g_02500000 --task SpokenEEG_vec --batch_size 20 --pretrain False --prefreeze False
```
To train the model for imagined EEG, initialized from the pretrained spoken EEG model, run this command:

```bash
python train.py --vocoder_pre pretrained_model/UNIVERSAL_V1/g_02500000 --trained_model pretrained_model/SpokenEEG/ --task ImaginedEEG_vec --batch_size 20 --pretrain True --prefreeze True
```
To evaluate the trained model for spoken EEG on the example data, run:

```bash
python eval.py --trained_model pretrained_model/SpokenEEG/ --vocoder_pre pretrained_model/UNIVERSAL_V1/g_02500000 --task SpokenEEG_vec --batch_size 5
```
To evaluate the trained model for imagined EEG on the example data, run:

```bash
python eval.py --trained_model pretrained_model/ImaginedEEG/ --vocoder_pre pretrained_model/UNIVERSAL_V1/g_02500000 --task ImaginedEEG_vec --batch_size 5
```
You can download pretrained models here:
- Pretrained model trained on participant 1
- We propose a generative model based on multi-receptive residual modules with recurrent neural networks, which extracts frequency characteristics and sequential information from non-invasive brain signals to generate speech (see the architecture sketch after this list).
- The fundamental constraint of an imagined speech-based brain-to-speech (BTS) system, the lack of a ground-truth voice, is addressed with a domain adaptation method that links imagined speech EEG, spoken speech EEG, and spoken speech audio (see the transfer-learning sketch after this list).
- Unseen words could be reconstructed from the pretrained model by using a character-level loss to adapt to various phonemes. This implies that the model can learn phoneme-level information from brain signals, which shows the potential for robust speech generation after training on only a few words or phrases (see the character-level loss sketch after this list).
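
As a rough illustration of the first point, here is a minimal sketch of a multi-receptive residual module combined with a GRU. This is not the repository's actual architecture; the class names, layer sizes, and dilation choices are assumptions made purely for illustration.

```python
# Hypothetical sketch of a multi-receptive residual module + RNN.
# NOT the repository's actual model; names and sizes are illustrative.
import torch
import torch.nn as nn

class MultiReceptiveBlock(nn.Module):
    """Parallel dilated 1-D convolutions cover several receptive fields."""
    def __init__(self, channels, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):                      # x: (batch, channels, time)
        for conv in self.branches:
            x = x + self.act(conv(x))          # residual connection per branch
        return x

class EEGToMel(nn.Module):
    """Multi-receptive residual features followed by a GRU over time."""
    def __init__(self, eeg_channels=64, hidden=128, mel_bins=80):
        super().__init__()
        self.inp = nn.Conv1d(eeg_channels, hidden, kernel_size=1)
        self.res = MultiReceptiveBlock(hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, mel_bins)

    def forward(self, eeg):                    # eeg: (batch, channels, time)
        h = self.res(self.inp(eeg))            # multi-scale frequency features
        h, _ = self.rnn(h.transpose(1, 2))     # sequential information
        return self.out(h)                     # (batch, time, mel_bins)
```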
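For the second point, the `--pretrain True --prefreeze True` flags above suggest that the spoken EEG model is loaded and partially frozen before fine-tuning on imagined EEG. The sketch below shows that common transfer-learning recipe; the checkpoint filename, frozen modules, and learning rate are assumptions, and the real logic lives in train.py.

```python
# Hypothetical transfer-learning step mirroring --pretrain/--prefreeze.
# The checkpoint filename inside pretrained_model/SpokenEEG/ is assumed.
import torch

model = EEGToMel()  # architecture sketch from the previous block

state = torch.load("pretrained_model/SpokenEEG/generator.pt", map_location="cpu")
model.load_state_dict(state, strict=False)

# Freeze the early feature extractor; adapt only later layers to imagined EEG.
for module in (model.inp, model.res):
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```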
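For the third point, one generic way to express a character-level loss is CTC over character logits produced by an ASR branch; the paper's exact pipeline may differ. The shapes, vocabulary size, and target lengths below are placeholders.

```python
# Generic character-level loss sketch using CTC; placeholder shapes/vocab.
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, N, C = 120, 4, 30                       # time steps, batch, characters + blank
logits = torch.randn(T, N, C, requires_grad=True)  # stand-in for ASR output
log_probs = logits.log_softmax(-1)
targets = torch.randint(1, C, (N, 12))     # character indices (0 = blank)
input_lens = torch.full((N,), T, dtype=torch.long)
target_lens = torch.full((N,), 12, dtype=torch.long)

loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()                            # gradient reaches the generator
```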