Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert (INTERSPEECH 2024)

This repository contains the official implementation of the INTERSPEECH 2024 paper, "Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert".

Getting Started

Installation

This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.3, and PyTorch 1.10.0.

Clone this repo:

git clone https://github.com/postech-ami/3d-talking-head-av-guidance
cd 3d-talking-head-av-guidance

Create a conda environment:

conda create --name av_guidance python=3.8 -y
conda activate av_guidance
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install pytorch-lightning==1.5.10
pip install hydra-core --upgrade
conda install -c conda-forge ffmpeg
pip install -r requirements.txt 

Compile and install the psbody-mesh package (MPI-IS/mesh):

BOOST_INCLUDE_DIRS=/usr/lib/x86_64-linux-gnu make all

Lip reading expert

For your convenience, download the lip reading expert model weight here, and set the configuration entry lipreader_path to the path of the downloaded model.

Clone the Auto-AVSR repository into this directory and update the import lines in the files listed below so that the existing import paths become auto_avsr.[existing_imports] (a helper sketch is given after the file list). For instance:

# BEFORE
from espnet.nets.pytorch_backend.e2e_asr_conformer_av import E2E
# AFTER
from auto_avsr.espnet.nets.pytorch_backend.e2e_asr_conformer_av import E2E
List of files to update:
espnet/nets/pytorch_backend/backbones/modules/resnet.py
espnet/nets/pytorch_backend/backbones/modules/resnet1d.py

espnet/nets/pytorch_backend/backbones/conv1d_extractor.py
espnet/nets/pytorch_backend/backbones/conv3d_extractor.py

espnet/nets/pytorch_backend/transformer/add_sos_eos.py
espnet/nets/pytorch_backend/transformer/decoder.py
espnet/nets/pytorch_backend/transformer/decoder_layer.py
espnet/nets/pytorch_backend/transformer/encoder_layer.py
espnet/nets/pytorch_backend/transformer/encoder.py

espnet/nets/pytorch_backend/ctc.py
espnet/nets/pytorch_backend/e2e_asr_conformer_av.py
espnet/nets/pytorch_backend/e2e_asr_conformer.py
espnet/nets/pytorch_backend/nets_utils.py

espnet/nets/scorers/ctc.py
espnet/nets/scorers/length_bonus.py

espnet/nets/batch_beam_search.py
espnet/nets/beam_search.py

lightning_av.py
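Since the same prefix change applies to every file in the list, it can also be done in one pass with a small helper. Below is a minimal sketch of such a script (hypothetical, not part of the repo); it assumes the imports are written in the `from espnet... import ...` style shown in the example above, so any other import style would still need a manual edit.

# update_imports.py -- hypothetical helper: prefix the "from espnet ..." imports
# in the files listed above with "auto_avsr.". Run it from this repository's root
# after cloning Auto-AVSR into ./auto_avsr.
import re
from pathlib import Path

FILES = [
    "espnet/nets/pytorch_backend/backbones/modules/resnet.py",
    "espnet/nets/pytorch_backend/backbones/modules/resnet1d.py",
    "espnet/nets/pytorch_backend/backbones/conv1d_extractor.py",
    "espnet/nets/pytorch_backend/backbones/conv3d_extractor.py",
    "espnet/nets/pytorch_backend/transformer/add_sos_eos.py",
    "espnet/nets/pytorch_backend/transformer/decoder.py",
    "espnet/nets/pytorch_backend/transformer/decoder_layer.py",
    "espnet/nets/pytorch_backend/transformer/encoder_layer.py",
    "espnet/nets/pytorch_backend/transformer/encoder.py",
    "espnet/nets/pytorch_backend/ctc.py",
    "espnet/nets/pytorch_backend/e2e_asr_conformer_av.py",
    "espnet/nets/pytorch_backend/e2e_asr_conformer.py",
    "espnet/nets/pytorch_backend/nets_utils.py",
    "espnet/nets/scorers/ctc.py",
    "espnet/nets/scorers/length_bonus.py",
    "espnet/nets/batch_beam_search.py",
    "espnet/nets/beam_search.py",
    "lightning_av.py",
]

ROOT = Path("auto_avsr")  # the cloned Auto-AVSR repository

for rel in FILES:
    path = ROOT / rel
    text = path.read_text()
    # "from espnet..." -> "from auto_avsr.espnet...", keeping indentation intact.
    new_text = re.sub(r"^(\s*)from espnet\b", r"\1from auto_avsr.espnet", text, flags=re.M)
    if new_text != text:
        path.write_text(new_text)
        print("updated", path)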

Datasets

VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl, and subj_seq_to_idx.pkl in the vocaset/ folder. Download "FLAME_sample.ply" from VOCA and put it in vocaset/ as well. Then read the vertex/audio data and convert them to .npy/.wav files stored in the vocaset/vertices_npy and vocaset/wav folders using a conversion script (a sketch is given below).
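The conversion mirrors the preprocessing used by VOCA/FaceFormer. Below is a minimal sketch, assuming the layout of the official VOCA release: data_verts.npy holds all frames as (num_frames, 5023, 3), subj_seq_to_idx.pkl maps subject/sentence/frame to a row index, and raw_audio_fixed.pkl stores per-sentence audio together with its sample rate. The exact key names and output naming here are assumptions; adjust them if your download or the repo's data loader expects something different.

# process_voca_data.py -- sketch: export per-sequence vertex .npy and .wav files
# from the VOCA release into vocaset/vertices_npy and vocaset/wav.
import os
import pickle
import numpy as np
import scipy.io.wavfile as wav

data_verts = np.load("vocaset/data_verts.npy", mmap_mode="r")      # (num_frames, 5023, 3)
with open("vocaset/raw_audio_fixed.pkl", "rb") as f:
    raw_audio = pickle.load(f, encoding="latin1")                   # [subject][sentence] -> {'audio', 'sample_rate'}
with open("vocaset/subj_seq_to_idx.pkl", "rb") as f:
    seq_to_idx = pickle.load(f, encoding="latin1")                  # [subject][sentence][frame] -> row in data_verts

os.makedirs("vocaset/vertices_npy", exist_ok=True)
os.makedirs("vocaset/wav", exist_ok=True)

for subject, sentences in seq_to_idx.items():
    for sentence, frame_map in sentences.items():
        # Stack the frames of this sequence in order and flatten each frame to 5023*3.
        rows = [frame_map[i] for i in sorted(frame_map.keys())]
        verts = np.asarray(data_verts[rows]).reshape(len(rows), -1)
        np.save(f"vocaset/vertices_npy/{subject}_{sentence}.npy", verts)

        # Write the matching audio clip, if one exists for this sequence.
        clip = raw_audio.get(subject, {}).get(sentence)
        if clip is not None:
            wav.write(f"vocaset/wav/{subject}_{sentence}.wav",
                      clip["sample_rate"], clip["audio"])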

Download the FLAME model and set the configuration entry obj_filename in config/vocaset.yaml to the path of head_template.obj.

BIWI

Follow the instructions of CodeTalker to preprocess the BIWI dataset, put the resulting .npy/.wav files into BIWI/vertices_npy and BIWI/wav, and place templates.pkl in the BIWI/ folder.

To get the vertex indices of the lip region, download the indices list and place it at BIWI/lve.txt.
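For reference, such lip-region index lists are typically used to compute the lip vertex error (LVE) reported in prior speech-driven animation work: the worst (maximal) lip-vertex error in each frame, averaged over all frames. The sketch below is an assumption-laden illustration, not the repo's evaluation code; it assumes lve.txt contains plain integer vertex indices and, as in several public implementations, uses the squared L2 distance.

# lve_sketch.py -- illustrative lip vertex error (LVE) computation.
import numpy as np

def lip_vertex_error(pred, gt, lve_index_file="BIWI/lve.txt"):
    """pred, gt: (num_frames, num_vertices, 3) arrays of mesh vertices."""
    lip_idx = np.loadtxt(lve_index_file, dtype=int)                   # lip-region vertex indices
    # Squared L2 distance per lip vertex, per frame: (num_frames, num_lip_vertices).
    err = np.sum((pred[:, lip_idx] - gt[:, lip_idx]) ** 2, axis=-1)
    # Worst lip vertex in each frame, averaged over frames.
    return err.max(axis=1).mean()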

2024.08.24 | Unfortunately, the BIWI dataset is no longer available.

Training and Testing on VOCASET

  • To train the model on VOCASET, run:
python main.py --dataset vocaset

The trained models will be saved to outputs/model.

  • To test the model on VOCASET, run:
python test.py --dataset vocaset --test_model_path [path_of_model_weight]

The results will be saved to outputs/pred. You can download the pretrained model from faceformer_avguidance_vocaset.pth.

  • To visualize the results, run:
python render.py --dataset vocaset

The results will be saved to outputs/video.

Training and Testing on BIWI

  • To train the model on BIWI, run:
python main.py --dataset BIWI

The trained models will be saved to outputs/model.

  • To test the model on BIWI, run:
python test.py --dataset BIWI --test_model_path [path_of_model_weight]

The results will be saved to outputs/pred. You can download the pretrained model from faceformer_avguidance_biwi.pth.

  • To visualize the results, run:
python render.py --dataset BIWI

The results will be saved to outputs/video.

Citation

If you find this code useful for your work, please consider citing:

@inproceedings{eungi24_interspeech,
  title     = {Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert},
  author    = {Han EunGi and Oh Hyun-Bin and Kim Sung-Bin and Corentin {Nivelet Etcheberry} and Suekyeong Nam and Janghoon Ju and Tae-Hyun Oh},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {2940--2944},
  doi       = {10.21437/Interspeech.2024-1595},
  issn      = {2958-1796},
}

Acknowledgement

We heavily borrow code from FaceFormer. We sincerely appreciate the authors.
