Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert (INTERSPEECH 2024)

This repository contains the official implementation of the INTERSPEECH 2024 paper, "Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert".

Getting Started

Installation

This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.3, and PyTorch 1.10.0.

Clone this repo:

git clone https://github.com/postech-ami/3d-talking-head-av-guidance
cd 3d-talking-head-av-guidance

Create a conda environment:

conda create --name av_guidance python=3.8 -y
conda activate av_guidance
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install pytorch-lightning==1.5.10
pip install hydra-core --upgrade
conda install -c conda-forge ffmpeg
pip install -r requirements.txt 

Compile and install the psbody-mesh package (MPI-IS/mesh):

BOOST_INCLUDE_DIRS=/usr/lib/x86_64-linux-gnu make all

Lip reading expert

For your convenience, download the lip reading expert model weight here, and set the configuration entry lipreader_path to the path of the downloaded model.

Clone the Auto-AVSR repository into this directory and update the import lines in the files listed below so that the existing import paths become auto_avsr.[existing_imports] (a helper sketch is given after the file list). For instance:

# BEFORE
from espnet.nets.pytorch_backend.e2e_asr_conformer_av import E2E
# AFTER
from auto_avsr.espnet.nets.pytorch_backend.e2e_asr_conformer_av import E2E
List of files to update:
espnet/nets/pytorch_backend/backbones/modules/resnet.py
espnet/nets/pytorch_backend/backbones/modules/resnet1d.py

espnet/nets/pytorch_backend/backbones/conv1d_extractor.py
espnet/nets/pytorch_backend/backbones/conv3d_extractor.py

espnet/nets/pytorch_backend/transformer/add_sos_eos.py
espnet/nets/pytorch_backend/transformer/decoder.py
espnet/nets/pytorch_backend/transformer/decoder_layer.py
espnet/nets/pytorch_backend/transformer/encoder_layer.py
espnet/nets/pytorch_backend/transformer/encoder.py

espnet/nets/pytorch_backend/ctc.py
espnet/nets/pytorch_backend/e2e_asr_conformer_av.py
espnet/nets/pytorch_backend/e2e_asr_conformer.py
espnet/nets/pytorch_backend/nets_utils.py

espnet/nets/scorers/ctc.py
espnet/nets/scorers/length_bonus.py

espnet/nets/batch_beam_search.py
espnet/nets/beam_search.py

lightning_av.py
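Since the same prefix change applies to every file in the list, it can also be done in one pass with a small helper. Below is a minimal sketch of such a script (hypothetical, not part of the repo); it assumes the imports are written in the `from espnet... import ...` style shown in the example above, so any other import style would still need a manual edit.

# update_imports.py -- hypothetical helper: prefix the "from espnet ..." imports
# in the files listed above with "auto_avsr.". Run it from this repository's root
# after cloning Auto-AVSR into ./auto_avsr.
import re
from pathlib import Path

FILES = [
    "espnet/nets/pytorch_backend/backbones/modules/resnet.py",
    "espnet/nets/pytorch_backend/backbones/modules/resnet1d.py",
    "espnet/nets/pytorch_backend/backbones/conv1d_extractor.py",
    "espnet/nets/pytorch_backend/backbones/conv3d_extractor.py",
    "espnet/nets/pytorch_backend/transformer/add_sos_eos.py",
    "espnet/nets/pytorch_backend/transformer/decoder.py",
    "espnet/nets/pytorch_backend/transformer/decoder_layer.py",
    "espnet/nets/pytorch_backend/transformer/encoder_layer.py",
    "espnet/nets/pytorch_backend/transformer/encoder.py",
    "espnet/nets/pytorch_backend/ctc.py",
    "espnet/nets/pytorch_backend/e2e_asr_conformer_av.py",
    "espnet/nets/pytorch_backend/e2e_asr_conformer.py",
    "espnet/nets/pytorch_backend/nets_utils.py",
    "espnet/nets/scorers/ctc.py",
    "espnet/nets/scorers/length_bonus.py",
    "espnet/nets/batch_beam_search.py",
    "espnet/nets/beam_search.py",
    "lightning_av.py",
]

ROOT = Path("auto_avsr")  # the cloned Auto-AVSR repository

for rel in FILES:
    path = ROOT / rel
    text = path.read_text()
    # "from espnet..." -> "from auto_avsr.espnet...", keeping indentation intact.
    new_text = re.sub(r"^(\s*)from espnet\b", r"\1from auto_avsr.espnet", text, flags=re.M)
    if new_text != text:
        path.write_text(new_text)
        print("updated", path)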

Datasets

VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl, and subj_seq_to_idx.pkl in the vocaset/ folder. Download "FLAME_sample.ply" from VOCA and put it in vocaset/ as well. Then read the vertex/audio data and convert them to .npy/.wav files stored in the vocaset/vertices_npy and vocaset/wav folders using a conversion script (a sketch is given below).
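The conversion mirrors the preprocessing used by VOCA/FaceFormer. Below is a minimal sketch, assuming the layout of the official VOCA release: data_verts.npy holds all frames as (num_frames, 5023, 3), subj_seq_to_idx.pkl maps subject/sentence/frame to a row index, and raw_audio_fixed.pkl stores per-sentence audio together with its sample rate. The exact key names and output naming here are assumptions; adjust them if your download or the repo's data loader expects something different.

# process_voca_data.py -- sketch: export per-sequence vertex .npy and .wav files
# from the VOCA release into vocaset/vertices_npy and vocaset/wav.
import os
import pickle
import numpy as np
import scipy.io.wavfile as wav

data_verts = np.load("vocaset/data_verts.npy", mmap_mode="r")      # (num_frames, 5023, 3)
with open("vocaset/raw_audio_fixed.pkl", "rb") as f:
    raw_audio = pickle.load(f, encoding="latin1")                   # [subject][sentence] -> {'audio', 'sample_rate'}
with open("vocaset/subj_seq_to_idx.pkl", "rb") as f:
    seq_to_idx = pickle.load(f, encoding="latin1")                  # [subject][sentence][frame] -> row in data_verts

os.makedirs("vocaset/vertices_npy", exist_ok=True)
os.makedirs("vocaset/wav", exist_ok=True)

for subject, sentences in seq_to_idx.items():
    for sentence, frame_map in sentences.items():
        # Stack the frames of this sequence in order and flatten each frame to 5023*3.
        rows = [frame_map[i] for i in sorted(frame_map.keys())]
        verts = np.asarray(data_verts[rows]).reshape(len(rows), -1)
        np.save(f"vocaset/vertices_npy/{subject}_{sentence}.npy", verts)

        # Write the matching audio clip, if one exists for this sequence.
        clip = raw_audio.get(subject, {}).get(sentence)
        if clip is not None:
            wav.write(f"vocaset/wav/{subject}_{sentence}.wav",
                      clip["sample_rate"], clip["audio"])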

Download the FLAME model and set the configuration entry obj_filename in config/vocaset.yaml to the path of head_template.obj.

BIWI

Follow the instructions of CodeTalker to preprocess the BIWI dataset, put the resulting .npy/.wav files into BIWI/vertices_npy and BIWI/wav, and place templates.pkl in the BIWI/ folder.

To get the vertex indices of the lip region, download the indices list and place it at BIWI/lve.txt.
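For reference, such lip-region index lists are typically used to compute the lip vertex error (LVE) reported in prior speech-driven animation work: the worst (maximal) lip-vertex error in each frame, averaged over all frames. The sketch below is an assumption-laden illustration, not the repo's evaluation code; it assumes lve.txt contains plain integer vertex indices and, as in several public implementations, uses the squared L2 distance.

# lve_sketch.py -- illustrative lip vertex error (LVE) computation.
import numpy as np

def lip_vertex_error(pred, gt, lve_index_file="BIWI/lve.txt"):
    """pred, gt: (num_frames, num_vertices, 3) arrays of mesh vertices."""
    lip_idx = np.loadtxt(lve_index_file, dtype=int)                   # lip-region vertex indices
    # Squared L2 distance per lip vertex, per frame: (num_frames, num_lip_vertices).
    err = np.sum((pred[:, lip_idx] - gt[:, lip_idx]) ** 2, axis=-1)
    # Worst lip vertex in each frame, averaged over frames.
    return err.max(axis=1).mean()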

2024.08.24 | Unfortunately, the BIWI dataset is no longer available.

Training and Testing on VOCASET

  • To train the model on VOCASET, run:
python main.py --dataset vocaset

The trained models will be saved to outputs/model.

  • To test the model on VOCASET, run:
python test.py --dataset vocaset --test_model_path [path_of_model_weight]

The results will be saved to outputs/pred. You can download the pretrained model from faceformer_avguidance_vocaset.pth.

  • To visualize the results, run:
python render.py --dataset vocaset

The results will be saved to outputs/video.

Training and Testing on BIWI

  • To train the model on BIWI, run:
python main.py --dataset BIWI

The trained models will be saved to outputs/model.

  • To test the model on BIWI, run:
python test.py --dataset BIWI --test_model_path [path_of_model_weight]

The results will be saved to outputs/pred. You can download the pretrained model from faceformer_avguidance_biwi.pth.

  • To visualize the results, run:
python render.py --dataset BIWI

The results will be saved to outputs/video.

Citation

If you find this code useful for your work, please consider citing:

@inproceedings{eungi24_interspeech,
  title     = {Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert},
  author    = {Han EunGi and Oh Hyun-Bin and Kim Sung-Bin and Corentin {Nivelet Etcheberry} and Suekyeong Nam and Janghoon Ju and Tae-Hyun Oh},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {2940--2944},
  doi       = {10.21437/Interspeech.2024-1595},
  issn      = {2958-1796},
}

Acknowledgement

We heavily borrow code from FaceFormer. We sincerely appreciate the authors.
