Scripts for preprocessing Audio-Visual Speech Enhancement Challenge (AVSEC) data

This script can be used to extract the following features:

  • FaceMesh landmarks [1]
  • lip images cropped using the landmarks
  • face embeddings using FaceNet [2]
  • lip embeddings using TCN [3]

Requirements

## CPU 
pip install -r requirements.txt

## GPU
pip install -r requirements_gpu.txt

## Apple Silicon
pip install -r requirements_mac.txt

Usage

python main.py --data-dir ./data/train/scenes \
               --save-dir ./preprocessed/train \
               --models-root ./models \
               --all-feat
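
To preprocess several data splits with the same settings, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_command` and the `dev` split are hypothetical, and it assumes each split follows the same `./data/<split>/scenes` layout as the `train` example. Only the flags shown above are used.

```python
import shlex


def build_command(split, data_root="./data", save_root="./preprocessed",
                  models_root="./models"):
    """Build the argument list for one main.py run on a given split.

    Hypothetical helper: assumes every split is stored under
    <data_root>/<split>/scenes, as in the train example.
    """
    return [
        "python", "main.py",
        "--data-dir", f"{data_root}/{split}/scenes",
        "--save-dir", f"{save_root}/{split}",
        "--models-root", models_root,
        "--all-feat",
    ]


# "dev" is an assumed split name; adjust to the splits actually present.
for split in ["train", "dev"]:
    cmd = build_command(split)
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # from the repository root to actually execute it.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before starting a long preprocessing run.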

References