Video Transcribing Using Lip Reading

Problem Statement

The goal of this project is to predict sentences through lip reading, utilizing the GRID dataset.

demo.mp4

Given that our dataset comprises stationary, front-facing camera videos with minimal movement, we opted for manual frame cropping.
We also verified our results with a pretrained dlib model that dynamically crops the mouth region and observed similar results.
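
As a rough illustration of manual cropping, the sketch below slices one fixed box out of every frame of a clip. The function name `crop_mouth`, the box coordinates in `MOUTH_BOX`, and the dummy clip dimensions are all assumptions made for this example; they are not taken from `lipreading.py`.

```python
import numpy as np

# Hypothetical crop box (top, bottom, left, right) in pixels.
# The coordinates actually used in this project are not stated in this README.
MOUTH_BOX = (190, 236, 80, 220)


def crop_mouth(frames: np.ndarray, box=MOUTH_BOX) -> np.ndarray:
    """Crop the same fixed region from every frame of a (T, H, W, C) clip."""
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    top, bottom, left, right = box
    _, height, width, _ = frames.shape
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("crop box falls outside the frame")
    # Because the camera and speaker barely move, one box serves all frames.
    return frames[:, top:bottom, left:right, :]


# Dummy clip with assumed dimensions: 75 frames, 288x360 pixels, 3 channels.
clip = np.zeros((75, 288, 360, 3), dtype=np.uint8)
mouth = crop_mouth(clip)
print(mouth.shape)
```

A fixed box only works because the videos are stationary and front-facing; footage with head movement would need a landmark-based crop such as the dlib check mentioned above.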
Example of cropped frames:

The model architecture is inspired by LipNet, focusing on sentence prediction via lip reading.
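
LipNet trains with a CTC loss, so its per-frame outputs have to be collapsed into text at prediction time. The sketch below shows greedy CTC decoding under the assumption that this project follows the same scheme; the vocabulary layout (`VOCAB`, blank at index 0) and the function name `greedy_ctc_decode` are hypothetical and may not match `models.py`.

```python
from itertools import groupby

# Assumed vocabulary: index 0 is the CTC blank, then space, then a-z.
BLANK = 0
VOCAB = ["", " "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]


def greedy_ctc_decode(frame_labels):
    """Collapse consecutive repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(VOCAB[label] for label in collapsed if label != BLANK)


# Toy per-frame argmax labels (one label per video frame).
labels = [0, 3, 3, 0, 10, 10, 10, 0, 15, 0]
print(greedy_ctc_decode(labels))
```

The blank label is what lets the decoder keep genuine double letters: two identical labels separated by a blank survive as two characters, while an unbroken run collapses to one.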
To train the model, run the following command in your terminal:
```
python lipreading.py
```
After training, the model is saved in /results as checkpoint.pth, and convergence plots are saved in the root folder as convergence_plots.png.
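
To confirm that a training run finished and wrote its outputs, a small check like the one below can help. It is only a sketch: the helper name `missing_outputs` is hypothetical, and it assumes that `/results` refers to a `results` folder inside the repository root.

```python
from pathlib import Path

# Output locations as described above, assumed relative to the repository root.
EXPECTED_OUTPUTS = ("results/checkpoint.pth", "convergence_plots.png")


def missing_outputs(repo_root="."):
    """Return the expected training outputs that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    print("all outputs present" if not missing else f"missing: {missing}")
```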
