Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network (Tensorflow implementation)
As a personal exercise in reading and implementing state-of-the-art (SOTA) papers, I implemented one of the leading papers in Facial Expression Recognition (FER), Deep-Emotion. As far as I know, there is no TensorFlow implementation of the paper, so I decided to go with TF as my framework of choice.
There are, however, a couple of PyTorch versions. The most popular of them is omarSayed7's unofficial implementation, DeepEmotion2019, which I forked and used as a reference.
In a nutshell, the paper proposes an attentional CNN that predicts facial expressions by focusing a classifier on the most relevant regions of the input image. This attention mechanism is achieved with a Spatial Transformer Network (STN). The STN learns a set of six transformation parameters that are then used to perform an affine transformation of the input image. In this implementation, I used kevinzakka's library to perform the spatial transformation.
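To make the STN step concrete, here is a minimal, self-contained snippet (not taken from this repo) that applies an affine warp with kevinzakka's library. I'm assuming the `spatial_transformer_network` entry point and the `(batch, 6)` theta layout described in that library's README; in the actual model, theta would come from a small localization network rather than being hard-coded:

```python
import tensorflow as tf
from stn import spatial_transformer_network as transformer

# A batch of dummy 48x48 grayscale images (FER2013-sized), just for illustration.
imgs = tf.random.normal([8, 48, 48, 1])

# In the real model a localization CNN regresses theta per image; here we use
# the identity transform [1, 0, 0, 0, 1, 0] for every image in the batch.
theta = tf.tile(tf.constant([[1.0, 0.0, 0.0, 0.0, 1.0, 0.0]]), [8, 1])

# Warp the input with the affine parameters.
warped = transformer(imgs, theta)  # same spatial size as imgs unless out_dims is given
print(warped.shape)                # (8, 48, 48, 1)
```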
In the proposed model, a feature extractor works in parallel with the STN to generate a feature map that is fed to a classification layer for emotion inference.
The model architecture in the PyTorch implementation differs slightly from the one described in the paper (in aspects such as the input image flow, kernel initialization, regularization, and hyperparameters). I tried to mirror the paper as closely as possible and made suitable changes. In addition, I worked with a couple of assumptions, as I was unsure of certain specifics of the model architecture described in the paper. For this reason, the implementation might not be exactly what the authors intended; however, I have added comments in the code at all such places explaining my reasoning.
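For intuition, here is a rough Keras sketch of how the two branches might be wired together under one reading of the paper. This is not the exact code in this repo: the layer sizes are placeholders, the feature-map warp reflects my interpretation of the parallel design, and the `stn` package's `spatial_transformer_network` entry point is assumed:

```python
import tensorflow as tf
from stn import spatial_transformer_network as transformer

def build_deep_emotion_sketch(input_shape=(48, 48, 1), num_classes=7):
    """Rough sketch of the parallel feature-extraction / STN design (placeholder sizes)."""
    inputs = tf.keras.Input(shape=input_shape)

    # Feature-extraction branch.
    feat = tf.keras.layers.Conv2D(10, 3, activation="relu")(inputs)
    feat = tf.keras.layers.Conv2D(10, 3, activation="relu")(feat)
    feat = tf.keras.layers.MaxPooling2D(2)(feat)
    feat = tf.keras.layers.Conv2D(10, 3, activation="relu")(feat)
    feat = tf.keras.layers.Conv2D(10, 3, activation="relu")(feat)
    feat = tf.keras.layers.MaxPooling2D(2)(feat)        # -> 9x9x10 for 48x48 inputs

    # Localization branch: regresses the 6 affine parameters from the raw input.
    loc = tf.keras.layers.Conv2D(8, 7, activation="relu")(inputs)
    loc = tf.keras.layers.MaxPooling2D(2)(loc)
    loc = tf.keras.layers.Conv2D(10, 5, activation="relu")(loc)
    loc = tf.keras.layers.MaxPooling2D(2)(loc)
    loc = tf.keras.layers.Flatten()(loc)
    loc = tf.keras.layers.Dense(32, activation="relu")(loc)
    # Bias initialized to the identity transform so training starts with no warp.
    theta = tf.keras.layers.Dense(
        6,
        kernel_initializer="zeros",
        bias_initializer=lambda shape, dtype=None: tf.constant(
            [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], dtype=dtype, shape=shape),
    )(loc)

    # Warp the feature map with the predicted transform, then classify.
    # The flattened size (9 * 9 * 10) matches the placeholder layers above.
    warped = tf.keras.layers.Lambda(
        lambda t: tf.reshape(transformer(t[0], t[1]), [-1, 9 * 9 * 10])
    )([feat, theta])
    x = tf.keras.layers.Dense(50, activation="relu")(warped)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```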
This implementation uses the FER2013 dataset, downloaded from Kaggle (see the dataset setup steps below).
Make sure you have the following libraries installed:
- tensorflow >= 2.13.0
- stn == 1.0.1
- pandas
- pillow
- tqdm
This repository is organized as follows:
- `main`: Contains setup for the dataset and the training loop.
- `deep_emotion`: Defines the model class.
- `generate_data`: Sets up the dataset.
Clone the repository and follow these steps.
This repository was tested using `python==3.9.12` and `pip==24.0` on a Windows machine. To set up the environment, create and activate a virtual environment (`virtualenv --python=python3.9.12 venv`, then `venv/Scripts/activate`) and run:
pip install -r requirements.txt
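For reference, the `requirements.txt` would contain entries along these lines, mirroring the prerequisites listed above (exact pins may differ):

```
tensorflow>=2.13.0
stn==1.0.1
pandas
pillow
tqdm
```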
- Download the dataset from Kaggle.
- Decompress `train.csv` and `test.csv` into the `./data` folder within the repo.
Open a terminal and run:
python main.py [-s [True]] [-d [data_path]]
--setup Set up the dataset for the first time
--data Data folder that contains data files
For example,
python main.py -s True -d data
This will produce images from the .csv files downloaded from Kaggle and split them into training and validation sets.
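For context, the standard Kaggle FER2013 CSV stores each sample as an `emotion` label plus a `pixels` column of 2304 space-separated grayscale values (one 48x48 image). The setup step presumably does something along these lines; this is a simplified sketch, not the repo's actual `generate_data` code, and the output paths are illustrative:

```python
import numpy as np
import pandas as pd
from PIL import Image

# Assumed FER2013 layout: an 'emotion' label column and a 'pixels' column of
# 2304 space-separated grayscale values per 48x48 image.
df = pd.read_csv("data/train.csv")

for i, row in df.iterrows():
    pixels = np.array(row["pixels"].split(), dtype=np.uint8).reshape(48, 48)
    # Illustrative output path; the real script handles the train/validation split.
    Image.fromarray(pixels, mode="L").save(f"data/train/{row['emotion']}_{i}.png")
```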
Set hyperparameters
python main.py [-t] [--data [data_path]] [--hparams [hyperparams]]
[--epochs] [--learning_rate] [--batch_size]
--data Data folder that contains training and validation files
--train True when training
--hparams True when changing the hyperparameters
--epochs Number of epochs
--learning_rate Learning rate value
--batch_size Training/validation batch size
For example, to specify your own hyperparameters, run:
python main.py -t True -d data --hparams True --epochs 5 --learning_rate 0.005 --batch_size 32
To use default hyperparameters (as specified in the paper), run:
python main.py -t True -d data