Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion

Please replace Table 6 in the paper

Please replace Table 6 of the paper with this table.

Basic structure of the code

Inspiration from fairseq

This code structure is built on top of the Fairseq interface.
Fairseq is an open-source project by the Facebook AI team that combines different SOTA architectures for sequential data processing.
It also includes SOTA optimization mechanisms such as early stopping, warmup learning rates, and learning rate schedulers.
We are developing our own architecture to be compatible with the Fairseq interface.
For more background, please read the paper published about the Fairseq interface.
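As a rough illustration of two of the optimization mechanisms mentioned above, the sketch below implements a linear learning-rate warmup and a patience-based early stopper in plain Python. It is not Fairseq's code: the names `warmup_lr` and `EarlyStopper` and their parameters are hypothetical and chosen only for this example.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup: ramp from 0 to base_lr over warmup_steps, then hold."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)


class EarlyStopper:
    """Signal a stop once the validation loss has failed to improve
    for `patience` consecutive checks."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, valid_loss):
        if valid_loss < self.best:
            # New best loss: remember it and reset the counter.
            self.best = valid_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

In a training loop, `warmup_lr` would be queried once per update and `should_stop` once per validation pass.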

Merging of our own architecture with Fairseq interface

This can be a bit tricky in the beginning. First, it is important to understand that Fairseq is built so that all architectures can be accessed through terminal commands (args).
Since our architecture shares many properties with the transformer architecture, we followed a tutorial that describes how to use RoBERTa for a custom classification task.
We built our architecture by adding new code to the following directories in the Fairseq interface:
- fairseq/data
- fairseq/models
- fairseq/modules
- fairseq/tasks
- fairseq/criterions
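Fairseq makes components selectable from the command line by registering them under a name with decorators such as `register_model` and `register_model_architecture`. The sketch below is a simplified, standalone imitation of that pattern and does not import Fairseq. The registries, the toy model, and the names `toy_emotion_model` and `toy_emotion_large` are hypothetical and exist only for this example.

```python
import argparse

MODEL_REGISTRY = {}  # model name -> model class
ARCH_REGISTRY = {}   # architecture name -> (model class, defaults function)


def register_model(name):
    """Decorator that records a model class under a string name."""
    def wrapper(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"duplicate model name: {name}")
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper


def register_model_architecture(model_name, arch_name):
    """Decorator that binds a named set of default hyperparameters to a model."""
    def wrapper(fn):
        ARCH_REGISTRY[arch_name] = (MODEL_REGISTRY[model_name], fn)
        return fn
    return wrapper


@register_model("toy_emotion_model")
class ToyEmotionModel:
    def __init__(self, args):
        self.encoder_layers = args.encoder_layers


@register_model_architecture("toy_emotion_model", "toy_emotion_large")
def toy_emotion_large(args):
    # Fill in defaults only for options the user did not set.
    if args.encoder_layers is None:
        args.encoder_layers = 2


def build_model(argv):
    """Resolve --arch to a registered model and construct it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--arch", choices=sorted(ARCH_REGISTRY), required=True)
    parser.add_argument("--encoder-layers", type=int, default=None)
    args = parser.parse_args(argv)
    model_cls, apply_defaults = ARCH_REGISTRY[args.arch]
    apply_defaults(args)
    return model_cls(args)
```

Calling `build_model(["--arch", "toy_emotion_large"])` mirrors how a flag such as `--arch` selects a registered architecture; new tasks and criterions follow the same register-then-select idea.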

Main scripts of the code

Our main scripts are categorized into the following parts:

- The custom dataloader for loading raw audio, face frames, and text is in fairseq/data/raw_audio_text_video_dataset.py
- The emotion prediction task, analogous to other tasks such as translation, is in fairseq/tasks/emotion_prediction.py
- The custom architecture of our model, similar to RoBERTa and wav2vec, is in fairseq/models/mulT_emo.py
- To obtain inter-modal attention, we slightly modify the self-attention architecture. The changes are in fairseq/modules/transformer_multi_encoder.py and fairseq/modules/transformer_layer.py
- Finally, the custom loss function is in fairseq/criterions/emotion_prediction_cri.py
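To make the idea of inter-modal attention more concrete, here is a minimal sketch of scaled dot-product attention in which the queries come from one modality and the keys and values come from another. This is the generic cross-attention formulation, assumed here for illustration; the repository's actual implementation lives in the module files listed above and may differ. The function names and the toy inputs are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Attend from one modality (queries) to another (keys/values).

    queries: list of Tq vectors of size d   (e.g. text positions)
    keys:    list of Tk vectors of size d   (e.g. audio frames)
    values:  list of Tk vectors of size dv  (e.g. audio frames)
    Returns a list of Tq vectors of size dv.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    dv = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key in the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's values.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dv)]
        )
    return outputs


# Toy example: 2 "text" positions attending over 3 "audio" frames.
text = [[1.0, 0.0], [0.0, 1.0]]
audio_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_values = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
fused = cross_modal_attention(text, audio_keys, audio_values)
```

The output has one vector per query position, so the querying modality keeps its sequence length while absorbing information from the other modality.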

Prerequisite models

Our model uses pretrained SSL models to extract features. It is important to download their checkpoints before starting the training procedure. Please use the following links to download the pretrained SSL models.

For audio features - wav2vec
For facial features - Fabnet
For sentence (text) features - RoBERTa

Training Command

python train.py --data ./T_data-old/mosei_sent --restore-file None --task emotion_prediction --reset-optimizer --reset-dataloader --reset-meters --init-token 0 --separator-token 2 --arch robertEMO_large --criterion emotion_prediction_cri --num-classes 1 --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 --clip-norm 0.0 --lr 1e-03 --max-epoch 32 --best-checkpoint-metric loss --encoder-layers 2 --encoder-attention-heads 4 --max-sample-size 150000 --max-tokens 150000000 --batch-size 4 --encoder-layers-cross 2 --max-positions-t 512 --max-positions-a 936 --max-positions-v 301 --no-epoch-checkpoints --update-freq 2 --find-unused-parameters --ddp-backend=no_c10d --lr-scheduler reduce_lr_on_plateau --regression-target-mos
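The command is long, so it can help to inspect it programmatically. The sketch below parses a subset of the flags above into a dictionary and computes the effective batch size per update, relying on the fact that `--update-freq` accumulates gradients over that many batches. The helper names `parse_flags` and `effective_batch_size`, and the `num_gpus` parameter, are hypothetical and not part of the repository.

```python
import shlex

# A subset of the training command above, kept short for readability.
COMMAND = (
    'python train.py --task emotion_prediction --arch robertEMO_large '
    '--batch-size 4 --update-freq 2 --adam-betas "(0.9, 0.98)" '
    '--ddp-backend=no_c10d --reset-optimizer --lr 1e-03'
)


def parse_flags(command):
    """Map each --flag to its value, or to True for boolean switches."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            if "=" in tok:
                # Form: --name=value
                name, value = tok.split("=", 1)
                flags[name] = value
            elif i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                # Form: --name value
                flags[tok] = tokens[i + 1]
                i += 1
            else:
                # Form: --switch
                flags[tok] = True
        i += 1
    return flags


def effective_batch_size(flags, num_gpus=1):
    """Samples contributing to one optimizer step (num_gpus is an assumption)."""
    return int(flags["--batch-size"]) * int(flags["--update-freq"]) * num_gpus


flags = parse_flags(COMMAND)
print(effective_batch_size(flags))
```

With `--batch-size 4` and `--update-freq 2`, each optimizer step on a single GPU sees 8 samples.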

Validation Command

CUDA_VISIBLE_DEVICES=1 python validate.py --data ./T_data/emocap --path './checkpoints/checkpoint_best.pt' --task emotion_prediction --valid-subset test --batch-size 4

beeldengeluid/Self-Supervised-Embedding-Fusion-Transformer
