XFlow

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification
IEEE Transactions on Neural Networks and Learning Systems 2019, IEEE ICDL-EPIROB Workshop on Computational Models for Crossmodal Learning (CMCML) 2017, ARM Research Summit 2017
Cătălina Cangea, Petar Veličković, Pietro Liò

We propose XFlow, a set of cross-modal deep learning architectures that allow dataflow between several feature extractors. Our models derive more interpretable features and achieve better performance than models that do not exchange representations. They represent a novel method for exchanging information across modalities before features are learned from the individual modalities, usefully exploiting correlations between audio and visual data, which have different dimensionalities and are nontrivially exchangeable. We also provide the research community with Digits, a new dataset consisting of three data types extracted from videos of people saying the digits 0-9. Results show that both cross-modal architectures outperform their baselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits datasets, achieving state-of-the-art results.
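To make the idea of exchanging representations between streams of different dimensionality more concrete, here is a toy sketch in plain Python. It is not the code from this repository: the helper names (`linear`, `visual_to_audio`, `audio_to_visual`), the tensor sizes and the random weights are all assumptions chosen for illustration, and the real models are built with Keras layers.

```python
import random


def linear(vec, weights):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_weights(n_out, n_in, rng):
    """Create an n_out x n_in matrix of small random weights (illustrative only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def visual_to_audio(feature_map, weights):
    """2D -> 1D: flatten the feature map, then project it to the audio feature size."""
    flat = [value for row in feature_map for value in row]
    return linear(flat, weights)


def audio_to_visual(features, weights, height, width):
    """1D -> 2D: project to height * width values, then reshape them into a grid."""
    flat = linear(features, weights)
    return [flat[r * width:(r + 1) * width] for r in range(height)]


rng = random.Random(0)
H, W, D = 4, 4, 6  # toy sizes, not taken from the paper

visual = [[rng.random() for _ in range(W)] for _ in range(H)]  # stand-in 2D visual features
audio = [rng.random() for _ in range(D)]                       # stand-in 1D audio features

w_v2a = random_weights(D, H * W, rng)
w_a2v = random_weights(H * W, D, rng)

# Each stream receives a transformed copy of the other stream's representation.
audio_next = audio + visual_to_audio(visual, w_v2a)          # concatenate along the feature axis
visual_next = [visual, audio_to_visual(audio, w_a2v, H, W)]  # stack as two channels

print(len(audio_next), len(visual_next), len(visual_next[0]), len(visual_next[0][0]))
```

The point of the sketch is only that a 2D feature map and a 1D feature vector cannot be merged directly: each cross-connection needs its own transformation before the representations can be combined.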

Getting started

$ git clone https://github.com/catalina17/XFlow
$ virtualenv -p python3 xflow
$ source xflow/bin/activate
$ pip install tensorflow-gpu==1.8.0
$ pip install keras==2.1.4

Dataset

The Digits benchmark data can be found here. After extracting the archive into a directory of your choice, set BASE_DIR (declared in Datasets/data_config.py) to the path of that directory.
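As a rough illustration of the change, the assignment might look like the sketch below. Only the name BASE_DIR and the file Datasets/data_config.py come from this README; the example path and the `check_base_dir` helper are hypothetical and are included only as a quick sanity check before training.

```python
import os

# Hypothetical location: replace with the directory where you extracted the archive.
BASE_DIR = os.path.expanduser("~/data/digits")


def check_base_dir(path):
    """Return True if `path` is an existing directory that contains at least one entry."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


if not check_base_dir(BASE_DIR):
    print("BASE_DIR does not point to a non-empty directory:", BASE_DIR)
```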

Running the models

The script eval.py contains command-line arguments for models and datasets. For example, you can run the {CNN x MLP}--LSTM baseline on Digits as follows:

CUDA_VISIBLE_DEVICES=0 python eval.py --model=cnn_mlp_lstm_baseline --dataset=digits --batch_size=64
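For readers unfamiliar with this style of invocation, the sketch below shows how flags such as these are commonly parsed with Python's standard argparse module. It is an assumption about the general pattern, not a copy of eval.py: the flag names come from the command above, while the help strings, the default batch size and the `build_parser` function are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the three flags used in the example command (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative stand-in for eval.py argument parsing")
    parser.add_argument("--model", type=str, required=True,
                        help="model identifier, e.g. cnn_mlp_lstm_baseline")
    parser.add_argument("--dataset", type=str, required=True,
                        help="dataset identifier, e.g. digits")
    parser.add_argument("--batch_size", type=int, default=64,
                        help="mini-batch size (the default here is an assumption)")
    return parser


# Parse the same arguments as in the example command, passed as an explicit list.
args = build_parser().parse_args(
    ["--model=cnn_mlp_lstm_baseline", "--dataset=digits", "--batch_size=64"]
)
print(args.model, args.dataset, args.batch_size)
```

Setting CUDA_VISIBLE_DEVICES=0 before the command restricts the process to the first GPU; it is an environment variable read by CUDA, not an argument to the script.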

Citation

Please cite us if you use XFlow and/or the Digits dataset, or if they inspire your work:

@ARTICLE{8894404,
  author={C. {Cangea} and P. {Veličković} and P. {Liò}},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  title={XFlow: Cross-Modal Deep Neural Networks for Audiovisual Classification},
  year={2019},
  volume={}, number={}, pages={1-10},
}
