Implementation in Python with PyTorch.
By Philippe Beardsell and Chih-Chao Hsu.
- Structured Prediction Energy Networks (SPEN) (Belanger & McCallum, 2015)
  (based on David Belanger's Lua implementation at https://github.com/davidBelanger/SPEN):
  we built SPEN for multi-label classification, image segmentation, and image tagging.
- Deep Value Networks (DVN) (Gygli et al., 2017)
  (based on the authors' TensorFlow implementation at https://github.com/gyglim/dvn):
  we built DVN for multi-label classification, image segmentation, and image tagging;
  a minimal sketch of the DVN idea follows this list.
- Baseline models:
  - Feature network (SPEN): a multi-layer perceptron that computes a feature representation of the inputs for multi-label classification.
  - Fully Convolutional Network (FCN) for image segmentation.
  - Unary model for image tagging: an AlexNet pretrained on ImageNet and fine-tuned on the MIRFLICKR25k dataset to make independent predictions for each tag of an image, taken from "Deep Structured Prediction with Nonlinear Output Transformations" (Graber et al., 2018).
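For intuition, here is a minimal, hypothetical sketch of the DVN idea in PyTorch: a value network trained to estimate the task metric (e.g., F1 or IoU) of a candidate labeling, with inference done by gradient ascent on a relaxed label vector. The layer sizes, step count, and step size below are illustrative assumptions, not the settings used in this repository.

```python
# Illustrative sketch only, not this repository's actual code.
# Hyperparameters (n_hidden, num_steps, step_size) are assumptions.
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Scores an (input, label) pair; trained to regress the true task metric."""
    def __init__(self, n_features, n_labels, n_hidden=150):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features + n_labels, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, x, y):
        # y is a relaxed label vector in [0, 1]^n_labels
        return self.net(torch.cat([x, y], dim=1)).squeeze(1)

def infer(value_net, x, n_labels, num_steps=30, step_size=0.5):
    """Gradient ascent on the predicted value, starting from y = 0.5."""
    y = torch.full((x.size(0), n_labels), 0.5, requires_grad=True)
    for _ in range(num_steps):
        v = value_net(x, y).sum()
        grad, = torch.autograd.grad(v, y)
        # Ascend the value estimate, then clamp back into the unit box.
        y = (y + step_size * grad).clamp(0, 1).detach().requires_grad_(True)
    return (y > 0.5).float()  # threshold the relaxed labels
```

The "Ground Truth" and "Adversarial" variants in the tables below differ in how the candidate labelings used to train the value network are generated; inference is the same gradient ascent in both cases.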
We were able to reproduce the authors' DVN results on Bibtex (F1 of 44.91% on the test set). Our SPEN results are also close to the paper's: an F1 score of 41.6% on the test set, versus 42.2% reported by the authors.
- F1 Score (%) on the Bibtex dataset (higher is better):
Model | Ours | Paper |
---|---|---|
MLP | 38.9 | 38.9 |
SPEN | 41.6 | 42.2 |
DVN + Ground Truth | 42.9 | N/A |
DVN + Adversarial | 44.9 | 44.7 |
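The F1 above is, we assume, the example-based multi-label F1 used in this line of work: precision and recall are computed per example, and the resulting F1 is averaged over the test set. A minimal sketch:

```python
# Sketch of example-averaged multi-label F1 (our assumption about the
# metric behind the table above). Inputs are binary {0, 1} tensors of
# shape (batch, n_labels).
import torch

def f1_score(pred, target, eps=1e-8):
    tp = (pred * target).sum(dim=1)
    precision = tp / (pred.sum(dim=1) + eps)
    recall = tp / (target.sum(dim=1) + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return f1.mean()  # average over examples
```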
Weizmann Horses dataset available at https://avaminzhang.wordpress.com/2012/12/07/%E3%80%90dataset%E3%80%91weizmann-horses/
- IoU (%) on the Weizmann Horses dataset (higher is better):
Model | Ours | Paper |
---|---|---|
FCN | 74.6 | 78.6 |
SPEN | 73 | N/A |
DVN + Ground Truth | 76 | 76.7 |
DVN + Adversarial | 73 | 84.1 |
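The IoU here is the overlap between the predicted and ground-truth foreground masks, averaged over images. A minimal sketch, assuming binary {0, 1} masks of shape (batch, H, W):

```python
# Sketch of mean intersection-over-union for binary segmentation masks.
import torch

def iou(pred, target, eps=1e-8):
    intersection = (pred * target).sum(dim=(1, 2))
    union = ((pred + target) > 0).float().sum(dim=(1, 2))
    # eps keeps the ratio defined when both masks are empty.
    return ((intersection + eps) / (union + eps)).mean()
```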
MIRFLICKR25k dataset available at http://press.liacs.nl/mirflickr/mirdownload.html
We compare our results to the NLTop model from "Deep Structured Prediction with Nonlinear Output Transformations" (Graber et al., 2018). We did not spend much time on hyperparameter optimization or on trying different alternatives, which may explain why our structured models underperform a simple unary model trained to make independent predictions for each label.
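For reference, a hypothetical sketch of such a unary baseline with torchvision: load an ImageNet-pretrained AlexNet, swap its final layer for one logit per tag, and train with an independent binary cross-entropy per label. The tag count below is a placeholder assumption; set it to the dataset's actual label count.

```python
# Hypothetical sketch of the unary baseline: ImageNet-pretrained AlexNet
# with its last layer swapped for independent per-tag logits.
# N_TAGS = 24 is an assumption; use the dataset's actual label count.
import torch.nn as nn
from torchvision import models

N_TAGS = 24
unary = models.alexnet(pretrained=True)
unary.classifier[6] = nn.Linear(4096, N_TAGS)  # replace the 1000-way head
criterion = nn.BCEWithLogitsLoss()  # independent prediction per tag
```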
- Hamming Loss on the validation set of the Flickr dataset (lower is better):
Model | Ours (10k training set) | Paper (10k training set) | Ours (1k training set) |
---|---|---|---|
Unary (Pretrained AlexNet) | 2.16 | 2.18 | 2.69 |
SPEN | 2.24 | N/A | 2.51 |
DVN + Ground Truth | 2.22 | N/A | 2.47 |
DVN + Adversarial | 2.3 | N/A | N/A |
NLTop (Graber et al. 2018) | N/A | 1.98 | N/A |
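The Hamming loss is the fraction of labels predicted incorrectly, averaged over examples; we assume the table reports it multiplied by 100. A minimal sketch:

```python
# Sketch of the Hamming loss: fraction of mispredicted labels, scaled by
# 100 (our assumption about the table's units). Inputs are binary {0, 1}
# tensors of shape (batch, n_labels).
import torch

def hamming_loss(pred, target):
    return 100.0 * (pred != target).float().mean()
```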