Model Zoo

Check out the model zoo documentation for details.

To acquire a model:

1. Download the model gist with ./scripts/download_model_from_gist.sh <gist_id> <dirname> to fetch the model metadata, architecture, solver configuration, and so on. (<dirname> is optional and defaults to caffe/models.)
2. Download the model weights with ./scripts/download_model_binary.py <model_dir>, where <model_dir> is the gist directory from the first step.

or visit the model zoo documentation (https://caffe.berkeleyvision.org/model_zoo.html) for complete instructions.
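
The two download steps can also be scripted. The following is a minimal Python sketch, assuming it is run from the Caffe root directory. Only the two script paths come from the instructions above; the helper names, the placeholder gist ID, and the example directories are hypothetical.

```python
import subprocess


def build_download_commands(gist_id, model_dir, dirname=None):
    """Return the two commands from the steps above as argument lists.

    gist_id and model_dir are supplied by the caller; dirname is optional,
    mirroring the optional <dirname> argument of the gist script.
    """
    gist_cmd = ["./scripts/download_model_from_gist.sh", gist_id]
    if dirname is not None:
        gist_cmd.append(dirname)
    weights_cmd = ["./scripts/download_model_binary.py", model_dir]
    return [gist_cmd, weights_cmd]


def download_model(gist_id, model_dir, dirname=None):
    """Run both steps in order, stopping if either command fails."""
    for cmd in build_download_commands(gist_id, model_dir, dirname):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: print the commands instead of running them.
    for cmd in build_download_commands("<gist_id>", "<model_dir>"):
        print(" ".join(cmd))
```

Building the argument lists separately from running them makes it easy to inspect the commands before anything is downloaded.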

Table of Contents

Berkeley-trained models
Network in Network model
Models from the BMVC-2014 paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets"
Models used by the VGG team in ILSVRC-2014
Places-CNN model from MIT.
GoogLeNet GPU implementation from Princeton.
Fully Convolutional Networks for Semantic Segmentation (FCNs)
CaffeNet fine-tuned for Oxford flowers dataset
CNN Models for Salient Object Subitizing.
Deep Learning of Binary Hash Codes for Fast Image Retrieval
Places_CNDS_models on Scene Recognition
Models for Age and Gender Classification.
More Models for Age and Gender Classification.
GoogLeNet_cars on car model classification
ParseNet: Looking wider to see better
SegNet and Bayesian SegNet
Conditional Random Fields as Recurrent Neural Networks
Holistically-Nested Edge Detection
CCNN: Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns
Facial Landmark Detection with Tweaked Convolutional Neural Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
ResNets: Deep Residual Networks from MSRA at ImageNet and COCO 2015
Pascal VOC 2012 Multilabel Classification Model
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
Mixture DCNN
CNN Object Proposal Models for Salient Object Detection
Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled
Multimodal Compact Bilinear Pooling for VQA
Pose-Aware CNN Models (PAMs) for Face Recognition
Learning Structured Sparsity in Deep Neural Networks
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
Inception-BN full ImageNet model
ResFace101: ResNet-101 for Face Recognition
DeepYeast
ImageNet pre-trained models with batch normalization
ResNet-101 for regressing 3D morphable face models (3DMM) from single images
Cascaded Fully Convolutional Networks for Biomedical Image Segmentation
Deep Networks for Earth Observation
Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks
Striving for Simplicity: The All Convolutional Net
VGG 4x without degradation: Channel Pruning for Accelerating Very Deep Neural Networks
Using Ranking-CNN for Age Estimation
Lets Keep it Simple: Using Simple Architectures to Outperform Deeper and More Complex Architectures
Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet

Berkeley-trained models

Finetuning on Flickr Style: same as provided in models/, but listed here as a Gist for an example.
BVLC GoogleNet: models/bvlc_googlenet

Network in Network model

The Network in Network model is described in the following ICLR-2014 paper:

Network In Network
M. Lin, Q. Chen, S. Yan
International Conference on Learning Representations, 2014 (arXiv:1312.4400)

Please cite the paper if you use the models.

Models:

NIN-Imagenet: a small (29 MB) ImageNet model that performs slightly better than AlexNet and is fast to train. (Note: a more Caffe-compatible version with the correct convolutional weight shapes is available at https://drive.google.com/folderview?id=0B0IedYUunOQINEFtUi1QNWVhVVU&usp=drive_web)
NIN-CIFAR10: NIN model on CIFAR10, originally published in the paper Network In Network. The error rate of this model is 10.4% on CIFAR10.

Models from the BMVC-2014 paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets"

The models are trained on the ILSVRC-2012 dataset. The details can be found on the project page or in the following BMVC-2014 paper:

Return of the Devil in the Details: Delving Deep into Convolutional Nets
K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman
British Machine Vision Conference, 2014 (arXiv:1405.3531)

Please cite the paper if you use the models.

Models:

VGG_CNN_S: 13.1% top-5 error on ILSVRC-2012-val
VGG_CNN_M: 13.7% top-5 error on ILSVRC-2012-val
VGG_CNN_M_2048: 13.5% top-5 error on ILSVRC-2012-val
VGG_CNN_M_1024: 13.7% top-5 error on ILSVRC-2012-val
VGG_CNN_M_128: 15.6% top-5 error on ILSVRC-2012-val
VGG_CNN_F: 16.7% top-5 error on ILSVRC-2012-val

Models used by the VGG team in ILSVRC-2014

The models are the improved versions of the models used by the VGG team in the ILSVRC-2014 competition. The details can be found on the project page or in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556

Please cite the paper if you use the models.

Models:

16-layer: 7.5% top-5 error on ILSVRC-2012-val, 7.4% top-5 error on ILSVRC-2012-test
19-layer: 7.5% top-5 error on ILSVRC-2012-val, 7.3% top-5 error on ILSVRC-2012-test

In the paper, the models are denoted as configurations D and E, trained with scale jittering. The combination of the two models achieves 7.1% top-5 error on ILSVRC-2012-val, and 7.0% top-5 error on ILSVRC-2012-test.

Places-CNN model from MIT.

Places CNN is described in the following NIPS 2014 paper:

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva
Learning Deep Features for Scene Recognition using Places Database.
Advances in Neural Information Processing Systems 27 (NIPS) spotlight, 2014.

The project page is here

Models:

Places205-AlexNet: CNN trained on 205 scene categories of Places Database (used in NIPS'14) with ~2.5 million images. The architecture is the same as Caffe reference network.
Hybrid-CNN: CNN trained on 1183 categories (205 scene categories from Places Database and 978 object categories from the train data of ILSVRC2012 (ImageNet)) with ~3.6 million images. The architecture is the same as Caffe reference network.
Places205-GoogLeNet: GoogLeNet CNN trained on 205 scene categories of Places Database. It is used by Google in the deep dream visualization.

GoogLeNet GPU implementation from Princeton.

We implemented GoogLeNet using a single GPU. Our main contribution is an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations.
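
The gradient-accumulation idea can be illustrated independently of Caffe. The sketch below is a toy example in plain Python, not the Princeton implementation: it assumes a one-parameter least-squares model and made-up data, and shows that averaging the gradients of two equally sized half-batches reproduces the gradient of the full batch, so only half of the batch has to be held in memory at a time.

```python
def mean_squared_error_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over a batch of (x, y)."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, num_steps=2):
    """Split the batch into equal chunks and average the per-chunk gradients.

    Assumes len(batch) is divisible by num_steps.
    """
    chunk = len(batch) // num_steps
    total = 0.0
    for step in range(num_steps):
        part = batch[step * chunk:(step + 1) * chunk]
        total += mean_squared_error_grad(w, part)
    return total / num_steps


# Toy data (hypothetical): four (x, y) pairs and an arbitrary weight.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0)]
w = 0.5
full = mean_squared_error_grad(w, data)
accum = accumulated_grad(w, data, num_steps=2)
print(full, accum)
```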

Please check http://3dvision.princeton.edu/pvt/GoogLeNet/ for more information. Pre-trained models on ImageNet and Places, and the training code are available for download.
Make sure cls2_fc2 and cls3_fc have num_output = 1000 in the prototxt. Otherwise, the trained model would crash on test.

Fully Convolutional Networks for Semantic Segmentation (FCNs)

These models are described in the paper:

Fully Convolutional Networks for Semantic Segmentation
Jonathan Long*, Evan Shelhamer*, Trevor Darrell
CVPR 2015
arXiv:1411.4038

Details, model definitions, pre-trained weights, and code are public on github: https://github.com/shelhamer/fcn.berkeleyvision.org.

These models are compatible with Caffe master, unlike earlier FCNs that required a pre-release branch (note: this reference edition of the models is still in progress and not all of the models have yet been ported to master). The models are available under the same license as the Caffe-bundled models (i.e., for unrestricted use; see http://caffe.berkeleyvision.org/model_zoo.html#bvlc-model-license).

CaffeNet fine-tuned for Oxford flowers dataset

https://gist.github.com/jimgoo/0179e52305ca768a601f

This is the reference CaffeNet (modified AlexNet) fine-tuned for the Oxford 102 category flower dataset. The number of outputs in the inner product layer has been set to 102 to reflect the number of flower categories. Hyperparameter choices reflect those in Fine-tuning CaffeNet for Style Recognition on “Flickr Style” Data. The global learning rate is reduced, while the learning rate for the final fully connected layer is increased relative to the other layers.

After 50,000 iterations, the top-1 error is 7% on the test set of 1,020 images.

I0215 15:28:06.417726  6585 solver.cpp:246] Iteration 50000, loss = 0.000120038
I0215 15:28:06.417789  6585 solver.cpp:264] Iteration 50000, Testing net (#0)
I0215 15:28:30.834987  6585 solver.cpp:315]     Test net output #0: accuracy = 0.9326
I0215 15:28:30.835072  6585 solver.cpp:251] Optimization Done.
I0215 15:28:30.835083  6585 caffe.cpp:121] Optimization Done.
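
The reported error follows from the accuracy in the log: 1 - 0.9326 = 0.0674, or about 7%. A minimal Python sketch of that arithmetic, assuming the log line format shown above (the helper name `top1_error_from_log` is hypothetical):

```python
import re


def top1_error_from_log(log_text):
    """Return 1 - accuracy for the last 'accuracy = <value>' entry in a log."""
    matches = re.findall(r"accuracy = ([0-9]*\.?[0-9]+)", log_text)
    if not matches:
        raise ValueError("no accuracy entry found in log")
    return 1.0 - float(matches[-1])


log_line = (
    "I0215 15:28:30.834987  6585 solver.cpp:315]     "
    "Test net output #0: accuracy = 0.9326"
)
error = top1_error_from_log(log_line)
print(f"top-1 error: {error:.2%}")  # prints: top-1 error: 6.74%
```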

CNN Models for Salient Object Subitizing.

CNN subitizing models described in the following papers (project page):

Salient Object Subitizing
J. Zhang, S. Ma, M. Sameki, S. Sclaroff, M. Betke, Z. Lin, X. Shen, B. Price and R. Mech.
CVPR, 2015.
http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf

Salient Object Subitizing
J. Zhang, S. Ma, M. Sameki, S. Sclaroff, M. Betke, Z. Lin, X. Shen, B. Price and R. Mech.
arXiv, 2016.
http://arxiv.org/abs/1607.07525

Models:

GoogleNet: CNN model finetuned on the Extended Salient Object Subitizing dataset (~11K images) and synthetic images. This model significantly improves over our previous models. Recommended.
AlexNet: CNN model finetuned on our initial Salient Object Subitizing dataset (~5500 images). The architecture is the same as the Caffe reference network.
VGG16: CNN model finetuned on our initial Salient Object Subitizing dataset (~5500 images).

Deep Learning of Binary Hash Codes for Fast Image Retrieval

We present an effective deep learning framework to create hash-like binary codes for fast image retrieval. The details can be found in the following CVPRW'15 paper:

Deep Learning of Binary Hash Codes for Fast Image Retrieval
K. Lin, H.-F. Yang, J.-H. Hsiao, C.-S. Chen
CVPR 2015, DeepVision workshop

Please cite the paper if you use the model:

caffe-cvprw15: See our code release on Github, which allows you to train your own deep hashing model and create binary hash codes.
CIFAR10-48bit: Proposed 48-bits CNN model trained on CIFAR10.

Places_CNDS_models on Scene Recognition

Places-CNDS-8 is an "8conv3fc layer" deep convolutional neural network model trained on the MIT Places Dataset with deep supervision.

The details of training this model are described in the following report. Please cite this work if the model is useful for you.

Training Deeper Convolutional Networks with Deep Supervision
L. Wang, C. Lee, Z. Tu, S. Lazebnik, arXiv:1505.02496, 2015

Models for Age and Gender Classification.

Age/Gender.net are models for age and gender classification trained on the Adience-OUI dataset. See the Project page.

The models are described in the following paper:

Age and Gender Classification using Convolutional Neural Networks
Gil Levi and Tal Hassner
IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG),
at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015

If you find our models useful, please add suitable reference to our paper in your work.

More Models for Age and Gender Classification.

Additional models for age and gender recognition, trained on the Adience benchmark dataset, are provided on the GitHub project page belonging to the ICCV 2017 workshop paper titled "Understanding and Comparing Deep Neural Networks for Age and Gender Classification".

The provided models are based on the architectures of the following networks:

Age/Gender.net from the previous post.
BVLC Reference Caffenet
BVLC Googlenet
VGG 16

Results obtained with these models are described in the paper:

@inproceedings{lapuschkin2017understanding,
    author = {Lapuschkin, Sebastian and Binder, Alexander and M\"uller, Klaus-Robert and Samek, Wojciech},
    title = {Understanding and Comparing Deep Neural Networks for Age and Gender Classification},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW)},
    pages = {1629-1638},
    year = {2017},
    doi = {10.1109/ICCVW.2017.191},
    url = {https://doi.org/10.1109/ICCVW.2017.191}
}

If you find our models useful, please add suitable reference to our paper in your work.

GoogLeNet_cars on car model classification

GoogLeNet_cars is the GoogLeNet model pre-trained on the ImageNet classification task and fine-tuned on 431 car models in the CompCars dataset. It is described in the technical report. Please cite the following work if the model is useful for you.

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
L. Yang, P. Luo, C. C. Loy, X. Tang, arXiv:1506.08959, 2015

ParseNet: Looking wider to see better

These models are described in the paper:

ParseNet: Looking Wider to See Better
Wei Liu, Andrew Rabinovich, Alexander C. Berg
arXiv:1506.04579

To be able to train/eval ParseNet, you can refer to http://github.com/weiliu89/caffe/tree/fcn.

Modified VGGNet used to fine-tune ParseNet:

fully convolutional reduced VGGNet

Models trained on PASCAL (using extra data from Hariharan et al. and finetuned from the fully convolutional reduced VGGNet):

ParseNet PASCAL

SegNet and Bayesian SegNet

SegNet is a real-time semantic segmentation architecture for scene understanding. Code and trained models for SegNet and Bayesian SegNet are available.

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla
arXiv preprint arXiv:1511.00561, 2015.

Conditional Random Fields as Recurrent Neural Networks

Code (with MATLAB/Python API) and model are described in the ICCV 2015 paper:

Conditional Random Fields as Recurrent Neural Networks
S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, P. Torr
ICCV 2015.

Model is trained on Microsoft COCO and PASCAL (using extra data from Hariharan et al. and finetuned from the FCN-8s):

CRF-RNN PASCAL

Holistically-Nested Edge Detection

The model and code provided are described in the ICCV 2015 paper:

Holistically-Nested Edge Detection
Saining Xie and Zhuowen Tu
ICCV 2015

For details about training/evaluating HED, please take a look at http://github.com/s9xie/hed.

Model trained on BSDS-500 Dataset (finetuned from the VGGNet):

HED BSDS-500

Translating Videos to Natural Language

These models are described in this NAACL-HLT 2015 paper.

Translating Videos to Natural Language Using Deep Recurrent Neural Networks 
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, K. Saenko   
NAACL-HLT 2015

More details can be found on this project page.

Model:
Video2Text_VGG_mean_pool: This model is an improved version of the mean pooled model described in the NAACL-HLT 2015 paper. It uses video frame features from the 16-layer VGG model. It is trained only on the YouTube video dataset.

Compatibility: These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. The models are currently supported by the recurrent branch of the Caffe fork provided at https://github.com/jeffdonahue/caffe/tree/recurrent and https://github.com/vsubhashini/caffe/tree/recurrent.

VGG Face CNN descriptor

These models are described in this BMVC 2015 paper (http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf).

Deep Face Recognition 
Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman    
BMVC 2015

More details can be found on this project page.

Model: VGG Face: This is the very deep architecture-based model trained from scratch using 2.6 million images of celebrities collected from the web. The model was originally trained using the MatConvNet library and has been imported to work with Caffe.

If you find our models useful, please add suitable reference to our paper in your work.

Yearbook Photo Dating

Model from the ICCV 2015 Extreme Imaging Workshop paper:

A Century of Portraits: Exploring the Visual Historical Record of American High School Yearbooks 
Shiry Ginosar, Kate Rakelly, Brian Yin, Sarah Sachs, Alyosha Efros
ICCV Workshop 2015

Model and prototxt files: Yearbook

CCNN: Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

These models are described in the ICCV 2015 paper.

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
Deepak Pathak, Philipp Krähenbühl, Trevor Darrell
ICCV 2015
arXiv:1506.03648

These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. Full details, source code, models, prototxts are available here: CCNN.

Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns

We provide models for facial emotion classification for different image representations obtained using mapped binary patterns. See the Project page for more details.

The models are described in the following paper:

Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns
Gil Levi and Tal Hassner
Proc. ACM International Conference on Multimodal Interaction (ICMI), Seattle, Nov. 2015

If you find our models useful, please add suitable reference to our paper in your work.

Facial Landmark Detection with Tweaked Convolutional Neural Networks

We provide the source code and model for the article: Yue Wu and Tal Hassner, "Facial Landmark Detection with Tweaked Convolutional Neural Networks", arXiv preprint arXiv:1511.04031, 12 Nov. 2015. See the project page for more information about this project.

Written by Ishay Tubi

This software is provided as is, without any warranty, with no legal constraints. If you find our models useful, please add suitable reference to our paper in your work.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Download the pre-computed Faster R-CNN detectors:

    cd $FRCN_ROOT
    ./data/scripts/fetch_faster_rcnn_models.sh

This will populate the $FRCN_ROOT/data folder with faster_rcnn_models. See data/README.md for details. These models were trained on VOC 2007 trainval.

Reference: https://github.com/rbgirshick/py-faster-rcnn/blob/master/data/scripts/fetch_faster_rcnn_models.sh

Sequence to Sequence - Video to Text

These models are described in this ICCV 2015 paper.

Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015

More details can be found on this project page.

Model:
S2VT_VGG_RGB:
This is the S2VT (RGB) model described in the ICCV 2015 paper. It uses video frame features from the 16-layer VGG model. It is trained only on the YouTube video dataset.

Compatibility:
These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. The models are currently supported by the recurrent branch of the Caffe fork provided at https://github.com/jeffdonahue/caffe/tree/recurrent and https://github.com/vsubhashini/caffe/tree/recurrent.

ResNets: Deep Residual Networks from MSRA at ImageNet and COCO 2015

This repository contains the original models (ResNet-50, ResNet-101, and ResNet-152) described in the paper "Deep Residual Learning for Image Recognition" (http://arxiv.org/abs/1512.03385). These are the models used in the ILSVRC (http://image-net.org/challenges/LSVRC/2015/) and COCO 2015 competitions, where they won 1st place in ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

More instructions with prototxt and binary weight files are in: https://github.com/KaimingHe/deep-residual-networks

Reference:

@article{He2015,
	author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
	title = {Deep Residual Learning for Image Recognition},
	journal = {arXiv preprint arXiv:1512.03385},
	year = {2015}
}

Pascal VOC 2012 Multilabel Classification Model

This model was used for the paper "Analyzing Classifiers: Fisher Vectors and Deep Neural Networks" (http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Bach_Analyzing_Classifiers_Fisher_CVPR_2016_paper.pdf), published in the proceedings of CVPR 2016. Note that it was trained in a multilabel setting with a multilabel-compatible loss function, so it should not be used in conjunction with a softmax layer. In particular, $f_{i}(x)>0$ denotes the presence of class i, and multiple classes can be predicted in one image.
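
To make the decision rule concrete, here is a small Python sketch with made-up scores (the class names and values are hypothetical, not outputs of the model). Each class is predicted independently whenever its raw output is positive, so zero, one, or several classes can be reported for one image.

```python
def predict_labels(scores, class_names):
    """Return every class whose raw output f_i(x) is strictly positive."""
    return [name for name, s in zip(class_names, scores) if s > 0]


# Hypothetical raw outputs for one image; not produced by the actual model.
class_names = ["aeroplane", "bicycle", "person", "dog"]
scores = [-1.3, 0.8, 2.1, -0.2]

print(predict_labels(scores, class_names))  # ['bicycle', 'person']
```

A softmax would instead force the scores to compete and sum to one, which does not match a setting where several classes can be present at once.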

Downloading the Model: caffemodel prototxt

Please reference the above submission when using the model via

@inproceedings{lapuschkinCVPR16,
    title={Analyzing classifiers: Fisher vectors and deep neural networks},
    author={Lapuschkin, S. and Binder, A. and Montavon, G. and M{\"u}ller, K.-R. and Samek, W.},
    booktitle={CVPR},
    pages={2912-2920},
    year={2016}
}

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters

@article{SqueezeNet,
    Author = {Forrest N. Iandola and Matthew W. Moskewicz and Khalid Ashraf and Song Han and William J. Dally and Kurt Keutzer},
    Title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$1MB model size},
    Journal = {arXiv:1602.07360},
    Year = {2016}
}

Please cite the paper if you use the model.

Model trained on ImageNet (including weights, solver, train_val, and deploy prototxt files)

Error rate on ImageNet ILSVRC-2012 is better than or equal to the bvlc_alexnet model.

Mixture DCNN

Mixture DCNN is a novel multi-model architecture which achieves better performance than an ensemble of DCNNs as evaluated on three different fine-grained datasets. Please cite the following paper if you use these models in your research.

@inproceedings{GeWACV2016,
  author    = {ZongYuan Ge and Alex Bewley and Christopher McCool and Ben Upcroft and Peter Corke and Conrad Sanderson},
  title     = {Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks},
  booktitle = {Winter Conference on the Applications of Computer Vision (WACV)},
  publisher = {IEEE},
  year      = {2016}
}

Models
Paper

CNN Object Proposal Models for Salient Object Detection

CNN models for the following CVPR'16 paper:

Unconstrained Salient Object Detection via Proposal Subset Optimization
J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price and R. Mech. 
CVPR, 2016.

[PDF] [Webpage]

The following models are finetuned on the Salient Object Subitizing dataset (~5000 images) with bounding box annotations:

VGG16: This model is used in the paper.
GoogleNet: This model is smaller, faster and slightly better than the VGG16 model.

It is recommended that you download the full system here, which will automatically download all the needed models and data.

Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled

We provide pretrained CNN models of our CVPR'16 paper (Oral):

Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled
O. Koller, H. Ney, R. Bowden
CVPR 2016, Las Vegas, NV, USA.

[PDF] [Webpage]

Multimodal Compact Bilinear Pooling for VQA

The current state-of-the-art model for visual question answering, as described in the following paper:

@article{fukui16mcb,
  title={Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding},
  author={Fukui, Akira and Park, Dong Huk and Yang, Daylen and Rohrbach, Anna and Darrell, Trevor and Rohrbach, Marcus},
  journal={arXiv:1606.01847},
  year={2016},
}

[arXiv] [GitHub repo]

Pose-Aware CNN Models (PAMs) for Face Recognition

We provide the following:

pretrained CNN models of our CVPR'16 paper for pose-aware face recognition in the wild
IJB-A yaw estimates from our pose estimation module

(You must provide your information to proceed to the download page linked below.)

@INPROCEEDINGS{masi2016cvpr, 
    author={Iacopo Masi and Stephen Rawls and G{\'e}rard Medioni and Prem Natarajan}, 
    booktitle={CVPR}, 
    title={Pose-{A}ware {F}ace {R}ecognition in the {W}ild}, 
    year={2016}
    }

[PDF] [Webpage]

Learning Structured Sparsity in Deep Neural Networks

Train deep neural networks with structured sparsity to speed up DNNs:

@incollection{Wen_NIPS2016,
    Title = {Learning Structured Sparsity in Deep Neural Networks},
    Author = {Wen, Wei and Wu, Chunpeng and Wang, Yandan and Chen, Yiran and Li, Hai},
    bookTitle = {Advances in Neural Information Processing Systems},
    Year = {2016}
}

[arXiv] [Caffemodel] [GitHub repo]

Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks

We provide fine-tuned models for CUB200-2011 birds (AlexNet + VGG19), Oxford flowers 102 (AlexNet + VGG19), Oxford IIIT PETS (AlexNet + VGG19), and NABirds dataset (GoogLeNet). We also provide our AlexNet model which was trained on ImageNet with the Stanford dogs test data excluded.

No bounding box or part annotations were used for fine-tuning. Part-based object proposal filtering and two-step fine-tuning were used, as described in the corresponding paper:

@inproceedings{Simon15:NAC,
    author = {Marcel Simon and Erik Rodner},
    booktitle = {International Conference on Computer Vision (ICCV)},
    title = {Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks},
    year = {2015},
}

[Models] [Paper] [Github repo] [Slides]

Inception-BN full ImageNet model

Inception-v2 model trained on full ImageNet dataset with 14,197,087 images in 21,841 classes.

The model was converted from the MXNet InceptionBN-21k network trained by Mu Li. Details and evaluation results can be found at https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-21k-inception.md.

@inproceedings{Ioffe15:ArXiv,
    author = {Sergey Ioffe and Christian Szegedy},
    booktitle = {arXiv:1502.03167},
    title = {Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift},
    year = {2015}
}

[GitHub repo] [Paper] [Original MXNet network]

ResFace101: ResNet-101 for Face Recognition

This page contains ResFace101: a ResNet-101 deep network model, tuned for face recognition.

We fine-tuned this model on the publicly available CASIA WebFace set using the procedure described in I. Masi*, A. Tran*, T. Hassner*, J. Leksut, G. Medioni, "Do We Really Need to Collect Millions of Faces for Effective Face Recognition?", in Proc. of ECCV 2016.

Please remember to cite our paper below if you use our model.

@inproceedings{masi16dowe,
      title={Do {W}e {R}eally {N}eed to {C}ollect {M}illions of {F}aces 
      for {E}ffective {F}ace {R}ecognition?},
      booktitle = {European Conference on Computer Vision},
      author={Iacopo Masi 
      and Anh Tran 
      and Tal Hassner 
      and Jatuporn Toy Leksut 
      and G\'{e}rard Medioni},
      year={2016},
    }

[PDF] [Webpage]

DeepYeast

11-layer convolutional neural network trained on two-channel microscopy images of yeast cells carrying fluorescent proteins with different subcellular localizations.

[Web] [Paper] [Model]

ImageNet pre-trained models with batch normalization

CNN models pre-trained on 1000 ImageNet categories. Currently contains:

AlexNet and VGG16 with batch normalization added
Residual Networks with 50 (ResNet-50) and 10 layers (ResNet-10)

Improves over previous pre-trained models and in particular reproduces the ImageNet results of ResNet50 using Caffe. Includes ResNet generation script, training code and log files.

@article{simon2016cnnmodels,
  Author = {Simon, Marcel and Rodner, Erik and Denzler, Joachim},
  Journal = {arXiv preprint arXiv:1612.01452},
  Title = {ImageNet pre-trained models with batch normalization},
  Year = {2016}
}

[Models] [Paper] [Website]

ResNet-101 for regressing 3D morphable face models (3DMM) from single images

This project page contains a ResNet-101 deep network model for 3DMM regression (3D shape and texture).

The download includes both the network itself and the parameters required to map the 3DMM parameters regressed by the network back to 3D shapes (e.g., the basis vectors for the face shape and the average face shape).
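
As an illustration of that mapping, a linear morphable model typically reconstructs a shape as the average shape plus a weighted sum of basis vectors. The Python sketch below assumes this standard linear form and uses tiny made-up arrays; the function name, variable names, and dimensions are hypothetical and do not reflect the files in the download.

```python
def reconstruct_shape(mean_shape, basis, coeffs):
    """Return mean_shape + sum_k coeffs[k] * basis[k] (linear 3DMM form).

    mean_shape: flat list of vertex coordinates [x1, y1, z1, x2, ...]
    basis:      list of basis vectors, each the same length as mean_shape
    coeffs:     one regressed coefficient per basis vector
    """
    if len(basis) != len(coeffs):
        raise ValueError("need exactly one coefficient per basis vector")
    shape = list(mean_shape)
    for weight, vector in zip(coeffs, basis):
        for i, value in enumerate(vector):
            shape[i] += weight * value
    return shape


# Toy example (hypothetical numbers): two vertices, two basis vectors.
mean_shape = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
basis = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],
]
coeffs = [0.5, -0.25]
print(reconstruct_shape(mean_shape, basis, coeffs))
```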

If you find this useful, please remember to cite our paper below:

@inproceedings{tran2017regressing,
  title={Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network},
  author={Tran, Anh Tuan and Hassner, Tal and Masi, Iacopo and Medioni, G\'{e}rard},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}

[PDF] [Webpage]

Cascaded Fully Convolutional Networks for Biomedical Image Segmentation

These models segment the liver and liver tumors in CT volumes using the U-Net architecture proposed by Ronneberger et al. (2015). The project contains all the source code, models, and a notebook for easy liver and liver tumor inference. We encourage researchers to use our models for fine-tuning.

If you find this work useful for your research, please cite:

@Inbook{Christ2016,
title="Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields",
author="Christ, Patrick Ferdinand and Elshaer, Mohamed Ezzeldin A. and Ettlinger, Florian and Tatavarty, Sunil and Bickel, Marc and Bilic, Patrick and Rempfler, Markus and Armbruster, Marco and Hofmann, Felix and D'Anastasi, Melvin and Sommer, Wieland H. and Ahmadi, Seyed-Ahmad and Menze, Bjoern H.",
editor="Ourselin, Sebastien and Joskowicz, Leo and Sabuncu, Mert R. and Unal, Gozde and Wells, William",
bookTitle="Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="415--423",
isbn="978-3-319-46723-8",
doi="10.1007/978-3-319-46723-8_48",
url="http://dx.doi.org/10.1007/978-3-319-46723-8_48"
}

[PDF] [Webpage] [Models]

Deep Networks for Earth Observation

These models have been trained to perform semantic segmentation on aerial images, as proposed by Audebert et al. (2016). The available models are based on the SegNet architecture (Kendall et al., 2015). The project repository contains the model definitions and pre-trained weights on the city of Vaihingen.

@inproceedings{audebert_semantic_2016,
    address = {Taipei, Taiwan},
    title = {Semantic {Segmentation} of {Earth} {Observation} {Data} {Using} {Multimodal} and {Multi}-scale {Deep} {Networks}},
    url = {https://hal.archives-ouvertes.fr/hal-01360166},
    urldate = {2016-10-13},
    booktitle = {Asian {Conference} on {Computer} {Vision} ({ACCV}16)},
    author = {Audebert, Nicolas and Le Saux, Bertrand and Lefèvre, Sébastien},
    month = nov,
    year = {2016},
    keywords = {computer vision, data fusion, Earth observation, Neural networks, remote sensing},
}

[PDF] [Project]

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks

We present a simple yet effective supervised deep hash approach that constructs binary hash codes from labeled data for large-scale image search. The supervised semantics-preserving deep hashing (SSDH) approach constructs hash functions as a latent layer in a deep network, and the binary codes are learned by minimizing an objective function defined over classification error and other desirable hash code properties. This is the extended version of our CVPRW'15 paper. The details can be found in the following TPAMI'17 paper:

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks
Huei-Fang Yang, Kevin Lin, Chu-Song Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017

Please cite the paper if you use the model:

Caffe-DeepBinaryCode: See our code release on Github, which allows you to train your own deep hashing model and create binary hash codes.
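
To illustrate how binary codes of this kind are used for retrieval, the Python sketch below binarizes latent-layer activations by thresholding and ranks database items by Hamming distance. The 0.5 threshold, the activation values, and the function names are assumptions for illustration; see the paper and code release for the actual procedure.

```python
def binarize(activations, threshold=0.5):
    """Turn real-valued latent activations into a tuple of 0/1 bits."""
    return tuple(1 if a >= threshold else 0 for a in activations)


def hamming_distance(code_a, code_b):
    """Count the bit positions where two equal-length codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Hypothetical activations in [0, 1]; a real model would produce longer codes.
database = [binarize(a) for a in ([0.9, 0.1, 0.8, 0.2],
                                  [0.2, 0.7, 0.1, 0.9],
                                  [0.8, 0.3, 0.6, 0.7])]
query = binarize([0.95, 0.05, 0.7, 0.4])
print(rank_by_hamming(query, database))  # [0, 2, 1]
```

Comparing short binary codes by Hamming distance only needs bit comparisons, which is what makes this style of retrieval fast on large image collections.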

Striving for Simplicity: The All Convolutional Net

Implementation of All-CNN-C model for CIFAR-10 from the paper Striving for Simplicity: The All Convolutional Net by Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, accepted as a workshop contribution at ICLR 2015.

@article{Springenberg14,
  author    = {Jost Tobias Springenberg and Alexey Dosovitskiy and Thomas Brox and Martin A. Riedmiller},
  title     = {Striving for Simplicity: The All Convolutional Net},
  year      = {2014},
}

[arXiv] [GitHub repo]

VGG 4x without degradation: Channel Pruning for Accelerating Very Deep Neural Networks

An algorithm that effectively prunes channels in each layer and can accelerate VGG-16 by 4x without degradation. Accepted as a poster at ICCV 2017.

@article{he2017channel,
  title={Channel Pruning for Accelerating Very Deep Neural Networks},
  author={He, Yihui and Zhang, Xiangyu and Sun, Jian},
  journal={arXiv preprint arXiv:1707.06168},
  year={2017}
}

Models:

VGG-16 4x: 10.1% top-5 error and 29.4% top-1 error on ILSVRC-2012.

[PDF] [GitHub repo]
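To see why removing channels reduces the cost of a convolutional layer, a back-of-the-envelope count may help. The sketch below assumes a plain convolution, counts only multiply-accumulate operations, and applies a uniform keep ratio to made-up layer sizes. It does not reproduce how the paper selects channels, and the theoretical ratio it prints is not a measured speed-up.

```python
def conv_macs(c_in: int, c_out: int, kernel: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count of a plain convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel * out_h * out_w


def pruned_channels(channels: int, keep_ratio: float) -> int:
    """Number of channels kept under a uniform keep ratio (an assumption)."""
    return max(1, round(channels * keep_ratio))


if __name__ == "__main__":
    # Hypothetical layer: 256 -> 256 channels, 3x3 kernel, 56x56 output map.
    c_in, c_out, kernel, out_h, out_w = 256, 256, 3, 56, 56
    keep_ratio = 0.5

    before = conv_macs(c_in, c_out, kernel, out_h, out_w)
    # Pruning the previous layer's outputs shrinks this layer's inputs,
    # and pruning this layer's own outputs shrinks c_out as well.
    after = conv_macs(pruned_channels(c_in, keep_ratio),
                      pruned_channels(c_out, keep_ratio),
                      kernel, out_h, out_w)
    print(before / after)
```

Because the cost scales with the product of input and output channel counts, keeping half of the channels on both sides of a layer cuts its multiply-accumulate count to a quarter.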

Using Ranking-CNN for Age Estimation

The ranking-CNN Caffe model for the CVPR 2017 paper:

@InProceedings{Chen_2017_CVPR,
author = {Chen, Shixing and Zhang, Caojin and Dong, Ming and Le, Jialiang and Rao, Mike},
title = {Using Ranking-CNN for Age Estimation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017}
}

[PDF] [GitHub repo]

Lets Keep it Simple: Using Simple Architectures to Outperform Deeper and More Complex Architectures

This repository contains the architectures, models, logs, etc. pertaining to the SimpleNet paper. SimpleNet-V1 outperforms deeper and heavier architectures such as AlexNet, VGGNet, ResNet, and GoogLeNet on a series of benchmark datasets, including CIFAR-10/100, MNIST, and SVHN.

@article{hasanpour2016lets,
  title={Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures},
  author={Hasanpour, Seyyed Hossein and Rouhani, Mohammad and Fayyaz, Mohsen and Sabokrou, Mohammad},
  journal={arXiv preprint arXiv:1608.06037},
  year={2016}
}

[arXiv] [GitHub repo]

Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet

This repository contains the architectures, pretrained models, etc. pertaining to the SimpNet paper. This work introduces several crucial principles for designing deep convolutional architectures. Based on these principles, a simple architecture called "SimpNet" is designed. SimpNet outperforms deeper and heavier architectures such as ResNet and Wide Residual Network on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations.

@article{hasanpour2018towards,
  title={Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet},
  author={Hasanpour, Seyyed Hossein and Rouhani, Mohammad and Fayyaz, Mohsen and Sabokrou, Mohammad and Adeli, Ehsan},
  journal={arXiv preprint arXiv:1802.06205},
  year={2018}
}

[arXiv] [GitHub repo]
