This repository contains all the files needed to complete the 'Getting Started with SegNet' and 'Bayesian SegNet' tutorials found here: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

Please note that if you are following those tutorials, the folder names in this repository have been modified.
SegNet requires a modified version of Caffe to run. Please see the `caffe-segnet-cudnn7` submodule within this repository and follow its installation instructions.
To start, you can use the `scripts/inference/inference.py` script. It is recommended to use this with the `models/inference/SegNet/CityScapes/segnet_cityscapes.prototxt` model and Timo Sämann's trained weights, which are available for download here.
The inference script can be used as follows:
```sh
python scripts/inference/inference.py models/inference/SegNet/CityScapes/segnet_cityscapes.prototxt \
    /PATH/TO/segnet_iter_30000_timo.caffemodel data/test_segmentation.avi [--cpu]
```
If the `--cpu` flag is set, inference will run on your CPU instead of your GPU. This is not recommended unless you don't have a GPU.

The script uses OpenCV's `VideoCapture` to parse the data. An example video file, `data/test_segmentation.avi`, has been provided for testing.
The easiest way to specify your own segmentation data is via a video file, such as an `.mp4` or `.avi`. Otherwise, you must specify a folder of images named in the format required by `VideoCapture` (see the sketch below).
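As a rough sketch of what `VideoCapture` expects (the `frames/%06d.png` pattern is a hypothetical example, not a path in this repository):

```python
import cv2

# VideoCapture accepts either a video file ("data/test_segmentation.avi")
# or an image sequence given as a printf-style pattern; for a folder of
# images, the frames must be zero-padded and consecutively numbered.
cap = cv2.VideoCapture("frames/%06d.png")  # hypothetical path

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break  # end of the sequence
    # Each frame is a BGR numpy array, ready to be fed to the network.
    print(frame.shape)

cap.release()
```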
A number of example models for indoor and outdoor road scene understanding can be found in the SegNet Model Zoo.
Cityscapes is a dataset that can be used to train SegNet/Bayesian SegNet, but a few preparation steps are required first. You can download the dataset here and the cityscapesScripts repo here. Once downloaded, follow these steps:
- Edit `/cityscapesScripts/cityscapesscripts/helpers/labels.py` to contain the classes you want to train on.
- Set the `CITYSCAPES_DATASET` environment variable to wherever you downloaded the Cityscapes dataset.
- Run `python /cityscapesScripts/cityscapesscripts/preparation/createTrainIdLabelImgs.py` to create the labeled images.
- Once the script has completed, you should have `${CITYSCAPES_DATASET}/gtFine/*/labelTrainIds.png` images.
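Put together, the preparation might look like this (both paths are hypothetical; adjust them to wherever you downloaded the dataset and cloned cityscapesScripts):

```sh
export CITYSCAPES_DATASET=/data/cityscapes   # hypothetical dataset location
python /opt/cityscapesScripts/cityscapesscripts/preparation/createTrainIdLabelImgs.py
```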
Next, the data generated by the Cityscapes scripts must be preprocessed. This is done by running `python scripts/data_prep/preprocessor.py`. Within this script there are several parameters that must be modified; please open the file and verify their values:
- `IMG_PATH`: Paths to the images created in the previous step. The first directory in the array holds the raw images and the second holds the ground truth.
- `OUT_PATH`: The directory where the processed images will reside. NOTE: You must create all the directories yourself. If you are creating a training set, you must also create the subdirectories `<OUT_PATH>/train/` and `<OUT_PATH>/trainannot/`. Likewise, if you are creating a validation set, you need `<OUT_PATH>/val/` and `<OUT_PATH>/valannot/`.
- `DATA_TYPE`: The type of data you are processing. Choose from `train` and `val`.
- `TXT_PATH`: Location of the text file listing all the processed images. This is needed for Caffe.
- `RESIZE_IMGS`: Flag to resize images.
- `WIDTH, HEIGHT`: Desired width and height of the processed images.
- `INTERPOLATION`: Type of interpolation used for the raw images and the ground truth images.
- `CROP_TO_ASPECT_RATIO`: Crops input images to the aspect ratio of the desired output.
- `CROP_HEIGHT_POSITION`: Where to start the vertical crop. Options are `top`, `middle`, and `bottom`.
- `CROP_WIDTH_POSITION`: Where to start the horizontal crop. Options are `left`, `middle`, and `right`.

You can set these parameters directly in the file or via command-line arguments; a sketch of typical values follows below. Your training and test sets should both be processed using this script. NOTE: When processing the training set, make sure to note down/save the class weights that are printed at the end of the run. These will be needed for your training `.prototxt` file.
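As a minimal sketch, the parameters might look like the following. All values here are hypothetical; open `scripts/data_prep/preprocessor.py` and confirm the exact names and defaults:

```python
import cv2

# Hypothetical configuration for preprocessor.py; adjust to your paths.
IMG_PATH = ["/data/cityscapes/leftImg8bit/train/",  # raw images
            "/data/cityscapes/gtFine/train/"]       # ground truth
OUT_PATH = "/data/segnet/"          # create this and its subdirectories yourself
DATA_TYPE = "train"                 # "train" or "val"
TXT_PATH = "/data/segnet/train.txt" # image list consumed by Caffe
RESIZE_IMGS = True
WIDTH, HEIGHT = 480, 360
INTERPOLATION = cv2.INTER_LINEAR    # nearest-neighbour is typical for labels
CROP_TO_ASPECT_RATIO = True
CROP_HEIGHT_POSITION = "middle"     # "top", "middle", or "bottom"
CROP_WIDTH_POSITION = "middle"      # "left", "middle", or "right"
```

Remember to create the output directories beforehand, e.g. `mkdir -p /data/segnet/train /data/segnet/trainannot`.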
To train, you must have a `solver.prototxt` and a `train.prototxt`. If you want to perform inference with your generated model on the validation set, you will also need a `test.prototxt`. Here are some things to look out for when configuring these files.
#### `solver.prototxt`

- `snapshot_prefix`: This is not the directory of the snapshots. Your snapshots will be named `<snapshot_prefix>_iter_10.caffemodel`, so include a filename stem (see the excerpt below).
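A hypothetical excerpt using standard Caffe solver fields (paths are placeholders):

```
net: "models/training/SegNet/CityScapes/train.prototxt"
snapshot: 1000
# Snapshots are written as <snapshot_prefix>_iter_<N>.caffemodel,
# so this must end in a filename stem, not a bare directory:
snapshot_prefix: "snapshots/segnet_cityscapes"
```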
#### `train.prototxt`

A couple of notes on `python_param.param_str`:

- `data_dirs`: locations of the datasets you preprocessed.
- `data_proportions`: proportion of the data you want to hold out for testing.
- `batch_size`: number of images trained per iteration of stochastic gradient descent.

Also:

- Make sure the number of classes you are training on matches the number of outputs of the softmax layer/last convolution.
- Remember those class weights you were told to save? You need to enter them in your loss layer as class weightings, as in the sketch below.
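In the caffe-segnet fork, class weightings are typically given as repeated `class_weighting` entries in the `loss_param` of the `SoftmaxWithLoss` layer. A hypothetical excerpt (blob names and weights are placeholders; use the weights printed by `preprocessor.py`):

```
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "conv1_1_D"   # output of the last convolution
  bottom: "label"
  top: "loss"
  loss_param {
    weight_by_label_freqs: true
    class_weighting: 0.2595   # one entry per class, in class order
    class_weighting: 0.1826
    class_weighting: 4.5640
  }
}
```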
#### `test.prototxt`

- The first layer's `dense_image_data_param.source` parameter should point to the `.txt` file of your validation set, for example:
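A hypothetical excerpt (the path is a placeholder for the `.txt` file produced by the preprocessing step):

```
layer {
  name: "data"
  type: "DenseImageData"
  top: "data"
  top: "label"
  dense_image_data_param {
    source: "/data/segnet/val.txt"  # one "image label" pair per line
    batch_size: 1
  }
}
```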
`scripts/training/train_config.ini` is where you specify all the models you want to train (an example entry follows the list):

- `Solvers`: Solver file (`.prototxt` file).
- `Init_Weights`: Initial weights of the model (`.caffemodel` file).
- `Inference_Weights`: Final inference weights of the model once training is completed.
- `Solverstates`: Solver state to load (`.solverstate` file). This allows you to resume training from an intermediate step. Typically blank.
- `Test_Models`: Test model that is run on snapshots in parallel while training (`.prototxt` file).
- `Test_Images`: File listing all the test images. This should be generated by the preprocessing step (`.txt` file).
- `Log_Dirs`: Directory where the logs are stored. NOTE: Logging currently only works for testing.
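A hypothetical entry might look like this (the section name and all paths are placeholders; mirror the structure of the shipped `train_config.ini`):

```ini
[SegNet_CityScapes]
Solvers = models/training/SegNet/CityScapes/solver.prototxt
Init_Weights = weights/VGG_ILSVRC_16_layers.caffemodel
Inference_Weights = models/inference/SegNet/CityScapes/segnet_cityscapes.caffemodel
Solverstates =
Test_Models = models/training/SegNet/CityScapes/test.prototxt
Test_Images = /data/segnet/val.txt
Log_Dirs = logs/
```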
Once all the models you want to train are in the ini file, you can run `python train_and_test.py` with the following arguments (an example invocation follows the list):

- `--config`: Location of the ini file.
- `--run_inference`: Flag to test in parallel while training the model. Testing will occur once for each snapshot created.
- `--train_gpu`: ID of the GPU on which training will occur. Only matters if you have multiple GPUs.
- `--test_gpu`: ID of the GPU on which testing will occur. Only matters if you have multiple GPUs.
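For example (GPU IDs are hypothetical):

```sh
python train_and_test.py --config scripts/training/train_config.ini \
    --run_inference --train_gpu 0 --test_gpu 1
```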
Congrats! Your model should be training.
#### Testing in parallel with training

If you are testing at the same time as training, make sure the intervals between snapshots are big enough. Computing the batch norm statistics for each snapshot takes time (around 10 to 20 minutes), so if the snapshots are too frequent, the script may queue up many inferences and potentially blow up.
#### Error: Did not match C++ signature

If you notice an error that looks like the following:

```
Boost.Python.ArgumentError: Python argument types in
    Net.__init__(Net, str, str, int)
did not match C++ signature:
    __init__(boost::python::api::object, std::string, std::string, int)
    __init__(boost::python::api::object, std::string, int)
```

you usually need to wrap the strings passed to Caffe functions with `str()`. This is only an issue with Python 2, which is explained in more detail here.
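For example, when constructing a net from paths that may be `unicode` under Python 2 (the model and weight paths here are the ones used earlier in this README):

```python
import caffe

model = "models/inference/SegNet/CityScapes/segnet_cityscapes.prototxt"
weights = "/PATH/TO/segnet_iter_30000_timo.caffemodel"

# Boost.Python only matches std::string, so cast any unicode paths
# (e.g. those read from argparse or ConfigParser) explicitly.
net = caffe.Net(str(model), str(weights), caffe.TEST)
```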
For more information about the SegNet architecture:
- http://arxiv.org/abs/1511.02680 Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla. "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680, 2015.
- http://arxiv.org/abs/1511.00561 Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." PAMI, 2017.
This software is released under a Creative Commons license which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/
Alex Kendall
Cambridge University