In computer vision there are three common tasks: image classification, object detection, and semantic segmentation. As shown above, semantic segmentation partitions an image into its different parts and objects (e.g. grass, cat, tree, sky).
Pixel Decoder is a tool that packages several currently available semantic segmentation algorithms. It includes the standard U-Net and modified versions of it, as well as Tiramisu and SegNet. SegNet is the algorithm that Skynet was built on. All the algorithms in Pixel Decoder are convolutional neural networks with an encoder-decoder structure: the encoder reads in the image pixels and compresses the information into a vector, downsampling to save computing memory, and the decoder reconstructs the spatial information of the pixels and outputs the desired segmentation. Some of the U-Net-like algorithms were adopted from SpaceNet challenge solutions.

All these algorithms are built with TensorFlow and Keras. Below are some road segmentation results we got from Pixel Decoder.
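To illustrate the encoder-decoder structure these models share, here is a minimal sketch in Keras. It is not one of the architectures shipped with Pixel Decoder; the layer counts and filter sizes are arbitrary placeholders.

```python
# Minimal encoder-decoder sketch in Keras, for illustration only.
# This is NOT one of the architectures shipped with Pixel Decoder;
# the layer counts and filter sizes are arbitrary placeholders.
from tensorflow.keras import layers, models

def tiny_encoder_decoder(input_shape=(320, 320, 3), n_classes=1):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolutions plus downsampling compress the spatial information
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)   # 320 -> 160
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)   # 160 -> 80

    # Decoder: upsampling plus convolutions reconstruct pixel-level detail
    x = layers.UpSampling2D(2)(x)   # 80 -> 160
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)   # 160 -> 320
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)

    # One sigmoid channel per class gives a per-pixel mask
    outputs = layers.Conv2D(n_classes, 1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)

model = tiny_encoder_decoder()
model.compile(optimizer="adam", loss="binary_crossentropy")
```

A real U-Net additionally wires skip connections from each encoder stage to the matching decoder stage, which is what preserves fine spatial detail in the output masks.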
### Installation

```bash
git clone https://github.com/Geoyi/pixel-decoder
cd pixel-decoder
pip install -e .
```
### Train

```bash
pixel_decoder train --batch_size=4 \
                    --imgs_folder=tiles \
                    --masks_folder=labels \
                    --models_folder=trained_models_out \
                    --model_id=resnet_unet \
                    --origin_shape_no=256 \
                    --border_no=32
```
It takes in a training dataset created from Label Maker.

- `batch_size`: batch size for training;
- `imgs_folder`: the directory of RGB images to train on;
- `masks_folder`: the directory of labeled masks to train on;
- `model_id`: the neural net architecture to train with. `resnet_unet`, `inception_unet`, `linknet_unet`, `SegNet`, and `Tiramisu` are the model_ids that live in Pixel Decoder;
- `origin_shape_no`: 256, the default image tile shape from Label Maker;
- `border_no`: set to 32. An additional 32 pixels are added around the 256 by 256 image tile to make it 320 by 320, which gets rid of U-Net's edge distortion (see the sketch after this list).
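For intuition, here is a rough sketch of what that border padding does. Reflection padding is an assumption here; Pixel Decoder's own implementation may pad differently.

```python
# Sketch of the border_no idea: pad a 256x256 tile by 32 pixels on each side
# so the network sees 320x320, then crop the prediction back to 256x256.
# Reflection padding is an assumption; Pixel Decoder may pad differently.
import numpy as np

origin_shape_no, border_no = 256, 32
tile = np.zeros((origin_shape_no, origin_shape_no, 3), dtype=np.uint8)  # stand-in RGB tile

padded = np.pad(
    tile,
    ((border_no, border_no), (border_no, border_no), (0, 0)),
    mode="reflect",
)
print(padded.shape)  # (320, 320, 3) -- what the network actually sees

# After prediction, drop the border so edge artifacts never reach the output
pred = padded[..., :1]  # stand-in for a single-channel model prediction
cropped = pred[border_no:-border_no, border_no:-border_no]
print(cropped.shape)  # (256, 256, 1)
```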
After the model is trained and you see the trained model weights in your `models_folder` directory, run:
```bash
pixel_decoder predict --imgs_folder=tiles \
                      --test_folder=test_images \
                      --models_folder=trained_models_out \
                      --pred_folder=predictions \
                      --model_id=resnet_unet \
                      --origin_shape_no=256 \
                      --border_no=32
```
- `imgs_folder`: the directory of RGB images used for training;
- `masks_folder`: the directory of labeled masks used for training. It is used to get the stats, e.g. mean and standard deviation, from the training images (see the sketch after this list);
- `test_folder`: the directory of test images;
- `pred_folder`: the directory where all the predicted test images from `test_folder` are saved;
- `model_id`: the neural net architecture that was trained. `resnet_unet`, `inception_unet`, `linknet_unet`, `SegNet`, and `Tiramisu` are the model_ids that live in Pixel Decoder;
- `origin_shape_no`: 256, the default image tile shape from Label Maker;
- `border_no`: set to 32. An additional 32 pixels are added around the 256 by 256 image tile to make it 320 by 320, which gets rid of U-Net's edge distortion.
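For reference, per-channel statistics of the kind mentioned above could be computed roughly as below; the exact statistics Pixel Decoder derives internally may differ.

```python
# Hedged sketch of the statistics mentioned above: per-channel mean and
# standard deviation over the training tiles. The exact stats Pixel Decoder
# computes internally may differ.
import glob
import numpy as np
from PIL import Image

def channel_stats(imgs_folder="tiles"):
    pixels = []
    for path in glob.glob(f"{imgs_folder}/*.png"):
        img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
        pixels.append(img.reshape(-1, 3))
    pixels = np.concatenate(pixels)  # all pixels from all tiles, shape (N, 3)
    return pixels.mean(axis=0), pixels.std(axis=0)

mean, std = channel_stats("tiles")
print("per-channel mean:", mean)
print("per-channel std:", std)
```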
- Docker installation on AWS EC2. Instructions for Nvidia Docker installation are here.

- Build the provided Docker image from the Dockerfile:

  ```bash
  git clone https://github.com/Geoyi/pixel-decoder
  cd pixel-decoder
  nvidia-docker build -t pixel_decoder .
  ```

- Run nvidia-docker and Pixel Decoder:

  ```bash
  nvidia-docker run -v $PWD:/work -it pixel_decoder bash
  ```

- Install Pixel Decoder and train the model.
**Train**

```bash
pixel_decoder train --batch_size=4 \
                    --imgs_folder=tiles \
                    --masks_folder=labels \
                    --models_folder=trained_models_out \
                    --model_id=resnet_unet \
                    --origin_shape_no=256 \
                    --border_no=32
```

**Predict**

```bash
pixel_decoder predict --imgs_folder=tiles \
                      --test_folder=test_images \
                      --models_folder=trained_models_out \
                      --pred_folder=predictions \
                      --model_id=resnet_unet \
                      --origin_shape_no=256 \
                      --border_no=32
```
To run a neural net, e.g. `resnet_unet`, you can create a ready-to-train dataset with Label Maker. A detailed walkthrough notebook will come soon.
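Until that notebook is available, a small check like the hypothetical helper below can confirm that the image tiles and label masks pair up before training. The flat file layout is an assumption, so adjust the paths to your own Label Maker output.

```python
# Hypothetical pre-training sanity check: make sure every RGB tile in
# imgs_folder has a matching mask in masks_folder. The flat file layout is
# an assumption -- adjust paths and extensions to your Label Maker output.
import os

def check_pairs(imgs_folder="tiles", masks_folder="labels"):
    imgs = {os.path.splitext(f)[0] for f in os.listdir(imgs_folder)}
    masks = {os.path.splitext(f)[0] for f in os.listdir(masks_folder)}
    missing = imgs - masks
    if missing:
        print(f"{len(missing)} tiles have no matching mask, e.g. {sorted(missing)[:3]}")
    else:
        print(f"All {len(imgs)} tiles have matching masks.")

check_pairs()
```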
Pixel Decoder was built on top of python-seed, which was created by Development Seed.