This project aims at providing a fast, modular reference implementation for semantic segmentation models using PyTorch.
- Distributed Training: >60% Thank you ycszen, from his struct faster than the multi-thread parallel method(nn.DataParallel), we use the multi-processing parallel method.
- Multi-GPU training and inference: support different manners of inference.
- Provides pre-trained models and implement different semantic segmentation models.
- PyTorch 1.0
pip3 install torch torchvision
- Easydict
pip3 install easydict
- Apex
- Ninja
sudo apt-get install ninja-build
- tqdm
pip3 install tqdm
- xception-71(pretrain)
- deeperlab(CVPR2019)
SS:Single Scale MSF:Multi-scale + Flip
because I only realize the segmentation part,I tested its results on voc
Method | Backbone | TrainSet | EvalSet | Mean IoU(ss) | Mean IoU(msf) |
---|---|---|---|---|---|
deeperlab(ours+SBD) | R101_v1c | train_aug | val | 79.71 | 80.26 |
deeperlab(ours) | R101_v1c | train_aug | val | 73.28 | 74.11 |
- Detection part
we must build the env for training
make link
make others
soft link to data,pretrain,log,logger
- create the config file of dataset:
train.txt
,val.txt
,test.txt
file structure:(split withtab
)path-of-the-image path-of-the-groundtruth
- modify the
config.py
according to your requirements - train a network:
We use the official torch.distributed.launch
in order to launch multi-gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU.
For each experiment, you can just run this script:
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py
The above performance are all conducted based on the non-distributed training. For each experiment, you can just run this script:
bash train.sh
In train.sh, the argument of d
means the GPU you want to use.
In the evaluator, we have implemented the multi-gpu inference base on the multi-process. In the inference phase, the function will spawns as many Python processes as the number of GPUs we want to use, and each Python process will handle a subset of the whole evaluation dataset on a single GPU.
- evaluate a trained network on the validation set:
bash eval.sh
- input arguments in shell:
usage: -e epoch_idx -d device_idx -c save_csv [--verbose ] [--show_image] [--save_path Pred_Save_Path]
if you are interested my algorithm, you can see my realized segmentation tool(dfn,deeperlab,deeplabv3 plus and so on):
because my device is 1080, we can't use 7*7 conv in two 4096 channel due to out of memory. so if you use it. you can change it in model/deeperlab.py