This is a PyTorch implementation of the ECCV 2020 paper Mask TextSpotter v3. Mask TextSpotter v3 is an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of a Region Proposal Network (RPN). Mask TextSpotter v3 significantly improves robustness to rotations, aspect ratios, and shapes.
Here we label the Mask TextSpotter series as Mask TextSpotter v1 (ECCV 2018 paper, code), Mask TextSpotter v2 (TPAMI paper, code), and Mask TextSpotter v3 (ECCV 2020 paper).
This project is released under the Creative Commons Attribution-NonCommercial 4.0 International license. Part of the code is inherited from Mask TextSpotter v2, which is under an MIT license.
- Python3 (Python3.7 is recommended)
- PyTorch >= 1.4 (1.4 is recommended)
- cocoapi
- yacs
- matplotlib
- GCC >= 4.9 (This is very important!)
- OpenCV
- CUDA >= 9.0 (10.0.130 is recommended)
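To check an environment against the minimums above before building, you can compare version strings programmatically. The sketch below uses hypothetical helpers (`parse_version` and `meets_minimum` are illustrative names, not part of the repo). It only checks Python, PyTorch, and the CUDA version PyTorch was built with (`torch.version.cuda`); GCC still needs a manual `gcc --version`.

```python
# Hypothetical helpers (not part of the repo): compare installed version
# strings against the minimums listed in the requirements.
import re
import sys

MINIMUMS = {"python": "3.0", "torch": "1.4", "cuda": "9.0"}


def parse_version(text):
    """Turn '1.4.0+cu100' or '10.0.130' into a tuple of ints, e.g. (1, 4, 0)."""
    match = re.match(r"\d+(\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if `installed` >= `minimum`, padding the shorter tuple with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = ".".join(map(str, sys.version_info[:3]))
    print("python ok:", meets_minimum(python_version, MINIMUMS["python"]))
    try:
        import torch  # only importable once PyTorch is installed

        print("torch ok:", meets_minimum(torch.__version__, MINIMUMS["torch"]))
        if torch.version.cuda is not None:  # None for CPU-only builds
            print("cuda ok:", meets_minimum(torch.version.cuda, MINIMUMS["cuda"]))
    except ImportError:
        print("torch not installed yet")
```

`torch.version.cuda` reports the CUDA version the PyTorch binary was compiled against, which can differ from the system toolkit used to compile the C++/CUDA extensions, so `nvcc --version` is worth checking too.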
# first, make sure that your conda is set up properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` point to the
# right path. From a clean conda env, this is what you need to do
conda create --name masktextspotter -y
conda activate masktextspotter
# this installs the right pip and dependencies for the fresh python
conda install ipython pip
# python dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX pyclipper Polygon3 editdistance
# install PyTorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
export INSTALL_DIR=$PWD
# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
# clone repo
cd $INSTALL_DIR
git clone https://github.com/MhLiao/MaskTextSpotterV3.git
cd MaskTextSpotterV3
# build
python setup.py build develop
unset INSTALL_DIR
Download the trained model from Google Drive or BaiduYun (downloading code: cnj2).
Optional: download the model pretrained on SynthText for quick re-implementation from Google Drive or BaiduYun (downloading code: c82l).
You can run single-image inference with the demo script: `python tools/demo.py`.
The datasets are the same as Mask TextSpotter v2.
Download ICDAR2013 (Google Drive, BaiduYun) and ICDAR2015 (Google Drive, BaiduYun) as examples.
The SCUT dataset used for training can be downloaded here.
The converted labels of Total-Text dataset can be downloaded here.
The converted labels of SynthText can be downloaded here.
The root of the dataset directory should be `MaskTextSpotterV3/datasets/`.
An example of the path of test images: `MaskTextSpotterV3/datasets/icdar2015/test_images`
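A quick way to catch a misplaced dataset before a long run is to check that the expected folders exist under the dataset root. This is a minimal sketch with a hypothetical helper (`missing_dirs` is not repo code); the folder list is yours to fill in from whatever your config references.

```python
# Hypothetical helper (not repo code): report which expected dataset
# folders are missing under the dataset root.
from pathlib import Path


def missing_dirs(root, relative_dirs):
    """Return the entries of `relative_dirs` that are not directories under `root`."""
    root = Path(root)
    return [rel for rel in relative_dirs if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Replace with the subfolders your config actually points to.
    expected = ["icdar2015"]
    print("missing:", missing_dirs("MaskTextSpotterV3/datasets", expected))
```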
For testing, set the following keys in the config file:
- test dataset: `TEST.DATASETS`
- input size: `INPUT.MIN_SIZE_TEST`
- model path: `MODEL.WEIGHT`
- output directory: `OUTPUT_DIR`
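Instead of editing the YAML for every run, the same four keys can be overridden on the command line, provided `tools/test_net.py` follows the maskrcnn-benchmark convention of `--config-file` followed by trailing `KEY VALUE` pairs merged into the yacs config. That is an assumption here; verify it in the script. The helper `build_test_command`, the dataset name `icdar_2015_test`, and the paths below are hypothetical.

```python
# Sketch under an ASSUMPTION: tools/test_net.py accepts `--config-file` plus
# trailing KEY VALUE overrides (maskrcnn-benchmark style). Names are hypothetical.
import shlex


def build_test_command(config_file, datasets, min_size, weight, output_dir):
    """Return an argv list overriding the four testing keys."""
    overrides = {
        # yacs parses override values with ast.literal_eval, so the tuple
        # is passed as its Python literal, e.g. "('icdar_2015_test',)".
        "TEST.DATASETS": repr(tuple(datasets)),
        "INPUT.MIN_SIZE_TEST": str(int(min_size)),
        "MODEL.WEIGHT": weight,
        "OUTPUT_DIR": output_dir,
    }
    argv = ["python3", "tools/test_net.py", "--config-file", config_file]
    for key, value in overrides.items():
        argv += [key, value]
    return argv


if __name__ == "__main__":
    cmd = build_test_command(
        "configs/my_test.yaml",  # hypothetical config path
        ["icdar_2015_test"],  # hypothetical dataset name
        1000,
        "output/my_model.pth",  # hypothetical weight path
        "output/test",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))  # shlex.join needs 3.8+
```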
Place all the training sets in `MaskTextSpotterV3/datasets/` and check `DATASETS.TRAIN` in the config file.
Trained with SynthText
python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/pretrain/seg_rec_poly_fuse_feature.yaml
Trained with a mixture of SynthText, icdar2013, icdar2015, scut-eng-char, and total-text
Check the initial weights in the config file.
python3 -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/mixtrain/seg_rec_poly_fuse_feature.yaml
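Both commands launch 8 processes. If you have fewer GPUs, the batch and schedule usually need adjusting. The sketch below assumes maskrcnn-benchmark conventions, which this codebase appears to build on: `torch.distributed.launch` exports `WORLD_SIZE`, `SOLVER.IMS_PER_BATCH` is the global batch split evenly across processes, and training uses a step schedule. It then applies the linear scaling rule. The helper names and the example numbers are illustrative, not values from the repo's configs.

```python
# Sketch ASSUMING maskrcnn-benchmark conventions; helper names and numbers
# are illustrative, not taken from this repo's configs.
import os


def num_processes(environ=None):
    """Number of launched processes; torch.distributed.launch sets WORLD_SIZE."""
    environ = os.environ if environ is None else environ
    return int(environ.get("WORLD_SIZE", "1"))


def images_per_gpu(ims_per_batch, num_gpus):
    """Split a global batch evenly across GPUs (must divide exactly)."""
    if ims_per_batch % num_gpus != 0:
        raise ValueError(f"{ims_per_batch} images cannot be split over {num_gpus} GPUs")
    return ims_per_batch // num_gpus


def linear_scale(base_lr, max_iter, steps, old_batch, new_batch):
    """Linear scaling rule: LR scales with the batch, iterations inversely."""
    factor = new_batch / old_batch
    new_steps = tuple(int(round(s / factor)) for s in steps)
    return base_lr * factor, int(round(max_iter / factor)), new_steps


if __name__ == "__main__":
    print("processes:", num_processes())
    # Illustrative: shrinking a global batch of 16 to 4 images.
    print(linear_scale(0.02, 90000, (60000, 80000), old_batch=16, new_batch=4))
```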
Download the lexicons from Google Drive or Baidu Drive (downloading code: f3tk), unzip the archive, and place it under `evaluation/lexicons/`.
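Lexicon-based evaluation typically snaps each recognized word to the closest entry in a word list. As an illustration only, the sketch below uses plain Levenshtein distance with case-insensitive matching; the repo's evaluation scripts may use a different (for example, weighted) distance, and `correct_with_lexicon` is a hypothetical name.

```python
# Illustrative only: nearest-lexicon-word lookup with plain Levenshtein
# distance. The repo's own scripts may use a different distance.


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_with_lexicon(word, lexicon):
    """Return (closest lexicon entry, distance), comparing case-insensitively."""
    best = min(lexicon, key=lambda entry: levenshtein(word.upper(), entry.upper()))
    return best, levenshtein(word.upper(), best.upper())


if __name__ == "__main__":
    print(correct_with_lexicon("HOTEI", ["HOTEL", "MOTEL", "HOSTEL"]))
```

The `editdistance` package from the dependency list provides a faster `editdistance.eval(a, b)` that could replace `levenshtein` here.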
cd evaluation/totaltext/e2e/
# edit "result_dir" in script.py
python script.py
First, generate the Rotated ICDAR 2013 dataset:
cd tools
# set the specific rotating angle in convert_dataset.py
python convert_dataset.py
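Rotating a dataset means rotating the images and transforming every annotation polygon with the same matrix. The sketch below shows one common way to do it and is not necessarily what `convert_dataset.py` does: it rotates points about the image center (positive angle = counter-clockwise, the same sign convention as OpenCV's `cv2.getRotationMatrix2D`) and shifts them into an enlarged canvas that holds the whole rotated image. The function names are hypothetical.

```python
# Sketch of one common approach, not necessarily convert_dataset.py's:
# rotate (x, y) points about the image center and move them into an
# enlarged canvas that fits the whole rotated image.
import math


def rotated_canvas_size(width, height, angle_deg):
    """Width and height of the bounding canvas of a rotated width x height image."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Round first so float noise (cos(90 deg) ~ 6e-17) does not bump ceil().
    new_w = math.ceil(round(width * c + height * s, 6))
    new_h = math.ceil(round(width * s + height * c, 6))
    return new_w, new_h


def rotate_points(points, width, height, angle_deg):
    """Rotate points counter-clockwise (image coords, y down) into the new canvas."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = width / 2.0, height / 2.0
    new_w, new_h = rotated_canvas_size(width, height, angle_deg)
    dx, dy = (new_w - width) / 2.0, (new_h - height) / 2.0
    out = []
    for x, y in points:
        rx = cx + cos_t * (x - cx) + sin_t * (y - cy)
        ry = cy - sin_t * (x - cx) + cos_t * (y - cy)
        out.append((rx + dx, ry + dy))  # shift into the enlarged canvas
    return out


if __name__ == "__main__":
    # Corners of a 100 x 50 box rotated by 30 degrees.
    box = [(0, 0), (100, 0), (100, 50), (0, 50)]
    print(rotated_canvas_size(100, 50, 30))
    print(rotate_points(box, 100, 50, 30))
```

With OpenCV, the image itself would be warped with the matching affine matrix (the same rotation plus the `dx`, `dy` shift) and the enlarged output size.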
Then run testing (change the test set in the YAML config) and evaluate with `evaluation/rotated_icdar2013/e2e/script.py`.
Please cite the related works in your publications if they help your research:
@inproceedings{liao2020mask,
title={Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting},
author={Liao, Minghui and Pang, Guan and Huang, Jing and Hassner, Tal and Bai, Xiang},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
@article{liao2019mask,
author={M. {Liao} and P. {Lyu} and M. {He} and C. {Yao} and W. {Wu} and X. {Bai}},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes},
volume={43},
number={2},
pages={532--548},
year={2021}
}
@inproceedings{lyu2018mask,
title={Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes},
author={Lyu, Pengyuan and Liao, Minghui and Yao, Cong and Wu, Wenhao and Bai, Xiang},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
pages={67--83},
year={2018}
}