This repository contains the code for "Scene Text Retrieval via Joint Text Detection and Similarity Learning". For more details, please refer to our CVPR 2021 paper.
This repo is inherited from maskrcnn-benchmark and follows the same license.
CSVTR consists of 23 pre-defined query words in Chinese and 1667 Chinese scene text images collected from the Google image search engine. Each image is annotated with its corresponding query word among the 23 pre-defined Chinese query words.
CSVTR can be downloaded from Baidu Disk (asjw) or Google Drive.
The trained models can be downloaded from Baidu Disk (legq). (These models support English only.)
An example of the path of test images: ./datasets/IIIT_STR_V1.0/imgDatabase/img_000846.jpg
CTR can be downloaded from Baidu Disk (e860).
MLT-5k: This dataset is a subset of MLT2017 (or MLT2019). Please refer to the code for extracting this subset. Place the original datasets in the directory datasets/MLT2019, for example:
img path: ./datasets/MLT2019/train_images/tr_img_10000.jpg
gt path: ./datasets/MLT2019/train_gts/tr_img_10000.txt
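The layout above can be checked before running the subset extraction. The following is a minimal sketch, not part of this repo: the helper name `pair_images_with_gts`, the accepted image suffixes, and the rule that each ground-truth file shares its image's file stem (as in the `tr_img_10000` example) are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: image files use one of these suffixes; adjust to your copy of the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pair_images_with_gts(root="./datasets/MLT2019"):
    """Pair each image in train_images/ with the same-named .txt file in train_gts/.

    Returns (pairs, missing): pairs is a list of (image_path, gt_path) tuples,
    missing is a list of image paths that have no ground-truth file.
    """
    root = Path(root)
    image_dir = root / "train_images"
    gt_dir = root / "train_gts"
    pairs, missing = [], []
    if not image_dir.is_dir():
        # Nothing to pair if the dataset has not been placed yet.
        return pairs, missing
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Assumption: tr_img_10000.jpg is annotated by tr_img_10000.txt.
        gt_path = gt_dir / (image_path.stem + ".txt")
        if gt_path.is_file():
            pairs.append((image_path, gt_path))
        else:
            missing.append(image_path)
    return pairs, missing


if __name__ == "__main__":
    pairs, missing = pair_images_with_gts()
    print(f"{len(pairs)} image/gt pairs found, {len(missing)} images without gt")
```

If `missing` is non-empty, the images and ground-truth files were probably unpacked into different directories than the ones shown above.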
Please cite the related work in your publications if it helps your research:
@InProceedings{Wang_2021_CVPR,
author = {Wang, Hao and Bai, Xiang and Yang, Mingkun and Zhu, Shenggao and Wang, Jing and Liu, Wenyu},
title = {Scene Text Retrieval via Joint Text Detection and Similarity Learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {4558-4567}
}