This repository is an official implementation of RCLSTR.
Python >= 3.8
CUDA == 11.0
PyTorch == 1.7.1
a. Create a conda virtual environment and activate it.
conda create -n RCLSTR python=3.8 -y
conda activate RCLSTR
b. Install PyTorch and torchvision following the official instructions.
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f
c. Install other packages.
pip install lmdb pillow nltk natsort fire tensorboard tqdm imgaug einops
pip install numpy==1.22.3
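After installation, a quick check along the following lines may help confirm that the environment matches the versions listed above. This is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and it only reports what it finds rather than enforcing anything.

```python
import sys
from importlib import import_module


def check_environment(min_python=(3, 8), torch_prefix="1.7.1"):
    """Report whether Python and PyTorch match the versions listed above.

    Hypothetical helper for illustration; it is not part of the RCLSTR code.
    """
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch": None,
        "torch_ok": False,
        "cuda_available": False,
    }
    try:
        torch = import_module("torch")
    except ImportError:
        return report  # PyTorch is not installed in this environment
    report["torch"] = torch.__version__
    report["torch_ok"] = torch.__version__.startswith(torch_prefix)
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running the script inside the activated `RCLSTR` environment prints one line per check; a `torch_ok: False` line suggests that a different PyTorch build was installed.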
Download datasets
We use the ST (SynthText) training dataset from STR-Fewer-Labels. Download the datasets from baiduyun (password: px16).
Data folder structure
├── training
│   └── label
│       └── synth
├── validation
│   ├── 1.SVT
│   ├── 2.IIIT
│   ├── 3.IC13
│   ├── 4.IC15
│   ├── 5.COCO
│   ├── 6.RCTW17
│   ├── 7.Uber
│   ├── 8.ArT
│   ├── 9.LSVT
│   ├── 10.MLT19
│   └── 11.ReCTS
└── evaluation
    └── benchmark
        ├── CUTE80
        ├── IC03_867
        ├── IC13_1015
        ├── IC15_2077
        ├── IIIT5k_3000
        ├── SVT
        └── SVTP
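Before training, it can be useful to verify that the downloaded data matches the structure listed above. The sketch below is one way to do that. It assumes the nesting `training/label/synth`, `validation/<name>` and `evaluation/benchmark/<name>`; the names `EXPECTED_DIRS` and `missing_dirs` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-directories taken from the folder structure listed above
# (the nesting under "evaluation/benchmark" is an assumption).
EXPECTED_DIRS = [
    "training/label/synth",
    *[f"validation/{name}" for name in (
        "1.SVT", "2.IIIT", "3.IC13", "4.IC15", "5.COCO", "6.RCTW17",
        "7.Uber", "8.ArT", "9.LSVT", "10.MLT19", "11.ReCTS")],
    *[f"evaluation/benchmark/{name}" for name in (
        "CUTE80", "IC03_867", "IC13_1015", "IC15_2077",
        "IIIT5k_3000", "SVT", "SVTP")],
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

For example, `missing_dirs("/path/to/data_CVPR2021")` returns an empty list when every expected folder is present, and otherwise lists the relative paths that are missing.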
Link the dataset path as follows:
cd pretrain
ln -s /path/to/data_CVPR2021 data_CVPR2021
cd ../evaluation
ln -s /path/to/data_CVPR2021 data_CVPR2021
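The same links can also be created from Python, which may be convenient in a setup script. The following is a minimal sketch under the assumption that it is run from the repository root; `link_dataset` is a hypothetical helper, not a function provided by the repository.

```python
import os
from pathlib import Path


def link_dataset(data_root, repo_root=".", subdirs=("pretrain", "evaluation"),
                 link_name="data_CVPR2021"):
    """Create `<repo_root>/<subdir>/<link_name>` symlinks pointing at `data_root`.

    Anything that already exists under that name is left untouched, and a
    missing sub-directory is created. Returns the newly created links.
    """
    target = Path(data_root).resolve()
    created = []
    for sub in subdirs:
        link = Path(repo_root) / sub / link_name
        if link.is_symlink() or link.exists():
            continue  # do not overwrite anything that is already there
        link.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(target, link, target_is_directory=True)
        created.append(link)
    return created
```

Calling `link_dataset("/path/to/data_CVPR2021")` a second time is a no-op, so the helper can be re-run safely.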
TPS model weights
For the TPS module, we use the pretrained TPS model weights from STR-Fewer-Labels. Download the TPS model weights from baiduyun (password: px16) and put them in pretrain/TPS_model.
The RCLSTR method includes a regularization module (reg), a hierarchical module (hier), and a cross-hierarchy consistency module (con).
Pretrain SeqMoCo model:
cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python \
--model_name TRBA \
--exp_name SeqMoCo \
--lr 0.0015 \
--batch-size 32 \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0 \
--data data_CVPR2021/training/label/synth \
--data-format lmdb \
--light_aug \
--instance_map window \
--epochs 5 \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--frame_weight 0 \
--frame_alpha 0 \
--word_weight 0 \
--word_alpha 0
Pretrain SeqMoCo model with the reg module:
cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python \
--model_name TRBA \
--exp_name SeqMoCo_reg \
--lr 0.0015 \
--batch-size 32 \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0 \
--data data_CVPR2021/training/label/synth \
--data-format lmdb \
--light_aug \
--instance_map window \
--epochs 5 \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation \
--frame_weight 0 \
--frame_alpha 0 \
--word_weight 0 \
--word_alpha 0
Pretrain SeqMoCo model with the reg and hier modules:
cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python \
--model_name TRBA \
--exp_name SeqMoCo_reg_hier \
--lr 0.0015 \
--batch-size 32 \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0 \
--data data_CVPR2021/training/label/synth \
--data-format lmdb \
--light_aug \
--instance_map window \
--epochs 5 \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation
Pretrain SeqMoCo model with the reg, hier and con modules:
cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python \
--model_name TRBA \
--exp_name SeqMoCo_reg_hier_con \
--lr 0.0015 \
--batch-size 32 \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0 \
--data data_CVPR2021/training/label/synth \
--data-format lmdb \
--light_aug \
--instance_map window \
--epochs 5 \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation \
--multi_level_consistent global2local \
--multi_level_ins 0
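The four pretraining commands above share most of their flags and differ only in a handful of options. The sketch below collects those differences in one place. It is an illustration, not part of the repository: the names `COMMON_FLAGS`, `VARIANT_FLAGS` and `pretrain_flags` are hypothetical, the script name is deliberately left out, and the `--permutation` entry for `SeqMoCo_reg_hier` is an assumption inferred from the reg variant.

```python
# Flags that appear in all four pretraining commands above.
COMMON_FLAGS = [
    "--model_name", "TRBA",
    "--lr", "0.0015",
    "--batch-size", "32",
    "--dist-url", "tcp://localhost:10002",
    "--multiprocessing-distributed",
    "--world-size", "1",
    "--rank", "0",
    "--data", "data_CVPR2021/training/label/synth",
    "--data-format", "lmdb",
    "--light_aug",
    "--instance_map", "window",
    "--epochs", "5",
    "--useTPS", "./TPS_model/TRBA-Baseline-synth.pth",
    "--loss_setting", "consistent",
]

# The two commands without "hier" set these four weights to 0.
ZERO_HIER_WEIGHTS = [
    "--frame_weight", "0", "--frame_alpha", "0",
    "--word_weight", "0", "--word_alpha", "0",
]

VARIANT_FLAGS = {
    "SeqMoCo": ZERO_HIER_WEIGHTS,
    "SeqMoCo_reg": ["--permutation", *ZERO_HIER_WEIGHTS],
    # Assumption: "--permutation" is inferred from the reg variant.
    "SeqMoCo_reg_hier": ["--permutation"],
    "SeqMoCo_reg_hier_con": [
        "--permutation",
        "--multi_level_consistent", "global2local",
        "--multi_level_ins", "0",
    ],
}


def pretrain_flags(exp_name):
    """Return the flag list for one experiment; the script name is not included."""
    return ["--exp_name", exp_name, *COMMON_FLAGS, *VARIANT_FLAGS[exp_name]]
```

Read this way, `--permutation` appears to switch on the reg module, omitting the zeroed frame and word weights appears to enable the hier module, and the two `--multi_level_*` flags appear to add the con module.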
Train attention-based decoder for feature representation evaluation:
cd evaluation
--model_name TRA \
--exp_name TRA_reg_hier_con \
--saved_model ../pretrain/SeqMoCo_reg_hier_con/checkpoint_0004.pth.tar \
--select_data synth \
--batch_size 256 \
--Aug light
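The `--saved_model` path above points at `checkpoint_0004.pth.tar` for a run pretrained with `--epochs 5`, which suggests that checkpoints are numbered from zero. Assuming that naming pattern holds for other epoch counts (it is inferred from this single example), the path of the final checkpoint could be derived as follows; `last_checkpoint` is a hypothetical helper.

```python
from pathlib import Path


def last_checkpoint(exp_dir, epochs):
    """Path of the checkpoint from the final epoch, assuming zero-based,
    four-digit epoch numbering as in `checkpoint_0004.pth.tar`."""
    return Path(exp_dir) / f"checkpoint_{epochs - 1:04d}.pth.tar"


if __name__ == "__main__":
    print(last_checkpoint("../pretrain/SeqMoCo_reg_hier_con", epochs=5))
```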
- Support ViT
We thank these great works and open-source codebases:
If you find our method useful for your research, please cite:
title={Relational Contrastive Learning for Scene Text Recognition},
author={Jinglei Zhang and Tiancheng Lin and Yi Xu and Kai Chen and Rui Zhang},