This repository contains code for the paper "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings" by Siddhant Bansal, Praveen Krishnan, and C.V. Jawahar, published at ICPR 2020.
We propose a novel scheme for improving word recognition accuracy using word image embeddings. We use a trained text recognizer, which can predict multiple text hypotheses for a given word image. Our fusion scheme improves the recognition process by utilizing the word image and text embeddings obtained from a trained word image embedding network. We propose EmbedNet, which is trained using a triplet loss to learn an embedding space where the embedding of the word image lies closer to the embedding of the corresponding text transcription. The updated embedding space thus helps in choosing the correct prediction with higher confidence. To further improve the accuracy, we propose a plug-and-play module called Confidence based Accuracy Booster (CAB). The CAB module takes in the confidence scores obtained from the text recognizer and the Euclidean distances between the embeddings to generate an updated distance vector. The updated distance vector has lower distance values for the correct words and higher distance values for the incorrect words. We rigorously evaluate the proposed method on a collection of books in the Hindi language. Our method achieves an absolute improvement of around 10% in terms of word recognition accuracy.
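The selection step described above, picking the hypothesis whose text embedding lies closest to the word image embedding, can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats; the function names, the (text, embedding) layout, and the toy vectors are hypothetical and not taken from this repository, so treat it as an illustration rather than the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pick_hypothesis(image_embedding, hypotheses):
    """Return the hypothesis text nearest to the image, plus all distances.

    `hypotheses` is a list of (text, embedding) pairs (hypothetical layout).
    """
    distances = [euclidean(image_embedding, emb) for _, emb in hypotheses]
    best = min(range(len(hypotheses)), key=distances.__getitem__)
    return hypotheses[best][0], distances


# Toy example with made-up 3-dimensional embeddings.
image_embedding = [0.9, 0.1, 0.0]
hypotheses = [
    ("wrod", [0.0, 1.0, 0.0]),
    ("word", [1.0, 0.0, 0.0]),
    ("ward", [0.5, 0.5, 0.5]),
]
best_text, distances = pick_hypothesis(image_embedding, hypotheses)
print(best_text)  # word
```

In this toy case the second hypothesis has the smallest distance to the image embedding, so it is selected.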
git clone https://github.com/Sid2697/Word-recognition-EmbedNet-CAB.git
cd Word-recognition-EmbedNet-CAB
- Python == 3.7
- PyTorch
- Scikit-learn
- NumPy
- tqdm
requirements.txt has been provided for installing Python dependencies.
pip install -r requirements.txt
The deep embeddings used in this work are generated using the End2End network proposed in:
Krishnan, P., Dutta, K., Jawahar, C.V.: Word spotting and recognition using deep embedding. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). pp. 1–6 (April 2018). https://doi.org/10.1109/DAS.2018.70
Deep embeddings of the word images and their text, for testing this repository, are provided in the embeddings folder.
Text files containing information about the embeddings are required to run the code. They use the following format:
<img1-path><space><text1-string><space><dummyInt><space>1
<img2-path><space><text2-string><space><dummyInt><space>1
...
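A line in this format could be parsed as sketched below. The sketch assumes the four fields are separated by single spaces and that neither the path nor the text contains a space; `parse_line`, the dictionary keys, and the sample line are hypothetical.

```python
def parse_line(line):
    """Split one '<img-path> <text> <dummyInt> 1' line into its fields."""
    img_path, text, dummy_int, flag = line.rstrip("\n").split(" ")
    return {
        "img_path": img_path,
        "text": text,
        "dummy_int": int(dummy_int),
        "flag": int(flag),
    }


sample = "images/1.jpg word 0 1\n"  # made-up example line
record = parse_line(sample)
print(record["img_path"], record["text"])
```

A line with a different number of fields raises a ValueError during unpacking, which makes malformed entries easy to spot.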
To generate embeddings, refer to https://github.com/kris314/hwnet.
To make the code in this repository easier to explore, sample text files and embeddings are provided in gen_files and embeddings, respectively.
The original dataset used in this work will be released by CVIT soon.
Pre-trained EmbedNet models are saved in the models folder.
To run baseline word recognition, use the command:
python src/word_rec_EmbedNet.py
To run word recognition with the confidence score, use the command:
python src/word_rec_EmbedNet.py --use_confidence
To run word recognition using a pre-trained EmbedNet, use the command:
python src/word_rec_EmbedNet.py --use_confidence --use_model --hidden_layers 1024
To run word recognition using a pre-trained EmbedNet and the CAB module, use the command:
python src/word_rec_EmbedNet.py --use_confidence --use_model --hidden_layers 1024 --cab
Other arguments for the word recognition experiment are:
--image_embeds
--topk_embeds
--image_file
--predictions_file
--use_confidence
--cab
--cab_alpha
--cab_beta
--in_features
--out_features
--hidden_layers
--model_path
--testing
--test_split
--k
- image_embeds: path to the image embeddings
- topk_embeds: path to the TopK predictions' embeddings
- image_file: path to the image's text information file
- predictions_file: path to the TopK predictions' text information file
- use_confidence: if set, the confidence score is used for re-ranking the predictions
- cab: if set, the CAB module is used for improving the word recognition accuracy
- cab_alpha: hyper-parameter alpha defined for the CAB module
- cab_beta: hyper-parameter beta defined for the CAB module
- in_features: size of the input to EmbedNet
- out_features: size of the output of EmbedNet
- hidden_layers: list of input sizes of the hidden layers
- model_path: path to the pre-trained model to be used for testing
- testing: if set, only the test set is used for evaluation
- test_split: split for testing the trained EmbedNet on unseen data
- k: total number of predictions to test on (max 20)
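For orientation, the flags listed above could be declared with argparse roughly as follows. This is a sketch and not the parser in src/word_rec_EmbedNet.py: the argument types are assumptions, defaults are omitted because they are not stated here, and `build_parser` is a hypothetical name. The `--use_model` switch is included because it appears in the commands above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Word recognition flags (sketch)")
    # Paths to embeddings, text information files, and the model.
    for name in ("image_embeds", "topk_embeds", "image_file",
                 "predictions_file", "model_path"):
        parser.add_argument(f"--{name}", type=str)
    # Boolean switches, off unless passed on the command line.
    for name in ("use_confidence", "use_model", "cab", "testing"):
        parser.add_argument(f"--{name}", action="store_true")
    # Numeric options; the types chosen here are assumptions.
    parser.add_argument("--cab_alpha", type=float)
    parser.add_argument("--cab_beta", type=float)
    parser.add_argument("--in_features", type=int)
    parser.add_argument("--out_features", type=int)
    parser.add_argument("--hidden_layers", type=int, nargs="+")
    parser.add_argument("--test_split", type=float)
    parser.add_argument("--k", type=int)
    return parser


# Parse the flags used in the EmbedNet + CAB command shown earlier.
args = build_parser().parse_args(
    ["--use_confidence", "--use_model", "--hidden_layers", "1024", "--cab"]
)
print(args.hidden_layers, args.cab, args.testing)
```

Declaring hidden_layers with nargs="+" lets several layer sizes be passed after a single flag, which matches its description as a list.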
Prepare the text files and embeddings as described in Generating deep embeddings. See the files in the gen_files folder for example text files. Once the embeddings are prepared, run the following command:
python src/EmbedNet_train.py --model_name provide_a_name_of_your_choice
For a quick demonstration, you can run the following command:
python src/EmbedNet_train.py --model_name temp
This will start training an EmbedNet for 1000 epochs and save the models in trained/EmbedNet_models.
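EmbedNet is trained with a triplet loss, and the standard form of that loss can be sketched in a few lines. The sketch assumes Euclidean distance and takes the word image embedding as the anchor, the embedding of the correct text as the positive, and the embedding of an incorrect text as the negative; this role assignment, the function names, and the toy vectors are assumptions for illustration, not details confirmed by the training script.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


anchor = [0.0, 0.0]    # word image embedding (toy values)
positive = [3.0, 4.0]  # correct text embedding, distance 5 from the anchor
negative = [6.0, 8.0]  # incorrect text embedding, distance 10 from the anchor

loss_small_margin = triplet_loss(anchor, positive, negative, margin=1.0)
loss_large_margin = triplet_loss(anchor, positive, negative, margin=7.0)
print(loss_small_margin, loss_large_margin)  # 0.0 2.0
```

With a margin of 1 the negative is already far enough away and the loss is zero; with a margin of 7 the triplet still violates the margin and contributes a loss of 2. In PyTorch, torch.nn.TripletMarginLoss computes this quantity over batches of tensors.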
Other arguments for EmbedNet_train.py are:
--base_dir
--model_dir
--train_percentage
--epochs
--lr
--batch
--model_name
--margin
--hidden_layers
--gpu_id
--image_embeds
--topk_embeds
--image_file
--predictions_file
- base_dir: path to the directory for saving models
- model_dir: name of the folder for saving trained models
- train_percentage: percentage of data to use for training
- epochs: number of epochs to train for
- lr: learning rate
- batch: batch size
- model_name: name under which the model is saved
- margin: triplet loss margin
- hidden_layers: list of input sizes of the hidden layers
- gpu_id: which GPU to use
- image_embeds: path to the image embeddings
- topk_embeds: path to the TopK predictions' embeddings
- image_file: path to the image's text information file
- predictions_file: path to the TopK predictions' text information file
The software is licensed under the MIT License. If you find this work useful, please cite the following paper:
@misc{bansal2020fused,
title={Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval},
author={Siddhant Bansal and Praveen Krishnan and C. V. Jawahar},
year={2020},
eprint={2007.00166},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
For any queries, contact Siddhant Bansal.