ScanSSD: Scanning Single Shot Detector for Math in Document Images

A PyTorch implementation of ScanSSD (Scanning Single Shot MultiBox Detector) by Parag Mali. It was developed using the SSD implementation by Max deGroot.

Developed using CUDA 9.1.85 and PyTorch 1.1.0.

Installation

Install PyTorch
Clone this repository. Python 3 is required.
Download the dataset by following the instructions at https://github.com/MaliParag/TFD-ICDAR2019.
Install Visdom for real-time loss visualization during training!
- To use Visdom in the browser:
```
# First install Python server and client
pip install visdom
# Start the server (probably in a screen or tmux)
python -m visdom.server
```
- Then (during training) navigate to http://localhost:8097/ (see the Train section below for training details).

Code Organization

The SSD model is built in ssd.py. Training and testing are managed in train.py and test.py. All the training code is in the layers directory. Hyper-parameters for training and testing can be specified on the command line and in the config.py file inside the data directory.

The data directory also contains the gtdb_new.py data reader, which uses sliding windows to generate sub-images of a page for training. All the scripts for stitching the sub-image-level detections are in the gtdb directory.
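To make the sliding-window idea concrete, here is a minimal sketch of how a page could be cut into fixed-size windows. It is not the code in gtdb_new.py: the function name `sliding_windows`, the window size, and the convention that the stride is a fraction of the window size are all assumptions made for illustration.

```python
from typing import Iterator, Tuple

# (x_min, y_min, x_max, y_max) in page pixel coordinates
Window = Tuple[int, int, int, int]


def sliding_windows(page_w: int, page_h: int,
                    window: int = 512, stride: float = 0.5) -> Iterator[Window]:
    """Yield square windows that together cover a page of page_w x page_h pixels.

    `stride` is the step between windows as a fraction of the window size
    (an assumption of this sketch). Windows at the right and bottom edges
    may extend past the page, so a real reader would have to pad the image.
    """
    step = max(1, int(window * stride))
    y = 0
    while True:
        x = 0
        while True:
            yield (x, y, x + window, y + window)
            if x + window >= page_w:  # this row is fully covered
                break
            x += step
        if y + window >= page_h:  # the page is fully covered
            break
        y += step


windows = list(sliding_windows(1024, 512, window=512, stride=0.5))
print(windows)
# → [(0, 0, 512, 512), (256, 0, 768, 512), (512, 0, 1024, 512)]
```

Overlapping windows mean the same formula can be detected several times, which is why the sub-image detections need to be stitched back together afterwards.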

Functions for data augmentation and for visualizing bounding boxes and heatmaps are in utils.

Setting up data for training

If you are not sure how to set up the data, see the dir_struct file. It shows one possible directory structure that you can use for the training and testing data.

To generate .pmath files or .pchar files you can use this script.

Training ScanSSD

First download the fc-reduced VGG-16 PyTorch base network weights here.
By default, we assume you have downloaded the file to the scanssd/weights directory.
Then run the training command:

python3 train.py \
    --dataset GTDB \
    --dataset_root ~/data/GTDB/ \
    --cuda True \
    --visdom True \
    --batch_size 16 \
    --num_workers 4 \
    --exp_name IOU512_iter1 \
    --model_type 512 \
    --training_data training_data \
    --cfg hboxes512 \
    --loss_fun ce \
    --kernel 1 5 \
    --padding 0 2 \
    --neg_mining True \
    --pos_thresh 0.75

Note:
- For training, an NVIDIA GPU is strongly recommended for speed.
- For instructions on Visdom usage/installation, see the Installation section.
- You can pick up training from a checkpoint by specifying its path as one of the training parameters (see train.py for the available options).
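The training command passes booleans as strings (`--cuda True`) and some flags take two values (`--kernel 1 5`). The sketch below shows one way such flags can be parsed with `argparse`. It is an illustration only: the `str2bool` helper, the defaults, and the subset of flags are assumptions, and train.py remains the authoritative list of options.

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret strings such as 'True', 'yes' or '1' as True, anything else as False."""
    return value.lower() in ("yes", "true", "t", "1")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of flags; the defaults here are placeholders.
    parser = argparse.ArgumentParser(description="Sketch of ScanSSD-style training flags")
    parser.add_argument("--dataset_root", type=str, default="~/data/GTDB/")
    parser.add_argument("--cuda", type=str2bool, default=True)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--kernel", type=int, nargs=2, default=[3, 3])
    parser.add_argument("--padding", type=int, nargs=2, default=[1, 1])
    parser.add_argument("--pos_thresh", type=float, default=0.5)
    return parser


args = build_parser().parse_args(
    ["--cuda", "True", "--kernel", "1", "5", "--padding", "0", "2", "--pos_thresh", "0.75"]
)
print(args.cuda, args.kernel, args.padding, args.pos_thresh)
# → True [1, 5] [0, 2] 0.75
```

A converter like `str2bool` matters because `type=bool` would turn any non-empty string, including `"False"`, into `True`.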

Pre-Trained weights

For quick testing, pre-trained weights are available here.

Testing

To test a trained network:

python3 test.py \
    --dataset_root ../ \
    --trained_model HBOXES512_iter1GTDB.pth \
    --visual_threshold 0.25 \
    --cuda True \
    --exp_name test_real_world_iter1 \
    --test_data testing_data \
    --model_type 512 \
    --cfg hboxes512 \
    --padding 3 3 \
    --kernel 1 1 \
    --batch_size 8

You can set the parameters listed in the test.py file either by passing them as command-line flags or by changing their defaults in the file.

Stitching the patch level results


python3 <Workspace>/ssd/gtdb/stitch_patches_pdf.py \
    --data_file <Workspace>/train_pdf \
    --output_dir <Workspace>/ssd/eval/stitched_HBOXES512_e4/ \
    --math_dir <Workspace>/ssd/eval/test_HBOXES512_e4/ \
    --stitching_algo equal \
    --algo_threshold 30 \
    --num_workers 8 \
    --postprocess True \
    --home_images <Workspace>/images/

math_dir is the output directory generated by test.py.

output_dir is the directory where the final output is written.
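As a rough illustration of what stitching involves, the sketch below moves window-level boxes into page coordinates and merges boxes that overlap. This is a deliberately simplified stand-in, not the `equal` algorithm selected by `--stitching_algo`; the function names and the merge-by-union rule are assumptions made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_page_coords(box: Box, window_origin: Tuple[float, float]) -> Box:
    """Shift a box detected inside a window by that window's top-left corner."""
    ox, oy = window_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Replace every group of overlapping boxes by its bounding union."""
    merged: List[Box] = []
    for box in boxes:
        current = box
        changed = True
        while changed:  # keep absorbing boxes until nothing overlaps `current`
            changed = False
            for other in merged:
                if overlaps(current, other):
                    merged.remove(other)
                    current = (min(current[0], other[0]), min(current[1], other[1]),
                               max(current[2], other[2]), max(current[3], other[3]))
                    changed = True
                    break
        merged.append(current)
    return merged


# The same formula seen in two overlapping windows, plus an unrelated detection.
page_boxes = [
    to_page_coords((400, 100, 512, 150), (0, 0)),
    to_page_coords((144, 100, 300, 150), (256, 0)),
    to_page_coords((10, 10, 50, 50), (0, 0)),
]
print(merge_overlapping(page_boxes))
# → [(400, 100, 556, 150), (10, 10, 50, 50)]
```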

Evaluate

python3 <Workspace>/ICDAR2019/TFD-ICDAR2019v2/Evaluation/IOULib/IOUevaluater.py \
    --ground_truth <Workspace>/ICDAR2019/TFD-ICDAR2019v2/Train/math_gt/ \
    --detections <Workspace>/ssd/eval/stitched_HBOXES512_e4/
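The evaluation compares detections with ground truth using intersection over union (IOU). The sketch below shows the standard IOU computation for two axis-aligned boxes and a threshold check. The function names are illustrative, and the evaluator script may differ in details such as how it pairs detections with ground-truth boxes.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(detection: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct when its IOU reaches the threshold."""
    return iou(detection, ground_truth) >= threshold


gt = (0, 0, 10, 10)
det = (2, 0, 12, 10)  # same size, shifted right by 2 pixels
print(round(iou(det, gt), 4))                 # → 0.6667
print(is_match(det, gt, threshold=0.50))      # → True
print(is_match(det, gt, threshold=0.75))      # → False
```

The two thresholds used here correspond to the IOU50 and IOU75 rows reported in the Performance section.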

Performance

TFD-ICDAR 2019 Version1 Test

| Metric | Precision | Recall | F-score |
|--------|-----------|--------|---------|
| IOU50 | 85.05% | 75.85% | 80.19% |
| IOU75 | 77.38% | 69.01% | 72.96% |
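The F-score column is the harmonic mean of precision and recall, F = 2PR / (P + R). A short check, using only the precision and recall values from the table, reproduces the reported numbers (the helper name `f_score` is just for this example):

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, p, r in [("IOU50", 85.05, 75.85), ("IOU75", 77.38, 69.01)]:
    print(f"{name}: F-score = {f_score(p, r):.2f}%")
# → IOU50: F-score = 80.19%
# → IOU75: F-score = 72.96%
```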

FPS

GTX 1080: ~27 FPS for 512 * 512 input images

Related publications

Mali, Parag, et al. “ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images.” ArXiv:2003.08005 [Cs], Mar. 2020. arXiv.org, http://arxiv.org/abs/2003.08005.

P. S. Mali, "Scanning Single Shot Detector for Math in Document Images." Order No. 22622391, Rochester Institute of Technology, Ann Arbor, 2019.

M. Mahdavi, R. Zanibbi, H. Mouchere, and Utpal Garain (2019). ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection. Proc. International Conference on Document Analysis and Recognition, Sydney, Australia (to appear).

Acknowledgements

Max deGroot for providing open-source SSD code

Notes on the changes in this branch

Suggested parameters for running the test

python .\test.py --dataset_root . --trained_model ../pretrained_model/AMATH512_e1GTDB.pth --visual_threshold 0.5 --exp_name test_turing --test_data testing_data_turing --model_type 512 --cfg hboxes512 --kernel 1 5 --batch_size 8 --padding 0 2 --stride 1.0

First, generate an image for each page of the PDF; these images are the input to the model. The convert_pdf_to_image.py script from https://github.com/MaliParag/TFD-ICDAR2019 worked without problems for this. Set its output directory to the "images" directory of this repository.
Pay attention to the --dataset_root and --test_data parameters. The suggested values are, respectively, the root directory and a file that lists every page to be processed, using the name of the PDF the images were generated from. For example:

<pdf_name>/1
<pdf_name>/2
<pdf_name>/3
<pdf_name>/4

After the pages are processed, a <pdf_name>.csv file containing the predicted bounding boxes for each page is written to the .\eval\<exp_name> folder. This file is meant to be passed to the .\gtdb\stitch_patches_pdf.py script, which is not working yet. That script generates the .math file and also adjusts the predicted boxes. As a workaround, the <pdf_name>.csv file can be copied with the .math extension, remembering to change the first column to an integer by simply dropping the two decimal places.
With the .math file in place, annotated images of each page can be generated with the visualize_annotations.py script, also available at https://github.com/MaliParag/TFD-ICDAR2019. This script ran without problems. An example annotation produced with the test parameters above, for the paper https://dl.acm.org/citation.cfm?id=321991 (referred to as Emden76), is in the anotado folder.
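The two manual steps described in these notes, writing the per-page list file and copying the CSV to a .math file with an integer first column, can be scripted. The sketch below is one possible way to do it, assuming the CSV is comma-separated with the page number in the first column; the function names and the file names in the demo are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def page_list(pdf_name: str, num_pages: int) -> List[str]:
    """Build the lines of a test-data file: one '<name>/<page>' entry per page."""
    return [f"{pdf_name}/{page}" for page in range(1, num_pages + 1)]


def csv_to_math(csv_path: Path, math_path: Path) -> None:
    """Copy a detections CSV to a .math file, truncating column 0 to an integer."""
    with csv_path.open(newline="") as src, math_path.open("w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:  # skip blank lines
                continue
            row[0] = str(int(float(row[0])))  # e.g. "3.00" becomes "3"
            writer.writerow(row)


# Small demo in a temporary directory with made-up detections.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "example.csv"
    math_file = Path(tmp) / "example.math"
    csv_file.write_text("1.00,10.5,20.0,110.5,45.0\n2.00,30.0,40.0,200.0,80.0\n")
    csv_to_math(csv_file, math_file)
    converted = math_file.read_text().splitlines()

print(page_list("example", 3))  # → ['example/1', 'example/2', 'example/3']
print(converted)                # → ['1,10.5,20.0,110.5,45.0', '2,30.0,40.0,200.0,80.0']
```

The box coordinates are copied through unchanged; only the page-number column is rewritten.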
