OCR to detect and recognize inkjet-printed dot-matrix text on medical PVC bags
Example images:
The code pre-processes images with OpenCV functions to improve text detection with CRAFT (https://github.com/clovaai/CRAFT-pytorch/blob/master/README.md#craft-character-region-awareness-for-text-detection).
The weights of the pre-trained network are available at this link: https://drive.google.com/file/d/1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ/view.
The label to recognize is a single text string, so the CRAFT parameters are set to find a single block of text. It is possible to change --text_threshold, --low_text, and --link_threshold to obtain different detection results, but the label and the recognition method must then be modified accordingly.
CRAFT results:
The code extracts the area around the text in the original image and corrects the text orientation.
The cropped image:
Morphological transformations (OpenCV functions) and character rescaling with different parameters are applied to the cropped image.
Pre-process cropped image:
I use the Tesseract OCR engine (https://tesseract-ocr.github.io/) with the default page segmentation mode (psm 3). The experiments show that LCDDot_FT_500.traineddata gives the best results in this case. Two methods are used to check the recognized text against the label:
- SequenceMatcher is a class in the Python module difflib that compares pairs of input sequences. Its ratio() method returns a similarity score (a float in [0, 1]) between the input strings, computed as 2*M/T, where M is the number of matched elements and T is the total number of elements in both sequences.
- Regular expressions are provided by the Python module re. The function re.match() checks for a match only at the beginning of the string.
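The two checks can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper name check_label is hypothetical, and escaping the label with re.escape (so that characters such as "." are matched literally) is an assumption about how the pattern is built.

```python
import re
from difflib import SequenceMatcher
from typing import Tuple


def check_label(ocr_text: str, label: str) -> Tuple[float, bool]:
    """Compare OCR output with the expected label (hypothetical helper)."""
    # ratio() is 2*M/T: M = matched characters, T = combined length of both strings.
    ratio = SequenceMatcher(None, label, ocr_text).ratio()
    # Assumption: the label is treated as a literal pattern, hence re.escape.
    # re.match only anchors at the start, so trailing OCR noise would still pass.
    matched = re.match(re.escape(label), ocr_text) is not None
    return ratio, matched


label = "LOTTO:L21X45SCAD.:10-2023"
exact = check_label("LOTTO:L21X45SCAD.:10-2023", label)
noisy = check_label("LOTTO:L21X4SCAD.:1625555", label)
```

Because re.match() only anchors at the beginning, re.fullmatch() may be a stricter alternative when extra trailing characters should count as a failure.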
All results are saved in a JSON file:
[
    [
        {
            "Name_original_file": "A_0.png",
            "Name_preprocess": "_preprocess_150.jpg",
            "check_label": "LOTTO:L21X45SCAD.:10-2023",
            "tesseract_LCDDot_FT_500_psm3_result": "LOTTO:L21X45SCAD.:10-2023",
            "LCDDot_FT_500_psm3_sequence_matcher_ratio_result": 1.0,
            "LCDDot_FT_500_psm3_bool_re_result": true
        }
    ],
    [
        {
            "Name_original_file": "A_0.png",
            "Name_preprocess": "_preprocess_160.jpg",
            "check_label": "LOTTO:L21X45SCAD.:10-2023",
            "tesseract_LCDDot_FT_500_psm3_result": "LOTTO:L21X4SCAD.:1625555",
            "LCDDot_FT_500_psm3_sequence_matcher_ratio_result": 0.78,
            "LCDDot_FT_500_psm3_bool_re_result": false
        }
    ],
    [
        {
            "Name_original_file": "A_0.png",
            "Name_preprocess": "_preprocess_170.jpg",
            "check_label": "LOTTO:L21X45SCAD.:10-2023",
            "tesseract_LCDDot_FT_500_psm3_result": "LOTTO:L21X45SCAD.:10-2023",
            "LCDDot_FT_500_psm3_sequence_matcher_ratio_result": 1.0,
            "LCDDot_FT_500_psm3_bool_re_result": true
        }
    ]
]
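A record like the ones above could be assembled and written with the standard json module, as in the sketch below. The helper names make_record and save_results are hypothetical, and the nesting (a list of one-element lists) is inferred from the excerpt rather than confirmed by the code.

```python
import json


def make_record(original, preprocessed, label, ocr_text, ratio, matched,
                model="LCDDot_FT_500", psm=3):
    """Build one result entry using the key names shown in the excerpt."""
    prefix = f"{model}_psm{psm}"
    return {
        "Name_original_file": original,
        "Name_preprocess": preprocessed,
        "check_label": label,
        f"tesseract_{prefix}_result": ocr_text,
        f"{prefix}_sequence_matcher_ratio_result": ratio,
        f"{prefix}_bool_re_result": matched,
    }


def save_results(groups, path):
    """Write the nested list of records to a JSON file (assumed layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(groups, fh, indent=4)


record = make_record("A_0.png", "_preprocess_150.jpg",
                     "LOTTO:L21X45SCAD.:10-2023",
                     "LOTTO:L21X45SCAD.:10-2023", 1.0, True)
```

Calling save_results([[record]], "results.json") would then produce a file with the same shape as the excerpt. The output file name here is only an example.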
- PyTorch>=1.9.0
- torchvision>=0.2.2
- opencv-python>=4.5.2
conda env create -f environment.yml
python Test_Image.py --image [folder path to test images] --folder_res [folder path to save result images] --label [string label to check]
- --image: folder path to test images
- --label: string label to check
- --folder_res: folder path to save result images
- --trained_model: pretrained model
- --text_threshold: text confidence threshold
- --low_text: text low-bound score
- --link_threshold: link confidence threshold
- --cuda: use cuda for inference (default: True)
- --canvas_size: max image size for inference
- --mag_ratio: image magnification ratio
- --poly: enable polygon type result
- --show_time: show processing time
- --test_folder: folder path to input images
- --refine: use link refiner for sentence-level dataset
- --refiner_model: pretrained refiner model
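For orientation, a subset of these options could be declared with argparse roughly as below. This is a sketch, not the contents of Test_Image.py: the argument types, the str2bool helper, and the sample values are assumptions, and only the --cuda default (True) comes from the list above.

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret common truthy strings (hypothetical helper)."""
    return value.lower() in ("yes", "y", "true", "t", "1")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Dot-matrix OCR test (sketch)")
    parser.add_argument("--image", type=str, help="folder path to test images")
    parser.add_argument("--label", type=str, help="string label to check")
    parser.add_argument("--folder_res", type=str,
                        help="folder path to save result images")
    # Thresholds are left without defaults here; the real defaults are not
    # stated in this README.
    parser.add_argument("--text_threshold", type=float,
                        help="text confidence threshold")
    parser.add_argument("--low_text", type=float, help="text low-bound score")
    parser.add_argument("--link_threshold", type=float,
                        help="link confidence threshold")
    parser.add_argument("--cuda", type=str2bool, default=True,
                        help="use cuda for inference")
    return parser


# Sample invocation with made-up paths and an illustrative threshold value.
args = build_parser().parse_args([
    "--image", "images/", "--folder_res", "results/",
    "--label", "LOTTO:L21X45SCAD.:10-2023",
    "--text_threshold", "0.5", "--cuda", "false",
])
```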