This project is dedicated to developing a robust pipeline for license plate recognition (LPR) in Bangladesh. The LPR process involves two crucial steps: first, detecting the region of an image where the license plate is located, and second, performing Optical Character Recognition (OCR) to extract the text. The pipeline can be summarized as follows:
graph LR
Image-->yolo[Detection]-->crop[Cropped License Plate]-->ocr[Text Recognition]
YOLOv8 is used to detect the license plate. YOLO can already detect license plates, but it can also be fine-tuned on a particular dataset. models/yolo.pt is the model fine-tuned on images of vehicles with Bangla license plates.
The EasyOCR library is used for text recognition. It already ships a model for recognizing Bengali characters, and that model can also be fine-tuned using this repository. The model in models/EasyOCR/models is the pretrained Bengali model fine-tuned on a custom dataset.
License plates on vehicles in Bangladesh have two lines. The first line contains the area name and the vehicle class, and the second line contains a 6-digit number in the format dd-dddd. The number of possible area names is finite, so a list of all possible area names is used to correct minor mistakes made by the OCR model. This is done with Python's difflib library, which provides the get_close_matches function. This function returns the strings in the list that are closest to the input string, best match first.
graph TB
Image-->ocr[OCR Model]-->Text[Recognized Text]
Text-->first[Extraction of 1st line]-->difflib[Get closest valid area name]
Text-->second[Extraction of 2nd line]-->num[Format number as dd-dddd]
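The post-processing shown in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the AREA_NAMES list is a hypothetical, transliterated sample, and the helper names correct_area_name and format_plate_number are assumptions made for this example. Only difflib.get_close_matches comes from the description above.

```python
from difflib import get_close_matches

# Hypothetical, transliterated sample; the project uses its own full list.
AREA_NAMES = ["Dhaka Metro", "Chatta Metro", "Khulna Metro", "Rajshahi Metro"]


def correct_area_name(text, valid_names=AREA_NAMES, cutoff=0.6):
    """Return the closest valid area name, or the input if none is close enough."""
    matches = get_close_matches(text, valid_names, n=1, cutoff=cutoff)
    return matches[0] if matches else text


def format_plate_number(text):
    """Keep only digit characters and format exactly six of them as dd-dddd."""
    digits = "".join(ch for ch in text if ch.isdigit())
    if len(digits) != 6:
        return text  # unexpected digit count: leave the OCR output untouched
    return f"{digits[:2]}-{digits[2:]}"


print(correct_area_name("Dhakc Metro"))   # Dhaka Metro
print(format_plate_number("12 3456"))     # 12-3456
```

Because str.isdigit also accepts Bengali digits, the same formatting step works on text such as ১২৩৪৫৬. The cutoff of 0.6 is the difflib default; a stricter value reduces the risk of snapping a badly misread line to the wrong area name.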
Convenient utility functions are provided to execute the full pipeline of detection and recognition.
For this to work, you need to have all the dependencies installed.
First create a virtual environment.
python -m venv ./
Activate the virtual environment. The path below applies to Windows; on Linux or macOS, run source ./bin/activate instead.
./Scripts/activate
Then install the dependencies.
pip install -r necessary-requirements-to-run.txt
Download and extract the model files
gdown 1ujSYC3tEC3VNoxqUIWO2PG7Lg5HAdYqn
unzip models.zip
On Windows, extract the archive with tar instead:
tar -xf models.zip
Now test with the demo image.
import utils
utils.detect_and_extract_lp_text('images/test_image.jpg')
The code was tested with Python 3.11.5.
You can either train the recognition model from scratch or fine-tune it. The code for training is in this repository. You can follow this discussion and configure the parameters accordingly.
After creating your dataset, you can start fine-tuning by calling the train.py script.
python train.py --batch_size 16 --num_iter 1500 --valInterval 50 --FT --saved_model='bengali.pth' --workers 2 --Transformation 'TPS' --FeatureExtraction 'ResNet'
The saved_model argument specifies the model that you want to fine-tune. In this case it is this Bengali model. The model in models/EasyOCR/models/bn_license_tps.pth is the result of fine-tuning that pretrained model on this dataset. Here are the results of experimenting with different hyperparameters.
| Transformation | Batch size | Similarity measure (%) |
|---|---|---|
| TPS | 16 | 94.91 |
| TPS | 32 | 91.64 |
| TPS | 64 | 91.37 |
| None | 10 | 91.86 |
| None | 64 | 90.78 |
| None | 192 | 90.08 |
The similarity measure is calculated using the difflib.SequenceMatcher class and its ratio method. This method returns a floating-point number in the range [0, 1] indicating how similar two strings are, where 1 means identical. It gives almost the same result as a similarity based on Levenshtein distance, also known as edit distance. These experiments were run on Google Colab.
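As a rough illustration of this measure, the sketch below scores predicted strings against their ground truth with SequenceMatcher.ratio. Averaging the per-sample ratios and scaling to a percentage is an assumption about how the table values were aggregated, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(predicted, expected):
    """Ratio in [0, 1]: 2 * matched characters / total characters in both strings."""
    return SequenceMatcher(None, predicted, expected).ratio()


def mean_similarity_percent(pairs):
    """Average similarity over (predicted, expected) pairs, as a percentage.

    Assumption: a plain mean over samples; the original aggregation is not documented here.
    """
    if not pairs:
        return 0.0
    return 100 * sum(similarity(p, e) for p, e in pairs) / len(pairs)


# One exact match and one prediction with a single wrong character out of four.
samples = [("abcd", "abcd"), ("abcd", "abcf")]
print(similarity("abcd", "abcf"))        # 0.75
print(mean_similarity_percent(samples))  # 87.5
```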
This is a work in progress. There are some issues with finding a compatible font for Bengali. This output was obtained by running YOLO detection in Google Colab and modifying the annotation portion of the detection code.