A real-time scene text recognition algorithm. Our system can recognize text against unconstrained backgrounds.
This algorithm is based on several papers and is implemented in C/C++.
- Install OpenCV; put the opencv directory into C:\tools
  - You can install it manually from its GitHub repo, or
  - You can install it via Chocolatey: choco install opencv, or
  - If you already have OpenCV, edit CMakeLists.txt and change WIN_OPENCV_CONFIG_PATH to where you have it
- Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-win
    cd build-win
    cmake .. -G "Visual Studio 15 2017 Win64"
- Use CMake to build the project
    cmake --build . --config Release
- Find the binaries in the root directory
    cd ..
    dir | findstr scene
- To execute the scene_text_recognition.exe binary, use its wrapper script; for example:
    .\scene_text_recognition.bat -i res\ICDAR2015_test\img_6.jpg
- Install OpenCV; refer to OpenCV Installation in Linux
- Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-linux
    cd build-linux
    cmake ..
- Use CMake to build the project
    cmake --build .
- Find the binaries in the root directory
    cd ..
    ls | grep scene
- To execute the binaries, run them as-is; for example:
    ./scene_text_recognition -i res/ICDAR2015_test/img_6.jpg
The executable file scene_text_recognition must ultimately exist in the project root directory (i.e., next to classifier/, dictionary/, etc.).
./scene_text_recognition -v: take default webcam as input
./scene_text_recognition -v [video]: take a video as input
./scene_text_recognition -i [image]: take an image as input
./scene_text_recognition -i [path]: take a folder of images as input
./scene_text_recognition -l [image]: demonstrate "Linear Time MSER" Algorithm
./scene_text_recognition -t detection: train text detection classifier
./scene_text_recognition -t ocr: train text recognition (OCR) classifier
- Put your text data in res/pos and your non-text data in res/neg
- Name your data numerically, e.g. 1.jpg, 2.jpg, 3.jpg, and so on.
- Make sure the training folder exists
- Run ./scene_text_recognition -t detection
    mkdir training
    ./scene_text_recognition -t detection
- The text detection classifier will be found in the training folder
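The numeric naming rule above can be checked before training. The block below is a minimal sketch, not part of this project: the helper name `is_numeric_image_name` is hypothetical, and the `.jpg` extension is an assumption taken from the examples (`1.jpg`, `2.jpg`, ...).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of this project): returns true when `name`
// has the form "<digits>.jpg", e.g. "1.jpg" or "42.jpg".
// Assumption: training images use the ".jpg" extension shown in the examples.
bool is_numeric_image_name(const std::string& name) {
    const std::string ext = ".jpg";
    // The name must be longer than the extension so the stem is non-empty.
    if (name.size() <= ext.size()) return false;
    // The name must end with the extension.
    if (name.compare(name.size() - ext.size(), ext.size(), ext) != 0) return false;
    // Everything before the extension must be a decimal digit.
    const std::string stem = name.substr(0, name.size() - ext.size());
    return std::all_of(stem.begin(), stem.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}
```

Calling it on every file name in res/pos and res/neg would flag entries such as `cat.jpg` or `12.png` before the training run starts.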
- Put your training data in res/ocr_training_data/
- Arrange the data as [Font Name]/[Font Type]/[Category]/[Character.jpg], for instance Time_New_Roman/Bold/lower/a.jpg. You can refer to res/ocr_training_data.zip
- Make sure the training folder exists, and put svm-train in the root folder (svm-train will be built by the system and should be found in build/)
- Run ./scene_text_recognition -t ocr
    mkdir training
    mv svm-train scene-text-recognition/
    ./scene_text_recognition -t ocr
- The text recognition (OCR) classifier will be found in the training folder
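To make the directory layout above concrete, here is a minimal sketch that splits a path relative to res/ocr_training_data/ into its four components. The struct `OcrSample` and the function `parse_ocr_sample_path` are hypothetical names used for illustration only; they are not this project's code, and the `.jpg` extension is assumed from the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one training image (names are illustrative).
struct OcrSample {
    std::string font_name;  // e.g. "Time_New_Roman"
    std::string font_type;  // e.g. "Bold"
    std::string category;   // e.g. "lower"
    std::string character;  // e.g. "a" (file name without ".jpg")
};

// Parses "[Font Name]/[Font Type]/[Category]/[Character.jpg]".
// Returns false when the path does not have exactly four non-empty
// components or the file does not end in ".jpg" (assumed extension).
bool parse_ocr_sample_path(const std::string& relative_path, OcrSample& out) {
    std::vector<std::string> parts;
    std::stringstream stream(relative_path);
    std::string part;
    while (std::getline(stream, part, '/')) {
        if (part.empty()) return false;  // reject "a//b" style paths
        parts.push_back(part);
    }
    if (parts.size() != 4) return false;

    const std::string ext = ".jpg";
    const std::string& file = parts[3];
    if (file.size() <= ext.size() ||
        file.compare(file.size() - ext.size(), ext.size(), ext) != 0) {
        return false;
    }

    out.font_name = parts[0];
    out.font_type = parts[1];
    out.category = parts[2];
    out.character = file.substr(0, file.size() - ext.size());
    return true;
}
```

For the example path Time_New_Roman/Bold/lower/a.jpg this yields the font name `Time_New_Roman`, font type `Bold`, category `lower`, and character `a`.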
The algorithm is based on a region detector called Extremal Region (ER), which is essentially a superset of the well-known MSER region detector. We use ERs to find text candidates, extracting them with the linear-time MSER algorithm. The pitfall of ERs is repeated detection, so we remove most of the repeated ERs with non-maximum suppression: we estimate the overlap between ERs from the component tree and calculate the stability of every ER. Within each group of overlapping ERs, only the one with maximum stability is kept. After that, we apply a two-stage Real AdaBoost classifier to filter out non-text regions. We chose Mean-LBP as the feature because it is faster compared to other features. The surviving ERs are then grouped together to lift the result from character level to word level, which is more intuitive for humans.

Our next step is to apply OCR to the detected text. The chain code of each ER is used as the feature, and the classifier is trained with an SVM. We also introduce several post-processing steps, such as optimal-path selection and spell checking, to improve the recognition result.
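The non-maximum suppression step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `ER` struct, the function `suppress_overlapping_ers`, and the `min_overlap` threshold are hypothetical, and the stability value is assumed to be computed elsewhere. Because a child ER in the component tree lies entirely inside its parent, the overlap between the two is taken here as the ratio of their areas. Chaining such parent-child links into groups is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical ER record (illustrative, not the project's data structure).
struct ER {
    int parent;        // index of the parent ER in the component tree, -1 for the root
    double area;       // number of pixels in the region
    double stability;  // assumed precomputed; higher means more stable
};

// Union-find lookup with path halving.
static int find_root(std::vector<int>& group, int i) {
    while (group[i] != i) {
        group[i] = group[group[i]];
        i = group[i];
    }
    return i;
}

// Returns the indices (ascending) of the ERs kept after suppression.
// A child and its parent are placed in the same group when
// area(child) / area(parent) >= min_overlap; in each group only the
// ER with the highest stability survives.
std::vector<int> suppress_overlapping_ers(const std::vector<ER>& ers,
                                          double min_overlap) {
    const int n = static_cast<int>(ers.size());
    std::vector<int> group(n);
    std::iota(group.begin(), group.end(), 0);

    for (int i = 0; i < n; ++i) {
        const int p = ers[i].parent;
        if (p < 0) continue;  // the root has no parent to overlap with
        assert(p < n && ers[p].area > 0 && ers[p].area >= ers[i].area);
        if (ers[i].area / ers[p].area >= min_overlap) {
            group[find_root(group, i)] = find_root(group, p);
        }
    }

    // best[g] holds the index of the most stable ER in group g.
    std::vector<int> best(n, -1);
    for (int i = 0; i < n; ++i) {
        const int g = find_root(group, i);
        if (best[g] < 0 || ers[i].stability > ers[best[g]].stability) {
            best[g] = i;
        }
    }

    std::vector<int> kept;
    for (int g = 0; g < n; ++g) {
        if (best[g] >= 0) kept.push_back(best[g]);
    }
    std::sort(kept.begin(), kept.end());
    return kept;
}
```

Grouping with union-find keeps the pass close to linear in the number of ERs, which suits a real-time pipeline; a stricter variant could compare every pair along a branch instead of only adjacent parent-child pairs.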