
A simple microservice built with gofiber and gosseract that adds optical character recognition (OCR) to your data pipeline.


ieferrari/ocr_go_microservice


OCR Go Microservice

API endpoints that call Tesseract OCR functions.

Getting started

With Docker:

docker run -p5005:5005 ieferrari/ocr_go_microservice

Extract the text from an image URL:

curl -X POST \
     -d '{"msg": "https://pbs.twimg.com/media/EH-Pvo9WwAEKFwc?format=jpg&name=small"}' \
     -H "Content-Type: application/json" \
     http://127.0.0.1:5005/ocr_from_url

Basic installation

How to install Tesseract from source:

apt-get install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config

git clone https://github.com/tesseract-ocr/tesseract.git

cd tesseract
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

Alternatively, install a prebuilt Tesseract 5.x package (repository setup is described at https://notesalexp.org/tesseract-ocr/#tesseract_5.x):

sudo apt-get install tesseract-ocr

Install the gosseract Go bindings:

go get github.com/otiai10/gosseract/v2

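Download the Spanish language data and test the CLI (the recognized text is written to outputbase.txt):

wget https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
tesseract --tessdata-dir . example.png outputbase -l spa --psm 3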

Load test

On a Linode VPS with 1 CPU and 1 GB RAM, running Ubuntu 20.04.2 LTS:

        Service not running   Running, idle   High load
RAM     428 MB                530 MB          638 MB
CPU     4.4 %                 5 %             60 %



Alternatives in other languages

Depending on your architecture, it may be more efficient to call Tesseract through a wrapper in your preferred language. This container is an alternative if your team has trouble installing the Tesseract components for a specific language, or if you want a centralized OCR implementation in the first place.

Python example:

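import pytesseract
from PIL import Image
# Path of the tesseract binary installed from source (default prefix /usr/local)
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
# Run OCR with the Spanish model, then replace "º" with "o" in the output
text = pytesseract.image_to_string(Image.open('./example.png'), lang='spa')
print(text.replace("º", "o"))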
