Using free OCR in Ubuntu

The problem: you have an image containing text (called input-image.png in this example) and you want to extract that text into an ordinary plain-text file using OCR.

Tested on Ubuntu 16.04.1 LTS.

Setup

Install the following packages (Tesseract with support for the English and Czech languages):

sudo apt-get install tesseract-ocr-ces tesseract-ocr tesseract-ocr-eng
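To confirm that the installation succeeded, you can ask dpkg for each package's status. The helper below is a minimal sketch; the function name `missing_packages` is hypothetical and not part of Ubuntu or Tesseract. It prints every package that dpkg does not report as fully installed, so empty output means everything is in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named package that dpkg does not report
# as "install ok installed". Prints nothing when all packages are present.
missing_packages() {
  local pkg status
  for pkg in "$@"; do
    # ${Status} is a dpkg-query format field, single-quoted so the shell
    # does not expand it. Unknown packages (or a missing dpkg-query) leave
    # $status empty, so they are reported as missing.
    status="$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null || true)"
    if [ "$status" != "install ok installed" ]; then
      printf '%s\n' "$pkg"
    fi
  done
}
```

For example, `missing_packages tesseract-ocr tesseract-ocr-eng tesseract-ocr-ces` should print nothing after a successful install.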

Verify the list of recognized languages:

tesseract --list-langs
   List of available languages (4):
   equ
   ces
   eng
   osd
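If you want a script to stop early when a language pack is missing, you can check this listing automatically. The sketch below assumes the listing format shown above (a header line followed by one language code per line); the function name `has_langs` is hypothetical. Some Tesseract versions may print the list to stderr rather than stdout, so the usage example merges both streams.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): read `tesseract --list-langs`
# output on stdin and succeed only if every language code given as an
# argument appears in it. The header line "List of available languages (N):"
# is skipped.
has_langs() {
  local listing code
  # Drop the header line and strip blanks and carriage returns from entries.
  listing="$(tail -n +2 | tr -d ' \t\r')"
  for code in "$@"; do
    # -x: whole-line match, -F: literal string, -q: no output.
    if ! grep -qxF -- "$code" <<<"$listing"; then
      printf 'missing language: %s\n' "$code" >&2
      return 1
    fi
  done
}
```

Usage: `tesseract --list-langs 2>&1 | has_langs ces eng && echo "Czech and English are available"`.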

Run OCR

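Use this example to process input-image.png, which contains Czech characters, and write the result to the file /tmp/output.txt (UTF-8 encoded). The second argument to tesseract is an output base name and the .txt extension is appended automatically, so pass /tmp/output rather than /tmp/output.txt (which would produce /tmp/output.txt.txt):

tesseract input-image.png /tmp/output -l ces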
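To process a whole directory of images, the same command can be run in a loop. This is a minimal sketch, assuming the images are PNG files; the function names `output_base` and `ocr_dir` are hypothetical. It relies only on the command form shown above: tesseract appends .txt to the output base, so each image gets its own text file named after the image.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of Tesseract).

# Build the output base for an image. tesseract appends ".txt" itself,
# so scans/page-01.png -> <outdir>/page-01 -> <outdir>/page-01.txt.
output_base() {
  local img="$1" outdir="$2" name
  name="$(basename -- "$img")"
  printf '%s/%s\n' "$outdir" "${name%.*}"
}

# OCR every PNG in indir with the given language (default: ces),
# writing one UTF-8 .txt file per image into outdir.
ocr_dir() {
  local indir="$1" outdir="$2" lang="${3:-ces}" img
  mkdir -p -- "$outdir"
  for img in "$indir"/*.png; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$img" ] || continue
    tesseract "$img" "$(output_base "$img" "$outdir")" -l "$lang"
  done
}
```

For example, `ocr_dir ./scans /tmp/ocr ces` would turn ./scans/page-01.png into /tmp/ocr/page-01.txt. Stripping only the last extension (`${name%.*}`) keeps names such as page.v2.png distinct.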

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
