Using free OCR in Ubuntu

Henryk Paluch edited this page Sep 17, 2016 · 1 revision

The problem: you have an image containing text (called input-image.png in this example) and you want to extract that text into an ordinary plain-text file using OCR.

Tested on Ubuntu 16.04.1 LTS.

Setup

Install the following packages (they add support for the English and Czech languages):

sudo apt-get install tesseract-ocr-ces tesseract-ocr tesseract-ocr-eng

Verify the list of recognized languages:

tesseract --list-langs
   List of available languages (4):
   equ
   ces
   eng
   osd
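
The check above can also be scripted, for example before starting a long OCR run. The following is a minimal sketch, not part of the original setup: has_lang is a hypothetical helper name, and it assumes the language codes are printed one per line, as in the listing above.

```shell
#!/bin/sh
# has_lang CODE: succeed if CODE appears as a whole line on standard input.
# Leading whitespace is tolerated; the "List of available languages" header
# never matches, because grep -x requires the whole line to match.
has_lang() {
    grep -qx "[[:space:]]*$1"
}

# Example (needs tesseract installed). The 2>&1 is there because some
# tesseract versions print the language list on standard error:
#   tesseract --list-langs 2>&1 | has_lang ces && echo "Czech is available"
```

The helper only reads standard input, so it can be tried on a saved copy of the listing without tesseract being present.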

Run OCR

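Use this example to process input-image.png, which contains Czech characters, and write the result to /tmp/output.txt (standard UTF-8 encoding). Tesseract appends the .txt extension to the output name itself, so pass /tmp/output rather than /tmp/output.txt:

tesseract input-image.png /tmp/output -l ces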
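
To process several images in one go, the same command can be wrapped in a small loop. This is a minimal sketch under stated assumptions: outbase and ocr_all are hypothetical helper names, every file name is assumed to have an extension, and the sketch relies on tesseract adding .txt to the output base name on its own.

```shell
#!/bin/sh
# outbase FILE: print FILE without its last extension (scan.png -> scan).
# Assumption: FILE has an extension; a name without one is not handled.
outbase() {
    printf '%s\n' "${1%.*}"
}

# ocr_all LANG FILE...: run tesseract on each file in turn.
# The text for scan.png ends up in scan.txt next to the image.
# Stops and returns 1 at the first file tesseract fails on.
ocr_all() {
    lang=$1
    shift
    for img in "$@"; do
        tesseract "$img" "$(outbase "$img")" -l "$lang" || return 1
    done
}

# Example (needs tesseract and the Czech data installed):
#   ocr_all ces ./scans/*.png
```

Writing each result next to its image keeps the mapping between input and output obvious. To collect the text files elsewhere, change the second argument passed to tesseract.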