Julia bindings for the Tesseract Library and to a lesser extent the Leptonica library.

pixel27/Tesseract.jl

Tesseract.jl

This Julia package provides support for performing OCR on scanned images using the Tesseract C library. Tesseract.jl tries to map the Tesseract API directly to Julia, with additional functionality added to fit better into the Julia ecosystem.

using Tesseract

# Generate some pages to load.
write("page01.tiff", sample_tiff())
write("page02.tiff", sample_tiff())
write("page03.tiff", sample_tiff())

# Download the Tesseract English data files
download_languages("eng")

# Initialize the library to generate a text file.
instance = TessInst("eng")
pipeline = TessPipeline(instance)

tess_pipeline_text(pipeline, "My Book.txt")

# Process all the pages in the book.
tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
    add(pix_read("page02.tiff"), 72)
    add(pix_read("page03.tiff"), 72)
end

# The results will be saved in "My Book.txt".
println("My Book.txt: $(filesize("My Book.txt")) bytes.")

# output

My Book.txt: 123 bytes.
