# Tesseract merge script

This is a Bash script that:

1. extracts image links from an HTML file and downloads the images,
2. merges the images into a single image,
3. enlarges the merged image,
4. runs Tesseract OCR to read the text from the whole image.

## Dependencies

- tesseract-ocr
- imagemagick

## Usage

`bash ./ocr-extract.sh /path/to/file.html`
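The pipeline described above can be sketched in Bash roughly as follows. This is a hypothetical illustration, not the contents of `ocr-extract.sh`. It assumes `<img>` tags with double-quoted, absolute `src` URLs, `curl` for downloading, ImageMagick 6's `convert` command (ImageMagick 7 uses `magick`), and a 200% enlargement factor. The helper names `extract_image_urls` and `main` are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the OCR pipeline; not the original ocr-extract.sh.
set -euo pipefail

# Print every <img ... src="..."> URL in an HTML file, one per line.
# Assumption: src values are double-quoted and absolute (relative URLs are not resolved).
extract_image_urls() {
  grep -oE '<img[^>]*[[:space:]]src="[^"]+"' "$1" \
    | sed -E 's/.*[[:space:]]src="([^"]*)".*/\1/'
}

main() {
  local html="$1"
  local i=0 url

  # Global (not local) so the EXIT trap can still see it after main returns.
  workdir="$(mktemp -d)"
  trap 'rm -rf -- "$workdir"' EXIT

  # Download each image under a zero-padded name so the glob below keeps page order.
  while IFS= read -r url; do
    curl -fsSL -o "$workdir/img_$(printf '%03d' "$i")" "$url"
    i=$((i + 1))
  done < <(extract_image_urls "$html")

  if (( i == 0 )); then
    echo "no <img src=\"...\"> tags found in $html" >&2
    exit 1
  fi

  # Stack the images top to bottom (-append), then upscale 2x (assumed factor).
  convert "$workdir"/img_* -append -resize 200% "$workdir/merged.png"

  # Using "stdout" as the output base makes tesseract print the text instead of writing a .txt file.
  tesseract "$workdir/merged.png" stdout
}

# Run only when an HTML path is given, so the helper can be sourced on its own.
if [[ $# -ge 1 ]]; then
  main "$1"
fi
```

Upscaling before OCR is a common way to give Tesseract larger glyphs to work with. Stacking with `-append` keeps the reading order of the original page, assuming the images appear top to bottom in the HTML.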