deep_turing_ocr

A deep neural network that performs OCR on handwritten Turing Machine definitions. It is meant to help teachers evaluate handwritten Turing Machines more quickly. It comes with a web frontend for interacting with the model. You can:

Load a JPEG crop of a list of handwritten Turing Machine states.
Request bounding boxes around them, and manually adjust them.
Request OCR predictions of the model, and manually adjust them.
Save the corrected data to a dataset for later re-training of the model.
Append the lines to a text file for later simulation of the Turing Machine.
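
The last step, appending the corrected lines to a text file, could look roughly like the sketch below. The helper name `append_states` and its behaviour of skipping blank lines are assumptions made for illustration; they are not taken from the project's code, and the actual backend may store lines differently.

```python
from pathlib import Path


def append_states(path, lines):
    """Append each non-empty line to the text file at `path`.

    Hypothetical helper, not part of the project. Returns the number
    of lines written.
    """
    # Drop surrounding whitespace and skip lines that are empty afterwards.
    cleaned = [line.strip() for line in lines if line.strip()]
    # Mode "a" creates the file if it is missing and keeps earlier lines.
    with Path(path).open("a", encoding="utf-8") as handle:
        for line in cleaned:
            handle.write(line + "\n")
    return len(cleaned)
```

Because the file is opened in append mode, calling the helper once per processed crop accumulates all lines in a single file without overwriting earlier ones.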

Structure

The project consists of a backend built with Flask, which loads the model and responds to HTTP requests, and a frontend built with web tools, which sends requests to the backend to communicate with the model. Inside the notebooks folder, you can find the Jupyter notebooks that I used to preprocess the data and train the model.
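
To illustrate the request/response split, here is a minimal sketch of how a client might package a JPEG crop for the backend, using only the Python standard library. The route `/predict`, the raw-bytes body, and the function name `build_prediction_request` are hypothetical; check the backend source for the real routes and payload format. The address assumes the default host and port of `flask run`.

```python
from urllib.request import Request

# Default address of the Flask development server; "/predict" is a
# hypothetical route used only for this illustration.
DEFAULT_URL = "http://127.0.0.1:5000/predict"


def build_prediction_request(image_bytes, url=DEFAULT_URL):
    """Build (but do not send) a POST request carrying a JPEG crop."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    return Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )
```

With the backend running, the request could be sent with `urllib.request.urlopen(build_prediction_request(data))`, where `data` holds the bytes of one of the sample crops.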

Run it yourself!

To run this project, please clone the repository locally, and follow these steps:

Install Tesseract 4 following the instructions on https://github.com/tesseract-ocr/tesseract/wiki.
Create a virtual environment with Python 3.6 (with conda, for example: conda create -n turing_ocr python=3.6).
Install all required packages with pip install -r requirements.txt. Unfortunately, some packages are not in the anaconda or conda-forge repositories, so pip is required.
Execute the command flask run to start the backend.
Open the file client/index.html in a browser.
Load a file from the folder evaluate, where I already put some sample crops.
Enjoy!
