A deep neural network for performing OCR on handwritten Turing Machine definitions. It is meant for teachers to evaluate handwritten Turing Machines more quickly. It comes with a web frontend interface to interact with the model. You can:
- Load a `jpeg` crop of a list of handwritten Turing Machine states.
- Request bounding boxes around them, and manually adjust them.
- Request OCR predictions of the model, and manually adjust them.
- Save the new data to a dataset for later retraining of the model.
- Append the lines to a text file for later simulation of the Turing Machine.
The project consists of a backend built with `flask`, which loads the model and responds to HTTP requests, and a frontend built with web tools, which sends requests to communicate with the model.
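
To illustrate the request/response pattern between frontend and backend, here is a minimal Python sketch of a client building a request for the backend. The real frontend runs in the browser, so this is only an illustration. The `/predict` route, the `image` JSON field, and the helper names are hypothetical and not taken from this repository; the base URL assumes the default address used by `flask run`.

```python
import base64
import json
import urllib.request

# Assumption: `flask run` serves on its default address.
BACKEND_URL = "http://127.0.0.1:5000"


def build_predict_request(jpeg_bytes, base_url=BACKEND_URL):
    """Build (but do not send) a POST request carrying a JPEG crop.

    The "/predict" route and the "image" field are hypothetical names,
    not taken from this repository.
    """
    # Binary image data is base64-encoded so it can travel inside JSON.
    payload = {"image": base64.b64encode(jpeg_bytes).decode("ascii")}
    return urllib.request.Request(
        url=base_url + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_predict_request(open("crop.jpg", "rb").read()))` would return whatever JSON the (hypothetical) route answers with; check the backend source for the actual routes and payload format.
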
Inside the `notebooks` folder, you can find the Jupyter notebooks that I used to preprocess the data and train the model.
To run this project, please clone the repository locally, and follow these steps:
- Install `Tesseract 4` following the instructions on https://github.com/tesseract-ocr/tesseract/wiki.
- Create a Python virtual environment with `Python 3.6` (with conda, for example: `conda create -n turing_ocr python=3.6`).
- Install all required packages with `pip install -r requirements.txt`. Unfortunately, some packages are not available in the `anaconda` or `conda-forge` repositories, so `pip` is required.
- Execute the command `flask run` to start the backend.
- Open the file `client/index.html` in a browser.
- Load a file from the `evaluate` folder, where I already put some sample crops.
- Enjoy!
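
If something fails during setup, a quick sanity check of the environment can help. The following is a small sketch, not part of this repository: the `check_environment` helper is a hypothetical name, and it only checks that the interpreter is Python 3.6 and that a `tesseract` executable is on the `PATH`. It does not verify the Tesseract version or the installed Python packages.

```python
import shutil
import sys


def check_environment(required_python=(3, 6), tesseract_cmd="tesseract"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The setup steps ask for Python 3.6; compare major and minor only.
    if sys.version_info[:2] != required_python:
        problems.append(
            "Python {}.{} expected, found {}.{}".format(
                *required_python, *sys.version_info[:2]
            )
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tesseract_cmd) is None:
        problems.append(
            "'{}' executable not found on PATH".format(tesseract_cmd)
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated virtual environment; no output means both checks passed.
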