deep_turing_ocr

A deep neural network that performs OCR on handwritten Turing Machine definitions. It is meant to help teachers evaluate handwritten Turing Machines more quickly. It comes with a web frontend for interacting with the model. You can:

  • Load a JPEG crop of a list of handwritten Turing Machine states.
  • Request bounding boxes around them, and manually adjust them.
  • Request OCR predictions of the model, and manually adjust them.
  • Save the corrected data to a dataset for later retraining of the model.
  • Append the lines to a text file for later simulation of the Turing Machine.
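
The last step, appending corrected predictions to a text file, might look roughly like the sketch below. The function name `append_transitions`, the file name, and the sample line format are assumptions made for illustration; they are not taken from the project's code.

```python
from pathlib import Path
from typing import Iterable


def append_transitions(path: str, lines: Iterable[str]) -> int:
    """Append non-empty lines to a text file, one per line; return the count written."""
    # Drop surrounding whitespace and skip blank entries.
    cleaned = [line.strip() for line in lines if line.strip()]
    # Mode "a" creates the file if it is missing and never truncates existing content.
    with Path(path).open("a", encoding="utf-8") as handle:
        for line in cleaned:
            handle.write(line + "\n")
    return len(cleaned)


if __name__ == "__main__":
    # Hypothetical file name and transition format, for demonstration only.
    written = append_transitions("machine.txt", ["q0,a,q1,b,R", "q1,b,q0,a,L"])
    print(f"appended {written} lines")
```

Because the file is opened in append mode, repeated calls accumulate lines across several crops instead of overwriting earlier ones.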

Structure

The project consists of a backend built with Flask, which loads the model and responds to HTTP requests, and a browser-based frontend, which sends requests to the backend to communicate with the model. The notebooks folder contains the Jupyter notebooks that I used to preprocess the data and train the model.
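
To illustrate the request/response split described above, here is a minimal sketch of how a client could build a JSON request for the backend. It assumes the backend listens on Flask's default development address (`http://127.0.0.1:5000`); the `/predict` route and the payload shape are hypothetical and are not taken from the project.

```python
import json
import urllib.request

# Assumption: `flask run` serves on its default address.
BASE_URL = "http://127.0.0.1:5000"


def build_prediction_request(boxes: list) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying bounding boxes."""
    # Hypothetical payload: a list of [x, y, width, height] boxes.
    payload = json.dumps({"boxes": boxes}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/predict",  # hypothetical route name
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_prediction_request([[10, 20, 200, 40]])
    print(request.get_method(), request.full_url)
```

With the backend running, passing the request to `urllib.request.urlopen` would send it; the actual frontend does the equivalent from the browser.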

Run it yourself!

To run this project, please clone the repository locally, and follow these steps:

  • Install Tesseract 4 following the instructions on https://github.com/tesseract-ocr/tesseract/wiki.
  • Create a Python 3.6 virtual environment (with conda, for example: conda create -n turing_ocr python=3.6) and activate it.
  • Install all required packages with pip install -r requirements.txt. Unfortunately, some packages are not available in the Anaconda or conda-forge repositories, so pip is required.
  • Execute the command flask run to start the backend.
  • Open the file client/index.html in a browser.
  • Load a file from the evaluate folder, which already contains some sample crops.
  • Enjoy!