23 Nov:

Added a corpus from Project Gutenberg and randomly generated addresses.
Added open source fonts from https://www.fontsquirrel.com/
Added images to use as background images.

24 Nov:

Generated images using text_renderer.
Created tfrecords using aocr.

25 Nov:

Started training the model.
Tested the model.
Noticed that the model was not predicting the '@' symbol because email addresses were rare in the dataset. So, created 2k more images containing only email addresses and numbers.
Created tfrecords for new images.
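
The text for the extra email-and-number images could be produced along these lines. This is a minimal sketch, not the script used in the project: the names `random_email`, `random_number` and `make_corpus`, the domain list, the length ranges and the 70/30 split are all assumptions for illustration. The resulting strings would then be passed to the image renderer.

```python
import random
import string

# Hypothetical domain list; the real corpus may have used different ones.
DOMAINS = ["example.com", "example.org", "mail.example.net"]


def random_email(rng):
    """Return a random address such as 'ab3x9k@example.com'."""
    alphabet = string.ascii_lowercase + string.digits
    local = "".join(rng.choice(alphabet) for _ in range(rng.randint(4, 12)))
    return f"{local}@{rng.choice(DOMAINS)}"


def random_number(rng):
    """Return a random string of 3 to 10 digits."""
    return "".join(rng.choice(string.digits) for _ in range(rng.randint(3, 10)))


def make_corpus(n_lines, email_ratio=0.7, seed=0):
    """Build n_lines strings; roughly email_ratio of them are email addresses."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    lines = []
    for _ in range(n_lines):
        if rng.random() < email_ratio:
            lines.append(random_email(rng))
        else:
            lines.append(random_number(rng))
    return lines


# 2000 lines, matching the "2k more images" mentioned above.
corpus = make_corpus(2000)
```

Oversampling a rare character this way makes it appear often enough in training for the model to learn it.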

26 Nov:

Merged the original and new tfrecords so that training covers both datasets.
Started training model.
Finished training the model.
Tested the model.
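
One way the tfrecords merge could be done is sketched below. It relies on the fact that an uncompressed TFRecord file is a plain sequence of length-prefixed records with no file-level header, so concatenating the bytes of several files gives a valid file. The function name `merge_tfrecords` and the file names in the usage comment are hypothetical, and this is not necessarily how the merge was done in the project.

```python
import shutil
from pathlib import Path


def merge_tfrecords(inputs, output):
    """Concatenate uncompressed TFRecord files into one file, in the order given.

    Assumes none of the inputs is GZIP- or ZLIB-compressed.
    Returns the size of the merged file in bytes.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # streamed copy, no full read into memory
    return Path(output).stat().st_size


# Usage (hypothetical file names):
# merge_tfrecords(["original.tfrecords", "emails.tfrecords"], "merged.tfrecords")
```

The merged file keeps the records in input order, so all the email samples end up at the end. Shuffling at training time is needed to mix them with the rest.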

27 Nov:

Exported the model.
Completed writing the API.

28 Nov:

Generated new data for training using different fonts and backgrounds.
Changed aocr to save loss history, train configuration and training loss plot.
Generated a small dataset of 900 images to try different model configurations and select the best one for training with the bigger dataset.
Started training small dataset using different configurations.
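
Saving the loss history and train configuration per run, then comparing runs, might look roughly like this. It is a sketch under assumptions, not the actual change made to aocr: the function names, the file names `config.json` and `loss_history.csv`, and the choice of the lowest final training loss as the selection criterion are all hypothetical (the log does not say how the best configuration was chosen).

```python
import csv
import json
from pathlib import Path


def save_run(out_dir, config, loss_history):
    """Write config.json and loss_history.csv into out_dir.

    loss_history is an iterable of (step, loss) pairs.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "config.json", "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
    with open(out / "loss_history.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["step", "loss"])
        for step, loss in loss_history:
            writer.writerow([step, loss])


def final_loss(run_dir):
    """Read back the last recorded loss of a run."""
    with open(Path(run_dir) / "loss_history.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    return float(rows[-1]["loss"])


def best_run(run_dirs):
    """Pick the run directory with the lowest final loss (assumed criterion)."""
    return min(run_dirs, key=final_loss)
```

The CSV can also be fed to a plotting library to draw the training loss plot. Comparing runs on a held-out set rather than on training loss would be a safer criterion.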

29 Nov - 2 Dec:

Training the model.