Saram - Image/PDF OCR detection system

Extract OCR text as .txt files from images or PDFs, processing multiple files in a directory with pytesseract, and rotate pages that are in the wrong orientation.
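Below is a minimal sketch of how rotation-aware OCR could work. It is not taken from Saram's source, and `parse_rotation` and `ocr_with_rotation` are hypothetical names. The sketch makes three assumptions. It uses Tesseract's orientation and script detection (OSD) through `pytesseract.image_to_osd`, which needs the `osd` language data installed alongside `eng`. It also treats the `Rotate:` value in the OSD report as the clockwise angle that makes the page upright. Finally, it imports pytesseract and Pillow lazily, so the parsing step runs without them.

```python
import re

# Matches the "Rotate: <degrees>" line of Tesseract's OSD report.
_ROTATE_RE = re.compile(r"^Rotate:\s*(\d+)", re.MULTILINE)


def parse_rotation(osd_text: str) -> int:
    """Return the clockwise correction angle from an OSD report, or 0 if absent."""
    match = _ROTATE_RE.search(osd_text)
    return int(match.group(1)) if match else 0


def ocr_with_rotation(image_path: str, lang: str = "eng") -> str:
    """OCR an image after rotating it upright (assumes OSD data is installed)."""
    # Imported lazily so parse_rotation() works without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        angle = parse_rotation(pytesseract.image_to_osd(img))
        if angle:
            # PIL rotates counter-clockwise for positive angles, so negate
            # the value to apply the (assumed) clockwise correction.
            img = img.rotate(-angle, expand=True)
        return pytesseract.image_to_string(img, lang=lang)
```

OSD can fail on images that contain very little text. A fuller version would catch that case and fall back to plain `image_to_string`.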

Currently in beta.

Follow: Demo run

Note: Make sure an OCR engine such as Tesseract is installed, along with its language data for the text you want to recognize (e.g. tesseract-data-eng). Pillow and Wand, which are used to load and convert images, are fetched automatically during pip install.
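The sketch below shows one way to check these prerequisites before a run. `check_dependencies` and `tesseract_languages` are hypothetical helpers and are not part of Saram. The sketch only checks three things: whether the `tesseract` binary is on PATH, which languages it reports through `tesseract --list-langs` (`eng` should appear if the English data is installed), and whether pytesseract, Pillow (`PIL`) and Wand can be imported.

```python
import importlib.util
import shutil
import subprocess


def check_dependencies() -> dict:
    """Report which OCR prerequisites appear to be available."""
    return {
        # pytesseract only wraps the Tesseract binary, which must be on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        # Python packages are located without actually importing them.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "PIL": importlib.util.find_spec("PIL") is not None,
        "wand": importlib.util.find_spec("wand") is not None,
    }


def tesseract_languages() -> list:
    """Return the language codes Tesseract reports, or [] if it is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Depending on the Tesseract version, the list may go to stdout or stderr,
    # preceded by a "List of available languages ..." header line.
    output = result.stdout or result.stderr
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith("List of available languages")
    ]


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    print("languages:", ", ".join(tesseract_languages()) or "none found")
```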

To use Saram as a Python module, refer to the py-module branch.

Installation

Install using pip:

$ pip install saram
$ saram <dirname>

Alternatively, clone the source locally:

$ git clone https://github.com/aryaminus/saram
$ cd saram
$ git checkout py-module
$ python main.py <dirname>
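Below is a hypothetical sketch of the batch step behind the `<dirname>` argument. It is an assumption, not Saram's actual `main.py`. It OCRs every image directly inside the directory, using an illustrative set of extensions, and writes each result to a sibling file such as `photo.png.txt`. Keeping the original extension in the name stops `photo.png` and `photo.jpg` from overwriting each other's output. The OCR function is passed in as an argument, so the directory logic can run without Tesseract.

```python
import sys
from pathlib import Path
from typing import Callable, List

# Hypothetical set of extensions treated as images; adjust as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


def find_images(directory: str) -> List[Path]:
    """Return image files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def process_directory(directory: str, ocr: Callable[[Path], str]) -> List[Path]:
    """OCR every image in `directory` and write the text next to it."""
    written = []
    for image_path in find_images(directory):
        # Keep the original extension so a.png and a.jpg get distinct outputs.
        txt_path = image_path.with_name(image_path.name + ".txt")
        txt_path.write_text(ocr(image_path), encoding="utf-8")
        written.append(txt_path)
    return written


def tesseract_ocr(path: Path) -> str:
    """Default OCR callback; imports are lazy so tests need no Tesseract."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


if __name__ == "__main__":
    # Usage: python batch_ocr.py <dirname>
    for out in process_directory(sys.argv[1], tesseract_ocr):
        print(f"wrote {out}")
```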

Todo

Add PDF support via PDF -> Image -> Txt conversion, deleting the converted images after processing
Double-check orientation for both images and PDFs
Make a PIP package
Add NLP that uses the most frequently repeated characters to filter content
Add Cloud Vision support for more effective character recognition
Support for a GUI using tkinter
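The sketch below shows one possible shape for the first Todo item (PDF -> Image -> Txt, with the converted images deleted afterwards). It assumes Wand for rasterization, and Wand in turn needs ImageMagick with Ghostscript to read PDFs. The function names and the 300 dpi default are hypothetical choices. The cleanup requirement is met by writing the page images into a `tempfile.TemporaryDirectory`. That directory and its images are removed when the `with` block exits, even if OCR fails partway through.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def pdf_to_images(pdf_path: str, out_dir: str, resolution: int = 300) -> List[Path]:
    """Rasterize each PDF page to a PNG in `out_dir` using Wand (ImageMagick)."""
    # Lazy import: requires Wand plus ImageMagick and Ghostscript.
    from wand.image import Image as WandImage

    pages = []
    with WandImage(filename=pdf_path, resolution=resolution) as pdf:
        for i, page in enumerate(pdf.sequence):
            with WandImage(page) as page_img:
                page_img.format = "png"
                out = Path(out_dir) / f"page-{i:04d}.png"
                page_img.save(filename=str(out))
                pages.append(out)
    return pages


def pdf_to_text(
    pdf_path: str,
    ocr: Callable[[Path], str],
    convert: Callable[[str, str], List[Path]] = pdf_to_images,
) -> str:
    """OCR a PDF page by page; the intermediate images are deleted afterwards."""
    # TemporaryDirectory removes the converted page images on exit,
    # including when ocr() raises part-way through the document.
    with tempfile.TemporaryDirectory() as tmp:
        page_texts = [ocr(page) for page in convert(pdf_path, tmp)]
    return "\n".join(page_texts)
```

With Tesseract installed, a call might look like `pdf_to_text("scan.pdf", ocr=lambda p: pytesseract.image_to_string(str(p)))`. Here pytesseract reads each page image directly from its path.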

Reference

Contributing

Fork it (https://github.com/aryaminus/saram/fork)
Create your feature branch (git checkout -b feature/fooBar)
Commit your changes (git commit -am 'Add some fooBar')
Push to the branch (git push origin feature/fooBar)
Create a new Pull Request

Enjoy!
