Extracting data from Images using OCR-NER and Machine Learning Techniques

This project demonstrates the use of OCR-NER and machine learning techniques to extract data from images. It uses several libraries, including numpy, pandas, cv2 (installed as opencv-python), ocr-ner, and spacy.

Installation

To run this project, you need Python 3.x installed on your computer. Install the required libraries by running the following command in your terminal or command prompt:

pip install numpy pandas opencv-python ocr-ner spacy

You must also download the English language model (en_core_web_sm) that the spacy library uses. To do this, run the following command in your terminal or command prompt:

python -m spacy download en_core_web_sm
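As an optional sanity check after installation, the sketch below reports which of the expected modules cannot be found in the current environment. It is only an illustration: the helper name `missing_modules` and the list `REQUIRED_MODULES` are hypothetical, and ocr-ner is left out because its import name is not stated in this README.

```python
from importlib.util import find_spec

# Import names, not pip names: opencv-python is imported as cv2, and the
# downloaded spaCy model is installed as the package en_core_web_sm.
# ocr-ner is omitted because its import name is not given in this README.
REQUIRED_MODULES = ["numpy", "pandas", "cv2", "spacy", "en_core_web_sm"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` should return an empty list when everything is installed; any names it returns point to a package that still needs to be installed.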

Usage

To use this project, first download or clone the repository to your computer. Then rename the image you want to process to image.jpeg. Once you have done this, navigate to the project directory and run the following command in your terminal or command prompt:

python preprocess.py

This runs the program and extracts the data from the sample image provided in the images directory. To extract data from other images, change the image file path in the main function of the main.py file.
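To make the extraction step more concrete, here is a minimal sketch of an OCR-then-NER pass over a single image. It is not the code in this repository. It assumes pytesseract (and the Tesseract binary) for the OCR step, which is not part of the install command above, and it uses the stock en_core_web_sm model, whose labels (PERSON, ORG, and so on) may differ from those of a custom-trained model. The function names `group_entities` and `extract_entities` are hypothetical.

```python
from collections import defaultdict


def group_entities(pairs):
    """Group (text, label) pairs into {label: [text, ...]}, dropping blanks and repeats."""
    grouped = defaultdict(list)
    for text, label in pairs:
        cleaned = text.strip()
        if cleaned and cleaned not in grouped[label]:
            grouped[label].append(cleaned)
    return dict(grouped)


def extract_entities(image_path, model_name="en_core_web_sm"):
    """Run OCR on an image, then return the named entities found in the text."""
    # Heavy imports are deferred so the helper above works without them.
    import cv2
    import pytesseract  # assumption: not listed in this README's install command
    import spacy

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Grayscale input is a common, simple preprocessing step before OCR.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(gray)

    doc = spacy.load(model_name)(text)
    return group_entities((ent.text, ent.label_) for ent in doc.ents)
```

With these assumptions, `extract_entities("image.jpeg")` would return a dictionary that maps each entity label to the distinct strings found for it. Passing a different path is the equivalent of changing the image file path in the program.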

Contributing

If you would like to contribute to this project, you can fork the repository and make any changes or improvements you see fit. Once you have made your changes, submit a pull request and we will review it as soon as possible.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

