Dharniesh/Extract-Text-Data-from-Document

Extracting data from Images using OCR-NER and Machine Learning Techniques

This project demonstrates how to extract data from images using OCR (optical character recognition), NER (named entity recognition), and machine learning techniques. It relies on several libraries, including numpy, pandas, cv2, ocr-ner, and spacy.
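As a rough illustration of the NER half of this approach, the sketch below collects named entities from text that an OCR step has already produced. It is not the repository's actual code: the function name `extract_entities` is hypothetical, and the OCR step is left out because the README does not say which OCR engine is used. The `nlp` argument is assumed to be a loaded spaCy pipeline.

```python
def extract_entities(nlp, text):
    """Return (entity text, entity label) pairs found in `text`.

    `nlp` is assumed to be a loaded spaCy pipeline (or any callable that
    returns an object with an `ents` attribute, as spaCy's Doc does).
    """
    doc = nlp(text)
    # Each spaCy entity span exposes its surface text and its label string.
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With spaCy and its English model installed, this could be called as `extract_entities(spacy.load("en_core_web_sm"), ocr_text)`, where `ocr_text` is a hypothetical string holding the OCR output.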

Installation

This project requires Python 3.x. Install the necessary libraries by running the following command in your terminal or command prompt:

pip install numpy pandas opencv-python ocr-ner spacy

You must also download the English model that spacy uses (en_core_web_sm). To do this, run the following command in your terminal or command prompt:

python -m spacy download en_core_web_sm
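To confirm that the installation worked, a small check along these lines may help. It is a minimal sketch using only the standard library. The import names are assumptions based on the pip packages above (opencv-python is imported as cv2, and the downloaded spacy model is importable as en_core_web_sm). The ocr-ner package is not checked because its import name is not stated here, and the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names assumed for the packages installed above.
# ocr-ner is omitted: its import name is not documented in this README.
REQUIRED_MODULES = ["numpy", "pandas", "cv2", "spacy", "en_core_web_sm"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are available.")
```

If any names are reported as missing, rerunning the corresponding install command above should resolve it.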

Usage

To use this project, first download or clone the repository to your computer. Then rename the image you want to process to image.jpeg. Once you have done this, navigate to the project directory and run the following command in your terminal or command prompt:

python preprocess.py

This runs the program and extracts the data from the sample image provided in the images directory. To extract data from other images, change the image file path in the main function of the main.py file.
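The renaming step can also be scripted. The sketch below copies an image of your choice into a destination directory under the name image.jpeg, leaving the original file untouched. The helper name `stage_image` and its parameters are hypothetical, and the destination directory is an assumption: pass whichever directory the script reads image.jpeg from.

```python
import shutil
from pathlib import Path


def stage_image(source, dest_dir, target_name="image.jpeg"):
    """Copy `source` into `dest_dir` as `target_name` and return the new path."""
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"No such image: {source}")
    target = Path(dest_dir) / target_name
    # copyfile overwrites an existing target, so a previous image.jpeg is replaced.
    shutil.copyfile(source, target)
    return target
```

For example, `stage_image("scans/invoice.jpg", ".")` would place a copy named image.jpeg in the current directory (the source path here is only illustrative). Copying rather than renaming keeps the original file available.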

Contributing

If you would like to contribute to this project, you can fork the repository and make any changes or improvements you see fit. Once you have made your changes, submit a pull request and we will review it as soon as possible.

License

This project is licensed under the MIT License. See the LICENSE file for more information.
