This script uses the Google Cloud Vision API to extract text annotations from images.
- Python 3.x
- Google Cloud service account credentials (JSON key file)
To install the required library, use pip:
pip install google-cloud-vision
or, to install from the requirements file:
pip install -r requirements.txt
Next, set up authentication with the Cloud Vision API using your project's service account credentials. See the Vision API Client Libraries documentation for more information. Then set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your downloaded service account key file:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials-key.json
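Before running the scripts, it can help to confirm that the variable is set and points to a real file. The following is a minimal sketch of such a check; the helper name `credentials_path` is hypothetical and is not part of this repository or of the Google client library.

```python
import os
from pathlib import Path


def credentials_path(env=None):
    """Return the credentials file path from the environment.

    Hypothetical helper: raises if the variable is missing or the file
    does not exist. It does not validate the key contents.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```

Calling `credentials_path()` at the top of a script fails early with a readable message instead of a later authentication error from the API client.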
Text Detection
~ python vision.py --images ./images
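For orientation, a text detection request with the `google-cloud-vision` client library (version 2.x or later is assumed) can look roughly like the sketch below. This is not necessarily how `vision.py` is implemented; the function names `detect_text` and `vertices_to_box` are hypothetical.

```python
def vertices_to_box(vertices):
    """Collapse polygon vertices [(x, y), ...] into (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def detect_text(image_path):
    """Return [(text, (xmin, ymin, xmax, ymax)), ...] for one image file."""
    # Imported lazily so vertices_to_box works without the library installed.
    from google.cloud import vision

    # Picks up credentials from GOOGLE_APPLICATION_CREDENTIALS.
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)

    results = []
    for annotation in response.text_annotations:
        vertices = [(v.x, v.y) for v in annotation.bounding_poly.vertices]
        results.append((annotation.description, vertices_to_box(vertices)))
    return results
```

In the API response, the first entry of `text_annotations` normally holds the full detected text, and the remaining entries hold individual words with their bounding polygons.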
Test Result
~ python test.py --gt ./ --output ./image_test --number_test 2
Convert to Pascal VOC data format
~ python convert_pascal_format.py --output output --input images --gt_path gt.pkl
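As an illustration of the target format, the sketch below builds a minimal Pascal VOC annotation from a list of bounding boxes using only the standard library. The layout of `gt.pkl` is not documented here, so the input shape (a list of `(xmin, ymin, xmax, ymax)` tuples), the `to_voc_xml` name, the default `text` label, and the fixed depth of 3 (RGB) are all assumptions, not the behavior of `convert_pascal_format.py`.

```python
import xml.etree.ElementTree as ET


def to_voc_xml(filename, width, height, boxes, label="text"):
    """Build a minimal Pascal VOC annotation string.

    boxes: iterable of (xmin, ymin, xmax, ymax) in pixels (assumed layout).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    # Depth 3 assumes an RGB image.
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)

    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")
```

Each image gets one XML document with one `object` element per detected text box.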
To enable accurate image detection within the Google Cloud Vision API, images should generally be a minimum of 640 x 480 pixels (about 300k pixels). Full details for different types of Vision API Feature requests are shown below:
Vision API Feature | Recommended Size (pixels) | Notes |
---|---|---|
FACE_DETECTION | 1600 x 1200 | Distance between eyes is most important |
LANDMARK_DETECTION | 640 x 480 | |
LOGO_DETECTION | 640 x 480 | |
LABEL_DETECTION | 640 x 480 | |
TEXT_DETECTION | 1024 x 768 | OCR requires more resolution to detect characters |
SAFE_SEARCH_DETECTION | 640 x 480 | |
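The recommended sizes can be checked before sending an image to the API. The sketch below encodes the table as a dictionary; the helper name `meets_recommended_size` is hypothetical, and comparing the long and short sides so that portrait images pass as well is an assumption rather than documented API behavior.

```python
# Recommended sizes in pixels, copied from the table above.
RECOMMENDED_SIZES = {
    "FACE_DETECTION": (1600, 1200),
    "LANDMARK_DETECTION": (640, 480),
    "LOGO_DETECTION": (640, 480),
    "LABEL_DETECTION": (640, 480),
    "TEXT_DETECTION": (1024, 768),
    "SAFE_SEARCH_DETECTION": (640, 480),
}


def meets_recommended_size(feature, width, height):
    """Return True if the image is at least the recommended size.

    Compares long side to long side and short side to short side,
    so orientation does not matter (an assumption of this sketch).
    """
    rec_long, rec_short = RECOMMENDED_SIZES[feature]
    return max(width, height) >= rec_long and min(width, height) >= rec_short
```

For example, `meets_recommended_size("TEXT_DETECTION", 640, 480)` returns `False`, which suggests the image may be too small for reliable OCR.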