OCRVisualizer is a tool that visualizes the JSON output of the Microsoft Cognitive Services OCR API, so you can get familiar with the bounding boxes of Regions, Lines and Words. It's written in C#/WPF.
This tool will be helpful for data discovery if you run OCR on your documents.
The new preview OCR engine is integrated (through the "Recognize Text" API operation), with even better text recognition results for English.
In this version:
- You can see bounding boxes of Regions, Lines and Words
- You can select layers of bounding boxes to visualize under the OCR Text Visibility menu
- You can see extracted text over your original document
- You can extract full text as output
- You can extract Key-Value Pairs
- Better results with new preview OCR engine (through "Recognize Text", only in English)
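To get a feel for the data behind these features, here is a minimal Python sketch that walks an OCR result and collects the bounding boxes of Regions, Lines and Words, plus the full text. It is not the tool's own code (the tool is C#/WPF). The sample JSON follows the shape of the Computer Vision v2.0 OCR response, where each `boundingBox` is a `left,top,width,height` string; the values and the helper names `parse_box`, `collect_boxes` and `full_text` are made up for illustration.

```python
import json

# Sample response in the shape of the v2.0 OCR output (values are illustrative).
SAMPLE = """
{
  "language": "en",
  "regions": [
    {
      "boundingBox": "20,10,300,60",
      "lines": [
        {
          "boundingBox": "20,10,300,25",
          "words": [
            {"boundingBox": "20,10,120,25", "text": "Invoice"},
            {"boundingBox": "150,10,170,25", "text": "Number"}
          ]
        },
        {
          "boundingBox": "20,45,90,25",
          "words": [
            {"boundingBox": "20,45,90,25", "text": "12345"}
          ]
        }
      ]
    }
  ]
}
"""


def parse_box(box):
    """Turn a "left,top,width,height" string into a tuple of ints."""
    left, top, width, height = (int(v) for v in box.split(","))
    return left, top, width, height


def collect_boxes(ocr):
    """Group bounding boxes by layer: regions, lines and words."""
    boxes = {"regions": [], "lines": [], "words": []}
    for region in ocr.get("regions", []):
        boxes["regions"].append(parse_box(region["boundingBox"]))
        for line in region.get("lines", []):
            boxes["lines"].append(parse_box(line["boundingBox"]))
            for word in line.get("words", []):
                boxes["words"].append(parse_box(word["boundingBox"]))
    return boxes


def full_text(ocr):
    """Join the words of each line with spaces, one output line per OCR line."""
    lines = []
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(w["text"] for w in line.get("words", [])))
    return "\n".join(lines)


ocr_result = json.loads(SAMPLE)
boxes = collect_boxes(ocr_result)
text = full_text(ocr_result)
```

Keeping the three layers in separate lists mirrors the idea of toggling each layer on and off when drawing.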
If you would like to test OCR Visualizer locally, download the Setup File and enter your subscription key and endpoint in OCR Settings. After defining your endpoint, you can visualize your OCR documents via Browse.
In the App.config file, replace the values in the snippet below with your Cognitive Services Computer Vision API subscription key. If your service is hosted in a region other than northeurope, change the region in the endpoint to yours.
Microsoft Cognitive Services Computer Vision Endpoint details.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.1" />
  </startup>
  <appSettings>
    <add key="subscriptionKey" value="YOUR_COMPUTER_VISION_API_KEY" />
    <add key="endpointRegion" value="https://northeurope.api.cognitive.microsoft.com/vision/v2.0/ocr" />
    <add key="documentLanguage" value="unk" />
    <add key="searchValues" value="Number,Field,Source" />
    <add key="searchValuesWidth" value="300" />
  </appSettings>
</configuration>
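As a rough illustration of how these settings map onto an OCR request, the Python sketch below reads the `appSettings` keys and builds (but does not send) an HTTP request. It assumes the v2.0 OCR operation's `language` and `detectOrientation` query parameters and the `Ocp-Apim-Subscription-Key` header. How the tool itself issues the call is not shown in this README, so treat `read_app_settings` and `build_ocr_request` as hypothetical helpers.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Trimmed copy of the App.config above (XML declaration omitted for brevity).
APP_CONFIG = """<configuration>
  <appSettings>
    <add key="subscriptionKey" value="YOUR_COMPUTER_VISION_API_KEY" />
    <add key="endpointRegion" value="https://northeurope.api.cognitive.microsoft.com/vision/v2.0/ocr" />
    <add key="documentLanguage" value="unk" />
  </appSettings>
</configuration>
"""


def read_app_settings(xml_text):
    """Return the <add key=... value=...> entries as a dict."""
    root = ET.fromstring(xml_text)
    return {add.get("key"): add.get("value") for add in root.iter("add")}


def build_ocr_request(settings, image_bytes):
    """Build a POST request for the OCR endpoint without sending it."""
    query = urllib.parse.urlencode(
        {"language": settings["documentLanguage"], "detectOrientation": "true"}
    )
    url = settings["endpointRegion"] + "?" + query
    headers = {
        "Ocp-Apim-Subscription-Key": settings["subscriptionKey"],
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


settings = read_app_settings(APP_CONFIG)
request = build_ocr_request(settings, b"fake-image-bytes")
# To actually call the service: urllib.request.urlopen(request)
```

Note that the `endpointRegion` key holds the full endpoint URL, so changing region means editing the host name inside that value.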
If you're looking for the value of a specific field, you can use Key-Value Pair extraction. After you define your field names, the tool searches within a certain pixel distance of each field name and retrieves the group of text it finds there as the value. Here is how it works.
You can define your Key-Value pairs in Manage Key-Value under the Field Extraction menu. The values of these fields are detected by their positions on the document, and the detected values are listed in the same panel.
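The README does not spell out the matching rule, so the following Python sketch is only one plausible reading of position-based extraction: for each field name found as a word, it collects the words on the same line whose left edge falls within a pixel window to the right of the field name. The helper `find_values`, the rightward search direction, and the use of `searchValues` and `searchValuesWidth` from App.config as its inputs are all assumptions, not the tool's confirmed behavior.

```python
def parse_box(box):
    """Turn a "left,top,width,height" string into a tuple of ints."""
    left, top, width, height = (int(v) for v in box.split(","))
    return left, top, width, height


def find_values(ocr, field_names, search_width):
    """Map each field name to the text found within search_width pixels to its right."""
    wanted = {name.lower(): name for name in field_names}
    results = {}
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            words = [(parse_box(w["boundingBox"]), w["text"]) for w in line.get("words", [])]
            for (left, _top, width, _height), text in words:
                key = wanted.get(text.strip(":").lower())
                if key is None:
                    continue
                right_edge = left + width
                # Keep words that start inside the window right of the field name.
                value_words = [
                    t for (l, _, _, _), t in words
                    if right_edge <= l <= right_edge + search_width
                ]
                if value_words:
                    results[key] = " ".join(value_words)
    return results


# Illustrative OCR result with two fields on one line (made-up coordinates).
sample = {
    "regions": [{
        "boundingBox": "20,10,700,20",
        "lines": [{
            "boundingBox": "20,10,700,20",
            "words": [
                {"boundingBox": "20,10,80,20", "text": "Number:"},
                {"boundingBox": "110,10,60,20", "text": "12345"},
                {"boundingBox": "600,10,70,20", "text": "Source:"},
                {"boundingBox": "680,10,40,20", "text": "Web"},
            ],
        }],
    }]
}

fields = "Number,Field,Source".split(",")  # same format as searchValues
values = find_values(sample, fields, search_width=300)
```

With a narrower window the value of a neighboring field is less likely to leak in, at the cost of truncating long values.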
Here are some examples of document output.
Extract Key-Value Pairs
Now you can select layers of bounding boxes to visualize under the OCR Text Visibility menu.
OCR for unstructured documents.
For more information, see Optical character recognition (OCR) in images | Demo | Container Support
Supported OCR language codes:
- unk (AutoDetect)
- zh-Hans (ChineseSimplified)
- zh-Hant (ChineseTraditional)
- cs (Czech)
- da (Danish)
- nl (Dutch)
- en (English)
- fi (Finnish)
- fr (French)
- de (German)
- el (Greek)
- hu (Hungarian)
- it (Italian)
- ja (Japanese)
- ko (Korean)
- nb (Norwegian)
- pl (Polish)
- pt (Portuguese)
- ru (Russian)
- es (Spanish)
- sv (Swedish)
- tr (Turkish)
- ar (Arabic)
- ro (Romanian)
- sr-Cyrl (SerbianCyrillic)
- sr-Latn (SerbianLatin)
- sk (Slovak)
Thanks.