Users cannot configure Tesseract parameters through the injected ITesseract API and must fall back to the underlying TessEngine API.
As a result, dependency injection cannot be used when the user wants to change, for example, tessedit_char_whitelist or tessedit_char_blacklist.
Current approach

    string traineddataFolder = FileSystem.Current.CacheDirectory;

    // Load data: copy the traineddata file from the app package to the cache folder once
    var traineddataPath = Path.Combine(traineddataFolder, "eng.traineddata");
    if (!File.Exists(traineddataPath))
    {
        using Stream traineddata = await FileSystem.OpenAppPackageFileAsync("eng.traineddata");
        // Dispose the file stream so the copied data is flushed before the engine reads it
        using FileStream fileStream = File.Create(traineddataPath);
        traineddata.CopyTo(fileStream);
    }

    // Create and configure engine
    using var engine = new TessEngine("eng", traineddataFolder);
    bool success = engine.SetVariable("tessedit_char_whitelist", "mychars");

    // Recognize text
    using var image = Pix.LoadFromFile(@"\path\to\file.png");
    using var result = engine.ProcessImage(image);
    string text = result.GetText();

Suggested fix
Addition to ITesseract API
Add an optional configuration property to ITesseract that is applied to the engine before recognition.

    Tesseract.EngineConfiguration = engine =>
    {
        // These characters are not recognized
        engine.SetVariable("tessedit_char_blacklist", "bad");
    };

    var result = await Tesseract.RecognizeTextAsync(@"my\image\path.png");

    private async void DEMO_Recognize_AsConfigured(object sender, EventArgs e)
    {
        // Select image (not important)
        var path = await GetUserSelectedPath();
        if (path is null)
        {
            return;
        }

        // This Tesseract is an injected property
        Tesseract.EngineConfiguration = (engine) =>
        {
            // The engine uses DefaultSegmentationMode if no other mode is passed as a method parameter.
            // If ITesseract is injected into the page, this is the only way to set PageSegmentationMode.
            // PageSegmentationMode defines how OCR looks for text, for example a single character or a single word.
            // The default is PageSegmentationMode.Auto.
            engine.DefaultSegmentationMode = TesseractOcrMaui.Enums.PageSegmentationMode.Auto;
            engine.SetCharacterWhitelist("abcdefgh"); // These characters OCR is looking for
            engine.SetCharacterBlacklist("abc");      // These characters OCR is not looking for
            // Now OCR should only find the characters 'defgh'
        };

        // Recognize image
        var result = await Tesseract.RecognizeTextAsync(path);

        // For this example I reset the engine configuration, because the same object is used in other examples
        Tesseract.EngineConfiguration = null;

        // Show output (not important)
        ShowOutput("FromPath, Configured", result);
    }

Changes in Tesseract.cs
Method
internal RecognizionResult Recognize(Pix pix, string tessDataFolder, string[] traineddataFileNames)
Change
To
Intended use
Set the configuration func before running the recognition process.
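
The idea of an optional configuration callback that runs before recognition can be sketched on its own, without the real library. The following is a minimal illustration only: `FakeEngine` and `FakeTesseract` are hypothetical stand-ins invented for this sketch (they are not TesseractOcrMaui types), and the body of `Recognize` is an assumption about where such a hook might be invoked, not the actual change to Tesseract.cs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TessEngine: it only records the variables it is given.
public sealed class FakeEngine
{
    public Dictionary<string, string> Variables { get; } = new Dictionary<string, string>();

    public bool SetVariable(string name, string value)
    {
        Variables[name] = value;
        return true;
    }
}

// Hypothetical stand-in for the injected ITesseract implementation.
public sealed class FakeTesseract
{
    // Optional hook; null means the engine keeps its defaults.
    public Action<FakeEngine>? EngineConfiguration { get; set; }

    public string Recognize(string imagePath)
    {
        var engine = new FakeEngine();

        // Apply the caller's configuration before any recognition work starts.
        EngineConfiguration?.Invoke(engine);

        // A real implementation would process the image here; this sketch
        // only reports how many variables were applied.
        return $"{imagePath}: {engine.Variables.Count} variable(s) set";
    }
}

public static class Demo
{
    public static void Main()
    {
        var tesseract = new FakeTesseract
        {
            EngineConfiguration = engine =>
                engine.SetVariable("tessedit_char_blacklist", "bad")
        };

        Console.WriteLine(tesseract.Recognize("image.png"));
    }
}
```

Because the hook is a nullable property invoked with `?.Invoke`, existing callers that never set it are unaffected, and resetting it to `null` restores the default behavior.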