Set Variable again, russian chars #120

Closed
Viton-zizu opened this issue Sep 10, 2014 · 9 comments

@Viton-zizu

"SetVariable" does not work for me. How can I whitelist characters?
engine.SetVariable("tessedit_char_whitelist", "АБВГД...etc");

@charlesw
Owner

Try encoding the value as an ANSI value using Unicode escape sequences. I
thought I'd done this automatically, but the code must be in the 3.03 branch.

@Viton-zizu
Author

I tried this, but it does not work:
engine.SetVariable("tessedit_char_whitelist", "\u0410");
Here "\u0410" is "А", the Russian letter.
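
A likely reason the escape sequence makes no difference: the C# compiler resolves "\u0410" at compile time, so the wrapper receives exactly the same string as when "А" is typed directly. The sketch below only illustrates that point and shows the UTF-8 bytes that a native library expecting UTF-8 (as Tesseract generally does) would need to receive. EscapeDemo and Utf8Hex are hypothetical names for this example and are not part of the wrapper.

```csharp
using System;
using System.Text;

static class EscapeDemo
{
    // Hypothetical helper: UTF-8 bytes of a string as space-separated hex.
    public static string Utf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s)).Replace("-", " ");

    static void Main()
    {
        string literal = "А";       // Cyrillic capital A typed directly
        string escaped = "\u0410";  // the same character written as an escape

        // Both spellings produce the identical one-character string.
        Console.WriteLine(literal == escaped);  // True
        Console.WriteLine(escaped.Length);      // 1

        // The character occupies two bytes in UTF-8.
        Console.WriteLine(Utf8Hex(escaped));    // D0 90
    }
}
```

So if the whitelist is rejected, the cause is more likely in how the string is marshalled to the native library, or in when the variable is set, than in how the literal is spelled in source code.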

@AndreyAkinshin
Contributor

I have the same problem with Russian characters.
Converting the value to UTF-8 doesn't help.
Version 3.03 doesn't help either.

But I think I know the solution.
Check out this StackOverflow question: http://stackoverflow.com/questions/9794029/python-tesseract-ocr-get-digits-only

The line

SetVariable("tessedit_char_whitelist", someChars);

should be run before initializing.

In the previous version of the Tesseract wrapper, the initialization method existed separately from the constructor, so I could do this:

engine = new TesseractEngine(@"./tessdata", "rus", EngineMode.Default);
engine.SetVariable("tessedit_char_whitelist", rusChars);
engine.Init();

In the current version I can't do this, because initialization was moved into the constructor. Please fix it.
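
The rusChars value used above is not defined anywhere in the thread. As a minimal sketch, assuming the whitelist should cover the basic Russian alphabet in both cases, it could be built from the contiguous Cyrillic range U+0410..U+044F plus Ё and ё, which sit outside that range. RussianWhitelist and Build are hypothetical names for this example.

```csharp
using System;
using System.Text;

static class RussianWhitelist
{
    // Hypothetical helper: returns "АБВ...Яабв...я" followed by Ё and ё.
    public static string Build()
    {
        var sb = new StringBuilder();

        // U+0410 (А) through U+044F (я): 64 contiguous letters.
        for (char c = '\u0410'; c <= '\u044F'; c++)
            sb.Append(c);

        // Ё and ё are encoded outside the contiguous block.
        sb.Append('\u0401');
        sb.Append('\u0451');

        return sb.ToString();
    }

    static void Main()
    {
        string rusChars = Build();
        Console.WriteLine(rusChars.Length);  // 66
        Console.WriteLine(rusChars);
    }
}
```

The resulting string can then be passed as in the snippet above: engine.SetVariable("tessedit_char_whitelist", rusChars);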

@charlesw
Owner

Okay, I'm going to have a look into this today.

@charlesw
Owner

This is the same issue as #68. I'll backport the fix from 3.03 and see if that helps.

@AndreyAkinshin
Contributor

It works now, thanks. Can you merge it into the master branch and publish it via NuGet?

@charlesw
Owner

Yes, I'll look at doing that tomorrow. I want to do a little more testing
first, as there are quite a few changes since the last release.

@Viton-zizu
Author

Great!
