How to use the Tesseract dictionary? #125

Closed
Viton-zizu opened this issue Oct 14, 2014 · 11 comments

Comments

@Viton-zizu

How can I use a dictionary to improve recognition? For example, Tesseract gives me the result "Роосия", but the dictionary contains the correct version: "Роcсия". Why isn't my result replaced with the correct version of the word?

@charlesw
Owner

Maybe check out the section on user data here:
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html

Note that my reading of this is that it replaces the standard dictionary rather
than augmenting/extending it, though I haven't tried this before.

@Viton-zizu
Author

Do these parameters not work? If I set them higher than their default values, my words should start being compared with the dictionary words. But how can I do that?
language_model_penalty_non_dict_word
language_model_penalty_non_freq_dict_word

@charlesw
Owner

It may be that they are init-only variables and therefore need to be set during
init. Unfortunately the C API doesn't expose that functionality in 3.02,
which means I can't either. I might be able to add support for specifying a
config file to work around that issue for now.

In the meantime it might be worth trying the previously mentioned approach
using the official tesseract distribution to make sure it works.
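
For anyone trying this with the official distribution, here is a rough sketch of producing such a config file. Tesseract config files are plain text with one space-separated "name value" pair per line. The two parameter names are the ones mentioned in this thread; the helper name `render_config` and the numeric values are illustrative assumptions, not recommendations, and nothing in this thread confirms that raising them fixes the original problem.

```python
def render_config(params):
    """Render Tesseract parameters as config-file lines ("name value")."""
    lines = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Boolean parameters are conventionally written as T / F.
            value = "T" if value else "F"
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Illustrative values only (assumption); tune them against your own images.
penalties = {
    "language_model_penalty_non_dict_word": 0.3,
    "language_model_penalty_non_freq_dict_word": 0.2,
}
config_text = render_config(penalties)
```

Writing `config_text` to a file and passing that file's name as the trailing argument on the tesseract command line should apply the values at initialisation.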

@Viton-zizu
Author

The table here lists "Init only" as NO for them: https://code.google.com/p/tesseract-ocr/wiki/ControlParams

@charlesw
Owner

I'll have to give it a go myself. However, according to the control params page
you mentioned, user_words_suffix is init only, which means you're not going
to be able to load your own word list at this time.

@Viton-zizu
Author

Did you try the "tessedit_enable_dict_correction" parameter?

@charlesw
Owner

Sorry, I haven't had a chance to look into this yet. I should have time tonight,
hopefully.

@charlesw
Owner

Good news, I've got custom word dictionaries working by adding support for
loading config files on initialisation.

Unfortunately I can't publish the code now as I don't have an Internet
connection at home. Hopefully I will be able to get the release out in a
couple of days.

@Viton-zizu
Author

Oh, great! I'll wait! How will this work?

@charlesw
Owner

For an example of loading a config file that configures tesseract to use a custom word list, see EngineTests.Initialise_CanLoadConfigFile.
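
The test itself is not reproduced here. As a rough sketch of what the files on disk might look like, the following follows the layout described in the user data section of the Tesseract man page linked earlier: a word list named `<lang>.<suffix>` in the tessdata directory, plus a config file that names the suffix. The function name `write_user_dictionary`, the config name `user_dict`, and the `user-words` suffix are illustrative assumptions; `load_system_dawg`, `load_freq_dawg`, and `user_words_suffix` are existing Tesseract parameters.

```python
from pathlib import Path


def write_user_dictionary(tessdata_dir, lang, words, config_name="user_dict"):
    """Write a user word list and a config file that points Tesseract at it."""
    tessdata = Path(tessdata_dir)
    configs = tessdata / "configs"
    configs.mkdir(parents=True, exist_ok=True)

    # One word per line; Tesseract looks for <lang>.<user_words_suffix>
    # in the tessdata directory.
    words_path = tessdata / f"{lang}.user-words"
    words_path.write_text("\n".join(words) + "\n", encoding="utf-8")

    # Config files are plain "name value" pairs, one per line.
    # The first two lines switch off the built-in dictionaries, so the user
    # list replaces them; omit those lines to try augmenting instead
    # (untested assumption).
    config_path = configs / config_name
    config_path.write_text(
        "load_system_dawg F\n"
        "load_freq_dawg F\n"
        "user_words_suffix user-words\n",
        encoding="utf-8",
    )
    return words_path, config_path
```

With the official command-line tool, the config is applied by passing its name as the trailing argument, for example `tesseract page.png out -l rus user_dict`, where `page.png` is a placeholder image name.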

@charlesw
Owner

As per the previous comments, the enhancements for using a custom dictionary will be included in the next release.
