Set Init Only Variables #70

Closed
DuNuNuBatman opened this issue Jan 29, 2014 · 11 comments

@DuNuNuBatman

Is it possible to set an init-only variable before creating a new instance of TesseractEngine?

The examples I'm looking at are load_system_dawg and load_freq_dawg.
https://code.google.com/p/tesseract-ocr/wiki/ControlParams

@charlesw
Owner

Sorry, at the moment this is not possible since I haven't exposed this part of the API. Which do you think would be more suitable: overloading the TesseractEngine constructor to take a dictionary of name/value pairs, or a list of properties (Name + Value)?

@DuNuNuBatman
Author

Yeah, I think something like an IDictionary would be simple and straightforward for users.
I'm not sure what you mean by a list of properties. Unless you mean something like...

public class ControlParameters
{
    // Commonly used settings
    public bool LoadSystemDawg { get; set; }
    public bool LoadFreqDawg { get; set; }
    public string UserWordsSuffix { get; set; }

    // User Settings? I don't know the details of Tesseract...
    public Dictionary<string, string> OtherParametersThatExistButIDontKnowWhatTheyAre { get; set; }
}

public TesseractEngine(string datapath, string language, EngineMode engineMode = EngineMode.Default, ControlParameters parameters = null)

In which case I would say the parameters class would be awesome. They would all be in a single place and you wouldn't have to dig through documentation to figure them out.

@charlesw
Owner

No, I meant something like this:

public class TesseractProperty
{
    public static TesseractProperty Create(string name, string value)
    {
        return new TesseractProperty() {
            Name = name,
            Value = value,
        };
    }

    // other versions of Create for supported value types (long etc)

    public string Name { get; private set; }
    public object Value { get; private set; }
}

// Tesseract constructor
TesseractEngine(string datapath, string language, EngineMode engineMode, params TesseractProperty[] properties)

However, I think a simple IDictionary<string, object> would be better.

@DuNuNuBatman
Author

I agree. Dictionary is easy and everyone already knows how to use it!

Thanks
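
For illustration, the dictionary approach agreed on above might look like this from the caller's side. It is only a sketch: `InitOptionsSketch` and `ForNonDictionaryText` are hypothetical names that do not exist in the library, and the two keys are the init-only variables named at the top of this issue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the wrapper.
public static class InitOptionsSketch
{
    // Builds name/value pairs for the init-only variables discussed in this issue.
    // IDictionary<string, object> lets bool, numeric and string values share one collection.
    public static IDictionary<string, object> ForNonDictionaryText()
    {
        return new Dictionary<string, object>
        {
            { "load_system_dawg", false },
            { "load_freq_dawg", false },
        };
    }
}
```

A caller would then hand the returned dictionary to whichever constructor overload ends up accepting it; the exact signature is not settled at this point in the thread.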


@charlesw
Owner

Quick update: I've had a look into implementing this. Unfortunately, I believe tesseract doesn't currently expose the functions necessary to do this through their C API. I've posted a message on their forum, https://groups.google.com/forum/#!topic/tesseract-ocr/4n876ZNaUrg, to see if we can come up with a potential solution, but for now it looks like this is a no-go.

@charlesw
Owner

Oops, I got the wrong forum. The discussion can be found here: https://groups.google.com/forum/#!topic/tesseract-dev/1YEXPaQVR4E

@charlesw
Owner

charlesw commented Feb 2, 2014

Ok, thanks to Zdenko we've made the required changes to the C API and can now pass in parameters on init as of commit 4cfa996. Please note that, as this required changes to the tesseract library, it will have to wait until 3.03 is officially released. However, if you need this functionality now, you can use the dev_3.03 branch and build it yourself.
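
As a rough sketch of what passing parameters on init can involve on the managed side, the snippet below flattens an `IDictionary<string, object>` into two parallel string arrays. It assumes the native init call wants names and values as separate string arrays, and that boolean variables accept "T"/"F"; neither is confirmed in this thread. `InitVariableMarshalling`, `ToNativeString` and `Flatten` are hypothetical names, not the wrapper's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration only; not the wrapper's implementation.
public static class InitVariableMarshalling
{
    // Converts one option value to a culture-independent string.
    // Assumption: booleans are passed as "T"/"F".
    public static string ToNativeString(object value)
    {
        if (value == null) throw new ArgumentNullException("value");
        if (value is bool) return ((bool)value) ? "T" : "F";
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }

    // Splits the dictionary into parallel name and value arrays,
    // keeping each name at the same index as its value.
    public static void Flatten(IDictionary<string, object> options,
                               out string[] names, out string[] values)
    {
        if (options == null) throw new ArgumentNullException("options");
        names = new string[options.Count];
        values = new string[options.Count];
        int i = 0;
        foreach (KeyValuePair<string, object> pair in options)
        {
            names[i] = pair.Key;
            values[i] = ToNativeString(pair.Value);
            i++;
        }
    }
}
```

Using the invariant culture keeps numeric values such as 0.5 from being written with a decimal comma on machines with a different locale.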

@charlesw charlesw closed this as completed Feb 2, 2014
@charlesw charlesw reopened this Feb 2, 2014
@DuNuNuBatman
Author

Awesome! Thanks for looking into this and adding it so quickly!

@redbaran

This is something I could use as well, so hopefully 3.03 isn't too far off. I see it's in RC, but given that it looks like 3.01 was released in Oct 2011 and 3.02 was released in Oct 2012, we may be in for a wait.

In looking at the tesseract API, it looks like the C++ api supports setting INIT variables. Is the C api second class and would it be worthwhile to switch to the C++ api?

Good work on this lib though, it's been a big help.

@charlesw
Owner

Yes, tesseract does have a fairly long release cycle, and their C++ API is the primary one. However, .NET doesn't natively support C++ interop. While it is possible, it generally involves either dealing with C++ name mangling or writing the wrapper in managed C++ (or whatever they're calling it these days), which isn't portable (it's only available on Windows). I'm not too interested in pursuing either option.

On the plus side, I've been considering making a 3.03-based release soonish anyway, maybe just marking it as a prerelease, but we'll see.

@redbaran

redbaran commented Aug 1, 2014

I can't say I blame you; the C++/CLI stuff sometimes seems like an unholy union. Thanks for providing this library, it was a big help. We are at about 97% accuracy using it to parse what are essentially serial numbers, and we'd be approaching 100% if we could turn off these settings (load_system_dawg and load_freq_dawg), which should help since we aren't parsing words. At least that's my understanding, anyway.

If you decide to go the prerelease route that incorporates 3.03, I'd be all over that. I do see where they indicate some of the latest distros (Ubuntu 14.04) are shipping 3.03, so apparently it's stable enough for them. Not surprising that it's stable given the slow release cycle.
