Allow for setting parameters for single recognize job when using scheduler #665

Closed
Balearica opened this issue Sep 18, 2022 · 1 comment

@Balearica
Member

At present, using a scheduler assumes that all workers are fungible, so any job can be sent to any worker. As a result, tasks that require two or more jobs to be sent to the same worker cannot be accomplished with a scheduler. For example, in Issue #488 users point out that PDFs cannot be created using a scheduler, as this requires a recognize job and a getPDF job to be sent to the same worker.
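
To make the failure concrete, here is a toy TypeScript model of a scheduler that treats workers as interchangeable. It is not the Tesseract.js implementation: `ToyWorker`, `ToyScheduler`, and the round-robin dispatch are hypothetical simplifications. Because the caller cannot choose which worker runs a job, a `getPDF` job that depends on state left behind by an earlier `recognize` job can land on a worker that never ran it.

```typescript
// Toy model of a scheduler that treats workers as interchangeable.
// All names are hypothetical; this is not the Tesseract.js API.

type JobName = "recognize" | "getPDF";

class ToyWorker {
  // Per-worker state: the text from this worker's most recent recognize job.
  private lastText: string | null = null;

  constructor(readonly id: number) {}

  run(job: JobName, image: string): string {
    if (job === "recognize") {
      this.lastText = `text of ${image}`;
      return this.lastText;
    }
    // getPDF only works if *this* worker ran the preceding recognize job.
    if (this.lastText === null) {
      throw new Error(`worker ${this.id}: no recognize result to build a PDF from`);
    }
    return `PDF(${this.lastText})`;
  }
}

class ToyScheduler {
  private next = 0;

  constructor(private readonly workers: ToyWorker[]) {}

  // Round-robin dispatch (a simplifying assumption): the caller has no
  // control over which worker receives a given job.
  addJob(job: JobName, image: string): { workerId: number; result: string } {
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length;
    return { workerId: worker.id, result: worker.run(job, image) };
  }
}

const scheduler = new ToyScheduler([new ToyWorker(0), new ToyWorker(1)]);
const rec = scheduler.addJob("recognize", "page1.png");
console.log(`recognize ran on worker ${rec.workerId}`);
try {
  scheduler.addJob("getPDF", "page1.png");
} catch (err) {
  // The getPDF job was dispatched to worker 1, which never saw page1.png.
  console.log((err as Error).message);
}
```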

Along with getPDF, the only other function where this seems unnecessarily restrictive is setParameters. It is useful to be able to run different recognition jobs with different settings (e.g. different types of images may need to be treated differently, or images with poor OCR results could be retried with different parameters). We should figure out how to let users set options for an individual job when using a scheduler, without changing the settings for all workers/jobs.
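
One way to get per-job settings without giving up fungible workers is to have each job carry its own parameter overrides, which the worker applies for that job only and then reverts. The TypeScript sketch below illustrates the idea; `ParamWorker`, `Job`, and the `params` field are hypothetical, and the parameter name `tessedit_pageseg_mode` is used purely as an illustrative key.

```typescript
// Sketch: per-job parameter overrides on interchangeable workers.
// Hypothetical names throughout; this is not the Tesseract.js API.

type Params = Record<string, string>;

interface Job {
  image: string;
  // Optional overrides that apply to this job only.
  params?: Params;
}

class ParamWorker {
  // Worker-wide defaults, as a setParameters-style call would set them.
  private params: Params;

  constructor(defaults: Params) {
    this.params = { ...defaults };
  }

  recognize(job: Job): { image: string; usedParams: Params } {
    const saved = this.params;
    // Overlay the job's overrides on a fresh object so the defaults are never mutated.
    this.params = { ...saved, ...(job.params ?? {}) };
    try {
      // Stand-in for real recognition: report which parameters were in effect.
      return { image: job.image, usedParams: { ...this.params } };
    } finally {
      // Restore the worker-wide settings before the next job arrives.
      this.params = saved;
    }
  }

  currentParams(): Params {
    return { ...this.params };
  }
}

const worker = new ParamWorker({ tessedit_pageseg_mode: "3" });

// Retry a poor result with different settings, without changing
// the settings that any other job on this worker will see.
const retry = worker.recognize({ image: "receipt.png", params: { tessedit_pageseg_mode: "6" } });
console.log(retry.usedParams.tessedit_pageseg_mode); // prints 6
console.log(worker.currentParams().tessedit_pageseg_mode); // prints 3
```

Because the overrides travel with the job and are reverted afterwards, it no longer matters which worker the scheduler picks.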

Balearica added a commit that referenced this issue Nov 25, 2022
See #662 for an explanation of the Tesseract.js Version 4 changes. The list below is auto-generated from commits.

* Added image preprocessing functions (rotate + save images)

* Updated createWorker to be async

* Reworked createWorker to be async and throw errors per #654

* Edited detect to return null when detection fails rather than throwing an error per #526

* Updated types per #606 and #580 (#663) (#664)

* Removed unused files

* Added savePDF option to recognize per #488; cleaned up code for linter

* Updated download-pdf example for node to use new savePDF option

* Added OutputFormats option/interface for setting output

* Allowed for Tesseract parameters to be set through recognition options per #665

* Updated docs

* Edited loadLanguage to no longer overwrite cache with data from cache per #666

* Added interface for setting 'init only' options per #613

* Wrapped caching in try block per #609

* Fixed unit tests

* Updated setImage to resolve memory leak per #678

* Added debug output option per #681

* Fixed bug with saving images per #588

* Updated examples

* Updated readme and Tesseract.js-core version
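
The savePDF change (per #488) handles the getPDF half of this issue differently from per-job parameters: it folds PDF generation into the recognize job itself, so no state needs to survive between jobs and any worker can run it. The toy TypeScript sketch below illustrates that design choice only; `StatelessWorker`, `addJob`, and the `RecognizeOptions` shape are hypothetical and do not reproduce the actual Tesseract.js v4 signatures.

```typescript
// Sketch: fold a dependent step into one self-contained job so any worker can run it.
// Hypothetical names; this is not the Tesseract.js API.

interface RecognizeOptions {
  savePDF?: boolean; // also build a PDF within the same job
}

interface RecognizeResult {
  workerId: number;
  text: string;
  pdf?: string;
}

class StatelessWorker {
  constructor(readonly id: number) {}

  // Everything the job needs arrives with the job, and everything it
  // produces is returned, so nothing has to persist between jobs.
  recognize(image: string, opts: RecognizeOptions = {}): RecognizeResult {
    const text = `text of ${image}`;
    const result: RecognizeResult = { workerId: this.id, text };
    if (opts.savePDF) {
      result.pdf = `PDF(${text})`;
    }
    return result;
  }
}

// Round-robin dispatch, as in a scheduler that treats workers as fungible.
const workers = [new StatelessWorker(0), new StatelessWorker(1)];
let next = 0;
function addJob(image: string, opts?: RecognizeOptions): RecognizeResult {
  const w = workers[next];
  next = (next + 1) % workers.length;
  return w.recognize(image, opts);
}

const a = addJob("page1.png", { savePDF: true });
const b = addJob("page2.png");
console.log(a.workerId, a.pdf); // prints: 0 PDF(text of page1.png)
console.log(b.workerId, b.pdf); // prints: 1 undefined
```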
@Balearica
Member Author

Closing as this was added in Version 4.
