Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine worker and scheduler interfaces #933

Open
Balearica opened this issue Jun 20, 2024 · 0 comments
Open

Combine worker and scheduler interfaces #933

Balearica opened this issue Jun 20, 2024 · 0 comments

Comments

@Balearica
Copy link
Member

At present, we have 2 interfaces for running recognition: schedulers and workers. Both interfaces are explained in this guide. I plan on combining them into a single interface, taking the syntax from the worker interface, and the "under the hood" functionality from the scheduler interface. This would not have any immediate impact on existing code, however some functions may be depreciated. I believe this will solve several issues that exist at present.

Workers are not a beginner-friendly option

Historically, schedulers have been seen as a secondary "advanced" interface for users who want to implement parallel processing. While it is true that the syntax for creating a new worker is currently simpler than making a scheduler, looking beyond the syntax, schedulers are a significantly more beginner-friendly method of managing workers than doing so directly. When creating/managing workers directly, new users are highly likely to manage workers in a way that causes one of the two issues described below.

Code that manually creates/manages many workers is frequently problematic

When users write code that creates and destroys workers within code that runs asynchronously, their code commonly allows for an undefined number of workers to exist at the same time. For example, say that a Node.js application starts a worker and runs recognition whenever an API endpoint receives an image. This would run fine on a low-traffic server, however if the endpoint was hit 20 times in a short period of time, 20 workers would be created, and the application would crash.

This issues is resolved by implementing using a scheduler, which manages a defined number of workers, which can be easily tweaked according to the capabilities of the device. While in theory a user could still crash their program by making multiple schedulers, saying "only ever use one scheduler" is simpler to communicate than explaining how to properly manage workers.

Code that manually re-uses a single worker may also be problematic

Alternatively, say that the user creates only a single worker, and re-uses it between jobs. While their code will not encounter the issue above, it can still encounter issues if their recognition code can be triggered asynchronously.

Workers are written with the assumption that only one job is run at a time. Unlike with schedulers, workers have no job queuing mechanism. When this assumption is violated, and worker.recognize is called before the previous worker.recognize job is completed, unexpected results can occur (see #875). This can be resolved by using a scheduler (even if there is only one worker), as schedulers will queue jobs.

Having multiple interfaces is confusing to new users

At present, the syntax in the README uses workers manually. If somebody copy/pastes this code into a function, they are likely to cause one of the issues described above. Transitioning to schedulers (the recommended approach) requires reading an entire separate documentation page and rewriting their code to use different syntax.

Planned Changes

The changes this would entail are described below. In short, we would combine the simple syntax of creating and using a worker, with the inner-workings of schedulers (a job queue and support for parallel jobs). I do not believe any of this is technically a breaking change, as the old functions would not go anywhere. However, users would be strongly encouraged to switch from workers to schedulers in documentation. Therefore, should probably still be a major release.

  1. Add scheduler.recognize/scheduler.detect functions that run a job.
    1. This should be the same syntax as worker.recognize/worker.detect, and minimize switching costs.
  2. Expose the workers within a scheduler
    1. This would allow for the same fine-grained control of workers that is currently possible.
    2. See Expose workers within scheduler #855
  3. Add a helper function that both creates a scheduler and fills it with workers
    1. The default would be 1 worker, which would make it a drop-in replacement for createWorker
  4. Update the documentation to instruct new users to use the new syntax
    1. The "default" method recommended to new users would go from being a single worker, to a scheduler that contains a single worker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant