support jobQueueSize and job ticket per model in model config YAML #2350
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##           master   #2350   +/-  ##
=======================================
  Coverage   71.92%   71.92%
=======================================
  Files          78       78
  Lines        3648     3648
  Branches       58       58
=======================================
  Hits         2624     2624
  Misses       1020     1020
  Partials        4        4
```
```java
 * the job queue size of a model. By default, job_queue_size is set as 100 in config.property
 * for all models. Here, jobQueueSize: -1 means no customized setting for the model.
 */
private int jobQueueSize;
```
For my information, where do we default it to 100?
The default queueSize of 100 is passed into Model. In this PR, the customized config value is used to override that queueSize.
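The override described above can be sketched as follows; the class and constant names here are assumptions for illustration, not the actual TorchServe code:

```java
// Hypothetical sketch of resolving a model's job queue size.
// The server-wide default of 100 comes from config.property; a positive
// per-model jobQueueSize in the model config YAML replaces it, while
// -1 (no customized setting) keeps the default.
class QueueSizeResolver {
    static final int DEFAULT_JOB_QUEUE_SIZE = 100;

    static int resolve(int modelJobQueueSize) {
        return modelJobQueueSize > 0 ? modelJobQueueSize : DEFAULT_JOB_QUEUE_SIZE;
    }
}
```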
```java
@@ -34,6 +34,11 @@ public BaseModelRequest getRequest(String threadName, WorkerState state)
        model.pollBatch(
                threadName, (state == WorkerState.WORKER_MODEL_LOADED) ? 0 : Long.MAX_VALUE, jobs);

        if (model.isUseJobTicket() && jobs.isEmpty()) {
            model.decNumJobTickets();
```
Will NumJobTickets go negative if we simply keep reducing the number?
A ticket is incremented when an idle worker comes to pollBatch.
The underlying logic is: increment a ticket at pollBatch, decrement a ticket at addJob.
The corner case is when there are no incoming requests: idle workers keep trying to pollBatch and get an empty result. This decrement makes sure the number of tickets stays balanced (i.e. +1/-1).
```java
            return false;
        }
```
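The +1/-1 accounting described in this thread can be sketched as follows; the class and method names are assumptions for illustration, not the actual TorchServe code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the ticket accounting: a worker earns a ticket
// when it polls for work, a ticket is spent when a request is accepted,
// and a ticket earned by an empty poll is immediately taken back so the
// count stays balanced.
class TicketedModel {
    private final AtomicInteger numJobTickets = new AtomicInteger(0);

    // Called when an idle worker polls for a batch.
    void onPollBatch(boolean gotJobs) {
        numJobTickets.incrementAndGet();     // worker is available: +1
        if (!gotJobs) {
            numJobTickets.decrementAndGet(); // empty poll: take the ticket back
        }
    }

    // Called when a client request arrives; returns true if accepted.
    synchronized boolean takeJobTicket() {
        if (numJobTickets.get() > 0) {
            numJobTickets.decrementAndGet(); // spend the ticket for this request
            return true;
        }
        return false;                        // no active worker: caller returns 503
    }

    int tickets() {
        return numJobTickets.get();
    }
}
```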
```java
        this.numJobTickets.decrementAndGet();
```
Why are we decreasing here when we are just checking hasJobTickets?
Multiple threads can call hasJobTickets. Decrementing in this function guarantees the ticket is reserved for this request.
Could there potentially be a race condition here, with two threads calling hasJobTickets simultaneously, which could cause numJobTickets to become negative? At the time of checking, both threads may not have completed the update yet.
This is a synchronized function.
I see. Then a function name like getJobTicket or takeJobTicket would make more sense, as it's actually trying to reduce the ticket count instead of just checking.
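A take-style operation can also be made race-safe without relying on the caller holding a lock, by folding the check and the decrement into one atomic step with a compare-and-set loop. This is only an illustrative sketch, not the PR's code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free "take a ticket" operation. compareAndSet makes
// the check and the decrement a single atomic step, so two racing threads
// can never both take the last ticket and the count cannot go negative.
class JobTickets {
    private final AtomicInteger numJobTickets = new AtomicInteger(0);

    void inc() {
        numJobTickets.incrementAndGet();
    }

    boolean takeJobTicket() {
        for (;;) {
            int current = numJobTickets.get();
            if (current <= 0) {
                return false;                // no ticket available
            }
            if (numJobTickets.compareAndSet(current, current - 1)) {
                return true;                 // this thread won the ticket
            }
            // another thread raced us; retry with the fresh value
        }
    }
}
```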
```java
@@ -225,6 +234,10 @@ public void removeJobQueue(String threadId) {
    }

    public boolean addJob(Job job) {
        if (isUseJobTicket() && !hasJobTickets()) {
```
Will a similar check for available job tickets be required in the public void addJob(String threadId, Job job) method as well?
Again, hasJobTickets is a synchronized function.
```java
@@ -253,6 +266,9 @@ public void pollBatch(String threadId, long waitTime, Map<String, Job> jobsRepo)
    }

        try {
            if (isUseJobTicket()) {
                incNumJobTickets();
```
Will a similar update to the job ticket count be required in the code segment above?

```java
LinkedBlockingDeque<Job> jobsQueue = jobsDb.get(threadId);
if (jobsQueue != null && !jobsQueue.isEmpty()) {
    Job j = jobsQueue.poll(waitTime, TimeUnit.MILLISECONDS);
    if (j != null) {
        jobsRepo.put(j.getJobId(), j);
        return;
    }
}
```
why?
Looks like we are only tracking requests on the DEFAULT_DATA_QUEUE, so it won't be required for the above case. Makes sense.
Thanks @lxning, LGTM. I think it's a great feature. It would also be great if you could add an example, or extend the LLM example, for custom job queue size as well.
docs/large_model_inference.md (Outdated)
@@ -194,3 +194,12 @@ torchrun:
# gpus you wish to split your model
OMP_NUMBER_THREADS: 2
```
#### Feature Job ticket is recommended for the use case of inference latency Sensitive
Suggested change:
- #### Feature Job ticket is recommended for the use case of inference latency Sensitive
+ #### Feature Job ticket is recommended for the use case of inference latency sensitive
docs/large_model_inference.md (Outdated)
@@ -194,3 +194,12 @@ torchrun:
# gpus you wish to split your model
OMP_NUMBER_THREADS: 2
```
#### Feature Job ticket is recommended for the use case of inference latency Sensitive
When the job ticket feature is enabled, it verifies the availability of a model's active worker for processing a client's request. If an active worker is available, the request is accepted; otherwise, a 503 response is sent back to client. Here is an example of enabling job ticket.
@lxning shall we elaborate a bit more on why someone may want to use it? Describing a bit of the use case/motivation will help raise awareness of when and how users can use this feature.
Added more information in this doc.
docs/large_model_inference.md (Outdated)
@@ -194,3 +194,12 @@ torchrun:
# gpus you wish to split your model
OMP_NUMBER_THREADS: 2
```
#### Feature Job ticket is recommended for the use case of inference latency sensitive
When the job ticket feature is enabled, TorchServe verifies the availability of a model's active worker for processing a client's request. If an active worker is available, the request is accepted and processed immediately without waiting time incurred from job queue or dynamic batching; otherwise, a 503 response is sent back to client. This feature can help the use case which is sensitive to inference latency, such as interactive code generator, chatGPT. These applications can take effective actions,for example, routing the rejected request to a different server, or scaling up model server capacity, based on the business requirements. Here is an example of enabling job ticket.
Suggested change:
When the job ticket feature is enabled, TorchServe verifies the availability of a model's active worker for processing a client's request. If an active worker is available, the request is accepted and processed immediately without waiting time incurred from job queue or dynamic batching; otherwise, a 503 response is sent back to the client.
This feature helps with use cases where inference latency can be high, such as generative models and auto-regressive decoder models like chatGPT. It enables such applications to take effective actions, for example routing the rejected request to a different server or scaling up model server capacity, based on business requirements. Here is an example of enabling job ticket.
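The YAML example that "Here is an example of enabling job ticket" refers to is not included in this excerpt. A hypothetical sketch of such a per-model config, with key names assumed rather than taken from this excerpt, might look like:

```yaml
# Hypothetical model-config YAML (key names are assumptions, not from this excerpt)
minWorkers: 2
maxWorkers: 2
useJobTicket: true   # reject a request with 503 when no active worker is available
jobQueueSize: 200    # per-model override; -1 keeps the server-wide default of 100
```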
Description
Fixes #2322, #2333
Feature/Issue validation/testing
Test jobQueueSize
Add model_config.yaml in frontend/archive/src/test/resources/models/mnist_scripted.mar for the unit test:

```yaml
jobQueueSize: 1
```
Test jobTicket
```shell
seq 3 | xargs -n1 -P 3 curl http://localhost:8080/predictions/mnist_scripted_v2 -T test_data/0.png
```

```json
{
  "code": 503,
  "type": "ServiceUnavailableException",
  "message": "Model \"mnist_scripted_v2\" has no worker to serve inference request. Please use scale workers API to add workers."
}
```
Checklist: