
Worker has no effect on inference time #2811

Open
ToanLyHoa opened this issue Nov 27, 2023 · 4 comments
Comments

@ToanLyHoa

ToanLyHoa commented Nov 27, 2023

📚 The doc issue

There is no difference in inference time when I change minWorkers and maxWorkers from 1 to 2 or 3. Three different worker counts give me the same time to handle 1000 requests sent with multiple threads. Is this a bug, or have I made a mistake in config.properties?

My config.properties:

inference_address=http://0.0.0.0:8000
management_address=http://0.0.0.0:8001
metrics_address=http://0.0.0.0:8002
grpc_inference_port=7000
grpc_management_port=7001

cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
number_of_gpu = 0

models= {"my_tc": {"1.0": {"marName": "my_text_classifier.mar","minWorkers": 2,"maxWorkers": 2,"batchSize": 16,"maxBatchDelay": 20,"deviceType": "cpu"}}}
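
For reference, a quick way to confirm how many workers are actually serving the model is the describe-model call on the management API. This is only a minimal sketch, assuming the management address from the config above and that my_tc is already registered:

    import requests

    # Describe-model call on the TorchServe management API (port 8001 above);
    # the response includes a "workers" list with one entry per running worker.
    resp = requests.get("http://127.0.0.1:8001/models/my_tc")
    print(resp.status_code)
    print(resp.json())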

My code:

    import threading
    from time import time

    import requests

    def temp():
        # Each thread sends one prediction request to the running TorchServe instance
        a = requests.post("http://127.0.0.1:8000/predictions/my_tc", data='boxset')
        print(a.text)
        print(a.status_code)

    list_thread = [threading.Thread(target=temp) for i in range(100)]
    start = time()
    for thread in list_thread:
        thread.start()
    for thread in list_thread:
        thread.join()
    print(time() - start)

Suggest a potential alternative/fix

No response

ToanLyHoa reopened this Nov 27, 2023
@agunapal
Collaborator

Hi @ToanLyHoa You can refer to this PR, which improved CPU performance in our nightly benchmark: #2166

You can use TorchServe's benchmarking tool to configure the number of workers and benchmark performance.

@ToanLyHoa
Author

Hi @agunapal Can you answer this question for me? If my model serves 100 requests in 1 second with worker = 1, then with worker = 2 it should serve 100 requests in 0.5 seconds. Do I understand that right?

@ToanLyHoa
Author

ToanLyHoa commented Nov 28, 2023

@agunapal I tried sending 1000 requests with worker = 1, batchSize = 16 for my_text_classifier, and it threw status code 503 saying: "Model "my_tc" has no worker to serve inference request. Please use scale workers API to add workers." But when I set worker = 16, batchSize = 1, there were no more 503 responses. However, the speed with worker = 1, batchSize = 16 is the same as with worker = 16, batchSize = 1, or even worker = 16, batchSize = 16, when I send 100 requests. I thought the speed should differ by about 16x when I change the worker count and batch size. Can you tell me what I am misunderstanding about TorchServe?

@agunapal
Collaborator

agunapal commented Nov 28, 2023

@ToanLyHoa This is a bit more complicated. Writing a custom tool to measure this requires a good understanding of TorchServe. For example, the frontend has a request queue size of 100 by default, so depending on the model's processing time, if you send 1000 requests concurrently, 900 of them can be dropped. You need to design the client with this in mind, e.g. along the lines of the sketch below.
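
As an illustration, a client that keeps the number of in-flight requests below the frontend queue size could look like the following. This is only a sketch; the MAX_IN_FLIGHT value and the endpoint are assumptions based on the config in this issue, not anything TorchServe prescribes:

    import concurrent.futures

    import requests

    # Keep in-flight requests well under the default frontend queue size (100)
    # so the frontend does not reject requests with 503.
    MAX_IN_FLIGHT = 64

    def send_one(_):
        r = requests.post("http://127.0.0.1:8000/predictions/my_tc", data="boxset")
        return r.status_code

    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        codes = list(pool.map(send_one, range(1000)))

    print("503 responses:", codes.count(503))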
There is nothing preventing multiple workers in TorchServe from processing requests simultaneously. You will notice this more prominently in a multi-GPU setup. Depending on which CPU you are using and how the OS schedules these processes, you could see the perf improvement. You can refer to this blog on how to improve perf on Intel CPUs: https://pytorch.org/tutorials/intermediate/torchserve_with_ipex.html

Finally, I would recommend using this benchmarking tool that we have. You can see an example here https://github.com/pytorch/serve/tree/master/examples/benchmarking/resnet50

You can set the number of workers and the batch_size and see the effect it has on throughput/latency.
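
The worker count can also be changed at runtime through the scale-workers management API (the same API the 503 message refers to). A minimal sketch, assuming the management address from the config in this issue:

    import requests

    # Scale "my_tc" to 2 workers via the management API; synchronous=true makes
    # the call wait until the workers are up before returning.
    resp = requests.put(
        "http://127.0.0.1:8001/models/my_tc",
        params={"min_worker": 2, "max_worker": 2, "synchronous": "true"},
    )
    print(resp.status_code, resp.text)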

Also, I noticed that you have max_batch_delay set to 20; I would increase this. You can also add prints in your handler to see how many requests are being batched, as in the sketch below.
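
A minimal sketch of such a print in a custom handler, assuming a handler derived from BaseHandler (the class name here is hypothetical):

    from ts.torch_handler.base_handler import BaseHandler

    class MyTextClassifierHandler(BaseHandler):
        def preprocess(self, data):
            # "data" holds one entry per request in the current batch, so its
            # length shows how many requests were actually batched together.
            print(f"batch size for this call: {len(data)}")
            return super().preprocess(data)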
