Worker count has no effect on inference time #2811
Comments
Hi @ToanLyHoa, you can refer to PR #2166, which improved CPU performance in our nightly benchmark. You can use TorchServe's benchmarking tool to configure num_workers and benchmark performance.
Hi @agunapal, can you answer this question for me? If my model serves 100 requests in 1 second with worker = 1, does that mean that with worker = 2 it can serve 100 requests in 0.5 seconds? Do I understand that right?
@agunapal I tried sending 1000 requests with worker = 1, batchSize = 16 for my_text_classifier, and it threw status code 503 with the message: "Model "my_tc" has no worker to serve inference request. Please use scale workers API to add workers." But when I set worker = 16, batchSize = 1, there were no more 503s. However, when I send 100 requests, the speed with worker = 1, batchSize = 16 is the same as with worker = 16, batchSize = 1, or even worker = 16, batchSize = 16. I thought the speed should differ by about 16x when I change the worker count and batch size. Can you tell me what I am misunderstanding about TorchServe?
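For reference, the "scale workers API" that the 503 message points to is part of TorchServe's management API, which listens on port 8081 by default. A minimal sketch in Python, assuming the model is registered under the name `my_tc` and default ports are in use:

```python
# Sketch: scaling workers via TorchServe's management API (default port 8081).
# Assumes the model is registered as "my_tc"; adjust host/port/name as needed.
import requests

MGMT = "http://localhost:8081"

# Ask the frontend to bring up more workers for the model and wait
# for the scaling to complete before returning.
resp = requests.put(
    f"{MGMT}/models/my_tc",
    params={"min_worker": 2, "synchronous": "true"},
)
print(resp.status_code, resp.text)

# Describe the model to confirm how many workers are now serving it.
print(requests.get(f"{MGMT}/models/my_tc").json())
```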
@ToanLyHoa This is a bit more complicated. Writing a custom tool to measure this would require a good understanding of TorchServe. For example, for processing requests the frontend has a queue size of 100 by default, so depending on the model's processing time, if you send 1000 requests concurrently, 900 of them can be dropped. You need to design the client with this in mind.

Finally, I would recommend using the benchmarking tool that we have. You can see an example here: https://github.com/pytorch/serve/tree/master/examples/benchmarking/resnet50 You can set the number of workers and batch_size and see the effect they have on throughput/latency.

Also, I noticed that you have max_batch_delay set to 20; I would increase this. You can also add prints in your handler to see how many requests are being batched.
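To illustrate that last suggestion: a custom handler receives the whole batch as a list, so logging its length shows how many requests the frontend actually grouped together. A hedged sketch, assuming a handler derived from TorchServe's `BaseHandler` (the class name here is hypothetical, not the issue author's actual handler):

```python
# Sketch: logging the effective batch size inside a custom TorchServe handler.
import logging

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)

class MyTextClassifierHandler(BaseHandler):  # hypothetical handler name
    def preprocess(self, data):
        # "data" is the list of requests the frontend batched together,
        # so its length is the effective batch size for this call.
        logger.info("Effective batch size: %d", len(data))
        return super().preprocess(data)
```

If `maxBatchDelay` is very small (e.g. 20 ms), the frontend will often stop waiting before a full batch accumulates, so this log line is a quick way to see whether batching is actually happening.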
📚 The doc issue
There is no difference in inference time when I change minWorkers and maxWorkers from 1 to 2 or 3. All three worker settings give me the same time to handle 1000 requests sent with multithreading. Is this a bug, or have I made a mistake in my config.properties?
My config.properties:
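(The actual file contents were not captured in this thread. As a purely hypothetical reconstruction from the values mentioned above — batchSize 16, maxBatchDelay 20, workers varied from 1 to 3 — a config.properties for this setup might look like the following; the ports and model name are assumptions.)

```properties
# Hypothetical reconstruction -- the issue's actual config.properties was not captured.
# Values inferred from the discussion: batchSize=16, maxBatchDelay=20, minWorkers/maxWorkers varied 1-3.
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
models={\
  "my_tc": {\
    "1.0": {\
      "minWorkers": 1,\
      "maxWorkers": 1,\
      "batchSize": 16,\
      "maxBatchDelay": 20\
    }\
  }\
}
```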
My code:
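(The actual client code was also not captured. A minimal sketch of a multithreaded client that fires 1000 requests at the inference API, assuming the default inference port 8080 and a model registered as `my_tc` — the payload is a placeholder:)

```python
# Hypothetical reconstruction -- a minimal multithreaded load client.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/predictions/my_tc"

def send(i: int) -> int:
    # Post a sample text body and return the HTTP status code.
    resp = requests.post(URL, data="Bloomberg has decided to publish a new report.")
    return resp.status_code

start = time.time()
with ThreadPoolExecutor(max_workers=32) as pool:
    codes = list(pool.map(send, range(1000)))
print(f"1000 requests in {time.time() - start:.2f}s; 503s: {codes.count(503)}")
```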
Suggest a potential alternative/fix
No response