GPU deadlock for pytorch models using the python wrapper #1662
@parthshah86 this is interesting. It would not be too complex to expose the ability to control the threading parameter at the Python wrapper level, but the service orchestrator (both the Java and Go options) will still run requests in parallel, so it would be good to make sure this can be achieved end-to-end. If you are using an ingress provider like Ambassador, you may also be able to leverage the "circuit breaker" functionality being added through #1661, which would ensure a cap on parallel requests end-to-end. Having said that, even with the circuit breaker in place, it will be key to make sure the component that coordinates the requests also has the correct logic to limit the number of concurrent requests being sent.
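As a sketch of the client-side coordination logic mentioned above, a caller can cap in-flight requests with a semaphore. This is a hypothetical illustration using only the standard library; `MAX_IN_FLIGHT` and `send_request` are made-up names, not part of Seldon's API:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: cap the number of in-flight requests on the
# caller side with a bounded semaphore. Names are illustrative only.
MAX_IN_FLIGHT = 2
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def send_request(payload):
    # Placeholder for a real HTTP call to the model server.
    return {"echo": payload}

def guarded_request(payload):
    with _slots:  # blocks until one of the MAX_IN_FLIGHT slots frees up
        return send_request(payload)

# Even with 8 worker threads, at most 2 requests are in flight at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(guarded_request, range(10)))
```

The same pattern applies whether the coordinator is a batch driver or an upstream service: the semaphore, not the thread pool size, is what bounds concurrency against the model server.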
One more thing to mention is that we have been thinking of potentially creating a pre-packaged model server specifically for Hugging Face transformers and/or models, so it would be interesting to see what your python wrapper looks like, as we could perhaps generalise it into a pre-packaged model server.
Issues go stale after 30d of inactivity.
This is now resolved, as the Seldon model server now allows running as a single thread.
Sorry, somehow missed this message. We did make a patch in seldon-core to support a single-threaded model server. Interesting to know that you are thinking about supporting a native Hugging Face transformer server. Thanks for the response @axsaucedo!
Great to hear it has been resolved @parthshah86 - we have also released batch capabilities (which may be disjoint from what you mentioned above if you were referring to multibatching): https://docs.seldon.io/projects/seldon-core/en/latest/servers/batch.html
We are using seldon-core to serve the PyTorch BERT model based on https://github.com/huggingface/transformers using the python wrapper serving option. We have the Java version of the seldon operator enabled.
When there are multiple concurrent requests to the model server, we observe that all the requests time out, and on the Seldon serving pod, GPU and CPU utilization goes to 100%. Even after we kill the requests, the utilization does not go down.
Logs from seldon-container-engine:
This issue seems to be related to pytorch/pytorch#22259, where PyTorch can have GPU deadlock with multiple requests.
With the simple python wrapper it would be useful to expose a threading param to control the number of concurrent threads.
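Until such a parameter is exposed, one workaround is to serialize inference inside the model class itself. The sketch below follows the python wrapper's convention of a class with a `predict` method, but the lock (and the `_infer` stand-in for the real PyTorch forward pass) are our additions, not a Seldon feature:

```python
import threading

class Model:
    """Seldon python-wrapper style model class. The lock serializing
    predict() calls is a workaround sketch, not part of Seldon itself."""

    def __init__(self):
        self._lock = threading.Lock()
        # self.model = load_pytorch_model()  # real model loading goes here

    def predict(self, X, features_names=None):
        # Allow only one inference at a time, avoiding the GPU deadlock
        # described in pytorch/pytorch#22259 when requests run in parallel.
        with self._lock:
            return self._infer(X)

    def _infer(self, X):
        # Stand-in for the real PyTorch forward pass.
        return [x * 2 for x in X]

m = Model()
print(m.predict([1, 2, 3]))  # → [2, 4, 6]
```

Note that this only serializes work inside a single pod; requests still queue behind the lock, so timeouts need to be sized accordingly.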
Seldon Version - 1.0.2
Python version - 3.6