🚀 The feature
Current Setup:
Currently, the TorchServe backend worker process dies whenever it receives an invalid JSON-formatted request.
Feature Requested:
Instead of letting the backend worker process die, TorchServe could handle this failure gracefully in a try/except block, log the failing payload, and send back a relevant error code (perhaps 400?); a rough sketch follows the point-of-failure link below.
Test Query: "Blue shirt!
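For reference, a request along these lines reproduces the crash end to end (a sketch only: the model name el1001 is taken from the worker logs below, 8080 is TorchServe's default inference port, and the exact payload shape is an assumption):

```python
import requests

# Hypothetical reproduction. "el1001" is the model name seen in the worker
# logs below and 8080 is TorchServe's default inference port; adjust both
# for your deployment. The body is declared as JSON but contains a raw
# newline inside the string value, which is invalid JSON.
bad_payload = '{"data": "Blue shirt!\n"}'  # unescaped control character

resp = requests.post(
    "http://localhost:8080/predictions/el1001",
    data=bad_payload,
    headers={"Content-Type": "application/json"},
)
# Today this kills the backend worker instead of returning a clean 400.
print(resp.status_code, resp.text)
```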
Point of failure within TorchServe:
https://github.com/pytorch/serve/blob/master/ts/protocol/otf_message_handler.py#L314
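One possible shape of the graceful handling requested above, wrapped around the json.loads call quoted in the traceback (a sketch under assumptions: the helper name, logger, and fallback behaviour are illustrative, not existing TorchServe code):

```python
import json
import logging

logger = logging.getLogger(__name__)


def _decode_json_input(value: bytes):
    """Defensive replacement for the bare json.loads at the point of failure:
    keep the worker alive on malformed JSON, log the offending payload, and
    let the caller translate the failure into an HTTP 4xx response."""
    try:
        return json.loads(value.decode("utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        # Truncate the payload so one oversized bad request does not flood the logs.
        logger.error("Invalid JSON in request payload: %s; payload=%r", exc, value[:512])
        # Returning the raw bytes (or a sentinel) instead of letting the
        # exception propagate keeps the backend worker process running; the
        # surrounding request handling could then respond with 400 rather
        # than restarting the worker.
        return value
```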
Failure Traceback:
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - Backend worker process died.
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 210, in
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - worker.run_server()
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 181, in run_server
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 132, in handle_connection
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - cmd, msg = retrieve_msg(cl_socket)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 36, in retrieve_msg
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - msg = _retrieve_inference_msg(conn)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 226, in _retrieve_inference_msg
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - request = _retrieve_request(conn)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 261, in _retrieve_request
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - input_data = _retrieve_input_data(conn)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 314, in _retrieve_input_data
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - model_input["value"] = json.loads(value.decode("utf-8"))
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/json/init.py", line 357, in loads
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - return _default_decoder.decode(s)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - obj, end = self.scan_once(s, idx)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - json.decoder.JSONDecodeError: Invalid control character at: line 2 column 28 (char 29)
2022-07-27T15:30:41,567 [ERROR] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - Unknown exception
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2022-07-27T15:30:41,567 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED
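The decode error itself is easy to reproduce in isolation; any unescaped control character inside a JSON string triggers it:

```python
import json

# A literal newline inside a JSON string is invalid JSON; json.loads raises
# the same error class and message seen in the worker traceback above.
json.loads('{"query": "Blue shirt!\n"}')
# json.decoder.JSONDecodeError: Invalid control character at: ...
```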
Motivation, pitch
Once the worker dies, it takes some time to come back up, and any valid requests arriving during that window are impacted.
This can affect our production services that use TorchServe.
Alternatives
No response
Additional context
No response