Handle invalid json formatted requests without killing the worker process #1776

Closed
duk0011 opened this issue Aug 1, 2022 · 1 comment · Fixed by #1789
Labels
enhancement New feature or request p2 low priority

duk0011 commented Aug 1, 2022

🚀 The feature

Current Setup:
Currently, the TorchServe backend worker process dies whenever it receives an invalid JSON-formatted request.

Feature Requested:
Instead of killing the backend worker process, TorchServe could handle this failure gracefully within a try block, log the failing payload, and send back a relevant error code (maybe 400?).

Test Query: "Blue shirt!
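
For reference, the failure should be reproducible with something along these lines (a hypothetical sketch, not part of the original report: the model name el1001 is inferred from the worker name in the traceback below, and the default inference port 8080 and the requests library are assumed):

```python
# Hypothetical reproduction sketch: send the malformed payload above to a
# TorchServe inference endpoint. Assumes a model named "el1001" is registered
# and the inference API listens on the default port 8080.
import requests

resp = requests.post(
    "http://localhost:8080/predictions/el1001",
    # Unterminated JSON string containing a raw newline, i.e. an invalid
    # control character inside a JSON string.
    data='"Blue shirt!\n',
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)
```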

Point of failure within TorchServe:
https://github.com/pytorch/serve/blob/master/ts/protocol/otf_message_handler.py#L314
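
For illustration, a minimal sketch of the requested handling (a hypothetical helper, not the actual TorchServe code): wrap the json.loads call at the line above in a try block, log the failing payload, and surface an error code instead of letting the exception propagate.

```python
import json
import logging

logger = logging.getLogger(__name__)


def _decode_json_value(value: bytes):
    """Hypothetical guarded version of the json.loads call at the point of
    failure above. Returns (decoded_value, error_code); error_code is None
    on success and 400 when the payload is not valid JSON, so the caller
    can report the failure instead of crashing the worker."""
    try:
        return json.loads(value.decode("utf-8")), None
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Log the failing payload (truncated) rather than killing the worker.
        logger.warning("Invalid JSON in request payload %r: %s", value[:64], exc)
        return None, 400
```

Whether the backend returns the 400 itself or the frontend translates the logged failure is an open design question; the key point is that the decode error no longer escapes retrieve_msg() and kills the worker.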

Failure Traceback:
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - Backend worker process died.
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 210, in <module>
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - worker.run_server()
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 181, in run_server
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/model_service_worker.py", line 132, in handle_connection
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - cmd, msg = retrieve_msg(cl_socket)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 36, in retrieve_msg
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - msg = _retrieve_inference_msg(conn)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 226, in _retrieve_inference_msg
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - request = _retrieve_request(conn)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 261, in _retrieve_request
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - input_data = _retrieve_input_data(conn)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.8/site-packages/ts/protocol/otf_message_handler.py", line 314, in _retrieve_input_data
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - model_input["value"] = json.loads(value.decode("utf-8"))
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - return _default_decoder.decode(s)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - obj, end = self.scan_once(s, idx)
2022-07-27T15:30:41,172 [INFO ] W-9000-el1001_1.0-stdout MODEL_LOG - json.decoder.JSONDecodeError: Invalid control character at: line 2 column 28 (char 29)
2022-07-27T15:30:41,567 [ERROR] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - Unknown exception
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2022-07-27T15:30:41,567 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED

Motivation, pitch

Once a worker dies, it takes some time to restart, and any valid incoming requests during that window will be impacted.
This may affect our production services using TorchServe.

Alternatives

No response

Additional context

No response

@lxning lxning added enhancement New feature or request p2 low priority labels Aug 2, 2022

lxning commented Aug 2, 2022

@duk0011 Thank you for the report. I've added this to the TS backlog.
