Newest version of transformers (v4.41.0) causes the continuous batching pytests to fail

🐛 Describe the bug

The newest version of transformers (v4.41.0) causes the continuous batching pytests to fail.

Error logs
2024-05-20T22:21:22,515 [INFO ] W-9000-streaming_handler_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1716243682515
2024-05-20T22:21:22,516 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Backend received inference at: 1716243682
2024-05-20T22:21:22,869 [WARN ] W-9000-streaming_handler_1.0-stderr MODEL_LOG - Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Invoking custom service failed.
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/ts/service.py", line 134, in predict
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - ret = self._entry_point(input_batch, self.context)
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/ts/torch_handler/base_handler.py", line 431, in handle
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - output = self.inference(data_preprocess)
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/tmp/models/9e429ad562f84475ad7b4600b1b092d2/stream_handler.py", line 78, in inference
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - results[req_id] = self._run_prefill(req_id)
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0 ACCESS_LOG - /127.0.0.1:35514 "POST /predictions/streaming_handler HTTP/1.1" 503 459
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - return func(*args, **kwargs)
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/tmp/models/9e429ad562f84475ad7b4600b1b092d2/stream_handler.py", line 98, in _run_prefill
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:fv-az1271-393,timestamp:1716243682
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - output = self.model.generate(
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-05-20T22:21:22,872 [DEBUG] W-9000-streaming_handler_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 100288593, Inference time ns: 457870956
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - return func(*args, **kwargs)
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 355
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1736, in generate
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:3.0|#Level:Host|#hostname:fv-az1271-393,timestamp:1716243682
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - result = self._sample(
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 2368, in _sample
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - past_length = model_kwargs["past_key_values"][0][0].shape[2]
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - TypeError: 'NoneType' object is not subscriptable
2024-05-20T22:21:22,881 [INFO ] epollEventLoopGroup-3-3 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:streaming_handler,model_version:default|#hostname:fv-az1271-393,timestamp:1716243682
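The traceback shows transformers' `_get_initial_cache_position` (called from `_sample` in 4.41.0) indexing `model_kwargs["past_key_values"][0][0]` during the prefill step, which fails because the value is None at that point. Below is a minimal sketch of a call pattern that appears to match the traceback; the model name ("gpt2") and the generation arguments are illustrative assumptions, not taken from the actual streaming handler:

```python
# Minimal sketch, assuming transformers==4.41.0; "gpt2" and the generation
# arguments are illustrative stand-ins for what the streaming handler does.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("hello", return_tensors="pt")

with torch.no_grad():
    # Passing past_key_values=None appears to keep the key in model_kwargs,
    # so _get_initial_cache_position evaluates
    #   model_kwargs["past_key_values"][0][0].shape[2]
    # and raises TypeError: 'NoneType' object is not subscriptable on 4.41.0.
    model.generate(**inputs, max_new_tokens=1, past_key_values=None)
```

If that is indeed the trigger, dropping the explicit `past_key_values=None` argument (or passing a transformers cache object such as `DynamicCache()` instead of None) in the handler's prefill call might be an alternative to pinning.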
Installation instructions

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

Model Packaging

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

config.properties

No response

Versions

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

Repro instructions

Examples: search for test_echo_stream_inference in the following CI runs (a local invocation sketch follows below):
https://github.com/pytorch/serve/actions/runs/9165678469/job/25199613724
https://github.com/pytorch/serve/actions/runs/9145615468/job/25144773494
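To reproduce outside of CI, the failing test can presumably be selected by name with pytest; the test directory below is an assumption about the repository layout and is not stated in this issue:

```shell
pip install transformers==4.41.0
python -m pytest test/pytest -k test_echo_stream_inference
```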
Possible Solution

Temporary solution is to pin transformers to 4.40.2.
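If the pin is adopted, the constraint would look like the line below in whichever requirements file the CI installs transformers from (the exact file is not named here):

```text
transformers==4.40.2
```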