Continuous Batching does not work with newest transformers issue #3148

Open · udaij12 opened this issue May 21, 2024 · 0 comments
udaij12 commented May 21, 2024

🐛 Describe the bug

The newest version of transformers (v4.41.0) causes the continuous batching pytests to fail.
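From the traceback below, `_get_initial_cache_position` in transformers 4.41.0 indexes `model_kwargs["past_key_values"]` whenever the key is present, without checking for `None`, and the continuous batching handler appears to forward an explicit `past_key_values=None` during prefill. A minimal repro sketch under that assumption (the model name is illustrative only, not what the test uses):

```python
# Minimal sketch of the suspected failure mode (an assumption, not the handler code):
# passing an explicit past_key_values=None to generate() on transformers 4.41.0
# reaches _get_initial_cache_position, which evaluates
#     model_kwargs["past_key_values"][0][0].shape[2]
# and raises TypeError: 'NoneType' object is not subscriptable.
# The same call succeeds on 4.40.2.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM exercises the same code path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("hello world", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=8, past_key_values=None)
print(tokenizer.decode(output[0]))
```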

Error logs

2024-05-20T22:21:22,515 [INFO ] W-9000-streaming_handler_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1716243682515
2024-05-20T22:21:22,516 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Backend received inference at: 1716243682
2024-05-20T22:21:22,869 [WARN ] W-9000-streaming_handler_1.0-stderr MODEL_LOG - Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Invoking custom service failed.
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/ts/service.py", line 134, in predict
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     ret = self._entry_point(input_batch, self.context)
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/ts/torch_handler/base_handler.py", line 431, in handle
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     output = self.inference(data_preprocess)
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/tmp/models/9e429ad562f84475ad7b4600b1b092d2/stream_handler.py", line 78, in inference
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     results[req_id] = self._run_prefill(req_id)
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0 ACCESS_LOG - /127.0.0.1:35514 "POST /predictions/streaming_handler HTTP/1.1" 503 459
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     return func(*args, **kwargs)
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/tmp/models/9e429ad562f84475ad7b4600b1b092d2/stream_handler.py", line 98, in _run_prefill
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:fv-az1271-393,timestamp:1716243682
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     output = self.model.generate(
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-05-20T22:21:22,872 [DEBUG] W-9000-streaming_handler_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 100288593, Inference time ns: 457870956
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     return func(*args, **kwargs)
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 355
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1736, in generate
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:3.0|#Level:Host|#hostname:fv-az1271-393,timestamp:1716243682
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     result = self._sample(
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 2368, in _sample
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     past_length = model_kwargs["past_key_values"][0][0].shape[2]
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - TypeError: 'NoneType' object is not subscriptable
2024-05-20T22:21:22,881 [INFO ] epollEventLoopGroup-3-3 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:streaming_handler,model_version:default|#hostname:fv-az1271-393,timestamp:1716243682

Installation instructions

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

Model Packaging

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

config.properties

No response

Versions

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

Repro instructions

For examples, search for test_echo_stream_inference in the following CI runs:
https://github.com/pytorch/serve/actions/runs/9165678469/job/25199613724
https://github.com/pytorch/serve/actions/runs/9145615468/job/25144773494

Possible Solution

A temporary solution is to pin transformers to 4.40.2.
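The pin can go in the test requirements or CI workflow (e.g. transformers==4.40.2). As a handler-side alternative, and only as a sketch under the assumption that the prefill path is what forwards an explicit `past_key_values=None` to `generate`, the handler could omit the argument whenever no cache exists yet:

```python
# Hypothetical handler-side workaround (not a committed fix): only forward
# past_key_values to generate() when a cache actually exists, so transformers
# 4.41.0 never indexes a None value in _get_initial_cache_position.
def build_generate_kwargs(past_key_values, max_new_tokens=50):
    kwargs = {"max_new_tokens": max_new_tokens}
    if past_key_values is not None:
        kwargs["past_key_values"] = past_key_values
    return kwargs

# Usage inside the handler's _run_prefill (names illustrative):
#   output = self.model.generate(**inputs, **build_generate_kwargs(None))
```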
