Continuous Batching does not work with newest transformers issue #3148

Open · udaij12 opened this issue May 21, 2024 · 0 comments
udaij12 commented May 21, 2024

🐛 Describe the bug

The newest version of transformers (v4.41.0) causes the continuous batching pytests to fail.
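From the traceback below, `_get_initial_cache_position` in transformers 4.41.0 indexes `model_kwargs["past_key_values"]` whenever the key is present, without checking for `None`, and the continuous batching handler appears to forward an explicit `past_key_values=None` during prefill. A minimal repro sketch under that assumption (the model name is illustrative only, not what the test uses):

```python
# Minimal sketch of the suspected failure mode (an assumption, not the handler code):
# passing an explicit past_key_values=None to generate() on transformers 4.41.0
# reaches _get_initial_cache_position, which evaluates
#     model_kwargs["past_key_values"][0][0].shape[2]
# and raises TypeError: 'NoneType' object is not subscriptable.
# The same call succeeds on 4.40.2.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM exercises the same code path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("hello world", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=8, past_key_values=None)
print(tokenizer.decode(output[0]))
```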

Error logs

2024-05-20T22:21:22,515 [INFO ] W-9000-streaming_handler_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1716243682515
2024-05-20T22:21:22,516 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Backend received inference at: 1716243682
2024-05-20T22:21:22,869 [WARN ] W-9000-streaming_handler_1.0-stderr MODEL_LOG - Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Invoking custom service failed.
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/ts/service.py", line 134, in predict
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     ret = self._entry_point(input_batch, self.context)
2024-05-20T22:21:22,870 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/ts/torch_handler/base_handler.py", line 431, in handle
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     output = self.inference(data_preprocess)
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/tmp/models/9e429ad562f84475ad7b4600b1b092d2/stream_handler.py", line 78, in inference
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     results[req_id] = self._run_prefill(req_id)
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0 ACCESS_LOG - /127.0.0.1:35514 "POST /predictions/streaming_handler HTTP/1.1" 503 459
2024-05-20T22:21:22,871 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     return func(*args, **kwargs)
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/tmp/models/9e429ad562f84475ad7b4600b1b092d2/stream_handler.py", line 98, in _run_prefill
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:fv-az1271-393,timestamp:1716243682
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     output = self.model.generate(
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-05-20T22:21:22,872 [DEBUG] W-9000-streaming_handler_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 100288593, Inference time ns: 457870956
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     return func(*args, **kwargs)
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 355
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1736, in generate
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:3.0|#Level:Host|#hostname:fv-az1271-393,timestamp:1716243682
2024-05-20T22:21:22,872 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     result = self._sample(
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 2368, in _sample
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -   File "/usr/share/miniconda/envs/test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG -     past_length = model_kwargs["past_key_values"][0][0].shape[2]
2024-05-20T22:21:22,873 [INFO ] W-9000-streaming_handler_1.0-stdout MODEL_LOG - TypeError: 'NoneType' object is not subscriptable
2024-05-20T22:21:22,881 [INFO ] epollEventLoopGroup-3-3 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:streaming_handler,model_version:default|#hostname:fv-az1271-393,timestamp:1716243682

Installation instructions

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

Model Packaging

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

config.properties

No response

Versions

Noticed on the TorchServe binary regression CIs, both CPU and GPU.

Repro instructions

For examples, search for test_echo_stream_inference in the following CI runs:
https://github.com/pytorch/serve/actions/runs/9165678469/job/25199613724
https://github.com/pytorch/serve/actions/runs/9145615468/job/25144773494

Possible Solution

A temporary solution is to pin transformers to 4.40.2.
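The pin can go in the test requirements or CI workflow (e.g. transformers==4.40.2). As a handler-side alternative, and only as a sketch under the assumption that the prefill path is what forwards an explicit `past_key_values=None` to `generate`, the handler could omit the argument whenever no cache exists yet:

```python
# Hypothetical handler-side workaround (not a committed fix): only forward
# past_key_values to generate() when a cache actually exists, so transformers
# 4.41.0 never indexes a None value in _get_initial_cache_position.
def build_generate_kwargs(past_key_values, max_new_tokens=50):
    kwargs = {"max_new_tokens": max_new_tokens}
    if past_key_values is not None:
        kwargs["past_key_values"] = past_key_values
    return kwargs

# Usage inside the handler's _run_prefill (names illustrative):
#   output = self.model.generate(**inputs, **build_generate_kwargs(None))
```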
