TorchServe v0.8.1 Release Notes
This is the release of TorchServe v0.8.1.
New Features
- Supported micro-batching in the handler to process a batched request from the frontend in parallel. #2210 @mreso
  Because pre- and post-processing are often carried out on the CPU, the GPU sits idle until the two CPU-bound steps finish and the worker receives a new batch. Micro-batching in the handler lets inference, pre-processing, and post-processing for a batched request from the frontend run in parallel; see the handler sketch after this list.
- Supported job tickets #2350 @lxning
  This feature helps with use cases where inference latency can be high, such as generative models and autoregressive decoder models like ChatGPT. Based on business requirements, applications can take effective action when a request is rejected, for example routing it to a different server or scaling up model-server capacity; see the client sketch after this list.
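To make the micro-batching feature concrete, here is a minimal handler sketch. It assumes the `MicroBatching` wrapper lives in `ts.handler_utils.micro_batching` and reads its settings from a `micro_batching` section of model-config.yaml, as introduced in #2210; the exact names and defaults should be checked against the repository's micro-batching example.

```python
# Minimal micro-batching handler sketch (names assumed from #2210; verify
# against the micro-batching example in the TorchServe repo).
from ts.handler_utils.micro_batching import MicroBatching  # assumed module path
from ts.torch_handler.image_classifier import ImageClassifier


class MicroBatchingHandler(ImageClassifier):
    def __init__(self):
        super().__init__()
        # Replace the default handle() entry point with a wrapper that splits
        # the frontend batch into micro-batches and pipelines preprocess,
        # inference, and postprocess across worker threads.
        self.handle = MicroBatching(self)

    def initialize(self, ctx):
        super().initialize(ctx)
        # Assumed model-config.yaml layout:
        # micro_batching:
        #   micro_batch_size: 4
        #   parallelism:
        #     preprocess: 2
        #     inference: 1
        #     postprocess: 2
        mb_config = ctx.model_yaml_config.get("micro_batching", {})
        self.handle.micro_batch_size = mb_config.get("micro_batch_size", 1)
        parallelism = mb_config.get("parallelism")
        if parallelism:
            self.handle.parallelism = parallelism
```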
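On the application side, a rejected request can be failed over to another endpoint. The sketch below assumes job tickets are enabled for the model through its model-config.yaml (e.g. a `useJobTicket` flag, per #2350) and that TorchServe rejects the request with HTTP 503 when no ticket is available; both the flag name and the status code are assumptions to verify against the PR.

```python
# Hypothetical fail-over client for a model served with job tickets enabled.
# The 503 status code for a rejected request and the endpoint URLs are
# assumptions, not confirmed TorchServe behavior.
import requests

PRIMARY = "http://primary-host:8080/predictions/my_model"    # hypothetical endpoint
FALLBACK = "http://fallback-host:8080/predictions/my_model"  # hypothetical endpoint


def predict(payload: bytes) -> bytes:
    resp = requests.post(PRIMARY, data=payload, timeout=30)
    if resp.status_code == 503:
        # No job ticket available on the primary server: route the request to
        # a fallback server instead of letting it wait in the queue.
        resp = requests.post(FALLBACK, data=payload, timeout=30)
    resp.raise_for_status()
    return resp.content
```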
New Examples
- Notebook example of TorchServe on SageMaker MME (Multi-Model Endpoint). @lxning
  This example demonstrates generating creative content with generative AI using TorchServe on a SageMaker Multi-Model Endpoint.
Improvements
- Upgraded to PyTorch 2.0.1 #2374 @namannandan
- Significant reduction in Docker image size
  - Reduced GPU docker image size by 3GB #2392 @agunapal
  - Reduced dependency installation time and decreased docker image size #2364 @mreso

  | Repository | Tag | Image ID | Created | Size |
  |------------|-----|----------|---------|------|
  | pytorch/torchserve | 0.8.1-gpu | 04eef250c14e | 4 hours ago | 2.34GB |
  | pytorch/torchserve | 0.8.0-gpu | 516bb13a3649 | 4 weeks ago | 5.86GB |
  | pytorch/torchserve | 0.6.0-gpu | fb6d4b85847d | 12 months ago | 2.13GB |
  | pytorch/torchserve | 0.8.1-cpu | 68a3fcae81af | 4 hours ago | 662MB |
  | pytorch/torchserve | 0.8.0-cpu | 958ef6dacea2 | 4 weeks ago | 2.37GB |
  | pytorch/torchserve | 0.6.0-cpu | af91330a97bd | 12 months ago | 496MB |

- Updated CPU information for IPEX #2372 @min-jean-cho
- Fixed inf2 example handler #2378 @namannandan
- Added inf2 nightly benchmark #2283 @namannandan
- Fixed archiver tgz format model directory structure mismatch on SageMaker #2405 @lxning
- Fixed model archiver to fail if extra files are missing #2212 @mreso
- Fixed device type setting in model config yaml #2408 @lxning
- Fixed batch size in config.properties not being honored #2382 @lxning
- Upgraded torchrun argument names and fixed backend TCP port connection #2377 @lxning
- Fixed error thrown while loading multiple models in KServe #2235 @jagadeeshi2i
- Fixed KServe FastAPI migration issues #2175 @jagadeeshi2i
- Added type annotations in model_server.py #2384 @josephcalise
- Sped up unit tests by removing sleep in start/stop torchserve #2383 @mreso
- Enabled ONNX CI test #2363 @msaroufim
- Removed session_mocker usage to prevent test cross-talk #2375 @mreso
- Enabled regression test in CI #2370 @msaroufim
- Fixed regression test failures #2371 @namannandan
- Bumped transformers version from 4.28.1 to 4.30.0 #2410
Documentation
- Fixed links in FAQ #2351 @sekyondaMeta
- Fixed broken links in index.md #2329 @sekyondaMeta
Platform Support
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK 17.
GPU Support
- Torch 2.0.1 + CUDA 11.7, 11.8
- Torch 2.0.0 + CUDA 11.7, 11.8
- Torch 1.13 + CUDA 11.7, 11.8
- Torch 1.11 + CUDA 10.2, 11.3, 11.6
- Torch 1.9.0 + CUDA 11.1
- Torch 1.8.1 + CUDA 9.2