
TorchServe v0.11.1 Release Notes

@agunapal released this on 18 Jul at 18:56
Commit: 9a0ab89

This is the release of TorchServe v0.11.1.

Highlights Include

  • Security Updates
    • Token Authorization: TorchServe enforces token authorization by default, so every HTTP/S or gRPC API call must supply the correct token. This security feature protects a running TorchServe instance from API calls made by unauthorized users. When the feature is enabled (the default), TorchServe creates a key file containing the tokens to use for API calls. Users can disable the feature if they do not want API calls to require a token. For more details, refer to the token authorization documentation: https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md
    • Model API Control: By default, once TorchServe is running it does not allow models to be registered or deleted through HTTP/S or gRPC API calls. This security feature guards against unintended registration or deletion of models after startup, for example a user uploading malicious code in the form of a model, or deleting a model that is still in use. Model API control can be enabled to allow users to register and delete models using the TorchServe model load and delete APIs. For more details, refer to the model API control documentation: https://github.com/pytorch/serve/blob/master/docs/model_api_control.md
  • PyTorch 2.x updates
    • Standardized torch.compile configuration
    • Added examples for the TensorRT and HPU backends
  • GenAI updates
    • Support continuous batching in sequence batch streaming
    • Asynchronous backend worker communication for continuous batching
    • No-code LLM deployment
  • Support for Intel GPUs
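As a minimal sketch of the token authorization described above, the Python below parses a key file and attaches the inference key to a request as a bearer token. It assumes the key file is JSON with `"management"`, `"inference"`, and `"API"` entries that each carry a `"key"` field, and that the inference API listens on the default port 8080; the model name `my_model`, the sample key values, and the helper functions are hypothetical. Check the token authorization documentation for the exact file name and layout in your version.

```python
import json
import urllib.request
from typing import Dict, Optional

# Hypothetical key file contents. Assumption: one entry per API, each with a "key".
SAMPLE_KEY_FILE = """
{
  "management": {"key": "mgmt-example", "expiration time": "2024-07-18T20:00:00Z"},
  "inference": {"key": "infer-example", "expiration time": "2024-07-18T20:00:00Z"},
  "API": {"key": "api-example"}
}
"""


def load_keys(text: str) -> Dict[str, str]:
    """Return a mapping such as {"inference": "<key>", "management": "<key>"}."""
    data = json.loads(text)
    return {name: entry["key"] for name, entry in data.items()}


def build_authorized_request(
    url: str, key: str, body: Optional[bytes] = None
) -> urllib.request.Request:
    """Build (without sending) a request that carries the token as a bearer header."""
    method = "POST" if body is not None else "GET"
    request = urllib.request.Request(url, data=body, method=method)
    request.add_header("Authorization", f"Bearer {key}")
    return request


keys = load_keys(SAMPLE_KEY_FILE)
# Inference calls use the inference key; "my_model" is a placeholder model name.
predict = build_authorized_request(
    "http://localhost:8080/predictions/my_model", keys["inference"], b"input bytes"
)
# Against a running server, send it with: urllib.request.urlopen(predict)
```

Building the request separately from sending it keeps the sketch testable without a running server.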

Security Updates

  • Adding model-control-mode by @udaij12 in #3165
  • Enable Token Authorization by default by @udaij12 in #3163
  • Updating nightly CIs to account for model control and token auth by @udaij12 in #3188
  • Adding token auth and model API to workflow and HTTPS by @udaij12 in #3234
  • Enable token authorization and model control for gRPC by @namannandan in #3238

PyTorch 2.x Updates

GenAI

  • Support continuous batching in sequence batch streaming case by @lxning in #3160
  • GPT-FAST-MIXTRAL-MOE integration by @alex-kharlamov in #3151
  • Clean a jobGroup immediately when it is finished by @lxning in #3222
  • Asynchronous worker communication and vllm integration by @mreso in #3146
  • Add single command LLM deployment by @mreso in #3209
  • TensorRT-LLM Engine integration by @agunapal in #3228
  • Adds torch.compile documentation to alexnet example readme by @crmdias in #3227
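Continuous batching is a general LLM-serving scheduling technique: instead of waiting for every sequence in a batch to finish, the server refills freed slots at each decoding step. The toy simulation below illustrates that general idea under simplifying assumptions (one token per running request per step, a fixed batch capacity, no prefill cost). It is not TorchServe's implementation, and the function names and `Job` shape are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Job = Tuple[str, int]  # (request_id, tokens_to_generate); hypothetical shape


def continuous_batching(jobs: List[Job], max_batch: int) -> Dict[str, int]:
    """Return the decoding step at which each request finishes under continuous batching."""
    if max_batch < 1 or any(tokens < 1 for _, tokens in jobs):
        raise ValueError("max_batch and every token count must be at least 1")
    waiting: Deque[Job] = deque(jobs)
    running: Dict[str, int] = {}  # request_id -> tokens still to generate
    finished_at: Dict[str, int] = {}
    step = 0
    while waiting or running:
        # Refill free slots before every step, not only when the batch is empty.
        while waiting and len(running) < max_batch:
            request_id, tokens = waiting.popleft()
            running[request_id] = tokens
        step += 1  # one decoding step: each running request emits one token
        for request_id in list(running):
            running[request_id] -= 1
            if running[request_id] == 0:
                del running[request_id]  # the slot is free for the next step
                finished_at[request_id] = step
    return finished_at


def static_batching(jobs: List[Job], max_batch: int) -> Dict[str, int]:
    """Baseline: the next batch starts only after the longest request in the current one."""
    if max_batch < 1 or any(tokens < 1 for _, tokens in jobs):
        raise ValueError("max_batch and every token count must be at least 1")
    finished_at: Dict[str, int] = {}
    step = 0
    for start in range(0, len(jobs), max_batch):
        batch = jobs[start:start + max_batch]
        for request_id, tokens in batch:
            finished_at[request_id] = step + tokens
        step += max(tokens for _, tokens in batch)
    return finished_at


jobs = [("a", 5), ("b", 1), ("c", 2), ("d", 1)]
continuous = continuous_batching(jobs, max_batch=2)
static = static_batching(jobs, max_batch=2)
```

With these four requests and a batch size of 2, the continuous schedule finishes everything by step 5, while the static baseline needs 7 steps because `c` and `d` wait for `a` to finish.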

Support for Intel GPUs

Documentation

Improvements and Bug Fixing

New Contributors

Platform Support

Ubuntu 20.04, macOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK 17.

GPU Support Matrix

| TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
|---|---|---|---|---|
| 0.11.1 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
| 0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
| 0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |

Inferentia2 Support Matrix

| TorchServe version | PyTorch version | Python | Neuron SDK |
|---|---|---|---|
| 0.11.1 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
| 0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
| 0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |