TorchServe v0.9.0 Release Notes
This is the release of TorchServe v0.9.0.
Security
Our security process is documented here.
We rely heavily on automation to improve the security of TorchServe, namely by:
- Updating our Gradle and pip dependencies on a monthly basis
- Docker scanning via Snyk
- Code analysis via CodeQL
A key point to remember is that TorchServe will allow you to configure things in an insecure way, so make sure to read our security docs and the relevant security warnings to keep your deployment secure in production. In general, we do not encourage downloading untrusted .mar files from the internet; running a .mar file is effectively running arbitrary Python code, so unzip .mar files and validate that they are not doing anything suspicious.
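As a minimal sketch of such an inspection (the file name my_model.mar is hypothetical; a .mar file is just a zip archive), the standard library is enough to list an archive's contents and flag obvious red flags before registering a model:

```python
# Minimal sketch: inspect a model archive before serving it.
# "my_model.mar" is a hypothetical file name; a .mar file is a zip archive.
import zipfile

def inspect_mar(path: str) -> None:
    with zipfile.ZipFile(path) as mar:
        for name in mar.namelist():
            # Flag entries that would escape the extraction directory (zip slip).
            if name.startswith("/") or ".." in name:
                print(f"SUSPICIOUS PATH: {name}")
            # Handler code is arbitrary Python; read it before trusting it.
            elif name.endswith(".py"):
                print(f"Python source to review: {name}")
            else:
                print(f"  {name}")

inspect_mar("my_model.mar")
```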
Code scanning fixes
- Used SHA-256 in ZipUtils #2629 @msaroufim
- Verified default hostname in Test #2631 @msaroufim
- Fixed zip slip error #2634 @msaroufim
- Used string array as Process arguments input #2632 #2635 @msaroufim
- Enabled Netty HTTP header validation as default #2630 @msaroufim
- Verified 3rd party package installation path #2687 @lxning
- Allowed URL validation #2685 @lxning (a sketch of the allowlist idea follows this list), including:
  - Disabled loading TS_ALLOWED_URLS from env by default.
  - Moved the model URL validation to the last step.
  - Sanity-checked the model archive name to guard against uncontrolled data used in a path expression.
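For context, allowed_urls in config.properties takes a comma-separated list of regular expressions describing trusted model sources. The sketch below illustrates the allowlist idea only; it is not TorchServe's internal code, and the patterns are made up:

```python
# Illustrative allowlist-style URL validation; not TorchServe's internal code.
import re

ALLOWED_URL_PATTERNS = [
    r"https://s3\.amazonaws\.com/.*",        # hypothetical trusted bucket
    r"https://torchserve\.pytorch\.org/.*",  # hypothetical trusted host
]

def is_allowed(url: str) -> bool:
    # The URL must fully match at least one trusted pattern.
    return any(re.fullmatch(p, url) for p in ALLOWED_URL_PATTERNS)

print(is_allowed("https://torchserve.pytorch.org/mar_files/mnist.mar"))  # True
print(is_allowed("http://attacker.example.com/evil.mar"))                # False
```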
Address configuration updates
- Updated default address from 0.0.0.0 to 127.0.0.1 #2624 #2704 @namannandan @agunapal (the sketch after this list shows why the default matters)
- Bind container ports to localhost ports #2646 @namannandan
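To see why the default matters, here is a small sketch using the standard socket module (the ports are arbitrary examples): a socket bound to 127.0.0.1 only accepts connections from the same host, while 0.0.0.0 listens on every network interface.

```python
# Loopback vs. all-interfaces binding; ports are arbitrary examples.
import socket

loopback = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loopback.bind(("127.0.0.1", 8080))   # reachable from this host only

all_ifaces = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
all_ifaces.bind(("0.0.0.0", 8081))   # reachable from the network

for sock in (loopback, all_ifaces):
    print(sock.getsockname())
    sock.close()
```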
Documentation improvements
- Updated security readme #2643 #2690 @msaroufim @agunapal
- Updated security guidance in docker readme #2669 @agunapal
Dependency improvements
- Created dependabot.yml #2642 #2675 @msaroufim
- Bumped packaging from 23.1 to 23.2
- Bumped pygit2 to 1.13.1
- Bumped com.github.spotbugs from 4.0.2 to 5.1.3
- Bumped ONNX from 1.14.0 to 1.14.1
- Bumped Pillow from 9.3.0 to 10.0.1
- Bumped com.amazonaws:DynamoDBLocal from 1.13.2 to 2.0.0
- Upgraded node to version 18 #2663 @agunapal
Blogs
- High performance Llama 2 deployments with AWS Inferentia2 using TorchServe
- ML Model Server Resource Saving - Transition From High-Cost GPUs to Intel CPUs and oneAPI powered Software with performance
- Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs
New Features
- Supported PyTorch 2.1.0 and Python 3.11 #2621 #2691 #2697 @agunapal
- Supported continuous batching for LLM inference #2628 @mreso @lxning (a toy sketch follows this list)
- Supported dynamically loading 3rd party package on SageMaker Multi-Model Endpoint #2535 @lxning
- Added a DALI handler for preprocessing and updated the NVIDIA DALI example #2485 @jagadeeshi2i
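For readers new to the idea, here is a toy sketch of continuous (in-flight) batching, not TorchServe's implementation: finished requests leave the batch and queued requests join immediately, instead of the whole batch draining before new work is admitted.

```python
# Toy continuous-batching loop; request IDs and step counts are made up.
from collections import deque

MAX_BATCH = 4
queue = deque([("req1", 3), ("req2", 5), ("req3", 2), ("req4", 4), ("req5", 1)])
in_flight = {}  # request id -> remaining decode steps

step = 0
while queue or in_flight:
    # Admit new requests as soon as a slot frees up.
    while queue and len(in_flight) < MAX_BATCH:
        rid, steps = queue.popleft()
        in_flight[rid] = steps
    # Run one decode step for every in-flight request.
    for rid in list(in_flight):
        in_flight[rid] -= 1
        if in_flight[rid] == 0:
            del in_flight[rid]
            print(f"step {step}: {rid} finished")
    step += 1
```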
New Examples
- Deploy Llama2 on Inferentia2 #2458 @namannandan
- Using TorchServe on SageMaker Inf2.24xlarge with Llama2-13B @lxning
- PyTorch tensor parallel on Llama2 example #2623 #2689 @HamidShojanazeri
- Enabled Better Transformer (i.e., Flash Attention 2) on Llama2 #2700 @HamidShojanazeri @lxning (see the sketch after this list)
- Llama2 Chatbot on Mac #2618 @agunapal
- Automatic speech recognition (ASR) example #2047 @husenzhang
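Better Transformer builds on PyTorch's fused scaled-dot-product attention. A minimal sketch (shapes are illustrative; PyTorch dispatches to a fused kernel such as FlashAttention when hardware and dtypes allow, and otherwise falls back to a math kernel):

```python
# Fused scaled-dot-product attention (PyTorch >= 2.0); shapes are illustrative:
# (batch, heads, sequence, head_dim).
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```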
Improvements
- Fixed typo in BaseHandler #2547 @a-ys
- Created merge_queue workflow for CI #2548 @msaroufim
- Fixed typo in artifact terminology unification #2551 @park12sj
- Added env hints in model_service_worker #2540 @ZachOBrien
- Refactored conda build scripts to publish all binaries #2561 @agunapal
- Fixed response return type in KServe #2566 @jagadeeshi2i
- Added torchserve-kfs nightly build #2574 @jagadeeshi2i
- Added regression for all CPU binaries #2562 @agunapal
- Updated CI/CD runners #2586 #2597 #2636 #2627 #2677 #2710 #2696 @agunapal @msaroufim
- Upgraded newman version to 5.3.2 #2598 #2603 @agunapal
- Updated opt benchmark config for inf2 #2617 @namannandan
- Added ModelRequestEncoderTest #2580 @abergmeier
- Added manually dispatch workflow #2686 @msaroufim
- Updated test wheels with PyTorch 2.1.0 #2684 @agunapal
- Allowed parallel level = 1 to run in torchrun mode #2608 @lxning
- Fixed metric unit assignment backward compatibility #2693 @namannandan
Documentation
- Updated MPS readme #2543 @sekyondaMeta
- Updated large model inference readme #2542 @sekyondaMeta
- Fixed bash snippets in examples/image_classifier/mnist/Docker.md #2345 @dmitsf
- Fixed typo in kubernetes/autoscale.md #2393 @CandiedCode
- Fixed path in examples/image_classifier/resnet_18/README.md #2568 @udaij12
- Model Loading Guidance #2592 @agunapal
- Updated Metrics readme #2560 @sekyondaMeta
- Display nightly workflow status badge in README #2619 #2666 @agunapal @msaroufim
- Updated torch.compile information in examples/pt2/README.md #2706 @agunapal
- Tutorial: deploy a model using TorchServe on SageMaker @lxning
Platform Support
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04; macOS 10.14+; Windows 10 Pro, Windows Server 2019; Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK 17.
GPU Support
Torch 2.1.0 + CUDA 11.8, 12.1
Torch 2.0.1 + CUDA 11.7
Torch 2.0.0 + CUDA 11.7
Torch 1.13 + CUDA 11.7
Torch 1.11 + CUDA 10.2, 11.3, 11.6
Torch 1.9.0 + CUDA 11.1
Torch 1.8.1 + CUDA 9.2