TorchServe v0.8.2 Release Notes
This is the release of TorchServe v0.8.2.
Security
- Updated snakeyaml version to v2 #2523 @nskool
- Added warning about model allowed urls when default value is applied #2534 @namannandan
Custom metrics backwards compatibility
- add_metric is now backwards compatible with versions [< v0.6.1], but the default metric type is inferred to be COUNTER. If the metric is of a different type, it must be specified in the call to add_metric as follows:
metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
- When upgrading from versions [v0.6.1 - v0.8.1] to v0.8.2, replace the call to add_metric with add_metric_to_cache.
- All custom metrics updated in the custom handler must be included in the metrics configuration file for them to be emitted by TorchServe. This is shown here.
- A detailed upgrade guide is included in the metrics documentation.
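The compatibility rules above can be sketched as follows. This is a minimal illustration that runs without TorchServe installed: StubMetricsCache and StubMetric are hypothetical stand-ins for the metrics object a handler receives, and the metric names RequestCount and HandlerLatency are made up. The keyword names for add_metric_to_cache and add_or_update are assumptions based on the metrics documentation and should be checked against it. Only the add_metric call shape and the COUNTER default come from these notes.

```python
from enum import Enum


class MetricTypes(Enum):
    """Stand-in for TorchServe's MetricTypes enum (assumed import path:
    ts.metrics.metric_type_enum)."""
    COUNTER = "counter"
    GAUGE = "gauge"
    HISTOGRAM = "histogram"


class StubMetric:
    """Hypothetical metric object; it only records the values it is given."""

    def __init__(self, name, unit, metric_type):
        self.name, self.unit, self.metric_type = name, unit, metric_type
        self.values = []

    def add_or_update(self, value, dimension_values=None):
        self.values.append((value, tuple(dimension_values or ())))


class StubMetricsCache:
    """Hypothetical stand-in for the `metrics` object available in a handler."""

    def __init__(self):
        self.cache = {}

    def add_metric_to_cache(self, metric_name, unit, dimension_names=None,
                            metric_type=MetricTypes.COUNTER):
        # Register the metric once and return it (assumed signature).
        key = (metric_type, metric_name)
        if key not in self.cache:
            self.cache[key] = StubMetric(metric_name, unit, metric_type)
        return self.cache[key]

    def add_metric(self, name, value, unit, dimensions=None,
                   metric_type=MetricTypes.COUNTER):
        # Backwards-compatible call: the type falls back to COUNTER if omitted.
        # Returning the metric is a convenience of this stub only.
        metric = self.add_metric_to_cache(name, unit, metric_type=metric_type)
        metric.add_or_update(value)
        return metric


metrics = StubMetricsCache()

# Pre-v0.6.1 style call with no metric_type: treated as a COUNTER.
requests = metrics.add_metric(name="RequestCount", value=1, unit="count")

# Metrics of any other type must state the type explicitly.
generic = metrics.add_metric(name="GenericMetric", value=10, unit="count",
                             dimensions=[], metric_type=MetricTypes.GAUGE)

# Handlers written for v0.6.1 - v0.8.1: the call formerly named add_metric
# becomes add_metric_to_cache, followed by an update on the returned metric.
latency = metrics.add_metric_to_cache(metric_name="HandlerLatency", unit="ms",
                                      dimension_names=["ModelName"],
                                      metric_type=MetricTypes.GAUGE)
latency.add_or_update(12.5, dimension_values=["mnist"])

print(requests.metric_type, generic.metric_type, latency.values)
```

In a real handler the stubs would be replaced by the metrics object from the handler context, and each metric name used here would also need an entry in the metrics configuration file.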
New Features
- Supported KServe gRPC v2 #2176 @jagadeeshi2i
- Supported K8S session affinity #2519 @jagadeeshi2i
New Examples
- Example Llama v2 70B chat using HuggingFace Accelerate #2494 @lxning @HamidShojanazeri @agunapal
- Large model example OPT-6.7B on Inferentia2 #2399 @namannandan
  - This example demonstrates how NeuronX compiles the model, detects neuron core availability and runs the inference.
- DeepSpeed deferred init with OPT-30B #2419 @agunapal
  - This PR added the deferred model init feature to the OPT-30B example by leveraging the new DeepSpeed version. This feature can significantly reduce model loading latency.
- Torch TensorRT example #2483 @agunapal
  - This PR uses ResNet-50 as an example to demonstrate Torch TensorRT.
- K8S MNIST example using minikube #2323 @agunapal
  - This example shows how to use a pre-trained custom MNIST model to perform real-time digit recognition via K8S.
- Example for custom metrics #2516 @namannandan
- Example for object detection with ultralytics YOLO v8 model #2508 @agunapal
Improvements
- Migrated publishing torchserve-plugins-sdk from Maven JCenter to Maven Central #2429 #2422 @namannandan
- Fixed model download from S3 presigned URL #2416 @namannandan
- Enabled opt-6.7b benchmark on inf2 #2400 @namannandan
- Added job queue status to describe API #2464 @namannandan
- Made add_metric API backward compatible #2525 @namannandan
- Upgraded nvidia base image version to nvidia/cuda:11.7.1-base-ubuntu20.04 in GPU docker image #2442 @agunapal
- Added Docker regression tests in CI #2403 @agunapal
- Updated release version #2533 @agunapal
- Upgraded default cuda to 11.8 in docker image build #2489 @agunapal
- Updated docker nightly build parameters #2493 @agunapal
- Added path to save ab benchmark profile graph in benchmark report #2451 @agunapal
- Added profile information for benchmark #2470 @agunapal
- Fixed manifest null in base handler #2488 @pedrogengo
- Fixed batching input in DALI example #2455 @jagadeeshi2i
- Fixed metrics for K8S setup #2473 @jagadeeshi2i
- Fixed kserve storage optional package in Dockerfile #2537 @jagadeeshi2i
- Fixed typo in ModelConfig.java comments #2506 @arnavmehta7
- Fixed netty direct buffer issues in torchserve-plugins-sdk #2511 @marrodion
- Fixed typo in ts/context.py comments #2536 @ethankim00
- Fixed server error when gRPC client closes connection unexpectedly #2420 @lxning
Documentation
- Updated large model documentation #2468 @sekyondaMeta
- Updated Sphinx landing page and requirements #2428 #2520 @sekyondaMeta
- Updated Google Analytics in docs #2449 @sekyondaMeta
- Added performance checklist in docs #2526 @sekyondaMeta
- Added performance guidance in FAQ #2524 @sekyondaMeta
- Added instruction for embedding handler examples #2431 @sidharthrajaram
- Updated PyPi description #2445 @bryanwweber @agunapal
- Updated Better Transformer README #2474 @HamidShojanazeri
- Fixed typo in microbatching README #2484 @InakiRaba91
- Fixed broken link in kubernetes AKS README #2490 @agunapal
- Fixed lint error #2497 @ankithagunapal
- Updated instructions for building GPU docker image for ONNX #2435 @agunapal
Platform Support
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
GPU Support
Torch 2.0.1 + Cuda 11.7, 11.8
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2