Release TorchServe v0.8.2 Release Notes · pytorch/serve

This is the release of TorchServe v0.8.2.

Security

Updated snakeyaml version to v2 #2523 @nskool
Added warning about model allowed urls when default value is applied #2534 @namannandan

Custom metrics backwards compatibility

add_metric is now backwards compatible with versions [< v0.6.1], but the default metric type is inferred to be COUNTER. If the metric is of a different type, specify it in the call to add_metric as follows:
metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
When upgrading from versions [v0.6.1 - v0.8.1] to v0.8.2, replace the call to add_metric with add_metric_to_cache.
All custom metrics updated in the custom handler must be included in the metrics configuration file for TorchServe to emit them. This is shown here.
A detailed upgrade guide is included in the metrics documentation.
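
The default-type behaviour described above can be sketched with a small stand-in. This is not TorchServe's implementation: `FakeMetricsCache` and the local `MetricTypes` enum are hypothetical placeholders that only mirror the keyword arguments shown in the call above, so that the snippet runs without TorchServe installed. In a real handler the metrics object and the enum come from TorchServe itself.

```python
from enum import Enum


class MetricTypes(Enum):
    # Local stand-in for TorchServe's metric type enum (assumption: only
    # the member names matter for this illustration).
    COUNTER = "counter"
    GAUGE = "gauge"
    HISTOGRAM = "histogram"


class FakeMetricsCache:
    """Hypothetical stand-in for the metrics object a handler receives."""

    def __init__(self):
        self.emitted = []

    def add_metric(self, name, value, unit, dimensions=None,
                   metric_type=MetricTypes.COUNTER):
        # When the caller omits metric_type, the metric is treated as a COUNTER.
        record = {
            "name": name,
            "value": value,
            "unit": unit,
            "dimensions": list(dimensions or []),
            "metric_type": metric_type,
        }
        self.emitted.append(record)
        return record


metrics = FakeMetricsCache()

# Call written in the pre-v0.6.1 style: no metric_type, so COUNTER is inferred.
legacy = metrics.add_metric(name="RequestCount", value=1, unit="count")

# A metric of a different type must state its type explicitly.
gauge = metrics.add_metric(name="GenericMetric", value=10, unit="count",
                           dimensions=[], metric_type=MetricTypes.GAUGE)

print(legacy["metric_type"].name)
print(gauge["metric_type"].name)
```

The practical consequence is that a handler written for a version before v0.6.1 keeps working unchanged, but any metric that is not a counter needs the extra metric_type argument to be reported with the intended type.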

New Features

Supported KServe gRPC v2 #2176 @jagadeeshi2i
Supported K8S session affinity #2519 @jagadeeshi2i

New Examples

Example Llama v2 70B chat using HuggingFace Accelerate #2494 @lxning @HamidShojanazeri @agunapal
Large model example OPT-6.7B on Inferentia2 #2399 @namannandan
- This example demonstrates how NeuronX compiles the model, detects neuron core availability, and runs inference.
DeepSpeed deferred init with OPT-30B #2419 @agunapal
- This PR adds deferred model initialization to the OPT-30B example by leveraging a newer DeepSpeed version. This feature can significantly reduce model loading latency.
Torch TensorRT example #2483 @agunapal
- This PR uses Resnet-50 as an example to demonstrate Torch TensorRT.
K8S mnist example using minikube #2323 @agunapal
- This example shows how to use a pre-trained custom MNIST model to perform real-time digit recognition via K8S.
Example for custom metrics #2516 @namannandan
Example for object detection with ultralytics YOLO v8 model #2508 @agunapal

Improvements

Migrated publishing torchserve-plugins-sdk from Maven JCenter to Maven Central #2429 #2422 @namannandan
Fixed download model from S3 presigned URL #2416 @namannandan
Enabled opt-6.7b benchmark on inf2 #2400 @namannandan
Added job Queue Status in describe API #2464 @namannandan
Added add_metric API to be backward compatible #2525 @namannandan
Upgraded nvidia base image version to nvidia/cuda:11.7.1-base-ubuntu20.04 in GPU docker image #2442 @agunapal
Added Docker regression tests in CI #2403 @agunapal
Updated release version #2533 @agunapal
Upgraded default cuda to 11.8 in docker image build #2489 @agunapal
Updated docker nightly build parameters #2493 @agunapal
Added path to save ab benchmark profile graph in benchmark report #2451 @agunapal
Added profile information for benchmark #2470 @agunapal
Fixed manifest null in base handler #2488 @pedrogengo
Fixed batching input in DALI example #2455 @jagadeeshi2i
Fixed metrics for K8S setup #2473 @jagadeeshi2i
Fixed kserve storage optional package in Dockerfile #2537 @jagadeeshi2i
Fixed typo in ModelConfig.java comments #2506 @arnavmehta7
Fixed netty direct buffer issues in torchserve-plugins-sdk #2511 @marrodion
Fixed typo in ts/context.py comments #2536 @ethankim00
Fixed server error when gRPC client closes connection unexpectedly #2420 @lxning

Documentation

Updated large model documentation #2468 @sekyondaMeta
Updated Sphinx landing page and requirements #2428 #2520 @sekyondaMeta
Updated Google Analytics in docs #2449 @sekyondaMeta
Added performance checklist in docs #2526 @sekyondaMeta
Added performance guidance in FAQ #2524 @sekyondaMeta
Added instruction for embedding handler examples #2431 @sidharthrajaram
Updated PyPi description #2445 @bryanwweber @agunapal
Updated Better Transformer README #2474 @HamidShojanazeri
Fixed typo in microbatching README #2484 @InakiRaba91
Fixed broken link in kubernetes AKS README #2490 @agunapal
Fixed lint error #2497 @ankithagunapal
Updated instructions for building GPU docker image for ONNX #2435 @agunapal

Platform Support

Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.

GPU Support

Torch 2.0.1 + Cuda 11.7, 11.8
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
