Contents of this document.
Relevant documents.
TorchServe APIs are compliant with the OpenAPI specification 3.0.
Depending on your use case, you will be able to deploy TorchServe in production using the following mechanisms:
- Standalone deployment: refer to the TorchServe docker documentation or the TorchServe documentation.
- Cloud-based deployment: refer to the TorchServe kubernetes documentation or the TorchServe cloudformation documentation.
TorchServe's main purpose is to serve models via HTTP REST APIs. TorchServe is not a Flask app; it uses the Netty engine for serving HTTP requests.
Various models are provided in TorchServe out of the box. Check out the TorchServe Model Zoo for the list of all available models. You can also check out the examples folder.
No. As of now, only Python-based models are supported.
TorchServe is derived from Multi-Model-Server. However, TorchServe is specifically tuned for PyTorch models. It also has new features like snapshots and model versioning.
By default, TorchServe encodes an inference response with utf-8 if it is a string, so the client can decode it with utf-8.
If a model converts an international language string to bytes, the client needs to use the codec mechanism specified by the model, such as in https://github.com/pytorch/serve/blob/master/examples/nmt_transformer/model_handler_generalized.py
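As a minimal client-side sketch of the point above: the raw bytes below stand in for an HTTP response payload (in practice you would read them from something like `requests.post(...).content`), and the client simply decodes them with utf-8.

```python
# The server utf-8-encodes string responses; the client decodes with utf-8.
# raw_body is a stand-in for the bytes of an actual inference response.
raw_body = "¡hola! 你好".encode("utf-8")  # what the server puts on the wire
text = raw_body.decode("utf-8")           # what the client should do
print(text)
```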
Relevant documents.
CPU performance is heavily influenced by launcher core pinning. We recommend setting the following properties in your config.properties:
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
More background on improving CPU performance can be found in this blog post.
Relevant documents.
Yes, TorchServe API ports are configurable using a properties file or environment variables. Refer to configuration for more details.
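As a sketch, a config.properties fragment can set the listener addresses. The property names `inference_address`, `management_address`, and `metrics_address` are the documented keys; the defaults are ports 8080, 8081, and 8082 on 127.0.0.1, and binding to 0.0.0.0 here is just an example change.

```properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
```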
You can provide a requirements.txt while creating a mar file using the "--requirements-file / -r" flag. You can also add dependency files using the "--extra-files" flag.
Refer to configuration for more details.
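A dry-run sketch of such an invocation is below; all the file and model names are assumptions. The command is echoed rather than executed — drop the leading `echo` to actually run torch-model-archiver.

```shell
# Build a .mar that bundles a requirements file plus an extra mapping file.
echo torch-model-archiver \
  --model-name mymodel \
  --version 1.0 \
  --serialized-file model.pt \
  --handler my_handler.py \
  --requirements-file requirements.txt \
  --extra-files index_to_name.json
```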
Yes, you can deploy TorchServe in Kubernetes using Helm charts. Refer to Kubernetes deployment for more details.
Yes, you can deploy TorchServe on a multi-node ASG AWS EC2 cluster. There is a CloudFormation template available here for this type of deployment. Refer to Multi-node EC2 deployment behind Elastic LoadBalancer (ELB) for more details.
TorchServe preserves server runtime configuration across sessions such that a TorchServe instance experiencing either a planned or unplanned service stop can restore its state upon restart. These saved runtime configuration files can be used for backup and restore. Refer to TorchServe model snapshot for more details.
TorchServe has a utility script for creating Docker images. The Docker image can be hardware-based, CPU or GPU compatible. A TorchServe Docker image can also be CUDA-version specific.
All these Docker images can be created using build_image.sh with the appropriate options. Run ./build_image.sh --help for all available options.
Refer to Create Torchserve docker image from source for more details.
To create a Docker image for a specific branch, use the following command:
./build_image.sh -b <branch_name>/<commit_id>
To create a Docker image for a specific branch and specific tag, use the following command:
./build_image.sh -b <branch_name> -t <tagname:latest>
What is the difference between image created using Dockerfile and image created using Dockerfile.dev?
The image created using Dockerfile.dev has TorchServe installed from source, whereas the image created using Dockerfile has TorchServe installed from the PyPI distribution.
TorchServe looks for the config.properties file according to the order listed in the doc. There is no override mechanism.
- model_store: a mandatory argument during TorchServe start. It can either be defined in config.properties or overridden by the TorchServe command-line option "--model-store".
- load_models: an optional argument during TorchServe start. It can either be defined in config.properties or overridden by the TorchServe command-line option "--models".
- models: defines a list of models' configuration in config.properties. A model's configuration can be overridden by the management API. It does not decide which models will be loaded during TorchServe start. There is no relationship between "models" and "load_models" (i.e., the TorchServe command-line option --models).
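As a sketch, the two start-time arguments can be supplied in config.properties like this; the path and mar file name are hypothetical. The equivalent command-line overrides would be `torchserve --start --model-store /tmp/model_store --models mymodel.mar`.

```properties
model_store=/tmp/model_store
load_models=mymodel.mar
```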
Relevant documents
You can use any tool like Postman or Insomnia, or even use a Python script to do so. Find a sample Python script here.
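As a minimal sketch of calling the inference API from plain Python: the model name and host are assumptions, and 8080 is TorchServe's default inference port. The request is only built here; pass it to `urllib.request.urlopen(req)` to actually send it.

```python
import urllib.request

def build_inference_request(host: str, model: str, payload: bytes):
    # POST the payload to the TorchServe predictions endpoint for `model`
    url = f"http://{host}:8080/predictions/{model}"
    return urllib.request.Request(url, data=payload, method="POST")

req = build_inference_request("localhost", "mymodel", b"input data")
print(req.full_url)
```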
You can add a custom API using plugins SDK available in Torchserve. Refer to serving sdk and plugins for more details.
You can provide multiple data items in a single inference request to your custom handler as key-value pairs in the data object.
Refer to this issue for more details.
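A hedged sketch of how a custom handler's preprocess step might read such pairs: a handler receives a list of request dicts, and each dict can carry several key-value pairs sent in one request. The field names "left" and "right" are hypothetical client-chosen keys.

```python
def preprocess(batch):
    # batch: list of request dicts; each request can carry several fields
    pairs = []
    for row in batch:
        pairs.append((row.get("left"), row.get("right")))
    return pairs

print(preprocess([{"left": b"img1", "right": b"img2"}]))
```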
Relevant documents
You would have to write a custom handler and modify the post-processing to return the image. Refer to the custom service documentation for more details.
Write a custom handler that extends the default handler and overrides only the methods to be tuned. Refer to the custom service documentation for more details.
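The pattern can be sketched as follows. In a real handler you would subclass ts.torch_handler.base_handler.BaseHandler; the DefaultHandler class here is a stand-in so the sketch stays self-contained, and only postprocess is overridden.

```python
class DefaultHandler:
    # Stand-in for TorchServe's default handler pipeline
    def preprocess(self, data):
        return data
    def inference(self, data):
        return data
    def postprocess(self, data):
        return data
    def handle(self, data):
        return self.postprocess(self.inference(self.preprocess(data)))

class MyHandler(DefaultHandler):
    def postprocess(self, data):
        # Only post-processing is customized; everything else is inherited.
        return [{"label": d} for d in data]
```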
Yes, you can deploy your model with no code by using the built-in default handlers. Refer to default handlers for more details.
Yes, you can deploy Hugging Face models using a custom handler. Refer to HuggingFace_Transformers for example.
Relevant documents
A mar file is a zip file, with the ".mar" extension, consisting of all model artifacts. The command-line utility torch-model-archiver is used to create a mar file.
Yes, you can create your mar file using a TorchServe container. Follow the steps given here.
Currently, torch-model-archiver allows supplying only one serialized file with the --serialized-file parameter while creating the mar. However, you can supply any number and any type of file with the --extra-files flag. All the files supplied in the mar file are available in the model_dir location, which can be accessed through the context object supplied to the handler's entry point.
Sample code snippet:
# context is passed to the handler's entry point by TorchServe
properties = context.system_properties
model_dir = properties.get("model_dir")  # directory containing all mar contents
Refer to the Torch model archiver cli for more details. Relevant issues: [#633]
You can use both S3 v2 and v4 signature URLs.
Note: For the v4 type, replace the & characters in the model URL with their URL-encoded form, i.e. %26, in the curl command.
Relevant issues: [#669]
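A small sketch of the escaping the note describes, done in Python before handing the URL to curl; the presigned URL below is a made-up example.

```python
# Replace '&' with its URL-encoded form %26 in a presigned (v4) model URL.
presigned = ("https://bucket.s3.amazonaws.com/squeezenet.mar"
             "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=300")
escaped = presigned.replace("&", "%26")
print(escaped)
```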
A mar file can either be used locally or be made publicly available via HTTP. An S3 URI starting with s3:// will not work, but that very same file can be made public in the S3 console or via the AWS CLI to instead get a public object URL starting with https://.
TorchServe performance tuning example
There are a few reasons why model initialization can be slow:
- torch.load() overhead: not something we can improve, and it will be more dramatic for larger models.
- CUDA context launch overhead: not something we can control.
- install_py_dep_per_model=true is intended for local development or SageMaker deployments; in other production environments you should pre-install your dependencies.
- The model archiver has an overhead to compress and decompress models. Compression is on by default because historically TorchServe came out of SageMaker needs, which involve loading and unloading lots of models stored in cloud buckets. But for users with smaller deployments, choosing torch-model-archiver --archive-format no-archive is a good bet.
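A dry-run sketch of skipping archive compression is below; the file and model names are assumptions. The command is echoed rather than executed — drop the leading `echo` to actually run it.

```shell
# Package the model without compressing it, for faster load/unload.
echo torch-model-archiver \
  --model-name mymodel \
  --version 1.0 \
  --serialized-file model.pt \
  --handler my_handler.py \
  --archive-format no-archive
```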