Commit

Merge branch 'master' into issue_561
harshbafna authored Dec 10, 2020
2 parents 2029d1f + 7474e47 commit 0cbae6f
Showing 3 changed files with 241 additions and 0 deletions.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_template.md
@@ -3,6 +3,7 @@ name: "\U0001F41B Bug report"
about: Create a report to help us improve

---
Please have a look at the [FAQs](../../docs/FAQs.md) and the [Troubleshooting guide](../../docs/Troubleshooting.md); your query may already be addressed.

Your issue may already be reported!
Please search on the [issue tracker](https://github.com/pytorch/serve/issues) before creating one.
147 changes: 147 additions & 0 deletions docs/FAQs.md
@@ -0,0 +1,147 @@
# FAQs
Contents of this document:
* [General](#general)
* [Deployment and config](#deployment-and-config)
* [API](#api)
* [Handler](#handler)
* [Model-archiver](#model-archiver)

## General
Relevant documents.
- [Torchserve readme](https://github.com/pytorch/serve#torchserve)

### Do TorchServe APIs follow some REST API standard?
TorchServe APIs are compliant with the [OpenAPI specification 3.0](https://swagger.io/specification/).

### How to use TorchServe in production?
Depending on your use case, you can deploy TorchServe in production using one of the following mechanisms:
* Standalone deployment: refer to https://github.com/pytorch/serve/docker or https://github.com/pytorch/serve/docs/README.md
* Cloud-based deployment: refer to https://github.com/pytorch/serve/kubernetes or https://github.com/pytorch/serve/cloudformation

### What's the difference between TorchServe and a Python web app using web frameworks like Flask or Django?
TorchServe's main purpose is to serve models via HTTP REST APIs. TorchServe is not a Flask app; it uses the Netty engine for serving HTTP requests.

Relevant issues: [[581](https://github.com/pytorch/serve/issues/581),[569](https://github.com/pytorch/serve/issues/569)]

### Are there any sample models available?
Various models are provided in TorchServe out of the box. Check out the TorchServe [Model Zoo](https://github.com/pytorch/serve/blob/master/docs/model_zoo.md) for a list of all available models. You can also check out the [examples](https://github.com/pytorch/serve/tree/master/examples) folder.

### Does TorchServe support models based on programming languages other than Python?
No. As of now, only Python-based models are supported.

### What benefits does TorchServe have over AWS Multi-Model-Server?
TorchServe is derived from Multi-Model-Server. However, TorchServe is specifically tuned for PyTorch models. It also has new features like snapshots and model versioning.

## Deployment and config
Relevant documents.
- [Torchserve configuration](https://github.com/pytorch/serve/blob/master/docs/configuration.md)
- [Model zoo](https://github.com/pytorch/serve/blob/master/docs/model_zoo.md#model-zoo)
- [Snapshot](https://github.com/pytorch/serve/blob/master/docs/snapshot.md)
- [Docker](https://github.com/pytorch/serve/blob/master/docker/README.md)

### Can I run TorchServe APIs on ports other than the default 8080 & 8081?
Yes, TorchServe API ports are configurable using a properties file or environment variables.
Refer to [configuration.md](https://github.com/pytorch/serve/blob/master/docs/configuration.md) for more details.


### How can I resolve model-specific Python dependencies?
You can provide a `requirements.txt` file while creating a mar file using the `--requirements-file`/`-r` flag. You can also add dependency files using the `--extra-files` flag.
Refer to [configuration.md](https://github.com/pytorch/serve/blob/master/docs/configuration.md) for more details.
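
For example, a minimal sketch of archiving a model together with its dependencies (the model, handler, and file names are hypothetical placeholders):

```
torch-model-archiver --model-name my_model \
    --version 1.0 \
    --serialized-file my_model.pt \
    --handler my_handler.py \
    --requirements-file requirements.txt \
    --extra-files utils.py
```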

### Can I deploy Torchserve in Kubernetes?
Yes, you can deploy TorchServe in Kubernetes using Helm charts.
Refer to [Kubernetes deployment](https://github.com/pytorch/serve/blob/master/kubernetes/README.md) for more details.

### Can I deploy Torchserve with AWS ELB and AWS ASG?
Yes, you can deploy TorchServe on a multi-node ASG AWS EC2 cluster. There is a CloudFormation template available [here](https://github.com/pytorch/serve/blob/master/cloudformation/ec2-asg.yaml) for this type of deployment. Refer to [Multi-node EC2 deployment behind Elastic LoadBalancer (ELB)](https://github.com/pytorch/serve/tree/master/cloudformation#multi-node-ec2-deployment-behind-elastic-loadbalancer-elb) for more details.

### How can I back up and restore the TorchServe state?
TorchServe preserves server runtime configuration across sessions, so that a TorchServe instance experiencing either a planned or unplanned service stop can restore its state upon restart. These saved runtime configuration files can be used for backup and restore.
Refer to [TorchServe model snapshot](https://github.com/pytorch/serve/blob/master/docs/snapshot.md#torchserve-model-snapshot) for more details.

### How can I build a TorchServe image from source?
TorchServe has a utility [script](https://github.com/pytorch/serve/blob/master/docker/build_image.sh) for creating Docker images. The Docker image can be CPU- or GPU-compatible, and a TorchServe Docker image can also be CUDA-version specific.

All these Docker images can be created using `build_image.sh` with the appropriate options.

Run `./build_image.sh --help` for all available options.

Refer to [Create TorchServe docker image from source](../docker/README.md#create-torchserve-docker-image-from-source) for more details.

### How to build a TorchServe image for a specific branch or commit ID?
To create a Docker image for a specific branch, use the following command:

`./build_image.sh -b <branch_name>/<commit_id>`

To create a Docker image for a specific branch and specific tag, use the following command:

`./build_image.sh -b <branch_name> -t <tagname:latest>`


### What is the difference between image created using Dockerfile and image created using Dockerfile.dev?
The image created using Dockerfile.dev has TorchServe installed from source, whereas the image created using Dockerfile has TorchServe installed from the PyPI distribution.

## API
Relevant documents
- [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md)

### What can I use other than *curl* to make requests to TorchServe?
You can use any tool, such as Postman or Insomnia, or even a Python script. Find a sample Python script [here](https://github.com/pytorch/serve/blob/master/docs/default_handlers.md#torchserve-default-inference-handlers).
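
As an illustration, a minimal sketch using the Python `requests` library (the model name and input file are hypothetical placeholders):

```
import requests

# Placeholder model name and input file; adjust to your deployment.
url = "http://localhost:8080/predictions/my_model"
with open("kitten.jpg", "rb") as f:
    response = requests.post(url, data=f)
print(response.json())
```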

### How can I add a custom API to an existing framework?
You can add a custom API using **plugins SDK** available in Torchserve.
Refer to [serving sdk](https://github.com/pytorch/serve/blob/master/serving-sdk) and [plugins](https://github.com/pytorch/serve/blob/master/plugins) for more details.

### How can I pass multiple images in an inference request to my model?
You can provide multiple inputs in a single inference request to your custom handler as key-value pairs in the `data` object.
Refer to [this](https://github.com/pytorch/serve/issues/529#issuecomment-658012913) for more details.
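
For instance, a sketch of sending two images as separate key-value pairs in one request (the model and file names are placeholders):

```
curl -X POST http://localhost:8080/predictions/my_model \
    -F "image1=@first_image.jpg" \
    -F "image2=@second_image.jpg"
```

Inside a custom handler, each field can then be read from the request map, e.g. `data[0].get("image1")`.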

## Handler
Relevant documents
- [Default handlers](https://github.com/pytorch/serve/blob/master/docs/default_handlers.md#torchserve-default-inference-handlers)
- [Custom Handlers](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-handlers)

### How do I return an image output for a model?
You would have to write a custom handler whose post-processing returns the image.
Refer to the [custom service documentation](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-handlers) for more details.
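
A minimal sketch of such a post-processing step, assuming the model output is an image tensor that can be serialized to PNG bytes:

```
import io
from torchvision.transforms import ToPILImage

def postprocess(self, inference_output):
    # Convert the output tensor to a PIL image, then to raw PNG bytes.
    image = ToPILImage()(inference_output.squeeze(0).cpu())
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    # TorchServe expects a list with one response entry per request in the batch.
    return [buffer.getvalue()]
```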

### How to enhance the default handlers?
Write a custom handler that extends the default handler and override just the methods you need to tune.
Refer to the [custom service documentation](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-handlers) for more details.
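
For example, a sketch that extends the built-in `ImageClassifier` handler and overrides only `postprocess` (the extra metadata field is purely illustrative):

```
from ts.torch_handler.image_classifier import ImageClassifier

class CustomImageClassifier(ImageClassifier):
    def postprocess(self, data):
        # Reuse the default post-processing, then attach extra metadata.
        results = super().postprocess(data)
        return [{"prediction": r, "handler": "custom"} for r in results]
```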

### Do I always have to write a custom handler or are there default ones that I can use?
You do not always need a custom handler: you can deploy your model with no code by using the built-in default handlers.
Refer to [default handlers](https://github.com/pytorch/serve/blob/master/docs/default_handlers.md#torchserve-default-inference-handlers) for more details.
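
For example, a sketch of packaging an image-classification model with the built-in `image_classifier` handler (the model name and weights file are placeholders):

```
torch-model-archiver --model-name resnet18 \
    --version 1.0 \
    --serialized-file resnet18.pt \
    --handler image_classifier
```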

### Is it possible to deploy Hugging Face models?
Yes, you can deploy Hugging Face models using a custom handler.
Refer to [Huggingface_Transformers](https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/README.md) for an example.

## Model-archiver
Relevant documents
- [Model-archiver](https://github.com/pytorch/serve/tree/master/model-archiver#torch-model-archiver-for-torchserve)
- [Docker Readme](https://github.com/pytorch/serve/blob/master/docker/README.md)

### What is a mar file?
A mar file is a zip archive with the `.mar` extension, containing all the model artifacts. The command-line utility `torch-model-archiver` is used to create a mar file.

### Can I create a mar file using a TorchServe Docker container?
Yes, you can create your mar file using a TorchServe container. Follow the steps given [here](https://github.com/pytorch/serve/blob/master/docker/README.md#create-torch-model-archiver-from-container).

### Can I add multiple serialized files in a single mar file?
Currently, `TorchModelArchiver` allows supplying only one serialized file with the `--serialized-file` parameter while creating the mar. However, you can supply any number of files, of any type, with the `--extra-files` flag. All files supplied in the mar file are available in the `model_dir` location, which can be accessed through the context object supplied to the handler's entry point.

Sample code snippet:
```
# The context object is passed to the handler's entry point by TorchServe.
properties = context.system_properties
# model_dir is where the mar contents (including --extra-files) are extracted.
model_dir = properties.get("model_dir")
```
Refer to [Torch model archiver cli](https://github.com/pytorch/serve/blob/master/model-archiver/README.md#torch-model-archiver-command-line-interface) for more details.
Relevant issues: [[#633](https://github.com/pytorch/serve/issues/633)]

### Can I download and register a model using an S3 presigned v4 URL?
You can use both S3 v2 and v4 signature URLs.
Note: For the v4 type, replace the `&` characters in the model URL with their URL-encoded form, i.e. `%26`, in the curl command.
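
As an illustration, a sketch of registering a model through the management API with an encoded presigned URL (the bucket and query parameters are placeholders):

```
# Placeholder presigned URL; note the '&' separators encoded as '%26'.
curl -X POST "http://localhost:8081/models?url=https://s3.amazonaws.com/my-bucket/my_model.mar?X-Amz-Algorithm=AWS4-HMAC-SHA256%26X-Amz-Expires=3600%26X-Amz-Signature=..."
```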

Relevant issues: [[#669](https://github.com/pytorch/serve/issues/669)]
93 changes: 93 additions & 0 deletions docs/Troubleshooting.md
@@ -0,0 +1,93 @@
## Troubleshooting guide
Refer to this section for common issues faced while deploying your PyTorch models with TorchServe, and their corresponding troubleshooting steps.

* [Deployment and config issues](#deployment-and-config-issues)
* [Snapshot related issues](#snapshot-related-issues)
* [API related issues](#api-related-issues)
* [Model-archiver](#model-archiver)


### Deployment and config issues
#### "Failed to bind to address: http://127.0.0.1:8080", port 8080/8081 already in use.
Usually, port 8080/8081 is already in use by some other application or service; this can be verified using the command `ss -ntl | grep 8080`. There are two ways to troubleshoot this issue: either kill the process that is using port 8080/8081, or run TorchServe on ports other than 8080 & 8081.
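
For example, a minimal `config.properties` sketch that moves the APIs to other ports (the port numbers shown are illustrative):

```
inference_address=http://127.0.0.1:8443
management_address=http://127.0.0.1:8444
```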

Refer to [configuration.md](https://github.com/pytorch/serve/blob/master/docs/configuration.md) for more details.

Relevant issues: [[542](https://github.com/pytorch/serve/issues/542)]

#### "java.lang.NoSuchMethodError" when starting Torchserve.[[473](https://github.com/pytorch/serve/issues/473)]
This error usually occurs when Java 11 is not installed or used. Java 11 is required by Torchserve and older java versions are not supported.

Relevant issues: [[#473](https://github.com/pytorch/serve/issues/473)]

#### Unable to send big files in an inference request?
The default maximum request size and response size is roughly 6.5 MB, so any file larger than 6.5 MB cannot be uploaded.
To resolve this, update `max_request_size` and `max_response_size` in a config.properties file and start TorchServe with this config file.
```
$ cat config.properties
max_request_size=<request size in bytes>
max_response_size=<response size in bytes>
$ torchserve --start --model-store model_store --ts-config /path/to/config.properties
```
You can also use environment variables to set these values.
Refer to [configuration.md](https://github.com/pytorch/serve/blob/master/docs/configuration.md) for more details.
Relevant issues: [[#335](https://github.com/pytorch/serve/issues/335)]

### Snapshot related issues
#### How to disable the snapshot feature?
By default, the snapshot feature is enabled. To disable it, start TorchServe using the `--ncs` flag, or specify a config file using `--ts-config path/to/config`.
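
For example (the model-store path is illustrative):

```
# Start TorchServe with the snapshot feature disabled.
torchserve --start --ncs --model-store model_store
```
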
Relevant issues: [[#383](https://github.com/pytorch/serve/issues/383), [#512](https://github.com/pytorch/serve/issues/512)]

#### TorchServe stopped after restart with an "InvalidSnapshotException" exception.
When restarted, TorchServe uses the last snapshot config file to restore the state of its models and their number of workers. When "InvalidSnapshotException" is thrown, the model store is in an inconsistent state compared with the snapshot. To resolve this, remove the snapshot config files, or start TorchServe with a specific config file using `--ts-config path/to/config`.
Refer to [snapshot.md](https://github.com/pytorch/serve/blob/master/docs/snapshot.md) for more details.

#### Where are snapshot config files stored?
The snapshots are stored by default in the `{LOG_LOCATION}\config` directory, where `{LOG_LOCATION}` is a system environment variable used by TorchServe. If this variable is not set, the snapshots are stored in the `.\log\config` directory.
Refer to [snapshot.md](https://github.com/pytorch/serve/blob/master/docs/snapshot.md) for more details.

#### How to change the temp directory?
You can set the `TEMP` environment variable to the desired path. TorchServe will use this path to extract the content of model archives.
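
For example (the path is illustrative):

```
# TorchServe will extract model-archive contents under this directory.
export TEMP=/path/to/temp_dir
torchserve --start --model-store model_store
```
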
Relevant issues: [[#654](https://github.com/pytorch/serve/issues/654)]

### API related issues

#### Register model: Failed with exception "ConflictStatusException" & error code 409.
This gives a clear message that the model being registered conflicts with an already existing model. To resolve this, change the model version when creating the mar file, or register the model with a different name.
Relevant issues: [[#500](https://github.com/pytorch/serve/issues/500)]

#### Register model: Failed with exception "DownloadModelException" & error code 400.
In this case, TorchServe was unable to download the mar file. To resolve this, check whether the given URL is publicly accessible.

#### Register model: Failed with exception "ModelNotFoundException" & error code 404.
In this case, TorchServe was unable to locate the given mar file in the model store directory. To resolve this, check whether the given mar file exists in the model store, and check the mar file name in the POST request used to register the model.

#### Inference request: Failed with exception "ServiceUnavailableException" & error code 503.
In this case, the model is registered but there are no workers spawned for the model. Use the scale-up API to increase the number of workers. You can verify the number of workers using:
`curl -X GET "http://localhost:8081/models/<model_name>"`
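
As a sketch, scaling up through the management API (the model name and worker count are placeholders):

```
# Ask TorchServe to keep at least 2 workers for the model.
curl -X PUT "http://localhost:8081/models/my_model?min_worker=2"
```
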
### Model-archiver

#### How can I add a model-specific custom dependency?
You can add your dependency files using the `--extra-files` flag while creating a mar file. These dependency files can be of any type, such as zip, egg, or json. You may have to write a custom handler to use these files as required.

Relevant issues: [[#566](https://github.com/pytorch/serve/issues/566)]

#### How can I resolve model-specific Python dependencies?
You can provide a `requirements.txt` file while creating a mar file using the `--requirements-file`/`-r` flag. You can refer to the [waveglow text-to-speech-synthesizer](https://github.com/pytorch/serve/tree/master/examples/text_to_speech_synthesizer) example:

- [waveglow mar creation script](https://github.com/pytorch/serve/blob/master/examples/text_to_speech_synthesizer/create_mar.sh)
- [waveglow handler](https://github.com/pytorch/serve/blob/master/examples/text_to_speech_synthesizer/waveglow_handler.py#L57)

Relevant issues: [[#566](https://github.com/pytorch/serve/issues/566)]
Refer to [Torch model archiver cli](https://github.com/pytorch/serve/blob/master/model-archiver/README.md#torch-model-archiver-command-line-interface) for more details.

#### I have added requirements.txt in my mar file but the packages listed are not getting installed.
By default, the model-specific custom Python packages feature is disabled. Enable it by setting `install_py_dep_per_model` to `true`.
Refer to [Allow model specific custom python packages](https://github.com/pytorch/serve/blob/master/docs/configuration.md#allow-model-specific-custom-python-packages) for more details.
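
For example, in `config.properties`:

```
install_py_dep_per_model=true
```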


#### Backend worker monitoring thread interrupted or backend worker process died error.
This issue mostly occurs when the model fails to initialize, which may be due to erroneous code in the handler's initialize function. This error is also observed when a required package or module is missing.

Relevant issues: [[#667](https://github.com/pytorch/serve/issues/667), [#537](https://github.com/pytorch/serve/issues/537)]
