
Commit

Merge branch 'master' into msaroufim-patch-15
msaroufim authored Oct 2, 2023
2 parents 47bc584 + 8dfa6c8 commit 1b8d32b
Showing 20 changed files with 184 additions and 186 deletions.
2 changes: 1 addition & 1 deletion SECURITY.md
@@ -4,7 +4,7 @@

| Version | Supported |
| ------- | ------------------ |
-| 0.7.1 | :white_check_mark: |
+| 0.8.2 | :white_check_mark: |


## How we do security
2 changes: 1 addition & 1 deletion benchmarks/config_kf.properties
@@ -2,4 +2,4 @@ inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
service_envelope=kserve
number_of_netty_threads=32
-job_queue_size=1000
+job_queue_size=1000
22 changes: 11 additions & 11 deletions docs/batch_inference_with_ts.md
@@ -5,7 +5,7 @@
* [Introduction](#introduction)
* [Prerequisites](#prerequisites)
* [Batch Inference with TorchServe's default handlers](#batch-inference-with-torchserves-default-handlers)
-* [Batch Inference with TorchServe using ResNet-152 model](#batch-inference-with-torchserve-using-resnet-152-model)
+* [Batch Inference with TorchServe using ResNet-152 model](#batch-inference-with-torchserve-using-resnet-152-model)
* [Demo to configure TorchServe ResNet-152 model with batch-supported model](#demo-to-configure-torchserve-resnet-152-model-with-batch-supported-model)
* [Demo to configure TorchServe ResNet-152 model with batch-supported model using Docker](#demo-to-configure-torchserve-resnet-152-model-with-batch-supported-model-using-docker)

@@ -16,7 +16,7 @@ TorchServe was designed to natively support batching of incoming inference reque
because most ML/DL frameworks are optimized for batch requests.
This optimal use of host resources in turn reduces the operational expense of hosting an inference service using TorchServe.

-In this document we show an example of how to use batch inference in Torchserve when serving models locally or using docker containers.
+In this document we show an example of how to use batch inference in Torchserve when serving models locally or using docker containers.

## Prerequisites

@@ -54,7 +54,7 @@ requests before this timer time's out, it sends what ever requests that were rec
Let's look at an example using this configuration through management API:

```bash
-# The following command will register a model "resnet-152.mar" and configure TorchServe to use a batch_size of 8 and a max batch delay of 50 milliseconds.
+# The following command will register a model "resnet-152.mar" and configure TorchServe to use a batch_size of 8 and a max batch delay of 50 milliseconds.
curl -X POST "localhost:8081/models?url=resnet-152.mar&batch_size=8&max_batch_delay=50"
```
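As a quick sanity check (not part of the original walkthrough), the management API can also describe a registered model; the sketch below assumes the model above is registered under the name `resnet-152` and that the management port is the default 8081.

```bash
# Describe the registered model; the JSON response should echo back the
# batchSize (8) and maxBatchDelay (50) supplied at registration time.
curl "localhost:8081/models/resnet-152"
```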
Here is an example of using this configuration through the config.properties:
@@ -97,8 +97,8 @@ First things first, follow the main [Readme](../README.md) and install all the r
```text
$ cat config.properties
...
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
...
$ torchserve --start --model-store model_store
```
@@ -193,13 +193,13 @@ models={\
}\
}
```
-* Then will start Torchserve by passing the config.properties using `--ts-config` flag
+* Then will start Torchserve by passing the config.properties using `--ts-config` flag

```bash
torchserve --start --model-store model_store --ts-config config.properties
```
* Verify that TorchServe is up and running

```text
$ curl localhost:8080/ping
{
@@ -265,9 +265,9 @@ Here, we show how to register a model with batch inference support when serving
* Set the batch `batch_size` and `max_batch_delay` in the config.properties as referenced in the [dockered_entrypoint.sh](../docker/dockerd-entrypoint.sh)

```text
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
@@ -291,7 +291,7 @@ models={\
./build_image.sh -g -cv cu102
```

-* Start serving the model with the container and pass the config.properties to the container
+* Start serving the model with the container and pass the config.properties to the container

```bash
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 --name mar -v /home/ubuntu/serve/model_store:/home/model-server/model-store -v $ path to config.properties:/home/model-server/config.properties pytorch/torchserve:latest-gpu
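The last bind mount in the `docker run` command above uses a placeholder for the config.properties location; a concrete invocation might look like the sketch below, where `/home/ubuntu/serve/config.properties` is an assumed path rather than one fixed by the docs.

```bash
# Same docker run as above with the config path spelled out (illustrative);
# the file is mounted where the container's entrypoint expects it.
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 --name mar \
  -v /home/ubuntu/serve/model_store:/home/model-server/model-store \
  -v /home/ubuntu/serve/config.properties:/home/model-server/config.properties \
  pytorch/torchserve:latest-gpu
```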
9 changes: 4 additions & 5 deletions docs/configuration.md
@@ -81,16 +81,15 @@ See [Enable SSL](#enable-ssl) to configure HTTPS.
* `inference_address`: Inference API binding address. Default: `http://127.0.0.1:8080`
* `management_address`: Management API binding address. Default: `http://127.0.0.1:8081`
* `metrics_address`: Metrics API binding address. Default: `http://127.0.0.1:8082`
-* To run predictions on models on a public IP address, specify the IP address as `0.0.0.0`.
-To run predictions on models on a specific IP address, specify the IP address and port.
+* To run predictions on models on a specific IP address, specify the IP address and port.

```properties
-# bind inference API to all network interfaces with SSL enabled
-inference_address=https://0.0.0.0:8443
+# bind inference API to localhost with SSL enabled
+inference_address=https://127.0.0.1:8443
```

```properties
-# bind inference API to private network interfaces
+# bind inference API to private network interfaces with SSL enabled
inference_address=https://172.16.1.10:8080
```
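For illustration only: once the inference API is bound to a specific interface as above, requests must target that address. The `-k` flag in the sketch below assumes a self-signed certificate and should be dropped once a trusted certificate is installed.

```bash
# Health check against the interface the API is bound to; -k skips
# certificate verification (only acceptable for self-signed test certs).
curl -k https://172.16.1.10:8080/ping
```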

6 changes: 3 additions & 3 deletions examples/asr_rnnt_emformer/config.properties
@@ -1,6 +1,6 @@
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
10 changes: 5 additions & 5 deletions examples/cloudformation/ec2-asg.yaml
@@ -29,7 +29,7 @@ Parameters:
Type: String
MinLength: '9'
MaxLength: '18'
-Default: '0.0.0.0/0'
+Default: '127.0.0.1/0'
AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
ConstraintDescription: Must be a valid IP CIDR range of the form x.x.x.x/x.
ModelPath:
@@ -41,7 +41,7 @@ Parameters:
Type: String
MinLength: '9'
MaxLength: '18'
-Default: '0.0.0.0/0'
+Default: '127.0.0.1/0'
AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
ConstraintDescription: Must be a valid IP CIDR range of the form x.x.x.x/x.
Mappings:
@@ -469,9 +469,9 @@ Resources:
files:
/etc/torchserve/config.properties:
content: !Sub |
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
load_models=ALL
model_store=/mnt/efs/model_store
mode: '000400'
6 changes: 3 additions & 3 deletions examples/cloudformation/ec2.yaml
@@ -267,9 +267,9 @@ Resources:
files:
/etc/torchserve/config.properties:
content: !Sub |
-inference_address=https://0.0.0.0:8080
-management_address=https://0.0.0.0:8081
-metrics_address=https://0.0.0.0:8082
+inference_address=https://127.0.0.1:8080
+management_address=https://127.0.0.1:8081
+metrics_address=https://127.0.0.1:8082
private_key_file=/etc/torchserve/server.key
certificate_file=/etc/torchserve/server.pem
mode: '000400'
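The template above points TorchServe at `/etc/torchserve/server.key` and `server.pem`; one way to produce a matching self-signed pair for testing is sketched below (the subject and validity period are arbitrary choices, not taken from the template).

```bash
# Generate an unencrypted self-signed key/certificate pair at the paths
# referenced by config.properties (testing only).
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=localhost" \
  -keyout /etc/torchserve/server.key \
  -out /etc/torchserve/server.pem
```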
6 changes: 3 additions & 3 deletions examples/diffusers/config.properties
@@ -1,7 +1,7 @@
#Sample config.properties. In production config.properties at /mnt/models/config/config.properties will be used
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
enable_envvars_config=true
install_py_dep_per_model=true
load_models=all
@@ -1,6 +1,6 @@
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
enable_envvars_config=true
install_py_dep_per_model=true
number_of_gpu=1
@@ -1,6 +1,5 @@
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
enable_envvars_config=true
install_py_dep_per_model=true

6 changes: 3 additions & 3 deletions examples/large_models/deepspeed_mii/config.properties
@@ -1,6 +1,6 @@
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
-metrics_address=http://0.0.0.0:8082
+inference_address=http://127.0.0.1:8080
+management_address=http://127.0.0.1:8081
+metrics_address=http://127.0.0.1:8082
enable_envvars_config=true
install_py_dep_per_model=true
load_models=all
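Since the example configs in this commit bind all three APIs to 127.0.0.1, they only answer requests originating on the host running TorchServe; a minimal local smoke test, assuming default ports and a running server, looks like this.

```bash
# Both endpoints are reachable only from the local host after this change.
curl http://127.0.0.1:8080/ping      # inference API health check
curl http://127.0.0.1:8081/models    # management API: list registered models
```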