
Commit 12f7d48
Merge branch 'master' into master
sekyondaMeta authored May 10, 2023
2 parents 0d8cebf + fbcde1d commit 12f7d48
Showing 3 changed files with 2 additions and 53 deletions.
2 changes: 1 addition & 1 deletion docs/large_model_inference.md
@@ -181,7 +181,7 @@ torch-model-archiver --model-name bloom --version 1.0 --handler deepspeed_handle
* Pre-install the model parallel library, such as DeepSpeed, on the container or host.
* Pre-download the model checkpoints. For example, if using HuggingFace, the pretrained model can be pre-downloaded via [Download_model.py](https://github.com/pytorch/serve/blob/75f66dc557b3b67a3ab56536a37d7aa21582cc04/examples/large_models/deepspeed/opt/Readme.md?plain=1#L7)
* Set the environment variables [HUGGINGFACE_HUB_CACHE](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#understand-caching) and [TRANSFORMERS_CACHE](https://huggingface.co/transformers/v4.0.1/installation.html#caching-models), as shown in the sketch after this list
- * Download model to the HuggingFace cache dir via tool [Download_model.py](https://github.com/pytorch/serve/blob/75f66dc557b3b67a3ab56536a37d7aa21582cc04/examples/large_models/deepspeed/opt/Readme.md?plain=1#L7)
+ * Download the model to the HuggingFace cache dir via the tool [Download_model.py](https://github.com/pytorch/serve/blob/4fe5273cd6f98fb5abc570f802b402ac32ecd105/examples/large_models/Huggingface_pippy/Readme.md?plain=1#L20)
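
Taken together, the caching bullets above amount to a small pre-warm step. Below is a minimal sketch: the cache path `/data/hf-cache` and the `facebook/opt-350m` model are illustrative assumptions (the model name is borrowed from the DeepSpeed example changed later in this commit), not requirements.

```bash
# Minimal pre-warm sketch (cache path and model name are illustrative).
# Point both HuggingFace caches at a shared, persistent directory...
export HUGGINGFACE_HUB_CACHE=/data/hf-cache
export TRANSFORMERS_CACHE=/data/hf-cache

# ...then pre-download the checkpoint with the helper script that lives
# under examples/large_models/utils in the pytorch/serve repo.
python Download_model.py --model_path model --model_name facebook/opt-350m --revision main
```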

#### Tune "[responseTimeout](https://github.com/pytorch/serve/blob/5ee02e4f050c9b349025d87405b246e970ee710b/docs/configuration.md?plain=1#L216)" (see the [model config YAML file](https://github.com/pytorch/serve/blob/5ee02e4f050c9b349025d87405b246e970ee710b/model-archiver/README.md?plain=1#L164)) if high model loading or inference latency causes response timeouts.
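
A minimal sketch of that tuning, assuming the timeout is set through a model config YAML bundled at archive time: the 1200-second value and the handler filename are illustrative assumptions, and the archiver flags mirror the `torch-model-archiver` command shown in the hunk header above.

```bash
# Sketch: raise responseTimeout for a slow-loading large model.
# The 1200-second value is illustrative, not a recommendation.
cat > model-config.yaml <<'EOF'
responseTimeout: 1200
EOF

# Rebuild the archive with the config attached; the handler filename is
# assumed from the bloom/DeepSpeed example above.
torch-model-archiver --model-name bloom --version 1.0 \
    --handler deepspeed_handler.py --config-file model-config.yaml
```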

2 changes: 1 addition & 1 deletion examples/large_models/deepspeed/opt/Readme.md
@@ -7,7 +7,7 @@ This document describes serving large HF models on multiple GPUs using DeepSpeed
### Step 1: Download model

```bash
- python ../../utils/Download_models.py --model_path model --model_name facebook/opt-350m --revision main
+ python ../../utils/Download_model.py --model_path model --model_name facebook/opt-350m --revision main
```

The script prints the path where the model is downloaded.
51 changes: 0 additions & 51 deletions examples/large_models/utils/Download_models.py

This file was deleted.
