Skip to content

Commit

Permalink
Merge b4f3321 into 9833774
Browse files Browse the repository at this point in the history
  • Loading branch information
sidharthrajaram authored Jun 27, 2023
2 parents 9833774 + b4f3321 commit d713ab4
Show file tree
Hide file tree
Showing 5 changed files with 140 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/internals.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ And backend is the Python code (most Pytorch specific stuff)

### Backend (Python)

https://github.com/pytorch/serve/blob/master/ts/arg_parser.py#L64
https://github.com/pytorch/serve/blob/master/ts/arg_parser.py

* Arg parser controls config/not workflow and can also setup a model service worker with a custom socket

Expand Down
99 changes: 99 additions & 0 deletions examples/instruction_embedding/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# A TorchServe handler for Instructor Embedding models

A simple handler that you can use to serve [Instructor Embedding models](https://instructor-embedding.github.io/) with TorchServe, supporting both single inference and batch inference.

# Setup:

**1.** [Download an Instructor model (i.e. Instructor-XL)](https://huggingface.co/hkunlp/instructor-xl/tree/main?clone=true) from HuggingFace into your model store directory of choosing. Copy the `instructor-embedding-handler.py` into the same directory as your newly downloaded directory containing all the model-related files.

**2.** Create the .MAR Model Archive using [`torch-model-archiver`](https://github.com/pytorch/serve/blob/master/model-archiver/README.md):

```bash
torch-model-archiver --model-name <YOUR_MODEL_NAME_OF_CHOOSING> --version 1.0 --handler PATH/TO/instructor-embedding-handler.py --extra-files <DOWNLOADED_MODEL_DIR> --serialized-file <DOWNLOADED_MODEL_DIR>/pytorch_model.bin --f
```

**3.** Use [TorchServe](https://pytorch.org/serve/server.html) to startup the server and deploy the Instruction Embedding model you downloaded.

**Note:** Instructor Embedding models are around ~4 GB. By default, torchserve will autoscale workers (each with a loaded copy of the model). [At present](https://github.com/pytorch/serve/issues/2432), if you have memory concerns, you have to make use of the [Management API](https://pytorch.org/serve/management_api.html) to bring up the server and deploy your model.


# Performing Inference
To perform inference for an instruction and corresponding sentence, use the following format for the request body:
```text
{
"inputs": [INSTRUCTION, SENTENCE]
}
```

To perform batch inference, use the following format for the request body:
```text
{
"inputs": [
[INSTRUCTION_1, SENTENCE_1],
[INSTRUCTION_2, SENTENCE_2],
...
]
}
```

## Example: Single Inference
Request Endpoint: /predictions/<model_name>

Request Body:
```json
{
"inputs": ["Represent the Science title:", "3D ActionSLAM: wearable person tracking in multi-floor environments"]
}
```

### Response:
```yaml
[
0.010738617740571499,
...
0.10961631685495377
]
```

## Example: Batch Inference
Request Endpoint: /predictions/<model_name>

Request Body:
```json
{
"inputs": [
["Represent the Science title:", "3D ActionSLAM: wearable person tracking in multi-floor environments"],
["Represent the Medicine sentence for retrieving a duplicate sentence:", "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear."]
]
}
```

### Response:
```yaml
[
[
0.010738617740571499,
...
0.10961631685495377
],
[
0.014582153409719467,
...
0.08006688207387924
]
]
```

**Note:** The above request example was for batch inference on 2 distinct instruction/sentence pairs. The output of the batch inference request is two embedding vectors corresponding to the two input pairs (instruction, sentence):

**The first input was:**
["Represent the Science title:", "3D ActionSLAM: wearable person tracking in multi-floor environments"]

**The second input was:**
["Represent the Medicine sentence for retrieving a duplicate sentence:", "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear."]

The response was a list of 2 embedding vectors (numpy arrays converted `.tolist()` to ensure they were JSON serializable) corresponding to each of those inputs. The output vectors are quite long, so ellipses were used to make it more readable.

# Then What?

**Despite being slightly different under the hood compared to more traditional embedding models (i.e. Sentence Transformers), instruction embeddings can be used just like any other embeddings.** They are still just vector representations of your input text. The only difference is that the embedding vectors are *more fine-tuned* to the downstream task described by the instruction. To that end, these outputted embedding vectors can be stored or looked up in a vector database for [use cases](https://www.pinecone.io/learn/vector-embeddings-for-developers/#what-can-i-do-with-vector-embeddings) like semantic search or question answering or long-term memory for large language models. Check out the [Instructor Embedding project page](https://instructor-embedding.github.io/) for more information.
37 changes: 37 additions & 0 deletions examples/instruction_embedding/instructor_embedding_handler.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
"""
Handler class for Instruction Embedding models (https://instructor-embedding.github.io/)
"""
import logging

from InstructorEmbedding import INSTRUCTOR

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class InstructorEmbeddingHandler(BaseHandler):
"""
Handler class for Instruction Embedding models.
Refer to the README for how to use Instructor models and this handler.
"""

def __init__(self):
super().__init__()
self.initialized = False
self.model = None

def initialize(self, context):
properties = context.system_properties
logger.info("Initializing Instructor Embedding model...")
model_dir = properties.get("model_dir")
self.model = INSTRUCTOR(model_dir)
self.initialized = True

def handle(self, data, context):
inputs = data[0].get("body").get("inputs")
if isinstance(inputs[0], str):
# single inference
inputs = [inputs]
pred_embeddings = self.model.encode(inputs)
return [pred_embeddings.tolist()]
1 change: 1 addition & 0 deletions examples/instruction_embedding/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
InstructorEmbedding
2 changes: 2 additions & 0 deletions ts_scripts/spellcheck_conf/wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1060,3 +1060,5 @@ AMI
DLAMI
XLA
inferentia
ActionSLAM
statins

0 comments on commit d713ab4

Please sign in to comment.