Merge pull request #154 from fursovia/token_type_ids_bug
Token type ids bug
ayoub-louati authored Jan 27, 2023
2 parents d6c1f21 + c882eab commit f6dde83
Showing 4 changed files with 27 additions and 12 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -63,7 +63,7 @@ First, clone the repo as some commands below expect to find the `demo` folder:
git clone git@github.com:ELS-RD/transformer-deploy.git
cd transformer-deploy
# docker image may take a few minutes
-docker pull ghcr.io/els-rd/transformer-deploy:0.5.3
+docker pull ghcr.io/els-rd/transformer-deploy:0.5.4
```

### Classification/reranking (encoder model)
@@ -77,7 +77,7 @@ This will optimize models, generate Triton configuration and Triton folder layou

```shell
docker run -it --rm --gpus all \
-  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && \
convert_model -m \"philschmid/MiniLM-L6-H384-uncased-sst2\" \
--backend tensorrt onnx \
@@ -147,7 +147,7 @@ This will optimize models, generate Triton configuration and Triton folder layou

```shell
docker run -it --rm --gpus all \
-  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && \
convert_model -m \"kamalkraj/bert-base-cased-ner-conll2003\" \
--backend tensorrt onnx \
@@ -212,7 +212,7 @@ This will optimize models, generate Triton configuration and Triton folder layou

```shell
docker run -it --rm --gpus all \
-  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && \
convert_model -m \"distilbert-base-cased-distilled-squad\" \
--backend tensorrt onnx \
@@ -280,7 +280,7 @@ a version >= V2.2.0 of sentence-transformers library.

```shell
docker run -it --rm --gpus all \
-  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && \
convert_model -m \"sentence-transformers/msmarco-distilbert-cos-v5\" \
--backend tensorrt onnx \
@@ -341,7 +341,7 @@ One point to have in mind is that Triton run:

```shell
docker run -it --rm --gpus all \
-  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && \
convert_model -m gpt2 \
--backend tensorrt onnx \
@@ -371,7 +371,7 @@ To optimize models which typically don't fit twice onto a single GPU, run the sc

```shell
docker run -it --rm --shm-size=24g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
-  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && \
convert_model -m gpt2-medium \
--backend tensorrt onnx \
@@ -425,7 +425,7 @@ You may want to tweak it regarding your needs (default is set for greedy search
You may be interested in running optimized text generation on Python directly, without using any inference server:

```shell
-docker run -p 8888:8888 -v $PWD/demo/generative-model:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+docker run -p 8888:8888 -v $PWD/demo/generative-model:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root"
```

@@ -440,7 +440,7 @@ It makes it easy to use.
To play with it, open this notebook:

```shell
-docker run -p 8888:8888 -v $PWD/demo/quantization:/project ghcr.io/els-rd/transformer-deploy:0.5.3 \
+docker run -p 8888:8888 -v $PWD/demo/quantization:/project ghcr.io/els-rd/transformer-deploy:0.5.4 \
bash -c "cd /project && jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root"
```

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.5.3
+0.5.4
7 changes: 7 additions & 0 deletions src/transformer_deploy/convert.py
@@ -183,6 +183,13 @@ def main(commands: argparse.Namespace):
    else:
        raise Exception(f"unknown task: {commands.task}")

+    if hasattr(model_config, "type_vocab_size") and model_config.type_vocab_size == 0:
+        try:
+            input_names.remove("token_type_ids")
+            logging.warning("Model doesn't have `token_type_ids`, removing them from `input_names`")
+        except ValueError:
+            pass

    logging.info(f"axis: {input_names}")

    model_pytorch.eval()
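For readers skimming the diff, here is a minimal, self-contained sketch of what the new guard in `convert.py` does (illustrative only, not code from this repository): the configs are built in memory so nothing needs to be downloaded, and `prune_input_names` is a made-up helper name for the example.

```python
# Illustrative sketch of the added guard, outside of convert.py.
import logging

from transformers import BertConfig


def prune_input_names(model_config, input_names):
    # Same check as the added block: a model whose config reports
    # type_vocab_size == 0 cannot consume token_type_ids, so drop them.
    if hasattr(model_config, "type_vocab_size") and model_config.type_vocab_size == 0:
        try:
            input_names.remove("token_type_ids")
            logging.warning("Model doesn't have `token_type_ids`, removing them from `input_names`")
        except ValueError:
            pass  # already absent, nothing to do
    return input_names


print(prune_input_names(BertConfig(type_vocab_size=0), ["input_ids", "attention_mask", "token_type_ids"]))
# ['input_ids', 'attention_mask']
print(prune_input_names(BertConfig(), ["input_ids", "attention_mask", "token_type_ids"]))
# ['input_ids', 'attention_mask', 'token_type_ids']  (default type_vocab_size is 2, nothing removed)
```

A BERT-style config created with `type_vocab_size=0` loses the `token_type_ids` entry, while a config with the default value passes through untouched.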
12 changes: 10 additions & 2 deletions src/transformer_deploy/utils/python_tokenizer.py
@@ -29,11 +29,12 @@
except ImportError:
    pass  # triton_python_backend_utils exists only inside Triton Python backend.

-from transformers import AutoTokenizer, BatchEncoding, PreTrainedTokenizer, TensorType
+from transformers import AutoConfig, AutoTokenizer, BatchEncoding, PreTrainedTokenizer, TensorType


class TritonPythonModel:
    tokenizer: PreTrainedTokenizer
+    model_input_names: List[str]

    def initialize(self, args: Dict[str, str]) -> None:
        """
@@ -44,6 +45,13 @@ def initialize(self, args: Dict[str, str]) -> None:

        path: str = str(Path(args["model_repository"]).parent.absolute())
        self.tokenizer = AutoTokenizer.from_pretrained(path)
+        model_config = AutoConfig.from_pretrained(path)
+        self.model_input_names = self.tokenizer.model_input_names
+        if hasattr(model_config, "type_vocab_size") and model_config.type_vocab_size == 0:
+            try:
+                self.model_input_names.remove("token_type_ids")
+            except ValueError:
+                pass

    def execute(self, requests) -> "List[List[pb_utils.Tensor]]":
        """
@@ -63,7 +71,7 @@ def execute(self, requests) -> "List[List[pb_utils.Tensor]]":
            tokens_dict = {k: v.astype(np.int32) for k, v in tokens.items()}
            # communicate the tokenization results to Triton server
            outputs = list()
-            for input_name in self.tokenizer.model_input_names:
+            for input_name in self.model_input_names:
                tensor_input = pb_utils.Tensor(input_name, tokens_dict[input_name])
                outputs.append(tensor_input)

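The tokenizer model served by Triton applies the same filtering at runtime. Since `pb_utils` only exists inside the Triton Python backend, the sketch below mirrors the patched `initialize()`/`execute()` logic with plain NumPy instead; the checkpoint name is simply a convenient public model from this README, not necessarily one affected by the bug, and inside Triton the path would point at the generated model repository folder.

```python
# Sketch of the patched tokenizer behaviour outside Triton (illustrative only).
from typing import Dict, List

import numpy as np
from transformers import AutoConfig, AutoTokenizer

path = "philschmid/MiniLM-L6-H384-uncased-sst2"  # placeholder checkpoint for the example
tokenizer = AutoTokenizer.from_pretrained(path)
model_config = AutoConfig.from_pretrained(path)

# Mirror of the new initialize() logic: start from the tokenizer's declared inputs
# and drop token_type_ids when the model cannot accept them.
model_input_names: List[str] = list(tokenizer.model_input_names)
if hasattr(model_config, "type_vocab_size") and model_config.type_vocab_size == 0:
    try:
        model_input_names.remove("token_type_ids")
    except ValueError:
        pass

# Mirror of execute(): tokenize, cast to int32, and keep only the filtered inputs.
tokens = tokenizer(text=["some query"], return_tensors="np", padding=True)
tokens_dict: Dict[str, np.ndarray] = {k: v.astype(np.int32) for k, v in tokens.items()}
outputs = {name: tokens_dict[name] for name in model_input_names}
print(list(outputs.keys()))
```

Inside the real backend, each entry of `outputs` would then be wrapped into a `pb_utils.Tensor` and returned to Triton, as in the loop shown in the diff above.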
