Inference worse on onnx runtime and openvino runtime for converted seq2seq models on CPU #754 #188

Closed
1 of 2 tasks
harindercnvrg opened this issue Feb 8, 2023 · 1 comment


harindercnvrg commented Feb 8, 2023

System Info

CPU

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 57 bits virtual
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz
Stepping:            6
CPU MHz:             2200.000
CPU max MHz:         3500.0000
CPU min MHz:         800.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-31,64-95
NUMA node1 CPU(s):   32-63,96-127
python == 3.8.10

Installed packages:

absl-py==1.4.0
aiofiles==22.1.0
aiohttp==3.8.3
aiosignal==1.3.1
aiosqlite==0.18.0
anyio==3.6.2
argcomplete==1.10.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==22.2.0
autograd==1.5
azure-core==1.10.0
azure-storage-blob==12.6.0
Babel==2.11.0
backcall==0.2.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.8.2
bleach==6.0.0
boto3==1.26.64
botocore==1.29.64
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
cma==2.7.0
cnvrg==0.7.54
colorama==0.4.6
coloredlogs==15.0.1
comm==0.1.2
compressed-rtf==1.0.6
contourpy==1.0.7
croniter==1.3.8
cryptography==39.0.0
cycler==0.11.0
datasets==2.9.0
debugpy==1.6.6
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
docx2txt==0.8
ebcdic==1.1.1
evaluate==0.4.0
executing==1.2.0
extract-msg==0.28.7
fastjsonschema==2.16.2
filelock==3.9.0
flatbuffers==23.1.21
fonttools==4.38.0
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.1.0
future==0.18.3
gitdb==4.0.10
GitPython==3.1.30
google-api-core==2.11.0
google-auth==2.16.0
google-auth-oauthlib==0.4.6
google-cloud-core==2.3.2
google-cloud-storage==2.7.0
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.58.0
grpcio==1.51.1
huggingface-hub==0.12.0
humanfriendly==10.0
idna==2.10
IMAPClient==2.1.0
importlib-metadata==6.0.0
importlib-resources==5.10.2
ipykernel==6.21.1
ipython==8.9.0
ipython-genutils==0.2.0
isodate==0.6.1
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.11
jsonpointer==2.3
jsonschema==4.17.3
jstyleson==0.0.2
jupyter-client==8.0.2
jupyter-core==5.2.0
jupyter-events==0.5.0
jupyter-server==2.2.1
jupyter-server-fileid==0.6.0
jupyter-server-mathjax==0.2.6
jupyter-server-terminals==0.4.4
jupyter-server-ydoc==0.6.1
jupyter-ydoc==0.2.2
jupyterlab==3.6.1
jupyterlab-git==0.41.0
jupyterlab-pygments==0.2.2
jupyterlab-server==2.19.0
kiwisolver==1.4.4
lxml==4.9.2
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.6.3
matplotlib-inline==0.1.6
mistune==2.0.4
mpmath==1.2.1
msrest==0.6.21
multidict==6.0.4
multiprocess==0.70.14
natsort==8.2.0
nbclassic==0.5.1
nbclient==0.7.2
nbconvert==7.2.9
nbdime==3.1.1
nbformat==5.7.3
nest-asyncio==1.5.6
networkx==2.8.2
ninja==1.10.2.4
nncf==2.4.0
notebook==6.5.2
notebook-shim==0.2.2
numpy==1.23.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
olefile==0.46
onnx==1.12.0
onnxruntime==1.12.1
openvino==2022.3.0
openvino-telemetry==2022.3.0
optimum==1.6.3
optimum-intel==1.6.1
packaging==23.0
pandas==1.5.2
pandocfilters==1.5.0
parso==0.8.3
pdfminer.six==20191110
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.4.0
pkgutil-resolve-name==1.3.10
platformdirs==2.6.2
progress==1.6
prometheus-client==0.16.0
prompt-toolkit==3.0.36
protobuf==3.20.1
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==21.10.1
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pycryptodome==3.17
pydot==1.4.2
Pygments==2.14.0
pymoo==0.5.0
pyparsing==2.4.7
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.4
python-pptx==0.6.21
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
pyzmq==25.0.0
regex==2022.10.31
requests==2.28.2
requests-oauthlib==1.3.1
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rsa==4.9
s3transfer==0.6.0
scikit-learn==1.2.1
scipy==1.10.0
Send2Trash==1.8.0
sentencepiece==0.1.97
six==1.12.0
smmap==5.0.0
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
SpeechRecognition==3.8.1
stack-data==0.6.2
sympy==1.11.1
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
terminado==0.17.1
textract==1.6.5
texttable==1.6.7
threadpoolctl==3.1.0
tinycss2==1.2.1
tinynetrc==1.3.1
tokenizers==0.13.2
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tornado==6.2
tqdm==4.64.1
traitlets==5.9.0
transformers==4.26.0
typing-extensions==4.4.0
tzdata==2022.7
tzlocal==4.2
uri-template==1.2.0
urllib3==1.25.11
wcwidth==0.2.6
webcolors==1.12
webencodings==0.5.1
websocket-client==1.5.1
Werkzeug==2.2.2
xlrd==1.2.0
XlsxWriter==3.0.8
xxhash==3.2.0
y-py==0.5.5
yarl==1.8.2
ypy-websocket==0.8.2
zipp==3.12.1

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I converted the summarization model to ONNX and then ran it with ONNX Runtime:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from datasets import load_dataset

# Load one article from the billsum dataset to summarize
billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']

# Export the model to ONNX on the fly and run it with ONNX Runtime
model_id = "google/pegasus-pubmed"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)

I also tried the OpenVINO runtime:

from transformers import AutoTokenizer, pipeline
from optimum.intel.openvino import OVModelForSeq2SeqLM  # OVModelForSeq2SeqLM comes from optimum.intel, not optimum.onnxruntime
from datasets import load_dataset

billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']

# Export the model to OpenVINO IR on the fly and run it with the OpenVINO runtime
model_id = "google/pegasus-pubmed"
model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)

Expected behavior

This is supposed to provide faster inference than the original PyTorch model. Neither the ONNX Runtime nor the OpenVINO runtime improves speed; in fact, inference time increases manifold.
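
For reference, a minimal timing sketch along these lines could be used to quantify the slowdown against the plain PyTorch baseline. It is not part of the original report: the warm-up run, the repeat count, and the use of time.perf_counter are illustrative assumptions, and only the model and dataset identifiers come from the reproduction above.

import time

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Same input text as in the reproduction above
billsum = load_dataset("billsum", split="ca_test").train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]["text"]

model_id = "google/pegasus-pubmed"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Plain PyTorch baseline pipeline
pt_pipe = pipeline(
    "summarization",
    model=AutoModelForSeq2SeqLM.from_pretrained(model_id),
    tokenizer=tokenizer,
)

# ONNX Runtime pipeline, exported on the fly as in the reproduction
ort_pipe = pipeline(
    "summarization",
    model=ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True),
    tokenizer=tokenizer,
)

def average_latency(pipe, text, repeats=3):
    # Warm up once so one-off setup costs are not counted, then average the remaining runs
    pipe(text)
    start = time.perf_counter()
    for _ in range(repeats):
        pipe(text)
    return (time.perf_counter() - start) / repeats

print("PyTorch average latency (s):", average_latency(pt_pipe, to_summarize))
print("ONNX Runtime average latency (s):", average_latency(ort_pipe, to_summarize))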

@echarlaix
Collaborator

Hi @harindercnvrg,

Thanks for letting us know about this. As you know, export support for BigBird architectures was removed in huggingface/optimum#778 (following huggingface/optimum#754), and for that reason these models are no longer supported by optimum-intel. Support will be added back once huggingface/optimum#754 is solved.
