Inference worse on onnx runtime and openvino runtime for converted seq2seq models on CPU #754 #188

Closed
1 of 2 tasks
harindercnvrg opened this issue Feb 8, 2023 · 1 comment


harindercnvrg commented Feb 8, 2023

System Info

CPU

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 57 bits virtual
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz
Stepping:            6
CPU MHz:             2200.000
CPU max MHz:         3500.0000
CPU min MHz:         800.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-31,64-95
NUMA node1 CPU(s):   32-63,96-127
python == 3.8.10

Installed packages:

absl-py==1.4.0
aiofiles==22.1.0
aiohttp==3.8.3
aiosignal==1.3.1
aiosqlite==0.18.0
anyio==3.6.2
argcomplete==1.10.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==22.2.0
autograd==1.5
azure-core==1.10.0
azure-storage-blob==12.6.0
Babel==2.11.0
backcall==0.2.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.8.2
bleach==6.0.0
boto3==1.26.64
botocore==1.29.64
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
cma==2.7.0
cnvrg==0.7.54
colorama==0.4.6
coloredlogs==15.0.1
comm==0.1.2
compressed-rtf==1.0.6
contourpy==1.0.7
croniter==1.3.8
cryptography==39.0.0
cycler==0.11.0
datasets==2.9.0
debugpy==1.6.6
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
docx2txt==0.8
ebcdic==1.1.1
evaluate==0.4.0
executing==1.2.0
extract-msg==0.28.7
fastjsonschema==2.16.2
filelock==3.9.0
flatbuffers==23.1.21
fonttools==4.38.0
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.1.0
future==0.18.3
gitdb==4.0.10
GitPython==3.1.30
google-api-core==2.11.0
google-auth==2.16.0
google-auth-oauthlib==0.4.6
google-cloud-core==2.3.2
google-cloud-storage==2.7.0
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.58.0
grpcio==1.51.1
huggingface-hub==0.12.0
humanfriendly==10.0
idna==2.10
IMAPClient==2.1.0
importlib-metadata==6.0.0
importlib-resources==5.10.2
ipykernel==6.21.1
ipython==8.9.0
ipython-genutils==0.2.0
isodate==0.6.1
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.11
jsonpointer==2.3
jsonschema==4.17.3
jstyleson==0.0.2
jupyter-client==8.0.2
jupyter-core==5.2.0
jupyter-events==0.5.0
jupyter-server==2.2.1
jupyter-server-fileid==0.6.0
jupyter-server-mathjax==0.2.6
jupyter-server-terminals==0.4.4
jupyter-server-ydoc==0.6.1
jupyter-ydoc==0.2.2
jupyterlab==3.6.1
jupyterlab-git==0.41.0
jupyterlab-pygments==0.2.2
jupyterlab-server==2.19.0
kiwisolver==1.4.4
lxml==4.9.2
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.6.3
matplotlib-inline==0.1.6
mistune==2.0.4
mpmath==1.2.1
msrest==0.6.21
multidict==6.0.4
multiprocess==0.70.14
natsort==8.2.0
nbclassic==0.5.1
nbclient==0.7.2
nbconvert==7.2.9
nbdime==3.1.1
nbformat==5.7.3
nest-asyncio==1.5.6
networkx==2.8.2
ninja==1.10.2.4
nncf==2.4.0
notebook==6.5.2
notebook-shim==0.2.2
numpy==1.23.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
olefile==0.46
onnx==1.12.0
onnxruntime==1.12.1
openvino==2022.3.0
openvino-telemetry==2022.3.0
optimum==1.6.3
optimum-intel==1.6.1
packaging==23.0
pandas==1.5.2
pandocfilters==1.5.0
parso==0.8.3
pdfminer.six==20191110
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.4.0
pkgutil-resolve-name==1.3.10
platformdirs==2.6.2
progress==1.6
prometheus-client==0.16.0
prompt-toolkit==3.0.36
protobuf==3.20.1
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==21.10.1
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pycryptodome==3.17
pydot==1.4.2
Pygments==2.14.0
pymoo==0.5.0
pyparsing==2.4.7
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.4
python-pptx==0.6.21
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
pyzmq==25.0.0
regex==2022.10.31
requests==2.28.2
requests-oauthlib==1.3.1
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rsa==4.9
s3transfer==0.6.0
scikit-learn==1.2.1
scipy==1.10.0
Send2Trash==1.8.0
sentencepiece==0.1.97
six==1.12.0
smmap==5.0.0
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
SpeechRecognition==3.8.1
stack-data==0.6.2
sympy==1.11.1
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
terminado==0.17.1
textract==1.6.5
texttable==1.6.7
threadpoolctl==3.1.0
tinycss2==1.2.1
tinynetrc==1.3.1
tokenizers==0.13.2
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tornado==6.2
tqdm==4.64.1
traitlets==5.9.0
transformers==4.26.0
typing-extensions==4.4.0
tzdata==2022.7
tzlocal==4.2
uri-template==1.2.0
urllib3==1.25.11
wcwidth==0.2.6
webcolors==1.12
webencodings==0.5.1
websocket-client==1.5.1
Werkzeug==2.2.2
xlrd==1.2.0
XlsxWriter==3.0.8
xxhash==3.2.0
y-py==0.5.5
yarl==1.8.2
ypy-websocket==0.8.2
zipp==3.12.1

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I converted the summarization model to ONNX and then ran it with ONNX Runtime:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from datasets import load_dataset

# Load one article from the billsum dataset to summarize
billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']

# Export the model to ONNX on the fly and run it with ONNX Runtime
model_id = "google/pegasus-pubmed"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)

I also tried the OpenVINO runtime:

from transformers import AutoTokenizer, pipeline
from optimum.intel.openvino import OVModelForSeq2SeqLM  # OVModelForSeq2SeqLM comes from optimum.intel, not optimum.onnxruntime
from datasets import load_dataset

billsum = load_dataset("billsum", split="ca_test")
billsum = billsum.train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]['text']

# Export the model to OpenVINO IR on the fly and run it with the OpenVINO runtime
model_id = "google/pegasus-pubmed"
model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("summarization", model=model, tokenizer=tokenizer)
prediction = pipe(to_summarize)

Expected behavior

This is supposed to provide faster inference than the original PyTorch model. Neither the ONNX Runtime nor the OpenVINO runtime improves speed; in fact, inference time increases manifold.
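
For reference, a minimal timing sketch along these lines could be used to quantify the slowdown against the plain PyTorch baseline. It is not part of the original report: the warm-up run, the repeat count, and the use of time.perf_counter are illustrative assumptions, and only the model and dataset identifiers come from the reproduction above.

import time

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Same input text as in the reproduction above
billsum = load_dataset("billsum", split="ca_test").train_test_split(test_size=0.2)
to_summarize = billsum["train"][0]["text"]

model_id = "google/pegasus-pubmed"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Plain PyTorch baseline pipeline
pt_pipe = pipeline(
    "summarization",
    model=AutoModelForSeq2SeqLM.from_pretrained(model_id),
    tokenizer=tokenizer,
)

# ONNX Runtime pipeline, exported on the fly as in the reproduction
ort_pipe = pipeline(
    "summarization",
    model=ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True),
    tokenizer=tokenizer,
)

def average_latency(pipe, text, repeats=3):
    # Warm up once so one-off setup costs are not counted, then average the remaining runs
    pipe(text)
    start = time.perf_counter()
    for _ in range(repeats):
        pipe(text)
    return (time.perf_counter() - start) / repeats

print("PyTorch average latency (s):", average_latency(pt_pipe, to_summarize))
print("ONNX Runtime average latency (s):", average_latency(ort_pipe, to_summarize))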

@echarlaix
Collaborator

Hi @harindercnvrg,

Thanks for letting us know about this. As you know, export support for BigBird architectures was removed in huggingface/optimum#778 (following huggingface/optimum#754), and for that reason these models are no longer supported by optimum-intel. Support will be added back once huggingface/optimum#754 is solved.
