Support bigbird ONNX export with attention_type == "block_sparse" #754
Comments
Thank you! For OpenVINO, could you open an issue in https://github.com/huggingface/optimum-intel? For ONNX Runtime, I suspect it is the same issue as #753.
I have raised the issue.
@harindercnvrg Could you provide a reproduction script and the result of `lscpu`? I would recommend as well to try on the latest Optimum.
@fxmarty the system info provided at the top of the issue is from the lscpu command. I have also provided a reproduction script at the bottom of the issue.
Thanks @harindercnvrg, my bad, I missed the lscpu. I meant a reproduction script with the time measured; the scripts above only run inference. That way I can try to reproduce the issue on my side.
Original code:
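A minimal timed sketch of such a PyTorch baseline, assuming a BigBird-Pegasus summarization checkpoint (the model name, input length, and generation settings here are illustrative assumptions, not taken from the thread):

```python
import time

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/bigbird-pegasus-large-arxiv"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# input long enough that BigBird actually runs block sparse attention
text = "long document text ... " * 400
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

start = time.perf_counter()
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(f"PyTorch generate: {time.perf_counter() - start:.2f}s")
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```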
Using ONNX:
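A corresponding sketch with ONNX Runtime through Optimum, under the same assumptions (note: older optimum releases use `from_transformers=True` instead of `export=True`):

```python
import time

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "google/bigbird-pegasus-large-arxiv"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# exports the model to ONNX on the fly
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

text = "long document text ... " * 400
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

start = time.perf_counter()
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(f"ONNX Runtime generate: {time.perf_counter() - start:.2f}s")
```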
Using OpenVINO:
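And a sketch of the OpenVINO variant through optimum-intel, same assumptions:

```python
import time

from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "google/bigbird-pegasus-large-arxiv"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForSeq2SeqLM.from_pretrained(model_id, export=True)

text = "long document text ... " * 400
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

start = time.perf_counter()
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(f"OpenVINO generate: {time.perf_counter() - start:.2f}s")
```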
Thank you! I can reproduce the issue on my side.
Note that when using `attention_type="original_full"`, inference does not show this slowdown.
This issue likely comes from there: the example input provided during the ONNX export is too short, hence registering the wrong control flows, which are slow for long sequences (like the one in the benchmark). Thank you for notifying, will fix!
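For illustration, a minimal, generic sketch of how tracing with a short dummy input can freeze a sequence-length-dependent branch (a stand-in example, not BigBird's actual code):

```python
import torch

def attention_dispatch(x):
    # stand-in for BigBird's sequence-length check, not the real implementation
    if x.shape[1] < 1024:
        return x * 2  # "full attention"-style branch
    return x * 3      # "block sparse"-style branch

# tracing with a short dummy input bakes the first branch into the graph;
# torch.onnx.export traces the same way, so the exported model keeps that
# branch for every input length
traced = torch.jit.trace(attention_dispatch, torch.randn(1, 16))
print(traced(torch.randn(1, 2048)))  # still takes the short-sequence branch
```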
Hi @harindercnvrg, I investigated the issue a bit, and there is a critical issue for the ONNX export of BigBird, given that part of BigBird's block sparse attention is written in numpy and pure Python. Up to now, BigBird was solely exported using `attention_type="original_full"`. I worked a bit on rewriting BigBird to be pure PyTorch, which goes fine, but I am now hitting the issue that torch.onnx.export is extremely slow when exporting in the block sparse attention case. For now, I would recommend you to stick with the PyTorch implementation, or maybe the TensorFlow XLA one if you find it suitable.
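As a sketch of that recommendation: keep block sparse attention but stay on the plain PyTorch model (the checkpoint name is an assumption; `attention_type` is a standard BigBird config option):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/bigbird-pegasus-large-arxiv"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# stay on the PyTorch implementation instead of exporting to ONNX
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, attention_type="block_sparse")
```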
Hi @harindercnvrg, we will remove the support of bigbird and bigbird-pegasus in the ONNX export in #778 due to this issue. A large chunk of bigbird's implementation in transformers is written in numpy and pure Python, which makes it unfit for the ONNX export. I tried to rewrite it as pure PyTorch, which succeeded, but then the export becomes prohibitively slow. If you would like to have a look and manage to solve the issue, you can start from: huggingface/transformers@main...fxmarty:transformers:use-torch-bigbird
System Info
Installed packages:
Who can help?
@lewtun @michaelbenayoun
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I converted the summarizer model to ONNX and then ran it:
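A minimal sketch of such a conversion-and-run script, assuming a BigBird-Pegasus summarization checkpoint (the model name is illustrative, not from the report); this runs inference only, without timing:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

model_id = "google/bigbird-pegasus-large-arxiv"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# converts the model to ONNX on the fly; older optimum uses from_transformers=True
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
print(summarizer("long document text ... " * 400))
```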
I also tried the OpenVINO runtime:
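A matching sketch with the OpenVINO runtime via optimum-intel, same assumptions:

```python
from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

model_id = "google/bigbird-pegasus-large-arxiv"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForSeq2SeqLM.from_pretrained(model_id, export=True)

summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
print(summarizer("long document text ... " * 400))
```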
Expected behavior
This is supposed to provide faster inference than the original PyTorch model. Neither the ONNX runtime nor the OpenVINO runtime improves speed; in fact, the inference time increases manifold.