Loss of accuracy when a Longformer for SequenceClassification model is exported to ONNX #776
Comments
Great! I transferred the issue.
Thank you, you can refer to the issue linked by @michaelbenayoun; the bottom-line issue is a bug in PyTorch: pytorch/pytorch#90607. As a dirty fix, you can pass the argument … The issue is solely related to sequence length, so I would recommend checking which sequence lengths yield meaningful outputs, as I did here.
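A minimal sketch of such a per-length check, with placeholder paths and assumed ONNX input names:

```python
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder paths; the "input_ids"/"attention_mask" input names are assumed from the export.
checkpoint = "path/to/fine-tuned-longformer"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint).eval()
session = ort.InferenceSession("model.onnx")

text = "some representative input text"
for seq_len in [128, 512, 1024, 2048, 4096]:
    enc = tokenizer(text, padding="max_length", truncation=True,
                    max_length=seq_len, return_tensors="pt")
    with torch.no_grad():
        torch_logits = model(**enc).logits.numpy()
    onnx_logits = session.run(None, {"input_ids": enc["input_ids"].numpy(),
                                     "attention_mask": enc["attention_mask"].numpy()})[0]
    # Large differences only at some lengths point to the export bug rather than the weights.
    print(seq_len, np.abs(torch_logits - onnx_logits).max())
```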
@SteffenHaeussler Actually, I realize we disabled the ONNX export for Longformer due to this bug.
@fxmarty Thanks a lot for your help. So the problem lies in torch's ONNX converter and is out of our hands. I tried some cheap tricks (torch -> torchscript -> onnx conversion), but obviously it doesn't work. For the moment, I copied the trained weights to a BertModel with similar architecture and fine-tuned the new model with my dataset. It looks promising and I will share the code snippet when I'm done, so others in the same situation can hopefully profit from it. I don't see at the moment how I can support you with this bug. Let me know if there is anything to be done.
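Roughly, the weight copy looks like this (a rough sketch, not the final snippet; the name mapping and the shape check are simplifications, and the classifier head plus position embeddings stay freshly initialized and get re-learned during fine-tuning):

```python
from transformers import AutoModelForSequenceClassification, BertConfig, BertForSequenceClassification

longformer = AutoModelForSequenceClassification.from_pretrained("path/to/fine-tuned-longformer")
cfg = longformer.config

# Build a BERT model with matching dimensions.
bert = BertForSequenceClassification(BertConfig(
    vocab_size=cfg.vocab_size,
    hidden_size=cfg.hidden_size,
    num_hidden_layers=cfg.num_hidden_layers,
    num_attention_heads=cfg.num_attention_heads,
    intermediate_size=cfg.intermediate_size,
    num_labels=cfg.num_labels,
))

# Copy every tensor whose name and shape line up; everything else keeps its fresh init.
bert_sd, lf_sd = bert.state_dict(), longformer.state_dict()
copied = 0
for name in bert_sd:
    lf_name = name.replace("bert.", "longformer.", 1)
    if lf_name in lf_sd and lf_sd[lf_name].shape == bert_sd[name].shape:
        bert_sd[name] = lf_sd[lf_name]
        copied += 1
bert.load_state_dict(bert_sd)
print(f"copied {copied}/{len(bert_sd)} tensors")
```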
One dirty solution would be to rewrite this operation https://github.com/huggingface/transformers/blob/762dda44deed29baab049aac5324b49f134e7536/src/transformers/models/longformer/modeling_longformer.py#L924 in a way that is correctly handled by the ONNX export. At the time when I had a look, I did not come up with an elegant solution though (i.e. one not using nested loops).
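For illustration only, assuming the operation in question is the `as_strided`-based overlapping-chunk view in `LongformerSelfAttention._chunk`, the same chunks can be expressed with `unfold`; whether a rewrite of this kind actually exports correctly is exactly what would need checking:

```python
import torch

def chunk_with_as_strided(hidden_states: torch.Tensor, window_overlap: int) -> torch.Tensor:
    # Simplified version of the current approach: non-overlapping chunks of size 2*w,
    # then as_strided to make them overlap by w.
    batch, seq_len, hidden = hidden_states.shape
    hidden_states = hidden_states.view(batch, seq_len // (window_overlap * 2),
                                       window_overlap * 2, hidden)
    chunk_size = list(hidden_states.size())
    chunk_size[1] = chunk_size[1] * 2 - 1
    chunk_stride = list(hidden_states.stride())
    chunk_stride[1] = chunk_stride[1] // 2
    return hidden_states.as_strided(size=chunk_size, stride=chunk_stride)

def chunk_with_unfold(hidden_states: torch.Tensor, window_overlap: int) -> torch.Tensor:
    # Same overlapping chunks of size 2*w with step w, without as_strided.
    chunks = hidden_states.unfold(1, 2 * window_overlap, window_overlap)
    return chunks.transpose(2, 3)  # (batch, n_chunks, 2*w, hidden)

x = torch.randn(2, 512, 64)
assert torch.equal(chunk_with_as_strided(x, 64), chunk_with_unfold(x, 64))
```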
Great 👍 I can't promise anything, since my work schedule is overloaded like everyone else's, but I will have a deeper look at it.
I will close this for now, as Longformer is currently not supported in Optimum's ONNX export for this exact reason.
Edit: This is a crosspost to pytorch #94810. I don't know where the issue lies.
System info
transformers version: 4.26.1
Who can help?
I think @younesbelkada would be a great help :)
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
This model is trained on client data and I'm not allowed to share the data or the weights, which makes reproducing this issue much harder. Please let me know if you need more information.
Here is the code snippet for the ONNX conversion:
I followed this tutorial, but I also tried your tutorial. The ONNX conversion with Optimum is not available for Longformer so far, and I haven't figured out yet how to add it.
Conversion:
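Roughly, the export goes through torch.onnx like this (a sketch with placeholder paths; the output name, opset, and the 4096 dummy length are assumptions rather than the exact values used):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "path/to/fine-tuned-longformer"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint).eval()

# Dummy input padded to the model's maximum length.
dummy = tokenizer("dummy input", padding="max_length", truncation=True,
                  max_length=4096, return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "attention_mask": {0: "batch", 1: "sequence"},
                  "logits": {0: "batch"}},
    opset_version=14,
)
```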
Calculating the accuracy:
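In rough terms, the accuracy comparison looks like this (a sketch; the 4096 max length, the (text, label) test-set format, and the ONNX input names are assumptions):

```python
import numpy as np
import onnxruntime as ort
import torch

def accuracy_torch(model, tokenizer, samples):
    # samples: list of (text, label) pairs
    correct = 0
    for text, label in samples:
        enc = tokenizer(text, padding="max_length", truncation=True,
                        max_length=4096, return_tensors="pt")
        with torch.no_grad():
            pred = model(**enc).logits.argmax(dim=-1).item()
        correct += int(pred == label)
    return correct / len(samples)

def accuracy_onnx(onnx_path, tokenizer, samples):
    session = ort.InferenceSession(onnx_path)
    correct = 0
    for text, label in samples:
        enc = tokenizer(text, padding="max_length", truncation=True,
                        max_length=4096, return_tensors="np")
        logits = session.run(None, {"input_ids": enc["input_ids"],
                                    "attention_mask": enc["attention_mask"]})[0]
        correct += int(int(np.argmax(logits, axis=-1)[0]) == label)
    return correct / len(samples)
```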
I also looked into the models' weights, and the weights for the attention layer differ between torch and ONNX. Here is an example:
For the layer longformer.encoder.layer.0.output.dense.weight, which aligns with onnx::MatMul_6692 in shape and position,
I get:
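For reference, a sketch of how the two weights can be pulled out and compared (a shape check handles the possibility that the MatMul initializer is stored transposed relative to nn.Linear.weight):

```python
import numpy as np
import onnx
from onnx import numpy_helper
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("path/to/fine-tuned-longformer")
torch_weight = model.state_dict()["longformer.encoder.layer.0.output.dense.weight"].float().numpy()

onnx_model = onnx.load("model.onnx")
initializers = {init.name: numpy_helper.to_array(init) for init in onnx_model.graph.initializer}
onnx_weight = initializers["onnx::MatMul_6692"].astype(np.float32)

# MatMul initializers are often stored transposed relative to nn.Linear.weight.
if torch_weight.shape != onnx_weight.shape:
    onnx_weight = onnx_weight.T
print(torch_weight.shape, onnx_weight.shape, np.abs(torch_weight - onnx_weight).max())
```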
Model config:
Expected behavior
I would expect a similar accuracy for both models on test data with 3800 samples:
Accuracy ONNX: 17%
Accuracy torch: 70%
I would like to know what went wrong, how I can fix it, or who can help me. I'm clueless at the moment.
Alternatively, I could also move to the BigBird architecture, since it already has some support in Optimum.
I trained a small Longformer language model from scratch and fine-tuned it on custom data with a sequence classification head. I used fp16 for training, and the training ran on a GPU.