[Training] Error building gradient graph for bert models for on-device training #22465
Comments
This looks similar to the issue I had and fixed in #22414. You can verify it's the same issue by changing your loss to cross-entropy and checking whether artifact generation succeeds.
@jkbeavers thanks, will try that. By the way, I thought onnxruntime training was going to be deprecated?
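The check suggested above could look roughly like the following sketch. The original reproduction code is not shown in this thread, so `split_params`, `generate_with_cross_entropy`, `param_names`, and `frozen_prefixes` are hypothetical names chosen for illustration. Only `onnx.load` and `onnxruntime.training.artifacts.generate_artifacts` with its `LossType` and `OptimType` enums come from the libraries themselves, and the call assumes an onnxruntime build that still ships the training package.

```python
from typing import Iterable, List, Tuple


def split_params(
    names: Iterable[str], frozen_prefixes: Tuple[str, ...] = ()
) -> Tuple[List[str], List[str]]:
    """Split parameter names into (requires_grad, frozen) by name prefix."""
    requires_grad: List[str] = []
    frozen: List[str] = []
    for name in names:
        # str.startswith accepts a tuple; an empty tuple never matches.
        if name.startswith(frozen_prefixes):
            frozen.append(name)
        else:
            requires_grad.append(name)
    return requires_grad, frozen


def generate_with_cross_entropy(
    onnx_path: str,
    param_names: Iterable[str],
    artifact_dir: str,
    frozen_prefixes: Tuple[str, ...] = (),
) -> None:
    """Generate training artifacts using the built-in cross-entropy loss.

    param_names is assumed to hold the trainable parameter names as they
    appear in the exported ONNX graph (hypothetical input for this sketch).
    """
    # Imported lazily so the helper above works without onnxruntime-training.
    import onnx
    from onnxruntime.training import artifacts

    model = onnx.load(onnx_path)
    requires_grad, frozen = split_params(param_names, frozen_prefixes)
    artifacts.generate_artifacts(
        model,
        requires_grad=requires_grad,
        frozen_params=frozen,
        loss=artifacts.LossType.CrossEntropyLoss,
        optimizer=artifacts.OptimType.AdamW,
        artifact_directory=artifact_dir,
    )
```

If artifact generation succeeds with this loss but fails with the one used originally, that would point at the same cause as #22414.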
I am also facing a similar issue. Below is the code to reproduce.
constant.py
It did not work in spite of changing the loss function to CrossEntropyLoss. @jkbeavers @riccardopinosio can you help me out with this?
@rkoystart I believe 1.20.0 does not have onnxruntime training because it's being deprecated, at least according to this page. I was not sure whether Microsoft plans to support a training flow for onnxruntime going forward, so I didn't spend any more time on this.
@riccardopinosio I am also hoping that training flow support has not been dropped for good; no explanation was added to the release notes. I suspect the error may come from the model output being passed to the ONNX-defined loss function. Perhaps you could avoid the ONNX loss function and instead define and compute your own loss inside the model, so that the model outputs the loss directly?
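A minimal sketch of that suggestion on the PyTorch side, assuming a classification head and cross-entropy loss. `ModelWithLoss` and the toy `nn.Linear` stand-in are hypothetical and not taken from the thread; the real model here would be the DistilBERT module. Exporting the wrapper to ONNX and handing it to artifact generation are outside this sketch.

```python
import torch
from torch import nn


class ModelWithLoss(nn.Module):
    """Wrap a classifier so forward() returns (loss, logits)."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, inputs: torch.Tensor, labels: torch.Tensor):
        logits = self.model(inputs)
        # The loss is computed inside the module, so an exported graph
        # would carry it as its first output.
        loss = self.loss_fn(logits, labels)
        return loss, logits


# Toy stand-in for the real model: 8 input features, 3 classes.
torch.manual_seed(0)
toy = nn.Linear(8, 3)
wrapped = ModelWithLoss(toy)

inputs = torch.randn(4, 8)
labels = torch.tensor([0, 2, 1, 0])
loss, logits = wrapped(inputs, labels)
```

Returning the loss first keeps it easy to locate among the outputs of the exported graph; the logits remain available for evaluation.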
Describe the issue
Hello,
See also this discussion. I'm opening this as an issue because, judging from previous issues, training should work for BERT models.
I am trying to generate training artifacts for DistilBERT like so:
The exported ONNX model works perfectly for inference, but artifact generation throws:
It seems to fail while building the gradient graph, with an out-of-bounds access on OutputDefs.
To reproduce
See the code provided above.
Urgency
It's blocking the development of Go bindings for onnxruntime training, which we want to use in our product.
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.2
PyTorch Version
2.4.1+cu121
Execution Provider
Default CPU
Execution Provider Library Version
No response