-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with low accuracy exp and expm1 ops in tt-mlir #1199
Comments
What is the intention of this issue, to track it on our side? Should we open an issue on the tt-metal side for investigation? |
The idea is just to track the issue on our side for now, and if accuracy becomes a problem in any model bringup, reference this issue and open one on the tt-metal side for investigation. It also helps folks with e2e tests on our side, as for example, I had tests failing for accuracy, which prompted me to do this little investigation of accuracy. It is important to note that I did have a quick chat with tt-metal folks, and they are aware that they utilise a low-precision exp kernel. However, the original more precise one was slower, so they settled on this compromise. As we develop tt-mlir more, maybe this compromise should be revisited. |
Thanks for clarifying this. As I understand well this is currently not an actionable issue on our side, hence I am not sure in what component to put this issue right now. Any suggestions? |
Yes, you are correct that this is not an actionable issue currently, not sure about where to put it. Maybe @mrakitaTT has some ideas about the place of this issue in the overall effort? |
During the testing of the expm1 op, I noticed major discrepancies between the results of the exp and expm1 ops ran on CPU and Tenstorrent silicon. This was the cause for widening the atol tolerance in the
test_exp_op
test in thett-xla
repo (https://github.com/tenstorrent/tt-xla/blob/main/tests/TTIR/test_basic_ops.py).Running some examples for exp op, I get:
e^1: real value - 2.7182; tt-mlir: 2.6864; tt-metal(ttnn): 2.6719
e^2: real value - 7.3890; tt-mlir: 7.2165; tt-metal(ttnn): 7.1875
Small differences between tt-mlir and tt-metal could be attributed to the differences in environments (as I ran ttnn with python, and tt-mlir with ttrt). However, they both vary significantly from the real value. This leads to some of our tests needing relatively large margins of error, and could pose future problems as we onboards models to MLIR.
The text was updated successfully, but these errors were encountered: