Issues with low accuracy exp and expm1 ops in tt-mlir #1199

ajakovljevicTT · 2024-11-08T09:28:17Z

During the testing of the expm1 op, I noticed major discrepancies between the results of the exp and expm1 ops ran on CPU and Tenstorrent silicon. This was the cause for widening the atol tolerance in the test_exp_op test in the tt-xla repo (https://github.com/tenstorrent/tt-xla/blob/main/tests/TTIR/test_basic_ops.py).

Running some examples for exp op, I get:
e^1: real value - 2.7182; tt-mlir: 2.6864; tt-metal(ttnn): 2.6719
e^2: real value - 7.3890; tt-mlir: 7.2165; tt-metal(ttnn): 7.1875

Small differences between tt-mlir and tt-metal could be attributed to the differences in environments (as I ran ttnn with python, and tt-mlir with ttrt). However, they both vary significantly from the real value. This leads to some of our tests needing relatively large margins of error, and could pose future problems as we onboards models to MLIR.

The text was updated successfully, but these errors were encountered:

sdjordjevicTT · 2024-11-08T10:18:08Z

What is the intention of this issue, to track it on our side? Should we open an issue on the tt-metal side for investigation?

ajakovljevicTT · 2024-11-08T10:28:03Z

What is the intention of this issue, to track it on our side? Should we open an issue on the tt-metal side for investigation?

The idea is just to track the issue on our side for now, and if accuracy becomes a problem in any model bringup, reference this issue and open one on the tt-metal side for investigation. It also helps folks with e2e tests on our side, as for example, I had tests failing for accuracy, which prompted me to do this little investigation of accuracy.

It is important to note that I did have a quick chat with tt-metal folks, and they are aware that they utilise a low-precision exp kernel. However, the original more precise one was slower, so they settled on this compromise. As we develop tt-mlir more, maybe this compromise should be revisited.

sdjordjevicTT · 2024-11-08T12:39:39Z

What is the intention of this issue, to track it on our side? Should we open an issue on the tt-metal side for investigation?

The idea is just to track the issue on our side for now, and if accuracy becomes a problem in any model bringup, reference this issue and open one on the tt-metal side for investigation. It also helps folks with e2e tests on our side, as for example, I had tests failing for accuracy, which prompted me to do this little investigation of accuracy.

It is important to note that I did have a quick chat with tt-metal folks, and they are aware that they utilise a low-precision exp kernel. However, the original more precise one was slower, so they settled on this compromise. As we develop tt-mlir more, maybe this compromise should be revisited.

Thanks for clarifying this. As I understand well this is currently not an actionable issue on our side, hence I am not sure in what component to put this issue right now. Any suggestions?

ajakovljevicTT · 2024-11-08T15:25:57Z

What is the intention of this issue, to track it on our side? Should we open an issue on the tt-metal side for investigation?

The idea is just to track the issue on our side for now, and if accuracy becomes a problem in any model bringup, reference this issue and open one on the tt-metal side for investigation. It also helps folks with e2e tests on our side, as for example, I had tests failing for accuracy, which prompted me to do this little investigation of accuracy.
It is important to note that I did have a quick chat with tt-metal folks, and they are aware that they utilise a low-precision exp kernel. However, the original more precise one was slower, so they settled on this compromise. As we develop tt-mlir more, maybe this compromise should be revisited.

Thanks for clarifying this. As I understand well this is currently not an actionable issue on our side, hence I am not sure in what component to put this issue right now. Any suggestions?

Yes, you are correct that this is not an actionable issue currently, not sure about where to put it. Maybe @mrakitaTT has some ideas about the place of this issue in the overall effort?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues with low accuracy exp and expm1 ops in tt-mlir #1199

Issues with low accuracy exp and expm1 ops in tt-mlir #1199

ajakovljevicTT commented Nov 8, 2024

sdjordjevicTT commented Nov 8, 2024 •

edited

Loading

ajakovljevicTT commented Nov 8, 2024 •

edited

Loading

sdjordjevicTT commented Nov 8, 2024

ajakovljevicTT commented Nov 8, 2024

Issues with low accuracy exp and expm1 ops in tt-mlir #1199

Issues with low accuracy exp and expm1 ops in tt-mlir #1199

Comments

ajakovljevicTT commented Nov 8, 2024

sdjordjevicTT commented Nov 8, 2024 • edited Loading

ajakovljevicTT commented Nov 8, 2024 • edited Loading

sdjordjevicTT commented Nov 8, 2024

ajakovljevicTT commented Nov 8, 2024

sdjordjevicTT commented Nov 8, 2024 •

edited

Loading

ajakovljevicTT commented Nov 8, 2024 •

edited

Loading