Does Tensor Parallelism support fp8 in current scaling mode? #6101
-
Hello, I checked the fp8 code (not fp8 communication) and found that it works by rewriting the op F.linear, but I couldn't find out whether Linear1D_Col and Linear1D_Row support fp8. With fp8 in current scaling mode, the amax values of the sharded weight and activations are needed, and this requires an all-reduce communication with reduceop=MAX.
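
For illustration, here is a minimal sketch of the extra communication being described (hypothetical helper names, assuming plain PyTorch `torch.distributed`, not ColossalAI's actual API): each TP rank takes the amax over its local shard, then an all-reduce with `ReduceOp.MAX` makes the value identical on every rank, so all shards derive the same fp8 scale.

```python
import torch
import torch.distributed as dist

def global_amax(local_tensor: torch.Tensor, tp_group: dist.ProcessGroup) -> torch.Tensor:
    """Global absolute-max of a tensor sharded across a TP group.

    Each rank computes the amax of its own shard; the all-reduce with
    ReduceOp.MAX then makes the result consistent across all ranks.
    """
    amax = local_tensor.abs().amax().float()
    dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=tp_group)
    return amax

def fp8_scale_from_amax(amax: torch.Tensor, fp8_max: float = 448.0) -> torch.Tensor:
    # 448 is the largest representable value of float8_e4m3fn; the scale
    # maps the observed amax onto the fp8 dynamic range.
    return fp8_max / torch.clamp(amax, min=1e-12)
```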
Replies: 2 comments
-
Indeed, this seems unimplemented for now. You're welcome to submit a PR or ping other members.
-
It seems that Transformer Engine's original implementation doesn't do this either. I opened issue #6105 for discussion.