Does Tensor Parallelism support fp8 in current scaling mode? #6101
-
Hello, I checked the fp8 code (not fp8 communication) and found that it works by rewriting the op F.linear, but I couldn't find out whether Linear1D_Col and Linear1D_Row support fp8. With fp8 in current scaling mode, the amax values of the sharded weight and activations are needed, and this requires an all-reduce communication with reduceop=MAX.
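
For illustration, here is a minimal sketch of the extra communication being described (hypothetical helper names, assuming plain PyTorch `torch.distributed`, not ColossalAI's actual API): each TP rank takes the amax over its local shard, then an all-reduce with `ReduceOp.MAX` makes the value identical on every rank, so all shards derive the same fp8 scale.

```python
import torch
import torch.distributed as dist

def global_amax(local_tensor: torch.Tensor, tp_group: dist.ProcessGroup) -> torch.Tensor:
    """Global absolute-max of a tensor sharded across a TP group.

    Each rank computes the amax of its own shard; the all-reduce with
    ReduceOp.MAX then makes the result consistent across all ranks.
    """
    amax = local_tensor.abs().amax().float()
    dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=tp_group)
    return amax

def fp8_scale_from_amax(amax: torch.Tensor, fp8_max: float = 448.0) -> torch.Tensor:
    # 448 is the largest representable value of float8_e4m3fn; the scale
    # maps the observed amax onto the fp8 dynamic range.
    return fp8_max / torch.clamp(amax, min=1e-12)
```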
Replies: 2 comments
-
Indeed, this seems unimplemented for now. You're welcome to submit a PR or ping other members.
-
It seems that Transformer Engine's original implementation doesn't do this either. I opened issue #6105 for discussion.