[NPU] refine update_loss_scaling npu kernel #32580
Conversation
Thanks for your contribution!
auto g = out->mutable_data<T>(place);
platform::NPUMemsetAsync(static_cast<void*>(g), 0,
                         out->numel() * sizeof(T), stream);
auto runner_zeros = NpuOpRunner("ZerosLike", {*out}, {*out});
mutable_data is needed.
done
LGTM
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
Use ZerosLike and Memcpy instead of NPUMemsetAsync.

As shown in the timeline, there is a blank corresponding to update_loss_scaling_op caused by NPUMemsetAsync; update_loss_scaling_op costs about 103 ms.

If ZerosLike alone replaces NPUMemsetAsync, update_loss_scaling_op launches many ZerosLike NPU ops and costs about 22.2 ms.

With this change, update_loss_scaling_op launches only one ZerosLike NPU op and then uses Memcpy to set the remaining tensors to 0; it costs about 5.5 ms.

Performance
Speed up: 19448 tokens/s -> 20679 tokens/s, +6.33 %