Support async copy for TensorFromVector with event #32563

liym27 · 2021-04-26T06:38:47Z

PR types

Performance optimization

PR changes

Others

Describe

Support async copy for TensorFromVector and FillNpuTensorWithConstant with event.

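1. Feature:

Use the event mechanism to support asynchronous data copy in TensorFromVector and FillNpuTensorWithConstant.

2. Benefits:

2.1 Correctness: data being copied asynchronously is guaranteed not to be destroyed before the copy completes.

Problem: when CPU data is copied asynchronously to an NPU Tensor, the CPU data can be destroyed before the copy finishes, which makes the copy result incorrect.
Solution:
- Use the event mechanism to enforce the order "copy first, then destroy":
  after the CPU->NPU copy is issued, an event is recorded on the stream. When the data is about to be destroyed, the event is queried; if it has not completed yet, destruction is deferred, so the copy always finishes before the data is released.
- Use NPUPinnedAllocator to manage the memory and achieve this "deferred destruction":
  the CPU data is first copied into a CPU tensor, aka the npu pinned tensor, and the npu pinned tensor is then copied to the NPU tensor. NPUPinnedAllocator manages allocation and release of the npu pinned tensor's memory and frees it only after the event associated with that npu pinned tensor has completed. This allocator is implemented in [NPU] Support npu pinned allocator #32840.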
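The two steps above (record an event after the copy, then free the host buffer only once that event has completed) can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the NPUPinnedAllocator from #32840: `HostEvent` and `DeferredFreeAllocator` are hypothetical names, and the event is completed by hand here, whereas a real implementation would record and query an NPU event on the stream through the Ascend runtime.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NPU event recorded on the stream right after
// the async host->device copy is issued. A real runtime marks it complete
// when the copy finishes; here MarkComplete() is called by hand.
class HostEvent {
 public:
  void MarkComplete() { done_.store(true, std::memory_order_release); }
  bool Query() const { return done_.load(std::memory_order_acquire); }

 private:
  std::atomic<bool> done_{false};
};

// Hypothetical allocator: a host buffer is released only after the event of
// the copy that reads from it has completed ("copy first, then destroy").
class DeferredFreeAllocator {
 public:
  ~DeferredFreeAllocator() {
    // Assumes every stream has been synchronized before teardown.
    for (auto& entry : pending_) std::free(entry.first);
  }

  void* Allocate(std::size_t size) { return std::malloc(size); }

  // Called when the tensor owning `ptr` is destroyed.
  void Free(void* ptr, std::shared_ptr<HostEvent> event) {
    assert(ptr != nullptr);
    std::lock_guard<std::mutex> lock(mu_);
    if (event == nullptr || event->Query()) {
      std::free(ptr);  // copy already finished: release immediately
      ++released_;
    } else {
      pending_.emplace_back(ptr, std::move(event));  // copy in flight: defer
    }
  }

  // Releases every deferred buffer whose event has completed.
  void ReleaseCompleted() {
    std::lock_guard<std::mutex> lock(mu_);
    std::vector<std::pair<void*, std::shared_ptr<HostEvent>>> still_pending;
    for (auto& entry : pending_) {
      if (entry.second->Query()) {
        std::free(entry.first);
        ++released_;
      } else {
        still_pending.push_back(std::move(entry));
      }
    }
    pending_.swap(still_pending);
  }

  std::size_t PendingCount() const {
    std::lock_guard<std::mutex> lock(mu_);
    return pending_.size();
  }

  std::size_t ReleasedCount() const {
    std::lock_guard<std::mutex> lock(mu_);
    return released_;
  }

 private:
  mutable std::mutex mu_;
  std::vector<std::pair<void*, std::shared_ptr<HostEvent>>> pending_;
  std::size_t released_ = 0;
};
```

In this sketch the flow is: fill a buffer from `Allocate`, issue the async copy and record an event, call `Free` when the tensor is destroyed, and call `ReleaseCompleted` periodically (for example, on the next allocation) to reclaim buffers whose copies have finished.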

2.2 Performance:

Asynchronous copy is faster than synchronous copy. Single-machine training throughput for ernie3.0 is shown in the table below:

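| No. | Scheme | Speed (tokens/s) | Speedup |
|---|---|---|---|
| 1 | FillNpuTensorWithConstant | 19988.26 | baseline |
| 2 | async | 20545.58 | +2.79% |
| 3 | async + malloc/free | 21706.29 | +8.60% |
| 4 | async + malloc/free + avoid CPU tensor copy when shape is [1] | 21759.33 | +8.86% |
| 5 | async + malloc/free + avoid CPU tensor copy when shape is [1] + thread | 21466.57 | +7.40% |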
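The Speedup column is the relative change against scheme 1, (speed / baseline - 1) × 100. A small check that recomputes it from the tokens/s values in the table (those values are the only inputs; `SpeedupPercent` and `PrintSpeedups` are illustrative names):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Relative speedup of `speed` over `baseline`, in percent.
double SpeedupPercent(double speed, double baseline) {
  assert(baseline > 0.0);
  return (speed / baseline - 1.0) * 100.0;
}

// Tokens/s for schemes 1-5, copied from the table.
const std::vector<double> kTokensPerSec = {19988.26, 20545.58, 21706.29,
                                           21759.33, 21466.57};

// Prints the speedup of schemes 2-5 relative to scheme 1.
void PrintSpeedups() {
  for (std::size_t i = 1; i < kTokensPerSec.size(); ++i) {
    std::printf("scheme %zu: %+.2f%%\n", i + 1,
                SpeedupPercent(kTokensPerSec[i], kTokensPerSec[0]));
  }
}
```

Calling `PrintSpeedups()` prints +2.79%, +8.60%, +8.86% and +7.40% for schemes 2 to 5.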
5	async + malloc/free + 避免shape为[1]时cpu tensor拷贝+thread	21466.57	+7.40%

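This PR implements scheme 4.

Scheme descriptions:

- FillNpuTensorWithConstant: call FillNpuTensorWithConstant to create the constant Tensor (the baseline).
- async: copy data asynchronously using the event mechanism, which keeps the data correct.
- malloc/free: allocate host memory with C++ malloc/free instead of the Ascend APIs aclrtMallocHost/aclrtFreeHost.
- avoid CPU tensor copy when shape is [1]: for variables with shape [1], write the tensor data directly instead of calling the Copy function.
- thread: destroy data in batches on a separate thread.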
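A minimal sketch of the shape-[1] fast path described above. `StagingTensor`, `GenericCopy` and `FillFromVector` are hypothetical names standing in for the host staging tensor and Paddle's Copy path; the point is only that a single element is written in place, while larger inputs still go through the general copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side staging buffer for a tensor.
struct StagingTensor {
  std::vector<float> data;
  int copy_calls = 0;  // counts how often the generic copy path ran
};

// Hypothetical generic copy routine standing in for a full Copy() call.
void GenericCopy(const std::vector<float>& src, StagingTensor* dst) {
  dst->data.resize(src.size());
  if (!src.empty()) {
    std::memcpy(dst->data.data(), src.data(), src.size() * sizeof(float));
  }
  ++dst->copy_calls;
}

// Fills `dst` from `src`, skipping the generic copy when numel == 1.
void FillFromVector(const std::vector<float>& src, StagingTensor* dst) {
  assert(dst != nullptr);
  if (src.size() == 1) {
    dst->data.assign(1, src[0]);  // shape [1]: write the value directly
    return;
  }
  GenericCopy(src, dst);
}
```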

paddle-bot-old · 2021-04-26T06:38:51Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot-old · 2021-04-26T06:38:52Z

✅ This PR's description meets the template requirements!
Please wait for other CI results.

paddle/fluid/platform/device_context.cc

zhiqiu · 2021-04-27T11:46:21Z

paddle/fluid/memory/allocation/allocator.h

+    VLOG(4) << " --------- liym27: TopDecoratedAllocator : "
+               " decorated_allocators_.size() = "
+            << decorated_allocators_.size();
+


debug code can be reduced.

paddle/fluid/memory/allocation/naive_best_fit_allocator.cc

zhiqiu

LGTM

zhiqiu · 2021-04-30T03:21:07Z

paddle/fluid/memory/detail/system_allocator.cc

+
+  void* p;
+  // PINNED memory is visible to all NPU contexts.
+  auto result = aclrtMallocHost(&p, size);


Can be replaced with malloc in the next PR.

Thanks. I will fix it in the next PR.
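Following the review suggestion, a minimal sketch of allocating the host staging buffer with `std::malloc` instead of `aclrtMallocHost`. `AllocHostStaging` and `FreeHostStaging` are hypothetical helper names, not Paddle's system_allocator code, and throwing `std::bad_alloc` on failure is an assumption, since the quoted snippet does not show how `result` is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical replacement for the aclrtMallocHost call quoted above:
// plain std::malloc for the host-side staging buffer.
void* AllocHostStaging(std::size_t size) {
  void* p = std::malloc(size);
  if (p == nullptr && size != 0) {
    throw std::bad_alloc();  // assumption: report failure by throwing
  }
  return p;
}

// Must be paired with AllocHostStaging: release with std::free,
// never with aclrtFreeHost.
void FreeHostStaging(void* p) {
  std::free(p);  // std::free(nullptr) is a no-op
}
```

Scheme 3 in the benchmark table is the measured effect of this switch.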

zhhsplendid

LGTM

paddle-bot-old · 2021-05-08T02:36:01Z

Sorry to inform you that 8e5a23b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

zhiqiu

LGTM

- Use malloc/free to replace aclrtMalloc/FreeHost
- Avoid vector->npu_pinned_tensor when numel==1
- Polish code
- Fix compile error

zhiqiu

LGTM

liym27 force-pushed the npu_asyc_copy_with_event branch from fe750df to e23c489 April 26, 2021 07:56

zhiqiu reviewed Apr 27, 2021

liym27 force-pushed the npu_asyc_copy_with_event branch 2 times, most recently from 9955b3f to 854a11d April 28, 2021 12:37

liym27 force-pushed the npu_asyc_copy_with_event branch from 854a11d to dff54a0 April 28, 2021 23:14

liym27 force-pushed the npu_asyc_copy_with_event branch from 30da90c to 09c562a April 29, 2021 08:19

liym27 force-pushed the npu_asyc_copy_with_event branch from 09c562a to 8bfd2f8 April 29, 2021 11:15

liym27 force-pushed the npu_asyc_copy_with_event branch from 8bfd2f8 to 52c2ced April 29, 2021 11:45

liym27 force-pushed the npu_asyc_copy_with_event branch from 52c2ced to 1fd121a April 29, 2021 12:37

liym27 force-pushed the npu_asyc_copy_with_event branch from 1fd121a to 67a5fb8 April 29, 2021 13:02

liym27 force-pushed the npu_asyc_copy_with_event branch from 67a5fb8 to 8e5a23b April 30, 2021 00:05

liym27 requested review from zhhsplendid, pangyoki and zhiqiu April 30, 2021 02:58

zhiqiu previously approved these changes Apr 30, 2021

liym27 requested a review from phlrain April 30, 2021 07:52

zhhsplendid previously approved these changes Apr 30, 2021

liym27 closed this May 10, 2021

liym27 reopened this May 10, 2021

liym27 dismissed stale reviews from zhhsplendid and zhiqiu via c1ba6df May 10, 2021 10:07

liym27 force-pushed the npu_asyc_copy_with_event branch from 8e5a23b to c1ba6df May 10, 2021 10:07

liym27 force-pushed the npu_asyc_copy_with_event branch from c1ba6df to 4dc032d May 10, 2021 10:51

zhiqiu previously approved these changes May 11, 2021

Support async copy for TensorFromVector with event

2f6fc70

liym27 dismissed zhiqiu’s stale review via 2f6fc70 May 12, 2021 06:24

liym27 force-pushed the npu_asyc_copy_with_event branch from 4dc032d to 2f6fc70 May 12, 2021 06:24

zhiqiu approved these changes May 12, 2021

liym27 merged commit 85512d6 into PaddlePaddle:develop May 12, 2021
