Stream compute on pinned memory #8486

Merged: 38 commits into master from stream_compute_on_pinned_memory on Jun 28, 2022

Conversation

@Flowingsun007 (Contributor) commented Jun 24, 2022

  • Add a new StreamRole: StreamRole::kPinMemory
  • Add a new visitor: IsStreamPinned (a rough sketch of the new role and visitor follows this list)
  • Add a new StreamType: PinnedEpStreamType
  • Remove the previous pin_memory-related design and special-casing in OpExprInterpContext, EagerBlobObject, and the eager interpreter, and implement it through StreamRole instead
  • Modify the pin_memory-related empty, cast, and copy op implementations, adding InferDeviceAndStream inference logic
  • Delete oneflow/oneflow/core/device/cpu_device_context.h, since it is no longer used anywhere
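
For context, here is a minimal sketch of what the new role and visitor roughly look like. This is an illustration only, not the merged code: the real definitions live in oneflow/core/common/stream_role.h, the other enum values shown are placeholders, and the actual IsStreamPinned is implemented with OneFlow's stream-role visitor machinery rather than as a free function.

// Hedged illustration only; names other than StreamRole::kPinMemory and
// IsStreamPinned are placeholders and may not match the merged code.
enum class StreamRole {
  kInvalid = 0,
  kCompute,      // ordinary device compute
  kHost2Device,  // host-to-device copies
  kDevice2Host,  // device-to-host copies
  kPinMemory,    // new in this PR: host-side work on pinned (page-locked) memory
};

// New query introduced by this PR: does a stream role imply pinned host memory?
// Shown here as a plain function for brevity.
inline bool IsStreamPinned(StreamRole role) { return role == StreamRole::kPinMemory; }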

@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@Flowingsun007 Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 24, 2022 14:10
@Flowingsun007 Flowingsun007 removed the request for review from oneflow-ci-bot June 24, 2022 14:26
Resolved (outdated) review threads on:
  • oneflow/core/common/stream_role.h
  • oneflow/core/eager/blob_instruction_type.h
  • oneflow/core/framework/tensor_impl.cpp
  • oneflow/core/vm/ep_stream_type.cpp

if (pin_memory) {
  CHECK_OR_RETURN(out_device->type() == "cpu")
      << "empty op only support pin_memory in cpu device but got " << out_device->type();
  auto pin_device = JUST(Device::New("cuda"));
Contributor: Add a TODO comment here.

if (pin_memory) {
  CHECK_OR_RETURN(in_device->type() == "cpu")
      << "cast op only support pin_memory in cpu device but got " << in_device->type();
  auto pin_device = JUST(Device::New("cuda"));
Contributor: Add a TODO comment.

} else if (in_device->type() == "cpu" && out_device->type() != "cpu") {
  const auto device = JUST(Device::New(out_device->type(), out_device->device_id()));
  return Stream::New(device, StreamRole::kHost2Device);
} else if (in_device->type() == "cpu" && out_device->type() == "cpu" && pin_memory) {
  auto pin_device = JUST(Device::New("cuda"));
Contributor: Add a TODO comment.

Contributor (Author): Added.
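
Pulling the three review snippets together, the sketch below shows how the pin_memory branch could sit inside such an InferDeviceAndStream helper once the requested comment is added. The function signature, the surrounding branches, the final fallback, and the rationale given in the comment are assumptions inferred from the snippets above, not the exact merged code; it also assumes the usual OneFlow framework headers (maybe.h, device.h, stream.h, stream_role.h).

// Hedged sketch assembled from the review snippets above; not the merged code.
static Maybe<Symbol<Stream>> InferDeviceAndStream(Symbol<Device> in_device,
                                                  Symbol<Device> out_device,
                                                  bool pin_memory) {
  if (in_device->type() != "cpu" && out_device->type() == "cpu") {
    // Device-to-host copy runs on the source device's D2H stream.
    return Stream::New(in_device, StreamRole::kDevice2Host);
  } else if (in_device->type() == "cpu" && out_device->type() != "cpu") {
    const auto device = JUST(Device::New(out_device->type(), out_device->device_id()));
    return Stream::New(device, StreamRole::kHost2Device);
  } else if (in_device->type() == "cpu" && out_device->type() == "cpu" && pin_memory) {
    // Comment requested in review. Our reading: pinned (page-locked) host memory
    // is obtained through the CUDA allocator, so the stream is bound to a "cuda"
    // device symbol even though the data itself stays on the host.
    auto pin_device = JUST(Device::New("cuda"));
    return Stream::New(pin_device, StreamRole::kPinMemory);
  }
  // Plain host-to-host copy: fall back to an ordinary compute stream.
  return Stream::New(out_device, StreamRole::kCompute);
}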

@Flowingsun007 Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 27, 2022 09:21
@Flowingsun007 Flowingsun007 added the need-highest-priority label Jun 27, 2022
@BBuf BBuf removed the need-highest-priority label Jun 27, 2022
@lixinqi lixinqi removed the request for review from oneflow-ci-bot June 27, 2022 17:27
@lixinqi lixinqi requested a review from oneflow-ci-bot June 27, 2022 17:36
@github-actions (Contributor)

CI failed when running job: Build cu102. PR label automerge has been removed

@Flowingsun007 Flowingsun007 added the need-highest-priority label Jun 28, 2022
@Flowingsun007 Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 28, 2022 01:14
@github-actions (Contributor)

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8486/

@github-actions (Contributor)

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.6ms (= 12957.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.5ms (= 14345.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.5ms / 129.6ms)

OneFlow resnet50 time: 75.5ms (= 7551.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.8ms (= 8577.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 85.8ms / 75.5ms)

OneFlow resnet50 time: 49.3ms (= 9855.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.3ms (= 12059.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 60.3ms / 49.3ms)

OneFlow resnet50 time: 39.2ms (= 7844.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.5ms (= 8900.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.13 (= 44.5ms / 39.2ms)

OneFlow resnet50 time: 36.8ms (= 7370.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.6ms (= 7710.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.05 (= 38.6ms / 36.8ms)

OneFlow swin dataloader time: 0.264s (= 52.738s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.150s / 200, num_workers=1)
Relative speed: 0.572 (= 0.151s / 0.264s)

OneFlow swin dataloader time: 0.074s (= 14.863s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.442s / 200, num_workers=4)
Relative speed: 0.568 (= 0.042s / 0.074s)

OneFlow swin dataloader time: 0.040s (= 8.034s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.419s / 200, num_workers=8)
Relative speed: 0.550 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 145.5ms (= 14550.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 170.7ms (= 17067.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 170.7ms / 145.5ms)

OneFlow resnet50 time: 93.6ms (= 9357.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.6ms (= 11261.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 112.6ms / 93.6ms)

OneFlow resnet50 time: 70.4ms (= 14083.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 85.7ms (= 17138.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 85.7ms / 70.4ms)

OneFlow resnet50 time: 57.5ms (= 11508.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.4ms (= 16470.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 82.4ms / 57.5ms)

OneFlow resnet50 time: 52.4ms (= 10482.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.4ms (= 14277.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 71.4ms / 52.4ms)

@Flowingsun007 Flowingsun007 merged commit 41fc1ad into master Jun 28, 2022
@Flowingsun007 Flowingsun007 deleted the stream_compute_on_pinned_memory branch June 28, 2022 03:41

Labels: eager, enhancement, need-highest-priority, op, refactor
5 participants