Fix gather kernel check #7979

BBuf · 2022-04-07T13:49:12Z

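Added an index check to the gather op so that index values are restricted to the length of the specified dimension, and aligned the error message with PyTorch's.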
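The check can be illustrated with a host-only reference for a 2-D dim gather. This is a minimal sketch, not OneFlow's kernel: `DimGather2D` is a hypothetical name, only dims 0 and 1 are handled, and the error text is modeled on PyTorch's "index ... is out of bounds for dimension ... with size ..." wording rather than copied from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<float>>;
using IndexMatrix = std::vector<std::vector<int64_t>>;

// Hypothetical reference for gather along `dim` on a 2-D input:
//   dim == 0: out[i][j] = input[index[i][j]][j]
//   dim == 1: out[i][j] = input[i][index[i][j]]
// Every index value is checked against the length of `dim` before it is used.
// The non-gather dimension is assumed to be in range (index shape <= input shape).
Matrix DimGather2D(const Matrix& input, int32_t dim, const IndexMatrix& index) {
  if (dim != 0 && dim != 1) { throw std::invalid_argument("dim must be 0 or 1"); }
  const int64_t dim_length = (dim == 0) ? static_cast<int64_t>(input.size())
                                        : static_cast<int64_t>(input.at(0).size());
  Matrix out(index.size());
  for (size_t i = 0; i < index.size(); ++i) {
    for (size_t j = 0; j < index[i].size(); ++j) {
      const int64_t x = index[i][j];
      if (x < 0 || x >= dim_length) {
        // Message wording modeled on PyTorch's gather error.
        std::ostringstream msg;
        msg << "index " << x << " is out of bounds for dimension " << dim << " with size "
            << dim_length;
        throw std::out_of_range(msg.str());
      }
      out[i].push_back(dim == 0 ? input[x][j] : input[i][x]);
    }
  }
  return out;
}
```

For example, gathering along dim 1 with index {{2, 0}, {1, 1}} from {{1, 2, 3}, {4, 5, 6}} yields {{3, 1}, {5, 5}}, while an index value of 3 on that dimension throws instead of reading past the row.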

oneflow/user/kernels/dim_gather_kernel_util.h

MARD1NO · 2022-04-08T02:59:19Z

oneflow/user/kernels/dim_gather_kernel_util.h

  XPU_1D_KERNEL_LOOP(index_offset, elem_cnt) {
    IDX_T coordinate[kDimGatherMaxDimCount] = {0};
    const IDX_T x = index[index_offset];
+#ifdef WITH_CUDA


I asked juncheng; this should be __CUDA_ARCH__ here.

What's the difference here?
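A minimal sketch of why the guard matters, assuming that WITH_CUDA is a project-wide compile definition set whenever the build enables CUDA. Such a macro is visible to every translation unit, including host-only .cpp files and nvcc's host pass over a .cu file, so it cannot distinguish host code from device code. __CUDA_ARCH__, by contrast, is defined by nvcc only while compiling for the device, which is what a loop body shared by the CPU and CUDA kernels (as the XPU_1D_KERNEL_LOOP name suggests) needs. `CheckGatherIndex`, `HYPOTHETICAL_HOST_DEVICE`, and the host-side exception are hypothetical; the sketch also rejects negative indices.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical qualifier macro: expands to __host__ __device__ only under nvcc.
#if defined(__CUDACC__)
#define HYPOTHETICAL_HOST_DEVICE __host__ __device__
#else
#define HYPOTHETICAL_HOST_DEVICE
#endif

// Hypothetical helper whose body is shared by the CPU and CUDA paths.
HYPOTHETICAL_HOST_DEVICE inline void CheckGatherIndex(int64_t x, int32_t dim,
                                                      int64_t dim_length) {
#ifdef __CUDA_ARCH__
  // Device pass only: no exceptions or iostreams on the GPU, so fall back to assert.
  (void)dim;
  assert(x >= 0 && x < dim_length);
#else
  // Host pass (any host compiler, or nvcc's host pass over a .cu file).
  if (x < 0 || x >= dim_length) {
    std::ostringstream msg;
    msg << "index " << x << " is out of bounds for dimension " << dim << " with size "
        << dim_length;
    throw std::out_of_range(msg.str());
  }
#endif
}
```

With nvcc, a __host__ __device__ function is compiled once per pass and each pass keeps only its own branch. Guarding with a build-wide macro instead would select the assert branch in the host pass too, so the CPU kernel of a CUDA-enabled build would lose the descriptive error that a CPU-only build reports.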

doombeaker · 2022-04-08T03:21:19Z

oneflow/user/kernels/dim_gather_kernel_util.h

  XPU_1D_KERNEL_LOOP(index_offset, elem_cnt) {
    IDX_T coordinate[kDimGatherMaxDimCount] = {0};
    const IDX_T x = index[index_offset];
+#ifdef __CUDA_ARCH__
+    assert(x < dim_length && "gather index is out of bounds");


Suggested change

assert(x < dim_length && "gather index is out of bounds");

assert(x < dim_length); // gather index is out of bounds

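Would a comment be enough here? The string only acts as a comment anyway, doesn't it?

It doesn't seem quite the same: if this check is triggered, the error is printed directly to the command line.

I don't think so; assert shouldn't be able to print it.

The string here is just an expression, and its value is a valid pointer address (always non-zero). So the result of the assert depends entirely on the x < dim_length comparison, which is why I think the string only serves as a comment.

Also, I searched the codebase, and none of the asserts in .cu files add a string like this.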
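Both observations in this exchange can be checked on the host. A string literal decays to a non-null pointer, so `cond && "msg"` always has the same truth value as `cond`. Separately, the C standard requires a failing assert to report the text of its argument, and common implementations capture that text with the preprocessor `#` operator, so the message text does appear in the failure output; CUDA's device-side assert, to my understanding, likewise prints the failed expression. The sketch uses a hypothetical `STRINGIFY_EXPR` macro to show the captured text without aborting the program.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical macro: the same preprocessor # operator that assert
// implementations typically use to capture the text of their argument.
#define STRINGIFY_EXPR(expr) #expr

// The two variants discussed in the review.
bool WithMessage(int64_t x, int64_t dim_length) {
  // The literal decays to a non-null const char*, which converts to true.
  return x < dim_length && "gather index is out of bounds";
}
bool WithoutMessage(int64_t x, int64_t dim_length) { return x < dim_length; }

// The text a failing assert(x < dim_length && "...") would report.
const char* const kAssertedText =
    STRINGIFY_EXPR(x < dim_length && "gather index is out of bounds");
```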

doombeaker · 2022-04-08T03:24:04Z

oneflow/user/kernels/dim_gather_kernel_util.cu

@@ -24,30 +24,30 @@ namespace user_op {
 template<typename IN_T, typename IDX_T>
 __global__ void DoCUDADimGather(const DimOpIndexNdHelper<IDX_T> input_nd_helper,
                                const DimOpIndexNdHelper<IDX_T> index_nd_helper, int ndim,
-                                int64_t elem_cnt, int32_t dim, const IDX_T* index,
+                                int64_t elem_cnt, int64_t dim_length, int32_t dim, const IDX_T* index,


Suggested change

int64_t elem_cnt, int64_t dim_length, int32_t dim, const IDX_T* index,

int64_t elem_cnt, int64_t dim_length, int32_t dim, const IDX_T* index,

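If int32_t is enough for dim_length, it's better to use int32_t; keeping the parameter list of a global kernel lean is important.
That said, I'm not sure whether int32_t is sufficient; you're in a better position to judge.

OK, I'll change it back to int32_t.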
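To make the trade-off concrete, here is a minimal sketch under two stated assumptions: `ArgsInt64DimLength` and `ArgsInt32DimLength` are hypothetical structs that model only the three scalar parameters from the diff, and `NarrowDimLength` is a hypothetical host-side guard, not an existing OneFlow function. On common 64-bit ABIs the int64_t version takes 24 bytes (8 + 8 + 4, plus 4 bytes of tail padding) while the int32_t version packs into 16.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical structs modelling only the scalar kernel arguments discussed above.
struct ArgsInt64DimLength {
  int64_t elem_cnt;
  int64_t dim_length;
  int32_t dim;  // followed by tail padding to keep 8-byte alignment
};
struct ArgsInt32DimLength {
  int64_t elem_cnt;
  int32_t dim_length;
  int32_t dim;  // packs next to dim_length, no padding needed
};

// Hypothetical host-side guard: narrow a dimension length to int32_t
// before launch, refusing to truncate silently.
int32_t NarrowDimLength(int64_t dim_length) {
  if (dim_length < 0 || dim_length > std::numeric_limits<int32_t>::max()) {
    throw std::overflow_error("dim_length does not fit in int32_t");
  }
  return static_cast<int32_t>(dim_length);
}
```

If a dimension could ever exceed INT32_MAX, a guard like this makes the narrowing fail loudly instead of corrupting the length that the bounds check compares against.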

github-actions · 2022-04-11T09:34:46Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/7979/

github-actions · 2022-04-13T08:14:12Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/7979/

github-actions · 2022-04-13T08:28:40Z

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions · 2022-04-13T13:11:12Z

CI failed when running job: cuda-benchmark. PR label automerge has been removed

github-actions · 2022-04-16T09:29:01Z

Speed stats:

GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.7ms (= 12871.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.0ms (= 13997.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.0ms / 128.7ms)

OneFlow resnet50 time: 80.4ms (= 8039.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.9ms (= 8391.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 83.9ms / 80.4ms)

OneFlow resnet50 time: 52.6ms (= 10516.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.5ms (= 11091.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.05 (= 55.5ms / 52.6ms)

OneFlow resnet50 time: 41.6ms (= 8314.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.3ms (= 9468.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.14 (= 47.3ms / 41.6ms)

OneFlow resnet50 time: 39.8ms (= 7963.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.0ms (= 7607.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 0.96 (= 38.0ms / 39.8ms)

OneFlow swin dataloader time: 0.251s (= 50.230s / 200, num_workers=1)
PyTorch swin dataloader time: 0.259s (= 51.791s / 200, num_workers=1)
✔️ Relative speed: 1.031 (= 0.259s / 0.251s)

OneFlow swin dataloader time: 0.068s (= 13.586s / 200, num_workers=4)
PyTorch swin dataloader time: 0.067s (= 13.333s / 200, num_workers=4)
✔️ Relative speed: 0.981 (= 0.067s / 0.068s)

OneFlow swin dataloader time: 0.036s (= 7.260s / 200, num_workers=8)
PyTorch swin dataloader time: 0.037s (= 7.479s / 200, num_workers=8)
✔️ Relative speed: 1.030 (= 0.037s / 0.036s)

✔️ OneFlow resnet50 time: 135.3ms (= 13532.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.0ms (= 16002.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 160.0ms / 135.3ms)

OneFlow resnet50 time: 86.4ms (= 8645.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 99.2ms (= 9920.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 99.2ms / 86.4ms)

OneFlow resnet50 time: 61.9ms (= 12373.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.2ms (= 15239.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 76.2ms / 61.9ms)

OneFlow resnet50 time: 52.1ms (= 10417.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.8ms (= 13563.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 67.8ms / 52.1ms)

OneFlow resnet50 time: 52.3ms (= 10453.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.4ms (= 14487.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.39 (= 72.4ms / 52.3ms)

github-actions · 2022-04-16T09:56:07Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/7979/

github-actions · 2022-04-16T10:49:23Z

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions · 2022-04-17T09:55:12Z

Speed stats:

GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12849.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 138.7ms (= 13873.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.08 (= 138.7ms / 128.5ms)

OneFlow resnet50 time: 78.7ms (= 7867.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.7ms (= 8365.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 83.7ms / 78.7ms)

OneFlow resnet50 time: 54.5ms (= 10890.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.0ms (= 12400.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.14 (= 62.0ms / 54.5ms)

OneFlow resnet50 time: 43.2ms (= 8648.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 48.6ms (= 9724.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.12 (= 48.6ms / 43.2ms)

OneFlow resnet50 time: 37.3ms (= 7460.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.4ms (= 7876.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.06 (= 39.4ms / 37.3ms)

OneFlow swin dataloader time: 0.251s (= 50.102s / 200, num_workers=1)
PyTorch swin dataloader time: 0.254s (= 50.786s / 200, num_workers=1)
✔️ Relative speed: 1.014 (= 0.254s / 0.251s)

OneFlow swin dataloader time: 0.065s (= 13.079s / 200, num_workers=4)
PyTorch swin dataloader time: 0.070s (= 13.980s / 200, num_workers=4)
✔️ Relative speed: 1.069 (= 0.070s / 0.065s)

OneFlow swin dataloader time: 0.036s (= 7.295s / 200, num_workers=8)
PyTorch swin dataloader time: 0.039s (= 7.828s / 200, num_workers=8)
✔️ Relative speed: 1.073 (= 0.039s / 0.036s)

✔️ OneFlow resnet50 time: 135.4ms (= 13537.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 157.8ms (= 15776.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 157.8ms / 135.4ms)

OneFlow resnet50 time: 88.1ms (= 8813.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 105.6ms (= 10558.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 105.6ms / 88.1ms)

OneFlow resnet50 time: 59.1ms (= 11816.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.5ms (= 15104.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 75.5ms / 59.1ms)

OneFlow resnet50 time: 53.1ms (= 10618.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.5ms (= 13309.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 66.5ms / 53.1ms)

OneFlow resnet50 time: 48.2ms (= 9630.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.1ms (= 13216.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 66.1ms / 48.2ms)

github-actions · 2022-04-17T13:40:38Z

Speed stats:

GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12850.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.0ms (= 14103.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.0ms / 128.5ms)

OneFlow resnet50 time: 80.2ms (= 8021.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.6ms (= 8560.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.07 (= 85.6ms / 80.2ms)

OneFlow resnet50 time: 51.3ms (= 10262.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.7ms (= 11930.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 59.7ms / 51.3ms)

OneFlow resnet50 time: 41.9ms (= 8379.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 46.0ms (= 9196.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.10 (= 46.0ms / 41.9ms)

OneFlow resnet50 time: 39.7ms (= 7939.4ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.1ms (= 8013.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.01 (= 40.1ms / 39.7ms)

OneFlow swin dataloader time: 0.254s (= 50.883s / 200, num_workers=1)
PyTorch swin dataloader time: 0.253s (= 50.698s / 200, num_workers=1)
✔️ Relative speed: 0.996 (= 0.253s / 0.254s)

OneFlow swin dataloader time: 0.064s (= 12.889s / 200, num_workers=4)
PyTorch swin dataloader time: 0.065s (= 13.046s / 200, num_workers=4)
✔️ Relative speed: 1.012 (= 0.065s / 0.064s)

OneFlow swin dataloader time: 0.036s (= 7.272s / 200, num_workers=8)
PyTorch swin dataloader time: 0.037s (= 7.460s / 200, num_workers=8)
✔️ Relative speed: 1.026 (= 0.037s / 0.036s)

✔️ OneFlow resnet50 time: 135.6ms (= 13561.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 155.9ms (= 15588.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 155.9ms / 135.6ms)

OneFlow resnet50 time: 87.1ms (= 8714.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 99.3ms (= 9932.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 99.3ms / 87.1ms)

OneFlow resnet50 time: 62.3ms (= 12457.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.0ms (= 15206.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 76.0ms / 62.3ms)

OneFlow resnet50 time: 51.4ms (= 10285.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.8ms (= 15363.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.49 (= 76.8ms / 51.4ms)

OneFlow resnet50 time: 48.4ms (= 9678.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 61.6ms (= 12322.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 61.6ms / 48.4ms)

github-actions · 2022-04-17T14:02:33Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/7979/

BBuf added 20 commits March 22, 2022 19:15

fix reduce_sum scalar check bug

e6e6973

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

a0abdd5

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

00522df

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

68e0e08

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

0b90f9b

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

d81aa80

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

81954d7

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

c910fbe

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

32702cf

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

9fa0513

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

8402ce3

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

e3bf835

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

7a46669

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

5786ed5

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

1f66c27

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

265c15a

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

68f1baf

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

8f4ce10

Merge branch 'master' of github.com:Oneflow-Inc/oneflow

df08558

fix gather kernel check bug

b7f4e5a

BBuf requested review from liujuncheng, guo-ran and MARD1NO as code owners April 7, 2022 13:49

BBuf commented Apr 7, 2022

oneflow/user/kernels/dim_gather_kernel_util.h (resolved)

Update dim_gather_kernel_util.h

5efac3c

MARD1NO approved these changes Apr 8, 2022

oneflow/user/kernels/dim_gather_kernel_util.h (outdated, resolved)

MARD1NO requested changes Apr 8, 2022


Flowingsun007 approved these changes Apr 8, 2022


fix comment

0680e85

doombeaker approved these changes Apr 8, 2022


BBuf requested review from daquexian and jackalcooper as code owners April 11, 2022 03:34

BBuf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot April 13, 2022 07:14

BBuf added the automerge label Apr 13, 2022

Merge branch 'master' into fix_gather_kernel_check

84db434

github-actions bot removed the automerge label Apr 13, 2022

BBuf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot April 14, 2022 01:57

BBuf added the automerge label Apr 16, 2022

Merge branch 'master' into fix_gather_kernel_check

6d4447d

github-actions bot removed the automerge label Apr 16, 2022

Merge branch 'master' into fix_gather_kernel_check

e4f3596

BBuf added the automerge label Apr 17, 2022

Merge branch 'master' into fix_gather_kernel_check

32ad956

mergify bot merged commit 6e9431d into master Apr 17, 2022

mergify bot deleted the fix_gather_kernel_check branch April 17, 2022 15:00

BBuf mentioned this pull request Apr 22, 2022

OneFlow operator alignment with PyTorch: completeness plan progress tracker #4936

Closed
