Add view weight norm where zeropad2d global test #7886

Merged
clackhan merged 65 commits into master from add_view_weight_norm_where_zeropad2d_global_test
Apr 12, 2022

clackhan
Contributor

No description provided.

@@ -86,7 +86,10 @@ Maybe<void> InferWhereXYScalarTensorDesc(user_op::InferContext* ctx) {
Maybe<void> GetWhereSbpSignatures(user_op::SbpContext* ctx) {
Contributor

Since condition, x, and y are all broadcastable for this op, the SBP inference here may not be fully correct yet?

Contributor Author

> Since condition, x, and y are all broadcastable for this op, the SBP inference here may not be fully correct yet?

It is indeed incomplete. I'll refine it further.

Contributor

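Yes, the inference logic here gets fairly complicated; let's see whether it can be simplified. For example: support split on all inputs only when condition, x, and y have the same shape; if two shapes match and the third differs, follow the logic used for broadcast_add; if all three differ, support only B (broadcast).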
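
The three-case rule proposed in this comment can be sketched as a small shape classifier. This is only an illustration of the suggestion, not the code that was merged: `Dims`, `SbpPolicy`, and `ClassifyWhereInputs` are hypothetical names, and the sketch stops at picking a case rather than registering any SBP signatures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor shape; not OneFlow's Shape class.
using Dims = std::vector<int64_t>;

// Which family of SBP signatures the proposed rule would allow.
// Pure broadcast (B) is assumed to be available in every case.
enum class SbpPolicy {
  kSplitAllInputs,      // all three shapes equal: split any axis on every input
  kFollowBroadcastAdd,  // exactly two shapes equal: reuse broadcast_add-style rules
  kBroadcastOnly        // all three shapes differ: only B
};

// Classifies the shapes of condition, x, and y according to the proposal.
SbpPolicy ClassifyWhereInputs(const Dims& cond, const Dims& x, const Dims& y) {
  const bool cond_eq_x = (cond == x);
  const bool cond_eq_y = (cond == y);
  const bool x_eq_y = (x == y);
  if (cond_eq_x && x_eq_y) { return SbpPolicy::kSplitAllInputs; }
  if (cond_eq_x || cond_eq_y || x_eq_y) { return SbpPolicy::kFollowBroadcastAdd; }
  return SbpPolicy::kBroadcastOnly;
}
```

For instance, shapes `{2, 3}`, `{2, 3}`, `{1, 3}` fall into the middle case, where the two matching inputs could be split together while the third is handled the way broadcast_add handles its broadcast operand.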

Contributor Author

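> Yes, the inference logic here gets fairly complicated; let's see whether it can be simplified. For example: support split on all inputs only when condition, x, and y have the same shape; if two shapes match and the third differs, follow the logic used for broadcast_add; if all three differ, support only B (broadcast).

It turned out not to be too complicated. I've finished refining it, and it should cover all cases. @wyg1997 @hjchen2, please review this part of the change when you have time.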

Comment on lines 28 to 32
CHECK_OR_RETURN(a_extend_shape.At(i) == 1 || b_extend_shape.At(i) == 1
|| a_extend_shape.At(i) == b_extend_shape.At(i))
<< Error::RuntimeError() << "The size of tensor a (" << a_extend_shape.At(i)
<< ") must match the size of tensor b (" << b_extend_shape.At(i)
<< ") at non-singleton dimension " << i;
Contributor Author

The exception type here is now aligned with PyTorch.
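
The rule enforced by the quoted check can be sketched in standalone form as follows. This is a minimal illustration that assumes both shapes are first left-padded with 1s to a common rank (which the `a_extend_shape` / `b_extend_shape` names suggest). `Dims`, `ExtendTo`, and `BroadcastError` are hypothetical names, and a plain `std::string` stands in for OneFlow's `Maybe<void>` and `Error::RuntimeError()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor shape; not OneFlow's Shape class.
using Dims = std::vector<int64_t>;

// Left-pads `shape` with 1s so that it has exactly `rank` dimensions.
Dims ExtendTo(const Dims& shape, std::size_t rank) {
  Dims out(rank - shape.size(), 1);
  out.insert(out.end(), shape.begin(), shape.end());
  return out;
}

// Returns an empty string when a and b are broadcastable; otherwise returns
// a message in the same wording as the check quoted above.
std::string BroadcastError(const Dims& a, const Dims& b) {
  const std::size_t rank = std::max(a.size(), b.size());
  const Dims ea = ExtendTo(a, rank);
  const Dims eb = ExtendTo(b, rank);
  for (std::size_t i = 0; i < rank; ++i) {
    // A dimension is compatible if either side is 1 or both sides match.
    if (ea[i] == 1 || eb[i] == 1 || ea[i] == eb[i]) { continue; }
    return "The size of tensor a (" + std::to_string(ea[i])
           + ") must match the size of tensor b (" + std::to_string(eb[i])
           + ") at non-singleton dimension " + std::to_string(i);
  }
  return "";
}
```

With this sketch, shapes `{2, 3}` and `{3}` pass because the second is padded to `{1, 3}`, while `{2, 3}` and `{4, 3}` fail at dimension 0.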

Comment on lines 54 to 56
JUST(CheckBroadcastable(a_shape, b_shape));
JUST(CheckBroadcastable(a_shape, c_shape));
JUST(CheckBroadcastable(b_shape, c_shape));
Contributor

This check isn't really necessary, because InferWhereTensorDesc has in fact already performed it once.

Contributor Author

> This check isn't really necessary, because InferWhereTensorDesc has in fact already performed it once.

Agreed, it is unnecessary. I've removed the CheckBroadcastable logic.

@github-actions
Contributor

github-actions bot commented Apr 9, 2022

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.4ms (= 12842.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.0ms (= 14102.5ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.0ms / 128.4ms)

OneFlow resnet50 time: 78.9ms (= 7888.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.4ms (= 8543.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.08 (= 85.4ms / 78.9ms)

OneFlow resnet50 time: 51.3ms (= 10262.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.2ms (= 11030.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.07 (= 55.2ms / 51.3ms)

OneFlow resnet50 time: 44.7ms (= 8943.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.6ms (= 9513.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.06 (= 47.6ms / 44.7ms)

OneFlow resnet50 time: 39.6ms (= 7911.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.6ms (= 7528.5ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 0.95 (= 37.6ms / 39.6ms)

OneFlow swin dataloader time: 0.253s (= 50.545s / 200, num_workers=1)
PyTorch swin dataloader time: 0.255s (= 50.942s / 200, num_workers=1)
✔️ Relative speed: 1.008 (= 0.255s / 0.253s)

OneFlow swin dataloader time: 0.069s (= 13.730s / 200, num_workers=4)
PyTorch swin dataloader time: 0.068s (= 13.632s / 200, num_workers=4)
✔️ Relative speed: 0.993 (= 0.068s / 0.069s)

OneFlow swin dataloader time: 0.037s (= 7.346s / 200, num_workers=8)
PyTorch swin dataloader time: 0.036s (= 7.277s / 200, num_workers=8)
✔️ Relative speed: 0.991 (= 0.036s / 0.037s)

✔️ OneFlow resnet50 time: 135.3ms (= 13527.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 155.7ms (= 15571.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 155.7ms / 135.3ms)

OneFlow resnet50 time: 85.3ms (= 8529.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.9ms (= 10287.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 102.9ms / 85.3ms)

OneFlow resnet50 time: 59.2ms (= 11834.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.8ms (= 14965.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 74.8ms / 59.2ms)

OneFlow resnet50 time: 52.7ms (= 10544.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.7ms (= 13146.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 65.7ms / 52.7ms)

OneFlow resnet50 time: 47.8ms (= 9560.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.4ms (= 12478.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 62.4ms / 47.8ms)

@github-actions
Contributor

github-actions bot commented Apr 9, 2022

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/7886/

@github-actions
Contributor

CI failed when running job: cpu-misc. PR label automerge has been removed

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.9ms (= 12891.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.0ms (= 14004.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.0ms / 128.9ms)

OneFlow resnet50 time: 79.4ms (= 7937.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.2ms (= 8316.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.05 (= 83.2ms / 79.4ms)

OneFlow resnet50 time: 51.6ms (= 10320.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.5ms (= 11906.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.15 (= 59.5ms / 51.6ms)

OneFlow resnet50 time: 42.9ms (= 8581.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.9ms (= 8578.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.00 (= 42.9ms / 42.9ms)

OneFlow resnet50 time: 39.5ms (= 7902.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 43.3ms (= 8663.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.10 (= 43.3ms / 39.5ms)

OneFlow swin dataloader time: 0.247s (= 49.445s / 200, num_workers=1)
PyTorch swin dataloader time: 0.251s (= 50.208s / 200, num_workers=1)
✔️ Relative speed: 1.015 (= 0.251s / 0.247s)

OneFlow swin dataloader time: 0.067s (= 13.460s / 200, num_workers=4)
PyTorch swin dataloader time: 0.066s (= 13.116s / 200, num_workers=4)
✔️ Relative speed: 0.974 (= 0.066s / 0.067s)

OneFlow swin dataloader time: 0.036s (= 7.277s / 200, num_workers=8)
PyTorch swin dataloader time: 0.037s (= 7.375s / 200, num_workers=8)
✔️ Relative speed: 1.014 (= 0.037s / 0.036s)

✔️ OneFlow resnet50 time: 135.3ms (= 13525.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 157.0ms (= 15696.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 157.0ms / 135.3ms)

OneFlow resnet50 time: 87.5ms (= 8754.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 99.3ms (= 9933.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 99.3ms / 87.5ms)

OneFlow resnet50 time: 59.7ms (= 11946.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.1ms (= 15025.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 75.1ms / 59.7ms)

OneFlow resnet50 time: 52.6ms (= 10519.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.0ms (= 13409.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 67.0ms / 52.6ms)

OneFlow resnet50 time: 48.4ms (= 9676.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.7ms (= 12536.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 62.7ms / 48.4ms)

@clackhan clackhan merged commit c5eca53 into master Apr 12, 2022
@clackhan clackhan deleted the add_view_weight_norm_where_zeropad2d_global_test branch April 12, 2022 07:15