
[OpenCL] Optimize conv3x3 when group==1 #5618

Merged: 4 commits, merged on Mar 29, 2021


daming5432 (Collaborator) commented Mar 4, 2021

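This implementation mainly reorders the filter, along with a few other changes. The before/after performance comparison is shown in the figure below.
All numbers in the figure were measured with armv7 builds.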
[Figure: performance before and after the optimization]

paddle-bot-old (bot) commented Mar 4, 2021

Thanks for your contribution!

int in_w_id2 = in_w_id1 + item_w * stride;
int in_w_id3 = in_w_id2 + item_w * stride;
int in_w_id4 = in_w_id3 + item_w * stride;
int in_h_id = mad24((item_h_id % out_h), stride, (-pad));
Collaborator

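Have you measured how much performance a change like this alone brings, i.e. explicitly using mad24 instead of writing the multiply-add directly?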

Collaborator, Author

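I measured this in the unit tests and saw no particularly noticeable change. I haven't benchmarked it separately on a model yet; I'll test it again. The manual recommends preferring mad24 when performance matters.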
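mad24(x, y, z) is an OpenCL built-in that multiplies two 24-bit integers and adds a 32-bit integer, so the question above is about speed, not correctness. The host-side C sketch below illustrates this under stated assumptions: mad24_model, in_h_id_mad24 and in_h_id_plain are hypothetical helpers (not Paddle-Lite code), and the model treats mad24 as plain x * y + z, which only holds when x and y fit in 24 signed bits and the product fits in an int.

```c
#include <assert.h>

/* Hypothetical host-side model of OpenCL's mad24(x, y, z): multiply two
 * 24-bit integers and add a 32-bit integer. The spec only defines the
 * result for signed x, y in [-2^23, 2^23 - 1]; this model also assumes the
 * product fits in an int, in which case it is simply x * y + z. */
static int mad24_model(int x, int y, int z) {
    assert(x >= -(1 << 23) && x <= (1 << 23) - 1);
    assert(y >= -(1 << 23) && y <= (1 << 23) - 1);
    return x * y + z;
}

/* Kernel expression: mad24((item_h_id % out_h), stride, (-pad)). */
static int in_h_id_mad24(int item_h_id, int out_h, int stride, int pad) {
    return mad24_model(item_h_id % out_h, stride, -pad);
}

/* The same input row index written as an ordinary multiply-add. */
static int in_h_id_plain(int item_h_id, int out_h, int stride, int pad) {
    return (item_h_id % out_h) * stride - pad;
}
```

Both forms yield the same index for in-range inputs; any difference on a GPU comes from instruction selection, which is why the effect has to be measured per device and compiler.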

for (int w = 0; w < 3; w++) {
int in_w_val0 = select(in_w_base_id + in_w_id0 + w,
-1,
-        (in_w_id0 + w < 0 || in_w_id0 + w >= in_w));
+        (in_w_id0 + w < 0 | in_w_id0 + w >= in_w));
Collaborator

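Same as above: how much faster is the bitwise operator (|) than the logical OR (||)? Please benchmark the effect of changing only this line. If it does help, all the select calls can be changed the same way.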
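For a scalar condition, OpenCL's select(a, b, c) returns b when c is nonzero and a otherwise, so the snippet above yields -1 for out-of-range columns and the flattened index otherwise. The C sketch below models both versions of the condition to show they compute the same value; select_model and the in_w_val_* helpers are hypothetical names. Each comparison already produces 0 or 1, so | and || agree; the only difference is that | always evaluates both comparisons instead of short-circuiting, which may let the compiler emit branch-free code. Whether that is faster still has to be measured, as the reviewer asks.

```c
#include <assert.h>

/* Hypothetical model of the scalar OpenCL select(a, b, c): b if c != 0, else a. */
static int select_model(int a, int b, int c) {
    return c ? b : a;
}

/* Before: logical OR. It short-circuits, so the second comparison is skipped
 * when the first one is already true. */
static int in_w_val_logical(int in_w_base_id, int in_w_id0, int w, int in_w) {
    return select_model(in_w_base_id + in_w_id0 + w, -1,
                        (in_w_id0 + w < 0 || in_w_id0 + w >= in_w));
}

/* After: bitwise OR. Each comparison yields 0 or 1, so the result is the same
 * 0/1 value, but both comparisons are always evaluated. The inner parentheses
 * only silence -Wparentheses; < and >= already bind tighter than |. */
static int in_w_val_bitwise(int in_w_base_id, int in_w_id0, int w, int in_w) {
    return select_model(in_w_base_id + in_w_id0 + w, -1,
                        ((in_w_id0 + w < 0) | (in_w_id0 + w >= in_w)));
}
```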

Collaborator, Author

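Sure, I'll re-test directly on the models. The models I measured last time were all untuned, so this time I'll also add the performance change after tuning. I had originally changed it to something like int in_w_val0 = ((in_w_base_id + in_w_id0 + w + 1) & -(in_w_id0 + w >= 0 & in_w_id0 + w < in_w)) - 1, and found that this improved performance when the filter implementation was left unchanged; with the new filter implementation, adding this change actually made performance worse.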
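The branch-free rewrite quoted above replaces select() with an AND mask. The C sketch below (hypothetical helper names; integer logic only, not the kernel) checks it against a reference that returns -1 for an out-of-range column and the flattened index otherwise. It relies on a scalar comparison producing 1 or 0, as it does in C and for scalar operands in OpenCL C; OpenCL vector comparisons produce -1 for true instead, so a vectorized version would not need the negation.

```c
#include <assert.h>

/* Reference: -1 if column x = in_w_id0 + w is outside [0, in_w), else the
 * flattened index in_w_base_id + x (same result as the select() version). */
static int in_w_val_ref(int in_w_base_id, int in_w_id0, int w, int in_w) {
    int x = in_w_id0 + w;
    return (x < 0 || x >= in_w) ? -1 : in_w_base_id + x;
}

/* Branch-free form from the comment:
 *   ((in_w_base_id + x + 1) & -(x >= 0 & x < in_w)) - 1
 * In bounds, the comparison is 1, so the mask -1 is all ones (two's
 * complement) and the expression reduces to (idx + 1) - 1 = idx.
 * Out of bounds, the mask is 0, the AND gives 0 and the result is -1. */
static int in_w_val_masked(int in_w_base_id, int in_w_id0, int w, int in_w) {
    int x = in_w_id0 + w;
    int mask = -((x >= 0) & (x < in_w));
    return ((in_w_base_id + x + 1) & mask) - 1;
}
```

Being equivalent does not make the masked form faster: as the comment above reports, its effect depended on how the filter was laid out, so it only pays off where measurements confirm it.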

zhaoyang-star (Collaborator) left a comment

Two comments.

daming5432 (Collaborator, Author) left a comment

test


zhaoyang-star (Collaborator) left a comment

LGTM

ysh329 (Contributor) left a comment

LGTM

daming5432 merged commit c33eb44 into PaddlePaddle:develop on Mar 29, 2021
daming5432 deleted the conv3x3_opt_opencl branch on March 29, 2021 04:34