[OpenCL] fuse conv prelu pass #5461
Conversation
@@ -21,7 +21,8 @@ __kernel void conv2d_1x1_opt(
     __private const int input_height, /* of one block */
     __private const int output_width,
     __private const int output_height,
-    __private const int old_w) {
+    __private const int old_w,
Is this out_tensor_w?
This is the PReLU parameter, which is a tensor. For the earlier activations it was enough to add an attribute, but fusing PReLU requires adding a new kernel input.
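For context, a rough sketch of the kind of extra kernel argument this implies; the parameter name prelu_alpha and the image2d_t type are assumptions based on this discussion, not necessarily the exact change in the diff:

__kernel void conv2d_1x1_opt(
    /* ... existing arguments unchanged ... */
    __private const int output_height,
    __private const int old_w,
    // new input added by the conv + PReLU fusion: the alpha tensor,
    // passed as a read-only image (name and type assumed for illustration)
    __read_only image2d_t prelu_alpha) {
  // kernel body unchanged
}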
output0 = activation_type4(output0, alpha0);
output1 = activation_type4(output1, alpha1);
output2 = activation_type4(output2, alpha2);
output3 = activation_type4(output3, alpha3);
Consider whether this value could be folded into activation_type and activation_type4 by adding it as a default parameter. This is somewhat different from FUSE_SCALE_ACT: there, FUSE_SCALE is used not only in the conv kernels but also in the elementwise kernels.
I see the original implementation already has a PReLU option: when PRELU is defined an extra parameter is added, otherwise there is only one parameter. Do you mean the second parameter should always be present by default?
Right, the second parameter should always be present by default.
Updated.
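What the suggestion amounts to, as a rough sketch: give the activation helper an always-present alpha argument and ignore it for activations that do not need it, so every call site can pass alpha0..alpha3 unconditionally. CL_DTYPE4 is the macro already used in these kernels; the PRELU_*/RELU macro names and the exact math below are assumptions, not the upstream implementation:

// Sketch only: the alpha argument is always present; non-PReLU activations ignore it.
inline CL_DTYPE4 activation_type4(CL_DTYPE4 in, CL_DTYPE4 prelu_alpha) {
  CL_DTYPE4 output = in;
#if defined(PRELU_CH) || defined(PRELU_ELE) || defined(PRELU_ALL)
  // PReLU: out = in when in >= 0, alpha * in otherwise (per component)
  output = select(prelu_alpha * in, in, isgreaterequal(in, (CL_DTYPE4)(0.0f)));
#elif defined(RELU)
  output = fmax(in, (CL_DTYPE4)(0.0f));
#endif
  return output;
}

With this shape, the existing calls activation_type4(outputN) simply become activation_type4(outputN, alphaN), matching the diff above.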
output2 = activation_type4(output2);
output3 = activation_type4(output3);
CL_DTYPE4 alpha0, alpha1, alpha2, alpha3;
#ifdef PRELU_CH
There are quite a few global macros; consider adding marker comments, for example:
#ifdef PRELU_CH  // {
// }
#elif defined(PRELU_ELE)  // {
// }
#endif
and so on.
OK.
Done.
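As an illustration of the suggested marker style applied to the alpha selection (PRELU_CH / PRELU_ELE mirror the diff above; the third shared-alpha branch and the 0.25f placeholder values are only stand-ins for the real image reads):

CL_DTYPE4 alpha0, alpha1, alpha2, alpha3;
#ifdef PRELU_CH  // { one alpha per output channel, shared by the 4 outputs of this block
  alpha0 = (CL_DTYPE4)(0.25f);  // placeholder for the per-channel image read
  alpha1 = alpha0;
  alpha2 = alpha0;
  alpha3 = alpha0;
// }
#elif defined(PRELU_ELE)  // { one alpha per output element
  alpha0 = (CL_DTYPE4)(0.25f);  // placeholders for four separate image reads
  alpha1 = (CL_DTYPE4)(0.25f);
  alpha2 = (CL_DTYPE4)(0.25f);
  alpha3 = (CL_DTYPE4)(0.25f);
// }
#else  // { a single alpha shared by everything
  alpha0 = (CL_DTYPE4)(0.25f);  // placeholder for one broadcast read
  alpha1 = alpha0;
  alpha2 = alpha0;
  alpha3 = alpha0;
// }
#endif  // PRELU_CH / PRELU_ELE / else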
Force-pushed from ffdc200 to 8d1c252.
LGTM
LGTM
…into fuse_conv_prelu_pass
LGTM
LGTM
mtcnn_det2 on Snapdragon 865: 2.28 ms before fusion, 2.07 ms after fusion.
mtcnn_det3 on Snapdragon 865: 5.69 ms before fusion, 4.72 ms after fusion.