[x86] Depthwise conv2d #6745
Thanks for your contribution!
@@ -87,12 +87,16 @@ if (WITH_AVX AND AVX_FOUND)
math_library (interpolate AVX2 TRUE DEPS math_function)
math_library (power DEPS AVX2 TRUE DEPS avx_mathfuns)
math_library (rnn AVX2 TRUE)
math_library (conv_depthwise_direct AVX2 TRUE)
Please add the performance data measured after the optimization, for example:
__m128i mask = _mm_setr_epi32(0x80000000, 0x80000000, 0x80000000, 0);
if (j + 1 == col) {
  __m256 rmaski_ = _mm256_loadu_ps(rmask_i);
  i0 = _mm256_mul_ps(i0, rmaski_);
This can be done with _mm256_maskload_ps, which loads only the valid data.
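A minimal sketch of the masked tail load suggested here, not the PR's actual code. The helper name `load_tail_masked` and its `valid` parameter are hypothetical; only `_mm256_maskload_ps`, `_mm256_loadu_si256` and `_mm256_storeu_ps` are real AVX intrinsics. Unlike a full `_mm256_loadu_ps` followed by a multiply with a 0/1 mask, the masked load does not touch memory behind the masked-out lanes, and those lanes come back as exact zeros even if the neighbouring memory holds NaN or Inf. The `target("avx")` attribute is a GCC/Clang convenience so the snippet builds without `-mavx`.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper: load the first `valid` floats (0..8) of `src` into an
// 8-float buffer, zero-filling the remaining lanes.
__attribute__((target("avx")))
inline void load_tail_masked(const float* src, int valid, float* dst8) {
  // A lane is loaded only when the sign bit of its 32-bit mask element is set.
  alignas(32) int32_t mask_bits[8];
  for (int k = 0; k < 8; ++k) {
    mask_bits[k] = (k < valid) ? -1 : 0;  // -1 == 0xFFFFFFFF, sign bit set
  }
  const __m256i mask =
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(mask_bits));

  // Masked-out lanes are returned as 0.0f and their memory is not accessed.
  const __m256 v = _mm256_maskload_ps(src, mask);
  _mm256_storeu_ps(dst8, v);
}
```

In a kernel like the one under review, the mask would normally be built once per row outside the inner loop, and the `j + 1 == col` branch would use the masked load in place of the load-then-multiply pair.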
0x80000000);
if (j + 1 == col) {
  __m256 rmaski_ = _mm256_loadu_ps(rmask_i);
  i0 = _mm256_mul_ps(i0, rmaski_);
Same as above.
} // namespace math
} // namespace x86
} // namespace lite
} // namespace paddl
Add a blank line.
lite/kernels/x86/conv_depthwise.cc
Outdated
int ow = o_dims[3];
int oc = o_dims[1];

lite::x86::math::conv_depthwise_direct(
Could this call the concrete implementation directly, to reduce nested calls?
For example:
if (stride == 1) {
  conv_depthwise_3x3s1_p1_direct(din,
                                 dout,
                                 num,
                                 ch_out,
                                 h_out,
                                 w_out,
                                 ch_in,
                                 h_in,
                                 w_in,
                                 weights,
                                 bias,
                                 pad,
                                 flag_bias,
                                 act_param);
}
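One way to read this suggestion, sketched under assumptions: the kernel decides once which specialised routine applies and then calls it directly, instead of going through a generic wrapper that repeats the same checks. The enum `DwKernel` and the function `select_depthwise_kernel` below are hypothetical names used only for illustration; they are not part of the PR.

```cpp
// Hypothetical tags for the specialised depthwise routines discussed here.
enum class DwKernel {
  k3x3s1p1,  // 3x3 kernel, stride 1, pad 1
  k3x3s2p1,  // 3x3 kernel, stride 2, pad 1
  kGeneric   // anything else falls back to a generic path
};

// Pick the routine once from the convolution parameters, so the caller can
// invoke the concrete implementation without an extra dispatch layer.
inline DwKernel select_depthwise_kernel(int kernel_h, int kernel_w,
                                        int stride, int pad) {
  if (kernel_h == 3 && kernel_w == 3 && pad == 1) {
    if (stride == 1) return DwKernel::k3x3s1p1;
    if (stride == 2) return DwKernel::k3x3s2p1;
  }
  return DwKernel::kGeneric;
}
```

The caller would then switch on the returned tag and call the matching routine, such as conv_depthwise_3x3s1_p1_direct for the stride-1 case.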
LGTM
add depthwise 3×3s1p1 3×3s2p1 optimize