Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] refactor elementwise OP for X86; remove _mm256_zeroupper() beca… #6957

Merged
merged 3 commits into from
Sep 18, 2021
Merged

Conversation

mjp9527
Copy link
Collaborator

@mjp9527 mjp9527 commented Sep 16, 2021

[X86] refactor elementwise OP for X86;
remove _mm256_zeroupper();

@paddle-bot-old
Copy link

Thanks for your contribution!

Copy link
Collaborator

@zhaoyang-star zhaoyang-star left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@zhaoyang-star zhaoyang-star left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@zhupengyang zhupengyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,任务紧急,有问题下次修改吧

@@ -38,7 +38,7 @@ static void activate_relu_inplace(float *data, int len, float alpha, int mode) {
__m256 vec_data = _mm256_loadu_ps(data + i);
_mm256_storeu_ps(data + i, _mm256_max_ps(vec_data, vec_zero));
}
_mm256_zeroupper();
// _mm256_zeroupper();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不需要的代码请删掉
其它的地方也一样

@@ -294,11 +294,12 @@ class RoiAlignComputeTester : public arena::TestCase {
uint64_t new_end = *rois_lod0.rbegin() + bno + 1;
rois_lod0.push_back(new_end);
for (int i = 0; i < (bno + 1); ++i) {
float x1 = randint(0, width / spatial_scale_ - pooled_width_);
float y1 = randint(0, height / spatial_scale_ - pooled_height_);
float x1 = 1.f * randint(0, width / spatial_scale_ - pooled_width_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_cast好一点?

}
}

#endif // LITE_WITH_ARM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

注释不对?

const typename Config::T* diny,
typename Config::T* dout,
int num) {
static_assert((IS_X_SINGLE && IS_Y_SINGLE) != true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static_assert((IS_X_SINGLE && IS_Y_SINGLE) != true,
static_assert(!(IS_X_SINGLE && IS_Y_SINGLE),

Comment on lines +153 to +170
// marco func add
ElementWiseFunc(Add) ElementWiseFuncBCast(Add)
// marco func sub
ElementWiseFunc(Sub) ElementWiseFuncBCast(Sub)
// marco func mul
ElementWiseFunc(Mul) ElementWiseFuncBCast(Mul)
// marco func max
ElementWiseFunc(Max) ElementWiseFuncBCast(Max)
// marco func min
ElementWiseFunc(Min) ElementWiseFuncBCast(Min)
// marco func div
ElementWiseFunc(Div) ElementWiseFuncBCast(Div)
// marco func floordiv
ElementWiseFunc(FloorDiv) ElementWiseFuncBCast(FloorDiv)
// marco func mod
ElementWiseFunc(Mod) ElementWiseFuncBCast(Mod)
// marco func pow
ElementWiseFunc(Pow) ElementWiseFuncBCast(Pow)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议后面加 ";",这样格式能对齐

Comment on lines +51 to +56
#define ElementWiseFuncBCast(op) \
template <typename T> \
void Elementwise_Broadcast_##op(const T* dinx, \
const T* diny, \
T* dout, \
int batch, \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不推荐这种宏的方式,可以用模板实现的

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants