Added scale op FP32/BF16 FWD/BWD kernels #32975

Merged
18 commits merged on May 25, 2021

Conversation

jakpiase
Contributor

PR types

New features

PR changes

OPs

Describe

Added scale op FP32/BF16 FWD/BWD kernels for enabling Word2Vec bf16 training and fixing customer issues with switching from pure PaddlePaddle to oneDNN ops.
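For readers unfamiliar with the op, here is a minimal pure-Python sketch of the scale semantics (`Out = scale * X + bias`, or `scale * (X + bias)` when `bias_after_scale` is false) and of how they could be folded into a single linear eltwise call of the form `alpha * x + beta`. The helper names (`scale_forward`, `to_eltwise_linear`, `eltwise_linear`, `scale_backward`) are hypothetical, and the folding is an assumption about how such a kernel can be expressed, not a copy of the code in this PR. BF16 rounding is not modeled.

```python
def scale_forward(x, scale=1.0, bias=0.0, bias_after_scale=True):
    """Reference forward pass of the scale op on a flat list of floats."""
    if bias_after_scale:
        return [scale * v + bias for v in x]
    return [scale * (v + bias) for v in x]


def to_eltwise_linear(scale, bias, bias_after_scale=True):
    """Fold scale/bias into (alpha, beta) so that out = alpha * x + beta.

    Assumption: scale * (x + bias) is rewritten as scale * x + scale * bias.
    """
    alpha = scale
    beta = bias if bias_after_scale else scale * bias
    return alpha, beta


def eltwise_linear(x, alpha, beta):
    """Stand-in for a linear eltwise primitive: alpha * x + beta."""
    return [alpha * v + beta for v in x]


def scale_backward(grad_out, scale=1.0):
    """Backward pass: d(out)/d(x) is `scale` in both bias modes."""
    return [scale * g for g in grad_out]


if __name__ == "__main__":
    x = [1.0, 2.0]
    alpha, beta = to_eltwise_linear(2.0, 3.0, bias_after_scale=False)
    # Both paths should agree element by element.
    print(scale_forward(x, 2.0, 3.0, bias_after_scale=False))
    print(eltwise_linear(x, alpha, beta))
```

Because the gradient with respect to the input is just `scale`, the backward kernel can in principle reuse the same linear form with `beta = 0`.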

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

jczaja
jczaja previously approved these changes May 20, 2021
Contributor

@jczaja jczaja left a comment

LGTM

@lidanqing-intel
Contributor

@jakpiase Hi, did you test with the ssdlite_mobilenet_v3 model?

@jczaja jczaja requested a review from arlesniak May 24, 2021 14:44
Contributor

@lidanqing-intel lidanqing-intel left a comment

LGTM

@jakpiase
Contributor Author

@luotao1 Could you please start your review?

@lidanqing-intel
Contributor

lidanqing-intel commented May 24, 2021

Hi,

  1. The oneDNN scale kernel is called, as the DNNL_VERBOSE log shows:
     dnnl_verbose,exec,cpu,eltwise,jit:avx512_common,forward_training,data_f32::blocked:acdb:f0 diff_undef::undef::f0,,alg:eltwise_linear alpha:1 beta:3,1x16x160x160,1.36597
  2. Performance of ssdlite_mobilenetv3 on my i9 machine improved from 2.23 QPS to 41.32 QPS.
  3. The log shows no calls to the oneDNN reference (ref) implementation, so performance is good now.

Overall, this PR improves ssdlite_mobilenetv3 performance a lot. We should merge it.
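The check described above (which primitive ran, with which implementation, and whether a reference fallback was used) can be scripted. The sketch below parses the verbose line quoted in this comment. The field positions are an assumption based on that single line, and the verbose format differs between oneDNN versions, so treat `parse_verbose_line` and `uses_reference_impl` as hypothetical helpers rather than a general parser.

```python
SAMPLE = (
    "dnnl_verbose,exec,cpu,eltwise,jit:avx512_common,forward_training,"
    "data_f32::blocked:acdb:f0 diff_undef::undef::f0,,"
    "alg:eltwise_linear alpha:1 beta:3,1x16x160x160,1.36597"
)


def parse_verbose_line(line):
    """Parse one 'dnnl_verbose,exec,...' line into a dict, or return None.

    Assumed layout (taken from the sample line only): the last three
    comma-separated fields are auxiliary info, problem shape, and time in ms.
    """
    fields = line.strip().split(",")
    if len(fields) < 9 or fields[0] != "dnnl_verbose" or fields[1] != "exec":
        return None
    # Auxiliary info looks like "alg:eltwise_linear alpha:1 beta:3".
    aux = dict(item.split(":", 1) for item in fields[-3].split() if ":" in item)
    return {
        "engine": fields[2],
        "primitive": fields[3],
        "impl": fields[4],
        "prop_kind": fields[5],
        "aux": aux,
        "shape": [int(d) for d in fields[-2].split("x")],
        "time_ms": float(fields[-1]),
    }


def uses_reference_impl(record):
    """True if the implementation name starts with 'ref' (slow fallback)."""
    return record["impl"].startswith("ref")


if __name__ == "__main__":
    rec = parse_verbose_line(SAMPLE)
    print(rec["primitive"], rec["impl"], rec["aux"], uses_reference_impl(rec))
```

Running this over a whole log and counting records for which `uses_reference_impl` is true gives a quick way to confirm that no reference kernels are hit.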

Contributor

@Avin0323 Avin0323 left a comment

LGTM for unity_build_rule.cmake

@luotao1 luotao1 merged commit 86ea8dc into PaddlePaddle:develop May 25, 2021
lidanqing-intel pushed a commit to lidanqing-intel/Paddle that referenced this pull request Jun 17, 2021
5 participants