This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

enable all activations in MKLDNN. #10089

Closed
wants to merge 2 commits

Conversation

zheng-da
Contributor

Description

Previously, some activation types in MKLDNN weren't used because there was a precision problem.
This is to enable all activations in MKLDNN.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

@@ -45,11 +45,9 @@ namespace op {
bool SupportMKLDNNAct(const ActivationParam& param) {
// We only enable ReLU for now. It seems other activations have some precision
Contributor:

Remove comment

Member:

why?

@marcoabreu (Contributor), Mar 13, 2018:

The comment explains why the operators have been disabled. This PR re-enables them, and thus the comment is obsolete.

@marcoabreu
Contributor

How have the precision problems been resolved? Is there a test?

@zheng-da
Contributor Author

It seems the precision problem hasn't been fixed in MKLDNN. I'm notifying them of this problem; hopefully, it can be fixed soon.

@marcoabreu
Contributor

I see, thanks a lot! So we'll wait to merge this until Intel fixes the problem, or how would you propose to move forward?

@pengzhao-intel
Contributor

@zheng-da thanks a lot. I will follow up with our team :)

@zheng-da
Contributor Author

I wanted to test if MKLDNN activation is working now. We can close the PR for now and reopen it after the bug in MKLDNN is fixed, or just keep it open. Either way is fine.

@pengzhao-intel
Contributor

Looking at the code, the two implementations of soft_relu differ, so we get different results.

mxnet:
https://github.com/apache/incubator-mxnet/blob/c9ec3118688c233a66ad847003a9e8d2d09e5952/src/operator/mshadow_op.h#L136

/*! \brief SoftReLU, also known as softplus activation */
struct softrelu : public mxnet_op::tunable {
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a) {
    // Avoid overflow of exp for large inputs.
    // Thresholds 20.0 is chosen such that softrelu(a) = a
    // for a > 20 using floating precision
    if (a > DType(20.0f)) {
      return a;
    } else {
      return DType(math::log1p(math::exp(a)));
    }
  }
};

MXNET_UNARY_MATH_OP(softrelu_grad, -math::expm1(-a));

mkldnn:
https://github.com/intel/mkl-dnn/blob/f5218ff4fd2d16d13aada2e632afd18f2514fee3/tests/gtests/test_eltwise.cpp#L101

template <typename T>
T soft_relu_fwd(T s) {
    return logf(1 + ::expf(s));
}

template <typename T>
T soft_relu_bwd(T dd, T s) {
    return dd / (1 + ::expf(-s));
}
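
The practical difference between the two formulas can be checked numerically. Below is a minimal sketch using NumPy (it is not code from either project; the 20.0 threshold is the value quoted above from mshadow_op.h):

import numpy as np

def softrelu_thresholded(a):
    # MXNet-style soft ReLU: return a directly for large inputs
    # to avoid overflow of exp.
    a = np.float32(a)
    if a > np.float32(20.0):
        return a
    return np.float32(np.log1p(np.exp(a)))

def softrelu_direct(a):
    # Direct log(1 + exp(a)), the form used in the MKL-DNN reference test above.
    a = np.float32(a)
    return np.float32(np.log(np.float32(1.0) + np.exp(a)))

for x in (-5.0, 0.5, 10.0, 30.0, 100.0):
    print(x, softrelu_thresholded(x), softrelu_direct(x))

# At x = 100.0, exp(x) overflows float32 (NumPy prints an overflow warning),
# so the direct form yields inf while the thresholded form returns 100.0.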

@zheng-da
Contributor Author

But the error happens here: https://github.com/apache/incubator-mxnet/blob/master/tests/python/gpu/test_operator_gpu.py#L1111
Activation with sigmoid fails.
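
That check is essentially a comparison of the operator output against a reference implementation. A minimal illustrative sketch of such a comparison (the shapes and tolerances are assumptions, not values from test_operator_gpu.py, and the real test is stricter):

import numpy as np
import mxnet as mx

# Random inputs covering both tails of the sigmoid.
x = np.random.uniform(-5, 5, size=(2, 3, 4, 5)).astype(np.float32)

# Activation operator; on CPU this can dispatch to the MKLDNN implementation.
out = mx.nd.Activation(mx.nd.array(x), act_type='sigmoid').asnumpy()

# Plain NumPy reference sigmoid.
ref = 1.0 / (1.0 + np.exp(-x))

# Illustrative tolerances only; the actual test uses tighter ones.
assert np.allclose(out, ref, rtol=1e-3, atol=1e-3)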

@pengzhao-intel
Contributor

pengzhao-intel commented Mar 14, 2018

@zheng-da tests/python/unittest/test_loss.py fails too; it uses softrelu.
I will look at the case you pointed out.

======================================================================
FAIL: test_loss.test_bce_loss

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_loss.py", line 100, in test_bce_loss
    assert mod.score(data_iter, eval_metric=mx.metric.Loss())[0][1] < 0.01
AssertionError:

-------------------- >> begin captured logging << --------------------

@piiswrong
Contributor

@zheng-da @pengzhao-intel ping

@zheng-da
Contributor Author

@pengzhao-intel is there any update from the Intel MKLDNN team?

@pengzhao-intel
Contributor

@zheng-da @piiswrong sorry I missed the first ping. I will raise the priority for this issue and update you soon.

@pengzhao-intel
Contributor

@zheng-da @piiswrong Fixed the issue locally; a PR by @jinhuang415 is on the way.

@zheng-da zheng-da mentioned this pull request Mar 30, 2018
@zheng-da zheng-da closed this Mar 30, 2018