MXNET_MKLDNN_DEBUG=1 produces errors #10026
Comments
@marcoabreu @cjolivier01 This is a very useful function for verifying the correctness of MKL-DNN ops.
1. Numerical precision for convolution @cjolivier01
2. The After I changed these two, all cases passed.
3. The bug in |
@marcoabreu We found the root cause of 3) in the comments above. It is not an issue in the MKL-DNN implementation; we just need to improve the test method under MXNET_MKLDNN_DEBUG. @cjolivier01 Thanks for the nice functionality for checking the results of MKL-DNN. So, a possible fix for OpCheck.Init. @zheng-da, please help review.
|
When I wrote the test, I followed the Python test: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L470. When assert_almost_equal is called (https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L1313), it uses 1e-3 for both rtol and atol. I don't know why the test fails. |
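For readers unfamiliar with how rtol and atol combine, here is a minimal pure-Python sketch of the usual rule, the same one `numpy.allclose` documents: `|a - b| <= atol + rtol * |b|`. The helper names `almost_equal` and `max_violation` are hypothetical and are not taken from MXNet's `test_utils.py`; the 1e-3 defaults only mirror the values mentioned above.

```python
def almost_equal(actual, expected, rtol=1e-3, atol=1e-3):
    """Return True if every pair satisfies |a - b| <= atol + rtol * |b|.

    Hypothetical helper for illustration; not MXNet's implementation.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - b) <= atol + rtol * abs(b)
               for a, b in zip(actual, expected))


def max_violation(actual, expected, rtol=1e-3, atol=1e-3):
    """Largest ratio of the error to the allowed error; > 1 means failure."""
    return max(abs(a - b) / (atol + rtol * abs(b))
               for a, b in zip(actual, expected))


# Small differences relative to the magnitude pass...
print(almost_equal([1.0, 100.0], [1.0015, 100.05]))
# ...while a 1% difference near 1.0 exceeds the allowed error.
print(almost_equal([1.0], [1.01]))
print(max_violation([1.0], [1.01]))
```

Because the allowed error grows with `|b|`, a fixed pair of tolerances can be loose for large outputs and strict for values near zero, which is one reason two numerically different convolution implementations may disagree under a single setting.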
As for modifying OpCheck.Init, you can do
Please see the example here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_fully_connected.cc#L95 |
@zheng-da I have looked into Reorder2Default, but it will also convert to the original shape rather than the new shape of 'reshape'. |
@pengzhao-intel I believe that when you call copyfrom, it converts the input memory into the same shape as the target. So if you call Reorder2Default and then copyfrom, the MKL-DNN memory will have the new shape.
|
The above PR addresses this issue. @marcoabreu can you close it? |
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9995/32/pipeline/483
Setting `MXNET_MKLDNN_DEBUG=1` as an environment variable produces the following error in tests. This happens across all configurations and seeds. I do not think that this is a test failure.
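To reproduce this locally, the variable has to be present in the environment of the process that runs the tests. The following is a minimal sketch, assuming a Python 3.7+ interpreter; `run_with_mkldnn_debug` is a hypothetical helper, and the command passed to it here only echoes the variable rather than running the actual MXNet test suite.

```python
import os
import subprocess
import sys


def run_with_mkldnn_debug(cmd):
    """Run cmd in a child process with MXNET_MKLDNN_DEBUG=1 set.

    Hypothetical helper: the parent environment is copied, not modified.
    """
    env = dict(os.environ, MXNET_MKLDNN_DEBUG="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in command that only prints the variable seen by the child process.
result = run_with_mkldnn_debug(
    [sys.executable, "-c",
     "import os; print(os.environ['MXNET_MKLDNN_DEBUG'])"]
)
print(result.stdout.strip())
```

In practice the stand-in command would be replaced by whatever test runner invocation the project uses; that invocation is not specified in this thread.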