
Flaky test: test_deformable_convolution_options #11173

Closed
ThomasDelteil opened this issue Jun 6, 2018 · 3 comments

@ThomasDelteil
Contributor

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/919/pipeline/

======================================================================
FAIL: test_operator_gpu.test_deformable_convolution_options
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\test_operator_gpu.py", line 1495, in test_deformable_convolution_options
    check_consistency(sym, ctx_list)
  File "C:\jenkins_slave\workspace\ut-python-gpu\pkg_vc14_gpu\python\mxnet\test_utils.py", line 1346, in check_consistency
    raise e
  File "C:\jenkins_slave\workspace\ut-python-gpu\pkg_vc14_gpu\python\mxnet\test_utils.py", line 1341, in check_consistency
    equal_nan=equal_nan)
  File "C:\jenkins_slave\workspace\ut-python-gpu\pkg_vc14_gpu\python\mxnet\test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1540.520020 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(0, 10, 0, 1), a=-2.890014, b=2.496659
 a: array([[[[ -0.35369632,  -1.48071766,  -1.82718635],
         [  1.31354153,   3.43152452,  -2.12554169],
         [ -3.23107505,  -1.07072127,   0.        ]],...
 b: array([[[[ -0.35369611,  -1.48071766,  -1.82718599],
         [  1.31354201,   3.43152165,  -2.12554169],
         [ -3.23107576,  -1.07072186,   0.        ]],...
-------------------- >> begin captured stdout << ---------------------
Train Err: ctx 1 vs ctx 0 at deformable_conv_offset
--------------------- >> end captured stdout << ----------------------
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1969284661 to reproduce.
--------------------- >> end captured logging << ---------------------
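
For context, the reported "Error 1540.52 exceeds tolerance rtol=0.001, atol=0.001" comes from an element-wise comparison between the float64 and float32 runs. A minimal numpy sketch of such an rtol/atol check (the helper name and the exact formula here are assumptions, not necessarily what mxnet.test_utils.assert_almost_equal does) looks like:

import numpy as np

def max_relative_violation(a, b, rtol=1e-3, atol=1e-3):
    # Sketch of an rtol/atol check: how many times |a - b| exceeds the
    # allowed tolerance atol + rtol * |b|, and where the worst element sits.
    violation = np.abs(a - b) / (atol + rtol * np.abs(b))
    idx = np.unravel_index(np.argmax(violation), violation.shape)
    return violation[idx], idx

a = np.array([-0.35369632, -1.48071766, -1.82718635])  # float64 context
b = np.array([-0.35369611, -1.48071766, -1.82718599])  # float32 context
err, loc = max_relative_violation(a, b)
print(err, loc)  # these printed values agree; the failure is in the offset gradient

If the reported number is this kind of ratio, an error of roughly 1540 is several orders of magnitude beyond ordinary float32 rounding noise rather than a marginal tolerance miss.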
@kalyc
Contributor

kalyc commented Jun 14, 2018

Thanks for submitting this issue @ThomasDelteil
This is the related PR - ce2bca6

@eric-haibin-lin
Member

eric-haibin-lin commented Jul 2, 2018

I'm able to reproduce the error with the following example in test_operator_gpu.py:

@with_seed()
def test_deformable_convolution_options():
    ctx_list = [{'ctx': mx.gpu(0),
                 'deformable_conv_data': (2, 2, 7, 7),
                 'deformable_conv_offset': (2, 18, 3, 3),
                 'type_dict': {'deformable_conv_data': np.float64, 'deformable_conv_offset': np.float64}},
                {'ctx': mx.gpu(0),
                 'deformable_conv_data': (2, 2, 7, 7),
                 'deformable_conv_offset': (2, 18, 3, 3),
                 'type_dict': {'deformable_conv_data': np.float32, 'deformable_conv_offset': np.float32}},
                ]
    sym = mx.sym.contrib.DeformableConvolution(no_bias=True, num_filter=3, kernel=(3,3), stride=(2,2), name='deformable_conv')

    import random
    i = 0  # any fixed seed; the original snippet left `i` undefined
    mx.random.seed(i)
    np.random.seed(i)
    random.seed(i)
    check_consistency(sym, ctx_list)
FAIL: Failure: AssertionError (
Items are not equal:
Error 808.527832 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(1, 13, 0, 2), a=-1.771303, b=-0.532353
 a: array([[[[-13.73718166,   1.72301412,  -5.12820005],
         [  0.        ,  18.53644371,   2.99046469],
         [  0.        ,   0.1865356 ,   2.20830846]],...
 b: array([[[[-13.73717976,   1.72301388,  -5.12820482],
         [  0.        ,  18.53644371,   2.99046636],
         [  0.        ,   0.18653677,   2.20830774]],...)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/ubuntu/v1.2/tests/python/gpu/test_operator_gpu.py", line 1562, in <module>
    test_deformable_convolution_options()
  File "/home/ubuntu/v1.2/tests/python/gpu/../unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/v1.2/tests/python/gpu/test_operator_gpu.py", line 1522, in test_deformable_convolution_options
    check_consistency(sym, ctx_list)
  File "/home/ubuntu/v1.2/python/mxnet/test_utils.py", line 1354, in check_consistency
    raise e
AssertionError:
Items are not equal:
Error 808.527832 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(1, 13, 0, 2), a=-1.771303, b=-0.532353
 a: array([[[[-13.73718166,   1.72301412,  -5.12820005],
         [  0.        ,  18.53644371,   2.99046469],
         [  0.        ,   0.1865356 ,   2.20830846]],...
 b: array([[[[-13.73717976,   1.72301388,  -5.12820482],
         [  0.        ,  18.53644371,   2.99046636],
         [  0.        ,   0.18653677,   2.20830774]],...
-------------------- >> begin captured logging << --------------------

It looks like the gradient for deformable_conv_offset doesn't pass the consistency check.

I can reproduce it consistently with the following command:

nosetests-2.7 tests/python/gpu/test_operator_gpu.py:test_deformable_convolution_options

@YuwenXiong have you seen this before? Any idea why this happens? With some other seeds, I sometimes see that a is positive while b is negative. I wonder if any code in the cuda kernel is disregarding the dtype of the inputs.
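
For what it's worth, a dtype-insensitive accumulation would be consistent with the occasional sign flips. Below is a hypothetical numpy illustration (not the actual CUDA kernel) of how accumulating many mixed-sign terms in float32 can land on the opposite side of zero from the float64 result when the true sum is close to zero:

import numpy as np

# Hypothetical illustration: sum ~1e5 mixed-sign terms whose true sum is near zero.
rng = np.random.RandomState(0)
terms = rng.uniform(-1.0, 1.0, size=100000)
terms -= terms.mean()  # push the exact sum close to zero

s32 = np.float32(0.0)
for t in terms.astype(np.float32):   # naive float32 accumulation
    s32 += t
s64 = terms.astype(np.float64).sum() # float64 accumulation

print(s32, s64)  # both are tiny, but they can differ in magnitude and even in sign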

@marcoabreu
Contributor

Closing as of #11595
