BatchNorm backward(train_mode=False) incorrect behavior on context mx.gpu() #18209

brian-mann-math opened this issue Apr 30, 2020 · 0 comments

Description

When training DeepDream-style CNN visualizations using VGG16_bn, I noticed that the results did not match those from PyTorch. Running an apples-to-apples comparison, I found that the problem does not occur with pretrained CNNs that have no BatchNorm layers.

Furthermore, after some experimenting with train and predict modes, I have narrowed the problem down to a bug in nn.BatchNorm's .backward(train_mode=False) when nn.BatchNorm is initialized on ctx=mx.gpu().

In the example below, you can see that BatchNorm works correctly on mx.cpu() but not on mx.gpu(). Comparing against PyTorch (on either GPU or CPU) gives results very close to my example run on CPU; a sketch of the PyTorch side of that comparison follows.
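For reference, here is a rough sketch of the PyTorch side of that comparison (not the exact script I ran; it assumes PyTorch is installed and uses BatchNorm1d in eval() mode as the closest analogue of backward(train_mode=False)):

import numpy as np
import torch

def test_torch(device):
    print("Device:", device)
    bn = torch.nn.BatchNorm1d(100).to(device)
    bn.eval()  # use running statistics, analogous to predict mode
    np.random.seed(42)
    x = torch.tensor(np.random.randn(16, 100), dtype=torch.float32,
                     device=device, requires_grad=True)
    y = bn(x).norm()
    y.backward()
    print(x.grad)  # close to the mx.cpu() train_mode=False gradient below

test_torch("cuda")
test_torch("cpu")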

Error Message

No error is generated; the gradient is simply computed incorrectly on GPU.

To Reproduce

Running MXNet version 1.6.0. Installed via pip install mxnet-cu101mkl.

import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.gluon import nn

import numpy as np

def test(ctx):
    print("Context:", ctx)
    bn = nn.BatchNorm()
    bn.initialize(ctx=ctx)
    # For apples-to-apples comparison with PyTorch
    # create a numpy array with a fixed random seed
    # and convert to ndarray or torch tensor as needed
    np.random.seed(42) 
    x = np.random.randn(16, 100)
    x = nd.array(x, ctx=ctx)
    
    x.attach_grad()
    with autograd.record():
        y = bn(x).norm()
    # Backward pass with the default train_mode=True
    y.backward()
    print('Gradient with .backward(train_mode=True):')
    print(x.grad)

    x.attach_grad()
    with autograd.record():
        y = bn(x).norm()
    # Backward pass in predict mode
    y.backward(train_mode=False)
    print('Gradient with .backward(train_mode=False):')
    print(x.grad)

test(mx.gpu())
print('\n')
test(mx.cpu())

Running this produces the following output:

Context: gpu(0)
Gradient with .backward(train_mode=True):

[[ 1.54331431e-07 -9.90738691e-09  2.17082999e-07 ...  3.70418292e-08
  -4.56034911e-07 -1.70796298e-07]
 [-5.31670935e-07 -6.50823111e-08 -2.83676826e-07 ...  1.12570149e-08
  -3.72994634e-07 -7.78542471e-07]
 [ 1.03611626e-07  1.26948521e-07  4.37038494e-07 ...  4.80843418e-08
   8.25267421e-07  4.09489786e-07]
 ...
 [-2.66233769e-07  2.10875697e-07 -1.98560244e-07 ... -2.59636550e-07
   1.43746172e-06 -4.55288415e-07]
 [-4.00260944e-07  1.21869405e-07  4.91964613e-07 ...  2.66210179e-07
   1.15469061e-06  3.84336460e-07]
 [ 2.54435122e-07 -9.07216915e-08 -5.26608744e-07 ...  4.71994440e-07
  -4.30146486e-07 -5.01510669e-07]]
<NDArray 16x100 @gpu(0)>
Gradient with .backward(train_mode=False):

[[ 1.5412704e-07 -9.9587867e-09  2.1744940e-07 ...  3.6937106e-08
  -4.5603929e-07 -1.7041563e-07]
 [-5.3151490e-07 -6.5216888e-08 -2.8525079e-07 ...  1.1193077e-08
  -3.7290255e-07 -7.7769141e-07]
 [ 1.0385542e-07  1.2743682e-07  4.3893598e-07 ...  4.8083859e-08
   8.2525179e-07  4.0822903e-07]
 ...
 [-2.6548534e-07  2.1083363e-07 -1.9969134e-07 ... -2.5873848e-07
   1.4370964e-06 -4.5389191e-07]
 [-3.9940915e-07  1.2190962e-07  4.9197621e-07 ...  2.6582453e-07
   1.1571157e-06  3.8428274e-07]
 [ 2.5489476e-07 -9.0713620e-08 -5.2661966e-07 ...  4.7046700e-07
  -4.3135498e-07 -5.0008794e-07]]
<NDArray 16x100 @gpu(0)>


Context: cpu(0)
Gradient with .backward(train_mode=True):

[[ 1.52795309e-07 -9.42009581e-09  2.20310852e-07 ...  3.75795253e-08
  -4.56473600e-07 -1.71663274e-07]
 [-5.25615860e-07 -6.48453096e-08 -2.87071714e-07 ...  1.17580106e-08
  -3.70792492e-07 -7.82021573e-07]
 [ 1.02882176e-07  1.28814335e-07  4.42847067e-07 ...  4.88764371e-08
   8.30220586e-07  4.10084482e-07]
 ...
 [-2.60770662e-07  2.13814275e-07 -2.01395267e-07 ... -2.60059522e-07
   1.44180660e-06 -4.55384509e-07]
 [-3.95230529e-07  1.24432887e-07  4.94030417e-07 ...  2.67437116e-07
   1.16999058e-06  3.86242363e-07]
 [ 2.52621589e-07 -9.11339484e-08 -5.34086894e-07 ...  4.72164828e-07
  -4.29882903e-07 -5.05452988e-07]]
<NDArray 16x100 @cpu(0)>
Gradient with .backward(train_mode=False):

[[ 0.01185271 -0.0011071   0.01301745 ...  0.0038546  -0.01176191
  -0.00832031]
 [-0.04086535 -0.00770687 -0.01701735 ...  0.00120027 -0.00958996
  -0.03797242]
 [ 0.00802236  0.01523095  0.02622019 ...  0.00499825  0.02128031
   0.01989006]
 ...
 [-0.0202621   0.02531023 -0.01193004 ... -0.02662995  0.03713372
  -0.02210556]
 [-0.03070656  0.01466694  0.02933322 ...  0.02728209  0.0299198
   0.01867896]
 [ 0.019618   -0.01075783 -0.03143681 ...  0.04828149 -0.01112048
  -0.02442675]]
<NDArray 16x100 @cpu(0)>
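
A quick way to check the discrepancy programmatically (same setup as the reproduction script above, assuming a GPU is available):

import numpy as np
import mxnet as mx
from mxnet import autograd, nd
from mxnet.gluon import nn

def grad_predict_mode(ctx):
    bn = nn.BatchNorm()
    bn.initialize(ctx=ctx)
    np.random.seed(42)
    x = nd.array(np.random.randn(16, 100), ctx=ctx)
    x.attach_grad()
    with autograd.record():
        y = bn(x).norm()
    y.backward(train_mode=False)
    return x.grad.asnumpy()

# The two gradients should agree up to float tolerance; on 1.6.0 the GPU
# result instead stays close to the train_mode=True gradient, so this prints False.
print(np.allclose(grad_predict_mode(mx.cpu()), grad_predict_mode(mx.gpu()),
                  rtol=1e-3, atol=1e-6))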

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

# paste outputs here

----------Python Info----------
Version : 3.6.5
Compiler : GCC 7.2.0
Build : ('default', 'Apr 29 2018 16:14:56')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 10.0.1
Directory : /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Num GPUs : 1
Commit Hash : 6eec9da
----------System Info----------
Platform : Linux-4.14.171-105.231.amzn1.x86_64-x86_64-with-glibc2.9
system : Linux
node : ip-172-16-165-164
release : 4.14.171-105.231.amzn1.x86_64
version : #1 SMP Thu Feb 27 23:49:15 UTC 2020
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 1812.110
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.14
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0018 sec, LOAD: 0.5285 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0005 sec, LOAD: 0.5373 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1000 sec, LOAD: 0.1116 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0091 sec, LOAD: 0.3228 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0350 sec, LOAD: 0.1672 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0486 sec, LOAD: 0.3785 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0022 sec, LOAD: 0.1064 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.002210855484008789 sec.
