import time

import mxnet as mx
from mxnet import autograd, gluon, init, np, npx
from mxnet.gluon import nn

npx.set_np()

# 32 samples of shape 100 x 100; BatchNorm normalizes over the last axis.
data = np.random.uniform(size=(32, 100, 100), ctx=mx.gpu())
label = np.ones((32, 100, 100), ctx=mx.gpu())

net = nn.Sequential()
net.add(nn.BatchNorm(axis=-1))  # compare against axis=1, see Description below
net.initialize(init.Xavier(), ctx=mx.gpu())
loss = gluon.loss.L2Loss()

t = time.time()
for _ in range(5000):
    with autograd.record():
        l = loss(net(data), label)
    l.backward()
mx.nd.waitall()  # block until all asynchronous GPU work completes
print('spent: {}s'.format(time.time() - t))
Description
MXNet version: static build of the v1.7.x branch at commit 75ab155.
Running the script above on a P3.8xlarge (V100), I got around 5 s with axis=1 but around 30 s with axis=-1.
Both cases normalize each of the 100 channels over the same 32 * 100 values, so the amount of computation is identical.
This looks similar to #10095.
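Until the faster kernel is available, one possible workaround (a sketch of mine, not from the original report) is to route the computation through the fast axis=1 path by transposing around the layer. The wrapper block and its name below are illustrative assumptions, and the two extra swapaxes copies only pay off if the axis=1 kernel's speedup outweighs them:

import mxnet as mx
from mxnet import np, npx
from mxnet.gluon import nn

npx.set_np()

class LastAxisBatchNorm(nn.Block):
    # Hypothetical wrapper: normalize the last axis of a 3-D input
    # via the fast axis=1 BatchNorm kernel.
    def __init__(self, **kwargs):
        super(LastAxisBatchNorm, self).__init__(**kwargs)
        with self.name_scope():
            self.bn = nn.BatchNorm(axis=1)

    def forward(self, x):
        y = np.swapaxes(x, 1, 2)   # (N, H, C) -> (N, C, H): normalized axis to 1
        y = self.bn(y)             # fast cuDNN path (channels at axis 1)
        return np.swapaxes(y, 1, 2)  # restore the original layout

net = LastAxisBatchNorm()
net.initialize(ctx=mx.gpu())
out = net(np.ones((32, 100, 100), ctx=mx.gpu()))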
Solution
Thanks to @ptrendx for pointing out that cuDNN 7.4 (https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_7xx.html#rel_741) added a new cudnnBatchNormalization*Ex API that is much faster for axis = -1.
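To check whether a given installation already links a cuDNN new enough to provide that API, one can query the library version directly. This is a hedged sketch using ctypes; the shared-library soname is an assumption about the install:

import ctypes

# Load the cuDNN shared library directly (the soname 'libcudnn.so.7' is an
# assumption; adjust for your install, e.g. 'libcudnn.so' or a full path).
libcudnn = ctypes.CDLL('libcudnn.so.7')
libcudnn.cudnnGetVersion.restype = ctypes.c_size_t

version = libcudnn.cudnnGetVersion()  # encoded as 1000*major + 100*minor + patch
print('cuDNN version:', version)      # e.g. 7401 for cuDNN 7.4.1
print('cudnnBatchNormalization*Ex available:', version >= 7401)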