CMake error with "CUDA_cublas_device_LIBRARY" #1457

Closed
harumo11 opened this issue Sep 21, 2018 · 15 comments

@harumo11
Contributor

Hello,
Thanks for the great library.

I'm trying to install DyNet for C++,
but I get error messages when I run the following command.

cmake .. -DEIGEN3_INCLUDE_DIR=/home/robot/opt/eigen3 -DENABLE_CPP_EXAMPLES=ON -DBACKEND=cuda -DCUDNN_ROOT=/usr/local/cuda 

I get the following error messages.

-- BACKEND: cuda
CUDA_LIBRARIES: /usr/local/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so;/usr/local/cuda/lib64/libcurand.so
-- Found CUDNN (include: /usr/local/cuda/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Successfully include CUDNN flags
-- Eigen dir is /home/robot/opt/eigen3
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
    linked by target "dynet" in directory /home/robot/opt/dynet/dynet
    linked by target "1_linear_regression" in directory /home/robot/opt/dynet/tutorial
    linked by target "0_multiply" in directory /home/robot/opt/dynet/tutorial
    linked by target "xor-multidevice" in directory /home/robot/opt/dynet/examples
    linked by target "xor" in directory /home/robot/opt/dynet/examples
    linked by target "rnnlm-aevb" in directory /home/robot/opt/dynet/examples
    linked by target "fflm" in directory /home/robot/opt/dynet/examples
    linked by target "xor-batch" in directory /home/robot/opt/dynet/examples
    linked by target "imdb" in directory /home/robot/opt/dynet/examples
    linked by target "tok-embed" in directory /home/robot/opt/dynet/examples
    linked by target "rnnlm-batch" in directory /home/robot/opt/dynet/examples
    linked by target "embed-cl" in directory /home/robot/opt/dynet/examples
    linked by target "xor-autobatch" in directory /home/robot/opt/dynet/examples
    linked by target "mnist" in directory /home/robot/opt/dynet/examples
    linked by target "rnnlm-batch-nce" in directory /home/robot/opt/dynet/examples
    linked by target "rnn-autobatch" in directory /home/robot/opt/dynet/examples
    linked by target "poisson-regression" in directory /home/robot/opt/dynet/examples
    linked by target "read-write" in directory /home/robot/opt/dynet/examples
    linked by target "rnnlm" in directory /home/robot/opt/dynet/examples
    linked by target "attention" in directory /home/robot/opt/dynet/examples
    linked by target "encdec" in directory /home/robot/opt/dynet/examples
    linked by target "tag-bilstm" in directory /home/robot/opt/dynet/examples
    linked by target "rnnlm-cfsm" in directory /home/robot/opt/dynet/examples

-- Configuring incomplete, errors occurred!
See also "/home/robot/opt/dynet/build/CMakeFiles/CMakeOutput.log".
See also "/home/robot/opt/dynet/build/CMakeFiles/CMakeError.log".

Do I need additional NVIDIA libraries?
I have already installed cuDNN and the CUDA toolkit.

By the way, my NVIDIA environment is as follows.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
| 23%   28C    P8    12W / 250W |    287MiB / 11175MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1061      G   /usr/lib/xorg/Xorg                           180MiB |
|    0      1282      G   /usr/bin/gnome-shell                          98MiB |
|    0      1940      G   /usr/lib/firefox/firefox                       2MiB |
|    0      2191      G   /usr/lib/firefox/firefox                       2MiB |
+-----------------------------------------------------------------------------+

Please tell me how to solve this problem.
Thanks in advance.

@pmichel31415
Collaborator

Hi, can you give your CUDA and compiler version, as well as the output of a clean cmake (remove the build dir, recreate it and rerun cmake)?
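
The "clean cmake" requested here (remove the build directory, recreate it, rerun cmake) can be scripted. The following is only a minimal sketch, assuming the build directory sits directly inside the source tree so that `cmake ..` points at the sources, as in the commands quoted in this issue. The helper names `cmake_command` and `clean_configure` are hypothetical and not part of DyNet or CMake.

```python
import os
import shutil
import subprocess


def cmake_command(source_dir, options):
    """Build the cmake argv; each option becomes a -DNAME=VALUE argument."""
    args = ["cmake", source_dir]
    args += ["-D%s=%s" % (name, value) for name, value in options.items()]
    return args


def clean_configure(build_dir, options, run=subprocess.check_call):
    """Delete build_dir, recreate it empty, then run `cmake ..` inside it.

    `run` is injectable so the function can be exercised without cmake installed.
    """
    shutil.rmtree(build_dir, ignore_errors=True)  # drops CMakeCache.txt and friends
    os.makedirs(build_dir)
    run(cmake_command("..", options), cwd=build_dir)
```

For example, `clean_configure("/home/robot/opt/dynet/build", {"BACKEND": "cuda"})` would wipe the cached configuration before reconfiguring, so a stale `CMakeCache.txt` cannot mask a change of CMake or CUDA version.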

@pmichel31415
Collaborator

pmichel31415 commented Sep 21, 2018

Please also tell us the version of DyNet (exact commit) you're trying to install.

@harumo11
Contributor Author

Hi @pmichel31415,
Thanks for replying.
My CUDA version and other information are below.

  • GCC version
    7.3.0
  • CUDA toolkit version
    10.0
  • cuDNN version
    v7.3.0
  • dynet version
    Sorry, I don't know how to check the dynet version.
    I cloned dynet from GitHub last Friday (21 Sep. 2018),
    and git log shows the following information.
 git log
---------------------------------------------------------------------------------------
 commit c93962933803039cc59bc3cb21277048def086bc (HEAD -> master, origin/master,     origin/HEAD)
 Merge: d9d3e800 bac99679
Author: Paul Michel <pmichel31415@gmail.Com>
Date:   Tue Sep 18 23:23:07 2018 -0400

   Merge pull request #1452 from clab/fix-block-dropout
   
   Fix block dropout on GPU

commit bac99679eb6f9b963e186d8fe336cfe214cba7d4 (origin/fix-block-dropout)
Author: Paul Michel <pmichel31415@gmail.com>
Date:   Tue Sep 18 16:33:13 2018 -0400

   Fix block dropout on GPU

commit d9d3e8004771746fb6c8a491c087632b7018c87f (tag: 2.1)
Author: Brian Lester <blester125@users.noreply.github.com>
Date:   Tue Sep 18 10:51:16 2018 -0400

   Fix glorot initialization for convolutional kernels (#1420)
   
   * fix glorot initialization for convs
   
cat /proc/driver/nvidia/version
---------------------------------------------------------------------------------------------------------
NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.48  Thu Sep  6 06:36:33 CDT 2018
GCC version:  gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3) 

I deleted the build directory and rebuilt,
but I got the same errors.

@harumo11
Contributor Author

My OS is Ubuntu 18.04.

@codelilei

I hit the same problem when compiling Caffe with the latest CUDA 10.0; upgrading CMake from 3.12.1 to 3.12.2 fixed it, @harumo11.

@pmichel31415
Collaborator

Yes, I suspect this is an issue with CUDA 10.0. @harumo11, if you can confirm that @codelilei's fix works for you on DyNet, let us know so we can update the documentation.

@texttheater

Confirmed: I had the same problem (Ubuntu 18.04, CUDA 10.0) and installing CMake 3.12.2 (instead of the distro's 3.10.2) fixed it.
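
The reports in this thread (CMake 3.10.2 and 3.12.1 fail with CUDA 10.0, 3.12.2 works) can be turned into a quick pre-flight check. This is a minimal sketch, assuming `cmake --version` prints a line of the form `cmake version X.Y.Z`. The function names and the `MIN_CMAKE_FOR_CUDA10` constant are illustrative, and the threshold comes only from the reports above.

```python
import re
import subprocess

# Threshold taken from the reports in this thread (3.12.1 fails, 3.12.2 works).
MIN_CMAKE_FOR_CUDA10 = (3, 12, 2)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no CMake version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=MIN_CMAKE_FOR_CUDA10):
    """Tuples compare element by element, so (3, 12, 2) > (3, 10, 2)."""
    return version >= minimum


def installed_cmake_version():
    """Query the cmake found on PATH (raises if cmake is not installed)."""
    output = subprocess.check_output(["cmake", "--version"]).decode()
    return parse_cmake_version(output)
```

Calling `is_new_enough(installed_cmake_version())` before configuring a CUDA 10.0 build would flag the distro-packaged 3.10.2 mentioned above.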

@harumo11
Contributor Author

Thanks @texttheater

I installed CMake 3.12.2, and the error no longer appears.
But I got a different error instead, as shown below.

$ cmake .. -DEIGEN3_INCLUDE_DIR=/home/robot/opt/eigen3 -DENABLE_CPP_EXAMPLES=ON -DBACKEND=cuda -DCUDNN_ROOT=/usr/local/cuda 
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- BACKEND: cuda
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "10.0") 
CUDA_LIBRARIES: /usr/local/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so;/usr/local/cuda/lib64/libcurand.so
-- Found CUDNN (include: /usr/local/cuda/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Successfully include CUDNN flags
-- Eigen dir is /home/robot/opt/eigen3
-- Configuring done
CMake Warning at /usr/local/share/cmake-3.12/Modules/FindCUDA.cmake:1816 (add_library):
  Cannot generate a safe runtime search path for target dynet because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  dynet/CMakeLists.txt:287 (cuda_add_library)


CMake Warning at tutorial/CMakeLists.txt:4 (add_executable):
  Cannot generate a safe runtime search path for target 1_linear_regression
  because files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.


CMake Warning at tutorial/CMakeLists.txt:4 (add_executable):
  Cannot generate a safe runtime search path for target 0_multiply because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target xor-multidevice
  because files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:44 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target xor because files in
  some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:43 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target rnnlm-aevb because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:41 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target fflm because files in
  some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:26 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target xor-batch because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:24 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target imdb because files in
  some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:25 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target tok-embed because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:42 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target rnnlm-batch because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:23 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target embed-cl because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:34 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target xor-autobatch because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:22 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target mnist because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:27 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target rnnlm-batch-nce
  because files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:28 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target rnn-autobatch because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:21 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target poisson-regression
  because files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:29 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target read-write because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:30 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target rnnlm because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:31 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target attention because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:35 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target encdec because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:36 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target tag-bilstm because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:38 (ADD_EXAMPLE)


CMake Warning at examples/CMakeLists.txt:6 (add_executable):
  Cannot generate a safe runtime search path for target rnnlm-cfsm because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda/lib64

  Some of these libraries may not be found correctly.
Call Stack (most recent call first):
  examples/CMakeLists.txt:37 (ADD_EXAMPLE)


-- Generating done
-- Build files have been written to: /home/robot/opt/dynet/build

I also tried the following command.

cmake .. -DEIGEN3_INCLUDE_DIR=/home/robot/opt/eigen3 -DENABLE_CPP_EXAMPLES=ON

The command gave me no errors, although it falls back to the default backend (Eigen).

Should I reinstall cuDNN?

@harumo11
Contributor Author

The CUDA_cublas_device_LIBRARY (ADVANCED) error is unrelated to the "runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:" warning.

Sorry, I solved the "runtime library [libcudnn.so.7] in /usr/lib/x86_64-linux-gnu may be hidden by files in:" warning with the help of this thread.
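
The comment does not say what the fix was. The warning itself points at copies of libcudnn in two directories, so a first diagnostic step could be to list where copies actually exist. This is a minimal sketch, assuming the two directories named in the warning; `find_library_copies` is a hypothetical helper, not a CMake or cuDNN tool.

```python
import glob
import os


def find_library_copies(name_prefix, directories):
    """Return {directory: [file names]} for each directory holding a matching file."""
    copies = {}
    for directory in directories:
        pattern = os.path.join(directory, name_prefix + "*")
        names = sorted(os.path.basename(path) for path in glob.glob(pattern))
        if names:
            copies[directory] = names
    return copies


# Directories named in the CMake warning quoted in this issue.
WARNING_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]

for directory, names in find_library_copies("libcudnn.so", WARNING_DIRS).items():
    print(directory, names)
```

If both directories show up in the output, two cuDNN installs are visible to the linker, which matches the situation the warning describes.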

@harumo11
Contributor Author

harumo11 commented Sep 26, 2018

I solved this CMake error.
Thanks a lot to @pmichel31415, @texttheater and @codelilei.

robot@robot-NG:~/opt/dynet/build$ cmake .. -DEIGEN3_INCLUDE_DIR=/home/robot/opt/eigen3 -DENABLE_CPP_EXAMPLES=ON -DBACKEND=cuda -DCUDNN_ROOT=/usr/local/cuda/
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- BACKEND: cuda
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "10.0") 
CUDA_LIBRARIES: /usr/local/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so;/usr/local/cuda/lib64/libcurand.so
-- Found CUDNN (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Successfully include CUDNN flags
-- Eigen dir is /home/robot/opt/eigen3
-- Configuring done
-- Generating done
-- Build files have been written to: /home/robot/opt/dynet/build

Therefore I will close this issue.
Thanks again!

KellenSunderland added a commit to KellenSunderland/incubator-mxnet that referenced this issue Nov 24, 2018
This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.
KellenSunderland added a commit to apache/mxnet that referenced this issue Jan 16, 2019
)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
marcoabreu pushed a commit to apache/mxnet that referenced this issue Jan 17, 2019
…ugs. (#13897)
ashokei added a commit to NervanaSystems/ngraph-mxnet that referenced this issue Feb 12, 2019
* fix link for gluon model zoo (#13583)

* Fix exception handling api doc (#13519)

* Fix exception handling api doc

* Update waitall api doc

Co-Authored-By: anirudh2290 <anirudh2290@apache.org>

* add cpp example inception to nightly test (#13534)

* add inception test

* fix max iter for mlp

* rename and add comment

* rename epoch num

* Add notes about debug with libstdc++ symbols (#13533)

* Add imresize and copyMakeBorder to mx.image (#13357)

* Add imresize API to docs

* address comments

* copyMakeBorder

* [MXNET-1253] fix control_flow_op (#13555)

* fix control_flow_op

* change type for M

* add test for sparse where op

* Add Intel MKL blas to Jenkins (#13607)

* add mkl blas to Jenkins

* add mkl install script

* fix bug in mkl script

* remove python2 ut and add cpu-mkl node

*  #13385 [Clojure] - Turn examples into integration tests (#13554)

* fix the Float not showing correctly problem (#13617)

Merge this PR for 1.4.x

* [MXNET-1155] Add scala packageTest utility (#13046)

* [MXNET-1155] Add scala packageTest utility

* Clean up utility

* Safe change directory in Makefile for scala

* mvn install file instructions with details

* [MXNET-1224]: improve scala maven jni build and packing. (#13493)

Major JNI feature changes. Please find more info here: https://cwiki.apache.org/confluence/display/MXNET/Scala+maven+build+improvement

* [MXNET-1225] Always use config.mk in make install instructions (#13364)

* Always use config.mk in make install instructions

* Specify Cuda 0 for ubuntu with mkldnn

* Scala install doc avoid build_from_source

Minor doc fixes

* Fix build_from_source CMake usage

* CPP Install Instruction with CMake

* Use cmake out of source build

* Fix warning in waitall doc (#13618)

* Optimize C++ API (#13496)

* Optimize C++ API

Pass parameter with reference instead of value.
Add const as well as it is not changed.

* fix docs/architecture/overview.md

Fix BinaryShapeFunction typedef
Add a right brace for SmoothL1Shape_

* fix quantize pass error when the quantization supported Op are excluded in the model (#13596)

* Scripts for building dependency libraries of MXNet (#13282)

* openblas script

* ps-lite dependencies

* USE_S3 dependencies

* image libraries

* license

* add batch norm test (#13625)

* add batch norm test

* fix formatting

* use out_arr as input

* fix typo

* remove const

* use ptr

* eval ptr

* Set install path for libmxnet.so dynamic lib on Mac OS (#13629)

* Fix the bug of BidirectionalCell (#13575)

* Fix the bug of BidirectionalCell

I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length"  when call unroll( )  to compute r_outputs and r_states.

* add a test for BidirectionalCell

* Fix the bug of BidirectionalCell

I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length"  when call unroll( )  to compute r_outputs and r_states.

* fix test_bidirectional_unroll_valid_length( )

Fix the error of parameter.

* Fix the bug of BidirectionalCell

I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length"  when call unroll( )  to compute r_outputs and r_states.

* fix test_bidirectional_unroll_valid_length( )

* Feature/mkldnn static (#13628)

* Revert "Revert "Feature/mkldnn static 2 (#13503)" (#13540)"

This reverts commit a3eca5f5c96eed0bc29bd4e58e470997091a1fb3.

* include headers on mkldnn lib

* retrigger

* retrigger

* build config for maven and pip (#13556)

* config for pip

* symbol whitelist

* maven build config

* Fix for import mxnet taking long time if multiple process launched (#13602)

* https://github.com/apache/incubator-mxnet/issues/12255
doing import mxnet in multiple processes take very long time.
Details : #12255
One of the reason we have OMP tuning code which iterates to find OMP
tune overhead. We are reducing this iteration count to reduce the
overehead of tuning code.
Also, We added an environment variable where users can set the number
of cores that should be used to determine tuning.

* cpplint fix

* Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc

* fixing formatting in doc

* Add reshape op supported by MKL-DNN (#12980)

* Add reshape op supported by MKL-DNN

* fix build issue

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix white space

* add unit test

* merge if blocks

* Improve dev_menu usability, local build and virtualenv (#13529)

* Improve dev_menu, add build command and virtualenv creation with local builds for easy testing

* Update dev_menu.py

Co-Authored-By: larroy <pedro.larroy.lists@gmail.com>

* Cuda off by default, use ccache

* address CR

* [Clojure] Correct the versions in the README so they correspond to the latest maven.org release (#13507)

* Correct the versions so they correspond to the latest maven.org release

* trigger build

* feedback from @kohr-h

* Optimization of metric evaluation (#13471)

* Change argsort to argpartition

* Global statistics in metrics

* Fix lint

* Fixes from review

* Trigger

* Fixes from review, fix to F1, MCC and perplexity metrics,
added test for global stats

* Fix lint

* Fix compatibility with Python 2

* Revert "Feature/mkldnn static (#13628)" (#13638)

This reverts commit 5bcf2bd6e8b48fa27bfcfdafd06401ec2d28978b.

* support mkl log when dtype is fp32 or fp64 (#13150)

* support mkl log when dtype is fp32 or fp64

* remove macro

* ensure data size less than or equal MKL_INT_MAX

* code specification

* fix indent

* for retrigger

* [MXNET-1209] Tutorial transpose reshape  (#13208)

* transpose tutorial

* Adding Anirudhs comments

* Update tutorial with some more examples

* Adding links

* Fixing the links, adding more examples

* Update reshape_transpose.md

* Fixing spelling mistakes

* Updating image resolution

* Adding Simon's comments

* Small fixes

* Update reshape_transpose.md

* Update reshape_transpose.md

* empty commit

* empty commit

* updated reference to Apache MXNet (#13645)

* Complimentary gluon DataLoader improvements (#13606)

* init

* add tests

* doc

* lint

* fix openmp

* Improve CCache handling (#13456)

* Remove gitignore entries

* Modify Makefile

* Modify user permissions

* Add new ccache wrapper function

* Change PATH rewrite to a different one to resolve CUDA issues

* Add ccache to gpu cmake

* Enable ccache for every build

* Set permissions for arm dockerfiles

* Disable ccache for ASAN

* Remove g++-8 ccache redirect

* Update Android Dockerfiles for user permissions

* Fix ASAN compiler typo

* Remove sanity for speed

* Move build dir creation in android armv8

* Revert "Remove sanity for speed"

This reverts commit e8386a774dafe96337930b9cac36cb24fc36585e.

* Add ccache for NVCC in Makefile

* [MXNET-918] Random module (#13039)

* introduce random API

* revert useless changes

* shorter types in APIDoc gen code

* fix after merge from master

* Trigger CI

* temp code / diag on CI

* cleanup type-class code

* cleanup type-class code

* fix scalastyle

* Fix incorrect delete in MXExecutorReshape exception handling (#13376)

* Fix bad delete.

Delete the pointed-to handle on cleanup, not the location of the handle itself. Also don't delete it if we didn't set it in the first place.

* Remove unused 'exec' var from MXExecutorBindEX.

* [MXNET-1251] Basic configuration to do static-linking (#13621)

* Basic configuration to do static-linking

* update build script and place it in the install part

* clean up the code further

* revert maven into build-from-source

* add curl to deps

* [MXNET-1195] Cleanup Scala README file (#13582)

* Updated the Scala-Readme with up-to-date information

* Updated the header

* Removed redundant build status

* Minor formatting changes

* Addressed the PR feedback

* Added section on Scala training APIs

* Removed mention of deprecated Model API

* scripts for building libmxnet binary and wheel (#13648)

* add script for making all dependencies

* tools for building pip package

* build scripts for lib and wheel

* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API (#13294)

* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API

* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API

* Updated the code to address the review comments.

* Added the README file for the folder.

* Addressed the review comments

* Addressed the review comments to use argmax and default mean values.

* Update MKLDNN_README.md (#13653)

* Support Quantized Fully Connected by INT8 GEMM (#12922)

* add quantized fully connect support

* disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library

* retrigger to ci testing

* move implementation to cc file and add  STORAGE_TYPE_ASSIGN_CHECK

* fix typo bug

* retrigger the ci test

* fix typo bug

* retrigger ci

* retrigger the ci test

* retrigger the ci

* retrigger the ci test

* retrigger ci test

* fix indent issue

* retrigger the ci

* retrigger the ci test

* add verbose message

* update log message

* using range for loop

* using for auto range

* enable MKL BLAS ci test

* fix typo issue

* use TYPE_ASSIGN_CHECK

* retrigger the ci

* add build fix for Scala/Java build (#13655)

* Fix Jetson compilation (#13532)

* remove omp which can cause ssd accuracy variance (#13622)

* Revert "[MXNET-43] Fix Jetson compilation" (#13665)

* Revert "remove omp which can cause ssd accuracy variance (#13622)"

This reverts commit 655f1c6f7a0706dd622f73db9af2e6df895ca213.

* Revert "Fix Jetson compilation (#13532)"

This reverts commit 48e25c4cae355753dd96ea7afe004bf78e0719e4.

* Fix Jetson compilation (#13666)

* turn on Sphinx warnings as errors (#13544)

* turn on warnings as errors

* move warnings as error logic to build_all_version

* fix typo in comment

* add warning as error option for docs pipeline

* bump ci to test again; use this chance to add notes on this feature

* fix bugs in image.py docs

* Update CODEOWNERS, add Pedro Larroy. (#13579)

* Revert "Revert "[MXNET-43] Fix Jetson compilation" (#13665)" (#13672)

This reverts commit 3433776dac7be75928082bbc1d552fca248fb8e8.

* Accelerate DGL csr neighbor sampling (#13588)

* Speedup and fix bug in dgl_csr_sampling op

* Update dgl_graph.cc

* simplify functions.

* avoid adding nodes in the last level in the queue.

* remove a hashtable lookup in neigh_pos.

* reduce a hashtable lookup in sub_ver_mp.

* merge copying vids and layers.

* reduce hashtable lookup when writing to output csr.

* fix a bug.

* limit the number of sampled vertices.

* fix lint.

* fix a compile error.

* fix compile error.

* fix compile.

* remove one hashtable lookup per vertex and hashtable iteration.

* remove queue.

* use vector for neigh_pos.

* fix lint

* avoid init output arrays.

* fix tests.

* fix tests.

* update docs.

* retrigger

* retrigger

* [MXNET-1252][1 of 2] Decouple NNVM to ONNX from NNVM to TensorRT conversion (#13659)

* fix unpicklable transform_first on windows (#13686)

* Move the debug output message into MXNET_MKLDNN_DEBUG (#13662)

* NEWS.md backport from v1.4.x to master (#13693)

* merge NEWS.md from 1.4.x to master

* NEWS.md backport from v1.4.x to master

* Fallback to dense version for grad(reshape), grad(expand_dims) (#13599)

* fallback to dense version for grad(reshape), grad(expand_dims)

* add _backward_reshape gpu version

* reshape test case comments

* fix gpu test

* remove mkldnn support for _backward_reshape

* ONNX export: Add Flatten before Gemm (#13356)

* Add Flatten before Gemm

* ONNX export test: Allow multiple inputs in forward pass

* ONNX export: Test for fully connected

* [MXNET-1164] Generate the document for cpp-package using Doxygen (#12977)

* Adding cpp-package directory to the Doxyfile. Updating the index.md file in c++ api directory.

* Updating the link to classes in C++ API to point to correct html file.

* Updated the links to use relative paths.

* Removed the extra slash character in the url

* Excluded the 3rdparty folder as per the review comment.

* Update git clone location to apache github (#13706)

* Add timeout/retry logic to docker cache download (#13573)

* Added timeout/retry (linear backoff) to docker cache download

* Units changed, as time.sleep takes seconds as argument

* Improved error handling

* Using retry decorator

* Added retry decorator to _login_dockerhub method

* Fixed wrong import
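
The retry-with-linear-backoff idea in the commits above can be sketched as a small decorator. This is a minimal illustration only, not the code from the PR: the names `retry`, `base_delay_s` and `flaky_download` are hypothetical, and the delay is given in seconds because `time.sleep` takes seconds.

```python
import functools
import time


def retry(attempts, base_delay_s, exceptions=(Exception,)):
    """Retry the wrapped call, sleeping base_delay_s * attempt between tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Linear backoff: wait grows by base_delay_s each attempt.
                    time.sleep(base_delay_s * attempt)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(attempts=3, base_delay_s=0)
def flaky_download():
    """Hypothetical stand-in for a download that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("transient failure")
    return "ok"


result = flaky_download()
```

With `base_delay_s=0` the example finishes immediately; a real download would use a positive delay so that the waits grow linearly with each failed attempt.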

* Fix NDArray ToDLPack Bug (#13698)

* Added javadocs and improved example instructions (#13711)

* Rearrange tests written only for update_on_kvstore = True (#13514)

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* test

* Update mshadow to support batch_dot with fp16. (#13716)

* fp16 dot

* update mshadow

* update mshadow

* update mshadow

* Fix the quantization script to support Python2 (#13700)

* fix the quantization script to support python2

* Fix comments, fix similar issue in imagenet_inference.py

* ONNX test code cleanup (#13553)

* ONNX test code cleanup

* Make tests use the common test case list

* Remove import test_cases

* Make Gluon backend rep common

* Partially enable broadcast tests

* Common function to populate tests

* Make backend common

* test models

* Test nodes

* ONNX export: Test for fully connected

* Edit CI scripts mxnet export test cleanup

* Further cleanup backend tests

* README

* Some corrections

* test case format for test_models

* update social media section (#13705)

* script for installing gpu libraries and build tools (#13646)

* Port of scala infer package to clojure (#13595)

* Port of scala infer package to clojure

* Add inference examples

* Fix project.clj

* Update code for integration tests

* Address comments and add unit tests

* Add specs and simplify interface

* Minor nit

* Update README

* update code owner (#13737)

* AdamW operator (Fixing Weight Decay Regularization in Adam) (#13728)

* tests

* remove optimizer and move op to contrib

* rename parameter

* ONNX import/export: Add missing tests, ONNX export: LogSoftMax (#13654)

* Logsoftmax, missing tests

* Support multiple outputs in Gluon backendrep

* Remove repeated unsqueeze test

* Allow multiple output support

* ONNX test code cleanup - part 2 (#13738)

* Common test caller

* Remove incorrect comment

* Make corrections to CI

* fix ci script

* Update basic_layers.py (#13732)

* ONNX import: Hardmax (#13717)

* ONNX import: Hardmax

* Fix lint errors

* add github link for issue with reshape

* gluon docfix (#13631)

* Fixes for trainer with update_on_kvstore=False (#13721)

* add clarification for param_dict

* more tests for dist kvstore

* more unittests

* fix a bug

* more dist exception test

* revert optimizer list

* fix bug and comment

* fix doc rendering and lint

* add invalid sched test

* fix website

* trigger

* update doc

* Reorder module import orders for dist-kvstore (#13742)

* Reorder module import orders for dist-kvstore

* more code comments

* CMake: Enable installation of cpp-package headers (#13339)

* Allow CMake based installation of cpp-package

* Add installation of missing nnvm headers

* Add documentation as to where public headers will be installed

* disable error checking when building old versions (#13725)

* Integrate MKLDNN Conv1d and support 3d layout (#13530)

* add 3d layout support for MKLDNN Conv and Activation

* fix lint

* code refactor

* add testcase for group1 conv and skip quantization for conv1d

* fix lint

* avoid conv1d quantization

* code refactor and add activation ut

* del todo

* Making MKL-DNN default on MXNet master (#13681)

* mkldnn is default makefile and explicitly turn off for builds

* add endif

* retrigger

* retrigger

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* retrigger

* remove commented code

* retrigger

* remove mkldnn dynamic for unittest

* retrigger

* retrigger

* force static for mkldnn lib

* turn off mkldnn on arm builds

* remove dynamic mkldnn bind

* update jenkins to use only mkldnn

* remove last flag

* turn mkldnn by default on mac

* move mkldnn files for GPU MKLDNN build

* copy lib mxnet in gpu build

* only link windows

* add mkldnn.mk

* try force linking

* retrigger

* retrigger

* remove mkldnn dynamic check

* use ifndef

* remove test mkldnn install

* fix spacing

* fix index

* remove cp of mkldnn since statically linked

* add libmkldnn.a to list of files to pack

* include mkl_ml

* add mkldnn to pack

* add libiomp to ci pack

* move static libs

* fix typo

* pack mkldnn

* retrigger

* add linux artifacts

* move libmkldnn in gpu cmake build

* move libmkldnn and libiomp5 on gpu workspace

* move linked files

* fix typo

* fix typo

* add artifacts for tensorrt

* move mkldnn lib in scala build

* move mkldnn lib on cpu scala

* create dir for binding

* rename libmkldnn in scala

* move mklml dep in scala builds

* move mkl to another linked folder

* move libmkl to another dir

* add libmklml

* move mkldnn

* move mkldnn on centos

* specify new dynamic path

* retrigger

* remove mkldnn dynamic lib

* remove moving mkldnn artifact

* add ld path

* retrigger

* Revert "remove moving mkldnn artifact"

This reverts commit 16cca196e9e1ad92db74f4e8a01b3b052076d268.

* Revert "remove mkldnn dynamic lib"

This reverts commit d51043622d4ef7fcb95aff6a3e84d91ab71b48c9.

* update makefile

* Revert RPATH change and trigger CI

* correcting use-mkldnn flags for two tests

* mkldnn default on linux for starters

* reverting naming rules of pack_lib

* adding mkldnn=0 flags to centos non mkldnn builds

* adding mkldnn=0 flags to ubuntu gpu non mkldnn builds

* removing mkldnn binary operation for ubuntu gpu cmake non mkldnn build

* removing mkldnn binary operation for centos non-mkldnn unittest

* adding explicit USE_MKLDNN=0 flags for clang builds

* adding explicit USE_MKLDNN=0 flags for cpu ubuntu builds

* removing mkldnn binaries from non mkldnn builds scala gpu

* adding explicit flag mkldnn=0 for tensorrt gpu build

* adding explicit flag mkldnn=0 for ubuntu cmake asan

* adding centos cpu mkldnn tests to CI

* adding CentOS GPU MKLDNN build and unittest

* not keeping mkldnn default for mac os

* setting mkldnn default for x86_64 only

* running docs with mkldnn=0 flag

* removing CentOS CPU Scala MKLDNN test

* setting mkldnn default for x86_64 only

* not making mkldnn default on windows

* removing Centos MKLDNN tests from CI

* retrigger

* retrigger

* retrigger

* use relative links; update links (#13741)

* [MXNET-1231] Allow not using Some in the Scala operators (#13619)

* add initial commit

* update image classifier as well

* create Util class make Some conversion

* add test changes

* address comments

* fix the spacing problem

* fix generator base

* change name to Option

* fix bug in profiler tutorial when using cpu (#13695)

The try/except approach always selects ctx=mx.gpu(), because test_utils.list_gpus() returns an empty array instead of raising an error when no GPU is present.
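
A minimal sketch of this failure mode, runnable without MXNet. The `list_gpus` function below is a hypothetical stand-in for `test_utils.list_gpus()` on a CPU-only machine, and the strings `"gpu"` and `"cpu"` stand in for `mx.gpu()` and `mx.cpu()`. The second function shows one possible way to avoid the problem, which is not necessarily the change made in the PR.

```python
def list_gpus():
    """Hypothetical stand-in: reports no GPUs by returning an empty list."""
    return []


def pick_ctx_try_except():
    """Problematic pattern: nothing raises, so the except branch never runs."""
    try:
        list_gpus()
        return "gpu"
    except Exception:
        return "cpu"


def pick_ctx_checked():
    """Inspect the returned list instead of waiting for an exception."""
    return "gpu" if list_gpus() else "cpu"
```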

* local docs build feature (#13682)

* make ROIAlign support position-sensitive pooling (#13088)

* make ROIAlign support position-sensitive pooling

* add unittest for RoIAlign op

* fix cpplint error

* fix python3 compatibility for unittest

* change OMP for better performance

* delete blank line to trigger CI

* add shape check when position_sensitive is true

* fix the typo

* typo: shuold -> should

* remove private() clause in omp statement

* add examples and fix the dependency problem (#13620)

* add examples and fix the dependency problem

* add Nightly run and optimized script

* add explanation for the line

* Update Adam optimizer documentation (#13754)

* Less cudaGet/SetDevice calls in Gluon execution (#13764)

* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine

* Scope requests so it's not needed for dev_menu (#13771)

* Fix USE_MKLDNN check in Makefile (#13775)

* fix makefile

* change make/config.mk

* add comments

* retrigger ci

* fix c compiler to clang (#13778)

* Fixed mailing list addresses (#13766)

* [MXNET-1255] update hybridize documentation (#13597)

* update hybridize documentation

* address review comments

* improve doc

* address comments

* address comments

* [MXNET-244] Work around likely compiler bug on nested inlines and temporary acces… (#13535)

* Work around likely compiler bug on nested inlines and temporary access to stream

* Don't compile khatri_rao tests if we don't have LAPACK

* Address CR comment

* Use curl to download sample data instead of wget. (#13761)

* fix bipartite match memory corruption (#13727)

* remove attributes clear on TRT nodes for GetOptimizedSymbol (#13703)

* Add CPU test coverage and refine cmake builds (#13338)

* add license (#13793)

* [MXNET-862] Basic maven jenkins pipeline (#13450)

* Jenkins Publish Nightly Maven

Progress

* Separate Build, Test, and Deploy Stages with parallel

* Re-organize Scala maven build (#13626)

* Re-organize scala maven build

1. Automatically detect which platform to build for scala.
2. Remove platform dependent submodules
3. Fix cyclic module dependencies
4. Fix scalatype style check
5. Now mvn can be executed in submodule
6. Maven build can be executed from any directory not only in root project
7. Checkin javah header file, and use verify task to detect native API changes
8. Improve incremental build performance
9. Remove unittest and integration-test profile, use proper task instead
10. Delete generated scala file during maven clean.

* Redo maven deploy related tasks.

1. Removed maven release plugin.
2. Make maven build friendly to CI, allow cli override version.
3. Moved gpg signing to deploy stage.
4. Created a separate deploy module.
5. Updated Makefile to new maven build change.
6. Remove unused nexus-staging-plugin
7. Added nightly and staging profile for CI.

* Support mkldnn for Scala.

* Add extra header file to export for error checking (#13795)

* add extra header file to include

* fix sanity check

* fix sanity

* move c_api_common.h to include folder

* fix build error

* keep c_api_common.h internal

* strip out error handling API into a separate header

* consolidate comment into one paragraph per review

* remove unnecessary include

* fix redirection issues; set default version to master (#13796)

* [MXNET-898] ONNX import/export: Sample_multinomial, ONNX export: GlobalLpPool, LpPool (#13500)

* ONNX import/export: Sample_multinomial

* ONNX export: GlobalLpPool, LpPool

* Handle default p_value

* Add tests for multinomial, lppool, globallppool

* add a comment about shape test

* whitelist symbols for using MXNet error handling externally (#13812)

* fix for params with no dims in onnx (#13413)

* fix for params with no dims

* fix

* fix

* retrigger build

* test added

* retrigger CI

* retrigger ci

* Remove semicolon in libmxnet.sym file (#13822)

* Remove semicolon in libmxnet.sym file

* empty commit to trigger CI

*  Clojure example for fixed label-width captcha recognition  (#13769)

* Clojure example for fixed label-width captcha recognition

* Update evaluation

* Better training and inference (w/ cleanup)

* Captcha generation for testing

* Make simple test work

* Add test and update README

* Add missing consts file

* Follow comments

* Update LICENSE File with subcomponents (#13808)

* Update LICENSE File with subcomponents

* Fix JavaScript licenses

* Dockerfiles for Publish Testing (#13707)

* Add new Maven build for Scala package (#13819)

* clean up build

* fix minor issue and add mkldnn

* fix mx_dist problem

* fix clojure build

* fix skip test

* ONNX ops: norm exported and lpnormalization imported (#13806)

* ReduceL1, l2 export, lpnormalization import added

* fix

* fix

* fix

* fix

* remove useless code (#13777)

* Fixing a symlink issue with R install (#13708)

* fix minor indentation (#13827)

* [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (#13676)

* ONNX export: Random uniform, Random normal

* ONNX export: MaxRoiPool

* tests for maxroipool, randomnormal, randomuniform

* onnx export ops (#13821)

* onnx export ops

* retrigger ci

* retrigger ci

* fix

* [MXNET-1260] Float64 DType computation support in Scala/Java (#13678)

* Added Float64 as a supported datatype in NDArray

* Added unit tests for Float64 in NDArray

* Fix for failing Clojure unit tests

* Added Float and Double as MX_PRIMITIVES for computation in Scala

* Trying out second approach --> Private Impl methods with generic signature, and public methods calling the Impls

* Fixed errors in *= method

* Added Float64 in IO.scala and DataIter.scala

* Added another testcase for IO.DataDesc creation

* Fixed failing CI

* Added Float64 in Predictor class

* Added Float64 in Classifier class

* Added Double as a possible return type to : classifyWithNDArray

* Added unit tests for Classifier and Predictor.scala classes for Float64/Double

* Approach 3 --> Using a trait to mirror Float and Double in Scala

* Added comments on MX_PRIMITIVES.scala

* Added Float64/Double support for inference in ImageClassifier APIs

* Added unary- and compareTo in MX_NUMBER_LIKE

* Renamed MX_NUMBER_LIKE to MX_PRIMITIVE_TYPE

* Fixed linting issue

* Now specifying dType from the available data in copyTo and MXDataIter.scala for creating a new DataIterator

* Add primitives support handling to the generator for proper conversion

* Reduced code duplication in classify method in Classifier.scala

* Fix infer package for new signatures and address some bugs

* Removed code duplication in getPixelsArray

* remove debugging

* Changed classifyWithNDArray method in Classifier.scala

* Removed code duplication in predictImpl

* Satisfying lint god _/\_

* Fixed failing PredictorSuite test

* Renamed MX_FLOAT to Camel case

* Revert "Renamed MX_FLOAT to Camel case"

This reverts commit 9d7c3ce6f9c4d6ed2c46041a02e23c0f1df8dfe5.

* Added an implicit conversion from int--> float to support int operations in NDArrays. (These ops were already supported in the previous versions)

* Added Float64 as a training option to ImClassification Suite. Also added integration tests for it

* Satisfy Lint God _/\_

* Added Float64 support in Java NDArray

* Added Float64 support in Java's Predictor API

* Added yours truly to the Contributors list

* Added method comments on Predictor.predict with Array[Double] as a possible input

* Added method comments explaining what MX_PRIMITIVE_TYPE is

* Fixed errors caused by rebasing with master

* Added licences to the files

* [MXNET-1263] Unit Tests for Java Predictor and Object Detector APIs (#13794)

* Added unit tests for Predictor API in Java

* Added unit tests for ObjectDetectorOutput

* Added unit tests for ObjectDetector API in Java

* Addressed PR comments

* Added Maven SureFire plugin to run the Java UTs

* Pom file clean up -- moved surefire plugin to parent pom.xml

* Renamed skipTests to SkipJavaTests

* Fix scala doc build break for v1.3.1 (#13820)

* Fix doc build break for v1.3.1

* ignore errors on v1.3.x during scala docs gen

* Remove MXNET_STORAGE_FALLBACK_LOG_VERBOSE from test_autograd.py (#13830)

* Add Local test stage and option to jump directly to menu item from commandline (#13809)

* Removes unneeded nvidia driver ppa installation (#13814)

* Improve license_header tool by only traversing files under revision c… (#13803)

* Improve license_header tool by only traversing files under revision control

* use HEAD instead of master for CI

* Disabled flaky test (#13758)

* change to compile time (#13835)

* fix Makefile for rpkg (#13590)

* fix Makefile for rpkg

* update R and roxygen2 requirements

* add roxygen requirement

* add roxygen requirement

* [CI] Prevent timeouts when rebuilding containers with docker. (#13818)

* Prevent timeouts when rebuilding containers with docker.
Increase timeout from 120 to 180 for pipelines

* Increase docker cache timeout

* Increase timeout also for docs

* limit parallel builds to 10

* Code modification for  testcases of various network models in directory example (#12498)

* example testcase modified

* rcnn file add

* license add

* license init

* CI test trigger

* rcnn modify give up

* trigger

* modify for better user experience

* change the default parameter to xpu=None

* Update bdk_demo.py

* Update fcn_xs.py

* Update test.py

* Update train.py

* Update bdk_demo.py

* Update bdk_demo.py

* modify review comments

* refine

* modify Readmes according to the changed code.

* finetune READMEs

* re-trigger ci

* re-trigger ci twice

* Add copyrights for third party licenses to license file (#13851)

* Fix Tree Reduction on new instance type p3dn.24xlarge (#13852)

* add fallback for gpu topology detection using CUDA 9.2

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* bring 3rdparty packages to upstream/master

* rebase to master

* Update gpu_topology.h

* [Clojure] package infer tweaks (#13864)

* change object detection prediction to be a map

* change predictions to a map for image-classifiers

* change return types of the classifiers to be a map
- add tests for base classifier and with-ndarray as well

* tweak return types and inputs for predict
- add test for plain predict

* updated infer-classify examples

* adjust the infer/object detections tests

* tweak predictor test

* Feedback from @kedarbellare review

* put scaling back in

* put back predict so it can handle multiple inputs

* restore original functions signatures (remove first)

* Modifying clojure CNN text classification example (#13865)

* Modifying clojure CNN text classification example

* Small fixes

* Another minor fix

* adding tolerance to flaky test (#13850)

* adding tolerance

* retrigger ci

* retrigger ci

* Julia v0.7/1.0 support and drop v0.6 support (#12845)

* Fix cpp examples build on Mac. (#13826)

This is a regression from adding the @rpath name to libmxnet.so on Mac:
the example executable can no longer find libmxnet.so.
Add an @rpath search path to fix this issue.

* Fix launch bounds in spatial transformer (#13188)

* Fix launch bounds in spatial transformer

* Adding explanation in comment.

* Update example scripts classpath. (#13849)

* [MXNET-1177]Adding Scala Demo to be run as a part of Nightly CI (#13823)

* Adding Scala Demo to be run as a part of Nightly CI

* Addressed PR feedback : making a profile to fetch nightly jars only on CI

* Changed name from scalacidemo to scala_ci_demo

* Synchronized the scala-demo and java-demo for nightly CI runs

* Pruned the maven command to simply maven install

* changed running from ./.sh to bash .sh to be consistent

* Add CODEOWNERS for Julia package (#13872)

* fix ssd quantization script error (#13843)

* fix ssd quantization script error

* update readme for ssd

* move quantized SSD instructions from quantization/README.md to ssd/README.md

* update ssd readme and accuracy

* update readme for SSD-vGG16

* Fix permissions of ci/docker/install/ubuntu_publish.sh (#13840)

* Avoid adding SegfaultLogger if process already has sig handler. (#13842)

In the current implementation, we override the signal handler regardless of whether MXNET_USE_SIGNAL_HANDLER=1 is set.
This breaks the caller process's behavior and causes the process to exit unexpectedly.
An example use case is libmxnet.so being loaded into a Java process via JNI or JNA; the JVM will crash
due to SegfaultLogger.

In this PR, we will not register SegfaultLogger if there is a signal handler registered.
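
The "do not register if a handler already exists" rule can be illustrated in Python with the standard `signal` module. This is only an analogy to the C++ change, under the assumption that "already has a handler" means "the disposition is no longer the default". The names `install_if_unset` and `log_and_ignore` are hypothetical.

```python
import signal


def install_if_unset(signum, handler):
    """Install handler only if signum still has the default disposition."""
    if signal.getsignal(signum) != signal.SIG_DFL:
        return False  # the host process already owns this signal; leave it alone
    signal.signal(signum, handler)
    return True


def log_and_ignore(signum, frame):
    """Hypothetical handler standing in for a logger."""
    print("received signal", signum)
```

Note that `signal.signal` may only be called from the main thread of the main interpreter, so this sketch has to run there.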

* fix the fetching GPU problem (#13889)

* Fix SN-GAN example doc (#13877)

* fix the wrong argument

* fix broken link

* update Spectral Normalization Code (#13868)

* update sn_code

* update sn_code

* Temporarily disable website testing (#13887)

* Fixed java benchmark failing error by fixing the classpath (#13891)

* Jenkins nightly maven with static build script and gpu (#13767)

* Added logging to GitHub commit status publishing (#13615)

* Add a test for SGLD optimizer with comparisons for set noise seeds. (#13762)

* [MXNET-703] Update to TensorRT 5, ONNX IR 3. Fix inference bugs. (#13310)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
https://github.com/clab/dynet/issues/1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version

* Fix the order of error term's operands (#13745)

* fix the order of error term's operands

* address comments

* Add mkldnn OP for slice (#13730)

* add mkldnn slice

* fix lint

* fix lint

* mv SliceEx to matrix_op.cc

* fix lint

* optimize dispatch_mode

* retrigger ci

* fix indent

* fix bug in nag optimizer (#13683)

* fix bug in nag optimizer

```
grad += wd * weight
mom[:] += grad
grad[:] += self.momentum * mom
weight[:] += -lr * grad
```
This subtracts `wd * weight` twice, whereas in `state = momentum * state + grad + wd * weight; weight = weight - (lr * (grad + momentum * state))` it is subtracted only once.
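
The discrepancy can be checked on scalars. The sketch below is an illustration, not the optimizer code: `nag_reference` follows the formula quoted in the commit message, and `nag_snippet` follows the four quoted lines, with the added assumption (not shown in the snippet) that `mom` has first been scaled by `momentum`.

```python
def nag_reference(weight, grad, state, lr, momentum, wd):
    """One step of the formula quoted in the commit message."""
    state = momentum * state + grad + wd * weight
    weight = weight - lr * (grad + momentum * state)
    return weight, state


def nag_snippet(weight, grad, mom, lr, momentum, wd):
    """One step of the quoted snippet, written for scalars."""
    mom = momentum * mom          # assumed preceding step, not in the snippet
    grad = grad + wd * weight     # grad += wd * weight
    mom = mom + grad              # mom[:] += grad
    grad = grad + momentum * mom  # grad[:] += self.momentum * mom
    weight = weight - lr * grad   # weight[:] += -lr * grad
    return weight, mom


w_ref, s_ref = nag_reference(1.0, 0.5, 0.0, lr=0.1, momentum=0.9, wd=0.01)
w_old, s_old = nag_snippet(1.0, 0.5, 0.0, lr=0.1, momentum=0.9, wd=0.01)
gap = w_ref - w_old  # extra decay applied by the snippet
```

Under these assumptions both versions produce the same momentum state, and the weights differ by exactly `lr * wd * weight`, which is the extra weight-decay term the commit message describes.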

* fix bug in nag test

fix bug in nag test

* rewrite nag test

* rewrite nag

* fix nag with in-place operations

* fix nag with in-place operations

*  #13813 examples with opencv4/origami (#13813)

* Fix BatchNorm converter for CoreML when fix_gamma=True (#13557)

* beta doc fixes (#13860)

* Update profiler doc (#13901)

* Update c_api_profile.cc

* Update c_api_profile.cc

* Fix for test always returning true (#13911)

* Add error checking for cpp examples. (#13828)

* add ccache to docs build (#13832)

* Java install info update (#13912)

* updated java dependency

* update to duplicated java cpu

* java gpu update

* Updated java dependency version information

* Static build instruction for MXNet in general (#13914)

* update scripts and tutorial

* add the static test for scala package

* kill publish test

* fix build issue

* address comments

* julia: fix `argmax` for NDArray (#13871)

- fix 0-based index output to 1-based index

close #13786

* Support populating errors back to MXNet engine in callback (#13922)

* add an optional error_msg in engine on_complete callback

* use dmlc::Error struct to make error population extendable

* Fix document build (#13927)

* fix doc build

* Revert "Temporarily disable website testing (#13887)"

This reverts commit 9d4281271c871a938f1ac4ee55b218872031963d.

* test_ImageRecordIter_seed_augmentation flaky test fix (#12485)

* Moves seed_aug parameter to ImageRecParserParam and re-seeds RNG before each augmentation to guarantee reproducibility

* Update image record iterator tests to check the whole iterator not only first image

* Version switching user experience improvements (#13921)

* fix version switching for anchors and search

* improved redirects

* fix bug for dev previews; remove hardcoded protocol

* Julia: fix filename quoting in docstring (#13894)

Quoting filename with backticks to prevent
markdown mis-rendering some of them with underscore.

* disable default MKLDNN for cross compilation (#13893)

* disable default MKLDNN for cross compilation

* adding temporary debug logs

* Julia: deprecate `mx.empty`, replace it with `UndefInitializer` (#13934)

In Julia 0.7+, constructing an uninitialized array is provided via
the APIs:
        - `Array{T,N}(undef, dims...)`
        - `Array{T,N}(undef, dims)`
        - `Array{T}(undef,   dims...)`
        - `Array{T}(undef,   dims)`

There is an API `mx.empty(dims...)` serving this purpose.

This PR proposes deprecating the original API `mx.empty` and
providing the functionality with an API design similar to Julia's Base.

        - `NDArray{T,N}(undef, dims...)`
        - `NDArray{T,N}(undef, dims)`
        - `NDArray{T}(undef,   dims...)`
        - `NDArray{T}(undef,   dims)`
        - `NDArray(undef,      dims...)`
        - `NDArray(undef,      dims)`

e.g.

```julia
julia> NDArray{Int,2}(undef, 5, 2)
5×2 NDArray{Int64,2} @ CPU0:
 94290755905104  94290752678143
 94290752660544     68719476760
 94290752674408  94290737734368
 94290752660544              18
 94290752674408              18

julia> NDArray(undef, 5, 2)  # default type is `mx.MX_float`
5×2 NDArray{Float32,2} @ CPU0:
 -29112.406f0       5.2029858f-8
      3.0763f-41    6.7375383f-10
      1.7613131f19  0.0f0
      4.840456f30   0.0f0
      4.4262863f30  0.0f0
```

- The original `mx.empty` APIs are still functional.
  If a user invokes them, a deprecation warning is shown.

* Runtime feature detection (#13549)

* Prototype for runtime feature detection

* Includes from diamond to quotes

* Add CPU feature and BLAS flavour flags

* Add BLAS flavour and CPU SSE and AVX flags

* MXNET_USE_LAPACK

* Fix C++ linting errors

* Expose runtime feature detection in the public C API and in the Python API

* Refactor Storage -> FeatureSet

* Refine documentation

* Add failure case

* Fix pylint

* Address CR comments

* Reduce verbosity of container builds (wget output) (#13888)

* Add back R tests and fix typo around R and perl tests (#13940)

* Add back R tests and fix typo around R and perl tests

* Fix permissions

* Fix copy&paste mistake around roxygen and remove previous permission override

* fix doc of take operator (#13947)

* #13624 clojure nightly tests (#13624)

* Add erfinv operator for calculating inverse error function (#13811)

* add default behaviour for argmax

* prototype of erfinv

* add test

* gpu support

* Revert "add default behaviour for argmax"

This reverts commit 64e9f1a9e3c9cabf312b8d80b3520b22da31c0b6.

* move erfinv to contrib

* edit copyright

* remove atof

* use std and update license

* add license exclude file

* fix per eric's comments

* change license header

* Update project.clj file to use the snapshots repo to be able to pull (#13935)

nightly Scala jar - also update readme

* Julia: add windows-cpu build (#13937)

- Julia v0.7
- Julia v1.0

* split_v2 (#13687)

* Update autoencoder example (#12933)

* Fixing the autoencoder example

* adding pointer to VAE

* fix typos

* Update README.md

* Updating notebook

* Update after comments

* Update README.md

* Update README.md

* Retrigger build

* Updates after review

* Static build for Python (#13916)

* add python unit test

* address comments

* switch sanity test to Gluon module test

* We don't run tests (╯‵□′)╯︵┻━┻

* add variant in the environment variable

* add document improvement

* kill the conflict

* Flaky maven binary download (#13974)

* Aggregate SGD (#13346)

* Aggregate SGD

* Make OpWrapperGenerator understand Tuple<float>

* Trigger

* Add NNVM Tuple to cpp-package op.h

* Trigger

* Fix pylint aggregate SGD

* Update info about new ENV vars and modifying 2 tests that require
update_on_kvstore to be true

* Fix

* Aggregate SGD support for Gluon trainer

* Added text to doc about aggregate update in SGD optimizer

* Docs changes from review

* Gradient multiplier (contrib) operator (#13632)

* Added the gradient reversal contrib operator

Missing test for backwards pass

* Fixed linting errors

* Fixed forward test

* Added random forward / backward test for gradient reversal

* Update test_contrib_operator.py

* Fixed typo in gradient reversal op description

* Replace forward code with the identity implementation

* Fixed typos in function docs

* Changed default behavior to identity

* Replaced backward code with scalar_mul

* Fixed backward operator and unit test

* Renamed operator to gradient multiplier

* Update test_contrib_operator.py

Retrigger flaky test

* Update gradient_multiplier_op.cc

Improved the description of the scalar multiplier

* Update README.md (#13973)

* Fixing the doc for symbolic version of rand_zipfian (#13978)

* Fixes #12779

* Gluon end to end tutorial (#13411)

* initial draft gluon tutorial

* add reference

* add cpp inference

* improve wording

* address pr comments

* add util functions on dataset

* move util file

* update link

* fix typo, add test

* allow download

* update wording

* update links

* address comments

* use lr scheduler with optimizer

* separate into 2 tutorials

* add c++ tutorial to test whitelist

* [MXNET-1293] Adding Iterables instead of List to method signature for infer APIs in Java (#13977)

* Added Iterables as input type instead of List in Predictor for Java

* Added Iterables to ObjectDetector API

* Added tests for Predictor API

* Added tests for ObjectDetector

* Use CPUPinned context in ImageRecordIOParser2 (#13980)

* create NDArray with CPUPinned context in ImageRecordIOParser2

* update document

* use -1 device_id as an option to create CPU(0) context

* retrigger CI

* fix cpplint error

* Added optional parameters to BilinearResize2D to do relative scaling (#13985)

* Added optional parameters to BilinearResize2D to do relative scaling

* Removed unnecessary params in unit tests.

* Fixed deprecated casting style

* [MXNET-1301] Remove the unnecessary WaitAll statements from inception_inference example (#13972)

* Removed the unnecessary WaitAll statements

* Removed the WaitAll() calls wherever they are not necessary.

* [MXNET-1000] get Ndarray real value and form it from a NDArray (#12690)

* add visualize

* adding Any type input to form NDArray

* fix bug and add tests

* add a toString method

* add Visualize Util and migrate visualize structure to there

* update with tests

* refactor code

* fix the minor issue

* add multiple types support

* add changes on names and tests

* make code elegant and improve readability

* api change (#13903)

* ONNX export: Add Crop, Deconvolution and fix the default stride of Pooling to 1 (#12399)

* Added Deconvolution and Crop to ONNX exporter

* Added default for pool_type

* Sample python bilinear initializer at integral points in y-direction (#12983)

* Sample python bilinear initializer at integral points in y-direction

* Add unit test for bilinear initializer

* [MXNET-703] Minor refactor of TensorRT code (#13311)

* Python BucketingModule bind() with grad_req = 'add' (#13984)

* remember grad_req from bind and apply it to sub-modules

* unit-test for gradient accumulation with bucketing modules

* MXNET-1295 Adding integer index support to Sequence* family of operators. (#13880)

* Adding integer index support to Sequence* family of operators.

Adding ability to use int32 arrays, or any castable-to-int type, as
the sequence_length array to SequenceMask, SequenceLast, and
SequenceReverse. Previously these operators all required sequence_length
to be the same data type as the input array.

See MxNet Jira ticket here:
  https://issues.apache.org/jira/browse/MXNET-1295

See also GitHub issues here:
   https://github.com/apache/incubator-mxnet/issues/12649
   https://github.com/dmlc/gluon-nlp/issues/346

* Adding explicit braces to an if statement to fix g++ warning

* fixing sequence_mask.cu by adding IType to template

* Fixing whitespace errors reported by linter

* Adding unit tests

* Fixing length of lines to pass linter

* Disabled flaky test test_negative_binomial_generator (#13784)

* Fix website error pages (#13963)

* fix error redirect

* add error artifacts for local build

* build docs with CPP package (#13983)

* Update scala-package gitignore configuration. (#13962)

* [MXNET-1232] fix demo and add Eclipse support (#13979)

* fix demo and add Eclipse support

* fix on docs

* fix typo

* Update docs/install/java_setup.md

Co-Authored-By: lanking520 <lanking520@live.com>

* add fixes in docs

* fix compile error in debug mode (#13873)

The latest BufferEntry does not contain a ctx function, which results in compile errors.
BufferEntry holds an NDArray object, which is the expected data.

* Image normalize operator - GPU support, 3D/4D inputs (#13802)

* CPU version of normalize operator is working and unit test added

* Add GPU implementation and tests

* Working GPU normalize transforms

* Add default values, fix imports, fix documentation

* Add backward implementation for image normalize

* Add tests for backward pass

* Move back operators to its original files

* Add review comments

* Add 4D example

* Make infer type generic

* Fix inline function build error

* make functions as inline to avoid multiple definition conflict across cc and cu

* Fix build errors

* Fix failing GPU tests

* remove debug; add support for v1.4.x docs; fix publish bug (#14015)

*  Return value docs for nd.random.* and sym.random.* (#13994)

* mx.random.multinomial python documentation updated, return type details added

* multinomial documentation clarified

* added basic case for negative_binomial

* added basic case for generalized_negative_binomial

* basic case added for gamma

* added basic case for exponential

* basic case added for randn

* remaining base cases added.

* randint case added

* cleaned up return types for random.py

* zboldyga added to contributors

* spacing typo correction

* updated symbol.random return types, minor correction to ndarray.random return types

* removed trailing whitespace in docs

* Julia: split ndarray.jl into several snippets (#14001)

- `ndarray/type.jl`
- `ndarray/context.jl`
- `ndarray/show.jl`
- `ndarray/remap.jl`
- `ndarray/array.jl`
- `ndarray/arithmetic.jl`
- `ndarray/comparison.jl`
- `ndarray/io.jl`
- `ndarray/reduction.jl`
- `ndarray/statistic.jl`
- `ndarray/linalg.jl`
- `ndarray/trig.jl`
- `ndarray/activation.jl`
- `ndarray/autoimport.jl`

* float32 -> float16 cast consistency across implementations (#13857)

* Added test showing float32->float16 discrepancy when mshadow float2half() is used.

* Temp update mshadow submodule SHA to point to PR368 (b211cb7).

* Temp switch to url = https://github.com/DickJC123/mshadow.git

* Update mshadow submodule SHA.

* Improve code style per reviewer comments.

* Move back to dmlc/mshadow.git, now with float->half rounding.

* Expand test_operator.py:test_cast_float32_to_float16 to test np.nan.

* Improve bulking in Gluon (#13890)

* Improve bulking in Gluon

* Trigger CI

* Fix MXNet R package build (#13952)

* fix mxnet r package build

* add ci

* remove mkldnn-gpu test for R

* add minimal test for MKLDNN-R

* pick mlp as minimal R test

* Fix inconsistent handling for FResourceRequestEx for imperative and symbolic executor (#14007)

* Update op_attr_types.h

* Update attach_op_resource_pass.cc

* [MXNET-1180] Java Image API (#13807)

* add java example

* add test and change PredictorExample

* add image change

* Add minor fixes

* add License

* add predictor Example tests

* fix the issue with JUnit test

* Satisfy Lint God ʕ •ᴥ•ʔ

* update the pom file config

* update documentation

* add simplified methods

* Export resize and support batch size (#14014)

* add image resize operator and unit test

* refactor the resize operator and address lint issues

* address comment and add doc

* assert size is more than 2

* add test case of 4D input

* use ndarray datatype

* add inline to Shape

* add 4D input example

* refactor the duplicate code and separate the resize from image_random

* clean up the code

* add resize implementation

* delete the variable not used

* refactor the code with structure and enum to make code more understandable

* fix the lint

* address comments

* address comment 1. add description 2. refactor unit test and add dtype

* update data type check

* lint

* move the common utility to image_utils

* add default value for keep_ratio

* change the operator doc

* update the image utility function

* fix lint

* use Hang implementation to achieve image resize operator GPU

* update the check and doc

* refactor the caffe_gpu_interp2_kernel

* update doc and fix the cpu compile error

* update the comment

* fix lint

* add unit test for gpu

* address comments

* remove the crop and centercrop utility functions to make the PR clear

* fix the syntax error

* delete the warning

* add unit test with 4D

* fix typo

* add more unit test

* fix unit test

* set atol = 1

* fix missing numpy import

* fix the unit test

* delete test case

* fix unit test missing dependency

* fix error data type

* unify the style and add invalid interp

* update the doc

* add NAG optimizer to r api (#14023)

* Now passing DType of Label downstream to Label's DataDesc object (#14038)

* fix test_stn (#14063)

* re-enable test after issue fixed https://github.com/apache/incubator-mxnet/issues/10973 (#14032)

* Remove all usages of makefile for scala (#14013)

* Remove all usages of makefile for scala

* Unify making folders for scala/java setup

* Fix mxdoc path

* Add batch mode to calls

* fix nightly test on tutorials (#14036)

* fix nightly test

* fix typo

* trigger ci

* update the scala installation tutorial on intellij (#14033)

* update the scala installation tutorial on intellij

* update the so answer

* update the so answer

* Image ToTensor operator - GPU support, 3D/4D inputs (#13837)

* Add CPU implementation of ToTensor

* Add tests for cpu

* Add gpu implementation and tests

* Fix lint issues

* Cleanup includes

* Move back changes to original image operators files

* Add 4D example

* resolve merge conflicts

* Fix failing tests

* parallelize on channel in kernel launch

* rewrote the concat test to avoid flaky failures (#14049)

ran 10000 times with no failures

* Fix website scala doc (#14065)

* Fix doc building

* Remove duplicate in

* [Clojure] Add resource scope to clojure package (#13993)

* Add resource scope to clojure package

* add rat

* fix integration test

* feedback from @benkamphaus
- move from defs to atoms to make the tests a bit better

* adding alias with-do and with-let 
more tests

* another test

* Add examples in docstring

* refactor example and test to use resource-scope/with-let

* fix tests and problem with laziness 
now they work as expected!

* refactor to be a bit more modular

* remove comments

* Update NOTICE (#14043)

* modifying SyncBN doc for FP16 use case (#14041)

LGTM

* add new cloud providers to install page (#14039)

* add new cloud providers

* fix colon

* CUDNN dropout (#13896)

* cudnn dropout

* test dropout as stateful op

* add cudnn_off

* refactor

* fix bug when using inf forward

* turn on cudnn in gluon

* reuse dropout state space

* dropout passthrough

* address comments

* fix test_depthwise_convoltuion for occasional CI failures (#14016)

* keeping same contexts for comparison

* enabling test

* testing default context

* Revert "testing default context"

This reverts commit 1f95d0228178debde14680839bb6abab14c6d049.

* Disabling test due to CI failure on MKL-DNN

* ONNX export: broadcast_to, tile ops (#13981)

* Expand,tile op export

* fix

* adding test cases

* adding comments

* [MXNET-1258]fix unittest for ROIAlign Operator (#13609)

* fix roi align test

* retrigger unittest

* add more test detail for ROIAlign test

* remove url in test_op_roi_align

* remove blank line in test_op_roi_align in test_operator

* merge master

* Update test_operator.py

* retrigger CI

* Fix performance regression in normalize operator (#14055)

* parallelize on channel forward pass

* parallelize on channel normalize backward pass

* Fix lint issues

* Trying to fix CI build failure on GPU

* Fix failing GPU test on CI Do not pass normalize param as is to GPU kernel

* Fix to_tensor tests

* Pass mean and std_dev as native types for kernel

* Fix CI failure. Do not pass mean, std as vector to kernel

* Add maven wrapper to scala project. (#13702)

* Increase performance of BulkAppend and BulkFlush (#14067)

* Better bulkappend

* Fix lint

* [MXNET-1178] updating scala docs (#14070)

* updating scala docs

* Addressed PR feedback

* update the version name (#14076)

* [MXNET-1121] Example to demonstrate the inference workflow using RNN (#13680)

* [MXNET-1121] Example to demonstrate the inference workflow using RNN

* Addressed the review comments. Updated the ReadMe files.

* Removed the unnecessary creation of NDArray

* Added the unit tests to nightly tests to catch the failure.

* Updated the makefiles and unit tests so that the examples are built and tested in nightly

* Added the visual representation of the model and fixed the CI failure.

* Added the missing pdf file.

* Fixing the broken ci_test.sh

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/README.md

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Update cpp-package/example/inference/simple_rnn.cpp

Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com>

* Applying unresolved changes to README file.

* Fixing the CI build failure.

* Updated the RNN example from sequence generation to sentiment analysis

* Updated the readme files. Updated the example to use trained model and updated the unit test.

* Addressed the review comment to increase the default sequence length. Added the examples with inputs of various lengths.

* Updated the example to handle variable length input. Updated the readme and unit test files accordingly.

* Updated the example to share the memory between executors by creating shared executors.

* Updated the creation of executors from largest to smallest bucket key

* Creating the executor for the highest bucket key.

* Updated the unit test to check for the results in a range and modified the function name to be consistent with others.

* Fixed the logic to find the right bucket.

* hybridize rnn and add model graph (#13244)

* hybridize rnn and add model graph

* trigger CI

* separate mxboard visualization

* add options and she-bang

* add defaults

* trigger CI

* rename export-model

* Exclude concat layer  for gpu quantization (#14060)

* exclude concat for gpu quantization

* remove quantized_concat test in non-subgraph flow

* Remove inplace support for ToTensor operator (#14083)

* Remove stale check for op req type

* Do not register to tensor operator with in place option.

* [MKLDNN] Enable signed int8 support for convolution. (#13697)

* Enable s8s8 support for MKLDNN convolution.

* Fix cpp build

* Fix build.

* Fix build

* Remove openmp min/max reduction for windows build

* Add mkldnn_OIhw4i16o4i_s8s8 support

* Add all s8s8 weight format

* Change ssd quantize script.

* Update

* Manually cast mshadow shape size to size_t

* Fix merge.

* Fix perl package.

* Retrigger CI

* Fix GPU test

* Fix GPU test

* Rerun CI

* Rerun CI

* Rerun CI

* Rerun CI

* Remove weight_channelwise_scale from params.

* Fix

* Keep API compatible.

* Rerun CI

* Rerun CI

* Rerun CI

* Rerun CI

* Address comments.

* fix.

* Address debug build.

* Add comment for next_impl

* Rerun ci

* Add new api MXExecutorSetMonitorCallbackEX

* Add default value for monitor_all for cpp header.

* Rerun CI

* fix

* script change for uint8.

* trigger ci

* trigger ci

* [MXNET-1291] solve pylint errors in examples with issue no.12205 (#13815)

* Unify the style here

Unify the style here and remove the testing 'print' code segment.

* Unify the description of comment

Change the description of comment from "multi-layer perceptron" to "Get multi-layer perceptron"

* Unify the style of comments

Unify the style of comments suggested by @sandeep-krishnamurthy

* git pull the latest code from master of incubator-mxnet

* Complete rebase

* Solve PEP8 [C0304 ] Final newline missing

Solve example/deep-embedded-clustering/solver.py(150): [C0304 ] Final newline missing

* fix merge issue

* skip output_names unittest for mxnet-ngraph
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this issue Feb 16, 2019
…che#13310)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this issue Feb 18, 2019
…ugs. (apache#13897)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
takanokage added a commit to takanokage/knn-bench that referenced this issue Feb 28, 2019
fixes: CUDA_cublas_device_LIBRARY NOTFOUND.
clab/dynet#1457.
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this issue Apr 26, 2019
…ugs. (apache#13897)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this issue Jun 23, 2019
…che#13310)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
@bravegag

> I also got the problem when I compiled Caffe with the latest CUDA 10.0, and after upgrading CMake from 3.12.1 to 3.12.2 the problem was solved. @harumo11

This solved the issue for me too! Thank you!
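
A quick way to check whether the installed CMake is at least the 3.12.2 reported above is to parse the first line of `cmake --version`. This is only a minimal sketch in Python: the helper names `parse_cmake_version` and `meets_minimum` are hypothetical, and the 3.12.2 threshold is taken from the reports in this thread, not from a verified compatibility table.

```python
import re
import shutil
import subprocess

# Threshold reported in this thread for CUDA 10.0; treat it as an assumption.
MIN_CMAKE = (3, 12, 2)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no CMake version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=MIN_CMAKE):
    """Tuples compare component by component, which matches version ordering."""
    return version >= minimum


if __name__ == "__main__":
    if shutil.which("cmake") is None:
        print("cmake not found on PATH")
    else:
        out = subprocess.run(
            ["cmake", "--version"], capture_output=True, text=True
        ).stdout
        version = parse_cmake_version(out)
        status = "ok" if meets_minimum(version) else "too old"
        print("cmake", ".".join(map(str, version)), status)
```

Passing this check is not a guarantee: a later report in this thread hits the same error with CMake 3.18.2 and CUDA 11.0.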

@suryadesu

suryadesu commented Dec 11, 2020

Hi,
I have been facing a similar error while installing Torch on my server.
CUDA version - 11.0
CMake version - 3.18.2
Ubuntu version - 20.04
I see that the solutions above work for CMake versions >= 3.12.2, but I still get the error with 3.18.2.
Can someone please help with this?
Thanks

-- Found Torch7 in /home/cs17btech11048/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 6.0 6.0
-- got cuda version 11.0
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_60,code=sm_60;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
    linked by target "THC" in directory /home/cs17btech11048/torch/extra/cutorch/lib/THC

-- Generating done
CMake Generate step failed.  Build files cannot be regenerated correctly.

Error: Build error: Failed building.
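
To see which variables and targets are involved, the error block in a log like the one above can be pulled out of a saved configure log. This is a rough sketch: `find_notfound_variables` is a hypothetical helper, and the parsing assumes the log layout shown in this thread (a variable name on its own line, followed by indented `linked by target` lines).

```python
import re


def find_notfound_variables(log_text):
    """Map each NOTFOUND variable to the list of targets that link it."""
    result = {}
    in_block = False
    current = None
    for line in log_text.splitlines():
        if "set to NOTFOUND" in line:
            in_block = True  # the variable list starts after this header
            continue
        if not in_block:
            continue
        target = re.match(r'\s+linked by target "([^"]+)"', line)
        if target and current is not None:
            result[current].append(target.group(1))
            continue
        variable = re.match(r"([A-Za-z_][A-Za-z0-9_]*)(\s+\(ADVANCED\))?\s*$", line)
        if variable:
            current = variable.group(1)
            result[current] = []
        elif line.strip() == "":
            in_block = False  # a blank line ends the error block
            current = None
    return result


if __name__ == "__main__":
    sample = (
        "CMake Error: The following variables are used in this project, "
        "but they are set to NOTFOUND.\n"
        "Please set them or make sure they are set and tested correctly "
        "in the CMake files:\n"
        "CUDA_cublas_device_LIBRARY (ADVANCED)\n"
        '    linked by target "THC" in directory /path/to/THC\n'
        "\n"
        "-- Generating done\n"
    )
    print(find_notfound_variables(sample))
```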

@b4zz4

b4zz4 commented Jul 6, 2021

I have the same problem with CUDA 11.3.

@youyuxiansen

Find ${CUDA_CUBLAS_LIBRARIES} in your CMakeLists.txt and comment it out.
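
Locating those references by hand can be tedious in a large source tree. The sketch below only lists candidate lines; it assumes the references live in `CMakeLists.txt` or `*.cmake` files, and `find_cublas_references` is a hypothetical helper name. Whether commenting a hit out is safe for a given project still has to be judged case by case.

```python
import os

# Variable names taken from the error message and the suggestion above.
NEEDLES = ("CUDA_CUBLAS_LIBRARIES", "CUDA_cublas_device_LIBRARY")


def find_cublas_references(root):
    """Return a list of (path, line_number, text) for lines mentioning the variables."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only CMake sources are of interest here.
            if name != "CMakeLists.txt" and not name.endswith(".cmake"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, text in enumerate(handle, start=1):
                    if any(needle in text for needle in NEEDLES):
                        hits.append((path, number, text.rstrip()))
    return hits


if __name__ == "__main__":
    for path, number, text in find_cublas_references("."):
        print("%s:%d: %s" % (path, number, text))
```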

@hsen-dev

hsen-dev commented Nov 8, 2021

solution: https://codeyarns.com/tech/2019-03-20-caffe-cuda_cublas_device_library-error.html
