System setup for deep learning on Ubuntu 16.04.3 with single NVIDIA Geforce GTX 1080.
Installing NVIDIA graphics driver is usually a headache! However you could just skip it! (not always)
Actually you don't need to install a NVIDIA graphics driver as CUDA Toolkit comes with pre-packaged GPU drivers. The NVIDIA graphics card driver will be installed together with CUDA as a by-product. However you need to install from the CUDA .run
file and choose whether or not you want to install driver as well.
However, you need to make sure that the CUDA driver bundled with CUDA toolkit does support your graphics card. It is still recommended to install the latest standalone driver separately as the driver bundled with CUDA is usually out-of-date.
There are three ways to install NVIDIA proprietary drivers.
- Download from official NVIDIA website and follow their installation instructions. (risky and may need manual tweeking)
- Using a PPA repository Install Nvidia Drivers from PPA
- (Recommended) Using
Additional Drivers
underSoftware & Updates
. In your Ubuntu press theWin
key and searchAdditional Driver
and select for the recommended NVIDIA proprietary driver from official Ubuntu package repository.
When the driver is loaded, the driver version can be found by executing the command:
$ cat /proc/driver/nvidia/version
Run the NVIDIA system manager interface tool to query the status of all your GPUs:
$ nvidia-smi
or use the following command to watch the results in real-time:
$ sudo watch nvidia-smi
You can check the status of driver modules loaded in the Linux Kernel:
$ lsmod | grep nvidia # there should be output
$ lsmod | grep nouveau # there should be no output
Go to NVIDIA website and download the CUDA Debian Installer (.deb
).
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda
In order to automatically set all the environment variables every time you open your terminal, put the following exports into your ~/.bashrc
file.
export CUDA_HOME=/usr/local/cuda-8.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
In order for the above to take effect, source
your ~/.bashrc
or re-open your terminal.
$ source ~/.bashrc
The version of the CUDA Toolkit can be checked by running:
$ nvcc -V
Verify CUDA installation by running the examples.
$ cuda-install-samples-8.0.sh ~
$ cd NVIDIA_CUDA-8.0_Samples/
$ make
NOTE: If compiler error cannot find -lnvcuvid
, or skipped incompatible libnvcuvid.so
need to change the settings to be consistent with the nvidia driver you are using (e.g. nvidia-375). A lazy fix would be to run:
find . -type f -execdir sed -i 's/UBUNTU_PKG_NAME = "nvidia-367"/UBUNTU_PKG_NAME = "nvidia-375"/g' '{}' \;
Run the examples:
$ cd bin/x86_64/linux/release/
$ ./deviceQuery
At the end of printout, you will see:
Result = PASS
Download respective packages from NVIDIA website as below and follow the corresponding instructions. You need to sign up an account with NVIDIA to download the files.
- Install from tarball (recommended)
$ tar -xzvf cudnn-8.0-linux-x64-v7.tgz
Copy the following files into the CUDA Toolkit directory.
$ sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
$ sudo ldconfig
Adding -P
retains the symbolic links, and avoids the error: /sbin/ldconfig.real: /usr/local/cuda/lib64/libcudnn.so.7 is not a symbolic link
.
NOTE: If you install cuDNN from tarball, you need to add the destination directores (e.g. /usr/local/cuda/lib64
) into $LD_LIBRARY_PATH
in order for other packages to look for the libraries. Doing so by adding the following line into your ~/.bashrc
.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
- Install from .deb
cuDNN v7.0
$ sudo dpkg -i libcudnn7_7.0.1.13-1+cuda8.0_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.0.1.13-1+cuda8.0_amd64.deb
$ sudo dpkg -i libcudnn7-doc_7.0.1.13-1+cuda8.0_amd64.deb
cuDNN v6.0
$ sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb
$ sudo dpkg -i libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb
$ sudo dpkg -i libcudnn6-doc_6.0.21-1+cuda8.0_amd64.deb
- Verify cuDNN
To verify that cuDNN is installed and is running properly, compile the mnistCUDNN sample located in the /usr/src/cudnn_samples_v7
directory in the debian file. (Note: The samples codes are a separate .tgz download or installed with cuDNN deb file)
Copy the cuDNN sample to a writable path:
$ cp -r /usr/src/cudnn_samples_v7/ $HOME
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
$ make clean && make
$ ./mnistCUDNN
If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:
Test passed!
- General dependencies
$ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
$ sudo apt-get install --no-install-recommends libboost-all-dev
$ sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
- CUDA
Refer to Install CUDA toolkit. Note: CUDA 8 is required on Ubuntu 16.04.
- BLAS
Choose one of the followings
$ sudo apt-get install libatlas-base-dev
or
$ sudo apt-get install libopenblas-dev
- Python
Install pip
, Python development package, and numpy
$ sudo apt-get install python-pip python-dev python-numpy
and install the following dependency packages to avoid ImportError: No module named skimage.io
and google.protobuf.internal
$ pip install scikit-image
$ pip install protobuf
- Complile Caffe using CMake (Recommended)
If you don't have git
and CMake
in your system, install them first
$ sudo apt install git cmake
Download Caffe and compile
$ git clone https://github.com/BVLC/caffe.git
$ cd caffe
$ mkdir build && cd build
$ cmake -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
$ make all -j8
$ make runtest -j8
NOTE: CUDA_USE_STATIC_CUDA_RUNTIME
(Default ON
)
-- When enabled the static version of the CUDA runtime library will be used
in CUDA_LIBRARIES
.
On 16.04, aarch64
has issues with a static cuda runtime. So we need to disable CUDA_USE_STATIC_CUDA_RUNTIME
.
In order for python to find caffe
module, you need to set the env variable PYTHONPATH
by adding the following line into your ~/.bashrc
. Otherwise, ImportError: No module named caffe
.
export PYTHONPATH=$HOME/caffe/python:$PYTHONPATH
- Compile Caffe using Makefile.config (make)
If you are using Makefile.config
and make
, you need to add the hdf5 include directory
$ cp Makefile.config.example Makefile.config
$ echo "INCLUDE_DIRS += /usr/include/hdf5/serial/" >> Makefile.config
$ echo "LIBRARY_DIRS += /usr/lib/x86_64-linux-gnu/hdf5/serial/" >> Makefile.config
and uncomment the USE_CUDNN := 1
to build with cuDNN acceleration, choose which BLAS
you are using either atlas
or openblas
, etc.
$ make all -j8
$ make test -j8
# (Optional)
$ make runtest -j8
- Test AlexNet
In /build
, run
$ tools/caffe time --model=../models/bvlc_alexnet/deploy.prototxt --gpu=0
Please refer to the complete and up-to-date install instructions from TensorFlow website.
- Install dependencies
$ sudo apt-get install libcupti-dev
This requires installing libcudnn7
, libcudnn7-dev
, and libcudnn7-doc
in certain directories. The headers must be located at:
/usr/include/cudnn.h
and libraries in:
/usr/lib/x86_64-linux-gnu/libcudnn_static.a
/usr/lib/x86_64-linux-gnu/libcudnn.so.7
/usr/lib/x86_64-linux-gnu/libcudnn.so
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
/usr/lib/x86_64-linux-gnu/libcudnn.so.6
/usr/lib/x86_64-linux-gnu/libcudnn_static_v6.a
/usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21
/usr/lib/x86_64-linux-gnu/libcudnn_static.a
Refer to section Install cuDNN and Install from .deb will setup everything.
- Install TensorFlow via virtualenv for Python 2.7
Install and activate virtualenv:
$ sudo apt-get install python-pip python-dev python-virtualenv
$ virtualenv --system-site-packages ~/tensorflow
Activate TensorFlow virtualenv environment:
$ source ~/tensorflow/bin/activate
NOTE: Note that you must activate the virtualenv environment each time you use TensorFlow. Using the following command to deactivate TensorFlow environment when you are done with it.
(tensorflow)$ deactivate
Install TensorFlow with GPU support in the virtual environment:
(tensorflow)$ pip install --upgrade tensorflow-gpu
- Verify the installation Activate the virtualenv and run the following python code:
(tensorflow)$ python
(tensorflow)# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
The code prints "Hello, TensorFlow!"
- Uninstall TensorFlow
$ rm -r ~/tensorflow
- Common errors
ImportError: libcudnn.Version: cannot open shared object file:
No such file or directory
Install the exact Version of libcudnn. TensorFlow may not support the latest versions of cuDNN.