EXPLORING LEARNING STRATEGIES FOR TRAINING DEEP NEURAL NETWORKS USING MULTIPLE GPUS
Please compile Torch7 using the following steps:
- git clone https://github.com/torch/distro.git /home/Tools/torch_cuda-7.5 --recursive
- Modify path_to_nvcc=/usr/local/cuda-7.5/bin/nvcc in the file /home/Tools/torch_cuda-7.5/install.sh
- Make sure the /usr/local/cuda-7.5 paths are set in ~/.bashrc. Then run:
- cd /home/Tools/torch_cuda-7.5 ; ./clean.sh
- rm -rf ./install
- remove the torch-activate entry from your shell start-up script (~/.bashrc or ~/.profile)
- bash install-deps
- ./install.sh
- ./test.sh
- set LD_LIBRARY_PATH & PATH:
  export PATH="/home/Tools/torch_cuda-7.5:/home/Tools/torch_cuda-7.5/bin:/home/Tools/torch_cuda-7.5/install:/home/Tools/torch_cuda-7.5/install/bin:/home/Tools/torch_cuda-7.5/install/share/lua/5.1:$PATH"
  export LD_LIBRARY_PATH="/home/Tools/torch_cuda-7.5/lib:/home/Tools/torch_cuda-7.5/install/lib/lua/5.1:/home/Tools/torch_cuda-7.5/install/lib:$LD_LIBRARY_PATH"
  . /home/Tools/torch_cuda-7.5/install/bin/torch-activate
Installation Reference: http://torch.ch/docs/getting-started.html and https://github.com/torch/distro
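After installation, a quick sanity check can confirm that Torch sees the GPUs. This is a minimal sketch; the file name check_gpu.lua is just an example:

  -- check_gpu.lua: verify that Torch and CUDA work together
  require 'torch'
  require 'cutorch'
  print('GPUs visible to Torch: ' .. cutorch.getDeviceCount())
  -- Allocate a small tensor on the GPU and run a simple operation
  local x = torch.rand(3, 3):cuda()
  print(x:sum())

Run it with: th check_gpu.lua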
Please install the related Twitter packages described in "Distributed learning in Torch" (https://blog.twitter.com/2016/distributed-learning-in-torch) before running. First, git clone the packages:
- git clone https://github.com/twitter/torch-distlearn
- git clone https://github.com/twitter/torch-dataset
- git clone https://github.com/twitter/torch-thrift
- git clone https://github.com/twitter/torch-autograd
- git clone https://github.com/twitter/torch-ipc
Then go into each folder and run the corresponding command (a quick load check follows the list):
- luarocks install autograd
- luarocks install thrift
- luarocks install dataset
- luarocks install ipc
- luarocks install distlearn
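To confirm the packages load correctly, a small script can be used. This is a minimal sketch; the module names below are assumptions based on each package's README and may need adjusting:

  -- check_packages.lua: try to load each Twitter package
  local mods = {'autograd', 'libipc', 'ipc.Tree', 'dataset.Dataset', 'distlearn.AllReduceSGD'}
  for _, mod in ipairs(mods) do
     local ok, err = pcall(require, mod)
     print(mod, ok and 'OK' or ('FAILED: ' .. tostring(err)))
  end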
We can test run.sh and speech.lua by modifying the input and output.
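For orientation, the synchronization pattern that torch-distlearn uses looks roughly like the sketch below. This is not the actual speech.lua: the toy model, layer sizes, host/port values, and in particular the ipc.server/ipc.client/Tree setup are assumptions based on the torch-ipc and torch-distlearn examples, so check those packages for the exact constructor arguments. The three allReduceSGD calls are the part that matters.

  -- distlearn_sketch.lua: rough shape of a distlearn training loop (a sketch only)
  require 'nn'
  local ipc = require 'libipc'
  local Tree = require 'ipc.Tree'

  local nodeIndex, numNodes = 1, 2            -- in practice run.sh passes these per node
  local host, port = '127.0.0.1', 8080
  local server, client
  if nodeIndex == 1 then
     server = ipc.server(host, port)
     server:clients(numNodes - 1, function(c) end)   -- wait for the other nodes
  else
     client = ipc.client(host, port)
  end
  -- NOTE: the exact Tree constructor arguments are an assumption; see the torch-ipc docs
  local tree = Tree(nodeIndex, numNodes, 2, server, client, host, port + 1)
  local allReduceSGD = require 'distlearn.AllReduceSGD'(tree)

  -- Toy model standing in for the DNN in speech.lua
  local model = nn.Sequential():add(nn.Linear(10, 2)):add(nn.LogSoftMax())
  local criterion = nn.ClassNLLCriterion()
  local params, grads = model:parameters()    -- tables of weight / gradient tensors

  for epoch = 1, 3 do
     for step = 1, 100 do
        local x, y = torch.rand(10), torch.random(2)
        model:zeroGradParameters()
        criterion:forward(model:forward(x), y)
        model:backward(x, criterion:backward(model.output, y))
        -- Sum and normalize gradients across all nodes before the SGD update
        allReduceSGD.sumAndNormalizeGradients(grads)
        for i = 1, #params do params[i]:add(-0.01, grads[i]) end
     end
     -- Keep parameters identical across nodes at the end of each epoch
     allReduceSGD.synchronizeParameters(params)
  end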
- Before running the distributed learning, please make sure ACS is turned off. Run lspci -vvv and make sure you get "ACSCtl: SrcValid-" instead of "ACSCtl: SrcValid+" for the PLX PCI-e switch. Here is some information about GPU communication:
- If you get "ACSCtl: SrcValid+" for the PCI bridge (PLX Technology), run "setpci -s bus#:slot#.func# f2a.w=0000" to disable ACSCtl on the PLX switch. Please run these 3 steps:
  lspci | grep -i plx                         # check bus#:slot#.func#
  sudo lspci -s 03:08.0 -vvvv | grep -i acs   # check ACSCtl: SrcValid+
  sudo setpci -s 03:08.0 f2a.w=0000           # make ACSCtl: SrcValid-
- We can check the settings of the GPU cards and their topology matrix using the command "nvidia-smi topo --matrix".
- The ReLU activation function works better than Tanh. However, ReLU may collapse to 0% accuracy with an unsuitable learning rate; there is no such problem with Tanh. (A small model sketch showing where this choice appears follows this list.)
- We may use the command "nvidia-smi --loop=10 > nvidia.log" to reduce the occurrence of "Segmentation fault" in torch-distlearn.
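As a small illustration of the ReLU/Tanh choice mentioned above, here is a hypothetical feed-forward DNN definition; the layer sizes are illustrative only, not those of speech.lua:

  require 'nn'

  -- Build a simple feed-forward DNN; pass useReLU=false to use Tanh instead
  local function buildDNN(useReLU)
     local act = useReLU and nn.ReLU or nn.Tanh
     local model = nn.Sequential()
     model:add(nn.Linear(440, 1024)):add(act())
     model:add(nn.Linear(1024, 1024)):add(act())
     model:add(nn.Linear(1024, 2000))
     model:add(nn.LogSoftMax())
     return model
  end

  print(buildDNN(true))   -- ReLU version
  print(buildDNN(false))  -- Tanh version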
References:
- Microsoft: F. Seide et al., "1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs," Interspeech 2014.
- Amazon: http://www.nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf
- Dougal Maclaurin, David Duvenaud, Matt Johnson, "Autograd: Reverse-mode differentiation of native Python"
- Twitter: https://blog.twitter.com/2016/distributed-learning-in-torch
- Yu & Deng’s "Automatic Speech Recognition, A Deep Learning Approach"
UPDATE 16th March 2017, by Chien-Lin Huang https://sites.google.com/site/chiccoclhuang/