Create_AMI
This tutorial outlines the steps to create a TensorFlowOnSpark (TFoS) AMI on AWS EC2, using a p2.xlarge instance running Ubuntu Server 16.04.
A pre-built AMI image is available for you to use. See Get Started on EC2.
We launch an Ubuntu Server 16.04 LTS (HVM) AMI with a p2.xlarge instance in Amazon EC2. 16 GB of storage on the root partition is required.
- Go to https://us-west-2.console.aws.amazon.com/console
- Select EC2
- Open Spot Requests and request a spot instance
- Specify an AMI
- Specify the spot max price
- Wait for instance to enter running state
Please follow the [AWS instructions](http://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-keypairs.html) to create a keypair. Here is an example:
export EC2_KEY=ec2_${USER}
export EC2_PEM_FILE=~/.ssh/ec2_${USER}.pem
ec2-create-keypair -O ${AWS_ACCESS_KEY_ID} -W ${AWS_SECRET_ACCESS_KEY} --region us-west-2 ${EC2_KEY}
emacs ${EC2_PEM_FILE}
chmod 600 ${EC2_PEM_FILE}
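ssh refuses to use a private key file that other users can read, so it can be worth confirming the permissions before connecting. The following is a minimal sketch of such a check; the helper name `check_pem_mode` is hypothetical and not part of the original tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original tutorial): succeed only if
# the given file exists and its permission bits are exactly 600.
check_pem_mode() {
  local pem="$1"
  [ -f "$pem" ] || { echo "missing: $pem" >&2; return 1; }
  # find prints the path only when the mode matches exactly.
  [ -n "$(find "$pem" -maxdepth 0 -perm 600)" ] || {
    echo "wrong permissions on $pem (expected 600)" >&2
    return 1
  }
}

# Usage: check the key file before running ssh.
# check_pem_mode "${EC2_PEM_FILE}" && echo "key file looks fine"
```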
SSH onto your instance:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${EC2_PEM_FILE} root@<MASTER>
For GPU instances only, you will need to install the CUDA drivers and the cuDNN libraries.
wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
rm cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda
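Once the packages are installed, a quick check that the driver and toolkit binaries are present can save time later. This is a sketch only: `have_tool` is a hypothetical helper, and the `nvcc` path assumes the default `/usr/local/cuda` install prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a tool is available, either on PATH
# or at an explicit path.
have_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "not found: $1"
    return 1
  fi
}

# nvidia-smi ships with the NVIDIA driver; nvcc is the CUDA compiler.
# The nvcc path assumes the default /usr/local/cuda install prefix.
have_tool nvidia-smi || true
have_tool /usr/local/cuda/bin/nvcc || true
```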
Downloading cuDNN requires logging into the NVIDIA developer site, so we can’t use wget to fetch the files. Download the following .deb files from NVIDIA, upload them to your AWS instance, and install them:
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb
sudo mkdir /usr/lib/x86_64-linux-gnu/include
sudo cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/include
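To confirm which cuDNN version the copied header belongs to, the version macros can be read straight from `cudnn.h`. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integer macros; `cudnn_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cuDNN version found in a cudnn.h header.
# Assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL
# as plain integer macros.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$1"
}

# Usage on the instance (path taken from the copy step above):
# cudnn_version /usr/lib/x86_64-linux-gnu/include/cudnn.h
```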
TensorFlow now provides pip packages for the latest CPU and GPU builds. In most cases, you can just install TensorFlow via one of the following:
sudo apt-get install python-pip
pip install tensorflow # Python 2 CPU
pip3 install tensorflow # Python 3 CPU
pip install tensorflow-gpu # Python 2 GPU
pip3 install tensorflow-gpu # Python 3 GPU
TensorFlowOnSpark also provides a pip package, which you can install via:
pip install tensorflowonspark
python
> import tensorflow as tf
> from tensorflowonspark import TFCluster
If you see no errors, then you should be ready to go. If you encounter any issues, you should check the official installation instructions from TensorFlow for more information.
For some specialized installations, e.g. to enable RDMA/iverbs, you may need to compile TensorFlow from source.
The following instructions are provided here for easy reference, but the definitive instructions are available on the TensorFlow site.
Install Build Dependencies:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential git libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget pkg-config zip g++ zlib1g-dev libcurl3-dev
Configure the environment by adding the following lines to your ~/.bash_profile file:
export CUDA_ROOT=/usr/local/cuda
export CUDA_HOME=$CUDA_ROOT
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64:$CUDA_ROOT/extras/CUPTI/lib64
export HADOOP_HOME=/root/ephemeral-hdfs
export SPARK_HOME=/root/spark
export PATH=${PATH}:${HADOOP_HOME}/bin:${SPARK_HOME}/bin
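If you script this step rather than editing the file by hand, appending each line only when it is missing keeps the setup safe to re-run. A minimal sketch, where `append_once` is a hypothetical helper and not something the tutorial itself provides:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if the exact line is
# not already present, so the setup can be re-run safely.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep the variables unexpanded in the file):
# append_once ~/.bash_profile 'export CUDA_ROOT=/usr/local/cuda'
# append_once ~/.bash_profile 'export CUDA_HOME=$CUDA_ROOT'
```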
Install Java 8:
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer
Install Bazel:
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel
Clone the TensorFlow repository and configure your build:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
./configure
You should accept almost all defaults, except for the following:
- Hadoop File System support? [y/N] y
- CUDA support? [y/N] y
- CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
- Cudnn version you want to use. [Leave empty to use system default]: 5.1.10
- location where cuDNN 5.1.10 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
- compute capability of your device [Default is: "3.5,5.2"]: 3.7
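The answers above can also be supplied ahead of time through environment variables. This is a sketch under a stated assumption: that the `./configure` script in your TensorFlow checkout reads the variables below instead of prompting. Which variables a given release honors varies, so check the script itself first.

```shell
#!/usr/bin/env bash
# Assumption: the ./configure script in your TensorFlow checkout reads
# these environment variables instead of prompting. Check the script
# itself before relying on this.
export TF_NEED_HDFS=1                       # Hadoop File System support: y
export TF_NEED_CUDA=1                       # CUDA support: y
export TF_CUDA_VERSION=8.0                  # CUDA SDK version
export TF_CUDNN_VERSION=5.1.10              # cuDNN version
export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu
export TF_CUDA_COMPUTE_CAPABILITIES=3.7     # compute capability

# Then, from the tensorflow directory:
# ./configure
```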
Build TensorFlow (be patient, this can take several hours):
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install the package:
sudo pip install /tmp/tensorflow_pkg/tensorflow-*.whl
Test TensorFlow:
cd
python
> import tensorflow as tf
Clean up the Bazel cache and shell history, then exit from your terminal:
rm -rf /root/.cache/bazel/
cat /dev/null > ~/.bash_history && history -c && exit
Use the Amazon EC2 console to create an AMI from your instance: Actions -> Image -> Create Image.
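The same step can be sketched with the AWS CLI, run from your own machine rather than from the instance. The instance ID and AMI name below are placeholders, and `run` is a hypothetical wrapper that only prints the command unless `DRY_RUN=0` is set, so nothing is created by accident.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command by default; run it only when
# DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder values; replace them with your own instance ID and AMI name.
INSTANCE_ID="i-0123456789abcdef0"
AMI_NAME="tfos-ami-example"

run aws ec2 create-image \
  --region us-west-2 \
  --instance-id "${INSTANCE_ID}" \
  --name "${AMI_NAME}"
```

Review the printed command first, then repeat with `DRY_RUN=0` once the values are correct.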