First, AWS is a commercial service, and they are in the business of charging people money for access. This course relies either on free instances, or on the $100 of credit that Amazon gives you (https://aws.amazon.com/education/awseducate/), but be aware that there are consequences to leaving instances turned on for long periods of time. Each type of instance costs money per hour; for example, if you accidentally leave a GPU instance running, you will be charged $1.14 an hour until you stop it.
AWS also offers "spot pricing", where the price fluctuates according to demand. This is great for batch compute, but I don't recommend it for this course: if the price rises your machine may be terminated, and you can lose work unless you have good processes in place. If you look at the pricing history it is quite interesting to try to work out what people are doing: on the GPU instances people occasionally spike the price up to the on-demand rate, while for some other instance types it even exceeds the standard on-demand rate.
Each instance can be in one of a few states:
- Pending: once launched, the instance goes through some initial checks and is allocated to a machine.
- Running: the machine is up and active, and you can SSH in and use it.
- Stopped: the machine is not currently active, but the state of the hard disk is maintained, and it can be run again.
- Terminated: the machine is destroyed, including the hard disk.
Broadly speaking, you can think of the "running" state as "using electricity", and the "stopped" state as "using up a hard disk". AWS will always charge you for electricity and disk space, but generally charges more for electricity.
You should all be eligible for the one-year free tier, which gives you certain advantages:
- 750 hours / month of linux t2.micro server time, so you could leave a micro instance on all the time if you wanted to.
- 30GB of EBS storage. This is where your instance disks are stored, so you can have a few instances in the stopped state and not be charged.
The t2.micro (or even slightly bigger) instances are great for checking compilation and automation, as you don't need to worry about cost.
Any other instances will eat into your $100, so you need to be a bit careful about managing them. AWS used to round each session up to a whole hour, but they now do per-minute billing, so you can start and stop an instance and only get charged for the time it is running. However, bear in mind that startup time is included in that, so you should not be starting lots of very short-lived expensive instances.
I have no more money to give you if yours runs out, and you need to keep money for the later courseworks, so you need a certain amount of planning in how you spend it. Don't start working with a GPU or large instance unless you know you'll be able to spend a reasonable amount of time with it, and use cheaper instances, lab/personal machines, and VMs for build automation, compilation, and testing whenever possible. Though if you consider $100 at $1.14 an hour, you're looking at over 85 hours of GPU time, so there is plenty there.
- Always check your instances are finished when you stop working with non-free instances. Check each one has transitioned to the stopped or terminated state, and refresh the EC2 console to really make sure.
- Protect your instance key-pairs.
- Use the cheapest machine you can for the current purpose; testing OpenCL code for correctness doesn't necessarily need a GPU, and checking that builds work can often be done in a tiny instance.
- Plan your work; get everything possible done on a local machine (VM or native, linux or windows) first.
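To get a feel for the budget, here is a back-of-the-envelope calculation using the $1.14/hour GPU rate quoted earlier (integer cents, so it works in plain shell arithmetic):

```shell
# Back-of-the-envelope: how many GPU-hours does the credit buy?
# Rates in integer cents so plain shell arithmetic works.
BUDGET_CENTS=10000      # the $100 credit
GPU_RATE_CENTS=114      # the $1.14/hour GPU rate quoted earlier
echo "GPU hours available: $((BUDGET_CENTS / GPU_RATE_CENTS))"
```

That comes out at roughly 87 hours, which is where the "plenty there" claim comes from.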
You need to have an Amazon account first (the kind you use to buy books), then go to the AWS site and create an AWS account.
It will ask you for a credit or debit card, but this will not be charged if you only use free instances, and/or stay within the $100 credit you'll get. As I mentioned above, this is real money, but as long as you manage your instances it won't cost you anything.
There is also something called the AWS Educate Starter account, which is another way to get credits. This route does not require a credit-card, but there are limits to what you can do with it: once you use up the Starter credits the account cannot be used. Another important difference is that you are limited to a small number of instance types on the Starter account: "All t2 instance types, m4.large, and m4.xlarge". So you cannot use AWS Starter accounts to access GPUs.
Initially your AWS account will be limited in the types and number of instances that can be launched, and you may need to ask Amazon to approve you for the more expensive instance types. The reason is that they don't want new people to accidentally incur a massive bill, and I think they also worry about people using throwaway accounts to steal huge amounts of compute time. Don't leave playing with AWS until the day before submission, as it may take a couple of days to get authorised for GPUs.
For your AMI type, choose "Ubuntu Server 16.04 LTS (HVM)".
Select the free tier (you could choose a more expensive one, but then you need to spend money).
Go to "Next: Configure Instance Details"
You should be able to leave them at the defaults (though it is interesting to look at all the options by hovering over the (i) buttons).
You can leave these at the defaults, but again, it is interesting to read them. If you ever need to work with big-ish data then these options matter a lot.
We don't need this, but it is useful if you have 20 instances and you need to be able to identify which is which (Err, don't create 20 instances unless you are rich).
This one is quite important. Your server will be alive on the internet, open to the world, so you need to limit access to you. We will open just one port, for SSH, though we will allow it to be accessible from anywhere.
- Select "Create a new security group". (It should be auto-selected, and the defaults listed below should be correct as well.)
- Make sure the "Type" on the left is SSH (Secure Shell).
- Protocol and Port Range will then be fixed to TCP and 22.
- For Source, specify Anywhere. This is so you can log in from wherever you happen to be (so could anyone else, but SSH should stop them).
- For the security group name, choose something meaningful like "ssh-only".
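For reference, the same security group can be set up from the AWS CLI. This is a sketch, not part of the console walkthrough: the group name matches the one suggested above, and the commands are printed rather than executed, so you can review them before running them for real with configured credentials.

```shell
# Print (rather than run) the CLI equivalents of the console steps above.
# Remove DRY_RUN=1 to execute for real, with AWS credentials configured.
DRY_RUN=1
run() { echo "+ $*"; [ -n "$DRY_RUN" ] || "$@"; }

run aws ec2 create-security-group \
    --group-name ssh-only --description "SSH access only"
run aws ec2 authorize-security-group-ingress \
    --group-name ssh-only --protocol tcp --port 22 --cidr 0.0.0.0/0
```

The dry-run wrapper is just a convenience so you can see exactly what would be sent before committing to it.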
Do Next: a dialogue should pop up saying "Select an existing or new key pair."
Do not proceed without a key pair. Key pairs are important, as they are what allows you to SSH into your instance.
- Read the description of key pairs that it shows to you.
- Select "Create a new key pair".
- Choose a key pair name. I would suggest putting your name or login in it; for example, I use "jsd06-key-pair".
- Download the key pair. This file is important for as long as the instance is running, so keep it somewhere safe. However, you can always generate more key pairs if you lose one. If you are on a shared unix machine, change the permissions so that only you can access it:
chmod og-rwx jsd06-key-pair.pem
-
Finish the process, and your instance will launch.
Just re-emphasising the importance of key pairs: they are essentially the front-door key to your server.
If you ever accidentally put your key-pair somewhere publicly accessible, then you should abandon that key-pair and create a new one. It is possible (and a good idea) to protect your key-pair with a passphrase as well, or to import an existing ssh key, but the details start to get more complicated.
Use the "View Instances" button, or just go back to the AWS dashboard (it doesn't matter how you get there).
You should now be able to see an instance running in the dashboard, with a green symbol, and probably a status that says "Initialising". If you click on that row, then the bottom of the dashboard will show you details about it.
The thing we need in order to connect is the DNS name or IP address of the instance - either will work to identify it. For example, I have previously received:
ec2-54-201-95-131.us-west-2.compute.amazonaws.com
as an instance, which is correspondingly at the IP address:
54.201.95.131
To connect to the server, you need to SSH to it.
You can ssh directly from the command line, using:
ssh -i <path-to-your-key-pair> ubuntu@<dns-name-of-your-server>
That should drop you into a command line on the remote server.
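If you connect often, an entry in your local ~/.ssh/config saves retyping the key path and user each time. The host alias "aws" and the paths below are examples; substitute your own instance's DNS name and key file:

```
# ~/.ssh/config -- alias and paths are examples, substitute your own
Host aws
    HostName ec2-54-201-95-131.us-west-2.compute.amazonaws.com
    User ubuntu
    IdentityFile ~/keys/jsd06-key-pair.pem
```

After which plain `ssh aws` is enough. Remember to update HostName when you switch to a new instance.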
There is a great ssh terminal for windows called PuTTY, which I highly recommend. To use it, you need to convert the .pem file to a putty .ppk file:
- Start PuTTYgen (one of the programs that comes with PuTTY).
- Conversions -> Import Key.
- Select your .pem file and open it.
- At this stage you can choose a passphrase using the "Key passphrase" box, which will be used to encrypt the private key. Personally I prefer to have a passphrase, as otherwise anyone who gets the key can get into any of my running instances. However, you can leave it blank and just protect the key file well.
- File -> Save Private Key.
- You may get prompted about an empty passphrase; just ignore it if you didn't want one.
- Choose a .ppk file to save it as.
You can now start PuTTY itself. You might want to set this up once and save it:
- Session: "Host Name (or IP address)": put the DNS name of your amazon instance.
- Connection -> SSH -> Auth: specify the private .ppk file you just created.
- Connection -> Data: in "Auto-login username" put "ubuntu".
- Session -> "Saved Sessions": choose some name for this connection, e.g. "AWS", and hit Save.
- Hit "Open".
You should now be dropped into your remote server. If you switch to a new instance you will need to change the host settings, but the rest should stay the same.
By default, your instance has almost nothing on it. Try running g++:
g++ -v
And it will tell you it is not installed, but does suggest how to install it:
sudo apt-get install g++
This involves two commands:
- sudo: a program for running commands as root or administrator.
- apt-get: a package manager which handles the installation and removal of tools and libraries.
Similarly, try running git, make, and so on, and you'll find you need to install them too.
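A quick way to see what a fresh instance is missing is to loop over the tools you will want; the list here is just the ones used in this course:

```shell
# Report which common build tools are present, and suggest an install
# command for each one that is missing.
for tool in g++ git make; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: installed"
    else
        echo "$tool: missing - try: sudo apt-get install $tool"
    fi
done
```

`command -v` is a shell builtin, so this works even on an instance with almost nothing installed.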
TBB is also available as a package, but you need to search for that:
apt-cache search tbb
You should see four or five packages related to tbb, but libtbb-dev should pull in everything you need:
sudo apt-get install libtbb-dev
You now have a few options for getting code over to your machine:
- Copying files over via scp (file transfer via SSH).
- cat-ing files down the SSH connection (not really recommended, but occasionally very useful).
- Pulling code over via git.
I am going to recommend getting the code via git, as it is a nice way of doing things, and makes it easier to bring any patches you make in the test environment back out to github. The main sticking point is authentication, as your AWS instance will be able to communicate with github, but doesn't have access to your keys.
You can use https to move code backwards and forwards over git, but this requires you to type/paste in your password each time you push or pull. It is simpler in the short term, but wastes a lot of time long-term. The better solution is to use SSH, which is also generally more secure. You could transfer your SSH keys over and use ssh-agent remotely, but it is better to keep your keys where you control them, using a method called SSH agent forwarding.
First, make sure you are currently authenticated with github, by doing:
ssh git@github.com
or the equivalent in PuTTY. If you receive something like Permission denied (publickey)., then you haven't got an agent set up. Start ssh-agent or pageant, load your github SSH keys in (these are distinct from your AWS key-pair), then try again. Hopefully you will eventually see something like:
Hi jds06! You've successfully authenticated, but GitHub does not provide shell access.
This shows that you have agent authentication working on your local machine. We can now use authentication forwarding (-A) to allow the remote server to access your local authentication agent:
ssh -A -i <path-to-your-key-pair> ubuntu@<dns-name-of-your-server>
You should end up on the remote server again, but if you (within the SSH session, on the remote server) do:
ssh git@github.com
You should see that you are authenticated with github on the other machine.
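For command-line users, starting the local agent mentioned above can look like the following sketch. The key path is an assumption; point ssh-add at whichever key you registered with github (pageant users do the equivalent through its GUI):

```shell
# Start an agent in this shell and try to load a github key.
# ~/.ssh/id_rsa is an assumed path; substitute your own key file.
if ! command -v ssh-agent >/dev/null 2>&1; then
    echo "openssh client tools are not installed"
    exit 0
fi
eval "$(ssh-agent -s)" >/dev/null
KEY="$HOME/.ssh/id_rsa"
if [ -f "$KEY" ]; then
    ssh-add "$KEY"
fi
ssh-add -l || echo "agent running, but no keys loaded yet"
```

ssh-add -l lists the loaded identities, so it is a quick check that the agent is actually holding your key before you try the forwarded connection.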
You can now issue a git clone command to get your repository remotely, then do commit/push/pull as normal.
If you make modifications on the remote server, don't forget to "push" any changes back to github, and then (if necessary) to "pull" them back down to your normal working repository. For those who are not used to working with remote git, the git commit -a command is useful, as it auto-stages all your changes.
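To see the whole cycle without touching github, here is a self-contained dry run, using a bare repository on local disk standing in for the github remote (all paths and messages are throwaway examples):

```shell
# Simulate clone -> edit -> commit -a -> push against a local bare "origin".
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"          # stands in for github
git clone -q "$tmp/origin.git" "$tmp/work"    # the "remote instance" copy
cd "$tmp/work"
git config user.email "you@example.com"
git config user.name "you"
echo 'int main(){ return 0; }' > wibble.cpp
git add wibble.cpp
git commit -q -m "first commit"
echo '// tuned on the remote instance' >> wibble.cpp
git commit -q -a -m "tuning tweak"    # -a auto-stages the modified file
git push -q origin HEAD               # send both commits to "origin"
```

With a real github remote the only differences are the clone URL and the agent-forwarded authentication.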
I would not recommend doing much editing on the remote instances; they should be more for testing, tuning, and experimentation. But inevitably you will need to change some source files, and you need some way of editing the files remotely (you don't want to be pulling for each edit). There is a command-line editor called nano, installed on pretty much all unix machines, which you can use to make small changes. For example, to edit the file wibble.cpp, just do:
nano wibble.cpp
You'll end up in an editor which behaves as you would expect. Along the bottom are a number of keyboard short-cuts, with ^ representing control. The main ones you'll want are:
- ^X (ctrl+x): quit the editor (it will prompt to save).
- ^O (ctrl+o): write the file out without exiting.
- ^G (ctrl+g): built-in documentation.
Other text editors can be used or installed (emacs, vim, ...), but I would suggest nano for most tasks you will encounter here.