A Hands On Introduction to AWS: November 10, 2021
Wednesday, November 10 2021, from 10:00am - 12:00pm PDT
Instructor: Saranya Canchi
Moderator: Jeremy Walter
Helper: Rayna Harris
This 2 hour hands-on tutorial will introduce you to creating a computer "in the cloud" and logging into it, via Amazon Web Services. We'll create a small general-purpose Linux computer, connect to it, and run a small job while discussing the concepts and technologies involved.
😺 While we wait to get started
- ✔️ Have you checked out the pre-workshop resources page?
- 📝 Please fill out our pre-workshop survey if you have not already done so! Please click this link
- If you are on a Windows computer, make sure you have MobaXterm installed. Check out the install guide!
I am Saranya Canchi and I am joined today by Rayna Harris. We are both part of the training and engagement team for the NIH Common Fund Data Ecosystem, a project supported by the NIH to increase data reuse and cloud computing for biomedical research.
👍 Have you heard of the NIH Common Fund Data Ecosystem? Put up a ✔️ for yes and a ❎ for no!
You can contact us at srcanchi@ucdavis.edu (@S_Canchi) and rmharris@ucdavis.edu (@raynamharris).
We have the following goals for this workshop:
- Help you think about if and how to use cloud computers for your work!
- Gather questions, feedback and refine the tutorial materials!
So, please ask lots of questions, and even the ones we can't answer yet we'll figure out for you!
Today, everything you do will be paid for by us. In the future, if you create your own AWS account, you'll have to put your own credit card on it. We'd be happy to answer questions about how to pay for AWS.
😺 Your free login credentials will work for the next 24 hours
- Brief introduction to AWS and the cloud
- Set up an instance and connect to it
- Install and run things in the cloud computer
- Learn how to download output files to local machine
- Take your questions
If you have questions at any point,
- Drop them in the chat, or
- Direct messages to the moderator are welcome, or
- Unmute yourself and ask during the workshop
We're going to use the raise hand ✋ reaction in zoom to make sure people are on board during the hands-on activities.
- Renting and use of IT services over the internet.
- No direct, active management by the user.
- Avoid or minimize up-front IT infrastructure cost.
- Amazon and Google, among others, rent compute resources over the internet for money.
There are lots of reasons, but basically "you need a kind of compute or network access that you don't have."
- More memory than you have available otherwise
- An operating system you don't have access to (Windows? Mac?)
- Installation privileges for software
- May not want to install brand new software on your local computer
- Amazon web services is one of the most broadly adopted cloud platforms
- It is a hosting provider that gives you a lot of services including cloud storage and cloud compute.
- Instance - a computer that is running somewhere in "the cloud". The important thing is that someone else worries about the hardware, so you rent only what you need!
- Cloud computer - same as an "instance".
- Image - the basic computer install from which an instance is constructed. The configuration of your instance at launch is a copy of the Amazon Machine Image (AMI)
- EC2 - elastic compute cloud.
ℹ️ Amazon's main compute rental service is called Elastic Compute Cloud (or EC2) and that's what we'll be showing you today.
- Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud.
- Basically, you rent virtual computers that are configured according to your needs and run applications and analyses on that computer.
- Best suited for analyses that could overwhelm your local computer, e.g. those that read or write very large files or take too long to run
- Sign up process is relatively easy (you need a credit card and some patience to deal with delays in two-factor authentication)
- Simple billing
- Stable services with only 3-4 major outages that only lasted 2-3 hours and did not affect all customers (region-specific). A large team of employees who are on top of any problems that arise!
- Lots of people use it, so there are a ton of resources
- Spot instances (unused EC2 instances) - you can bid for a price. It is cheap, but your services might be terminated if someone outbids you.
We will create a cloud computer - an "instance" - and then log in to it.
💻 Log in at: https://aws-cfde-training-workshop.signin.aws.amazon.com/console
Use your registration e-mail (see bottom of this page if you forgot!) and password CFDErocks!
Put up a ✋ on Zoom when you've successfully logged in with the workshop user credentials.
Checklist for hands-on walk-through
- Select a region on the top right: US West (N. California) us-west-1
- Pick the AMI (OS): Ubuntu 20.04 LTS - Focal
- Pick an instance type: t2.micro (free tier eligible)
- Edit security groups: to your last name
- Launch
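For readers who later want to script the same checklist, here is a minimal sketch of the equivalent AWS CLI call. It assumes the AWS CLI is installed and configured, which this workshop does not require. The AMI ID, key pair name, and security group ID are placeholders that you must look up for your own account and region. The function only prints the command so you can inspect it before running anything.

```shell
# Print (rather than run) an AWS CLI call that mirrors the console checklist.
# Arguments are placeholders: AMI ID, key pair name, security group ID.
launch_cmd() {
    ami_id=$1
    key_name=$2
    sg_id=$3
    printf 'aws ec2 run-instances --region us-west-1 --image-id %s --instance-type t2.micro --key-name %s --security-group-ids %s --count 1\n' \
        "$ami_id" "$key_name" "$sg_id"
}

# Example (hypothetical values):
#   launch_cmd ami-EXAMPLE my-key sg-EXAMPLE
```

Printing first makes it easy to double check the region and instance type, since a launched instance starts accruing charges right away.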
Other ways to connect to the instance:
We have tutorials on connecting to an instance for Windows Users using MobaXterm and for Mac Users using MacOS Terminal. Please visit our "Connect to an Instance" webpage and select your OS using the tabs on the top of the page.
- Install a simple bioinformatic software (FastQC)
- Download fastq (raw RNA Sequence) data
- Run fastqc on downloaded data
- Transfer output files from AWS computer to local computer.
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines.
- It provides a modular set of analyses to help you identify problems in the quality of your samples or sequence.
- The aim of this tool is to spot issues that originate from the sequencer or in the starting library material.
- Output of fastqc is an HTML based permanent report
⚠️ Note: Copy+Paste does not work if you are using Safari (MacOS) to run the AWS terminal. Please use another web browser (e.g. Chrome or Firefox), or type in the commands.
- Update system packages:
sudo apt update
- Make a directory
mkdir fastq
- Change into the directory
cd fastq
- Download a fastq data file from osf.io
curl -L https://osf.io/8rvh5/download -o ERR458494.fastq.gz
Click the raised hand ✋ reaction if you were able to run the last command successfully and download ERR458494.fastq.gz
- Check if your file has been downloaded
ls -l
- Install FastQC
sudo apt install fastqc -y
Click the raised hand ✋ reaction if you were able to run the last command successfully. To double check it was successful, type the command below. If it reports version 0.11.9, the installation was successful.
fastqc --version
- Run FastQC on the downloaded file
fastqc ERR458494.fastq.gz
Click the raised hand ✋ reaction if you were able to run the last command successfully and generate an html file.
- View files
ls
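As an optional extra check on the download step, the sketch below tests that a gzipped FASTQ file is intact and counts its reads. The function name `check_fastq` is made up for this example. It relies on the fact that each FASTQ record spans four lines and assumes no record is wrapped across extra lines.

```shell
# Verify a gzipped FASTQ file and print how many reads it holds.
# A FASTQ record spans four lines, so reads = lines / 4.
check_fastq() {
    file=$1
    # gzip -t tests the archive without writing any output
    gzip -t "$file" || { echo "corrupt or incomplete: $file" >&2; return 1; }
    lines=$(gzip -dc "$file" | wc -l)
    echo $((lines / 4))
}

# Example:
#   check_fastq ERR458494.fastq.gz
```

A failed integrity test usually means the download was interrupted, in which case rerunning the curl command is the simplest fix.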
apt-cache search <search term>
- search available software for installation
sudo apt update
- download packaged information from all configured sources from the internet
- This will update the package lists from all repositories in one go. Remember to do this after every added repository!
sudo apt install <program1> -y
- install package
- Other programs can be installed the same way (e.g. "ncbi-blast+")
mkdir <directory name>
- make a new directory
- equivalent to making a new folder in Windows
cd <directory name>
- change directory
- equivalent to double clicking a folder
curl -L <url> -o <filename>
- curl stands for "Client URL"
- transfers data to or from a network server
- "-L" (location): follow redirects if the URL points somewhere else
- "-o" (output): write the download to the named file
fastqc ERR458494.fastq.gz
- Run FastQC on ERR458494.fastq.gz
- ERR458494.fastq.gz - "Yeast" Sample
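The curl options above can be wrapped in a small helper, sketched here. The name `fetch` is invented for this example. The extra `--fail`, `--silent`, and `--show-error` flags are additions not used in the workshop: they make curl exit with an error on an HTTP failure instead of saving the error page as if it were data.

```shell
# Download a URL to a named file.
#   -L            follow redirects
#   --fail        exit non-zero on HTTP errors instead of saving the error page
#   --silent      hide the progress meter
#   --show-error  still print a message when something goes wrong
#   -o            write to the given output file
fetch() {
    url=$1
    out=$2
    curl -L --fail --silent --show-error -o "$out" "$url"
}

# Example (the workshop data file):
#   fetch https://osf.io/8rvh5/download ERR458494.fastq.gz
```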
Analysis Modules Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/
What a good data file looks like https://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html
What bad data looks like https://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc.html
Video Walkthrough:
FastQC tool for read data quality evaluation
Using FastQC to check the quality of high throughput sequence
- Go to the MobaXterm website to download
- Click on "GET MOBAXTERM NOW!"
- The Home Edition is perfect for normal use and it is free! Click "Download now"
- Click on "MobaXterm Home Edition v20.6 (Portable edition)" and save as in your Downloads folder
- Go to Downloads folder, click on the zipped folder, click "Extract all", click "Extract"
- The MobaXterm application is now in the unzipped folder
- Click on the MobaXterm application to open it!
- Go back to your instance page, select it and click on "Connect". The Public DNS information you need to connect to your instance via ssh can be found in the "SSH client" tab:
- In MobaXterm, click on "Session"
- Click on "SSH"
- Enter the Public DNS as the "Remote host"
- Check box next to "Specify username" and enter "ubuntu" as the username
- Click the "Advanced SSH settings" tab
- Check box by "Use private key"
- Use the document icon to navigate to where you saved the private key (e.g., "amazon.pem") from AWS on your computer. It is likely on your Desktop or Downloads folder
- Click "OK"
- A terminal session should open up with a left-side panel showing the file system of our AWS instance! You can click on the FastQC html file and view in browser to open. There are also options in the panel to download files.
- Start Terminal
- Change the permissions on the .pem file for security purposes (removes read, write, and execute permissions for all users except the owner, i.e. you)
chmod og-rwx ~/Desktop/amazon.pem
- Change directory to Desktop, where your .pem file is saved
cd ~/Desktop
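To see what the permission change actually does, the sketch below applies it and then lists the file. The function name `lock_key` is invented for this example. ssh typically refuses to use a private key that other users can read, which is why this step comes before connecting.

```shell
# Restrict a private key so that only its owner can access it, then show the result.
lock_key() {
    key=$1
    chmod og-rwx "$key"   # remove read/write/execute for group (g) and others (o)
    ls -l "$key"          # the group and other columns should now read "------"
}

# Example:
#   lock_key ~/Desktop/amazon.pem
```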
Go back to your instance page, select it and click on "Connect". The information you need to connect to your instance via ssh can be found in the "SSH client" tab:
- Use the scp command on your local terminal to copy your .html file!
scp -i <your-.pem> ubuntu@???-??-??-???-??.us-west-1.compute.amazonaws.com:/home/ubuntu/fastq/ERR458494_fastqc.html ./
The -i flag points to the identity file. Don't forget to change the part after ubuntu@ to match your instance!
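If you run FastQC on other files, the report name changes with the input name. The sketch below derives the report name and wraps the scp call. Both function names are invented for this example, and the name logic only handles the `.fastq.gz` case used in this workshop. The key path and host are placeholders for your own .pem file and your instance's Public DNS.

```shell
# Derive the FastQC report name from an input file name:
# drop a trailing .gz, then a trailing .fastq, and append _fastqc.html.
fastqc_report_name() {
    base=$(basename "$1")
    base=${base%.gz}
    base=${base%.fastq}
    echo "${base}_fastqc.html"
}

# Copy the report for one input file from the instance to the current directory.
# Arguments: path to the .pem key, the instance's Public DNS, the FASTQ file name.
copy_report() {
    key=$1
    host=$2
    fastq=$3
    scp -i "$key" "ubuntu@${host}:/home/ubuntu/fastq/$(fastqc_report_name "$fastq")" ./
}

# Example (substitute your own key and Public DNS):
#   copy_report ~/Desktop/amazon.pem <your-public-dns> ERR458494.fastq.gz
```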
When you shut down your instance, any data that is on a non-persistent disk goes away permanently. But you also stop being charged for any compute and data, too!
💡 Stopping vs hibernation vs termination
- Stopping:
  - saves data to EBS root volume
  - only EBS data storage charges apply
  - no data transfer charges or instance usage charges
  - RAM contents not stored
- Hibernation:
  - charged for storage of any EBS volumes
  - stores the RAM contents
  - it's like closing the lid of your laptop
- Termination:
  - complete shutdown
  - EBS volume is detached
  - data stored in EBS root volume is lost forever
  - instance cannot be relaunched
To enable Hibernation, click the box in the Configure Instance step of the setup.
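For reference, the three shutdown options map onto AWS CLI calls roughly as sketched below. This assumes the AWS CLI is installed and configured, which the workshop does not cover, and the instance ID is a placeholder. The function `instance_cmd` is invented for this example and only prints each command, so nothing is stopped or terminated by accident.

```shell
# Print the AWS CLI command for a given shutdown action and instance ID.
instance_cmd() {
    action=$1
    id=$2
    case $action in
        stop)      echo "aws ec2 stop-instances --instance-ids $id" ;;
        # hibernation only works if it was enabled when the instance was launched
        hibernate) echo "aws ec2 stop-instances --instance-ids $id --hibernate" ;;
        # termination is permanent: the instance cannot be relaunched
        terminate) echo "aws ec2 terminate-instances --instance-ids $id" ;;
        *)         echo "unknown action: $action" >&2; return 1 ;;
    esac
}

# Example (hypothetical instance ID):
#   instance_cmd stop i-EXAMPLE
```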
Launch a t2.nano, Ubuntu 20.04 LTS - Focal instance in the US East (Ohio) region. Change the root storage volume to 16 GiB and add an additional EBS volume (8 GiB).
Bonus points: Your added volume will persist after you have terminated your instance. Where can you find it?
Hint
- Go to AWS Marketplace and search for "Ubuntu 20.04 LTS - Focal". It should be the first result.
- Look in tab 4 called "Add Storage" to add additional storage volumes.
Please fill out our post workshop survey!
So far in this workshop, we have only encountered programs that install quickly. The analysis we ran was also pretty quick because we only ran it on one file!
In your own work, you may encounter programs that have lengthy installations, and/or you may need to analyze a large number of files.
While performing a long-running task on a remote machine, a sudden drop in your internet connection would terminate the SSH session and your work would be lost!
The screen utility provides a work-around to this problem. screen is a terminal multiplexer, i.e. you can open many virtual terminals. Processes running in screen will continue to run even when the terminal is not visible, or if you get disconnected from the internet.
- Install screen:
sudo apt-get install screen
- Run screen:
screen
Press space (twice) or enter to get the command prompt
- Run a command inside the screen session
top
- Detaching screen
Press ctrl + a, then d
- List screen sessions
screen -ls
- Reattach screen
screen -r <screen_ID>
- To detach again, repeat the detach step (ctrl + a, then d)
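Putting the two ideas together (many files, long-running work), here is a minimal sketch of a batch loop that could be run inside a screen session. The function name `fastqc_all`, the `QC_TOOL` override, the script name `batch_qc.sh`, and the session name `qc` are all invented for this example. `QC_TOOL` exists only so the loop can be tried out without FastQC installed.

```shell
# Run FastQC on every .fastq.gz file in a directory, one file at a time.
# Set QC_TOOL to another command (e.g. echo) to try the loop without FastQC.
fastqc_all() {
    dir=$1
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue          # skip when the pattern matches nothing
        "${QC_TOOL:-fastqc}" "$f"
    done
}

# To run it in a detached screen session, assuming this block is saved as
# batch_qc.sh (-d -m starts the session detached, -S names it):
#   screen -dmS qc sh -c '. ./batch_qc.sh; fastqc_all /home/ubuntu/fastq'
#   screen -ls       # list sessions
#   screen -r qc     # reattach to check progress
```

Starting the session detached means the loop keeps running even if the SSH connection drops before you detach by hand.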
- A little bit about AWS and cloud computing
- How to launch an instance
- How to connect to the instance
- How to install and run a software program on the instance
- How to terminate your instance
We'll send around the post-workshop survey link via e-mail -- please do fill it out, thank you!
Check our Events page for information on upcoming workshops!
You can contact us at training@cfde.atlassian.net with requests for new topics or questions about the workshops.