This lecture will unite the last lecture's content on genomic analysis with our previous coding in R. The packages we'll use this week are from Bioconductor, a collection of software specifically designed for genomic analysis in R.
Genome variant analysis (Background)
- Types of genomic variation
- Tools to predict genomic variations
- Learn the common file formats for variation data
- Databases and online resources for human variation data
Genomic Data (hands-on tutorials)
- Use Bioconductor packages to work with genomic data in R
- Load, inspect, and query genomic data (BED/SEG, BAM, VCF files)
- Identify and annotate genomic variants
We will be working through some tutorials directly on your laptop using RStudio.
- The tutorial was tested with R 4.0.3.
- Run the following script in VSCode to ensure all Bioconductor packages are installed.
      ## start R session ##
      R
      ## run this command within R session ##
      source("../../software/genomic_data.R")
- This script will install the following packages:
  - Rsamtools: querying BAM files
  - VariantAnnotation: reading VCF files
  - GenomicRanges: manipulating genomic data
  - plyranges: fast & easy tool for manipulating GRanges
- If you have not done so already, update your local copy of the class repository from GitHub. You should have a directory (lecture16) containing the following three RMarkdown tutorials:
- Lecture16_GenomicData.Rmd: store genomic data as objects, assess genomic ranges, apply operations on genomic data
- Lecture16_Rsamtools.Rmd: load and query sequencing data; compute “pile-up” statistics at genomic loci to identify genomic variants
- Lecture16_VariantCalls.Rmd: load and assess variant (vcf) data
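Before opening the tutorials, it can help to confirm that the four packages named in the setup step (Rsamtools, VariantAnnotation, GenomicRanges, plyranges) are actually available. The following is a minimal sketch using base R only. The contents of genomic_data.R are not shown here, so the commented-out BiocManager fallback is an assumption about one reasonable way to install anything missing, not a description of what that script does.

```r
# Packages the setup script is expected to install (names taken from the list above).
required_pkgs <- c("Rsamtools", "VariantAnnotation", "GenomicRanges", "plyranges")

# requireNamespace() tries to load a package's namespace without attaching it
# and returns FALSE (instead of raising an error) when the package is unavailable.
is_installed <- vapply(
  required_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumption: a manual fallback through BiocManager; uncomment to run.
  # if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  # BiocManager::install(missing_pkgs)
}
```

If any package is reported missing, re-running the setup script is the first thing to try.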
- Extensions (on left panel) > Type in search bar: "R Extension" > Select "R Extension for Visual Studio Code" by Yuki Ueda
- The extension page should look something like this: https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r
- To knit R Markdown files, you'll need the R Extension as well as pandoc.
- Install pandoc outside of VSCode by downloading the installer here: https://pandoc.org/installing.html
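Once pandoc is installed, a quick check from the R console can confirm that R is able to find it. This is a minimal sketch that assumes pandoc is expected on the system PATH. The commented-out `rmarkdown::render()` call assumes the rmarkdown package is installed and that the working directory is lecture16.

```r
# Sys.which() returns the full path of an executable found on the PATH,
# or an empty string when it cannot be found.
pandoc_path <- Sys.which("pandoc")

if (nzchar(pandoc_path)) {
  message("pandoc found at: ", pandoc_path)
} else {
  message("pandoc not found on PATH; install it from https://pandoc.org/installing.html")
}

# With pandoc available, a tutorial can also be knitted from the R console.
# Assumption: rmarkdown is installed and the working directory is lecture16.
# rmarkdown::render("Lecture16_GenomicData.Rmd")
```

If pandoc was installed while VSCode was open, restarting VSCode may be needed before the new PATH entry is visible to the R session.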
- Please download all data files found in this folder and add them to your lecture16 directory. The files should have the following filenames:
  - BRCA.genome_wide_snp_6_broad_Level_3_scna.seg
  - BRCA_IDC_cfDNA.bam
  - BRCA_IDC_cfDNA.bam.bai
  - GIAB_highconf_v.3.3.2.vcf.gz (if this file was automatically uncompressed on your computer, resulting in a file named GIAB_highconf_v.3.3.2.vcf, look in your Trash folder to find the original file ending in gz)
  - GIAB_highconf_v.3.3.2.vcf.gz.tbi
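To verify the download, the filenames above can be checked from R before starting the tutorials. The sketch below uses base R only and assumes the working directory is the folder that contains lecture16. The variable `data_dir` is a hypothetical name introduced for illustration and should be adjusted to match your layout.

```r
# Assumption: the R working directory is the folder that contains lecture16;
# adjust data_dir if your layout differs.
data_dir <- "lecture16"

expected_files <- c(
  "BRCA.genome_wide_snp_6_broad_Level_3_scna.seg",
  "BRCA_IDC_cfDNA.bam",
  "BRCA_IDC_cfDNA.bam.bai",
  "GIAB_highconf_v.3.3.2.vcf.gz",
  "GIAB_highconf_v.3.3.2.vcf.gz.tbi"
)

# file.exists() is vectorised: one TRUE/FALSE per expected file.
found <- file.exists(file.path(data_dir, expected_files))
names(found) <- expected_files
missing_files <- expected_files[!found]

if (length(missing_files) == 0) {
  message("All data files are present.")
} else {
  message("Missing files:\n", paste0("  ", missing_files, collapse = "\n"))
}

# Flag the case where the VCF was automatically uncompressed on download.
if (file.exists(file.path(data_dir, "GIAB_highconf_v.3.3.2.vcf"))) {
  message("Found an uncompressed .vcf; the tutorial expects the original .vcf.gz file.")
}
```

The index files (.bai and .tbi) are checked alongside the data files because they need to sit next to the BAM and VCF files they belong to.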