plot H2A gencode ex

Andy Pohl edited this page Aug 6, 2014 · 2 revisions

A very common practice is to plot the aggregate bigWig signal over many loci, all centered on (or otherwise anchored at) a common feature such as the transcription start site (TSS). To examine the binding of histone H2A in TSS regions genome-wide, we first collect the protein-coding genes from GENCODE and write them to a BED file:

$ wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_20/gencode.v20.annotation.gtf.gz
$ zcat gencode.v20.annotation.gtf.gz \
  | awk 'BEGIN{OFS="\t"}{if($3=="gene" && $20=="\"protein_coding\";"){print $1, $4-1, $5, $18, "0", $7}}' \
  | sed 's/\"//g;s/;//' \
  | sort -k1,1 -k2,2n > gencode_pc.bed
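
The BED file above stores whole gene spans, and the TSS sits at the start of a plus-strand gene but at the end of a minus-strand gene. As a minimal sketch, assuming the six-column layout produced above (chrom, start, end, name, score, strand), the spans could be reduced to 1-bp TSS records. The function name `bed_to_tss` and the output name `gencode_pc_tss.bed` are hypothetical and not part of bwtool:

```shell
# Hypothetical helper (not part of bwtool): reduce BED6 gene spans to
# 1-bp, strand-aware TSS records.
bed_to_tss() {
  awk -F'\t' 'BEGIN{OFS="\t"} {
    # Plus strand: the TSS is the first base of the gene (0-based start).
    # Minus strand: the TSS is the last base of the gene (end - 1).
    if ($6 == "+") { tss = $2 } else { tss = $3 - 1 }
    print $1, tss, tss + 1, $4, $5, $6
  }' "$1" | sort -k1,1 -k2,2n
}

# Example use (file names are assumptions):
#   bed_to_tss gencode_pc.bed > gencode_pc_tss.bed
```

With 1-bp records the anchor point is unambiguous, so no strand handling is needed in later steps.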

Then we'll download the bigWig file (from ENCODE H1 hESC data):

$ paraFetch 30 10 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneH1hescH4k20me1StdSig.bigWig esc-H4k20me1.bw

Then we'll run bwtool aggregate to average the H2A signal upstream and downstream of the TSS of each of these GENCODE genes:
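
A minimal sketch of such a call, assuming bwtool's positional form `bwtool aggregate left:right regions.bed signal.bw output.txt`. The `aggregate_tss` wrapper, the 1000 bp flank, and the file names `gencode_pc_tss.bed` and `h2a_tss_agg.txt` are illustrative assumptions, not values from this page. The BED file is assumed to hold one 1-bp record per TSS, so the anchor falls on the TSS whether bwtool uses the start or the middle of each region:

```shell
# Hypothetical wrapper around bwtool aggregate.
#   $1 = flank size in bp, used both upstream and downstream (assumed value)
#   $2 = BED file of 1-bp TSS records (assumed to exist)
#   $3 = bigWig signal file
#   $4 = output text file of averaged signal per position
aggregate_tss() {
  bwtool aggregate "$1:$1" "$2" "$3" "$4"
}

# Example use (flank size and file names are assumptions):
#   aggregate_tss 1000 gencode_pc_tss.bed esc-H4k20me1.bw h2a_tss_agg.txt
```

The output file can then be read into any plotting tool to draw the average signal against distance from the TSS.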
