Course on Unix and work with genomic data - Prague, November 2020

Final exercise

Task 6

Create an executable shell script that carries out the data processing on the VCF file (/data-shared/vcf_examples/luscinia_vars.vcf.gz). Count the proportions of transitions (A -> G, G -> A, C -> T, T -> C) and transversions (A -> C/T, C -> A/G, G -> C/T, T -> A/G), in total and by type of nucleotide. Create an .R script that prepares a graph visualizing the results.
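The definitions above reduce to one rule: a transition exchanges a purine for a purine (A <-> G) or a pyrimidine for a pyrimidine (C <-> T), and every other change between two different bases is a transversion. A minimal sketch of that rule as a shell function follows. The name classify_substitution is hypothetical and is not taken from trans_script.sh.

```bash
#!/usr/bin/env bash
# Classify a single-nucleotide substitution REF -> ALT.
# Transitions: A <-> G (purines) and C <-> T (pyrimidines).
# Transversions: any other change between two different bases.
# classify_substitution is a hypothetical helper, not part of trans_script.sh.
classify_substitution() {
    local ref=$1 alt=$2
    case "$ref$alt" in
        AG|GA|CT|TC) echo transition ;;
        AC|AT|CA|CG|GC|GT|TA|TG) echo transversion ;;
        *) echo other ;;  # identical bases, indels, N, lower case, etc.
    esac
}

# Usage example
classify_substitution A G   # prints "transition"
classify_substitution C A   # prints "transversion"
```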

Solution

I wrote a shell script that analyzes the /data-shared/vcf_examples/luscinia_vars.vcf.gz file and prints to standard output the total number and proportion of transitions and transversions, as well as the number and proportion of each individual substitution (A -> C, A -> G, and so on).
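One way such counting could be done is a single awk pass over the decompressed VCF, reading REF (column 4) and ALT (column 5). The sketch below is not the contents of trans_script.sh. It assumes that only records whose REF and ALT are each a single upper-case A/C/G/T base are counted, so indels and multi-allelic sites are skipped; the function name count_substitutions is hypothetical. It prints plain fractions in parentheses.

```bash
#!/usr/bin/env bash
# Sketch: count single-nucleotide substitutions in an uncompressed VCF read from stdin.
# Assumption: only records whose REF and ALT are each one upper-case A/C/G/T base (and differ)
# are counted; indels and multi-allelic sites such as ALT "G,T" are skipped.
# count_substitutions is a hypothetical name; trans_script.sh may filter records differently.
count_substitutions() {
    awk -F'\t' '
        /^#/ { next }  # skip VCF meta-information and header lines
        $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ && $4 != $5 {
            pair = $4 " -> " $5
            n[pair]++
            if (pair == "A -> G" || pair == "G -> A" || pair == "C -> T" || pair == "T -> C")
                ts++
            else
                tv++
        }
        END {
            total = ts + tv
            if (total == 0) exit 1  # nothing to report
            # Proportions are printed as fractions of all counted substitutions
            printf "transitions = %d (%.2f)\n", ts, ts / total
            printf "transversions = %d (%.2f)\n", tv, tv / total
            split("A C G T", b, " ")
            for (i = 1; i <= 4; i++)
                for (j = 1; j <= 4; j++)
                    if (i != j) {
                        pair = b[i] " -> " b[j]
                        printf "%s = %d (%.2f)\n", pair, n[pair], n[pair] / total
                    }
        }'
}
```

For example, it could be run on the course file as gzip -dc /data-shared/vcf_examples/luscinia_vars.vcf.gz | count_substitutions. Because of the filtering assumption, its counts are not guaranteed to match those printed by trans_script.sh.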

Run the solution

To run the solution, clone this repository with

git clone https://github.com/CalounovaT/ngs-course-2020.git

Then move into the repository directory with

cd ngs-course-2020

Shell script

To get the results, run

./trans_script.sh

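You should see the following results. Note that the numbers in parentheses are fractions of all substitutions, not percentages, even though they are printed with a % sign (0.61 means 61 %):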

transitions =  206468 (0.61%)
transversions = 131944 (0.39%)
A -> C = 11758 (0.03%)
A -> G = 54261 (0.16%)
A -> T = 23421 (0.07%)
C -> A = 10824 (0.03%)
C -> G = 18966 (0.06%)
C -> T = 50124 (0.15%)
G -> A = 50969 (0.15%)
G -> C = 18091 (0.05%)
G -> T = 10951 (0.03%)
T -> A = 26219 (0.08%)
T -> C = 51114 (0.15%)
T -> G = 11714 (0.03%)
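To see how the printed proportions relate to the counts, the short sketch below divides the transition and transversion totals by the number of all substitutions (transitions + transversions) and also expresses each as a percentage. summarize_counts is a hypothetical helper, not part of trans_script.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print transitions and transversions as a fraction
# and as a percentage of all substitutions (transitions + transversions).
summarize_counts() {
    awk -v ts="$1" -v tv="$2" 'BEGIN {
        total = ts + tv
        printf "total substitutions = %d\n", total
        printf "transitions = %.4f (%.1f %%)\n", ts / total, 100 * ts / total
        printf "transversions = %.4f (%.1f %%)\n", tv / total, 100 * tv / total
    }'
}

# Counts taken from the output above
summarize_counts 206468 131944
# total substitutions = 338412
# transitions = 0.6101 (61.0 %)
# transversions = 0.3899 (39.0 %)
```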

trans_script.sh also writes these results to a summary CSV file, substitutions_results.csv, in your home directory.

R script

You can then visualize the data with the provided R script plot_script.R. Run it with

Rscript plot_script.R

This generates the file Rplots.pdf, containing the plots, in your home directory.

Alternatively, you can view the plots in RStudio.

The generated plots are:

Main overview (p1)
Transitions histogram (p2)
Transversions histogram (p3)
Adenine overview (p4)
Adenine histogram (p5)
Cytosine overview (p6)
Cytosine histogram (p7)
Guanine overview (p8)
Guanine histogram (p9)
Thymine overview (p10)
Thymine histogram (p11)