
Analysis of variation within individual COVID-19 samples

What's the point?

To understand the amount of heterogeneity in individual COVID-19 isolates.

Outline

As of this writing (2/13/2020), there were just three Illumina datasets from COVID-19 patients:

- sra-study: SRP242226
  bioproject: PRJNA601736
  biosample: SAMN13872787
  sra-sample: SRS6007144
  sra-experiment: SRX7571571
  sra-run: SRR10903401

- sra-study: SRP242226
  bioproject: PRJNA601736
  biosample: SAMN13872786
  sra-sample: SRS6007143
  sra-experiment: SRX7571570
  sra-run: SRR10903402

- sra-study: SRP245409
  bioproject: PRJNA603194
  biosample: SAMN13922059
  sra-sample: SRS6067521
  sra-experiment: SRX7636886
  sra-run: SRR10971381

To understand the extent of sequence variation within these samples, we proceeded in two stages. First, we used a Galaxy workflow to perform the following steps:

1. Mapped all reads against the COVID-19 (SARS-CoV-2) reference NC_045512.2 using bwa mem.
2. Filtered for reads that mapped as proper pairs with a mapping quality of at least 20.
3. Performed realignment using lofreq viterbi.
4. Called variants using lofreq call.
5. Annotated variants using snpeff against a database built from the NC_045512.2 GenBank file.
6. Converted the VCFs into tab-delimited datasets.
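The read filter in step 2 keeps an alignment only if its MAPQ is at least 20 and the "properly paired" bit (0x2) is set in its SAM FLAG. The sketch below shows that predicate on raw SAM text lines. It behaves like `samtools view -q 20 -f 2`, but which Galaxy tool and exact parameters the workflow uses is not stated here, so treat this as an illustration only. The helper names `keep_alignment` and `filter_sam` and the example reads are hypothetical.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: segment is part of a properly aligned pair

def keep_alignment(sam_line, min_mapq=20):
    """Return True if a SAM body line has MAPQ >= min_mapq and the proper-pair bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    return mapq >= min_mapq and bool(flag & PROPER_PAIR)

def filter_sam(lines, min_mapq=20):
    """Yield header lines unchanged and only the alignments that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line

# Hypothetical example reads (not from the real datasets).
header = "@SQ\tSN:NC_045512.2\tLN:29903"
good = "r1\t99\tNC_045512.2\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII"
no_pair = "r2\t97\tNC_045512.2\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII"
low_q = "r3\t99\tNC_045512.2\t100\t15\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII"

kept = list(filter_sam([header, good, no_pair, low_q]))
```

Here `r2` is dropped because FLAG 97 lacks the 0x2 bit, and `r3` is dropped because its MAPQ of 15 is below 20. Unmapped reads have MAPQ 0 and are removed as a side effect.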

Next, we analyzed this tab delimited data in a Jupyter notebook.
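A minimal sketch of loading the tab-delimited table with only the standard library follows. It assumes the layout shown under Outputs: tab-separated columns, with an unnamed first column holding the pandas row index. The notebook itself may load the data differently, for example with pandas. The embedded rows are the five example rows from this README; `load_variants` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# The example rows from the Outputs section of this README, as tab-delimited text.
VARIANTS_TSV = (
    "\tSample\tCHROM\tPOS\tREF\tALT\tDP\tAF\tSB\tDP4\tIMPACT\tFUNCLASS\tEFFECT\tGENE\tCODON\n"
    "0\tSRR10903401\tNC_045512\t1409\tC\tT\t124\t0.040323\t1\t66,53,2,3\tMODERATE\tMISSENSE\tNON_SYNONYMOUS_CODING\torf1ab\tCat/Tat\n"
    "1\tSRR10903401\tNC_045512\t1821\tG\tA\t95\t0.094737\t0\t49,37,5,4\tMODERATE\tMISSENSE\tNON_SYNONYMOUS_CODING\torf1ab\tgGt/gAt\n"
    "2\tSRR10903401\tNC_045512\t1895\tG\tA\t107\t0.037383\t0\t51,52,2,2\tMODERATE\tMISSENSE\tNON_SYNONYMOUS_CODING\torf1ab\tGta/Ata\n"
    "3\tSRR10903401\tNC_045512\t2407\tG\tT\t122\t0.024590\t0\t57,62,1,2\tMODERATE\tMISSENSE\tNON_SYNONYMOUS_CODING\torf1ab\taaG/aaT\n"
    "4\tSRR10903401\tNC_045512\t3379\tA\tG\t121\t0.024793\t0\t56,62,1,2\tLOW\tSILENT\tSYNONYMOUS_CODING\torf1ab\tgtA/gtG\n"
)

INT_FIELDS = ("POS", "DP", "SB")
FLOAT_FIELDS = ("AF",)

def load_variants(handle):
    """Read a tab-delimited variant table into dicts with numeric columns converted."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row.pop("", None)  # drop the unnamed pandas index column, if present
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        records.append(row)
    return records

variants = load_variants(io.StringIO(VARIANTS_TSV))

# Count functional classes per sample (e.g. MISSENSE vs SILENT calls).
per_sample = Counter((v["Sample"], v["FUNCLASS"]) for v in variants)
print(per_sample)
```

For the five example rows this gives four MISSENSE calls and one SILENT call for SRR10903401. For a real file, pass an open handle instead, e.g. `open("variants.tsv", newline="")`, where the file name is hypothetical.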

Inputs

Workflow

1. GenBank file for the reference COVID-19 genome. The GenBank record is used by snpeff to generate a database for variant annotation.
2. Set of Illumina reads (in this case, a collection of unfiltered reads from SRR10903401, SRR10903402, and SRR10971381).

Jupyter notebook

The Jupyter notebook requires the GenBank file (#1 from above) and the tab-delimited variant table produced by the workflow (see Outputs).

Outputs

The workflow produces a table of variants that looks like this:

	Sample	CHROM	POS	REF	ALT	DP	AF	SB	DP4	IMPACT	FUNCLASS	EFFECT	GENE	CODON
0	SRR10903401	NC_045512	1409	C	T	124	0.040323	1	66,53,2,3	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	Cat/Tat
1	SRR10903401	NC_045512	1821	G	A	95	0.094737	0	49,37,5,4	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	gGt/gAt
2	SRR10903401	NC_045512	1895	G	A	107	0.037383	0	51,52,2,2	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	Gta/Ata
3	SRR10903401	NC_045512	2407	G	T	122	0.024590	0	57,62,1,2	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	aaG/aaT
4	SRR10903401	NC_045512	3379	A	G	121	0.024793	0	56,62,1,2	LOW	SILENT	SYNONYMOUS_CODING	orf1ab	gtA/gtG

Most field names are self-explanatory. AF is the alternate allele frequency; SB is the Phred-scaled probability of strand bias as calculated by lofreq (0 = no strand bias); DP4 gives strand-specific depths for reference and alternate allele observations (forward reference, reverse reference, forward alternate, reverse alternate).
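The DP4 column can be unpacked to check the other columns. In the five example rows above, DP equals the sum of the four DP4 counts, and AF equals the alternate-allele share of DP4. For instance, (2 + 3) / 124 ≈ 0.040323 for position 1409. This is an observation about these rows, not a statement about how lofreq computes AF in general. The sketch also converts SB back to a probability. It assumes the standard Phred convention Q = -10·log10(p), under which SB = 0 maps to p = 1, i.e. no evidence of strand bias. The function names are hypothetical.

```python
def parse_dp4(dp4):
    """Split 'ref_fwd,ref_rev,alt_fwd,alt_rev' into four integer counts."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    return ref_fwd, ref_rev, alt_fwd, alt_rev

def alt_fraction(dp4):
    """Fraction of DP4 observations that support the alternate allele."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = parse_dp4(dp4)
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return (alt_fwd + alt_rev) / total if total else 0.0

def phred_to_prob(q):
    """Invert a Phred-scaled value: p = 10 ** (-Q / 10) (assumed convention)."""
    return 10 ** (-q / 10)

# (DP, AF, SB, DP4) for the five example rows in the table above.
EXAMPLE_ROWS = [
    (124, 0.040323, 1, "66,53,2,3"),
    (95, 0.094737, 0, "49,37,5,4"),
    (107, 0.037383, 0, "51,52,2,2"),
    (122, 0.024590, 0, "57,62,1,2"),
    (121, 0.024793, 0, "56,62,1,2"),
]

for dp, af, sb, dp4 in EXAMPLE_ROWS:
    counts = parse_dp4(dp4)
    # Compare reported DP/AF with values recomputed from DP4, and show SB as a probability.
    print(dp, sum(counts), af, round(alt_fraction(dp4), 6), round(phred_to_prob(sb), 3))
```

Keeping forward and reverse alternate counts separate (`alt_fwd`, `alt_rev`) is what makes DP4 useful for spotting strand bias. A variant supported by reads from only one strand would show up as a zero in one of the last two positions.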

The variants we identified were distributed across the SARS-CoV-2 genome in the following way:
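One simple way to summarize where variants fall is to count them in fixed-size windows along the 29,903-nt NC_045512.2 genome. This is not necessarily how the distribution in the original analysis was plotted. The sketch below uses 1-kb windows, an arbitrary choice. `bin_positions` is a hypothetical helper, and the positions are the five example variants from the table above.

```python
GENOME_LENGTH = 29903  # length of NC_045512.2 in nucleotides
WINDOW = 1000          # window size in nt (arbitrary choice for illustration)

def bin_positions(positions, window=WINDOW, genome_length=GENOME_LENGTH):
    """Count 1-based variant positions in consecutive fixed-size windows."""
    n_bins = (genome_length + window - 1) // window  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} is outside 1..{genome_length}")
        counts[(pos - 1) // window] += 1
    return counts

# Positions of the five example variants shown in the Outputs table.
example_positions = [1409, 1821, 1895, 2407, 3379]
counts = bin_positions(example_positions)

for i, n in enumerate(counts):
    if n:
        start = i * WINDOW + 1
        end = min((i + 1) * WINDOW, GENOME_LENGTH)
        print(f"{start}-{end}: {n}")
```

With these positions, three variants fall in 1001-2000 and one each in 2001-3000 and 3001-4000. The last window is shorter (29001-29903) because the genome length is not a multiple of 1,000.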

The following table describes variants with frequencies above 10%:
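Selecting variants above a frequency threshold is a single filter on the AF column. The sketch below assumes the tab-delimited layout shown under Outputs and, to stay short, uses only a few columns from the example rows. `high_frequency_variants` is a hypothetical helper; "above 10%" is read as strictly greater than 0.10.

```python
import csv
import io

# A subset of columns and rows from the example variant table in this README.
VARIANTS_TSV = (
    "Sample\tPOS\tAF\tGENE\n"
    "SRR10903401\t1409\t0.040323\torf1ab\n"
    "SRR10903401\t1821\t0.094737\torf1ab\n"
    "SRR10903401\t1895\t0.037383\torf1ab\n"
)

def high_frequency_variants(handle, min_af=0.10):
    """Return rows whose allele frequency (AF) is strictly above min_af, highest first."""
    rows = csv.DictReader(handle, delimiter="\t")
    hits = [r for r in rows if float(r["AF"]) > min_af]
    return sorted(hits, key=lambda r: float(r["AF"]), reverse=True)

above_10 = high_frequency_variants(io.StringIO(VARIANTS_TSV))
above_4 = high_frequency_variants(io.StringIO(VARIANTS_TSV), min_af=0.04)
```

None of the example rows shown in this README exceed 10%, so `above_10` is empty here. Lowering the threshold to 4% returns positions 1821 and 1409, in that order.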

History and workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

BioConda

Tools used in this analysis are also available from BioConda:

- `bwa`
- `samtools`
- `lofreq`
- `snpeff`
- `snpsift`
