This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

The Galaxy workflow and the Galaxy history for this analysis are available on both usegalaxy.org and usegalaxy.eu.

Preprocessing of raw SARS-CoV-2 reads

The raw reads available so far were generated from bronchoalveolar lavage fluid (BALF) and are metagenomic in nature: they contain human reads, reads from potential bacterial co-infections, and true SARS-CoV-2 reads.

What's the point?

Assess the quality of the reads, remove adapters, and remove reads that map to the human genome.

The outline

Illumina and Oxford Nanopore reads are pulled from the NCBI SRA (links to SRA accessions are available here). The two read types are then processed separately as described in the workflow section.

Inputs

Only SRA accessions are required for this analysis. The described analysis was performed with all SARS-CoV-2 SRA accessions available as of Feb 20, 2020:

  1. Illumina reads

    SRR10903401
    SRR10903402
    SRR10971381
    
  2. Oxford Nanopore reads

    SRR10948550
    SRR10948474
    SRR10902284
    

Outputs

This workflow produces three outputs that are used in two subsequent analyses:

| # | Output | Used in |
|---|--------|---------|
| 1 | A combined set of adapter-free Illumina reads without human contamination | Assembly |
| 2 | A combined set of Oxford Nanopore reads without human contamination | Assembly |
| 3 | A collection of adapter-free Illumina reads from which human reads have not been removed | Variation detection |

The history and the workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

The workflow performs the following steps:

Illumina

  • Illumina reads are QC'ed and adapter sequences are removed using fastp
  • Quality metrics are computed and visualized using fastqc and multiqc
  • Reads are mapped against the human genome (version hg38) using bwa mem
  • Reads that do not map to hg38 are retained (mapped reads are filtered out) using samtools view
  • Reads are converted back to fastq format using samtools fastx
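
For readers who want to follow the Illumina branch on the command line, the steps above can be sketched roughly as below. This is an illustrative approximation, not the Galaxy workflow itself: the file names, the `hg38.fa` index path, the thread count, and the `qc` directory are all assumptions, and `samtools fastq` is assumed to be the command-line counterpart of the Galaxy "samtools fastx" tool. The function only returns command strings and does not run anything.

```python
import shlex


def illumina_commands(sample, r1, r2, hg38_index="hg38.fa", threads=4):
    """Return shell command lines for one paired-end Illumina sample.

    `hg38_index`, `threads`, the `qc` directory, and every output file
    name are hypothetical and chosen only for illustration.
    """
    q = shlex.quote
    t1 = f"{sample}.trimmed_1.fastq.gz"
    t2 = f"{sample}.trimmed_2.fastq.gz"
    bam = f"{sample}.nonhuman.bam"
    return [
        # 1. Adapter removal and quality filtering with fastp.
        f"fastp --in1 {q(r1)} --in2 {q(r2)} --out1 {q(t1)} --out2 {q(t2)} "
        f"--json {q(sample + '.fastp.json')} --html {q(sample + '.fastp.html')}",
        # 2. Per-sample quality metrics, then one aggregated report.
        f"mkdir -p qc && fastqc --outdir qc {q(t1)} {q(t2)}",
        "multiqc --outdir qc qc",
        # 3. Map to hg38 and keep only pairs in which both mates are unmapped.
        #    -f 12 requires SAM flag bits 4 (read unmapped) and 8 (mate unmapped).
        f"bwa mem -t {int(threads)} {q(hg38_index)} {q(t1)} {q(t2)} "
        f"| samtools view -b -f 12 -o {q(bam)} -",
        # 4. Convert the retained (non-human) pairs back to FASTQ.
        f"samtools fastq -1 {q(sample + '.nonhuman_1.fastq.gz')} "
        f"-2 {q(sample + '.nonhuman_2.fastq.gz')} "
        f"-0 /dev/null -s /dev/null -n {q(bam)}",
    ]


if __name__ == "__main__":
    for line in illumina_commands("SRR10903401", "SRR10903401_1.fastq", "SRR10903401_2.fastq"):
        print(line)
```

Requiring both mates to be unmapped (`-f 12`) is the stricter choice for removing human reads; using `-f 4` alone would also keep pairs in which only one mate failed to map.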

Oxford Nanopore

  • Reads are QC'ed using nanoplot
  • Quality metrics are computed and visualized using fastqc and multiqc
  • Reads are mapped against the human genome (version hg38) using minimap2
  • Reads that do not map to hg38 are retained (mapped reads are filtered out) using samtools view
  • Reads are converted back to fastq format using samtools fastx
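
The Oxford Nanopore branch differs mainly in the mapper and in the reads being single-end, so the "does not map" test reduces to a single SAM flag bit. The sketch below is an assumption-laden illustration rather than the workflow's actual tool configuration: file names, the `hg38.fa` reference path, and the thread count are hypothetical, and the commands are only returned as strings.

```python
import shlex

UNMAPPED = 0x4  # SAM flag bit set when the read itself is unmapped


def is_unmapped(flag):
    """Return True if a SAM flag marks the read as unmapped."""
    return bool(flag & UNMAPPED)


def nanopore_commands(sample, reads, hg38="hg38.fa", threads=4):
    """Return shell command lines for one Oxford Nanopore sample.

    `hg38`, `threads`, and all output names are hypothetical.
    """
    q = shlex.quote
    bam = f"{sample}.nonhuman.bam"
    return [
        # 1. Read-level quality summary with NanoPlot.
        f"NanoPlot --fastq {q(reads)} --outdir {q(sample + '_nanoplot')}",
        # 2. Map to hg38 with the Nanopore preset and keep only unmapped reads.
        f"minimap2 -a -x map-ont -t {int(threads)} {q(hg38)} {q(reads)} "
        f"| samtools view -b -f {UNMAPPED} -o {q(bam)} -",
        # 3. Convert the retained reads back to FASTQ (single-end goes to stdout).
        f"samtools fastq {q(bam)} > {q(sample + '.nonhuman.fastq')}",
    ]


if __name__ == "__main__":
    for line in nanopore_commands("SRR10948550", "SRR10948550.fastq"):
        print(line)
```

`is_unmapped` expresses the same test that `samtools view -f 4` applies to each alignment record: a record is kept only when bit 4 of its flag is set.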

BioConda

Tools used in this analysis are also available from BioConda:

  • sra-tools
  • fastqc
  • multiqc
  • fastp
  • nanoplot
  • bwa
  • picard
  • samtools