Snakemake workflow used to deploy genome sequences and build their basic indexes.
This is done for teaching purposes, as an example of FAIR principles applied with Snakemake.
The usage of this workflow is described in the Snakemake workflow catalog; it is also available locally on a single page.
The expected results of this pipeline are described here.
The tools used in this pipeline are described here textually.
Step | Commands |
---|---|
Download DNA Fasta from Ensembl | ensembl-sequence |
Remove non-canonical chromosomes | pyfaidx |
Index DNA sequence | samtools |
Create sequence dictionary | picard |

    ┌────────────────────────────────────────┐
    │Download Ensembl Sequence (wget + gzip) │
    └───────────────────┬────────────────────┘
                        │
                        │
    ┌───────────────────┼───────────────────────┐
    │Remove non-canonical chromosomes (pyfaidx) │
    └───────────────────┬─────────────────────┬─┘
                        │                     │
                        │                     │
    ┌───────────────────┼─────────┐         ┌─┼──────────────────────────────────┐
    │Index DNA Sequence (samtools)│         │Create sequence dictionary (Picard) │
    └─────────────────────────────┘         └────────────────────────────────────┘

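
The filtering step above can be illustrated with a minimal, standard-library sketch. It is not how the workflow does it (the table names pyfaidx for that step); it only shows the idea. The `CANONICAL` pattern is an assumption (Ensembl-style human names 1-22, X, Y, MT), and `read_fasta`, `keep_canonical` and `sequence_lengths` are hypothetical helper names.

```python
import io
import re

# Assumption: "canonical" means Ensembl-style human names 1-22, X, Y and MT.
CANONICAL = re.compile(r"^(?:[1-9]|1[0-9]|2[0-2]|X|Y|MT)$")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The sequence name is the first word after ">".
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep_canonical(records):
    """Keep only records whose name matches the canonical pattern."""
    return [(name, seq) for name, seq in records if CANONICAL.match(name)]


def sequence_lengths(records):
    """Return (name, length) pairs, i.e. the first two columns of a .fai index."""
    return [(name, len(seq)) for name, seq in records]


example = io.StringIO(
    ">1 dna:chromosome\nACGT\nAC\n"
    ">KI270728.1 dna:scaffold\nGGGG\n"
    ">MT dna:chromosome\nTTT\n"
)
kept = keep_canonical(read_fasta(example))
print(sequence_lengths(kept))
```

In this toy input the scaffold `KI270728.1` is dropped, and the remaining names and lengths are what a Fasta index would list for the filtered file.
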
Step | Commands |
---|---|
Download GTF annotation | ensembl-annotation |
Fix format errors | Agat |
Remove non-canonical chromosomes, based on above DNA Fasta | Agat |
Remove <NA> Transcript support levels | Agat |
Convert GTF to GenePred format | gtf2genepred |

    ┌─────────────────────────────────────────┐
    │Download Ensembl Annotation (wget + gzip)│
    └─────────────┬───────────────────────────┘
                  │
                  │
    ┌─────────────┼─────────┐
    │Fix format Error (Agat)│
    └─────────────┬─────────┘
                  │
                  │
    ┌─────────────┼─────────────────────────┐           ┌────────────────────────────────────────┐
    │Remove non-canonical chromosomes (Agat)│◄──────────┤Fasta sequence index (see Get DNA Fasta)│
    └─────────────┬─────────────────────────┘           └────────────────────────────────────────┘
                  │
                  │
    ┌─────────────┼───────────────────────┐
    │Remove <NA> transcript levels (Agat) │
    └─────────────┬───────────────────────┘
                  │
                  │
    ┌─────────────┼────────────────┐
    │Convert GTF to GenePred (UCSC)│
    └──────────────────────────────┘

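
The two filtering steps on the annotation can be sketched in plain Python. This is an illustration only, not the Agat implementation used by the workflow. It assumes that the set of chromosomes comes from the first column of the Fasta index, that "<NA>" corresponds to the Ensembl attribute `transcript_support_level "NA"`, and that records without that attribute are kept. `filter_gtf` is a hypothetical helper name.

```python
import re

# Matches the Ensembl-style attribute: transcript_support_level "1";
TSL = re.compile(r'transcript_support_level "([^"]*)"')


def filter_gtf(lines, chromosomes):
    """Keep comments plus records on `chromosomes` whose TSL is not NA."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has exactly nine tab-separated columns.
        if len(fields) != 9 or fields[0] not in chromosomes:
            continue
        match = TSL.search(fields[8])
        # Assumption: records without the attribute (e.g. gene lines) are kept.
        if match and match.group(1).startswith("NA"):
            continue
        kept.append(line)
    return kept


gtf = [
    "#!genome-build GRCh38\n",
    '1\thavana\ttranscript\t1\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_support_level "1";\n',
    '1\thavana\ttranscript\t1\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_support_level "NA";\n',
    'KI270728.1\tensembl\tgene\t1\t9\t.\t+\t.\tgene_id "G2";\n',
]
kept = filter_gtf(gtf, chromosomes={"1", "MT"})
print("".join(kept), end="")
```

Only the header comment and transcript `T1` survive here: `T2` is dropped for its NA support level and the scaffold gene for its chromosome.
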
Step | Commands |
---|---|
Extract transcript sequences from above DNA Fasta and GTF | gffread |
Index DNA sequence | samtools |
Create sequence dictionary | picard |

    ┌───────────────────────────────┐       ┌─────────────────────────────┐
    │GTF (see get genome annotation)│       │DNA Fasta (See get dna fasta)│
    └────────────────────┬──────────┘       └────────┬────────────────────┘
                         │                           │
                         │                           │
                  ┌──────┼───────────────────────────┼─────┐
                  │Extract transcripts sequences (gffread) │
                  └──────┬───────────────────────────┬─────┘
                         │                           │
                         │                           │
    ┌────────────────────┼────┐             ┌────────┼───────────────────────────┐
    │Index sequence (samtools)│             │Create sequence dictionary (Picard) │
    └─────────────────────────┘             └────────────────────────────────────┘

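
A sequence dictionary is essentially a SAM header with one `@SQ` line per sequence. The sketch below builds such a line with only the `SN`, `LN` and `M5` tags; it is a simplified illustration, not a replacement for Picard, whose output also carries an `@HD` line and further tags. `sq_line` is a hypothetical helper name.

```python
import hashlib


def sq_line(name, sequence):
    """Build one @SQ header line with name (SN), length (LN) and MD5 (M5)."""
    # M5 is computed on the upper-cased sequence with whitespace removed.
    cleaned = "".join(sequence.split()).upper()
    digest = hashlib.md5(cleaned.encode("ascii")).hexdigest()
    return f"@SQ\tSN:{name}\tLN:{len(cleaned)}\tM5:{digest}"


# Toy transcript sequence spread over two lines, in mixed case.
line = sq_line("tx1", "acgt\nAC")
print(line)
```

Because the sequence is normalised before hashing, the same transcript yields the same `M5` value regardless of line wrapping or letter case.
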
Step | Commands |
---|---|
Extract coding transcripts from above GTF | Agat |
Extract coding sequences from above DNA Fasta and GTF | gffread |
Index DNA sequence | samtools |
Create sequence dictionary | picard |

    ┌───────────────────────────────┐       ┌─────────────────────────────┐
    │GTF (see get genome annotation)│       │DNA Fasta (See get dna fasta)│
    └────────────────────┬──────────┘       └────────┬────────────────────┘
                         │                           │
                         │                           │
                  ┌──────┼───────────────────────────┼─────┐
                  │Extract cDNA sequences (gffread)        │
                  └──────┬───────────────────────────┬─────┘
                         │                           │
                         │                           │
    ┌────────────────────┼────┐             ┌────────┼───────────────────────────┐
    │Index sequence (samtools)│             │Create sequence dictionary (Picard) │
    └─────────────────────────┘             └────────────────────────────────────┘

Step | Commands |
---|---|
Download dbSNP variants | ensembl-variation |
Filter non-canonical chromosomes | pyfaidx + BCFTools |
Index variants | tabix |

    ┌──────────────────────────────────────────┐
    │Download dbSNP variants (wget + bcftools) │
    └──────────┬───────────────────────────────┘
               │
               │
    ┌──────────┼───────────────────────────────────────────┐
    │Remove non-canonical chromosomes (bcftools + bedtools)│
    └──────────┬───────────────────────────────────────────┘
               │
               │
    ┌──────────┼─────────────┐
    │Index variants (tabix)  │
    └────────────────────────┘

Step | Commands |
---|---|
Extract gene_id <-> gene_name correspondence | pyroe |
Extract transcript_id <-> gene_id <-> gene_name | Agat + XSV |

    ┌────────────────────────────────┐
    │Genome annotation (see get GTF) ├─────────────────┐
    └──────┬─────────────────────────┘                 │
           │                                           │
           │                                           │
    ┌──────┼──────────────────────────────┐   ┌────────┼─────────────────────────────────────────────┐
    │Extract gene_id <-> gene_name (pyroe)│   │Extract gene_id <-> gene_name <-> transcript_id (Agat)│
    └──────┬──────────────────────────────┘   └────────┬─────────────────────────────────────────────┘
           │                                           │
           │                                           │
    ┌──────┼─────┐                            ┌────────┼────┐
    │Format (XSV)│                            │Format (XSV) │
    └────────────┘                            └─────────────┘

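
To make the first mapping concrete, here is a small standard-library sketch that reads `gene` records from a GTF and writes a two-column, tab-separated table. It only illustrates the idea and does not reproduce what pyroe, Agat or XSV do. Falling back to the `gene_id` when `gene_name` is missing is an assumption, and `gene_table` is a hypothetical helper name.

```python
import csv
import io
import re

# Matches GTF attributes of the form: key "value";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def gene_table(gtf_lines):
    """Map gene_id to gene_name using the `gene` records of a GTF."""
    table = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        gene_id = attributes.get("gene_id")
        if gene_id:
            # Assumption: unnamed genes fall back to their identifier.
            table[gene_id] = attributes.get("gene_name", gene_id)
    return table


gtf = [
    '1\thavana\tgene\t1\t9\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";\n',
    '1\thavana\ttranscript\t1\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\thavana\tgene\t20\t30\t.\t-\t.\tgene_id "G2";\n',
]
table = gene_table(gtf)

# Write the mapping as a tab-separated table.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(sorted(table.items()))
print(buffer.getvalue(), end="")
```
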
Step | Commands |
---|---|
Download blacklisted regions | Github source |
Merge overlapping intervals | bedtools |

    ┌────────────────────────────────┐
    │Download known blacklists (wget)│
    └────────────┬───────────────────┘
                 │
                 │
    ┌────────────┼──────────────────────────┐
    │Merge overlapping intervals (bedtools) │
    └───────────────────────────────────────┘

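
Merging overlapping intervals is a classic sort-and-sweep algorithm. The sketch below shows it on BED-like `(chromosome, start, end)` tuples. It is a plain-Python illustration rather than the bedtools implementation; it assumes half-open coordinates and, like the default behaviour of `bedtools merge`, also joins intervals that touch end to start. `merge_intervals` is a hypothetical helper name.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    # Sorting groups intervals by chromosome, then orders them by start.
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps (or touches) the previous interval: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(interval) for interval in merged]


blacklist = [
    ("1", 100, 200),
    ("1", 150, 300),
    ("1", 300, 400),
    ("2", 50, 60),
    ("1", 500, 600),
]
print(merge_intervals(blacklist))
```

The first three intervals on chromosome 1 collapse into a single region from 100 to 400, while the isolated intervals are left unchanged.
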
Step | Commands |
---|---|
GTF to GenePred | UCSC-tools |

    ┌────────────────────────────────┐
    │Genome annotation (see get GTF) │
    └────────────┬───────────────────┘
                 │
                 │
    ┌────────────┼──────────────┐
    │GTFtoGenePred (UCSC-tools) │
    └───────────────────────────┘

Step | Commands |
---|---|
Fasta to 2bit | UCSC-tools |

    ┌────────────────────────────────┐
    │Genome sequence (see get Fasta) │
    └────────────┬───────────────────┘
                 │
                 │
    ┌────────────┼──────────────┐
    │FaToTwoBit (UCSC-tools)    │
    └───────────────────────────┘

Step | Commands |
---|---|
STAR index | STAR |

    ┌────────────────────────────────┐
    │Genome sequence (see get DNA)   │
    └────────────┬───────────────────┘
                 │
                 │
          ┌──────┼─────┐
          │ STAR index │
          └────────────┘

Step | Commands |
---|---|
Bowtie2 build | Bowtie2 build |

    ┌────────────────────────────────┐
    │Genome sequence (see get DNA)   │
    └────────────┬───────────────────┘
                 │
                 │
         ┌───────┼───────┐
         │ Bowtie2 build │
         └───────────────┘

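
For readers unfamiliar with the aligners, the STAR and Bowtie2 index steps each boil down to a single command run on the genome Fasta. The sketch below only assembles typical argument lists; the workflow itself may call these tools differently (for example through Snakemake wrappers), and the file names, thread counts and function names are illustrative assumptions.

```python
def star_index_command(fasta, index_dir, threads=1):
    """Argument list for building a STAR genome index from a Fasta file."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", index_dir,
        "--genomeFastaFiles", fasta,
        "--runThreadN", str(threads),
    ]


def bowtie2_build_command(fasta, prefix, threads=1):
    """Argument list for building a Bowtie2 index with a given prefix."""
    return ["bowtie2-build", "--threads", str(threads), fasta, prefix]


# Hypothetical paths, for illustration only.
print(" ".join(star_index_command("genome.fasta", "star_index", threads=4)))
print(" ".join(bowtie2_build_command("genome.fasta", "bowtie2_index/genome")))
```

Returning a list rather than a shell string lets the caller hand the command to `subprocess.run` without worrying about quoting.
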
Step | Commands |
---|---|
Generate decoy | Bash |
Salmon index | Salmon |

    ┌─────────────────────────────┐        ┌─────────────────────────────────────┐
    │Genome sequence (see get DNA)│        │Transcriptome sequence (see get cDNA)│
    └──────────────────────────┬──┘        └─────┬───────────────────────────────┘
                               │                 │
                               │                 │
                               │                 │
                          ┌────┼─────────────────┼────┐
                          │Generate decoy and gentrome│
                          └─────────────┬─────────────┘
                                        │
    ┌─────────────────┐                 │    ┌───────────────┐
    │Gentrome sequence│◄────────────────┴────►Decoy sequences│
    └────────────┬────┘                      └────┬──────────┘
                 │                                │
                 │                                │
                 │       ┌──────────────┐         │
                 └───────► Salmon index ◄─────────┘
                         └──────────────┘

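
The decoy step can be summarised as follows: the decoy list holds the names of the genome sequences, and the gentrome is the transcriptome followed by the genome. The sketch below illustrates this in Python on in-memory `(name, sequence)` records; the workflow performs this step in Bash, so the function names and the toy records here are assumptions made for illustration.

```python
def build_gentrome(transcripts, genome):
    """Return (decoy_names, gentrome_records) from (name, sequence) lists."""
    # Decoys are the genome sequence names, one per line in the decoy file.
    decoys = [name for name, _ in genome]
    # Transcripts come first, followed by the genome sequences.
    gentrome = list(transcripts) + list(genome)
    return decoys, gentrome


def to_fasta(records, width=60):
    """Serialise (name, sequence) records as FASTA text."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        # Wrap the sequence at a fixed line width.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n" if lines else ""


transcripts = [("ENST0001", "ACGTAC"), ("ENST0002", "GGCC")]
genome = [("1", "ACGTACGGCCTT"), ("MT", "TTTT")]
decoys, gentrome = build_gentrome(transcripts, genome)
print("\n".join(decoys))
print(to_fasta(gentrome), end="")
```

Both outputs are then passed to the Salmon index step, which uses the decoy names to tell genome sequences apart from transcripts.
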