00_Settings.sh
: Various settings for the pipeline
- Dependencies
- create a conda environment with:
conda create --name DEU
conda activate DEU
conda install sra-tools samtools star r-base
- Parameters
accesions
: a list of accessions to pull
RNA_SEQ_DIR
: directory containing the RNA-seq FASTQ files
OUT_DIR
: directory where the output should be written
MAX_CPUS
: number of threads to use
MAX_MEM_GB
: maximum memory to use, in GB
HUMAN_GENOME_REF
: path to the .fna FASTA reference file
HUMAN_GENOME_GTF
: path to the GTF annotation file
HUMAN_RNA_REF
: unused
sjdbOverhang
: set to the RNA read length - 1 (see the STAR manual entry for `--sjdbOverhang`)
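As a rough illustration of how these parameters might be filled in, here is a minimal sketch of a settings file. Every value is a placeholder, `READ_LENGTH` is a hypothetical helper variable that is not part of the documented parameters, and treating `accesions` as a space-separated string is an assumption; the real `00_Settings.sh` may store the list differently.

```shell
#!/bin/sh
# Hypothetical sketch of a settings file; all values below are placeholders.

accesions="SRR0000001 SRR0000002"        # placeholder accession IDs (assumed format)
RNA_SEQ_DIR="./fastq"                    # where the FASTQ files live
OUT_DIR="./out"                          # where results are written
MAX_CPUS=8                               # number of threads
MAX_MEM_GB=32                            # memory ceiling in GB
HUMAN_GENOME_REF="./ref/genome.fna"      # FASTA reference
HUMAN_GENOME_GTF="./ref/genome.gtf"      # GTF annotation
HUMAN_RNA_REF=""                         # unused

READ_LENGTH=100                          # hypothetical: length of the sequenced reads
sjdbOverhang=$((READ_LENGTH - 1))        # read length - 1, as described above

echo "sjdbOverhang=${sjdbOverhang}"
```

Deriving `sjdbOverhang` from the read length keeps the two values from drifting apart when a dataset with a different read length is used.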
01_Download_Accessions.sh
: A script to download a given set of accessions with sra-tools
- Output
- FASTQs from the given accessions
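The download step could look roughly like the sketch below, which uses the `prefetch` and `fasterq-dump` commands from sra-tools. This is not the repository's script: the `run` and `download_accession` helpers, the `DRY_RUN` switch, and the placeholder values are all assumptions made for illustration. With `DRY_RUN=1` the commands are only printed, so the sketch can be inspected without sra-tools installed.

```shell
#!/bin/sh
# Hypothetical sketch of a download step; values are placeholders.
accesions="SRR0000001 SRR0000002"   # placeholder accession IDs
RNA_SEQ_DIR="./fastq"
MAX_CPUS=4
DRY_RUN=1                           # set to 0 to actually execute the commands

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# download_accession: fetch one accession, then convert it to FASTQ
download_accession() {
    acc="$1"
    run prefetch "$acc" --output-directory "$RNA_SEQ_DIR"
    run fasterq-dump "$RNA_SEQ_DIR/$acc" --split-files \
        --threads "$MAX_CPUS" --outdir "$RNA_SEQ_DIR"
}

for acc in $accesions; do
    download_accession "$acc"
done
```

The `--split-files` option writes paired-end reads to separate `_1.fastq` and `_2.fastq` files; whether the real script uses it depends on the library layout of the accessions.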
02_Alignment.sh
: A script to align the resulting FASTQs with STAR
- Output
- Sorted and indexed BAMs for each sample
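A minimal sketch of what the alignment step might involve: building a STAR index, aligning one sample to a coordinate-sorted BAM, and indexing it with samtools. The helper functions, the `star_index` directory name, the paired-end `_1.fastq`/`_2.fastq` naming, and all values are assumptions rather than the contents of `02_Alignment.sh`. Commands are printed rather than executed while `DRY_RUN=1`.

```shell
#!/bin/sh
# Hypothetical sketch of an alignment step; paths and sample names are placeholders.
RNA_SEQ_DIR="./fastq"
OUT_DIR="./out"
MAX_CPUS=4
MAX_MEM_GB=16
HUMAN_GENOME_REF="./ref/genome.fna"
HUMAN_GENOME_GTF="./ref/genome.gtf"
sjdbOverhang=99                     # read length - 1 for 100 bp reads
DRY_RUN=1                           # set to 0 to actually execute the commands

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_index: generate the STAR genome index from the FASTA and GTF
build_index() {
    run STAR --runMode genomeGenerate \
        --runThreadN "$MAX_CPUS" \
        --genomeDir "$OUT_DIR/star_index" \
        --genomeFastaFiles "$HUMAN_GENOME_REF" \
        --sjdbGTFfile "$HUMAN_GENOME_GTF" \
        --sjdbOverhang "$sjdbOverhang"
}

# align_sample: align one paired-end sample and index the sorted BAM
align_sample() {
    sample="$1"
    prefix="$OUT_DIR/${sample}_"
    run STAR --runThreadN "$MAX_CPUS" \
        --genomeDir "$OUT_DIR/star_index" \
        --readFilesIn "$RNA_SEQ_DIR/${sample}_1.fastq" "$RNA_SEQ_DIR/${sample}_2.fastq" \
        --outSAMtype BAM SortedByCoordinate \
        --limitBAMsortRAM "$((MAX_MEM_GB * 1000000000))" \
        --outFileNamePrefix "$prefix"
    run samtools index "${prefix}Aligned.sortedByCoord.out.bam"
}

build_index
align_sample SRR0000001
```

STAR expects `--limitBAMsortRAM` in bytes, which is why the sketch multiplies `MAX_MEM_GB` before passing it on.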
DEU_1.R
: A script to perform Differential Exon Usage analysis with DEXSeq
- Input
- BAMs
- GTF for reference
- Output
DEXSeqReport
: Standard DEXSeq report of all genes and their Differential Exon Usage (DEU); see the DEXSeq documentation for details
dxr.rds
: an R dataset object containing the DEXSeqResults object generated during analysis
DEU_2.R
: A script to analyze the resulting DEXSeqResults .rds file
- Input
DEXSeqResults
: object generated during the previous analysis
- Output
DEU.xlsx
: Contains two sheets:
  Combined Genes
  : For each gene, lists the following:
    gene
    : gene name / alias
    gene_desc
    : short description of the gene
    total_exons
    : total number of exons in the gene
    pvalue_sig_exons < 0.05
    : number of exons in the gene whose p-value is significant at the given threshold
    pvalue_prop_sig
    : the proportion of significant exons; $\frac{\text{pvalue\_sig\_exons < 0.05}}{\text{total\_exons}}$
    pvalue_na
    : number of exons without a pvalue
    comb_pvalue
    : the combined exon pvalue, computed with Fisher's method
    comb_pvalue_df
    : degrees of freedom used when combining the exon pvalue values
    padj_sig_exons < 0.05
    : number of exons in the gene whose adjusted p-value is significant at the given threshold
    padj_prop_sig
    : the proportion of significant exons; $\frac{\text{padj\_sig\_exons < 0.05}}{\text{total\_exons}}$
    padj_na
    : number of exons without a padj
    comb_padj
    : the combined exon padj, computed with Fisher's method
    comb_padj_df
    : degrees of freedom used when combining the exon padj values
  All Genes
  : For each exon in each gene, lists the following:
    groupID
    : gene name / alias
    gene_desc
    : short description of the gene
    featureID
    : feature ID
    exonBaseMean
    : exon base mean
    dispersion
    : dispersion
    stat
    : statistic
    pvalue
    : p-value
    padj
    : adjusted p-value
    day_0
    : condition #1
    day_100
    : condition #2
    log2fold_day_100_day_0
    : the $\log_2$ fold change between the conditions; in this case $\frac{\text{day\_100}}{\text{day\_0}}$
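To make the per-gene summary columns concrete, the sketch below combines exon p-values with Fisher's method using awk. It is not the logic of `DEU_2.R`, which may differ. It assumes a hypothetical tab-separated input with a header row and two columns (gene, p-value), and the function name `fisher_by_gene` is invented for illustration. For $k$ usable p-values, the statistic is $X = -2\sum \ln p_i$ with $2k$ degrees of freedom; because the degrees of freedom are even, the upper tail has the closed form $e^{-X/2}\sum_{i=0}^{k-1}\frac{(X/2)^i}{i!}$.

```shell
#!/bin/sh
# Hypothetical sketch: per-gene exon counts and Fisher-combined p-values.
# Input: TSV with a header line, column 1 = gene, column 2 = exon p-value (or NA).
fisher_by_gene() {
    awk -F '\t' '
    NR == 1 { next }                          # skip the header row
    {
        gene = $1; p = $2
        total[gene]++                         # every exon counts toward total_exons
        if (p == "NA" || p == "") { na[gene]++; next }
        if (p + 0 < 0.05) sig[gene]++         # significant at the 0.05 threshold
        if (p + 0 < 1e-300) p = 1e-300        # guard against log(0)
        half[gene] += -log(p)                 # accumulates X/2
        k[gene]++                             # number of usable p-values
    }
    END {
        print "gene\ttotal_exons\tsig_exons\tprop_sig\tna\tcomb_pvalue\tdf"
        for (g in total) {
            kk = k[g] + 0
            comb = "NA"
            if (kk > 0) {
                # closed-form chi-square upper tail for even degrees of freedom
                term = 1; sum = 1
                for (i = 1; i < kk; i++) { term *= half[g] / i; sum += term }
                comb = sprintf("%.6g", exp(-half[g]) * sum)
            }
            printf "%s\t%d\t%d\t%.4f\t%d\t%s\t%d\n",
                g, total[g], sig[g], sig[g] / total[g], na[g], comb, 2 * kk
        }
    }' "$1"
}
```

Calling `fisher_by_gene exons.tsv` (with `exons.tsv` a placeholder file name) prints one row per gene in no guaranteed order. The same function could be pointed at a column of adjusted p-values. For genes with many very small p-values the direct formula can underflow, so a log-space variant would be more robust.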