This GitHub repository contains two pipelines for RNA sequencing analysis: a Quality Control pipeline for initial analysis of RNA sequencing read data, and a Main Pipeline for aligning reads to a reference genome and counting reads per feature (gene). Each pipeline consists of a series of Bash scripts that automate key steps in RNA sequencing data analysis, along with additional Python and R scripts for downstream analysis.
- Sequencing read data in the fastq.gz format
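- Index files for the reference genome of interest, in this case Human Genome hg38
  - Ideally build your own index. Because the Main Pipeline maps reads with Hisat2, the index must be in Hisat2 format (for example, built with `hisat2-build`); an index built by a different aligner such as STAR cannot be read by Hisat2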
- A .gtf file of annotated features for your indexed genome
- Splice Site file for your indexed genome to improve alignment accuracy across exon-exon boundaries
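
A pre-flight check for the inputs listed above could look like the following minimal sketch. It is not part of the repository: the function name `check_inputs` is hypothetical, and it assumes the index is given as a Hisat2 index basename (the value passed to `hisat2 -x`), so that the first index file is `<basename>.1.ht2` (or `.1.ht2l` for a large index).

```python
from pathlib import Path


def check_inputs(input_dir, index_path, gtf_file, splice_sites_file):
    """Return a list of problems; an empty list means every input was found.

    Hypothetical helper, not part of the repository.
    """
    problems = []

    # Sequencing reads: at least one *.fastq.gz file in the input directory.
    reads_dir = Path(input_dir)
    reads = sorted(reads_dir.glob("*.fastq.gz")) if reads_dir.is_dir() else []
    if not reads:
        problems.append(f"no .fastq.gz files found in {input_dir}")

    # Assumption: index_path is a Hisat2 index basename, so the first index
    # file is <basename>.1.ht2 (small index) or <basename>.1.ht2l (large index).
    if not any(Path(f"{index_path}{ext}").is_file() for ext in (".1.ht2", ".1.ht2l")):
        problems.append(f"no Hisat2 index files found for basename {index_path}")

    # Annotation and splice site files must exist as regular files.
    for label, path in (("GTF annotation", gtf_file),
                        ("splice site file", splice_sites_file)):
        if not Path(path).is_file():
            problems.append(f"{label} not found: {path}")

    return problems
```

Calling `check_inputs(...)` before either pipeline and printing the returned messages gives an early, readable failure instead of an error partway through a long run.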
./runQC.sh <input_dir> <output_dir>
- FastQC Analysis
  - Script: `fastqc.sh`
  - Usage: `fastqc.sh "$INPUT_DIR" "$OUTPUT_DIR"`
- Trimming with Trim Galore
  - Script: `trim_fastq.sh`
  - Usage: `trim_fastq.sh "$INPUT_DIR" "$OUTPUT_DIR"`
- FastQC Analysis on Trimmed Data
  - Script: `fastqcTrimmed.sh`
  - Usage: `fastqcTrimmed.sh "$INPUT_DIR" "$OUTPUT_DIR"`
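
The three Quality Control steps form a chain in which the trimming output feeds the second FastQC run. The sketch below illustrates that data flow using the script names and argument order from the usage lines above. It is an assumption about how a wrapper such as `runQC.sh` might wire the steps together, not a copy of it; the subdirectory names and the functions `qc_steps` and `run_steps` are hypothetical.

```python
import subprocess
from pathlib import Path


def qc_steps(input_dir, output_dir):
    """List the three QC steps as argument vectors, in execution order."""
    out = Path(output_dir)
    raw_reports = out / "fastqc_raw"          # hypothetical subdirectory name
    trimmed = out / "trimmed"                 # hypothetical subdirectory name
    trimmed_reports = out / "fastqc_trimmed"  # hypothetical subdirectory name
    return [
        ["./fastqc.sh", str(input_dir), str(raw_reports)],
        ["./trim_fastq.sh", str(input_dir), str(trimmed)],
        # The trimmed reads written by step 2 are the input of step 3.
        ["./fastqcTrimmed.sh", str(trimmed), str(trimmed_reports)],
    ]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; check=True raises on the first non-zero exit."""
    for argv in steps:
        runner(argv, check=True)
```

Passing a different `runner` (for example one that only prints each command) gives a dry run without invoking any of the scripts.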
./runPipeline.sh <input_dir> <output_dir> <index_path> <splice_sites_file> <gtf_file> <read_type> <data_type>
- Mapping to Human Genome using Hisat2
  - Script: `mapPP.sh`, `mapPU.sh`, `mapRP.sh`, `mapRE.sh`
    - The `read_type` and `data_type` arguments passed to `runPipeline.sh` determine, via IF statements, which mapping script is run
    - `read_type`: Unpaired OR Paired
    - `data_type`: Raw OR Processed (Raw will use files processed by `runQC.sh` in the Quality Control step)
  - Usage: `map.sh "$INPUT_DIR" "$OUTPUT_DIR" "$INDEX_PATH" "$SPLICE_SITES"`, where `map.sh` stands for whichever of the four mapping scripts is selected
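
The main difference between paired and unpaired mapping is how the reads are handed to Hisat2: `-1`/`-2` for the two mates of a pair, `-U` for unpaired reads. The sketch below builds the corresponding command line as a list of arguments. It is a minimal illustration, not the contents of the repository's mapping scripts: the function name `hisat2_command` is hypothetical, passing the splice site file via `--known-splicesite-infile` is an assumption, and the `data_type` switch is not modelled.

```python
def hisat2_command(index_path, splice_sites, reads, out_sam, read_type):
    """Build a hisat2 argument vector for one sample (hypothetical helper).

    reads is a list of FASTQ paths: two files for "Paired", one for "Unpaired".
    """
    cmd = [
        "hisat2",
        "-x", str(index_path),  # basename of the Hisat2 index
        # Assumption: the splice site file is supplied through this option.
        "--known-splicesite-infile", str(splice_sites),
    ]
    if read_type == "Paired":
        if len(reads) != 2:
            raise ValueError("paired mapping needs exactly two read files")
        cmd += ["-1", str(reads[0]), "-2", str(reads[1])]
    elif read_type == "Unpaired":
        if len(reads) != 1:
            raise ValueError("unpaired mapping needs exactly one read file")
        cmd += ["-U", str(reads[0])]
    else:
        raise ValueError(f"unknown read_type: {read_type!r}")
    cmd += ["-S", str(out_sam)]  # write alignments as SAM
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the command as a list avoids shell quoting problems with paths that contain spaces.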
- Conversion of SAM to BAM
  - Script: `samToBam.sh`
  - Usage: `samToBam.sh "$INPUT_DIR" "$OUTPUT_DIR"`
- Indexing BAM Files
  - Script: `indexBam.sh`
  - Usage: `indexBam.sh "$INPUT_DIR" "$OUTPUT_DIR"`
- Counting Reads for Each Gene Feature using FeatureCounts
  - Script: `featureCount.sh`
  - Usage: `featureCount.sh "$INPUT_DIR" "$OUTPUT_DIR" "$GTF_FILE"`
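
The steps that follow mapping (SAM to BAM conversion, BAM indexing, and feature counting) can be sketched for a single sample as below. This is an illustration under stated assumptions rather than the repository's scripts: the function name and the `.sorted.bam` suffix are hypothetical, and the conversion is done with `samtools sort` because `samtools index` requires a coordinate-sorted BAM file.

```python
from pathlib import Path


def post_alignment_commands(sam_file, bam_dir, gtf_file, counts_file, paired=False):
    """Build the sort/convert, index, and count commands for one sample."""
    # Hypothetical naming: <sample>.sam becomes <sample>.sorted.bam
    bam = Path(bam_dir) / (Path(sam_file).stem + ".sorted.bam")

    # samtools sort reads SAM and picks BAM output from the .bam extension.
    sort_cmd = ["samtools", "sort", "-o", str(bam), str(sam_file)]

    # Indexing needs the coordinate-sorted BAM produced above.
    index_cmd = ["samtools", "index", str(bam)]

    # featureCounts: -a is the annotation file, -o the output table.
    count_cmd = ["featureCounts", "-a", str(gtf_file), "-o", str(counts_file)]
    if paired:
        # -p marks the input as paired-end; depending on the featureCounts
        # version, --countReadPairs may also be needed to count fragments.
        count_cmd.append("-p")
    count_cmd.append(str(bam))

    return [sort_cmd, index_cmd, count_cmd]
```

Each command in the returned list can be run in order with `subprocess.run(cmd, check=True)`, so that a failure in sorting stops the sample before indexing or counting is attempted.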
- Script: `merge_featureCounts.py`
  - Usage: `merge_featureCounts.py file_paths output_path`
- Script: `DSeq2_analysis.R`
  - Usage: Execute in an R environment
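
To illustrate what merging per-sample featureCounts tables involves, here is a minimal standard-library sketch. It is not the repository's `merge_featureCounts.py`: the function names are hypothetical, and it assumes each input file holds counts for a single BAM file, with a leading `#` comment line, a header row starting with `Geneid`, and integer counts in the last column. Using the file stem as the sample column name is also an assumption.

```python
import csv
from pathlib import Path


def read_counts(path):
    """Return {gene_id: count} from one featureCounts output file."""
    counts = {}
    with open(path, newline="") as handle:
        # Skip the leading "# Program:featureCounts ..." comment line.
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        next(rows, None)  # header: Geneid, Chr, Start, End, Strand, Length, <bam>
        for row in rows:
            if row:
                counts[row[0]] = int(row[-1])  # assumes integer counts
    return counts


def merge_counts(file_paths, output_path):
    """Write one tab-separated table with a count column per input file."""
    # Assumption: the file stem is used as the sample (column) name.
    samples = {Path(p).stem: read_counts(p) for p in file_paths}
    genes = sorted(set().union(*samples.values()))
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Geneid", *samples])
        for gene in genes:
            # A gene missing from a sample's file is written as 0.
            writer.writerow([gene, *(c.get(gene, 0) for c in samples.values())])
```

The merged table has one row per gene and one column per sample, which is the count-matrix shape that differential expression tools such as DESeq2 expect as input.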