RNA Sequencing Analysis Pipelines

This GitHub repository contains two pipelines for RNA sequencing analysis. The Quality Control pipeline performs the initial analysis of RNA sequencing read data. The Main Pipeline aligns reads to a reference genome and counts the reads assigned to each feature (gene). Each pipeline is a series of Bash scripts that automate key steps in RNA sequencing data analysis, and additional Python and R scripts cover downstream analysis.

Requirements

Software

Files

- Sequencing read data in the fastq.gz format
- Index files for the reference genome of interest, in this case the human genome hg38
  - Ideally, build the index yourself. Because mapping in this pipeline uses HISAT2, the index must be in HISAT2 format (built with hisat2-build); an index built by another aligner such as STAR is not interchangeable
- A .gtf file of annotated features for your indexed genome
- A splice site file for your indexed genome, to improve alignment accuracy across exon-exon boundaries

Pipeline

Quality Control Usage

./runQC.sh <input_dir> <output_dir>

Individual Steps

FastQC Analysis
- Script: fastqc.sh
- Usage:
```
fastqc.sh "$INPUT_DIR" "$OUTPUT_DIR"
```
Trimming with Trim Galore
- Script: trim_fastq.sh
- Usage:
```
trim_fastq.sh "$INPUT_DIR" "$OUTPUT_DIR"
```
FastQC Analysis on Trimmed Data
- Script: fastqcTrimmed.sh
- Usage:
```
fastqcTrimmed.sh "$INPUT_DIR" "$OUTPUT_DIR"
```
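The three quality control steps above run FastQC, then Trim Galore, then FastQC again on the trimmed reads. As a rough illustration of what each step invokes, the sketch below builds the command lines without running them. It is not the content of fastqc.sh or trim_fastq.sh; the function names are hypothetical, and the scripts in this repository may pass different options.

```python
def fastqc_command(fastq_files, output_dir):
    """Build a FastQC call that writes its reports to output_dir."""
    return ["fastqc", "--outdir", output_dir, *fastq_files]


def trim_galore_command(fastq_files, output_dir, paired=False):
    """Build a Trim Galore call; paired=True expects the two mate files."""
    cmd = ["trim_galore", "--output_dir", output_dir]
    if paired:
        if len(fastq_files) != 2:
            raise ValueError("paired trimming needs exactly two FASTQ files")
        cmd.append("--paired")
    return cmd + list(fastq_files)
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once FastQC and Trim Galore are installed and on the PATH.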

Main Pipeline Usage

./runPipeline.sh <input_dir> <output_dir> <index_path> <splice_sites_file> <gtf_file> <read_type> <data_type>

Mapping to Human Genome using HISAT2
- Scripts: mapPP.sh, mapPU.sh, mapRP.sh, mapRE.sh
- The read_type and data_type arguments passed to runPipeline.sh determine, through IF statements, which mapping script is run
- read_type: Unpaired or Paired
- data_type: Raw or Processed (Raw will use files processed by runQC.sh in the Quality Control step)
- Usage:
```
# map.sh stands for the selected script (mapPP.sh, mapPU.sh, mapRP.sh or mapRE.sh)
map.sh "$INPUT_DIR" "$OUTPUT_DIR" "$INDEX_PATH" "$SPLICE_SITES"
```
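To show how the read type changes the alignment call, here is a minimal sketch that builds a HISAT2 command for one sample. It is an illustration under assumptions, not the content of the map*.sh scripts: the function name and argument layout are hypothetical, and only standard HISAT2 options are used (`-x` for the index, `--known-splicesite-infile` for the splice site file, `-1`/`-2` for paired reads, `-U` for unpaired reads, `-S` for the SAM output).

```python
def hisat2_command(index_path, splice_sites, output_sam, reads, read_type):
    """Build (but do not run) a HISAT2 argument list for one sample."""
    cmd = [
        "hisat2",
        "-x", index_path,                          # basename of the HISAT2 index
        "--known-splicesite-infile", splice_sites,
        "-S", output_sam,                          # SAM file to write
    ]
    if read_type == "Paired":
        if len(reads) != 2:
            raise ValueError("Paired mode needs exactly two FASTQ files")
        cmd += ["-1", reads[0], "-2", reads[1]]
    elif read_type == "Unpaired":
        if len(reads) != 1:
            raise ValueError("Unpaired mode needs exactly one FASTQ file")
        cmd += ["-U", reads[0]]
    else:
        raise ValueError(f"unknown read_type: {read_type!r}")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Building the list first keeps the Paired/Unpaired branching easy to inspect before anything is executed.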
Conversion of SAM to BAM
- Script: samToBam.sh
- Usage:
```
samToBam.sh "$INPUT_DIR" "$OUTPUT_DIR"
```
Indexing BAM Files
- Script: indexBam.sh
- Usage:
```
indexBam.sh "$INPUT_DIR" "$OUTPUT_DIR"
```
Counting Reads for Each Gene Feature using featureCounts
- Script: featureCount.sh
- Usage:
```
featureCount.sh "$INPUT_DIR" "$OUTPUT_DIR" "$GTF_FILE"
```
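For orientation, the sketch below builds a gene-level featureCounts call from a GTF file and one or more BAM files. It is a hedged illustration rather than the content of featureCount.sh: the function name is hypothetical, `-t exon` and `-g gene_id` simply spell out the featureCounts defaults, and paired-end options are left out because they differ between Subread versions.

```python
def featurecounts_command(bam_files, gtf_file, output_file):
    """Build (but do not run) a featureCounts call for gene-level counting."""
    cmd = [
        "featureCounts",
        "-a", gtf_file,      # annotation file (GTF)
        "-o", output_file,   # count table to write
        "-t", "exon",        # feature type to count (featureCounts default)
        "-g", "gene_id",     # attribute used to group exons into genes (default)
    ]
    return cmd + list(bam_files)
```

As with the other steps, the list can be passed to `subprocess.run(cmd, check=True)` once the Subread package is installed.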

Downstream Analysis

Python Script for Merging featureCounts Results

Script: merge_featureCounts.py
Usage:

 merge_featureCounts.py file_paths output_path
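The merge step can be sketched as follows. This is a minimal illustration, not the repository's merge_featureCounts.py: the function names and the choice to label each column with the input file's stem are assumptions. It relies on the standard featureCounts output layout, which starts with a comment line beginning with `#`, followed by a header row, the annotation columns (Geneid, Chr, Start, End, Strand, Length) and the count column, and it assumes each input file holds counts for a single BAM file.

```python
import csv
from pathlib import Path


def read_counts(path):
    """Return {gene_id: count} from one featureCounts output file."""
    counts = {}
    with open(path, newline="") as handle:
        # Skip the leading '# Program:featureCounts ...' comment line(s).
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        header = next(rows)
        gene_col = header.index("Geneid")
        count_col = len(header) - 1  # assumption: one BAM per file, counts last
        for row in rows:
            if row:
                counts[row[gene_col]] = int(row[count_col])
    return counts


def merge_counts(file_paths, output_path):
    """Write a tab-separated matrix: one row per gene, one column per file."""
    samples = [(Path(p).stem, read_counts(p)) for p in file_paths]
    # Keep genes in order of first appearance across the input files.
    genes = list(dict.fromkeys(g for _, counts in samples for g in counts))
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["Geneid"] + [name for name, _ in samples])
        for gene in genes:
            # A gene absent from a file is reported as 0 for that sample.
            writer.writerow([gene] + [counts.get(gene, 0) for _, counts in samples])
```

A call such as `merge_counts(["sample1.txt", "sample2.txt"], "merged_counts.tsv")` (hypothetical file names) would produce a gene-by-sample count matrix of the kind typically loaded into R for differential expression analysis.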

R Script for DESeq2 Analysis

Script: DSeq2_analysis.R
Usage: Execute in an R environment
