namlab-mapper

A small workflow that downloads multiple RNA sequencing files from the NCBI SRA and maps them, along with any local FASTQ files, to a common reference using kallisto. Because it is written in Nextflow, it can automatically parallelize steps across CPUs, or across nodes if you run it on a cluster (see this page for more details). It also saves disk space by removing large intermediate files once they are no longer needed. The output is a combined table of abundance quantifications plus a FastQC report for each sequence file.

Prerequisites

rnaseq-mapper will try to load the following environment modules: sratoolkit, kallisto, R, fastqc. If your system doesn't use modules, make sure the corresponding executables are available in your PATH.
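If your system does not use modules, a short preflight check can confirm that the tools are reachable before you launch the pipeline. This is a minimal sketch using Python's `shutil.which`. The executable names in `REQUIRED_TOOLS` are assumptions: the README names the packages, not the exact commands the workflow calls (for example, which sratoolkit tool it uses), so adjust the list to match the workflow.

```python
import shutil

# Hypothetical executable names: the README lists the packages (sratoolkit,
# kallisto, R, fastqc) but not which commands the workflow actually invokes.
REQUIRED_TOOLS = ["fasterq-dump", "kallisto", "Rscript", "fastqc"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```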

Usage

1. Set up Nextflow (if it is not installed already):

   curl -s https://get.nextflow.io | bash

2. Create a file named exactly nextflow.config, using example_nextflow.config from this directory as a template and adapting it to your use case.
3. Create an input file listing the sequences you want to map, in the format of example_input.csv, and make sure your config file refers to it.
4. Optionally, place any local FASTQ files in the directories referenced in your nextflow.config. If you don't have any, make sure the directories still exist and leave them empty.
5. Run the pipeline:

   ./nextflow run NAMlab/rnaseq-mapper
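Because the FASTQ directories must exist even when they are empty, it can help to create them up front. The sketch below does this with `pathlib`. The directory names in `FASTQ_DIRS` are hypothetical placeholders; replace them with the paths your nextflow.config actually references.

```python
from pathlib import Path

# Hypothetical directory names: use the FASTQ directories that your
# nextflow.config actually points to.
FASTQ_DIRS = ["fastq/single_end", "fastq/paired_end"]


def ensure_dirs(dirs, base="."):
    """Create each directory under `base` if it is missing.

    Returns the paths that were newly created; existing directories
    (and any FASTQ files inside them) are left untouched.
    """
    created = []
    for d in dirs:
        path = Path(base) / d
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(str(path))
    return created


if __name__ == "__main__":
    for p in ensure_dirs(FASTQ_DIRS):
        print("created", p)
```

Running it a second time creates nothing, so it is safe to run before every pipeline launch.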

Singularity Container

If you prefer, you can also use the Singularity container, which packages all the required software (sratoolkit, kallisto, R, fastqc). This requires Singularity or Apptainer to be installed on your system. You can then execute the pipeline (step 5 above; the other steps stay the same) with Singularity via

./nextflow run NAMlab/rnaseq-mapper -with-singularity library://merlin/default/rnaseq-mapper:latest

or with Apptainer via

./nextflow run NAMlab/rnaseq-mapper -with-apptainer library://merlin/default/rnaseq-mapper:latest

Output

The pipeline produces a TSV file combining the kallisto outputs for all your sequence files, like the one below (by default in the work/out folder):

target_id	length	SRR1805735_eff_length	SRR1805737_eff_length	SRR6512869_eff_length	SRR6512869_est_counts	SRR6512869_tpm
Solyc00g005280.1.1	411	252.224	241.253	212	0	0
Solyc00g005285.2.1	216	68.6464	63.7937	31.5146	0	0
Solyc00g006483.2.1	390	231.296	220.691	191	0	0
Solyc00g006487.2.1	276	120.525	114.108	77.4659	2	22.2662
Solyc00g006560.2.1	1317	1158	1145.76	1118	0	0
Solyc00g006890.2.1	300	143.123	135.795	101.044	0	0
Solyc00g006900.2.1	576	416.999	404.931	377	0	0
Solyc00g007225.2.1	1275	1116	1103.76	1076	0	0
Solyc00g007330.1.1	516	356.999	345.082	317	0	0
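To work with the combined table downstream, you can pull out one measure per sample by its column suffix. This is a minimal standard-library sketch that assumes every sample's columns follow the naming visible in the example above (`<sample>_eff_length`, `<sample>_est_counts`, `<sample>_tpm`). The function name `tpm_by_sample` is illustrative, not part of the pipeline.

```python
import csv
import io


def tpm_by_sample(tsv_text):
    """Map each sample ID to {target_id: TPM}, using columns named '<sample>_tpm'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    tpm_columns = [c for c in reader.fieldnames if c.endswith("_tpm")]
    # Strip the '_tpm' suffix to recover the sample (e.g. SRA run) ID.
    result = {c[: -len("_tpm")]: {} for c in tpm_columns}
    for row in reader:
        for column in tpm_columns:
            sample = column[: -len("_tpm")]
            result[sample][row["target_id"]] = float(row[column])
    return result
```

For example, pass it the contents of the combined TSV in work/out via `tpm_by_sample(open(path).read())`; swapping the suffix to `_est_counts` gives the estimated counts instead.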

The same folder also contains a FastQC report for each sequence file.
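FastQC names its HTML report after the input file with a `_fastqc.html` suffix, so the reports can be collected by that pattern. This sketch assumes the default work/out output folder mentioned above. The function name is illustrative.

```python
from pathlib import Path


def fastqc_reports(out_dir="work/out"):
    """Return the FastQC HTML reports in `out_dir`, keyed by input file name.

    FastQC writes '<input name>_fastqc.html' (plus a matching .zip), so
    stripping that suffix recovers the extension-less input file name.
    """
    suffix = "_fastqc.html"
    return {
        p.name[: -len(suffix)]: p
        for p in sorted(Path(out_dir).glob("*" + suffix))
    }
```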
