feat: env update (#45)
* fix: updated to current snakemake and python

* fix: preliminary env update

* fix: removed unnecessary bioconductor and r dependencies

* fix: added biopython dependency

* feat: profile split per cluster

* fix: README adaption

---------

Co-authored-by: Yannic Eising <yeising@students.uni-mainz.de>
cmeesters and yeising authored Jul 3, 2024
1 parent 10a8d93 commit 030e041
Showing 3 changed files with 67 additions and 94 deletions.
96 changes: 2 additions & 94 deletions README.md
@@ -1,101 +1,9 @@
[![Snakemake](https://img.shields.io/badge/snakemake-≥8.0-brightgreen.svg)](https://snakemake.github.io)
[![GitHub actions status](https://github.com/snakemake-workflows/transcriptome-differential-expression/workflows/Tests/badge.svg?branch=main)](https://github.com/snakemake-workflows/transcriptome-differential-expression/actions?query=branch%3Amain+workflow%3ATests)

> This project tries to re-animate the project from Oxford Nanopore. CURRENTLY IT IS NOT IN A WORKING STATE. Please see their current nextflow implementation as a reference [wf-transcriptomes](https://github.com/epi2me-labs/wf-transcriptomes), which contains functionality for [differential expression](https://github.com/epi2me-labs/wf-transcriptomes#differential-expression).


-----------------------------


*Pipeline for differential gene expression (DGE) and differential transcript usage (DTU) analysis using long reads*

This pipeline uses [snakemake](https://snakemake.readthedocs.io/en/stable/), [minimap2](https://github.com/lh3/minimap2), [salmon](https://combine-lab.github.io/salmon/), [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html), [DEXSeq](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html) and [stageR](https://bioconductor.org/packages/release/bioc/html/stageR.html) to automate simple [differential gene expression](https://www.ebi.ac.uk/training/online/course/functional-genomics-ii-common-technologies-and-data-analysis-methods/differential-gene) and [differential transcript usage](http://dx.doi.org/10.12688/f1000research.15398.2) workflows on long read data.

If you have paired samples (for example, treated and untreated samples from the same individuals), use the [paired_dge_dtu](https://github.com/nanoporetech/pipeline-transcriptome-de/tree/paired_dge_dtu) branch.


## Getting Started

### Input

The input files and parameters are specified in `config.yml`:

- `transcriptome` - the input transcriptome.
- `annotation` - the input annotation in GFF format.
- `condition_a_identifier` - a string identifying the first trait.
- `condition_b_samples` - a string identifying the second trait.
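
A quick sanity check of the parsed configuration can catch a missing entry before the workflow starts. The following is only a minimal sketch: the key names are taken from the list above, while the function `missing_keys`, the example dictionary and its file paths are hypothetical and not part of the workflow.

```python
# Keys listed in the Input section above.
REQUIRED_KEYS = (
    "transcriptome",
    "annotation",
    "condition_a_identifier",
    "condition_b_samples",
)


def missing_keys(config: dict) -> list[str]:
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical parsed config (illustrative values) with one entry left out on purpose.
example_config = {
    "transcriptome": "data/transcriptome.fa",
    "annotation": "data/annotation.gff",
    "condition_a_identifier": "treated",
}

print(missing_keys(example_config))
```

In practice the dictionary would come from loading `config.yml` with a YAML parser; that step is left out here to keep the sketch free of third-party dependencies.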

### Output

- `alignments/*.bam` - unsorted transcriptome alignments (input to `salmon`).
- `alignments_sorted/*.bam` - sorted and indexed transcriptome alignments.
- `counts` - counts generated by `salmon`.
- `merged/all_counts.tsv` - the transcript count table including all samples.
- `merged/all_counts_filtered.tsv` - the transcript count table including all samples after filtering.
- `merged/all_gene_counts.tsv` - the gene count table including all samples.
- `de_analysis/coldata.tsv` - the condition table used to build model matrix.
- `de_analysis/de_params.tsv` - analysis parameters generated from `config.yml`.
- `de_analysis/results_dge.tsv` and `de_analysis/results_dge.pdf` - results of `edgeR` differential gene expression analysis.
- `de_analysis/results_dtu_gene.tsv`, `de_analysis/results_dtu_transcript.tsv` and `de_analysis/results_dtu.pdf` - results of differential transcript usage by `DEXSeq`.
- `de_analysis/results_dtu_stageR.tsv` - results of the `stageR` analysis of the `DEXSeq` output.
- `de_analysis/dtu_plots.pdf` - DTU results plot based on the `stageR` results and filtered counts.


### Dependencies

- [miniconda](https://conda.io/miniconda.html) - install it according to the [instructions](https://conda.io/docs/user-guide/install/index.html).
- [snakemake](https://anaconda.org/bioconda/snakemake) - install using `conda`.
- [pandas](https://anaconda.org/conda-forge/pandas) - install using `conda`.
- The rest of the dependencies are automatically installed using the `conda` feature of `snakemake`.

### Layout

* `README.md`
* `Snakefile` - master snakefile
* `config.yml` - YAML configuration file
* `snakelib/` - snakefiles collection included by the master snakefile
* `lib/` - python files included by analysis scripts and snakefiles
* `scripts/` - analysis scripts
* `data/` - input data needed by pipeline - use with caution to avoid bloated repo
* `results/` - pipeline results to be committed - use with caution to avoid bloated repo

### Installation

Clone the repository:

```bash
git clone https://github.com/snakemake-workflows/transcriptome-differential-expression
```

### Usage

Edit `config.yml` to set the input datasets and parameters, then run one of the following commands.

On a server, e.g.:
```bash
snakemake --use-conda -j <num_cores> all
```
On a cluster, e.g.:
```bash
snakemake --slurm --default-resources slurm_account=<your slurm account> slurm_partition=<your cluster's default partition> -j <unlimited or lower> --configfile ./envs/<your config yaml> --workflow-profile ./profile/ --snakefile <path to Snakefile> --directory <desired working directory>
```
Note that the profile is a template cluster configuration and needs adjusting for your particular cluster. Contributions of cluster-specific configurations are welcome!

### Help

##### Licence and Copyright

(c) 2018 Oxford Nanopore Technologies Ltd.
(c) 2023- Lukas Hellmann & Christian Meesters (JGU Mainz, Germany)

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

#### References and Supporting Information

This workflow is largely based on the approach described in the following paper:

- Love MI, Soneson C and Patro R. *Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification.* F1000Research 2018, 7:952
(doi: [10.12688/f1000research.15398.3](http://dx.doi.org/10.12688/f1000research.15398.3))



File renamed without changes.
65 changes: 65 additions & 0 deletions workflow/profile/Mainz-MogonNHR/config.yaml
@@ -0,0 +1,65 @@
default-resources:
  slurm_account: "nhr-zdvhpc"
  slurm_partition: "smallcpu"

set-resources:
  genome_to_transcriptome:
    cpus_per_task: 1
    mem_mb_per_cpu: 1800
    runtime: "2h"

  build_minimap_index:
    cpus_per_task: 4
    mem_mb_per_cpu: 3600
    runtime: "1h"

  map_reads:
    cpus_per_task: 40
    mem_mb_per_cpu: 1800
    runtime: "3h"
    slurm_partition: "smallcpu" # needs benchmarking

  plot_samples:
    cpus_per_task: 4
    mem_mb_per_cpu: 1800
    runtime: "3h"

  plot_all_samples:
    cpus_per_task: 8
    mem_mb_per_cpu: 1800
    runtime: "2h"

  map_qc:
    cpus_per_task: 8
    mem_mb_per_cpu: 1800
    runtime: "1h"

  sam_sort:
    cpus_per_task: 4
    mem_mb_per_cpu: 1800
    runtime: "2h"

  sam_view:
    cpus_per_task: 1
    mem_mb_per_cpu: 1800
    runtime: "1h"

  sam_index:
    cpus_per_task: 8
    mem_mb_per_cpu: 1800
    runtime: "30m"

  sam_stats:
    cpus_per_task: 8
    mem_mb_per_cpu: 1800
    runtime: "30m"

  count_reads:
    cpus_per_task: 8
    mem_mb_per_cpu: 1800
    runtime: "1h"

  de_analysis:
    cpus_per_task: 4
    mem_mb_per_cpu: 5000
    runtime: "1h"
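
Because memory in this profile is requested per CPU, the total memory a job asks for is the product of `cpus_per_task` and `mem_mb_per_cpu`. The sketch below works this out for a few rules. It assumes the scheduler multiplies the per-CPU value by the number of CPUs; the numbers are copied from the profile, and the helper `total_mem_mb` is a hypothetical name used for illustration only.

```python
# Subset of the set-resources entries from the Mainz-MogonNHR profile.
SET_RESOURCES = {
    "genome_to_transcriptome": {"cpus_per_task": 1, "mem_mb_per_cpu": 1800},
    "build_minimap_index": {"cpus_per_task": 4, "mem_mb_per_cpu": 3600},
    "map_reads": {"cpus_per_task": 40, "mem_mb_per_cpu": 1800},
    "de_analysis": {"cpus_per_task": 4, "mem_mb_per_cpu": 5000},
}


def total_mem_mb(resources: dict) -> int:
    """Total requested memory in MB, assuming per-CPU memory times CPU count."""
    return resources["cpus_per_task"] * resources["mem_mb_per_cpu"]


for rule, resources in SET_RESOURCES.items():
    print(f"{rule}: {total_mem_mb(resources)} MB")
```

Under this assumption, `map_reads` is the largest request at 72000 MB, which is worth comparing against the node size of the chosen partition when adapting the profile to another cluster.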
