feat: env update (#45)

* fix: updated to current snakemake and python
* fix: preliminary env update
* fix: removed unnecessary bioconductor and r dependencies
* fix: added biopython dependency
* feat: profile split per cluster
* fix: README adaption

---------

Co-authored-by: Yannic Eising <yeising@students.uni-mainz.de>
snakemake-workflows · Jul 3, 2024 · 030e041
1 parent 10a8d93
commit 030e041
Showing 3 changed files with 67 additions and 94 deletions.
diff --git a/README.md b/README.md
@@ -1,101 +1,9 @@
+[![Snakemake](https://img.shields.io/badge/snakemake-≥8.0-brightgreen.svg)](https://snakemake.github.io)
+[![GitHub actions status](https://github.com/snakemake-workflows/transcriptome-differential-expression/workflows/Tests/badge.svg?branch=main)](https://github.com/snakemake-workflows/transcriptome-differential-expression/actions?query=branch%3Amain+workflow%3ATests)
 
-> This project tries to re-animate the project from Oxford Nanopore. CURRENTLY IT IS NOT IN A WORKING STATE. Please see their current nextflow implementation as a reference [wf-transcriptomes](https://github.com/epi2me-labs/wf-transcriptomes), which contains functionality for [differential expression](https://github.com/epi2me-labs/wf-transcriptomes#differential-expression).
 
 
------------------------------
 
 
-*Pipeline for differential gene expression (DGE) and differential transcript usage (DTU) analysis using long reads*
-
-This pipeline uses [snakemake](https://snakemake.readthedocs.io/en/stable/), [minimap2](https://github.com/lh3/minimap2), [salmon](https://combine-lab.github.io/salmon/), [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html), [DEXSeq](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html) and [stageR](https://bioconductor.org/packages/release/bioc/html/stageR.html) to automate simple [differential gene expression](https://www.ebi.ac.uk/training/online/course/functional-genomics-ii-common-technologies-and-data-analysis-methods/differential-gene) and [differential transcript usage](http://dx.doi.org/10.12688/f1000research.15398.2) workflows on long read data.
-
-If you have paired samples (e.g for example treated and untreated samples from the same individuals) use the [paired_dge_dtu](https://github.com/nanoporetech/pipeline-transcriptome-de/tree/paired_dge_dtu) branch.
-
-
-## Getting Started
-
-### Input
-
-The input files and parameters are specified in `config.yml`:
-
-- `transcriptome` - the input transcriptome.
-- `annotation` - the input annotation in GFF format.
-- `condition_a_identifier` - a string identifiying the first trait.
-- `condition_b_samples` - a string identifiying the second trait.
-
-### Output
-
-- `alignments/*.bam` - unsorted transcriptome alignments (input to `salmon`).
-- `alignments_sorted/*.bam` - sorted and indexed transcriptome alignments.
-- `counts` - counts generated by `salmon`.
-- `merged/all_counts.tsv` - the transcript count table including all samples.
-- `merged/all_counts_filtered.tsv` - the transcript count table including all samples after filtering.
-- `merged//all_gene_counts.tsv` - the gene count table including all samples.
-- `de_analysis/coldata.tsv` - the condition table used to build model matrix.
-- `de_analysis/de_params.tsv` - analysis parameters generated from `config.yml`.
-- `de_analysis/results_dge.tsv` and `de_analysis/results_dge.pdf`- results of `edgeR` differential gene expression analysis.
-- `de_analysis/results_dtu_gene.tsv`, `de_analysis/results_dtu_transcript.tsv` and `de_analysis/results_dtu.pdf` - results of differential transcript usage by `DEXSeq`.
-- `de_analysis/results_dtu_stageR.tsv` - results of the `stageR` analysis of the `DEXSeq` output.
-- `de_analysis/dtu_plots.pdf` - DTU results plot based on the `stageR` results and filtered counts.
-
-
-### Dependencies
-
-- [miniconda](https://conda.io/miniconda.html) - install it according to the [instructions](https://conda.io/docs/user-guide/install/index.html).
-- [snakemake](https://anaconda.org/bioconda/snakemake) install using `conda`.
-- [pandas](https://anaconda.org/conda-forge/pandas) - install using `conda`.
-- The rest of the dependencies are automatically installed using the `conda` feature of `snakemake`.
-
-### Layout
-
-* `README.md`
-* `Snakefile`         - master snakefile
-* `config.yml`        - YAML configuration file
-* `snakelib/`         - snakefiles collection included by the master snakefile
-* `lib/`              - python files included by analysis scripts and snakefiles
-* `scripts/`          - analysis scripts
-* `data/`             - input data needed by pipeline - use with caution to avoid bloated repo
-* `results/`          - pipeline results to be commited - use with caution to avoid bloated repo
-
-### Installation
-
-Clone the repository:
-
-```bash
-git clone https://github.com/snakemake-workflows/transriptome-differential-expression
-```
-
-### Usage
-
-Edit `config.yml` to set the input datasets and parameters then issue:
-
-On a server, e.g.:
-```bash
-snakemake --use-conda -j <num_cores> all
-```
-On a cluster, e.g.
-```bash
-snakemake --slurm --default-resources slurm_account=<your slurm account> slurm_partition=<your clusters default partition> -j <unlimited or lower> --configfile ./envs/<your config yaml> --workflow-profile ./profile/ --snakefile <path to Snakefile> --directory <desired working directory>
-``` 
-Note, that the profile offers a template cluster configuration - it needs adjusting for particular clusters. Contributions of particular configurations are welcome!
-
-### Help
-
-##### Licence and Copyright
-
-(c) 2018 Oxford Nanopore Technologies Ltd.
-(c) 2023- Lukas Hellmann & Christian Meesters (JGU Mainz, Germany)
-
-This Source Code Form is subject to the terms of the Mozilla Public
-License, v. 2.0. If a copy of the MPL was not distributed with this
-file, You can obtain one at http://mozilla.org/MPL/2.0/.
-
-#### References and Supporting Information
-
-This worflow is largely based on the approach described in the following paper:
-
-- Love MI, Soneson C and Patro R. *Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification.* F1000Research 2018, 7:952
-(doi: [10.12688/f1000research.15398.3](http://dx.doi.org/10.12688/f1000research.15398.3))
-
 
 
diff --git a/workflow/profile/config.yaml b/workflow/profile/Mainz-MogonII/config.yaml
rename from workflow/profile/config.yaml
rename to workflow/profile/Mainz-MogonII/config.yaml
diff --git a/workflow/profile/Mainz-MogonNHR/config.yaml b/workflow/profile/Mainz-MogonNHR/config.yaml
@@ -0,0 +1,65 @@
+default-resources:
+    slurm_account: "nhr-zdvhpc"
+    slurm_partition: "smallcpu"
+
+set-resources:
+    genome_to_transcriptome:
+        cpus_per_task: 1
+        mem_mb_per_cpu: 1800
+        runtime: "2h"
+
+    build_minimap_index:
+        cpus_per_task: 4
+        mem_mb_per_cpu: 3600
+        runtime: "1h"
+
+    map_reads:
+        cpus_per_task: 40
+        mem_mb_per_cpu: 1800
+        runtime: "3h"
+        slurm_partition: "smallcpu" # needs benchmarking
+
+    plot_samples:
+        cpus_per_task: 4
+        mem_mb_per_cpu: 1800
+        runtime: "3h"
+
+    plot_all_samples:
+        cpus_per_task: 8
+        mem_mb_per_cpu: 1800
+        runtime: "2h"
+
+    map_qc:
+        cpus_per_task: 8
+        mem_mb_per_cpu: 1800
+        runtime: "1h"
+
+    sam_sort:
+        cpus_per_task: 4
+        mem_mb_per_cpu: 1800
+        runtime: "2h"
+
+    sam_view:
+        cpus_per_task: 1
+        mem_mb_per_cpu: 1800
+        runtime: "1h"
+
+    sam_index:
+        cpus_per_task: 8
+        mem_mb_per_cpu: 1800
+        runtime: "30m"
+
+    sam_stats:
+        cpus_per_task: 8
+        mem_mb_per_cpu: 1800
+        runtime: "30m"
+
+    count_reads:
+        cpus_per_task: 8
+        mem_mb_per_cpu: 1800
+        runtime: "1h"
+
+    de_analysis:
+        cpus_per_task: 4
+        mem_mb_per_cpu: 5000
+        runtime: "1h"
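
The profile above sets memory per CPU rather than per job, so the total memory a job requests is not visible at a glance. The following is a minimal sketch for checking those totals. It assumes the SLURM executor treats `mem_mb_per_cpu` as a per-CPU request, so the job total is `cpus_per_task * mem_mb_per_cpu`. The names `SET_RESOURCES`, `total_mem_mb` and `summarize` are hypothetical helpers for illustration and are not part of the workflow; the values are copied by hand from the profile.

```python
# Per-rule values copied from workflow/profile/Mainz-MogonNHR/config.yaml.
# Assumption: total job memory = cpus_per_task * mem_mb_per_cpu.
SET_RESOURCES = {
    "genome_to_transcriptome": {"cpus_per_task": 1, "mem_mb_per_cpu": 1800},
    "build_minimap_index": {"cpus_per_task": 4, "mem_mb_per_cpu": 3600},
    "map_reads": {"cpus_per_task": 40, "mem_mb_per_cpu": 1800},
    "plot_samples": {"cpus_per_task": 4, "mem_mb_per_cpu": 1800},
    "plot_all_samples": {"cpus_per_task": 8, "mem_mb_per_cpu": 1800},
    "map_qc": {"cpus_per_task": 8, "mem_mb_per_cpu": 1800},
    "sam_sort": {"cpus_per_task": 4, "mem_mb_per_cpu": 1800},
    "sam_view": {"cpus_per_task": 1, "mem_mb_per_cpu": 1800},
    "sam_index": {"cpus_per_task": 8, "mem_mb_per_cpu": 1800},
    "sam_stats": {"cpus_per_task": 8, "mem_mb_per_cpu": 1800},
    "count_reads": {"cpus_per_task": 8, "mem_mb_per_cpu": 1800},
    "de_analysis": {"cpus_per_task": 4, "mem_mb_per_cpu": 5000},
}


def total_mem_mb(resources):
    """Return the total memory (MB) requested by one job of a rule."""
    return resources["cpus_per_task"] * resources["mem_mb_per_cpu"]


def summarize(set_resources):
    """Map each rule name to its total memory request in MB."""
    return {rule: total_mem_mb(res) for rule, res in set_resources.items()}


if __name__ == "__main__":
    # Print the rules from the largest to the smallest memory request.
    totals = summarize(SET_RESOURCES)
    for rule, mem in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{rule:<26}{mem:>8} MB")
```

Under that assumption, `map_reads` is by far the largest request (40 CPUs at 1800 MB each), which may be worth keeping in mind when benchmarking its partition as the inline comment in the profile suggests.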