Merge branch 'dev' into nf-core-template-merge-1.14

nf-core · May 11, 2021 · b716d5e
2 parents 2f2e9c3 + 0eb2cc4
Showing 27 changed files with 7,900 additions and 146 deletions.
diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml
@@ -9,7 +9,6 @@ on:
     types: [completed]
   workflow_dispatch:
 
-
 env:
   AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
   AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
@@ -18,7 +17,6 @@ env:
   AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
   AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
 
-
 jobs:
   run-awstest:
     name: Run AWS full tests
@@ -33,7 +31,6 @@ jobs:
       - name: Install awscli
         run: conda install -c conda-forge awscli
       - name: Start AWS batch job
-        # TODO nf-core: You can customise AWS full pipeline tests as required
         # Add full size test data (but still relatively small datasets for few samples)
         # on the `test_full.config` test runs with only one set of parameters
         # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command

diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml
@@ -6,7 +6,6 @@ name: nf-core AWS test
 on:
   workflow_dispatch:
 
-
 env:
   AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
   AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
@@ -15,7 +14,6 @@ env:
   AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
   AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
 
-
 jobs:
   run-awstest:
     name: Run AWS tests
@@ -30,7 +28,6 @@ jobs:
       - name: Install awscli
         run: conda install -c conda-forge awscli
       - name: Start AWS batch job
-        # TODO nf-core: You can customise CI pipeline run tests as required
         # For example: adding multiple test runs with different parameters
         # Remember that you can parallelise this by using strategy.matrix
         run: |

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -37,13 +37,13 @@ jobs:
 
       - name: Build new docker image
         if: env.MATCHED_FILES
-        run: docker build --no-cache . -t nfcore/smrnaseq:dev
+        run: docker build --no-cache . -t nfcore/smrnaseq:1.1.0
 
       - name: Pull docker image
         if: ${{ !env.MATCHED_FILES }}
         run: |
           docker pull nfcore/smrnaseq:dev
-          docker tag nfcore/smrnaseq:dev nfcore/smrnaseq:dev
+          docker tag nfcore/smrnaseq:dev nfcore/smrnaseq:1.1.0
 
       - name: Install Nextflow
         env:
@@ -53,8 +53,6 @@ jobs:
           sudo mv nextflow /usr/local/bin/
 
       - name: Run pipeline with test data
-        # TODO nf-core: You can customise CI pipeline run tests as required
-        # For example: adding multiple test runs with different parameters
-        # Remember that you can parallelise this by using strategy.matrix
+        # TODO: Add more run variants with different pipeline flags?
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test,docker
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,16 +1,116 @@
 # nf-core/smrnaseq: Changelog
 
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [v1.1.0](https://github.com/nf-core/smrnaseq/releases/tag/1.1.0) - 2021-05-11
 
-## v1.1.0 - [date]
+### Major changes
 
-Initial release of nf-core/smrnaseq, created with the [nf-core](https://nf-co.re/) template.
+**:warning: Breaking changes!**
 
-### `Added`
+This release contains several major (potentially breaking) changes:
 
-### `Fixed`
+* The main input parameter has been changed from `--reads` to `--input` to standardize the pipeline with other nf-core pipelines
+* All parameter documentation has moved into a new `nextflow_schema.json` file
+* A `lib` folder with groovy helper classes has been added to the pipeline. This includes validation of input parameters using the schema defined in the `nextflow_schema.json` file
 
-### `Dependencies`
+### General improvements
 
-### `Deprecated`
+* Remove spaces from genome headers and replace special nucleotides with N in the hairpin file so that mirdeep2 works. [[#69]](https://github.com/nf-core/smrnaseq/pull/79)
+* Accept custom genome and remove non-canonical letters in the genome. Thanks to @sdjebali. Follow up from [[#63]](https://github.com/nf-core/smrnaseq/pull/63)
+* Fix error when only one sample is in the input [[#31]](https://github.com/nf-core/smrnaseq/issues/31)
+* Made `CamelCase` pipeline parameters `snake_case` and lower case
+  * `clip_R1` -> `clip_r1`
+  * `three_prime_clip_R1` -> `three_prime_clip_r1`
+  * `saveReference` -> `save_reference`
+  * `skipQC` -> `skip_qc`
+  * `skipFastqc` -> `skip_fastqc`
+  * `skipMultiqc` -> `skip_multiqc`
+* Update with the latest `TEMPLATE` version for nf-core `1.12.1`
+* Update conda environment with new packages and updates
+* Added `--protocol custom` to allow custom adapter trimming modes [[#41](https://github.com/nf-core/smrnaseq/issues/41)]
+* Fix error when the UMI is in the read header [[#35](https://github.com/nf-core/smrnaseq/issues/35)]
+* Updated `params.mirtrace_species` to be properly initialised when using `--genome`, for all iGenomes species
+* Made `params.mature` and `params.hairpin` default to miRBase FTP URL so that the file is automatically downloaded if not provided
+* Allow `.fa` or `.fa.gz` files for mature and hairpin FASTA files.
+* Add full-size benchmark / test dataset to run on AWS for each release (see `test_full.config`)
+* Add `-f` flag to `gunzip` commands to deal with soft-links
+* Add `--mirtrace_protocol` parameter to allow for more flexible selection of this parameter
+* Added `--trim_galore_max_length` option [[#77](https://github.com/nf-core/smrnaseq/issues/77)]
+* Solved memory usage issue for mirtrace in the main process and in the `get_software_versions` process [[#68](https://github.com/nf-core/smrnaseq/issues/68)]
+
+### Packaged software updates
+
+* `fastqc=0.11.8` -> `0.11.9`
+* `trim-galore=0.6.3` -> `0.6.5`
+* `bowtie=1.2.2` -> `1.2.3`
+* `multiqc=1.7` -> `1.9`
+* `mirtop=0.4.22` -> `0.4.23`
+* `seqcluster=1.2.5` -> `1.2.7`
+* `htseq=0.11.2` -> `0.11.3`
+* `fastx_toolkit=0.0.14` -> `0.0.14`
+* `seqkit=0.10.1` -> `0.12.0`
+* `mirtrace=1.0.0` -> `1.0.1`
+* Added `conda-forge::python=3.7.3`
+* Added `conda-forge::markdown=3.1.1`
+* Added `conda-forge::pymdown-extensions=6.0`
+* Added `conda-forge::pygments=2.5.2`
+* Removed `conda-forge::r-markdown=1.0`
+
+## [v1.0.0](https://github.com/nf-core/smrnaseq/releases/tag/1.0.0) - 2019-09-19
+
+### Added
+
+* Figures to output documentation
+* Samtools stats for genome alignments
+* Seqkit and remove razers
+* Mirtop and razers tools
+* Adapt code and docs to [nf-core](http://nf-co.re/) template
+* Update tools and Dockerfile/Singularity to match current template
+
+### Packaged software updates
+
+* openjdk 8.0.144 -> 11.0.1
+* fastqc 0.11.7 -> 0.11.8
+* trim-galore 0.5.0 -> 0.6.2
+* bioconductor-edger 3.20.7 -> 3.26.0
+* bioconductor-limma 3.34.9 -> 3.40.0
+* conda-forge::r-data.table 1.11.4 -> 1.12.2
+* conda-forge::r-gplots 3.0.1 -> 3.0.1.1
+* conda-forge::r-r.methodss3 1.7.1 -> 1.7.1
+* htseq 0.9.1 -> 0.11.2
+* r-markdown 0.9
+* Added mirtop 0.4.18a
+* Removed razers3 3.5.3
+* Added seqkit 0.10.1-1
+* Added seqcluster 1.2.5
+* conda-forge::r-base=3.5.1 -> 3.6.1
+* conda-forge::r-statmod=1.4.30 -> 1.4.32
+* conda-forge::r-markdown=0.9 -> 1.0
+* trim-galore=0.6.2 -> 0.6.3
+* mirtop=0.4.18a -> 0.4.22
+* bioconductor-edger=3.26.0 -> 3.26.5
+* bioconductor-limma=3.40.0 -> 3.40.2
+
+## 2019-01-10
+
+### Added
+
+* "protocol" with pre-defined settings
+* miRTrace in the pipeline.
+
+### Software updates
+
+* multiqc 1.6 -> 1.7.
+
+## 2018-08-06
+
+### Added
+
+* Port original pipeline [SciLifeLab/NGI-smRNAseq](https://github.com/SciLifeLab/NGI-smRNAseq) to [nf-core/smrnaseq](https://github.com/nf-core/smrnaseq).
+* Use Bowtie 1 instead of Bowtie 2 for the alignment to host reference genome.
+* Option for sequencing centre in BAM file.
+
+### Software updates
+
+* trim-galore 0.4.5 -> 0.5.0
+* samtools 1.8 -> 1.9
+* multiqc 1.5 -> 1.6
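
The parameter renames listed under v1.1.0 above (`--reads` to `--input`, plus the `CamelCase` to `snake_case` changes) can be summarised as a small lookup table. The following is an illustrative Python sketch, not part of the pipeline: `RENAMED_PARAMS` and `migrate_params` are hypothetical names, and only the renames spelled out in the changelog are included.

```python
# Hypothetical helper: maps pre-1.1.0 parameter names to their 1.1.0 names.
# The pairs below are copied from the changelog; nothing else is assumed.
RENAMED_PARAMS = {
    "reads": "input",
    "clip_R1": "clip_r1",
    "three_prime_clip_R1": "three_prime_clip_r1",
    "saveReference": "save_reference",
    "skipQC": "skip_qc",
    "skipFastqc": "skip_fastqc",
    "skipMultiqc": "skip_multiqc",
}


def migrate_params(old_params):
    """Return a copy of old_params with renamed keys translated.

    Keys that were not renamed are passed through unchanged. If an old
    and a new spelling of the same parameter are both present, the
    conflict is reported instead of silently picking one.
    """
    migrated = {}
    for name, value in old_params.items():
        new_name = RENAMED_PARAMS.get(name, name)
        if new_name in migrated:
            raise ValueError(f"parameter given twice: {new_name!r}")
        migrated[new_name] = value
    return migrated
```

For example, `migrate_params({"reads": "*.fastq.gz", "skipQC": True})` yields a dictionary keyed by `input` and `skip_qc`, which is the spelling a 1.1.0 params file would need.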
diff --git a/Dockerfile b/Dockerfile
@@ -2,6 +2,9 @@ FROM nfcore/base:1.14
 LABEL authors="Phil Ewels <phil.ewels@scilifelab.se>, Chuan Wang <chuan.wang@scilifelab.se>, Rickard Hammarén <rickard.hammaren@scilifelab.se>, Lorena Pantano <lorena.pantano@gmail.com>" \
       description="Docker image containing all software requirements for the nf-core/smrnaseq pipeline"
 
+# Install libtbb2 package for bowtie
+RUN apt-get update && apt-get install libtbb2 -y
+
 # Install the conda environment
 COPY environment.yml /
 RUN conda env create --quiet -f /environment.yml && conda clean -a

diff --git a/README.md b/README.md
@@ -1,7 +1,5 @@
 # ![nf-core/smrnaseq](docs/images/nf-core-smrnaseq_logo.png)
 
-**Small RNA-Seq Best Practice Analysis Pipeline.**.
-
 [![GitHub Actions CI Status](https://github.com/nf-core/smrnaseq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/smrnaseq/actions)
 [![GitHub Actions Linting Status](https://github.com/nf-core/smrnaseq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/smrnaseq/actions)
 [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.04.0-brightgreen.svg)](https://www.nextflow.io/)
@@ -10,10 +8,11 @@
 [![Docker](https://img.shields.io/docker/automated/nfcore/smrnaseq.svg)](https://hub.docker.com/r/nfcore/smrnaseq)
 [![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23smrnaseq-4A154B?logo=slack)](https://nfcore.slack.com/channels/smrnaseq)
 
+[![DOI](https://zenodo.org/badge/140590861.svg)](https://zenodo.org/badge/latestdoi/140590861)
+
 ## Introduction
 
-<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->
-**nf-core/smrnaseq** is a bioinformatics best-practise analysis pipeline for
+**nf-core/smrnaseq** is a bioinformatics best-practice analysis pipeline used for small RNA sequencing data.
 
 The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
 
@@ -33,37 +32,44 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 
 4. Start running your own analysis!
 
-    <!-- TODO nf-core: Update the example "typical command" below used to run the pipeline -->
-
     ```bash
     nextflow run nf-core/smrnaseq -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input '*_R{1,2}.fastq.gz' --genome GRCh37
     ```
 
 See [usage docs](https://nf-co.re/smrnaseq/usage) for all of the available options when running the pipeline.
 
-## Pipeline Summary
-
-By default, the pipeline currently performs the following:
-
-<!-- TODO nf-core: Fill in short bullet-pointed list of default steps of pipeline -->
-
-* Sequencing quality control (`FastQC`)
-* Overall pipeline run summaries (`MultiQC`)
+## Pipeline summary
+
+1. Raw read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+2. Adapter trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
+    1. Insert Size calculation
+    2. Collapse reads ([`seqcluster`](https://seqcluster.readthedocs.io/mirna_annotation.html#processing-of-reads))
+3. Alignment against miRBase mature miRNA ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
+4. Alignment against miRBase hairpin
+    1. Unaligned reads from step 3 ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
+    2. Collapsed reads from step 2.2 ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
+5. Post-alignment processing of miRBase hairpin
+    1. Basic statistics from step 3 and step 4.1 ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
+    2. Analysis of miRBase hairpin counts ([`edgeR`](https://bioconductor.org/packages/release/bioc/html/edgeR.html))
+         * TMM normalization and a table of the most highly expressed hairpins
+         * MDS plot clustering samples
+         * Heatmap of sample similarities
+    3. miRNA and isomiR annotation from step 4.1 ([`mirtop`](https://github.com/miRTop/mirtop))
+6. Alignment against host reference genome ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
+    1. Post-alignment processing of alignment against host reference genome ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
+7. Known and novel miRNA discovery ([`MiRDeep2`](https://www.mdc-berlin.de/content/mirdeep2-documentation))
+    1. Mapping against reference genome with the mapper module
+    2. Known and novel miRNA discovery with the mirdeep2 module
+8. miRNA quality control ([`mirtrace`](https://github.com/friedlanderlab/mirtrace))
+9. Present QC for raw read, alignment, and expression results ([`MultiQC`](http://multiqc.info/))
 
 ## Documentation
 
 The nf-core/smrnaseq pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/smrnaseq/usage) and [output](https://nf-co.re/smrnaseq/output).
 
-<!-- TODO nf-core: Add a brief overview of what the pipeline does and how it works -->
-
 ## Credits
 
-nf-core/smrnaseq was originally written by Phil Ewels <phil.ewels@scilifelab.se>, Chuan Wang <chuan.wang@scilifelab.se>, Rickard Hammarén <rickard.hammaren@scilifelab.se>, Lorena Pantano <lorena.pantano@gmail.com>.
-
-We thank the following people for their extensive assistance in the development
-of this pipeline:
-
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+nf-core/smrnaseq was originally written for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden, by Phil Ewels (@ewels), Chuan Wang (@chuan-wang) and Rickard Hammarén (@Hammarn). Updated by Lorena Pantano (@lpantano) from MIT.
 
 ## Contributions and Support
 
@@ -73,9 +79,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi. -->
-<!-- If you use  nf-core/smrnaseq for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
-
 You can cite the `nf-core` publication as follows:
 
 > **The nf-core framework for community-curated bioinformatics pipelines.**
@@ -85,5 +88,3 @@ You can cite the `nf-core` publication as follows:
 > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
 
 In addition, references of tools and data used in this pipeline are as follows:
-
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
diff --git a/assets/smrnaseq_logo.png b/assets/smrnaseq_logo.png
diff --git a/bin/collapse_mirtop.r b/bin/collapse_mirtop.r
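@@ -0,0 +1,16 @@
+#!/usr/bin/env Rscript
+
+# Command line arguments: the first one is the path to the mirtop counts table (TSV)
+args = commandArgs(trailingOnly=TRUE)
+
+input <- as.character(args[1:length(args)])
+
+library(data.table)
+
+df = read.delim(input[1], sep = "\t")
+# Keep one row per UID, retaining column 3 (miRNA) and columns 13 onwards (assumed to be the per-sample counts)
+counts = as.data.table(df[!duplicated(df[["UID"]]),c(3, 13:ncol(df))])
+# Sum the counts of all rows assigned to the same miRNA
+mirna = counts[, lapply(.SD, sum), by = miRNA]
+# Write the collapsed table as mirna.tsv next to the input file
+write.table(mirna, file.path(dirname(input[1]), "mirna.tsv"), quote=FALSE, sep="\t", row.names=FALSE)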
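
For readers less familiar with `data.table`, the collapsing step in `bin/collapse_mirtop.r` above can be approximated in plain Python. This is a sketch under assumptions, not the pipeline's implementation: `collapse_mirtop` is a hypothetical name, and the table is assumed to have a `UID` column, the miRNA name in column 3, and numeric per-sample counts from column 13 onwards (mirroring the `c(3, 13:ncol(df))` selection).

```python
import csv
import io


def collapse_mirtop(tsv_text, mirna_col=2, first_count_col=12, uid_col="UID"):
    """Sum per-sample counts per miRNA, counting each UID only once.

    Column positions are zero-based here, so mirna_col=2 and
    first_count_col=12 correspond to columns 3 and 13 in the R script.
    Returns (sample_names, totals), where totals maps each miRNA name to
    a list of summed counts in sample order.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    uid_idx = header.index(uid_col)
    sample_names = header[first_count_col:]

    seen_uids = set()
    totals = {}  # dicts keep insertion order, so miRNAs stay in first-seen order
    for row in reader:
        if not row:
            continue  # skip blank lines
        uid = row[uid_idx]
        if uid in seen_uids:
            continue  # same idea as !duplicated(df[["UID"]]): keep first row only
        seen_uids.add(uid)
        counts = [float(value) for value in row[first_count_col:]]
        mirna = row[mirna_col]
        if mirna in totals:
            totals[mirna] = [a + b for a, b in zip(totals[mirna], counts)]
        else:
            totals[mirna] = counts
    return sample_names, totals
```

The function works on a string rather than a file path so that it can be tried on a small in-memory table; reading a real file would only need `open(path).read()` in front of it.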