Merge branch 'dev' into nf-core-template-merge-1.14
KevinMenden authored May 11, 2021
2 parents 2f2e9c3 + 0eb2cc4 commit b716d5e
Showing 27 changed files with 7,900 additions and 146 deletions.
3 changes: 0 additions & 3 deletions .github/workflows/awsfulltest.yml
@@ -9,7 +9,6 @@ on:
types: [completed]
workflow_dispatch:


env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
@@ -18,7 +17,6 @@ env:
AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}


jobs:
run-awstest:
name: Run AWS full tests
@@ -33,7 +31,6 @@ jobs:
- name: Install awscli
run: conda install -c conda-forge awscli
- name: Start AWS batch job
# TODO nf-core: You can customise AWS full pipeline tests as required
# Add full size test data (but still relatively small datasets for few samples)
# on the `test_full.config` test runs with only one set of parameters
# Then specify `-profile test_full` instead of `-profile test` on the AWS batch command
3 changes: 0 additions & 3 deletions .github/workflows/awstest.yml
@@ -6,7 +6,6 @@ name: nf-core AWS test
on:
workflow_dispatch:


env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
@@ -15,7 +14,6 @@ env:
AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}


jobs:
run-awstest:
name: Run AWS tests
@@ -30,7 +28,6 @@ jobs:
- name: Install awscli
run: conda install -c conda-forge awscli
- name: Start AWS batch job
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
run: |
8 changes: 3 additions & 5 deletions .github/workflows/ci.yml
@@ -37,13 +37,13 @@ jobs:
- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/smrnaseq:dev
run: docker build --no-cache . -t nfcore/smrnaseq:1.1.0

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/smrnaseq:dev
docker tag nfcore/smrnaseq:dev nfcore/smrnaseq:dev
docker tag nfcore/smrnaseq:dev nfcore/smrnaseq:1.1.0
- name: Install Nextflow
env:
@@ -53,8 +53,6 @@ jobs:
sudo mv nextflow /usr/local/bin/
- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
# TODO: Add more run variants with different pipeline flags?
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker
116 changes: 108 additions & 8 deletions CHANGELOG.md
@@ -1,16 +1,116 @@
# nf-core/smrnaseq: Changelog

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [v1.1.0](https://github.com/nf-core/smrnaseq/releases/tag/1.1.0) - 2021-05-11

## v1.1.0 - [date]
### Major changes

Initial release of nf-core/smrnaseq, created with the [nf-core](https://nf-co.re/) template.
**:warning: Breaking changes!**

### `Added`
This release contains several major (potentially breaking) changes:

### `Fixed`
* The main input parameter has been changed from `--reads` to `--input` to standardize the pipeline with other nf-core pipelines
* All parameter documentation has moved into a new `nextflow_schema.json` file
* A `lib` folder with groovy helper classes has been added to the pipeline. This includes validation of input parameters using the schema defined in the `nextflow_schema.json` file

### `Dependencies`
### General improvements

### `Deprecated`
* Remove spaces in genome headers and replace special nucleotides with N in the hairpin file so that miRDeep2 works. [[#69]](https://github.com/nf-core/smrnaseq/pull/79)
* Accept custom genome and remove non-canonical letters in the genome. Thanks to @sdjebali. Follow up from [[#63]](https://github.com/nf-core/smrnaseq/pull/63)
* Fix error when only one sample is in the input [[#31]](https://github.com/nf-core/smrnaseq/issues/31)
* Made `CamelCase` pipeline parameters `snake_case` and lower case
* `clip_R1` -> `clip_r1`
* `three_prime_clip_R1` -> `three_prime_clip_r1`
* `saveReference` -> `save_reference`
* `skipQC` -> `skip_qc`
* `skipFastqc` -> `skip_fastqc`
* `skipMultiqc` -> `skip_multiqc`
* Update with the latest `TEMPLATE` version for nf-core `1.12.1`
* Update conda environment with new packages and updates
* Added `--protocol custom` to allow custom adapter trimming modes [[#41](https://github.com/nf-core/smrnaseq/issues/41)]
* Fix error when UMI is on the reads header: [[#35](https://github.com/nf-core/smrnaseq/issues/35)]
* Updated `params.mirtrace_species` to be properly initialised when using `--genome`, for all iGenomes species
* Made `params.mature` and `params.hairpin` default to miRBase FTP URL so that the file is automatically downloaded if not provided
* Allow `.fa` or `.fa.gz` files for mature and hairpin FASTA files.
* Add full-size benchmark / test dataset to run on AWS for each release (see `test_full.config`)
* Add `-f` flag to `gunzip` commands to deal with soft-links
* Add `--mirtrace_protocol` parameter to allow for more flexible selection of this parameter
* Added `--trim_galore_max_length` option [[#77](https://github.com/nf-core/smrnaseq/issues/77)]
* Solved memory usage issue for mirtrace in the main process and in the `get_software_versions` process [[#68](https://github.com/nf-core/smrnaseq/issues/68)]
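
Because the renames above are breaking, a params file written for v1.0.0 needs its keys translated before it can be used with v1.1.0. The following is a minimal Python sketch. It assumes the params have already been loaded into a plain dict (for example from JSON). `migrate_params` and `RENAMED_PARAMS` are hypothetical helpers and are not part of the pipeline.

```python
import warnings

# Hypothetical helper: old -> new parameter names, taken from the v1.1.0 entries above.
RENAMED_PARAMS = {
    "reads": "input",
    "clip_R1": "clip_r1",
    "three_prime_clip_R1": "three_prime_clip_r1",
    "saveReference": "save_reference",
    "skipQC": "skip_qc",
    "skipFastqc": "skip_fastqc",
    "skipMultiqc": "skip_multiqc",
}


def migrate_params(params: dict) -> dict:
    """Return a copy of `params` with pre-1.1.0 names replaced by their new names."""
    migrated = {}
    for key, value in params.items():
        new_key = RENAMED_PARAMS.get(key, key)
        if new_key != key:
            warnings.warn(f"--{key} was renamed to --{new_key} in v1.1.0")
        # Refuse ambiguous input such as both skipQC and skip_qc being set.
        if new_key in migrated:
            raise ValueError(f"both --{key} and --{new_key} were given")
        migrated[new_key] = value
    return migrated


if __name__ == "__main__":
    old = {"reads": "data/*.fastq.gz", "skipQC": True, "genome": "GRCh37"}
    print(migrate_params(old))
```

Unknown keys such as `genome` pass through unchanged, so the helper only touches the names listed in this release.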

### Packaged software updates

* `fastqc=0.11.8` -> `0.11.9`
* `trim-galore=0.6.3` -> `0.6.5`
* `bowtie=1.2.2` -> `1.2.3`
* `multiqc=1.7` -> `1.9`
* `mirtop=0.4.22` -> `0.4.23`
* `seqcluster=1.2.5` -> `1.2.7`
* `htseq=0.11.2` -> `0.11.3`
* `fastx_toolkit=0.0.14` -> `0.0.14`
* `seqkit=0.10.1` -> `0.12.0`
* `mirtrace=1.0.0` -> `1.0.1`
* Added `conda-forge::python=3.7.3`
* Added `conda-forge::markdown=3.1.1`
* Added `conda-forge::pymdown-extensions=6.0`
* Added `conda-forge::pygments=2.5.2`
* Removed `conda-forge::r-markdown=1.0`

## [v1.0.0](https://github.com/nf-core/smrnaseq/releases/tag/1.0.0) - 2019-09-19

### Added

* Figures to output documentation
* Samtools stats for genome alignments
* Seqkit and remove razers
* Mirtop and razers tools
* Adapt code and docs to [nf-core](http://nf-co.re/) template
* Update tools and Dockerfile/Singularity to match current template

### Packaged software updates

* openjdk 8.0.144 -> 11.0.1
* fastqc 0.11.7 -> 0.11.8
* trim-galore 0.5.0 -> 0.6.2
* bioconductor-edger 3.20.7 -> 3.26.0
* bioconductor-limma 3.34.9 -> 3.40.0
* conda-forge::r-data.table 1.11.4 -> 1.12.2
* conda-forge::r-gplots 3.0.1 -> 3.0.1.1
* conda-forge::r-r.methodss3 1.7.1 -> 1.7.1
* htseq 0.9.1 -> 0.11.2
* r-markdown 0.9
* Added mirtop 0.4.18a
* Removed razers3 3.5.3
* Added seqkit 0.10.1-1
* Added seqcluster 1.2.5
* conda-forge::r-base=3.5.1 -> 3.6.1
* conda-forge::r-statmod=1.4.30 -> 1.4.32
* conda-forge::r-markdown=0.9 -> 1.0
* trim-galore=0.6.2 -> 0.6.3
* mirtop=0.4.18a -> 0.4.22
* bioconductor-edger=3.26.0 -> 3.26.5
* bioconductor-limma=3.40.0 -> 3.40.2

## 2019-01-10

### Added

* "protocol" with pre-defined settings
* miRTrace in the pipeline.

### Software updates

* multiqc 1.6 -> 1.7.

## 2018-08-06

### Added

* Port original pipeline [SciLifeLab/NGI-smRNAseq](https://github.com/SciLifeLab/NGI-smRNAseq) to [nf-core/smrnaseq](https://github.com/nf-core/smrnaseq).
* Use Bowtie 1 instead of Bowtie 2 for the alignment to host reference genome.
* Option for sequencing centre in BAM file.

### Software updates

* trim-galore 0.4.5 -> 0.5.0
* samtools 1.8 -> 1.9
* multiqc 1.5 -> 1.6
3 changes: 3 additions & 0 deletions Dockerfile
@@ -2,6 +2,9 @@ FROM nfcore/base:1.14
LABEL authors="Phil Ewels <phil.ewels@scilifelab.se>, Chuan Wang <chuan.wang@scilifelab.se>, Rickard Hammarén <rickard.hammaren@scilifelab.se>, Lorena Pantano <lorena.pantano@gmail.com>" \
description="Docker image containing all software requirements for the nf-core/smrnaseq pipeline"

# Install libtbb2 package for bowtie
RUN apt-get update && apt-get install libtbb2 -y

# Install the conda environment
COPY environment.yml /
RUN conda env create --quiet -f /environment.yml && conda clean -a
55 changes: 28 additions & 27 deletions README.md
@@ -1,7 +1,5 @@
# ![nf-core/smrnaseq](docs/images/nf-core-smrnaseq_logo.png)

**Small RNA-Seq Best Practice Analysis Pipeline.**.

[![GitHub Actions CI Status](https://github.com/nf-core/smrnaseq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/smrnaseq/actions)
[![GitHub Actions Linting Status](https://github.com/nf-core/smrnaseq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/smrnaseq/actions)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.04.0-brightgreen.svg)](https://www.nextflow.io/)
@@ -10,10 +8,11 @@
[![Docker](https://img.shields.io/docker/automated/nfcore/smrnaseq.svg)](https://hub.docker.com/r/nfcore/smrnaseq)
[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23smrnaseq-4A154B?logo=slack)](https://nfcore.slack.com/channels/smrnaseq)

[![DOI](https://zenodo.org/badge/140590861.svg)](https://zenodo.org/badge/latestdoi/140590861)

## Introduction

<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->
**nf-core/smrnaseq** is a bioinformatics best-practise analysis pipeline for
**nf-core/smrnaseq** is a bioinformatics best-practice analysis pipeline used for small RNA sequencing data.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.

@@ -33,37 +32,44 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

4. Start running your own analysis!

<!-- TODO nf-core: Update the example "typical command" below used to run the pipeline -->

```bash
nextflow run nf-core/smrnaseq -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input '*_R{1,2}.fastq.gz' --genome GRCh37
```

See [usage docs](https://nf-co.re/smrnaseq/usage) for all of the available options when running the pipeline.

## Pipeline Summary

By default, the pipeline currently performs the following:

<!-- TODO nf-core: Fill in short bullet-pointed list of default steps of pipeline -->

* Sequencing quality control (`FastQC`)
* Overall pipeline run summaries (`MultiQC`)
## Pipeline summary

1. Raw read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Adapter trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
1. Insert size calculation
2. Collapse reads ([`seqcluster`](https://seqcluster.readthedocs.io/mirna_annotation.html#processing-of-reads))
3. Alignment against miRBase mature miRNA ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
4. Alignment against miRBase hairpin
1. Unaligned reads from step 3 ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
2. Collapsed reads from step 2.2 ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
5. Post-alignment processing of miRBase hairpin
1. Basic statistics from step 3 and step 4.1 ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
2. Analysis of miRBase hairpin counts ([`edgeR`](https://bioconductor.org/packages/release/bioc/html/edgeR.html))
* TMM normalization and a table of top expressed hairpins
* MDS plot clustering samples
* Heatmap of sample similarities
3. miRNA and isomiR annotation from step 4.1 ([`mirtop`](https://github.com/miRTop/mirtop))
6. Alignment against host reference genome ([`Bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
1. Post-alignment processing of alignment against host reference genome ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
7. Novel and known miRNA discovery ([`miRDeep2`](https://www.mdc-berlin.de/content/mirdeep2-documentation))
1. Mapping against reference genome with the mapper module
2. Known and novel miRNA discovery with the mirdeep2 module
8. miRNA quality control ([`mirtrace`](https://github.com/friedlanderlab/mirtrace))
9. Present QC for raw read, alignment, and expression results ([`MultiQC`](http://multiqc.info/))
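
Steps 2.1 and 2.2 can be illustrated with a toy Python sketch. After adapter trimming, the length of each read is the small RNA insert length. Identical reads are then collapsed into unique sequences with counts, which avoids redundant work in the downstream alignment steps.

This is not how `seqcluster` is implemented. The helper names and the `>seq_N_xCOUNT` FASTA header scheme are assumptions made for illustration.

```python
from collections import Counter


def insert_size_distribution(reads):
    """Count trimmed reads by length (after trimming, read length = insert size)."""
    return Counter(len(seq) for seq in reads)


def collapse_reads(reads):
    """Collapse identical sequences into (sequence, count) pairs, most abundant first."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_fasta(collapsed, prefix="seq"):
    """Render collapsed reads as FASTA; the header format is a hypothetical choice."""
    lines = []
    for i, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{prefix}_{i}_x{count}")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    trimmed = [
        "TGAGGTAGTAGGTTGTATAGTT",  # let-7a-5p (DNA alphabet)
        "TGAGGTAGTAGGTTGTATAGTT",
        "TAGCTTATCAGACTGATGTTGA",  # miR-21-5p (DNA alphabet)
        "TGAGGTAGTAGGTTGTATAGT",   # 3'-truncated let-7a-5p variant
    ]
    print(insert_size_distribution(trimmed))
    print(to_fasta(collapse_reads(trimmed)))
```

Keeping the count in the header means each unique sequence only needs to be aligned once while its abundance is preserved.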

## Documentation

The nf-core/smrnaseq pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/smrnaseq/usage) and [output](https://nf-co.re/smrnaseq/output).

<!-- TODO nf-core: Add a brief overview of what the pipeline does and how it works -->

## Credits

nf-core/smrnaseq was originally written by Phil Ewels <phil.ewels@scilifelab.se>, Chuan Wang <chuan.wang@scilifelab.se>, Rickard Hammarén <rickard.hammaren@scilifelab.se>, Lorena Pantano <lorena.pantano@gmail.com>.

We thank the following people for their extensive assistance in the development
of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
nf-core/smrnaseq was originally written for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden, by Phil Ewels (@ewels), Chuan Wang (@chuan-wang) and Rickard Hammarén (@Hammarn). Updated by Lorena Pantano (@lpantano) from MIT.

## Contributions and Support

@@ -73,9 +79,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
## Citations
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi. -->
<!-- If you use nf-core/smrnaseq for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
You can cite the `nf-core` publication as follows:
> **The nf-core framework for community-curated bioinformatics pipelines.**
@@ -85,5 +88,3 @@ You can cite the `nf-core` publication as follows:
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
In addition, references of tools and data used in this pipeline are as follows:
<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
Binary file added assets/smrnaseq_logo.png
13 changes: 13 additions & 0 deletions bin/collapse_mirtop.r
@@ -0,0 +1,13 @@
#!/usr/bin/env Rscript

# Command line arguments
args = commandArgs(trailingOnly=TRUE)

input <- as.character(args[1:length(args)])

library(data.table)

df = read.delim(input[1], sep = "\t")
counts = as.data.table(df[!duplicated(df[["UID"]]),c(3, 13:ncol(df))])
mirna = counts[, lapply(.SD, sum), by = miRNA]
write.table(mirna, file.path(dirname(input[1]), "mirna.tsv"), quote=FALSE, sep="\t", row.names=FALSE)
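
For readers who do not use R, the collapsing step in `bin/collapse_mirtop.r` can be sketched in Python with only the standard library. This is an illustrative port and is not part of this commit. It assumes the same column layout that the R script hard-codes:

* a tab-separated file with a header row;
* a `UID` column;
* the miRNA name in column 3;
* per-sample counts from column 13 onwards.

It has not been validated against real mirtop output.

```python
import csv
import os
import sys


def collapse_mirtop(path):
    """Sum isomiR counts per miRNA, following the logic of bin/collapse_mirtop.r.

    Column positions mirror the R script (1-based in R, 0-based here):
    column 3 -> index 2 (miRNA), columns 13:ncol -> index 12 onwards (samples).
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        uid_col = header.index("UID")
        samples = header[12:]
        totals = {}          # miRNA -> per-sample sums, kept in first-seen order
        seen_uids = set()
        for row in reader:
            uid = row[uid_col]
            if uid in seen_uids:          # R: df[!duplicated(df[["UID"]]), ...]
                continue
            seen_uids.add(uid)
            mirna = row[2]
            counts = [float(x) for x in row[12:]]
            previous = totals.get(mirna, [0.0] * len(samples))
            totals[mirna] = [a + b for a, b in zip(previous, counts)]

    # R writes mirna.tsv next to the input file.
    out_path = os.path.join(os.path.dirname(path), "mirna.tsv")
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow([header[2]] + samples)
        for mirna, sums in totals.items():
            # Print whole-number sums without a trailing ".0".
            writer.writerow([mirna] + [int(v) if v.is_integer() else v for v in sums])
    return out_path


if __name__ == "__main__" and len(sys.argv) > 1:
    print(collapse_mirtop(sys.argv[1]))
```

One difference to keep in mind: R's `read.delim` defaults to `check.names = TRUE`, which rewrites sample names that are not valid R identifiers. This sketch keeps the header exactly as it appears in the file.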