nf-core · skrakau · Dec 14, 2020 · Dec 3, 2020 · Dec 12, 2020
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -18,8 +18,9 @@ If you'd like to write some code for nf-core/mag, the standard workflow is as fo
 1. Check that there isn't already an issue about your idea in the [nf-core/mag issues](https://github.com/nf-core/mag/issues) to avoid duplicating work
     * If there isn't one already, please create one so that others know you're working on this
 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/mag repository](https://github.com/nf-core/mag) to your GitHub account
-3. Make the necessary changes / additions within your forked repository
-4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
+3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
+4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
+5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
 
 If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
 
@@ -30,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t
 
 There are typically two types of tests that run:
 
-### Lint Tests
+### Lint tests
 
 `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
 To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
 
 If any failures or warnings are encountered, please follow the listed URL for more documentation.
 
-### Pipeline Tests
+### Pipeline tests
 
 Each `nf-core` pipeline should be set up with a minimal set of test-data.
 `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
@@ -55,3 +56,73 @@ These tests are run both with the latest available version of `Nextflow` and als
 ## Getting help
 
 For further information/help, please consult the [nf-core/mag documentation](https://nf-co.re/mag/usage) and don't hesitate to get in touch on the nf-core Slack [#mag](https://nfcore.slack.com/channels/mag) channel ([join our Slack here](https://nf-co.re/join/slack)).
+
+## Pipeline contribution conventions
+
+To make the nf-core/mag code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
+
+### Adding a new step
+
+If you wish to contribute a new step, please use the following coding standards:
+
+1. Define the corresponding input channel into your new process from the expected previous process channel
+2. Write the process block (see below).
+3. Define the output channel if needed (see below).
+4. Add any new flags/options to `nextflow.config` with a default (see below).
+5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`)
+6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
+7. Add sanity checks for all relevant parameters.
+8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `get_software_versions` process in `main.nf`.
+9. Do local tests that the new code works properly and as expected.
+10. Add a new test command in `.github/workflows/ci.yml`.
+11. If applicable, add a [MultiQC](https://multiqc.info/) module.
+12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
+13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
+
+### Default values
+
+Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
+
+Once there, use `nf-core schema build .` to add them to `nextflow_schema.json`.
+
+### Default processes resource requirements
+
+Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
+
+The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
+
+### Naming schemes
+
+Please use the following naming schemes, to make it easy to understand what is going where.
+
+* initial process channel: `ch_output_from_<process>`
+* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`
+
+### Nextflow version bumping
+
+If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`
+
+### Software version reporting
+
+If you add a new tool to the pipeline, please ensure you add the tool's version information to the `get_software_versions` process.
+
+Add something like the following to the script block of the process:
+
+```bash
+<YOUR_TOOL> --version &> v_<YOUR_TOOL>.txt || true
+```
+
+or
+
+```bash
+<YOUR_TOOL> --help 2>&1 | head -n 1 > v_<YOUR_TOOL>.txt || true
+```
+
+You then need to edit the script `bin/scrape_software_versions.py` to:
+
+1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as `v` followed by the version number, e.g. `v2.1.1`.
+2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.
+
+### Images and figures
+
+For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
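
Reviewer note on the "Software version reporting" section of `CONTRIBUTING.md`: the regex step could look roughly like the sketch below. This is a minimal illustration and not the actual contents of `bin/scrape_software_versions.py`. The tool name `mytool`, its output format, the `N/A` fallback, and the helper `extract_versions` are assumptions made for the example; it reads strings instead of `v_<YOUR_TOOL>.txt` files so that it runs on its own.

```python
import re
from collections import OrderedDict

# Hypothetical tool names and patterns, for illustration only.
# Each pattern captures the bare version number in group 1.
REGEXES = OrderedDict([
    ("FastQC", r"FastQC v(\S+)"),
    ("mytool", r"mytool version (\S+)"),
])


def extract_versions(raw_outputs):
    """Map each tool to 'v<version>', or 'N/A' if its pattern does not match.

    raw_outputs: dict of tool name -> captured `--version` text
    (standing in for the contents of the v_<YOUR_TOOL>.txt files).
    """
    results = OrderedDict()
    for tool, pattern in REGEXES.items():
        match = re.search(pattern, raw_outputs.get(tool, ""))
        # Always prefix with "v" so every tool is reported the same way.
        results[tool] = "v{}".format(match.group(1)) if match else "N/A"
    return results


if __name__ == "__main__":
    outputs = {
        "FastQC": "FastQC v0.11.9\n",
        "mytool": "mytool version 2.1.1\n",
    }
    for tool, version in extract_versions(outputs).items():
        print("{}: {}".format(tool, version))
```

Capturing only the number and adding the `v` prefix in one place keeps the reported format uniform even when tools print their versions differently.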
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -13,6 +13,13 @@ Thanks for telling us about a problem with the pipeline.
 Please delete this text and anything that's not relevant from the template below:
 -->
 
+## Check Documentation
+
+I have checked the following places for my error:
+
+- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
+- [ ] [nf-core/mag pipeline documentation](https://nf-co.re/mag/usage)
+
 ## Description of the bug
 
 <!-- A clear and concise description of what the bug is. -->
@@ -28,6 +35,13 @@ Steps to reproduce the behaviour:
 
 <!-- A clear and concise description of what you expected to happen. -->
 
+## Log files
+
+Have you provided the following extra information/files:
+
+- [ ] The command used to run the pipeline
+- [ ] The `.nextflow.log` file <!-- this is a hidden file in the directory where you launched the pipeline -->
+
 ## System
 
 - Hardware: <!-- [e.g. HPC, Desktop, Cloud...] -->

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -13,8 +13,14 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/mag/
 
 ## PR checklist
 
-- [ ] This comment contains a description of changes (with reason)
-- [ ] `CHANGELOG.md` is updated
+- [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
-- [ ] Documentation in `docs` is updated
-- [ ] If necessary, also make a PR on the [nf-core/mag branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/mag)
+ - [ ] If you've added a new tool - add it to the software_versions process and add a regex to `scrape_software_versions.py`
+ - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md)?
+ - [ ] If necessary, also make a PR on the nf-core/mag _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
+- [ ] Make sure your code lints (`nf-core lint .`).
+- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
+- [ ] Usage Documentation in `docs/usage.md` is updated.
+- [ ] Output Documentation in `docs/output.md` is updated.
+- [ ] `CHANGELOG.md` is updated.
+- [ ] `README.md` is updated (including new tool citations and authors/contributors).
diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml
@@ -1,9 +1,12 @@
 # Markdownlint configuration file
-default: true,
+default: true
 line-length: false
 no-duplicate-header:
     siblings_only: true
 no-inline-html:
     allowed_elements:
         - img
         - p
+        - kbd
+        - details
+        - summary
diff --git a/README.md b/README.md
@@ -13,7 +13,7 @@
 
 ## Introduction
 
-This pipeline is for assembly, binning, and annotation of metagenomes.
+**nf-core/mag** is a bioinformatics best-practice analysis pipeline for assembly, binning, and annotation of metagenomes.
 
 <p align="center">
     <img src="docs/images/mag_workflow.png" alt="nf-core/mag workflow overview" width="60%">
@@ -43,19 +43,21 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 
 See [usage docs](https://nf-co.re/mag/usage) for all of the available options when running the pipeline.
 
-## Documentation
-
-The nf-core/mag pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/mag/usage) and [output](https://nf-co.re/mag/output).
+## Pipeline Summary
 
-In short, it supports both short and long reads, quality trims the reads and adapters with [fastp](https://github.com/OpenGene/fastp) and [porechop](https://github.com/rrwick/Porechop), and performs basic QC with [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
+By default, the pipeline supports both short and long reads, quality trims the reads and adapters with [fastp](https://github.com/OpenGene/fastp) and [Porechop](https://github.com/rrwick/Porechop), and performs basic QC with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
 The pipeline then:
 
-* assigns taxonomy to reads using [centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [kraken2](https://github.com/DerrickWood/kraken2/wiki)
-* performs assembly using [megahit](https://github.com/voutcn/megahit) and [spades](http://cab.spbu.ru/software/spades/), and checks their quality using [quast](http://quast.sourceforge.net/quast)
-* performs metagenome binning using [metabat2](https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using [busco](https://busco.ezlab.org/)
+* assigns taxonomy to reads using [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [Kraken2](https://github.com/DerrickWood/kraken2/wiki)
+* performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [QUAST](http://quast.sourceforge.net/quast)
+* performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using [BUSCO](https://busco.ezlab.org/)
 * assigns taxonomy to bins using [CAT](https://github.com/dutilh/CAT)
 
-Furthermore, the pipeline creates various reports in the results directory specified, including a [multiqc](https://multiqc.info/) report summarizing some of the findings and software versions.
+Furthermore, the pipeline creates various reports in the results directory specified, including a [MultiQC](https://multiqc.info/) report summarizing some of the findings and software versions.
+
+## Documentation
+
+The nf-core/mag pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/mag/usage) and [output](https://nf-co.re/mag/output).
 
 ## Credits
 
@@ -74,13 +76,15 @@ Many thanks to the additional contributors who have helped out and/or provided s
 * [Maxime Garcia](https://github.com/MaxUlysse)
 * [Michael L Heuer](https://github.com/heuermh)
 
+<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+
 ## Contributions and Support
 
 If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
 
 For further information or help, don't hesitate to get in touch on the [Slack `#mag` channel](https://nfcore.slack.com/channels/mag) (you can join with [this invite](https://nf-co.re/join/slack)).
 
-## Citation
+## Citations
 
 If you use nf-core/mag for your analysis, please cite it using the following doi: [10.5281/zenodo.3589527](https://doi.org/10.5281/zenodo.3589527)
 
@@ -92,3 +96,5 @@ You can cite the `nf-core` publication as follows:
 >
 > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
 > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ)
+
+<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
diff --git a/assets/nf-core-mag_logo.png b/assets/nf-core-mag_logo.png
diff --git a/conf/igenomes.config b/conf/igenomes.config
@@ -21,7 +21,7 @@ params {
       readme      = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
       mito_name   = "MT"
       macs_gsize  = "2.7e9"
-      blacklist   = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed"
+      blacklist   = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed"
     }
     'GRCh38' {
       fasta       = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
@@ -33,7 +33,7 @@ params {
       bed12       = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
       mito_name   = "chrM"
       macs_gsize  = "2.7e9"
-      blacklist   = "${baseDir}/assets/blacklists/hg38-blacklist.bed"
+      blacklist   = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
     }
     'GRCm38' {
       fasta       = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
@@ -46,7 +46,7 @@ params {
       readme      = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt"
       mito_name   = "MT"
       macs_gsize  = "1.87e9"
-      blacklist   = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed"
+      blacklist   = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed"
     }
     'TAIR10' {
       fasta       = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa"
@@ -270,7 +270,7 @@ params {
       bed12       = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed"
       mito_name   = "chrM"
       macs_gsize  = "2.7e9"
-      blacklist   = "${baseDir}/assets/blacklists/hg38-blacklist.bed"
+      blacklist   = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
     }
     'hg19' {
       fasta       = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"
@@ -283,7 +283,7 @@ params {
       readme      = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt"
       mito_name   = "chrM"
       macs_gsize  = "2.7e9"
-      blacklist   = "${baseDir}/assets/blacklists/hg19-blacklist.bed"
+      blacklist   = "${projectDir}/assets/blacklists/hg19-blacklist.bed"
     }
     'mm10' {
       fasta       = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa"
@@ -296,7 +296,7 @@ params {
       readme      = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt"
       mito_name   = "chrM"
       macs_gsize  = "1.87e9"
-      blacklist   = "${baseDir}/assets/blacklists/mm10-blacklist.bed"
+      blacklist   = "${projectDir}/assets/blacklists/mm10-blacklist.bed"
     }
     'bosTau8' {
       fasta       = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa"

diff --git a/docs/images/nf-core-mag_logo.png b/docs/images/nf-core-mag_logo.png