diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index e77ade80..f5bce0a9 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -18,8 +18,9 @@ If you'd like to write some code for nf-core/mag, the standard workflow is as fo 1. Check that there isn't already an issue about your idea in the [nf-core/mag issues](https://github.com/nf-core/mag/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/mag repository](https://github.com/nf-core/mag) to your GitHub account -3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) +4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -30,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t There are typically two types of tests that run: -### Lint Tests +### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. -### Pipeline Tests +### Pipeline tests Each `nf-core` pipeline should be set up with a minimal set of test-data. `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully. @@ -55,3 +56,73 @@ These tests are run both with the latest available version of `Nextflow` and als ## Getting help For further information/help, please consult the [nf-core/mag documentation](https://nf-co.re/mag/usage) and don't hesitate to get in touch on the nf-core Slack [#mag](https://nfcore.slack.com/channels/mag) channel ([join our Slack here](https://nf-co.re/join/slack)). + +## Pipeline contribution conventions + +To make the nf-core/mag code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. + +### Adding a new step + +If you wish to contribute a new step, please use the following coding standards: + +1. Define the corresponding input channel into your new process from the expected previous process channel +2. Write the process block (see below). +3. Define the output channel if needed (see below). +4. Add any new flags/options to `nextflow.config` with a default (see below). +5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`) +6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter). +7. Add sanity checks for all relevant parameters. +8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`. +9. Do local tests that the new code works properly and as expected. +10. Add a new test command in `.github/workflow/ci.yaml`. +11. If applicable add a [MultiQC](https://https://multiqc.info/) module. +12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order. +13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`. + +### Default values + +Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. + +Once there, use `nf-core schema build .` to add to `nextflow_schema.json`. + +### Default processes resource requirements + +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. + +The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block. + +### Naming schemes + +Please use the following naming schemes, to make it easy to understand what is going where. + +* initial process channel: `ch_output_from_` +* intermediate and terminal channels: `ch__for_` + +### Nextflow version bumping + +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` + +### Software version reporting + +If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process. + +Add to the script block of the process, something like the following: + +```bash + --version &> v_.txt 2>&1 || true +``` + +or + +```bash + --help | head -n 1 &> v_.txt 2>&1 || true +``` + +You then need to edit the script `bin/scrape_software_versions.py` to: + +1. Add a Python regex for your tool's `--version` output (as in stored in the `v_.txt` file), to ensure the version is reported as a `v` and the version number e.g. `v2.1.1` +2. Add a HTML entry to the `OrderedDict` for formatting in MultiQC. + +### Images and figures + +For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 064b8190..00af54ef 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -13,6 +13,13 @@ Thanks for telling us about a problem with the pipeline. Please delete this text and anything that's not relevant from the template below: --> +## Check Documentation + +I have checked the following places for your error: + +- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) +- [ ] [nf-core/mag pipeline documentation](https://nf-co.re/nf-core/mag/usage) + ## Description of the bug @@ -28,6 +35,13 @@ Steps to reproduce the behaviour: +## Log files + +Have you provided the following extra information/files: + +- [ ] The command used to run the pipeline +- [ ] The `.nextflow.log` file + ## System - Hardware: diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 2b651d4a..16868d58 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -13,8 +13,14 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/mag/ ## PR checklist -- [ ] This comment contains a description of changes (with reason) -- [ ] `CHANGELOG.md` is updated +- [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! -- [ ] Documentation in `docs` is updated -- [ ] If necessary, also make a PR on the [nf-core/mag branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/mag) + - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py` + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/mag _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. +- [ ] Make sure your code lints (`nf-core lint .`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Usage Documentation in `docs/usage.md` is updated. +- [ ] Output Documentation in `docs/output.md` is updated. +- [ ] `CHANGELOG.md` is updated. +- [ ] `README.md` is updated (including new tool citations and authors/contributors). diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml index 76afe932..8d7eb53b 100644 --- a/.github/markdownlint.yml +++ b/.github/markdownlint.yml @@ -1,5 +1,5 @@ # Markdownlint configuration file -default: true, +default: true line-length: false no-duplicate-header: siblings_only: true @@ -7,3 +7,6 @@ no-inline-html: allowed_elements: - img - p + - kbd + - details + - summary diff --git a/README.md b/README.md index 5db996bc..875f6340 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ ## Introduction -This pipeline is for assembly, binning, and annotation of metagenomes. +**nf-core/mag** is a bioinformatics best-practise analysis pipeline for assembly, binning, and annotation of metagenomes.

nf-core/mag workflow overview @@ -43,19 +43,21 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool See [usage docs](https://nf-co.re/mag/usage) for all of the available options when running the pipeline. -## Documentation - -The nf-core/mag pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/mag/usage) and [output](https://nf-co.re/mag/output). +## Pipeline Summary -In short, it supports both short and long reads, quality trims the reads and adapters with [fastp](https://github.com/OpenGene/fastp) and [porechop](https://github.com/rrwick/Porechop), and performs basic QC with [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). +By default, the pipeline currently performs the following: it supports both short and long reads, quality trims the reads and adapters with [fastp](https://github.com/OpenGene/fastp) and [Porechop](https://github.com/rrwick/Porechop), and performs basic QC with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The pipeline then: -* assigns taxonomy to reads using [centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [kraken2](https://github.com/DerrickWood/kraken2/wiki) -* performs assembly using [megahit](https://github.com/voutcn/megahit) and [spades](http://cab.spbu.ru/software/spades/), and checks their quality using [quast](http://quast.sourceforge.net/quast) -* performs metagenome binning using [metabat2](https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using [busco](https://busco.ezlab.org/) +* assigns taxonomy to reads using [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [Kraken2](https://github.com/DerrickWood/kraken2/wiki) +* performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast) +* performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using [Busco](https://busco.ezlab.org/) * assigns taxonomy to bins using [CAT](https://github.com/dutilh/CAT) -Furthermore, the pipeline creates various reports in the results directory specified, including a [multiqc](https://multiqc.info/) report summarizing some of the findings and software versions. +Furthermore, the pipeline creates various reports in the results directory specified, including a [MultiQC](https://multiqc.info/) report summarizing some of the findings and software versions. + +## Documentation + +The nf-core/mag pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/mag/usage) and [output](https://nf-co.re/mag/output). ## Credits @@ -74,13 +76,15 @@ Many thanks to the additional contributors who have helped out and/or provided s * [Maxime Garcia](https://github.com/MaxUlysse) * [Michael L Heuer](https://github.com/heuermh) + + ## Contributions and Support If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). For further information or help, don't hesitate to get in touch on the [Slack `#mag` channel](https://nfcore.slack.com/channels/mag) (you can join with [this invite](https://nf-co.re/join/slack)). -## Citation +## Citations If you use nf-core/mag for your analysis, please cite it using the following doi: [10.5281/zenodo.3589527](https://doi.org/10.5281/zenodo.3589527) @@ -92,3 +96,5 @@ You can cite the `nf-core` publication as follows: > > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) + + diff --git a/assets/nf-core-mag_logo.png b/assets/nf-core-mag_logo.png index d663fd9b..c941b08d 100644 Binary files a/assets/nf-core-mag_logo.png and b/assets/nf-core-mag_logo.png differ diff --git a/conf/igenomes.config b/conf/igenomes.config index caeafceb..31b7ee61 100644 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -21,7 +21,7 @@ params { readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" mito_name = "MT" macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed" + blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed" } 'GRCh38' { fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" @@ -33,7 +33,7 @@ params { bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" mito_name = "chrM" macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" } 'GRCm38' { fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" @@ -46,7 +46,7 @@ params { readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" mito_name = "MT" macs_gsize = "1.87e9" - blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" + blacklist = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed" } 'TAIR10' { fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" @@ -270,7 +270,7 @@ params { bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" mito_name = "chrM" macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" } 'hg19' { fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" @@ -283,7 +283,7 @@ params { readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" mito_name = "chrM" macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" + blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed" } 'mm10' { fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" @@ -296,7 +296,7 @@ params { readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" mito_name = "chrM" macs_gsize = "1.87e9" - blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" + blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed" } 'bosTau8' { fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" diff --git a/docs/images/nf-core-mag_logo.png b/docs/images/nf-core-mag_logo.png index 6707498b..c941b08d 100644 Binary files a/docs/images/nf-core-mag_logo.png and b/docs/images/nf-core-mag_logo.png differ