
Commit

Merge pull request #119 from JoseEspinosa/my_dsl2
Add new version syntax based on yml files to the pipeline
lpantano authored Nov 12, 2021
2 parents 0444a9c + 0019e48 commit 3e25b1c
Showing 41 changed files with 614 additions and 434 deletions.
53 changes: 15 additions & 38 deletions .github/CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,23 +61,21 @@ For further information/help, please consult the [nf-core/smrnaseq documentation

To make the nf-core/smrnaseq code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.

### Adding a new step

If you wish to contribute a new step, please use the following coding standards:

1. Define the corresponding input channel into your new process from the expected previous process channel
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new flags/options to `nextflow.config` with a default (see below).
5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build`).
6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
7. Add sanity checks for all relevant parameters.
8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
9. Do local tests that the new code works properly and as expected.
10. Add a new test command in `.github/workflow/ci.yml`.
11. If applicable, add a [MultiQC](https://multiqc.info/) module.
12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
### Adding a new step or module

If you wish to contribute a new step or module, please see the [official guidelines](https://nf-co.re/developers/adding_modules#new-module-guidelines-and-pr-review-checklist) and use the following coding standards:

1. Add any new flags/options to `nextflow.config` with a default (see section below).
2. Add any new flags/options to `nextflow_schema.json` with help text via `nf-core schema build`.
3. Add sanity checks for all relevant parameters.
4. Perform local tests to validate that the new code works as expected.
5. Add a new test command in `.github/workflow/ci.yml`.
6. If applicable, add a [MultiQC](https://multiqc.info/) module.
7. Update the MultiQC config `assets/multiqc_config.yaml` so that relevant suffixes, name clean-up rules, General Statistics Table column order, and module figures are in the right order.
8. Optional: Add descriptions of MultiQC report sections and output files to `docs/output.md`.
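For steps 1 and 2 above, a minimal sketch of what declaring a new flag might look like (the parameter name `my_new_option` is hypothetical, purely for illustration):

```groovy
// nextflow.config -- declare the new flag with a sensible default
params {
    // Hypothetical option enabling a new processing step
    my_new_option = false
}
```

Running `nf-core schema build` afterwards detects the new parameter and prompts you interactively to add its help text to `nextflow_schema.json`.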

### Default values

Expand All @@ -102,27 +100,6 @@ Please use the following naming schemes, to make it easy to understand what is g

If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`

### Software version reporting

If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process.

Add to the script block of the process, something like the following:

```bash
<YOUR_TOOL> --version &> v_<YOUR_TOOL>.txt 2>&1 || true
```

or

```bash
<YOUR_TOOL> --help | head -n 1 &> v_<YOUR_TOOL>.txt 2>&1 || true
```

You then need to edit the script `bin/scrape_software_versions.py` to:

1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported with a leading `v` followed by the version number, e.g. `v2.1.1`
2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.
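The scraping step described above (removed by this commit in favour of the new `versions.yml` approach) followed a pattern like this sketch — a regex pulls the version number out of the captured text and normalises it to a `v`-prefixed form. The tool name and captured string here are invented for illustration, not taken from the real `bin/scrape_software_versions.py`:

```python
import re

# Hypothetical captured output of `mytool --version`
# (in the pipeline this would be read from v_mytool.txt)
captured = "mytool version 2.1.1 (build 2021-11-12)"

# Extract the dotted version number and prefix it with "v"
match = re.search(r"version\s+(\d+\.\d+\.\d+)", captured)
version = f"v{match.group(1)}" if match else "N/A"
print(version)  # -> v2.1.1
```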

### Images and figures

For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
2 changes: 2 additions & 0 deletions .nf-core.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
lint:
files_unchanged:
- .github/CONTRIBUTING.md
- .markdownlint.yml
- assets/email_template.html
- assets/email_template.txt
Expand All @@ -8,3 +9,4 @@ lint:
files_exist:
- bin/scrape_software_versions.py
- modules/local/get_software_versions.nf

1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

## v1.3.0dev - [2021-09-15]

* Software version(s) will now be reported for every module imported during a given pipeline execution
* Adapted DSL 2.0
* Updated `nextflow_schema.json` should now display correctly on Nextflow Tower
* Added mirtop logs to multiqc
Expand Down
50 changes: 11 additions & 39 deletions bin/edgeR_miRBase.r
Original file line number Diff line number Diff line change
Expand Up @@ -4,44 +4,17 @@
args = commandArgs(trailingOnly=TRUE)

input <- as.character(args[1:length(args)])
# .libPaths( c( ".", .libPaths()) )
# install.packages("BiocManager", dependencies=TRUE, repos='http://cloud.r-project.org/')

# # Load / install required packages
# if (!require("limma")){
# BiocManager::install("limma", suppressUpdates=TRUE)
# library("limma")
# }

# if (!require("edgeR")){
# BiocManager::install("edgeR", suppressUpdates=TRUE)
# library("edgeR")
# }

# if (!require("statmod")){
# install.packages("statmod", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("statmod")
# }

# if (!require("data.table")){
# install.packages("data.table", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("data.table")
# }

# if (!require("gplots")) {
# install.packages("gplots", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("gplots")
# }

# if (!require("methods")) {
# install.packages("methods", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("methods")
# }
library("limma")
library("edgeR")
library("statmod")
library("data.table")
library("gplots")
library("methods")

# Put mature and hairpin count files in separated file lists
filelist<-list()
filelist[[1]]<-input[grep(".mature.*stats",input)]
filelist[[2]]<-input[grep(".hairpin.*stats",input)]
filelist[[1]]<-input[grep(".mature.sorted",input)]
filelist[[2]]<-input[grep(".hairpin.sorted",input)]
names(filelist)<-c("mature","hairpin")
print(filelist)

Expand All @@ -53,12 +26,11 @@ for (i in 1:2) {
unmapped<-do.call("cbind", lapply(filelist[[i]], fread, header=FALSE, select=c(4)))
data<-as.data.frame(data)
unmapped<-as.data.frame(unmapped)

temp <- fread(filelist[[i]][1],header=FALSE, select=c(1))
rownames(data)<-temp$V1
rownames(unmapped)<-temp$V1
colnames(data)<-gsub(".stats","",basename(filelist[[i]]))
colnames(unmapped)<-gsub(".stats","",basename(filelist[[i]]))
colnames(data)<-gsub("_mature.*","",basename(filelist[[i]]))
colnames(unmapped)<-gsub("_mature.*","",basename(filelist[[i]]))

data<-data[rownames(data)!="*",,drop=FALSE]
unmapped<-unmapped[rownames(unmapped)=="*",,drop=FALSE]
Expand Down Expand Up @@ -114,7 +86,7 @@ for (i in 1:2) {
write.table(MDSdata$distance.matrix, paste(header,"_edgeR_MDS_distance_matrix.txt",sep=""), quote=FALSE, sep="\t")

# Print plot x,y co-ordinates to file
MDSxy = MDSdata$cmdscale.out
MDSxy = data.frame(x=MDSdata$x, y=MDSdata$y)
colnames(MDSxy) = c(paste(MDSdata$axislabel, '1'), paste(MDSdata$axislabel, '2'))

write.table(MDSxy, paste(header,"_edgeR_MDS_plot_coordinates.txt",sep=""), quote=FALSE, sep="\t")
Expand Down
36 changes: 0 additions & 36 deletions bin/scrape_software_versions.py

This file was deleted.

7 changes: 4 additions & 3 deletions conf/test_full.config
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,15 @@
*/

params {
max_memory = 12.GB
max_cpus = 8
config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq-better-input/testdata/samplesheet.csv'


genome = 'GRCh37'
genome = 'GRCh37'
mirtrace_species = "hsa"
}


6 changes: 3 additions & 3 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,9 @@ It should point to the 3-letter species name used by `miRBase`.

### miRNA related files

* `mirna_gtf`: If not supplied by the user, then `mirna_gtf` will point to the latest GFF3 file in miRbase: `ftp://mirbase.org/pub/mirbase/CURRENT/genomes/${params.mirtrace_species}.gff3`
* `mature`: points to the FASTA file of mature miRNA sequences. `ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz`
* `hairpin`: points to the FASTA file of precursor miRNA sequences. `ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz`
* `mirna_gtf`: If not supplied by the user, then `mirna_gtf` will point to the latest GFF3 file in miRBase: `https://mirbase.org/ftp/CURRENT/genomes/${params.mirtrace_species}.gff3`
* `mature`: points to the FASTA file of mature miRNA sequences. `https://mirbase.org/ftp/CURRENT/mature.fa.gz`
* `hairpin`: points to the FASTA file of precursor miRNA sequences. `https://mirbase.org/ftp/CURRENT/hairpin.fa.gz`
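Taken together, the defaults above can be overridden in a custom config along these lines (a sketch — the local file paths are placeholders, not real pipeline paths):

```groovy
// custom.config -- override the miRBase-derived defaults with local copies
params {
    mirtrace_species = 'hsa'
    mirna_gtf        = '/path/to/hsa.gff3'       // placeholder path
    mature           = '/path/to/mature.fa.gz'   // placeholder path
    hairpin          = '/path/to/hairpin.fa.gz'  // placeholder path
}
```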

### Genome

Expand Down
30 changes: 9 additions & 21 deletions lib/NfcoreTemplate.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -19,27 +19,16 @@ class NfcoreTemplate {
}

//
// Check params.hostnames
// Warn if a -profile or Nextflow config has not been provided to run the pipeline
//
public static void hostName(workflow, params, log) {
Map colors = logColours(params.monochrome_logs)
if (params.hostnames) {
try {
def hostname = "hostname".execute().text.trim()
params.hostnames.each { prof, hnames ->
hnames.each { hname ->
if (hostname.contains(hname) && !workflow.profile.contains(prof)) {
log.info "=${colors.yellow}====================================================${colors.reset}=\n" +
"${colors.yellow}WARN: You are running with `-profile $workflow.profile`\n" +
" but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\n" +
" ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\n" +
"=${colors.yellow}====================================================${colors.reset}="
}
}
}
} catch (Exception e) {
log.warn "[$workflow.manifest.name] Could not determine 'hostname' - skipping check. Reason: ${e.message}."
}
public static void checkConfigProvided(workflow, log) {
if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) {
log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" +
                "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" +
" (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" +
" (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" +
" (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" +
"Please refer to the quick start section and usage docs for the pipeline.\n "
}
}

Expand Down Expand Up @@ -168,7 +157,6 @@ class NfcoreTemplate {
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-"
}
} else {
hostName(workflow, params, log)
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-"
}
}
Expand Down
6 changes: 3 additions & 3 deletions lib/WorkflowMain.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,9 @@ class WorkflowMain {
// Print parameter summary log to screen
log.info paramsSummaryLog(workflow, params, log)

// Check that a -profile or Nextflow config has been provided to run the pipeline
NfcoreTemplate.checkConfigProvided(workflow, log)

// Check that conda channels are set-up correctly
if (params.enable_conda) {
Utils.checkCondaChannels(log)
Expand All @@ -68,9 +71,6 @@ class WorkflowMain {
// Check AWS batch settings
NfcoreTemplate.awsBatch(workflow, params)

// Check the hostnames against configured profiles
NfcoreTemplate.hostName(workflow, params, log)

// Check input has been provided
if (!params.input) {
log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.csv'"
Expand Down
3 changes: 3 additions & 0 deletions modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,9 @@
"cat/fastq": {
"git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae"
},
"custom/dumpsoftwareversions": {
"git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae"
},
"fastqc": {
"git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae"
},
Expand Down
18 changes: 12 additions & 6 deletions modules/local/bowtie_genome.nf
Original file line number Diff line number Diff line change
@@ -1,11 +1,15 @@
// Import generic module functions
include { saveFiles; initOptions; getSoftwareName } from './functions'
include { saveFiles; initOptions; getSoftwareName; getProcessName } from './functions'

params.options = [:]
options = initOptions(params.options)

process INDEX_GENOME {
tag "$fasta"
label 'process_medium'
publishDir "${params.outdir}",
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process)+"/${options.suffix}", meta:meta, publish_by_meta:['id']) }

conda (params.enable_conda ? 'bioconda::bowtie=1.3.0-2' : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
Expand All @@ -18,21 +22,23 @@ process INDEX_GENOME {
path fasta

output:
path 'genome*ebwt' , emit: bt_indices
path 'genome.edited.fa' , emit: fasta
path 'genome*ebwt' , emit: bt_indices
path 'genome.edited.fa', emit: fasta
path "versions.yml" , emit: versions

script:
def software = getSoftwareName(task.process)

"""
# Remove any special base characters from reference genome FASTA file
sed '/^[^>]/s/[^ATGCatgc]/N/g' $fasta > genome.edited.fa
sed -i 's/ .*//' genome.edited.fa
# Build bowtie index
bowtie-build genome.edited.fa genome --threads ${task.cpus}
cat <<-END_VERSIONS > versions.yml
${getProcessName(task.process)}:
bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
END_VERSIONS
"""

}
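The `versions.yml` emission in the script block above can be sketched in isolation as follows — here the output of `bowtie --version` is mocked with a literal string (an assumption about bowtie's output format) so the `sed` extraction can be seen on its own, and the process name `INDEX_GENOME` matches the module above:

```shell
# Mocked output of `bowtie --version` (assumption about the real format)
mock_output="/usr/bin/bowtie-align-s version 1.3.0
64-bit"

# Same sed pipeline as in the module: keep only the version number
version=$(echo $mock_output | sed 's/^.*bowtie-align-s version //; s/ .*$//')

# Emit the new-style versions.yml collected by the versions channel
printf 'INDEX_GENOME:\n    bowtie: %s\n' "$version" > versions.yml
cat versions.yml
```

The unquoted `echo` collapses the multi-line mock onto one line, after which the two `sed` expressions strip everything before and after the version number, leaving `1.3.0`.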