Added groovy functions in lib to reduce boilerplate code #857

KevinMenden · 2021-02-17T14:58:01Z

This PR adds some of the functions written by @drpatelh to the template, specifically the classes:

Headers.groovy
Completion.groovy
additional function to NfcoreSchema.groovy

This should close #577

Wherever possible I deleted pipeline-specific code.

codecov · 2021-02-17T15:00:31Z

Codecov Report

Merging #857 (521950d) into dev (6d36282) will increase coverage by 2.01%.
The diff coverage is 19.14%.

@@            Coverage Diff             @@
##              dev     #857      +/-   ##
==========================================
+ Coverage   71.73%   73.75%   +2.01%     
==========================================
  Files          30       29       -1     
  Lines        3386     3231     -155     
==========================================
- Hits         2429     2383      -46     
+ Misses        957      848     -109

Impacted Files	Coverage Δ
nf_core/modules.py	`68.38% <7.69%> (+23.33%)`	⬆️
nf_core/__main__.py	`64.72% <75.00%> (+0.54%)`	⬆️
nf_core/lint/merge_markers.py
nf_core/bump_version.py	`86.15% <0.00%> (+0.21%)`	⬆️
nf_core/lint/__init__.py	`80.18% <0.00%> (+0.35%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d36282...98f7b66. Read the comment docs.

KevinMenden · 2021-02-17T16:01:40Z

Made the NfcoreSchema.groovy class now independent from Headers.groovy. Everything seems to be working, but if we only want to include the header then it will just be copy-pasting back some code (or reverting the commit), so no harm done.

apeltzer

This looks great :-)

apeltzer · 2021-02-17T17:31:21Z

I think this is good and as generic as we could have it. In cases where e.g. RNAseq needs to add some extra lib functions, this could be done in a separate file in lib/ and thus imported accordingly. That would make it possible to keep things consistent across pipelines for the majority of this functionality, while making it possible to have only tiny bits of code in the main file :-)

Great job :-)

KevinMenden · 2021-02-18T07:38:43Z

Great :) Let's see then, I would say we keep the stuff in NfcoreSchema.groovy and also Headers.groovy as I don't see anything going wrong with that header. We revert the inclusion of the Completion.groovy though if you think that's too much for now @ewels .

nf_core/pipeline-template/{{cookiecutter.name_noslash}}/lib/NfcoreSchema.groovy

ewels · 2021-02-18T07:59:54Z

nf_core/pipeline-template/{{cookiecutter.name_noslash}}/main.nf

 if (params.help) {
-    helpMessage()
+    def command = "nextflow run {{ cookiecutter.name }} --input '*_R{1,2}.fastq.gz' -profile docker"
+    log.info NfcoreSchema.params_help(workflow, params, json_schema, command)


I was thinking that it might be nice to have a new lint test that just checks for certain lines in certain files. So for example to check that this function is actually being called in main.nf (to force people to remove the old style help message)

…coreSchema.groovy Co-authored-by: Phil Ewels <phil.ewels@scilifelab.se>

into lib-functions-dev

KevinMenden · 2021-02-18T12:26:43Z

Alright I removed the Completion.groovy class, now it only has the parameter validation, the header and printing the parameters. There is some redundancy now to collect the parameter summary map for MultiQC but it's basically just one extra line.

Otherwise everything is as before. If you're happy we can just merge this (I guess we add the new linter function in the next release?)

apeltzer · 2021-02-18T12:44:38Z

Did I miss the removal of Completion to being necessary? I think its good to have it, but did we discuss/decide otherwise?

ewels · 2021-02-18T12:57:25Z

Discussion was on Slack sorry, mostly came from me. Two things:

It includes code specific to MultiQC, which is not universal in all pipelines (and it would be nice to keep these library files unedited)
It's mostly email code which we want to rewrite anyway since forever (Refactor pipeline emails #112) so I was kind of thinking that we'd have to rewrite it all again soon anyway..

Happy to discuss further, but keen to get this release out so was trying to keep things simple..

KevinMenden · 2021-02-18T13:02:01Z

Small addition: I've enclosed the --max_memory and --max_time by quotes because otherwise it would fail parameter validation.
When given via command line it's fine without quotes. But otherwise internally they get converted to nextflow memory objects, then back to string, and they lose the dot in the process (at least the max_memory) --> and then fail the schema pattern.

ewels · 2021-02-18T13:04:23Z

Sigh. This will cause more problems than just those values, every single config file in nf-core/configs uses the same syntax.

Can we just relax the parameter validation to allow spaces? I nearly did that anyway.

KevinMenden · 2021-02-18T13:10:28Z

Adjusted the pattern, removed the quotes 👌
This way it works too. I also removed the necessary dot for the duration stuff (it generated 10d internally)

apeltzer · 2021-02-18T13:21:40Z

Sorry if I missed it then - probably jumped in that thread from time to time but wasn't 100% keeping up with the development. Fine by me and it makes sense to probably update the e-mail code at once & add the functionality then to the Completion.groovy file

jfy133 · 2021-02-25T12:48:38Z

I was just playing around with @apeltzer 's port of this code into eager (already), and noticed a couple of things:

I only got parameter validation WARN messages when cancelling the pipeline, or the pipeline actually fails.

e.g. running this through to completion I get no warning that --fake_param isn't a valid parameter
```
nextflow run nf-core/eager --profile test_tsv,singularity -r add-fancy-help-json-for-james --fake_param
```
but ctrl + c'ing half-way through the same run results in the expected WARN
```
^CWARN: Unexpected parameter: fake_param
WARN: Unexpected parameter: save-aligned-intermediates
WARN: Unexpected parameter: save-trimmed
```
It would be preferred these WARNs are shown immediately when the pipeline starts running

It would be nice that running --help also includes the default values at the end of the line of each parameter.

For example in eager we do:

      --clip_forward_adaptor [str]  Specify adapter sequence to be clipped off (forward strand). Default: '${params.clip_forward_adaptor}'

Is there any limitations to the summary message at the beginning of the run? The one that is in Alex's PR looks like it gets truncated (with the Metagenomic Screening section). I wonder because eager's summary would in principle be very large... Also it looks like the sections are out of order of the schema itself, is that to be expected?

Truncation:

    Genotyping
     gatk_ug_keep_realign_bam  : false
 
 Metagenomic Screening
     metagenomic_tool          : 
 
 ------------------------------------------------------
 
 ------------------------------------------------------
 Schaffa, Schaffa, Genome Baua!
 [b1/18ca7f] Submitted process > makeBWAIndex (Mammoth_MT_Krause.fasta)

KevinMenden · 2021-02-25T12:57:57Z

Hi James, thanks for testing this!

Yes that is currently the programmed behaviour. The reasoning behind this was not to annoy users that use several 'unexpected' parameters, but printing them when failing as a help for debugging.
very good point, I will look into that!
As far as I can tell there shouldn't be any limitation or truncation .... also the parameters should be in the order of the Schema. It might be that some groovy functions change that though, so I'll have a closer look at this as well - thanks!

KevinMenden · 2021-02-25T14:09:48Z

Okay so regarding the truncation, I had a closer look at the code and if I read it right the parameter summary only prints out params that are either specified via command line or in some config file. Regarding the ordering, it should be the same as in the schema 🤔 (but maybe @drpatelh can correct me If I'm wrong here).

KevinMenden · 2021-02-25T14:31:51Z

I've added the default value to the help message, output looks like this:
So the default value is just appended in [] brackets, if there is one in the schema.

Input/output options
    --input                      [string]  Input FastQ files.
    --single_end                 [boolean] Specifies that the input is single-end reads.
    --outdir                     [string]  The output directory where the results will be saved.  [default: ./results]
    --email                      [string]  Email address for completion summary.

Reference genome options
    --genome                     [string]  Name of iGenomes reference.
    --fasta                      [string]  Path to FASTA genome file.
    --igenomes_base              [string]  Directory / URL base for iGenomes references.  [default: s3://ngi-igenomes/igenomes/]
    --igenomes_ignore            [boolean] Do not load the iGenomes reference config.

Generic options
    --help                       [boolean] Display help text.
    --publish_dir_mode           [string]  Method used to save pipeline results to output directory.  [default: copy]
    --validate_params            [boolean] Boolean whether to validate parameters against the schema at runtime  [default: true]
    --email_on_fail              [string]  Email address for completion summary, only when pipeline fails.
    --plaintext_email            [boolean] Send plain-text email instead of HTML.
    --max_multiqc_email_size     [string]  File size limit when attaching MultiQC reports to summary emails.  [default: 25.MB]
    --monochrome_logs            [boolean] Do not use coloured log outputs.
    --multiqc_config             [string]  Custom config file to supply to MultiQC.
    --tracedir                   [string]  Directory to keep pipeline Nextflow logs and reports.  [default: ${params.outdir}/pipeline_info]

Max job request options
    --max_cpus                   [integer] Maximum number of CPUs that can be requested    for any single job.  [default: 16]
    --max_memory                 [string]  Maximum amount of memory that can be requested for any single job.  [default: 128.GB]
    --max_time                   [string]  Maximum amount of time that can be requested for any single job.  [default: 240.h]

Institutional config options
    --custom_config_version      [string]  Git commit id for Institutional configs.  [default: master]
    --custom_config_base         [string]  Base directory for Institutional configs.  [default: https://raw.githubusercontent.com/nf-core/configs/master]
    --hostnames                  [string]  Institutional configs hostname.
    --config_profile_description [string]  Institutional config description.
    --config_profile_contact     [string]  Institutional config contact information.
    --config_profile_url         [string]  Institutional config URL link.

jfy133 · 2021-02-25T19:07:00Z

I'm not sure I following with the debugging reasoning though. In my opinion and experience, the most useful thing for parameter validation is that you don't accidently make a typo in your command which leads to a 'silent fail' of that particular module. Nextflow has the (pet peeve) annoying behaviour to just ignore anything supplied outside what is used inside a pipeline.

This happens relatively regularly (at least in my experience with eager) where a user reports something didn't run or a parameter they set weren't used - but it actually comes to down to a typo in the parameter (e.g. flipping _ and -). This means with the current behaviour you still wouldn't know it actually 'failed' because Nextflow runs anyway.

Therefore I would argue for any unexpected parameters are automatically warned at the beginning of the run. If you have your schema and nextflow.config properly configured - you shouldn't get lots of unexpected parameters, right (see point 3 😅 ).

OK, tested the default display and looks great! Thanks! I think this would be very useful.
Aha, ok. Then this does make sense - it looks like the schema doesn't match 100% with the nextflow.config for those two parameters (e.g. for metagneomic_screening_tool in schema it's "undefined" for some reason, whereas it's just '' in the nextflow.config, and the realign_bam thing was string in schema, not boolean. Looks good now I've fixed it 💪. I do wonder if there should be a little note in the summary header saying something like 'Parameters differing from pipeline defaults' or similar, to make it clear that there are parameters set underneath that the user might not be 'immediately' aware of?

jfy133 · 2021-02-25T20:58:20Z

Ah and one additional point that just came to mind (although might be more relevant a comment on the schema).

The section for the profile information in the help message is currently called institutional config options however other users could use the same parameters in their own private profiles.

Maybe it would be better to call it: nf-core config options instead? Then it insinuates that it's from nf-core/configs but could at the same time could also be read as 'profiles for nf-core pipelines?

KevinMenden · 2021-02-26T07:08:43Z

Good points, and I think I agree with @jfy133 on the reporting of unexpected parameter. After all, we have schema_ignore_params comma-separated list for so users can just add there extra parameters there if they find them annoying.
@ewels @apeltzer what do you think?

Okay actually the default parameters in the schema are not used in the pipeline I think, but only as a means to print the the help message and for validation. So the defaults specified there might not really be the truth. Or am I missing something and are the used somehow? 🤔
So considering this we probably should not report those schema default values as they might be wrong. We could collect whatever value is set in the nextflow.config though. Would be slightly more code but probably worth it?
This would also let users know about any parameters and their current value.

jfy133 · 2021-02-26T07:39:04Z

Okay actually the default parameters in the schema are not used in the pipeline I think, but only as a means to print the the help message and for validation. So the defaults specified there might not really be the truth. Or am I missing something and are the used somehow?

I would assume nextflow.config should be matching the schema? Otherwise that's dangerous for the documentation (and help message, ofc). Although indeed, at the moment, I guess they are not cross referencing each other currently so it could be different...

I actually sort of liked the behaviour that only display parameters that are changed from default (if a message was stated accordingly), because otherwise for eager you would be displaying 136 parameters in the summary,w hich might be overhill 😅

KevinMenden · 2021-02-26T07:44:12Z

Yes they should be the same ... and linting should tell you if they aren't. So in theory we could totally leave it like that, there are a lot of checks in place that should ideally scream if it's not the case 🤷

Okay I think that's the behaviour it has right now as well, or not? Only the help message gives you all of it.

jfy133 · 2021-02-26T07:57:12Z

Okay I think that's the behaviour it has right now as well, or not? Only the help message gives you all of it.

I thikn so! Other than suggestion the WARN always being displayed, and adding a note to the summary (saying that these are only parameters different from the default), I think everything works nicely.

A message suggesiton could be such as:

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/eager v2.3.2dev
------------------------------------------------------

[Displaying parameters that differ from pipeline default]

Core Nextflow options

Or something along those lines. It could also be at the end of the summary even.

This reverts commit 8e719d6.

ewels · 2021-03-10T09:43:42Z

Sorry both, just catching up with this discussion.

Regarding warning about unexpected params - this was before we had params.schema_ignore_params and was to avoid spamming people. For example, we have a few parameters in the -profile uppmax config which would be warned about on every run even when everything is fine, which I thought might throw newcomers especially. But I guess now we can whitelist those in the same config.

Though actually this might also be problematic, as we need to append to that config instead of overwriting it. So that won't work for now. Also needs more thought 🤔

Regarding the defaults, yes it's a bit tricky - the vanilla pipeline should match the schema (did linting not tell you that stuff was wrong @jfy133 ? Maybe we should investigate if not..?). But there's nothing to stop you running nextflow run pipeline -c myconfig --help of course or doing any other normal customisation of the default parameters.

There's no perfect option here I think. Either we use the schema and it may not display the truth of what will run by default, or we use the config and it may not match the documentation. I'm not sure that it really matters much tbh as the end result is the same - that we should really try to make sure that the schema matches the config by default!

I would prefer to go for showing the schema defaults I think. The summary should show any params that don't match the schema when launching so any differences should be shown there explicitly anyway.

+1 for ditching the institutional config options name - I never liked that. But should probably be done in a different PR as it's kind of a different thing.

Also +1 for having a log message that explains that the summary only shows stuff varying from the pipeline defaults.

KevinMenden · 2021-03-10T09:59:32Z

Okay everything ready to merge from my side. It's still printing unexpected params only when failing - that's the only thing we might want to change, or leave as is. If you're happy with them being printed out on fail only we can merge.

jfy133 · 2021-03-10T10:06:28Z

If you're happy to always warn (not just in fail) that's fine with me.

The other possible option (I don't know how complicted it would be to implement) would be to have some form of param that specifies verbosity?

In large pipelines such as eager typos unfortunately happen quite a lot which means a given flag is skipped by nextflow, so for cases such as that I would rather be more explicit that You've included an unrecognised parameter! Going to continue anyway but check your input in all cases not just on fail.

But maybe for simplier pipelines it is overkill to always report discrepencies....

and regarding schema vs config, agreed - it seems there direction nf-core is going is schema focused anyway, so a linting message that config != schema would be fine with me (I'll try and investigate why that didn't get picked up at some point).

KevinMenden · 2021-03-10T11:18:44Z

Sounds reasonable to me, too.
I would suggest to move that bit of code:

            if (unexpectedParams.size() > 0) {
                println ''
                def warn_msg = 'Found unexpected parameters:'
                for (unexpectedParam in unexpectedParams) {
                    warn_msg = warn_msg + "\n* --${unexpectedParam}: ${paramsJSON[unexpectedParam].toString()}"
                }
                log.warn warn_msg
            }

outside the catch and before the actual paremeter validation.
Then we can remove the code in the main.nf file for printing out unexpected params as well, and switch the return type o f validateParams to void.

@ewels if you're happy with that I'll do the changes and we can merge.

ewels · 2021-03-11T16:19:22Z

Ok moved it to always print. Also added a message under it saying how to get rid of it if spamming. I figured that this was a compromise 😀

including boilerplate libraries

45ab487

KevinMenden changed the base branch from master to dev February 17, 2021 14:59

nf-core deleted a comment from github-actions bot Feb 17, 2021

KevinMenden requested a review from apeltzer February 17, 2021 15:01

made schema independent from headers

2fd2dba

apeltzer approved these changes Feb 17, 2021

View reviewed changes

KevinMenden marked this pull request as ready for review February 18, 2021 07:29

ewels reviewed Feb 18, 2021

View reviewed changes

nf_core/pipeline-template/{{cookiecutter.name_noslash}}/lib/NfcoreSchema.groovy Outdated Show resolved Hide resolved

ewels reviewed Feb 18, 2021

View reviewed changes

apeltzer and others added 5 commits February 18, 2021 09:34

Update nf_core/pipeline-template/{{cookiecutter.name_noslash}}/lib/Nf…

dbda27e

…coreSchema.groovy Co-authored-by: Phil Ewels <phil.ewels@scilifelab.se>

removed completion

f2bf623

Merge branch 'lib-functions-dev' of https://github.com/KevinMenden/tools

f4204a3

into lib-functions-dev

final fixes

24bf5ad

removed multiqc function from nfcoreschema

9db2905

resources as string

7ce3224

adjusted resource pattern

65d4fc2

ewels added this to the 1.13 milestone Feb 18, 2021

add default value to help message

6531dd6

KevinMenden added 3 commits February 26, 2021 10:45

download file and create template

8e719d6

evert "download file and create template"

a2424c7

This reverts commit 8e719d6.

added info to parameter summary message

21b5587

ewels added 2 commits March 11, 2021 17:03

Merge branch 'master' into lib-functions-dev

d908bb7

Always show unexepected params. Add log explaining how to hide warning.

98f7b66

KevinMenden merged commit 729f03a into nf-core:dev Mar 11, 2021

ewels mentioned this pull request Mar 11, 2021

Fix bug in unexpected param log with false values. #871

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added groovy functions in lib to reduce boilerplate code #857

Added groovy functions in lib to reduce boilerplate code #857

KevinMenden commented Feb 17, 2021 •

edited by ewels

Loading

codecov bot commented Feb 17, 2021 •

edited

Loading

KevinMenden commented Feb 17, 2021

apeltzer left a comment

apeltzer commented Feb 17, 2021

KevinMenden commented Feb 18, 2021

ewels Feb 18, 2021

KevinMenden commented Feb 18, 2021

apeltzer commented Feb 18, 2021

ewels commented Feb 18, 2021

KevinMenden commented Feb 18, 2021

ewels commented Feb 18, 2021

KevinMenden commented Feb 18, 2021

apeltzer commented Feb 18, 2021

jfy133 commented Feb 25, 2021 •

edited

Loading

KevinMenden commented Feb 25, 2021

KevinMenden commented Feb 25, 2021

KevinMenden commented Feb 25, 2021

jfy133 commented Feb 25, 2021 •

edited

Loading

jfy133 commented Feb 25, 2021

KevinMenden commented Feb 26, 2021

jfy133 commented Feb 26, 2021

KevinMenden commented Feb 26, 2021

jfy133 commented Feb 26, 2021

ewels commented Mar 10, 2021

KevinMenden commented Mar 10, 2021

jfy133 commented Mar 10, 2021

KevinMenden commented Mar 10, 2021 •

edited by ewels

Loading

ewels commented Mar 11, 2021

Added groovy functions in lib to reduce boilerplate code #857

Added groovy functions in lib to reduce boilerplate code #857

Conversation

KevinMenden commented Feb 17, 2021 • edited by ewels Loading

codecov bot commented Feb 17, 2021 • edited Loading

Codecov Report

KevinMenden commented Feb 17, 2021

apeltzer left a comment

Choose a reason for hiding this comment

apeltzer commented Feb 17, 2021

KevinMenden commented Feb 18, 2021

ewels Feb 18, 2021

Choose a reason for hiding this comment

KevinMenden commented Feb 18, 2021

apeltzer commented Feb 18, 2021

ewels commented Feb 18, 2021

KevinMenden commented Feb 18, 2021

ewels commented Feb 18, 2021

KevinMenden commented Feb 18, 2021

apeltzer commented Feb 18, 2021

jfy133 commented Feb 25, 2021 • edited Loading

KevinMenden commented Feb 25, 2021

KevinMenden commented Feb 25, 2021

KevinMenden commented Feb 25, 2021

jfy133 commented Feb 25, 2021 • edited Loading

jfy133 commented Feb 25, 2021

KevinMenden commented Feb 26, 2021

jfy133 commented Feb 26, 2021

KevinMenden commented Feb 26, 2021

jfy133 commented Feb 26, 2021

ewels commented Mar 10, 2021

KevinMenden commented Mar 10, 2021

jfy133 commented Mar 10, 2021

KevinMenden commented Mar 10, 2021 • edited by ewels Loading

ewels commented Mar 11, 2021

KevinMenden commented Feb 17, 2021 •

edited by ewels

Loading

codecov bot commented Feb 17, 2021 •

edited

Loading

jfy133 commented Feb 25, 2021 •

edited

Loading

jfy133 commented Feb 25, 2021 •

edited

Loading

KevinMenden commented Mar 10, 2021 •

edited by ewels

Loading