Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added groovy functions in lib to reduce boilerplate code #857

Merged
merged 15 commits into from
Mar 11, 2021

Conversation

KevinMenden
Copy link
Contributor

@KevinMenden KevinMenden commented Feb 17, 2021

This PR adds some of the functions written by @drpatelh to the template, specifically the classes:

  • Headers.groovy
  • Completion.groovy
  • additional function to NfcoreSchema.groovy

This should close #577

Wherever possible I deleted pipeline-specific code.

@KevinMenden KevinMenden changed the base branch from master to dev February 17, 2021 14:59
@nf-core nf-core deleted a comment from github-actions bot Feb 17, 2021
@codecov
Copy link

codecov bot commented Feb 17, 2021

Codecov Report

Merging #857 (521950d) into dev (6d36282) will increase coverage by 2.01%.
The diff coverage is 19.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #857      +/-   ##
==========================================
+ Coverage   71.73%   73.75%   +2.01%     
==========================================
  Files          30       29       -1     
  Lines        3386     3231     -155     
==========================================
- Hits         2429     2383      -46     
+ Misses        957      848     -109     
Impacted Files Coverage Δ
nf_core/modules.py 68.38% <7.69%> (+23.33%) ⬆️
nf_core/__main__.py 64.72% <75.00%> (+0.54%) ⬆️
nf_core/lint/merge_markers.py
nf_core/bump_version.py 86.15% <0.00%> (+0.21%) ⬆️
nf_core/lint/__init__.py 80.18% <0.00%> (+0.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d36282...98f7b66. Read the comment docs.

@KevinMenden
Copy link
Contributor Author

Made the NfcoreSchema.groovy class now independent from Headers.groovy. Everything seems to be working, but if we only want to include the header then it will just be copy-pasting back some code (or reverting the commit), so no harm done.

Copy link
Member

@apeltzer apeltzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great :-)

@apeltzer
Copy link
Member

I think this is good and as generic as we could have it. In cases where e.g. RNAseq needs to add some extra lib functions, this could be done in a separate file in lib/ and thus imported accordingly. That would make it possible to keep things consistent across pipelines for the majority of this functionality, while making it possible to have only tiny bits of code in the main file :-)

Great job :-)

@KevinMenden KevinMenden marked this pull request as ready for review February 18, 2021 07:29
@KevinMenden
Copy link
Contributor Author

Great :) Let's see then, I would say we keep the stuff in NfcoreSchema.groovy and also Headers.groovy as I don't see anything going wrong with that header. We revert the inclusion of the Completion.groovy though if you think that's too much for now @ewels .

if (params.help) {
helpMessage()
def command = "nextflow run {{ cookiecutter.name }} --input '*_R{1,2}.fastq.gz' -profile docker"
log.info NfcoreSchema.params_help(workflow, params, json_schema, command)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that it might be nice to have a new lint test that just checks for certain lines in certain files. So for example to check that this function is actually being called in main.nf (to force people to remove the old style help message)

@KevinMenden
Copy link
Contributor Author

Alright I removed the Completion.groovy class, now it only has the parameter validation, the header and printing the parameters. There is some redundancy now to collect the parameter summary map for MultiQC but it's basically just one extra line.

Otherwise everything is as before. If you're happy we can just merge this (I guess we add the new linter function in the next release?)

@apeltzer
Copy link
Member

Did I miss the removal of Completion to being necessary? I think its good to have it, but did we discuss/decide otherwise?

@ewels
Copy link
Member

ewels commented Feb 18, 2021

Discussion was on Slack sorry, mostly came from me. Two things:

  • It includes code specific to MultiQC, which is not universal in all pipelines (and it would be nice to keep these library files unedited)
  • It's mostly email code which we want to rewrite anyway since forever (Refactor pipeline emails #112) so I was kind of thinking that we'd have to rewrite it all again soon anyway..

Happy to discuss further, but keen to get this release out so was trying to keep things simple..

@KevinMenden
Copy link
Contributor Author

Small addition: I've enclosed the --max_memory and --max_time by quotes because otherwise it would fail parameter validation.
When given via command line it's fine without quotes. But otherwise internally they get converted to nextflow memory objects, then back to string, and they lose the dot in the process (at least the max_memory) --> and then fail the schema pattern.

@ewels
Copy link
Member

ewels commented Feb 18, 2021

Sigh. This will cause more problems than just those values, every single config file in nf-core/configs uses the same syntax.

Can we just relax the parameter validation to allow spaces? I nearly did that anyway.

@KevinMenden
Copy link
Contributor Author

Adjusted the pattern, removed the quotes 👌
This way it works too. I also removed the necessary dot for the duration stuff (it generated 10d internally)

@apeltzer
Copy link
Member

Sorry if I missed it then - probably jumped in that thread from time to time but wasn't 100% keeping up with the development. Fine by me and it makes sense to probably update the e-mail code at once & add the functionality then to the Completion.groovy file

@ewels ewels added this to the 1.13 milestone Feb 18, 2021
@jfy133
Copy link
Member

jfy133 commented Feb 25, 2021

I was just playing around with @apeltzer 's port of this code into eager (already), and noticed a couple of things:

  1. I only got parameter validation WARN messages when cancelling the pipeline, or the pipeline actually fails.

    e.g. running this through to completion I get no warning that --fake_param isn't a valid parameter

    nextflow run nf-core/eager --profile test_tsv,singularity -r add-fancy-help-json-for-james --fake_param
    

    but ctrl + c'ing half-way through the same run results in the expected WARN

    ^CWARN: Unexpected parameter: fake_param
    WARN: Unexpected parameter: save-aligned-intermediates
    WARN: Unexpected parameter: save-trimmed
    

    It would be preferred these WARNs are shown immediately when the pipeline starts running

  2. It would be nice that running --help also includes the default values at the end of the line of each parameter.

    For example in eager we do:

          --clip_forward_adaptor [str]  Specify adapter sequence to be clipped off (forward strand). Default: '${params.clip_forward_adaptor}'
    
  3. Is there any limitations to the summary message at the beginning of the run? The one that is in Alex's PR looks like it gets truncated (with the Metagenomic Screening section). I wonder because eager's summary would in principle be very large... Also it looks like the sections are out of order of the schema itself, is that to be expected?

Truncation:

    Genotyping
     gatk_ug_keep_realign_bam  : false
 
 Metagenomic Screening
     metagenomic_tool          : 
 
 ------------------------------------------------------
 
 ------------------------------------------------------
 Schaffa, Schaffa, Genome Baua!
 [b1/18ca7f] Submitted process > makeBWAIndex (Mammoth_MT_Krause.fasta)

@KevinMenden
Copy link
Contributor Author

Hi James, thanks for testing this!

  1. Yes that is currently the programmed behaviour. The reasoning behind this was not to annoy users that use several 'unexpected' parameters, but printing them when failing as a help for debugging.

  2. very good point, I will look into that!

  3. As far as I can tell there shouldn't be any limitation or truncation .... also the parameters should be in the order of the Schema. It might be that some groovy functions change that though, so I'll have a closer look at this as well - thanks!

@KevinMenden
Copy link
Contributor Author

Okay so regarding the truncation, I had a closer look at the code and if I read it right the parameter summary only prints out params that are either specified via command line or in some config file. Regarding the ordering, it should be the same as in the schema 🤔 (but maybe @drpatelh can correct me If I'm wrong here).

@KevinMenden
Copy link
Contributor Author

I've added the default value to the help message, output looks like this:
So the default value is just appended in [] brackets, if there is one in the schema.

Input/output options
    --input                      [string]  Input FastQ files.
    --single_end                 [boolean] Specifies that the input is single-end reads.
    --outdir                     [string]  The output directory where the results will be saved.  [default: ./results]
    --email                      [string]  Email address for completion summary.

Reference genome options
    --genome                     [string]  Name of iGenomes reference.
    --fasta                      [string]  Path to FASTA genome file.
    --igenomes_base              [string]  Directory / URL base for iGenomes references.  [default: s3://ngi-igenomes/igenomes/]
    --igenomes_ignore            [boolean] Do not load the iGenomes reference config.

Generic options
    --help                       [boolean] Display help text.
    --publish_dir_mode           [string]  Method used to save pipeline results to output directory.  [default: copy]
    --validate_params            [boolean] Boolean whether to validate parameters against the schema at runtime  [default: true]
    --email_on_fail              [string]  Email address for completion summary, only when pipeline fails.
    --plaintext_email            [boolean] Send plain-text email instead of HTML.
    --max_multiqc_email_size     [string]  File size limit when attaching MultiQC reports to summary emails.  [default: 25.MB]
    --monochrome_logs            [boolean] Do not use coloured log outputs.
    --multiqc_config             [string]  Custom config file to supply to MultiQC.
    --tracedir                   [string]  Directory to keep pipeline Nextflow logs and reports.  [default: ${params.outdir}/pipeline_info]

Max job request options
    --max_cpus                   [integer] Maximum number of CPUs that can be requested    for any single job.  [default: 16]
    --max_memory                 [string]  Maximum amount of memory that can be requested for any single job.  [default: 128.GB]
    --max_time                   [string]  Maximum amount of time that can be requested for any single job.  [default: 240.h]

Institutional config options
    --custom_config_version      [string]  Git commit id for Institutional configs.  [default: master]
    --custom_config_base         [string]  Base directory for Institutional configs.  [default: https://raw.githubusercontent.com/nf-core/configs/master]
    --hostnames                  [string]  Institutional configs hostname.
    --config_profile_description [string]  Institutional config description.
    --config_profile_contact     [string]  Institutional config contact information.
    --config_profile_url         [string]  Institutional config URL link.

@jfy133
Copy link
Member

jfy133 commented Feb 25, 2021

  1. I'm not sure I following with the debugging reasoning though. In my opinion and experience, the most useful thing for parameter validation is that you don't accidently make a typo in your command which leads to a 'silent fail' of that particular module. Nextflow has the (pet peeve) annoying behaviour to just ignore anything supplied outside what is used inside a pipeline.

This happens relatively regularly (at least in my experience with eager) where a user reports something didn't run or a parameter they set weren't used - but it actually comes to down to a typo in the parameter (e.g. flipping _ and -). This means with the current behaviour you still wouldn't know it actually 'failed' because Nextflow runs anyway.

Therefore I would argue for any unexpected parameters are automatically warned at the beginning of the run. If you have your schema and nextflow.config properly configured - you shouldn't get lots of unexpected parameters, right (see point 3 😅 ).

  1. OK, tested the default display and looks great! Thanks! I think this would be very useful.

  2. Aha, ok. Then this does make sense - it looks like the schema doesn't match 100% with the nextflow.config for those two parameters (e.g. for metagneomic_screening_tool in schema it's "undefined" for some reason, whereas it's just '' in the nextflow.config, and the realign_bam thing was string in schema, not boolean. Looks good now I've fixed it 💪. I do wonder if there should be a little note in the summary header saying something like 'Parameters differing from pipeline defaults' or similar, to make it clear that there are parameters set underneath that the user might not be 'immediately' aware of?

@jfy133
Copy link
Member

jfy133 commented Feb 25, 2021

Ah and one additional point that just came to mind (although might be more relevant a comment on the schema).

The section for the profile information in the help message is currently called institutional config options however other users could use the same parameters in their own private profiles.

Maybe it would be better to call it: nf-core config options instead? Then it insinuates that it's from nf-core/configs but could at the same time could also be read as 'profiles for nf-core pipelines?

@KevinMenden
Copy link
Contributor Author

Good points, and I think I agree with @jfy133 on the reporting of unexpected parameter. After all, we have schema_ignore_params comma-separated list for so users can just add there extra parameters there if they find them annoying.
@ewels @apeltzer what do you think?

Okay actually the default parameters in the schema are not used in the pipeline I think, but only as a means to print the the help message and for validation. So the defaults specified there might not really be the truth. Or am I missing something and are the used somehow? 🤔
So considering this we probably should not report those schema default values as they might be wrong. We could collect whatever value is set in the nextflow.config though. Would be slightly more code but probably worth it?
This would also let users know about any parameters and their current value.

@jfy133
Copy link
Member

jfy133 commented Feb 26, 2021

Okay actually the default parameters in the schema are not used in the pipeline I think, but only as a means to print the the help message and for validation. So the defaults specified there might not really be the truth. Or am I missing something and are the used somehow?

I would assume nextflow.config should be matching the schema? Otherwise that's dangerous for the documentation (and help message, ofc). Although indeed, at the moment, I guess they are not cross referencing each other currently so it could be different...

I actually sort of liked the behaviour that only display parameters that are changed from default (if a message was stated accordingly), because otherwise for eager you would be displaying 136 parameters in the summary,w hich might be overhill 😅

@KevinMenden
Copy link
Contributor Author

Yes they should be the same ... and linting should tell you if they aren't. So in theory we could totally leave it like that, there are a lot of checks in place that should ideally scream if it's not the case 🤷

Okay I think that's the behaviour it has right now as well, or not? Only the help message gives you all of it.

@jfy133
Copy link
Member

jfy133 commented Feb 26, 2021

Okay I think that's the behaviour it has right now as well, or not? Only the help message gives you all of it.

I thikn so! Other than suggestion the WARN always being displayed, and adding a note to the summary (saying that these are only parameters different from the default), I think everything works nicely.

A message suggesiton could be such as:

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/eager v2.3.2dev
------------------------------------------------------

[Displaying parameters that differ from pipeline default]

Core Nextflow options

Or something along those lines. It could also be at the end of the summary even.

@ewels
Copy link
Member

ewels commented Mar 10, 2021

Sorry both, just catching up with this discussion.

Regarding warning about unexpected params - this was before we had params.schema_ignore_params and was to avoid spamming people. For example, we have a few parameters in the -profile uppmax config which would be warned about on every run even when everything is fine, which I thought might throw newcomers especially. But I guess now we can whitelist those in the same config.

Though actually this might also be problematic, as we need to append to that config instead of overwriting it. So that won't work for now. Also needs more thought 🤔

Regarding the defaults, yes it's a bit tricky - the vanilla pipeline should match the schema (did linting not tell you that stuff was wrong @jfy133 ? Maybe we should investigate if not..?). But there's nothing to stop you running nextflow run pipeline -c myconfig --help of course or doing any other normal customisation of the default parameters.

There's no perfect option here I think. Either we use the schema and it may not display the truth of what will run by default, or we use the config and it may not match the documentation. I'm not sure that it really matters much tbh as the end result is the same - that we should really try to make sure that the schema matches the config by default!

I would prefer to go for showing the schema defaults I think. The summary should show any params that don't match the schema when launching so any differences should be shown there explicitly anyway.

+1 for ditching the institutional config options name - I never liked that. But should probably be done in a different PR as it's kind of a different thing.

Also +1 for having a log message that explains that the summary only shows stuff varying from the pipeline defaults.

@KevinMenden
Copy link
Contributor Author

Okay everything ready to merge from my side. It's still printing unexpected params only when failing - that's the only thing we might want to change, or leave as is. If you're happy with them being printed out on fail only we can merge.

@jfy133
Copy link
Member

jfy133 commented Mar 10, 2021

If you're happy to always warn (not just in fail) that's fine with me.

The other possible option (I don't know how complicted it would be to implement) would be to have some form of param that specifies verbosity?

In large pipelines such as eager typos unfortunately happen quite a lot which means a given flag is skipped by nextflow, so for cases such as that I would rather be more explicit that You've included an unrecognised parameter! Going to continue anyway but check your input in all cases not just on fail.

But maybe for simplier pipelines it is overkill to always report discrepencies....

and regarding schema vs config, agreed - it seems there direction nf-core is going is schema focused anyway, so a linting message that config != schema would be fine with me (I'll try and investigate why that didn't get picked up at some point).

@KevinMenden
Copy link
Contributor Author

KevinMenden commented Mar 10, 2021

Sounds reasonable to me, too.
I would suggest to move that bit of code:

            if (unexpectedParams.size() > 0) {
                println ''
                def warn_msg = 'Found unexpected parameters:'
                for (unexpectedParam in unexpectedParams) {
                    warn_msg = warn_msg + "\n* --${unexpectedParam}: ${paramsJSON[unexpectedParam].toString()}"
                }
                log.warn warn_msg
            }

outside the catch and before the actual paremeter validation.
Then we can remove the code in the main.nf file for printing out unexpected params as well, and switch the return type o f validateParams to void.

@ewels if you're happy with that I'll do the changes and we can merge.

@ewels
Copy link
Member

ewels commented Mar 11, 2021

Ok moved it to always print. Also added a message under it saying how to get rid of it if spamming. I figured that this was a compromise 😀

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatically build CLI help using JSON Schema
4 participants