
Add implicit workflow definition to main #619

Closed
JoseEspinosa opened this issue Apr 29, 2021 · 4 comments
JoseEspinosa commented Apr 29, 2021

To be able to include a pipeline as a subworkflow of another pipeline in DSL2, the former pipeline needs to have an implicit workflow definition. This requires refactoring the workflow definition in main.nf as shown below:

// include statements must sit at the script level, outside workflow bodies
include { SRA_DOWNLOAD } from './workflows/sra_download' addParams( summary_params: summary_params )
include { RNASEQ       } from './workflows/rnaseq'       addParams( summary_params: summary_params )

workflow NFCORE_RNASEQ {
    if (params.public_data_ids) {
        /*
         * WORKFLOW: Get SRA run information for public database ids, download and md5sum check FastQ files, auto-create samplesheet
         */
        SRA_DOWNLOAD ()
    } else {
        /*
         * WORKFLOW: Run main nf-core/rnaseq analysis pipeline
         */
        RNASEQ ()
    }
}

workflow {
    NFCORE_RNASEQ ()
}
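With the named NFCORE_RNASEQ workflow in place, a downstream pipeline could then pull in the whole pipeline like any other subworkflow. A minimal sketch, assuming the rnaseq checkout has been vendored under ./rnaseq/ (the path and calling pipeline are illustrative, not part of the proposal above):

```nextflow
// Hypothetical consumer pipeline: the vendored path is an assumption
include { NFCORE_RNASEQ } from './rnaseq/main'

workflow {
    NFCORE_RNASEQ ()
}
```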

If I am not wrong, this change should not have any collateral impact on the pipeline.
Pinging @ggabernet, @drpatelh and @ewels, since this issue started to be discussed on the Slack #modules channel.

@JoseEspinosa JoseEspinosa self-assigned this Apr 29, 2021
@JoseEspinosa JoseEspinosa added DSL2 WIP Work in progress labels Apr 29, 2021
@drpatelh

Ah, now I see what you mean 👍🏽 It does make sense to define things this way, but I am wondering how often we will actually be including whole pipelines in other whole pipelines? We have to bear in mind that all of the modules, sub-workflows and utility scripts will also have to be copied across into the new pipeline context, and with the way things currently stand I think that would be far more work than just running the original pipeline in isolation.

JoseEspinosa commented Apr 29, 2021

The main scenario I can think of where this feature would be useful is benchmarking. A simple example would be a pipeline that compares results between two versions of an nf-core DSL2 pipeline. Pushing it further, it should also be possible to benchmark an nf-core pipeline against a (non-nf-core) DSL2 Nextflow pipeline used to perform the same type of analysis. We are actually working on a prototype, which still requires a lot of work, to perform this kind of benchmark within the BovReg project. To overcome the problem you mentioned in your comment, what I did is parametrize the path to the included pipeline in the nextflow.config of the "meta" pipeline:

params {
    // Input options
    input                = ''
    pipeline             = ''
    path_to_pipelines    = "${projectDir}/modules/pipelines"
    pipeline_path        = "${params.path_to_pipelines}/${params.pipeline}" // remove and directly
    pipeline_test_config = "${params.pipeline_path}/conf/test.config"
    pipeline_config      = "${projectDir}/modules/pipelines/${params.pipeline}/nextflow.config"

    // Benchmark related params
    benchmarker_path     = "${projectDir}/modules/benchmarkers"
    skip_benchmark       = false
    ...
}
// This is in fact a separate config file, but for the sake of simplicity I put it here
try {
    includeConfig "${params.pipeline_config}"
} catch (Exception e) {
    System.err.println("====================================================\n" +
                       "WARN: The included module pipeline `$params.pipeline`\n" +
                       "      does not declare any 'nextflow.config' file.\n" +
                       "      You can include it at `${params.path_to_pipelines}`\n" +
                       "      or otherwise use `--pipeline_config` to set its path.\n" +
                       "====================================================\n")
}
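For context, a launch of such a meta-pipeline would then select the vendored pipeline on the command line. A sketch, assuming the meta-pipeline repository is called meta-benchmark (the repository name and sample sheet are hypothetical; the --pipeline and --input flags map onto the params block above):

```nextflow
// Hypothetical invocation, shown as a comment-style sketch:
//
//   nextflow run meta-benchmark --pipeline rnaseq --input samplesheet.csv
//
// which resolves pipeline_path to "${projectDir}/modules/pipelines/rnaseq"
// via the params block above.
```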

Then, in main.nf, I have this code to include the pipeline as a submodule:

pipeline_module = file( "${params.pipeline_path}/main.nf" )
if( !pipeline_module.exists() ) exit 1, "ERROR: The selected pipeline is not correctly included: ${params.pipeline}"

// The pipeline name cannot be interpolated in the include clause, so the script might have to be created dynamically
include { PIPELINE } from pipeline_module
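Since the name inside the include clause must be a fixed identifier, one pattern that does help when the entry workflow name is known ahead of time is aliasing it to a generic handle with the `as` keyword. A hedged sketch (NFCORE_RNASEQ and the path are illustrative, not part of the prototype above):

```nextflow
// Alias a concrete entry workflow to a generic name at include time;
// the workflow name and path here are assumptions for illustration
include { NFCORE_RNASEQ as PIPELINE } from './modules/pipelines/rnaseq/main'

workflow {
    PIPELINE ()
}
```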

There could be other scenarios where this approach is useful. For instance, if an nf-core/sradownload pipeline were created, it would need an implicit declaration of the workflow so that it could be included as a subworkflow in the different nf-core pipelines using it.

I cannot think of any collateral issue associated with refactoring the DSL2 main.nf this way (maybe you can come up with some, since you are more familiar with the pipelines), while it leaves the door open to including a whole pipeline as a subworkflow.

@drpatelh

Thanks @JoseEspinosa. As agreed on Slack, there is no harm in calling the workflow as you suggested if it gives us more flexibility in the future to call entire pipelines within others, as you nicely explained above.

@drpatelh

Added a link to this issue in the main script for now in #621
