Support for importing processes from one file to another #238
Currently it is possible to reuse Groovy scripts or JAR libraries through the standard Java/Groovy import mechanism. Sub-workflows are not (yet) supported, but we are planning to add this feature, likely next year.
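For illustration, the existing reuse mechanism mentioned above looks roughly like this (a sketch; the class and method names are made up). Nextflow adds the pipeline's `lib/` directory to the classpath, so a Groovy class placed there can be called directly from the pipeline script:

```groovy
// lib/SeqUtils.groovy — a hypothetical helper class; any Groovy
// class placed in the pipeline's lib/ directory is picked up
// automatically via the classpath.
class SeqUtils {
    static String reverseComplement(String seq) {
        def comp = [A:'T', T:'A', C:'G', G:'C']
        seq.reverse().collect { comp[it] ?: 'N' }.join()
    }
}
```

```nextflow
// main.nf — the class is usable without any explicit import
println SeqUtils.reverseComplement('ACGT')
```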
## Introduction

The overarching goal of this proposal is to be able to isolate and group a pipeline of processes into a reusable unit that can be shared and incorporated into other pipelines. This proposal attempts to address this need by defining a small number of extensions to the Nextflow language while maintaining the core behavior and spirit of Nextflow's existing design and execution model. The central idea is to extend the language with the concept of a module.

## Syntax

Here is a possible syntax for defining a module:

```nextflow
// Give the module a name.
nextflow_module:
    com.example.coolthing

// Declare input channels. These channels are assumed to
// be "injected" from a calling pipeline.
input_channels:
    input_1,
    input_2

// normal processes
process a {
    input:
    val(x) from input_1
    output:
    stdout into output_1
    //...
}

process b {
    input:
    val(y) from input_2
    //...
}

process c {
    //...
}

// Declare output channels to be "exported".
output_channels:
    output_1,
    output_2
```
The `nextflow_module`, `input_channels`, and `output_channels` keywords delimit the module and declare its interface.

A possible syntax for using a module:

```nextflow
// Import like a normal java/groovy object or have different syntax?
import com.example.coolthing

Channel.fromPath(...).into{ origin }

// normal process
process xyz {
    input:
    val(z) from origin
    // ...
    output:
    val(x) into x_channel
    val(y) into y_channel
}

// use the module process
module_process coolthing {
    // map local channels to the module's input channels
    input:
    coolthing.input_1 from x_channel
    coolthing.input_2 from y_channel
    // map the module's output channels to local channels
    output:
    coolthing.output_1 into x_results
    coolthing.output_2 into y_results
}

// another normal process
process finish_x {
    input:
    val(x) from x_results
}
```

Using a `module_process` looks much like using an ordinary process.

## Execution

The goal is to leave pipeline execution exactly the same as it is now. The idea is that interpretation of a module will "include" or "flatten" the module processes into the main Nextflow pipeline, so that the executor sees only one pipeline script consisting of processes and channels, just like Nextflow today. Here is a possible way of imagining what the executor will see given the example above:

```nextflow
Channel.fromPath(...).into{ origin }

// normal process
process xyz {
    //...
    output:
    val(x) into x_channel
    val(y) into y_channel
}

// normal processes
process coolthing.a {
    input:
    val(x) from coolthing.input_1
    output:
    stdout into coolthing.output_1
    //...
}

process coolthing.b {
    input:
    val(y) from coolthing.input_2
    //...
}

process coolthing.c {
    //...
}

// another normal process
process finish_x {
    input:
    val(x) from coolthing.x_results
}
```

## Conclusion

This proposal introduces four new keywords to the Nextflow language: `nextflow_module`, `input_channels`, `output_channels`, and `module_process`.

## Disclaimer

Obviously all of the names are only suggestions; maybe there are better ones. Also, I've not implemented any of this, so I have no actual idea whether it would work. :)
I partially agree on this proposal. I think there shouldn't be a separate module construct: any Nextflow script should be usable as a sub-workflow. The only requirement should be to properly declare the expected workflow inputs and outputs, using the approach suggested by @mes5k, eventually folding that declaration into the existing script structure.
Moreover, it should be possible to continue using the existing script parameters mechanism, both for backward compatibility and for parametrisation when the script is used standalone. My idea is that the current params declarations would be complemented with an explicit declaration of the workflow's inputs and outputs; the main difference would be that inputs are bound by the calling workflow instead of the command line. On the invoking part I still have a lot of doubts, and several problems remain open.
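For reference, the existing parameters mechanism referred to here works like this (a minimal DSL1-era sketch; the parameter and process names are made up): a `params` value declared in the script provides a default that can be overridden on the command line, e.g. `nextflow run main.nf --greeting hola`.

```nextflow
// main.nf — params.greeting has a default, overridable via --greeting
params.greeting = 'hello'

process say {
    input:
    val msg from params.greeting

    output:
    stdout into result

    """
    echo ${msg}
    """
}

result.subscribe { println it }
```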
Sorry for the slow reply on this @pditommaso! I really like your ideas. I think workflows calling workflows is much more elegant than resorting to a new module keyword. Here are my thoughts on some of your questions:
The main question I have is about backwards compatibility. I assume that you want to maintain it, but should that extend to using existing pipelines as modules? I'd argue that a nextflow script must explicitly define its inputs and outputs in order to be usable as a module. In any case, I'm very excited to see where this goes. The more nextflow I write, the more I find that modules/subworkflows would help!
Yes. I was thinking of a mechanism similar to one already used in Nextflow.
I will as well, though I'm not sure how feasible it is to implement; it must be verified.
There isn't a real need for this, because any object that is not a dataflow value is implicitly converted to one when connected to a process input.
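A minimal sketch of that implicit conversion (DSL1-era syntax; the variable and process names are made up): a plain value used as a process input is wrapped in a dataflow value automatically.

```nextflow
num = 42   // a plain Groovy value, not a channel

process double_it {
    input:
    val x from num   // implicitly converted to a dataflow value

    output:
    stdout into doubled

    """
    echo \$(( ${x} * 2 ))
    """
}

doubled.subscribe { println it.trim() }
```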
YES! This could be a base on which to attempt an implementation, but I guess it will be much more challenging to code when put into practice.
Sounds like a plan! Would you like me to write something up that fully describes things? I can add it to the repo and then you can modify it as you see fit. I'm also happy to help implement this, but I'll need plenty of guidance to get going.
You are more than welcome.
Any updates on this? Also, for the implementation I was wondering: are processes represented as objects that could be defined in separate files and imported? Not sure whether such an approach would work in Groovy & Nextflow, having each process live in its own file.
Also, it appears I am a little late to the party: after further investigation I found that a similar feature has been implemented using the 'profiles' feature, as discussed and implemented here: https://github.com/guigolab/grape-nf and https://github.com/guigolab/grape-nf/blob/master/nextflow.config. Not sure if this is exactly the same as loading processes from external files, though.
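For context, the profiles mechanism mentioned here is a configuration feature, not a module system (a hedged sketch; the profile names are made up): alternative settings are grouped in `nextflow.config` and one group is selected at run time with `-profile`.

```nextflow
// nextflow.config — hypothetical profiles selected with e.g.
// `nextflow run main.nf -profile docker`
profiles {
    standard {
        process.executor = 'local'
    }
    docker {
        process.container = 'ubuntu:22.04'
        docker.enabled = true
    }
}
```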
@mes5k @pditommaso I would love the modular approach spec'd out above! Anyone working on this? Enabling modularity would make Nextflow an exponentially more powerful workflow executor. @stevekm I think the profiles/templates approach you mentioned solves a different problem, not modular re-use of subsections of a workflow, which is what the original proposal addresses.
This has always been among the desirable things to do; unfortunately, still no progress.
After working with Nextflow for a while I've started to see the lack of this feature as an advantage. Keeping all workflow processes contained in a single file greatly reduces complexity compared to importing external modules. Trying to understand and troubleshoot pipelines that use the latter format is a big headache. Thoughts? Maybe a better discussion for Gitter/Google Groups?
This is a controversial topic. In general I agree that duplication is better than the wrong abstraction, also taking into consideration that NF was conceived with the idea of making a pipeline easily readable and understandable from the developer's point of view, at the level of the tools and command lines executed. However, there are use cases in which the ability to reuse a module or a complete pipeline can be useful. The goal of this issue is not to implement a module system but instead to implement the ability to import an existing NF pipeline into a NF script and execute it as a sub-workflow. This would allow users to decompose big projects into small self-contained workflows that could be recomposed into bigger ones as needed.
I can't emphasize enough how important it is to have some feature to abstract away layers of detail. I've got several pipelines that are over 1000 lines of code, with several dozen processes and hundreds of operators manipulating the data per pipeline. These files are simply too large to reason about easily, which slows down our ability to improve and enhance them and makes it much easier to introduce bugs. I also find myself cutting and pasting sections of pipelines when I only need a subset of functionality for certain projects. This is especially frustrating because there are clear sections of the pipelines that could be extracted and naturally expressed as sub-workflows (or modules, or whatever). I'm just sorry that I haven't had the time to contribute this myself.
Hey all. We REALLY need this feature. If we could get some help, perhaps we could contribute? Maybe a phone conference to discuss how we could start?
Hi, I think this would be very useful. We have huge workflows, and it would be better to split them into subworkflows and import them into a main workflow that would run them sequentially.
Just another upvote for modules. I have played with Nextflow and, more recently, WDL/Cromwell. My (personal) conclusion is that Nextflow is superior in almost every respect, except maybe a few. The strongest point in favor of WDL is the reusability of tasks, which can be easily imported by several workflows. That is really huge, as it saves a lot of time in writing pipelines and allows making necessary updates to a task only once. I cannot think of any reason to stick to WDL if Nextflow supported modules. (The minor points were that WDL is less Paolo-dependent ;-) and that the language specification is separate from the implementation; and of course the Broad brand helps. Nothing nearly as critical as modules.) So help me go back to Nextflow and convince my peers too ;-)
Nearly there. Have a look at #984, going to be merged into master next month.
That is AWESOME! Thanks for pointing me to that. The new syntax will likely require extensive revision not only of docs but also of commonly used patterns, e.g. due to downplaying the use of certain channel operators.
Closing this, since the modules feature has been merged into the master branch.
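For readers landing here later: the modules feature as merged (DSL2) uses an `include` statement along these lines (a sketch; the file, process, and parameter names are made up, and details may vary by Nextflow version):

```nextflow
// modules/align.nf — a process defined in its own file
process ALIGN {
    input:
    path reads

    output:
    path 'aligned.bam'

    """
    echo aligning ${reads} > aligned.bam
    """
}
```

```nextflow
// main.nf — import the process and call it from a workflow block
nextflow.enable.dsl = 2

include { ALIGN } from './modules/align.nf'

workflow {
    reads = Channel.fromPath(params.reads)
    ALIGN(reads)
}
```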
Hi there! It would be really cool if it were possible to import process definitions from one file to another in order to support code reuse between workflows. Is this a feature that can be utilized through the Groovy language already, or would it require additional engineering work to support this in the Nextflow DSL? I couldn't find anything about this specifically in the official Nextflow documentation and I'm fairly new to Groovy, so any advice or thoughts you have about addressing this would be much appreciated. I'd be happy to read any existing documentation that already covers this if there is any. Thanks in advance!