Update of nf-core/modules file structure #1159

KevinMenden · 2021-07-05T12:55:26Z

As discussed on Slack downstream of here: https://nfcore.slack.com/archives/CE5LG7WMB/p1625484224149900

We want to update the file structure of nf-core/modules and modules in pipelines to better work with external module repositories and to allow for multiple versions of the same tool being installed in a nf-core pipeline.

nf-core/modules

For nf-core/modules, the structure should change to

modules/tool/subtool

So essentially replacing software with modules.
Custom tools, that are not bioinformatic software, will be blaced in the folder modules/custom/mycustomtool

nf-core pipelines

For pipelines, the structure will change into:

nf-core/rnaseq/modules/nf-core/modules/bedtools/intersect/short_sha/

So that every module is stored with the short_sha, which allows users to have the same modules twice, with different sha sums (i.e. versions).

Modules from external repos can then be installed into:

nf-core/rnaseq/modules/username/reponame/external_tool/external_subtool/short_sha

The text was updated successfully, but these errors were encountered:

ewels · 2021-07-05T14:37:38Z

Yup, exactly! 👍🏻 And that we want to add to modules.json the repo alongside the git sha.. (even though I guess this is already clear from the folder structure, but still).

grst · 2021-07-06T06:03:43Z

For pipelines, the structure will change into:
nf-core/rnaseq/modules/nf-core/modules/bedtools/intersect/short_sha/

I'm all for adding that information to modules.json, but is it really necessary to have that clunky path?

Is there any legitimate reason to have two versions of the same module in a single pipeline? (I don't think any other package manager would allow that anyway. If one really, really needs that I don't think it would be too much to ask to manually copy one module folder to custom)
The import paths in the .nf files will get clunky, too. Also, they need to be updated every time the modules are updated (Maybe tools could do that, but it feels like an over-complication)
The readability of pipeline suffers. I already find nf-core DSL2 pipelines already quite hard to read (information is scattered among many places) and it will make matters worse if modules are in some cryptic sha directory (consider an outstander trying to read the code). I find it somewhat paradox, that we on the one hand strive for best-practice pipelines, on the other hand, I can't reasonably tell my master student to learn nextflow by reading the rnaseq pipeline. Over-generalization negatively impacts readability.

KevinMenden · 2021-07-06T06:29:32Z

I agree (I think we all do) that two versions in the same pipeline is quite an edge case. And personally I'm also not sure whether we need to support that at all costs as one could also just make a local module or copy it manually, as Gregor said.
So personally I agree that this is not really needed and we have it in the modules.json.

For the rest of the file path though, this is to make the modules tool suite more compatible with external package repositories (I'm thinking a bit like conda channels), and this could be quite a nice think to have down the line.

The import paths get more clunky, yep. The SHA sum is actually the most annoying part of this in my opinion, as you will definitely have to copy that one. So I think if we wouldn't have that, it would be fine/okay.

Absolutely agree on the readability. When I started with DSL2, I was completely lost, and I liked DSL1 pipelines much more in terms of readability. The main.nf file was large, but at least you could find everything in there. This is definitely a big downside of DSL2. (That's also why I wanted to write a bit of documentation about the pipeline structure)
But I guess that's just what you get with a more modular architecture, and there are tons of positive aspects of course. But yeah I fully agree with you that we should always try to think about what the benefits are of adding some feature, and whether they outweigh the added complexity.

grst · 2021-07-06T06:35:34Z

For the rest of the file path though, this is to make the modules tool suite more compatible with external package repositories (I'm thinking a bit like conda channels), and this could be quite a nice think to have down the line.

I like that idea, and you can still have that, the module just needs to have a unique name across repos/"channels". Actually, this is also how it works with conda.

But I guess that's just what you get with a more modular architecture, and there are tons of positive aspects of course.

I absolutely agree that overall, maintainability strongly benefits from DSL2. We should strive for some balance between readability <-> generalization, though. Maybe it's worth discussing in slack or in a separate issue if and how we can improve readability in general.

ewels · 2021-07-06T07:22:06Z

The "multiple versions of the same module" thing got in because a few people fought for it fairly hard in Slack a few times. But I suspect mostly due to thinking about edge cases rather than a lot of direct need. I hadn't really thought about the fact that older versions could be managed by copying in as a local module if needed. I'm happy with that solution.

So yeah, I'm fine with dropping the git sha from the file path 👍🏻

@grst to clarify - the first part of the path is actually the repo name. So without the git sha it becomes:

modules/nf-core/modules/bedtools/intersect/

in other words (subtool not always needed):

modules/git_user/repo_name/tool/subtool/

I don't think that I want to shorten it any more than this really. I suspect that modules with duplicate names will be pretty common, as people develop in their own modules repo before shipping to nf-core for example. I also think that having username/repo makes the source of modules easier to find.

drpatelh · 2021-07-06T08:25:59Z

Agree with keeping the path as intuitive as possible when installing and importing in workflows. I somehow overlooked the latter during the frantic discussions yesterday but thank you for being the voice of reason here @grst !

In terms of simplicity, I am not sure we can do much more to make DSL2 pipelines as readable as possible. I think we have now found an obvious and standardised home for all of these different components i.e. modules, subworkflows, workflows both local and from nf-core. Findability was also one of my sticking points with DSL2 compared to DSL1 but we would always have to compromise on this for the flexibility we gain.

antunderwood · 2021-07-06T13:10:07Z

I agree with @grst that having the benefit of the modularity of DSL2 has a slightly negative impact on readability. Would it be helpful to have a figure showing the tree structure diagrammatically and annotating it with the location of key DSL2 features - modules, workflows etc?

antunderwood · 2021-07-06T13:15:48Z

I agree with @grst that having the benefit of the modularity of DSL2 has a slightly negative impact on readability. Would it be helpful to have a figure showing the tree structure diagrammatically and annotating it with the location of key DSL2 features - modules, workflows etc?

I wouldn't mind having a go working with @drpatelh to make sure the info is accurate

KevinMenden · 2021-07-06T13:16:19Z

I think a figure would be great!
We have been adding documentation about the pipeline structure here (now outdated of course): https://nf-co.re/developers/adding_pipelines#nf-core-pipeline-structure

But that's actually a bit hard to find and I don't think many people read this. With nf-core pipelines getting gradually more complex I really think better documentation about all the bits and pieces that make up these pipelines now would be very valuable to newcomers.

KevinMenden · 2021-07-07T10:40:35Z

Okay the changes have been made to nf-core/modules, now we need to adapt nf-core/tools. I will make a check list here for the different commands that have to be adapted. @ErikDanielsson it's probably best if we edit this comment here and mention what we are working on so we don't do the same. I'll start with the lint command.

nf-core modules install (assigned to @ErikDanielsson)
nf-core modules remove(assigned to @ErikDanielsson)
nf-core modules lint (assigned to @KevinMenden )
nf-core modules list (assigned to @ErikDanielsson)
nf-core modules bump-versions
nf-core modules create-test-yml
nf-core modules create

Other things

Update pipeline template in `nf-core/tools``

KevinMenden · 2021-07-08T11:11:51Z

Okay closing this issue as the refactoring has been done both in nf-core/modules and nf-core/tools

grst mentioned this issue Jul 6, 2021

Readability of DSL2 pipelines / documentation of pipeline structures. #1163

Closed

drpatelh mentioned this issue Jul 6, 2021

Rename software/ directory to modules/ to re-organise module structure nf-core/modules#567

Merged

KevinMenden mentioned this issue Jul 7, 2021

Adjust modules lint to new directory structure #1167

Merged

4 tasks

ErikDanielsson mentioned this issue Jul 7, 2021

Adapt modules install to new directory structure #1168

Merged

4 tasks

KevinMenden linked a pull request Jul 8, 2021 that will close this issue

Modules refactoring #1171

Merged

4 tasks

KevinMenden closed this as completed Jul 8, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update of nf-core/modules file structure #1159

Update of nf-core/modules file structure #1159

KevinMenden commented Jul 5, 2021 •

edited by ewels

Loading

ewels commented Jul 5, 2021

grst commented Jul 6, 2021

KevinMenden commented Jul 6, 2021

grst commented Jul 6, 2021

ewels commented Jul 6, 2021

drpatelh commented Jul 6, 2021

antunderwood commented Jul 6, 2021

antunderwood commented Jul 6, 2021

KevinMenden commented Jul 6, 2021 •

edited

Loading

KevinMenden commented Jul 7, 2021 •

edited

Loading

KevinMenden commented Jul 8, 2021

Update of nf-core/modules file structure #1159

Update of nf-core/modules file structure #1159

Comments

KevinMenden commented Jul 5, 2021 • edited by ewels Loading

nf-core/modules

nf-core pipelines

ewels commented Jul 5, 2021

grst commented Jul 6, 2021

KevinMenden commented Jul 6, 2021

grst commented Jul 6, 2021

ewels commented Jul 6, 2021

drpatelh commented Jul 6, 2021

antunderwood commented Jul 6, 2021

antunderwood commented Jul 6, 2021

KevinMenden commented Jul 6, 2021 • edited Loading

KevinMenden commented Jul 7, 2021 • edited Loading

KevinMenden commented Jul 8, 2021

KevinMenden commented Jul 5, 2021 •

edited by ewels

Loading

KevinMenden commented Jul 6, 2021 •

edited

Loading

KevinMenden commented Jul 7, 2021 •

edited

Loading